Transform values in a dataset more quickly
I need to replace every value greater than 100 with 0, but the dataset I need to transform contains about 2 billion values, and that is the problem: the operation takes a very long time (and I need to run this transformation 5 times).
I am currently using a for loop with the .replace function.
Is there another function or approach that would solve this problem faster?
Solution 1:[1]
Not entirely sure what you want to do. Do you have a single array or tabular data? And if the latter, do you want this to apply to all columns or just some of them?
Anyway, in case you have just an array:
a = np.array([10,100,101,301,10,43])
a[a>100] = 0
print(a)
# --> [ 10 100 0 0 10 43]
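Since the question mentions roughly 2 billion values, memory traffic matters: the in-place boolean-mask assignment above writes into the existing array without allocating a copy, whereas `np.where` builds a new array of the same size. A small sketch of both, using a short hypothetical array as a stand-in for the real dataset:

```python
import numpy as np

a = np.array([10, 100, 101, 301, 10, 43])

# np.where allocates a new array the same size as `a`
b = np.where(a > 100, 0, a)

# in-place boolean-mask assignment modifies `a` directly, no copy
a[a > 100] = 0
```

For a 2-billion-element array the in-place form halves the peak memory use, which can be the difference between staying in RAM and swapping.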
In case you have a dataframe:
df = pd.DataFrame({'a':np.arange(30,120,10),
'b':np.arange(50,59),
'c':np.arange(95,104),
'd':np.arange(101,110)})
If you want to apply to a single column:
df.loc[df['a'] > 100, 'a'] = 0
(Using .loc avoids pandas' chained-assignment warning, which the form df['a'][df['a'] > 100] = 0 would trigger.)
If you want to apply it to more than one column, one way is:
apply_to_cols = ['a','c']
def all_or_nothing(v):
    if v > 100:
        return 0
    else:
        return v
df[apply_to_cols] = np.vectorize(all_or_nothing)(df[apply_to_cols])
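Note that np.vectorize still calls the Python function once per element, so on a dataset of this size a fully vectorized alternative should be noticeably faster. One option is pandas' own mask method, which applies the condition column-wise in NumPy. A self-contained sketch using the same example dataframe:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.arange(30, 120, 10),
                   'b': np.arange(50, 59),
                   'c': np.arange(95, 104),
                   'd': np.arange(101, 110)})

apply_to_cols = ['a', 'c']
# mask() replaces every value where the condition is True with 0,
# without a Python-level function call per element
df[apply_to_cols] = df[apply_to_cols].mask(df[apply_to_cols] > 100, 0)
```

Columns outside apply_to_cols (here 'b' and 'd') are left untouched.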
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Thanos Natsikas |
