Transform values in a dataset more quickly

I need to set values above 100 to 0, but the dataset that needs this transformation has about 2 billion values, and that is the problem: it takes a lot of time (and I need to do the transformation 5 times).

I am using a for loop with the ".replace" function.

Is there another function or approach that would solve this problem?



Solution 1:[1]

Not entirely sure what you want to do. Do you have a single array or tabular data? And if the latter, do you want this to apply to all columns or just some of them?

Anyway, in case you have just an array:

import numpy as np

a = np.array([10, 100, 101, 301, 10, 43])
a[a > 100] = 0   # boolean-mask assignment, vectorized in C
print(a)
# --> [ 10 100   0   0  10  43]

In case you have a dataframe:

import pandas as pd

df = pd.DataFrame({'a': np.arange(30, 120, 10),
                   'b': np.arange(50, 59),
                   'c': np.arange(95, 104),
                   'd': np.arange(101, 110)})

If you want to apply it to a single column (using .loc avoids the chained-assignment pattern, which triggers SettingWithCopyWarning):

df.loc[df['a'] > 100, 'a'] = 0

If you want to apply it to more than one column, one way is:

apply_to_cols = ['a', 'c']

def all_or_nothing(v):
    if v > 100:
        return 0
    else:
        return v

# np.vectorize broadcasts the function over the selected columns,
# but note it still calls the Python function once per element.
df[apply_to_cols] = np.vectorize(all_or_nothing)(df[apply_to_cols])
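Since np.vectorize still invokes a Python function per element, it may be slow at the 2-billion-value scale the question describes. A minimal sketch of a fully vectorized alternative for the multi-column case, using pandas' DataFrame.mask on the same hypothetical dataframe as above:

```python
import numpy as np
import pandas as pd

# Same example dataframe as above.
df = pd.DataFrame({'a': np.arange(30, 120, 10),
                   'b': np.arange(50, 59),
                   'c': np.arange(95, 104),
                   'd': np.arange(101, 110)})

apply_to_cols = ['a', 'c']

# mask() replaces every value where the condition holds, operating on
# whole columns at once instead of one Python call per element.
df[apply_to_cols] = df[apply_to_cols].mask(df[apply_to_cols] > 100, 0)

print(df[apply_to_cols])
```

Columns 'b' and 'd' are left untouched; only the listed columns are rewritten.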

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Thanos Natsikas