How can I make this Python code more efficient?
I realize this is an incredibly inefficient way to code this, so I'm hoping someone has suggestions for a more efficient method.
Essentially I'm trying to create a column ("freq") with a value of 0 for NaN and "Nothing" entries and 1 otherwise. Sample df:
i  obj        freq
0  Nothing    0
1  Something  1
2  NaN        0
3  Something  1
for i in range(0, len(df)):
    # str() renders a missing value as "nan", so both cases can be checked as strings
    if str(df["obj"].iloc[i]) == "Nothing" or str(df["obj"].iloc[i]) == "nan":
        df["freq"].iloc[i] = 0
    else:
        df["freq"].iloc[i] = 1
Solution 1:[1]
You can use np.where()
import pandas as pd
import numpy as np
df = pd.DataFrame({'obj': {0: 'Nothing', 1: 'Something', 2: np.nan, 3: 'Something'}})
df['freq'] = np.where((df['obj'] == 'Nothing') | (df['obj'].isnull()), 0, 1)
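As a quick check (assuming the sample frame built above), the new column should be 0 for the "Nothing" and NaN rows and 1 for the rest:
print(df['freq'].tolist())  # expected: [0, 1, 0, 1]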
Solution 2:[2]
Without a DataFrame at hand it is hard to check whether this works, but it should:
indexer = (df['obj'] == 'Nothing') | (df['obj'].astype(str) == 'nan')  # str() renders NaN as 'nan', not 'NaN'
df.loc[indexer, 'freq'] = 0
df.loc[~indexer, 'freq'] = 1
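Depending on the pandas version, building the column in two .loc assignments like this can leave it as float, since the first assignment creates 'freq' with NaN for the rows it does not set. If integers are needed, one option is to cast afterwards:
df['freq'] = df['freq'].astype(int)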
Solution 3:[3]
In this case it is not even necessary to use numpy:
df['freq'] = (~(df.obj.isnull() | (df.obj == 'Nothing'))) * 1
Note:
Is it really useful to encode the result as 0 and 1? Couldn't we just keep the result of the boolean operation, i.e. the False and True values? In that case the answer would simply be:
df['freq'] = ~(df.obj.isnull() | (df.obj == 'Nothing'))
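Either way the mask is the same; multiplying the boolean Series by 1 (or, equivalently, casting it with .astype(int)) simply turns True/False into 1/0. A small check on the sample frame from the question:
mask = ~(df.obj.isnull() | (df.obj == 'Nothing'))
print(mask.tolist())              # [False, True, False, True]
print((mask * 1).tolist())        # [0, 1, 0, 1]
print(mask.astype(int).tolist())  # [0, 1, 0, 1]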
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Matthew Borish |
| Solution 2 | DecowVR |
| Solution 3 | |