Which objects/if statements are slowing down this code the most?
I'm having an issue scaling the code below. Given two equal-length datasets, one with 20 features and the other with 70, the processing times differ drastically. I understand that I'm adding potentially millions of data points between the two, so the larger set will naturally take longer to process, but from an object/structural point of view, is there a way to do the same processing much more quickly?
```python
import numpy as np

# columns_group1 / columns_group2 are defined globally elsewhere
def generic_calc(df):
    ar1 = np.zeros(len(df))
    ar2 = np.zeros(len(df))
    checks = columns_group1 + columns_group2
    changes = ['chg1', 'chg2']
    for c in range(len(columns_group2)):
        changes.append('c{}_chg'.format(c + 1))
    for row in range(len(df)):
        if any(df[checks].iloc[row]) == 0:
            ar1[row] = 0
            ar2[row] = df['Metric'][row] - df['Metric2'][row]
        elif any(abs(df[changes].iloc[row])) > 1:
            ar1[row] = 0
            ar2[row] = df['Metric1'][row] - df['Metric2'][row]
        else:
            ar1[row] = (df['Metric3'][row] - df['Metric4'][row]) * df['Metric5'][row]
            ar2[row] = (df['Metric6'][row] - df['Metric4'][row]) * df['Metric7'][row]
    for c in range(len(columns_group2)):
        component = np.zeros(len(df))
        for row in range(len(df)):
            if df[checks].iloc[row].any() == 0:
                component[row] = 0
            elif any(abs(df[changes].iloc[row])) > 1:
                component[row] = 0
            else:
                component[row] = (df['Metric8-{}'.format(c + 1)][row]
                                  - df['Metric9{}'.format(c + 1)][row]) * df['QTY_shift'][row]
        df[columns_group2[c]] = component
```
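The main cost here is not the if statements themselves but the per-row Python loop: every `df[checks].iloc[row]` builds a new Series, so the work grows with rows × columns. A common fix is to compute the condition masks once over the whole frame and let `np.select` pick the branch per row. Below is a minimal sketch of that approach; the column groups (`flag_a`, `comp1`, `chg1`) and the tiny test frame are hypothetical stand-ins for the question's globals, and only the first loop's logic is shown.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's global column groups.
columns_group1 = ['flag_a']
columns_group2 = ['comp1']
checks = columns_group1 + columns_group2
changes = ['chg1']

def generic_calc_vectorised(df):
    # Boolean masks computed once for the whole frame, not once per row.
    zero_check = (df[checks] == 0).any(axis=1)      # any check column is 0
    big_change = (df[changes].abs() > 1).any(axis=1)  # any change exceeds 1

    # np.select applies the first matching condition per row.
    ar1 = np.select(
        [zero_check | big_change],
        [0.0],
        default=(df['Metric3'] - df['Metric4']) * df['Metric5'],
    )
    ar2 = np.select(
        [zero_check, big_change],
        [df['Metric'] - df['Metric2'], df['Metric1'] - df['Metric2']],
        default=(df['Metric6'] - df['Metric4']) * df['Metric7'],
    )
    return ar1, ar2

df = pd.DataFrame({
    'flag_a': [1, 0, 1],
    'comp1':  [1, 1, 1],
    'chg1':   [0.5, 0.0, 2.0],
    'Metric': [10, 20, 30], 'Metric1': [11, 21, 31],
    'Metric2': [1, 2, 3],   'Metric3': [4, 5, 6],
    'Metric4': [1, 1, 1],   'Metric5': [2, 2, 2],
    'Metric6': [7, 8, 9],   'Metric7': [3, 3, 3],
})
ar1, ar2 = generic_calc_vectorised(df)
```

The second loop over `columns_group2` can be rewritten the same way: reuse the two masks and compute each component column with one `np.select` call, which removes both nested Python loops entirely.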
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
