DataFrame is highly fragmented
I have the following code, but when I run it I receive this warning:
PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling frame.insert
many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
from numpy import where

std_ltc = df["rolling_"+str(d)].std()
action_limit = [0.5, 1, 2, 3, 4, 5, 6]
df['num_cusum_sh_'+str(d)] = 0
df['num_cusum_sl_'+str(d)] = 0
for h in action_limit:
    # SH
    sh = "sh_dummy_ltc_"+str(d)+"_" + str(h)
    df[sh] = df["sh_ltc_"+str(d)]
    df[sh] = where(df["sh_ltc_"+str(d)] > (std_ltc * h),
                   int(1), df[sh].astype(int))
    df[sh] = where(df["sh_ltc_"+str(d)] < (std_ltc * h),
                   int(0), df[sh].astype(int))
    df['num_cusum_sh_'+str(d)] = df['num_cusum_sh_'+str(d)] + df[sh]
    # SL
    sl = "sl_dummy_ltc_"+str(d)+"_" + str(h)
    df[sl] = df["sl_ltc_"+str(d)]
    df[sl] = where(df["sl_ltc_"+str(d)] < (-std_ltc * h),
                   int(1), df[sl].astype(int))
    df[sl] = where(df["sl_ltc_"+str(d)] > (-std_ltc * h),
                   int(0), df[sl].astype(int))
    df['num_cusum_sl_'+str(d)] = df['num_cusum_sl_'+str(d)] + df[sl]
How do I avoid this annoying warning?
Solution 1:[1]
Following the comments above:
- The issue is with creating multiple columns.
I don't believe you need to create so many new temporary columns. Instead, you can update based on the data and use a function that accepts a parameter h.
action_limit = [0.5, 1, 2, 3, 4, 5, 6]
num_cumsum_sl_col = f'num_cusum_sl_{d}'
num_cumsum_sh_col = f'num_cusum_sh_{d}'
df[num_cumsum_sh_col]=0
df[num_cumsum_sl_col]=0
df["std_ltc"] = df[f"rolling_{d}"].std()
def check_condition(x, h):
    ltc_sh_or_sl, std_ltc = x
    if std_ltc * h < ltc_sh_or_sl:
        res = 1
    elif std_ltc * h > ltc_sh_or_sl:
        res = 0
    else:
        # do we want this condition
        res = ltc_sh_or_sl  # is this an int?
    return res
for h in action_limit:
    # SH and SL dummy column names for this threshold
    sh = f"sh_dummy_ltc_{d}_{h}"
    sl = f"sl_dummy_ltc_{d}_{h}"
    df[sh] = df[[f"sh_ltc_{d}", 'std_ltc']].apply(lambda x: check_condition(x, h), axis=1)
    # NOTE: the question tests sl against -std_ltc * h, so this line may need the mirrored comparison
    df[sl] = df[[f"sl_ltc_{d}", 'std_ltc']].apply(lambda x: check_condition(x, h), axis=1)
    df[num_cumsum_sh_col] = df[num_cumsum_sh_col] + df[sh]
    df[num_cumsum_sl_col] = df[num_cumsum_sl_col] + df[sl]
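The warning text itself suggests joining all columns at once with pd.concat(axis=1). A minimal sketch of that idea follows. It is not the code from the question or from the answer above. It assumes the columns rolling_{d}, sh_ltc_{d} and sl_ltc_{d} exist as in the question. The function name add_cusum_dummies is hypothetical. One behavioural difference is also an assumption: a value exactly equal to the threshold becomes 0 here, whereas the question's code keeps the original value cast to int in that case.

```python
import pandas as pd


def add_cusum_dummies(df, d, action_limit):
    """Build all dummy columns separately, then attach them in one concat.

    Hypothetical helper: the name and the handling of values exactly equal
    to the threshold (treated as 0) are assumptions, not from the question.
    """
    std_ltc = df[f"rolling_{d}"].std()
    sh_src = df[f"sh_ltc_{d}"]
    sl_src = df[f"sl_ltc_{d}"]

    # 1 where the value is above h standard deviations, else 0
    sh = pd.DataFrame(
        {f"sh_dummy_ltc_{d}_{h}": (sh_src > std_ltc * h).astype(int)
         for h in action_limit},
        index=df.index,
    )
    # 1 where the value is below -h standard deviations, else 0
    sl = pd.DataFrame(
        {f"sl_dummy_ltc_{d}_{h}": (sl_src < -std_ltc * h).astype(int)
         for h in action_limit},
        index=df.index,
    )
    # Row-wise count of thresholds crossed on each side
    totals = pd.DataFrame(
        {
            f"num_cusum_sh_{d}": sh.sum(axis=1),
            f"num_cusum_sl_{d}": sl.sum(axis=1),
        },
        index=df.index,
    )
    # A single concat adds every new column at once instead of one by one
    return pd.concat([df, sh, sl, totals], axis=1)
```

It would be called as df = add_cusum_dummies(df, d, action_limit). Because the new columns are collected first and joined in one step, the frame is not grown column by column, which is the pattern the warning describes.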
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |