DataFrame is highly fragmented

I have the following code, but when I run it I receive this warning:

PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`

from numpy import where  # assumed import: the snippet calls where() unqualified

# df and d are defined earlier in the script
std_ltc = df["rolling_"+str(d)].std()
action_limit = [0.5, 1, 2, 3, 4, 5, 6]

df['num_cusum_sh_'+str(d)] = 0
df['num_cusum_sl_'+str(d)] = 0

for h in action_limit:
    # SH
    sh = "sh_dummy_ltc_"+str(d)+"_" + str(h)
    df[sh] = df["sh_ltc_"+str(d)]
    df[sh] = where(df["sh_ltc_"+str(d)] > (std_ltc * h),
                   int(1), df[sh].astype(int))
    df[sh] = where(df["sh_ltc_"+str(d)] < (std_ltc * h),
                   int(0), df[sh].astype(int))

    df['num_cusum_sh_'+str(d)] = df['num_cusum_sh_'+str(d)] + df[sh]

    # SL
    sl = "sl_dummy_ltc_"+str(d)+"_" + str(h)
    df[sl] = df["sl_ltc_"+str(d)]
    df[sl] = where(df["sl_ltc_"+str(d)] < (-std_ltc * h),
                   int(1), df[sl].astype(int))
    df[sl] = where(df["sl_ltc_"+str(d)] > (-std_ltc * h),
                   int(0), df[sl].astype(int))

    df['num_cusum_sl_'+str(d)] = df['num_cusum_sl_'+str(d)] + df[sl]

How do I avoid this annoying warning?



Solution 1:[1]

Following the comments above, the issue comes from creating many new columns.

I don't believe you need to create so many new temp columns. Instead, you can update the result columns directly from the data, using a function that accepts h as a parameter.

action_limit = [0.5, 1, 2, 3, 4, 5, 6]

num_cumsum_sl_col = f'num_cusum_sl_{d}'
num_cumsum_sh_col = f'num_cusum_sh_{d}'

df[num_cumsum_sh_col]=0
df[num_cumsum_sl_col]=0
df["std_ltc"] = df[f"rolling_{d}"].std()

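def check_condition(x, h, lower=False):
    ltc_sh_or_sl, std_ltc = x
    # SL is tested against -std_ltc * h, so flip the sign to reuse the SH comparison
    signed = -ltc_sh_or_sl if lower else ltc_sh_or_sl
    if std_ltc * h < signed:
        res = 1
    elif std_ltc * h > signed:
        res = 0
    else:
        # value sits exactly on the limit: do we want this condition?
        res = ltc_sh_or_sl  # is this an int?
    return res

for h in action_limit:
    sh = f"sh_dummy_ltc_{d}_{h}"
    sl = f"sl_dummy_ltc_{d}_{h}"
    # SH: 1 where sh_ltc exceeds std_ltc * h, 0 where it is below
    df[sh] = df[[f"sh_ltc_{d}", 'std_ltc']].apply(lambda x: check_condition(x, h), axis=1)
    # SL: 1 where sl_ltc falls below -std_ltc * h, 0 where it is above
    df[sl] = df[[f"sl_ltc_{d}", 'std_ltc']].apply(lambda x: check_condition(x, h, lower=True), axis=1)

    df[num_cumsum_sh_col] = df[num_cumsum_sh_col] + df[sh]
    df[num_cumsum_sl_col] = df[num_cumsum_sl_col] + df[sl]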
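
The warning text itself suggests joining all columns at once with pd.concat(axis=1). As a rough sketch of that idea, the new columns can be collected in a dict and attached in a single step instead of being assigned one by one inside the loop. The function name add_cusum_counts and the demo data are made up for illustration, and the sketch assumes a value exactly on the limit should count as 0, which differs from the question's code (it keeps the original value in that case).

```python
import pandas as pd


def add_cusum_counts(df, d, action_limit):
    """Hypothetical helper: build every dummy column first, then concat once."""
    std_ltc = df[f"rolling_{d}"].std()
    sh_values = df[f"sh_ltc_{d}"]
    sl_values = df[f"sl_ltc_{d}"]

    sh_cols = {}
    sl_cols = {}
    for h in action_limit:
        # 1 where sh_ltc is above h standard deviations, else 0
        # (assumption: a value exactly on the limit counts as 0)
        sh_cols[f"sh_dummy_ltc_{d}_{h}"] = (sh_values > std_ltc * h).astype(int)
        # 1 where sl_ltc is below minus h standard deviations, else 0
        sl_cols[f"sl_dummy_ltc_{d}_{h}"] = (sl_values < -std_ltc * h).astype(int)

    sh_frame = pd.DataFrame(sh_cols, index=df.index)
    sl_frame = pd.DataFrame(sl_cols, index=df.index)

    # Row-wise totals replace the running "+=" updates from the loop
    counts = pd.DataFrame(
        {
            f"num_cusum_sh_{d}": sh_frame.sum(axis=1),
            f"num_cusum_sl_{d}": sl_frame.sum(axis=1),
        },
        index=df.index,
    )

    # One concat instead of many single-column insertions
    return pd.concat([df, sh_frame, sl_frame, counts], axis=1)


# Made-up demo data, only to show the call
demo = pd.DataFrame(
    {
        "rolling_5": [1.0, 2.0, 3.0, 4.0, 5.0],
        "sh_ltc_5": [0.0, 1.0, 2.0, 4.0, 10.0],
        "sl_ltc_5": [0.0, -1.0, -2.0, -4.0, -10.0],
    }
)
out = add_cusum_counts(demo, 5, [0.5, 1, 2])
print(out[["num_cusum_sh_5", "num_cusum_sl_5"]])
```

If the per-threshold dummy columns are not needed afterwards, only the counts frame has to be concatenated, which keeps the result narrower.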

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
