How to create multiple dataframes from a single large dataframe using a for loop

I have a large dataframe I need to split into many smaller dataframes:

import pandas as pd
from numpy import rec

a = rec.array([(201901, 'markers', 'biz', 'ROW', 3715.318182),
 (201901, 'markers', 'biz', 'US', 10781.5),
 (201901, 'markers', 'school', 'ROW', 565.0454546),
 (201901, 'markers', 'school', 'US', 3797.954546),
 (201901, 'pens', 'biz', 'ROW', 953.6363636),
 (201901, 'pens', 'biz', 'US', 5164.2727270000005),
 (201901, 'pens', 'school', 'ROW', 273.2272727),
 (201901, 'pens', 'school', 'US', 4414.0454549999995),
 (201902, 'markers', 'biz', 'ROW', 3738.15),
 (201902, 'markers', 'biz', 'US', 10734.3),
 (201902, 'markers', 'school', 'ROW', 556.55),
 (201902, 'markers', 'school', 'US', 3820.6),
 (201902, 'pens', 'biz', 'ROW', 932.6),
 (201902, 'pens', 'biz', 'US', 5303.45),
 (201902, 'pens', 'school', 'ROW', 255.2),
 (201902, 'pens', 'school', 'US', 4256.25),
 (201903, 'markers', 'biz', 'ROW', 4166.047619),
 (201903, 'markers', 'biz', 'US', 11258.33333),
 (201903, 'markers', 'school', 'ROW', 623.0476191),
 (201903, 'markers', 'school', 'US', 3522.428571),
 (201903, 'pens', 'biz', 'ROW', 984.8095238),
 (201903, 'pens', 'biz', 'US', 5430.857143),
 (201903, 'pens', 'school', 'ROW', 245.47619049999997),
 (201903, 'pens', 'school', 'US', 4885.285714),
 (201904, 'markers', 'biz', 'ROW', 3886.571429),
 (201904, 'markers', 'biz', 'US', 11340.19048),
 (201904, 'markers', 'school', 'ROW', 580.7142857000001),
 (201904, 'markers', 'school', 'US', 3611.666667),
 (201904, 'pens', 'biz', 'ROW', 1020.0476189999999),
 (201904, 'pens', 'biz', 'US', 5616.761904999999),
 (201904, 'pens', 'school', 'ROW', 303.6666667),
 (201904, 'pens', 'school', 'US', 6005.809524)],
          dtype=[('shp_dt', '<i8'), ('prod', 'O'), ('customer_type', 'O'), ('cut', 'O'), ('revenue', '<f8')])
    
df_final = pd.DataFrame(a)
df_final = df_final.set_index('shp_dt')  # set_index returns a new dataframe, so assign the result

I need to generate a cut of this dataframe for every combination of the fields prod, customer_type and cut, creating 8 new dataframes without writing out 8 separate filtering statements. I would also like each dataframe to be named automatically after its cut.

For example:

markers_biz_US = df_final[(df_final['prod'] == 'markers') &
                          (df_final['customer_type'] == 'biz') &
                          (df_final['cut'] == 'US')]

markers_school_US = df_final[(df_final['prod'] == 'markers') &
                             (df_final['customer_type'] == 'school') &
                             (df_final['cut'] == 'US')]

markers_biz_ROW = df_final[(df_final['prod'] == 'markers') &
                           (df_final['customer_type'] == 'biz') &
                           (df_final['cut'] == 'ROW')]

markers_school_ROW = df_final[(df_final['prod'] == 'markers') &
                              (df_final['customer_type'] == 'school') &
                              (df_final['cut'] == 'ROW')]
 
 ...

The first dataframe, markers_biz_US, contains only the rows where prod is 'markers', customer_type is 'biz' and cut is 'US'. markers_school_US contains the matching 'school' rows, and so on.

Questions: (1) How do you write a for loop to generate all these cuts while applying multiple filters to the large dataset and naming the dataframes based on the applied filters? (2) How do I create a list of all dataframes I've created?



Solution 1:[1]

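from itertools import product

# product() yields every combination of the three fields (2 x 2 x 2 = 8);
# zip() would only pair the unique values by position and yield 2.
for x, y, z in product(df_final['prod'].unique(),
                       df_final['customer_type'].unique(),
                       df_final['cut'].unique()):
    # Writing to locals() is only reliable at module level, not inside a function.
    locals()['df_' + x + '_' + y + '_' + z] = df_final[(df_final['prod'] == x) &
                                                       (df_final['customer_type'] == y) &
                                                       (df_final['cut'] == z)]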
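
Writing into locals() creates the variables but leaves no collection of them, so question (2) stays open. Below is a minimal sketch of an alternative, assuming a dictionary keyed by the cut name is acceptable in place of separate variables. The names cuts, names and frames are made up for this illustration, and the small df_final here is a stand-in built from the first month of the question's data.

```python
import pandas as pd

# Stand-in for df_final, using the 201901 rows from the question.
df_final = pd.DataFrame({
    'shp_dt': [201901] * 8,
    'prod': ['markers'] * 4 + ['pens'] * 4,
    'customer_type': ['biz', 'biz', 'school', 'school'] * 2,
    'cut': ['ROW', 'US'] * 4,
    'revenue': [3715.318182, 10781.5, 565.0454546, 3797.954546,
                953.6363636, 5164.2727270000005, 273.2272727,
                4414.0454549999995],
})

# groupby yields one (key_tuple, sub_frame) pair for each combination
# of the three fields that actually occurs in the data.
cuts = {
    '_'.join(key): group
    for key, group in df_final.groupby(['prod', 'customer_type', 'cut'])
}

names = list(cuts)            # hypothetical name: keys such as 'markers_biz_US'
frames = list(cuts.values())  # hypothetical name: all sub-dataframes as a list
```

A single cut is then looked up as cuts['markers_biz_US'], and frames answers the request for a list of everything created. Unlike the explicit loop, groupby only produces groups for combinations that are present in the data, so no empty dataframes are created.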

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 jack homareau