How to create multiple dataframes from a single large dataframe using a for loop
I have a large dataframe I need to split into many smaller dataframes:
import pandas as pd
from numpy import rec, nan
# Integer literals are written without the Python 2 "L" suffix so the snippet also runs on Python 3.
a = rec.array([(201901, 'markers', 'biz', 'ROW', 3715.318182),
    (201901, 'markers', 'biz', 'US', 10781.5),
    (201901, 'markers', 'school', 'ROW', 565.0454546),
    (201901, 'markers', 'school', 'US', 3797.954546),
    (201901, 'pens', 'biz', 'ROW', 953.6363636),
    (201901, 'pens', 'biz', 'US', 5164.2727270000005),
    (201901, 'pens', 'school', 'ROW', 273.2272727),
    (201901, 'pens', 'school', 'US', 4414.0454549999995),
    (201902, 'markers', 'biz', 'ROW', 3738.15),
    (201902, 'markers', 'biz', 'US', 10734.3),
    (201902, 'markers', 'school', 'ROW', 556.55),
    (201902, 'markers', 'school', 'US', 3820.6),
    (201902, 'pens', 'biz', 'ROW', 932.6),
    (201902, 'pens', 'biz', 'US', 5303.45),
    (201902, 'pens', 'school', 'ROW', 255.2),
    (201902, 'pens', 'school', 'US', 4256.25),
    (201903, 'markers', 'biz', 'ROW', 4166.047619),
    (201903, 'markers', 'biz', 'US', 11258.33333),
    (201903, 'markers', 'school', 'ROW', 623.0476191),
    (201903, 'markers', 'school', 'US', 3522.428571),
    (201903, 'pens', 'biz', 'ROW', 984.8095238),
    (201903, 'pens', 'biz', 'US', 5430.857143),
    (201903, 'pens', 'school', 'ROW', 245.47619049999997),
    (201903, 'pens', 'school', 'US', 4885.285714),
    (201904, 'markers', 'biz', 'ROW', 3886.571429),
    (201904, 'markers', 'biz', 'US', 11340.19048),
    (201904, 'markers', 'school', 'ROW', 580.7142857000001),
    (201904, 'markers', 'school', 'US', 3611.666667),
    (201904, 'pens', 'biz', 'ROW', 1020.0476189999999),
    (201904, 'pens', 'biz', 'US', 5616.761904999999),
    (201904, 'pens', 'school', 'ROW', 303.6666667),
    (201904, 'pens', 'school', 'US', 6005.809524)],
    dtype=[('shp_dt', '<i8'), ('prod', 'O'), ('customer_type', 'O'), ('cut', 'O'), ('revenue', '<f8')])
df_final = pd.DataFrame(a)
# set_index returns a new dataframe, so the result has to be assigned back.
df_final = df_final.set_index('shp_dt')
I need to generate a cut of this dataframe for each combination of the fields prod, customer_type and cut, creating 8 new dataframes without writing out 8 separate filtering statements. I would also like each dataframe to be named automatically after its specific cut.
For example:
markers_biz_US = df_final[(df_final['prod'] == 'markers') &
    (df_final['customer_type'] == 'biz') & (df_final['cut'] == 'US')]
markers_school_US = df_final[(df_final['prod'] == 'markers') &
    (df_final['customer_type'] == 'school') & (df_final['cut'] == 'US')]
markers_biz_ROW = df_final[(df_final['prod'] == 'markers') &
    (df_final['customer_type'] == 'biz') & (df_final['cut'] == 'ROW')]
markers_school_ROW = df_final[(df_final['prod'] == 'markers') &
    (df_final['customer_type'] == 'school') & (df_final['cut'] == 'ROW')]
...
The first dataframe markers_biz_US looks like this:
markers_school_US looks like this:
and so on.
Questions: (1) How do you write a for loop to generate all these cuts while applying multiple filters to the large dataset and naming the dataframes based on the applied filters?
(2) How do I create a list of all dataframes I've created?
Solution 1:[1]
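from itertools import product

# zip() would pair the unique values positionally and yield only 2 of the 8
# combinations; product() yields every combination.
for x, y, z in product(df_final['prod'].unique(),
                       df_final['customer_type'].unique(),
                       df_final['cut'].unique()):
    # Writing to locals() only creates variables reliably at module level.
    locals()['df_' + x + '_' + y + '_' + z] = df_final[(df_final['prod'] == x) &
                                                       (df_final['customer_type'] == y) &
                                                       (df_final['cut'] == z)]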
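As an alternative sketch (not part of the original answer), the cuts can be kept in a dictionary keyed by name instead of in dynamically named variables, which also gives the list asked for in question (2). The names `cuts`, `cut_names` and `cut_frames` are illustrative, and the small frame below is only a stand-in for df_final built from a few rows of the question's data.

```python
import pandas as pd

# Stand-in for df_final: a handful of rows taken from the question's data.
df_final = pd.DataFrame(
    {
        "shp_dt": [201901, 201901, 201901, 201902, 201902, 201902],
        "prod": ["markers", "markers", "pens", "markers", "markers", "pens"],
        "customer_type": ["biz", "school", "biz", "biz", "school", "biz"],
        "cut": ["US", "US", "ROW", "US", "US", "ROW"],
        "revenue": [10781.5, 3797.954546, 953.6363636, 10734.3, 3820.6, 932.6],
    }
)

# groupby yields one (key, sub-frame) pair for every combination of the three
# columns that actually occurs in the data; the key is a tuple of strings.
cuts = {
    "_".join(key): group
    for key, group in df_final.groupby(["prod", "customer_type", "cut"])
}

# Question (2): the names and the dataframes that were created.
cut_names = list(cuts)
cut_frames = list(cuts.values())
```

A single cut is then looked up by name, for example `cuts["markers_biz_US"]`. Unlike the loop over all combinations, groupby skips combinations that have no rows, so no empty dataframes are created.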
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | jack homareau |



