'Amending dataframe from a generator that reads multiple excel files
My question ultimately is - is it possible to amend inplace each dataframe of a generator of dataframes?
I have a series of excel files in a folder that each have a table in the same format. Ultimately I want to concatenate each file into 1 large dataframe. They all have unique column headers but share the same indices (historical dates but may be across different time frames) so I want to concatenate the dataframes but aligned by their date. So I first created a generator function to create dataframes from each 'Data1' worksheet in the excel files
all_files = glob.glob(os.path.join(path, "*"))
df_from_each_file = (pd.read_excel(f,'Data1') for f in all_files) #generator comprehension
The below code is the formatting that needs to be done to each dataframe so that I can concatenate them correctly in my final line. I changed the index to the date column but there are also some rows that contain data that is not relevant.
def format_ABS(df):
df.drop(labels=range(0, 9), axis=0,inplace=True)
df.set_index(df.iloc[:,0],inplace=True)
df.drop(df.columns[0],axis=1,inplace=True)
However this doesn't work when I place the function within a generator comphrension (as i am amending all the dataframes inplace). The generator produced has no objects. Why doesn't the below line work? Is it because it can only loop through the generator once?
format_df = (format_ABS(x) for x in df_from_each_file)
but
format_df(next(df_from_each_file)
does work on each individual dataframe
The final product is then the below
concatenated_df = pd.concat(df_from_each_file, ignore_index=True)
I have gotten what I wanted by assigning index_col=0 in the pd.read_excel line but it go me thinking about generators and amending the dataframe in general.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
