How to merge all Excel files in subdirectories based on a column?

I am trying to merge all the Excel files in a folder into one master file, based on the 'id' column. The main folder contains many subfolders, and each subfolder contains the same 15 Excel files. The 15 files have different columns from one another (apart from 'id'), and the values in each file differ from subfolder to subfolder.

[Screenshot of the folder structure]

I would like to iterate through each subfolder and merge all the Excel files together. How should I merge these files? The 'id' column in each file has some values that also appear in the other files and some that do not. I would like rows with matching ids to be combined, and rows whose id appears in only one file to still be included in the master file.
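The behavior described above (combine rows with matching ids, keep the rest) corresponds to what pandas calls an outer merge. A minimal sketch with two made-up tables; the column names `height` and `weight` are only for illustration and do not come from my actual files:

```python
import pandas as pd

# Two hypothetical tables that share some 'id' values but have different columns.
left = pd.DataFrame({"id": [1, 2, 3], "height": [10, 20, 30]})
right = pd.DataFrame({"id": [2, 3, 4], "weight": [200, 300, 400]})

# how="outer" keeps every id found in either table. Ids present in both
# tables end up on a single row; values missing on one side become NaN.
merged = left.merge(right, on="id", how="outer")
print(merged)
```

Here ids 2 and 3 each end up on one row with both columns filled, while ids 1 and 4 are kept with NaN in the column they lack.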

This is what I have so far. I know I need to replace concat with merge, but I am not sure how to keep all unique 'id' values.

from pathlib import Path
import pandas as pd


input_dir = Path.cwd() / "folder_name"

# store one dataframe per Excel file found in any subfolder
parts = []
for path in input_dir.rglob("*.xlsx*"):
    part = pd.read_excel(path)
    parts.append(part)

# concat stacks the rows of every file; it does not align rows on 'id'
df = pd.concat(parts)

output_dir = Path.cwd() / "MasterFile"
output_dir.mkdir(exist_ok=True)
df.to_csv(output_dir / "masterfile.csv",index=False)
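One possible way to swap concat for merge is sketched below. It is not a confirmed solution and rests on several assumptions: the files sit directly inside each subfolder, the only column they share is 'id', the files within one subfolder should be outer-merged on 'id', and the per-subfolder results should then be stacked. The names `merge_on_id`, `build_master`, and the added `subfolder` column are hypothetical helpers introduced for this sketch.

```python
from functools import reduce
from pathlib import Path

import pandas as pd


def merge_on_id(frames, key="id"):
    """Outer-merge a list of DataFrames on `key`, keeping every id."""
    return reduce(lambda left, right: left.merge(right, on=key, how="outer"), frames)


def build_master(input_dir, pattern="*.xlsx", reader=pd.read_excel, key="id"):
    """Merge the files of each subfolder on `key`, then stack the subfolders.

    Assumes the files share no column other than `key` and that no file
    already has a column called 'subfolder'.
    """
    per_folder = []
    for subfolder in sorted(p for p in Path(input_dir).iterdir() if p.is_dir()):
        frames = [reader(path) for path in sorted(subfolder.glob(pattern))]
        if not frames:
            continue  # skip subfolders with no matching files
        merged = merge_on_id(frames, key=key)
        # Record which subfolder each row came from.
        merged.insert(0, "subfolder", subfolder.name)
        per_folder.append(merged)
    # Raises ValueError if no subfolder contained a matching file.
    return pd.concat(per_folder, ignore_index=True)


# Example call, mirroring the paths used above:
# master = build_master(Path.cwd() / "folder_name")
# master.to_csv(Path.cwd() / "MasterFile" / "masterfile.csv", index=False)
```

If instead every file from every subfolder should be merged into one wide table, `merge_on_id` could be applied to the full list of dataframes, but columns with the same name would then collide and pandas would add `_x`/`_y` suffixes.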


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow