Pandas merging/joining tables with multiple key columns and duplicating rows where necessary
I have several tables that contain lab results, plus a 'master' table of sample data that holds things like a description. The results tables are further broken down by specimen (sub-samples). Each one contains multiple result columns; I'm just showing one per table here. I want to combine the master table and all the results tables into one dataframe, like this:
Table 1:
Location Sample Description
1 A Yellow
1 B Red
2 A Blue
2 B Violet
Table 2:
Location Sample Specimen Result1
1 A X 5
1 A Y 6
1 B X 10
2 A X 1
Table 3:
Location Sample Specimen Result2
1 A X "Heavy"
1 A Q "Soft"
1 B K "Grey"
2 B Z "Bananas"
Desired Output:
Location Sample Description Specimen Result1 Result2
1 A Yellow X 5 "Heavy"
1 A Yellow Y 6 nan
1 A Yellow Q nan "Soft"
1 B Red X 10 nan
1 B Red K nan "Grey"
2 A Blue X 1 nan
2 B Violet Z nan "Bananas"
I currently have a solution for this using iterrows() and df.append(), but both are slow operations, and with thousands of results it takes too long. Is there a better way? I have tried using join() and merge(), but I can't seem to get the result I want.
Quick code to reproduce my dataframes:
import pandas as pd
dict1 = {'Location': [1,1,2,2], 'Sample': ['A','B','A','B'], 'Description': ['Yellow','Red','Blue','Violet']}
dict2 = {'Location': [1,1,1,2], 'Sample': ['A','A','B','A'], 'Specimen': ['x', 'y','x', 'x'], 'Result1': [5,6,10,1]}
dict3 = {'Location': [1,1,1,2], 'Sample': ['A','A','B','B'], 'Specimen': ['x', 'q','k', 'z'], 'Result2': ["Heavy","Soft","Grey","Bananas"]}
df1 = pd.DataFrame.from_dict(dict1)
df2 = pd.DataFrame.from_dict(dict2)
df3 = pd.DataFrame.from_dict(dict3)
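One possible approach, offered as a minimal sketch rather than a definitive answer: outer-merge the results tables with each other on all three keys (Location, Sample, Specimen), then merge the master table onto that on Location and Sample so each description is repeated for every specimen row. The names `specimen_keys`, `sample_keys`, `result_tables`, `results`, and `combined` are made up for illustration. The sketch assumes every results table carries the same three key columns and that the non-key column names do not collide.

```python
from functools import reduce

import pandas as pd

# Same data as the reproduction code in the question.
df1 = pd.DataFrame({'Location': [1, 1, 2, 2], 'Sample': ['A', 'B', 'A', 'B'],
                    'Description': ['Yellow', 'Red', 'Blue', 'Violet']})
df2 = pd.DataFrame({'Location': [1, 1, 1, 2], 'Sample': ['A', 'A', 'B', 'A'],
                    'Specimen': ['x', 'y', 'x', 'x'], 'Result1': [5, 6, 10, 1]})
df3 = pd.DataFrame({'Location': [1, 1, 1, 2], 'Sample': ['A', 'A', 'B', 'B'],
                    'Specimen': ['x', 'q', 'k', 'z'],
                    'Result2': ['Heavy', 'Soft', 'Grey', 'Bananas']})

specimen_keys = ['Location', 'Sample', 'Specimen']  # keys shared by results tables
sample_keys = ['Location', 'Sample']                # keys shared with the master table

# Outer-merge all results tables pairwise so that a specimen appearing in
# only one table is kept, with NaN in the columns it has no value for.
result_tables = [df2, df3]  # extend this list with further results tables
results = reduce(
    lambda left, right: left.merge(right, on=specimen_keys, how='outer'),
    result_tables,
)

# Attach the master data; a sample's description is duplicated onto each
# of its specimen rows. how='left' keeps samples that have no results at all.
combined = df1.merge(results, on=sample_keys, how='left')

print(combined)
```

Row order after an outer merge can differ between pandas versions, so sort explicitly (for example with `combined.sort_values(specimen_keys)`) if the order matters. Note also that `Result1` becomes a float column because the missing values are represented as NaN.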
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
