Pandas merge or join directly from read_csv

I have seen many examples of how to use merge.

Has anyone ever tried doing something like this?

df = pd.read_csv("data1.csv").merge(pd.read_csv("data2.csv"), how='inner', on='a')

I’m going to try it but figured I’d ask here too...

If I could do this, I wouldn't need to read in data1 and data2 separately and then perform the merge, which creates 3 data frames. If data1 and data2 are huge, that is wasted memory if I can do everything in one step.



Solution 1:[1]

It looks like you can actually do this - I wonder if this can aid in memory management.

See below.

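import pandas as pd

# Two small frames that share the key columns key1 and key2.
data1 = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                      'key2': ['K0', 'K1', 'K0', 'K1'],
                      'P': ['P0', 'P1', 'P2', 'P3'],
                      'Q': ['Q0', 'Q1', 'Q2', 'Q3']})
data2 = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'R': ['R0', 'R1', 'R2', 'R3'],
                      'S': ['S0', 'S1', 'S2', 'S3']})

# Baseline: merge the in-memory frames.
merged_data1 = pd.merge(data1, data2, on=['key1', 'key2'])

# Write without the index so read_csv does not pick up an extra "Unnamed: 0" column.
data1.to_csv("data1.csv", index=False)
data2.to_csv("data2.csv", index=False)

# Same merge, reading both CSV files inline instead of naming the intermediate frames.
merged_data2 = pd.merge(pd.read_csv("data1.csv"), pd.read_csv("data2.csv"), on=['key1', 'key2'])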
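
On the memory question, here is a minimal sketch of the method-chained form from the question. The helper name merge_csvs and the inline CSV text are hypothetical, used only so the example runs without files on disk. Both inputs still have to be fully loaded while the merge runs, so the chain should not be expected to lower peak memory. What it does avoid is keeping named references to the two inputs afterwards, so they can be freed once the merge returns.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV text standing in for data1.csv and data2.csv.
CSV1 = "key1,key2,P\nK0,K0,P0\nK0,K1,P1\nK1,K0,P2\n"
CSV2 = "key1,key2,R\nK0,K0,R0\nK1,K0,R1\nK2,K0,R2\n"


def merge_csvs(left, right, keys, how="inner"):
    """Read two CSV sources and merge them without naming the intermediates.

    Both frames exist in memory during the merge; they become eligible for
    garbage collection once this function returns.
    """
    return pd.read_csv(left).merge(pd.read_csv(right), how=how, on=keys)


# read_csv accepts file paths as well as file-like objects such as StringIO.
merged = merge_csvs(io.StringIO(CSV1), io.StringIO(CSV2), ["key1", "key2"])
print(merged)
```

With real files, the same call would take the two paths, for example merge_csvs("data1.csv", "data2.csv", ["key1", "key2"]).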

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 BuJay