'Concat two CSVs with different number of rows returning warning
I need to concat two CSVs with a different number of rows. I receive a warning when I do so and the results in print(df1.head()) return an empty dataframe when I know by visual inspection that there are matching rows.
Could the warning affect the results reflected in head()? Or is there another issue I need to resolve?
# columns to filter on
columns_to_match = ['column1', 'column2', 'column3', 'column4', 'column5']
# find matching rows based off columns in filter
filter = df1[columns_to_match].eq(df2[columns_to_match]).all(axis=1)
# concat filtered results
df1 = pd.concat([df1[columns_to_match], df2[columns_to_match]])
# print matching results
print(df1.head())
I receive the following error:
UserWarning: Boolean Series key will be reindexed to match DataFrame index.
df1 = pd.concat([df1[filter], df2[filter]])
Thank you.
Solution 1:[1]
Firstly, you have multiple uses for filter, which is confusing and error-prone. I believe that the warning arises because filter = df1[filter].eq(df2[filter]).all(axis=1) is a series aligned to df1. However you then try to subset df2 using the same mask series.
One way to fix it would be to have e.g. filter2 = df2[cols].eq(df1[cols]).all(axis=1) and then do
df1 = pd.concat([df1[filter], df2[filter2]])
(note that I am defining cols = ['column1', 'column2', 'column3', 'column4', 'column5'] here.
Edit - this should work:
# find matching rows based off columns in filter
filter1 = df1[columns_to_match].eq(df2[columns_to_match]).all(axis=1)
filter2 = df2[columns_to_match].eq(df1[columns_to_match]).all(axis=1)
# concat filtered results
df1 = pd.concat([df1[filter1], df2[filter2]])
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
