'Pandas filtering rows in one dataframe based on values in another data frame
New to pandas and trying to figure out filtering results from a dataframe based on data in another dataframe.
df1 df2
v1 v2 v1 v2
1 2 1 3
1 3 2 3
1 4 3 4
2 2
2 4
3 2
3 3
3 4
I have two dataframes: df1 and df2. I'm trying to create df3 which filters off results where if df1.v1 = df2.v1 and df1.v2 < df2.v2. This would yield df3 here:
df3
v1 v2
1 3
1 4
2 4
3 4
I'm not sure how to go about doing this. I'm thinking of maybe converting df2 to a dictionary (df.v1 are all unique).
Solution 1:[1]
merge on "v1" and use a boolean mask to filter:
merged_df = df1.merge(df2, on='v1', suffixes=('_df1', ''))
merged_df.loc[merged_df['v2_df1'] < merged_df['v2'], ['v1', 'v2']]
Output:
v1 v2
0 1 3.0
3 2 3.0
5 3 4.0
6 3 4.0
Solution 2:[2]
You can merge on v1, then filter on the v2 values and cleanup the intermediate column:
(df1.merge(df2, on='v1', suffixes=('_', ''))
.query('v2 <= v2_')
.drop(columns='v2_')
)
output:
v1 v2
1 1 3
2 1 3
4 2 3
7 3 4
Solution 3:[3]
You can merge your 2 dataframes on v1 and filter out:
df3 = df1.merge(df2, on='v1', suffixes=('_', '')) \
.query("v2 <= v2_").drop(columns='v2_')
print(df3)
# Output
v1 v2
1 1 3
2 1 3
4 2 3
7 3 4
Update
In case you don't have a strict equality between v1 columns, you can merge with how='cross' to create all combinations:
df3 = df1.merge(df2, how='cross', suffixes=('_', '')) \
.query("(v1 == v1_) & (v2 <= v2_)").drop(columns=['v1_', 'v2_'])
print(df3)
# Output
v1 v2
3 1 3
6 1 3
13 2 3
23 3 4
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | mozway |
| Solution 2 | mozway |
| Solution 3 |
