'Get a new dataframe by not matching rows
I have two dataframes organised in this way:
df1
col_1 col_2 col_3
a banana red
b apple blue
c orange green
d
df2
col_1 col_2 col_3
a
b apple blue
c orange green
d
Both come from a dataframe that is complete in all its rows, the two dataframes above are the result of a filter that I applied in the column "col_2" and in the column "col_3" where I label NaN values everything that does not fit in the filter. I would like to compare the two dataframes obtained from the filter by isolating the rows that get the "Nan" label once I have applied a wider filter.
Example of an expected result:
[IN]df1.merge(df2, on= ["col_1", "col_2"])
[OUT]
col_1 col_2 col_3
a banana red
How can I do this? Thank you in advance for your reply
Let me explain better:
When I re-apply the filter on df1 by raising the thresholds, the values of those two columns tend to decrease. The original dataframe has about 50,000 rows without any null values. As I apply the filter to the original dataframe and raise the thresholds more and more, the null values in those two columns tend to increase, reducing the non-null values from 50,000 to 45,000 as I raise the thresholds. I am particularly interested in getting those 5,000 values that I lost from the preceding dataframe before I applied the filter. That is my goal.
Solution 1:[1]
I think you are trying to say : keep the rows that are unique (i.e., do not exactly match in all columns) between the 2 dataframes. In that case, a way to do that is to combine the 2 dataframes, and then remove all duplicate rows.
import pandas as pd
df1 = pd.DataFrame({
'c1': ['a', 'b', 'c', 'd'],
'c2': ['banana', 'apple', 'orange', None],
'c3': ['red', 'blue', 'green', None]})
df2 = pd.DataFrame({
'c1': ['a', 'b', 'c', 'd'],
'c2': [None, 'apple', 'orange', None],
'c3': [None, 'blue', 'green', None]})
print(df1)
print()
print(df2)
df_all_together = pd.concat([df1, df2])
df_unique_rows = df_all_together.drop_duplicates(subset=['c2', 'c3'], keep=False)
print(40*'-')
print(df_unique_rows)
c1 c2 c3
0 a banana red
1 b apple blue
2 c orange green
3 d None None
c1 c2 c3
0 a None None
1 b apple blue
2 c orange green
3 d None None
----------------------------------------
c1 c2 c3
0 a banana red
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
