'pandas.query in a chain not giving expected results
I have a pandas dataframe that looks like this:
df = pd.DataFrame(
{
"ID": [1, 2, 3, 4],
"Name": ["Alpha", "Beta", "Gamma", "Delta"],
"This": ["A|B", "A|B", None, "C"],
"That": ["B", None, "C", None],
}
)
I am trying to isolate the rows where there is a pipe character in This
and where That
isn't in This
. Both This
and That
can be null.
Here's what I tried:
df.assign(
requireThat=lambda x: x["This"].str.contains("|", regex=False).fillna(False)
)
.query("requireThat")
.assign(filledThat=lambda x: x["That"].fillna("WRONG"))
.query("filledThat not in This")
And I get:
ID | Name | This | That | requireThat | filledThat |
---|---|---|---|---|---|
1 | Alpha | A|B | B | True | B |
2 | Beta | A|B | None | True | WRONG |
The first row is there and it shouldn't be. I should only get the second row. I'm not sure why that second query statement is failing. I suspected it was the pipe character, but even if I change it to a semicolon or some other character I get the same two rows in the output.
How can I check if the value in the That
column is in the This
value, row-by-row?
Edited to Add:
I have also tried
df.assign(
requireThat=lambda x: x["This"].str.contains(";", regex=False).fillna(False),
)
.query("requireThat")
.assign(
filledThat=lambda x: x["That"].fillna("WRONG"),
thisList=lambda x: x["This"].str.split(";"),
)
.query("filledThat not in thisList")
and I still get the false positive. I even tried defining a filter function and pipe:
def filter_by_columns(df, itemCol, listCol):
return df[~df[itemCol].isin(df[listCol])]
df.assign(
requireThat=lambda x: x["This"].str.contains("|", regex=False).fillna(False),
)
.query("requireThat")
.assign(
filledThat=lambda x: x["That"].fillna("WRONG"),
thisList=lambda x: x["This"].str.split("|"),
)
.pipe(filter_by_columns, itemCol="filledThat", listCol="thisList")
And the false positive is still in the results.
Solution 1:[1]
This works!
import pandas as pd
df = pd.DataFrame(
{
"ID": [1, 2, 3, 4],
"Name": ["Alpha", "Beta", "Gamma", "Delta"],
"This": ["A|B", "A|B", None, "C"],
"That": ["B", None, "C", None],
}
)
df.fillna('NaN', inplace=True)
df = df[df['This'].str.contains('|')]
df = df[df.apply(lambda x: x['That'] in x['This'], axis=1)]
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | William Rosenbaum |