'pandas.query in a chain not giving expected results
I have a pandas dataframe that looks like this:
df = pd.DataFrame(
{
"ID": [1, 2, 3, 4],
"Name": ["Alpha", "Beta", "Gamma", "Delta"],
"This": ["A|B", "A|B", None, "C"],
"That": ["B", None, "C", None],
}
)
I am trying to isolate the rows where there is a pipe character in This and where That isn't in This. Both This and That can be null.
Here's what I tried:
df.assign(
requireThat=lambda x: x["This"].str.contains("|", regex=False).fillna(False)
)
.query("requireThat")
.assign(filledThat=lambda x: x["That"].fillna("WRONG"))
.query("filledThat not in This")
And I get:
| ID | Name | This | That | requireThat | filledThat |
|---|---|---|---|---|---|
| 1 | Alpha | A|B | B | True | B |
| 2 | Beta | A|B | None | True | WRONG |
The first row is there and it shouldn't be. I should only get the second row. I'm not sure why that second query statement is failing. I suspected it was the pipe character, but even if I change it to a semicolon or some other character I get the same two rows in the output.
How can I check if the value in the That column is in the This value, row-by-row?
Edited to Add:
I have also tried
df.assign(
requireThat=lambda x: x["This"].str.contains(";", regex=False).fillna(False),
)
.query("requireThat")
.assign(
filledThat=lambda x: x["That"].fillna("WRONG"),
thisList=lambda x: x["This"].str.split(";"),
)
.query("filledThat not in thisList")
and I still get the false positive. I even tried defining a filter function and pipe:
def filter_by_columns(df, itemCol, listCol):
return df[~df[itemCol].isin(df[listCol])]
df.assign(
requireThat=lambda x: x["This"].str.contains("|", regex=False).fillna(False),
)
.query("requireThat")
.assign(
filledThat=lambda x: x["That"].fillna("WRONG"),
thisList=lambda x: x["This"].str.split("|"),
)
.pipe(filter_by_columns, itemCol="filledThat", listCol="thisList")
And the false positive is still in the results.
Solution 1:[1]
This works!
import pandas as pd
df = pd.DataFrame(
{
"ID": [1, 2, 3, 4],
"Name": ["Alpha", "Beta", "Gamma", "Delta"],
"This": ["A|B", "A|B", None, "C"],
"That": ["B", None, "C", None],
}
)
df.fillna('NaN', inplace=True)
df = df[df['This'].str.contains('|')]
df = df[df.apply(lambda x: x['That'] in x['This'], axis=1)]
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | William Rosenbaum |
