'pandas.query in a chain not giving expected results

I have a pandas dataframe that looks like this:

df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4],
        "Name": ["Alpha", "Beta", "Gamma", "Delta"],
        "This": ["A|B", "A|B", None, "C"],
        "That": ["B", None, "C", None],
    }
)

I am trying to isolate the rows where there is a pipe character in This and where That isn't in This. Both This and That can be null.

Here's what I tried:

df.assign(
    requireThat=lambda x: x["This"].str.contains("|", regex=False).fillna(False)
)
.query("requireThat")
.assign(filledThat=lambda x: x["That"].fillna("WRONG"))
.query("filledThat not in This")

And I get:

ID Name This That requireThat filledThat
1 Alpha A|B B True B
2 Beta A|B None True WRONG

The first row is there and it shouldn't be. I should only get the second row. I'm not sure why that second query statement is failing. I suspected it was the pipe character, but even if I change it to a semicolon or some other character I get the same two rows in the output.

How can I check if the value in the That column is in the This value, row-by-row?

Edited to Add:

I have also tried

df.assign(
    requireThat=lambda x: x["This"].str.contains(";", regex=False).fillna(False),
)
.query("requireThat")
.assign(
    filledThat=lambda x: x["That"].fillna("WRONG"),
    thisList=lambda x: x["This"].str.split(";"),
)
.query("filledThat not in thisList")

and I still get the false positive. I even tried defining a filter function and pipe:

def filter_by_columns(df, itemCol, listCol):
    return df[~df[itemCol].isin(df[listCol])]

df.assign(
    requireThat=lambda x: x["This"].str.contains("|", regex=False).fillna(False),
)
.query("requireThat")
.assign(
    filledThat=lambda x: x["That"].fillna("WRONG"),
    thisList=lambda x: x["This"].str.split("|"),
)
.pipe(filter_by_columns, itemCol="filledThat", listCol="thisList")

And the false positive is still in the results.



Solution 1:[1]

This works!

import pandas as pd

df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4],
        "Name": ["Alpha", "Beta", "Gamma", "Delta"],
        "This": ["A|B", "A|B", None, "C"],
        "That": ["B", None, "C", None],
    }
)

df.fillna('NaN', inplace=True)
df = df[df['This'].str.contains('|')]
df = df[df.apply(lambda x: x['That'] in x['This'], axis=1)]

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 William Rosenbaum