'How to reassign values in column by condition in dataframe?

df = pd.DataFrame([["A", "AA", "AAA", "found"],
                   ["A", "AB", "ABA", "not found"],
                   ["A", "AB", "ABB", "not found"],
                   ["B", "BA", "BAA", "not found"],
                   ["B", "BB", "BBA", "not found"],
                   ["C", "CA", "CAA", "not found"],
                   ["C", "CB", "CBA", "found"],
                   ["C", "CB", "CBA", "not found"]], columns=["col_a", "col_b", "col_c", "where"])
  col_a col_b col_c      where
0     A    AA   AAA      found
1     A    AB   ABA  not found
2     A    AB   ABB  not found

3     B    BA   BAA  not found
4     B    BB   BBA  not found

5     C    CA   CAA  not found
6     C    CB   CBA      found
7     C    CB   CBA  not found
8     C    CB   CBB  not found

I need to reassign values based on col_a and col_b on condition: if in any row with unique combination in col_a and col_b is "found", assign it to every row in that combination independent on other columns (e.g. col_c)

I tried to cycle through it, it kinda works but is inelegant and inefficient.

unique_col_a = df["col_a"].unique()
for bus in unique_col_a:
    unique_pdus = df[df["col_a"] == bus]["col_b"].unique()
    for pdu in unique_pdus:
        bus_pdus_found = df[(df["col_a"] == bus) & (df["col_b"] == pdu) & (df["where"] == "found")]
        if len(bus_pdus_found) > 0:
            df.loc[(df["col_a"] == bus) & (df["col_b"] == pdu) & (df["where"] != "found"), "where"] = "found"

So expected result would be something like:

  col_a col_b col_c      where
0     A    AA   AAA      found
1     A    AB   ABA  not found
2     A    AB   ABB  not found

3     B    BA   BAA  not found
4     B    BB   BBA  not found

5     C    CA   CAA  not found
6     C    CB   CBA      found
7     C    CB   CBA      found
8     C    CB   CBB      found

Thanks for your help! Appreciate it very much



Solution 1:[1]

Assuming column where only contains the values found, not found, you can group the dataframe by col_a, col_b and transform where with min. This approach works because found is always less than not found alphabetically:

df['where'] = df.groupby(['col_a', 'col_b'])['where'].transform('min')

If the column where contains other values as well, here is a more general solution:

m = df['where'].eq('found').groupby([df['col_a'], df['col_b']]).transform('any')
df.loc[m, 'where'] = 'found'

Result

  col_a col_b col_c      where
0     A    AA   AAA      found
1     A    AB   ABA  not found
2     A    AB   ABB  not found
3     B    BA   BAA  not found
4     B    BB   BBA  not found
5     C    CA   CAA  not found
6     C    CB   CBA      found
7     C    CB   CBA      found
8     C    CB   CBB      found

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1