'Same row 2 Matching Values in 2 different columns. Delete the second matching value

I would like to delete the second matching for each row. Column X1 is the column that we will be matching against, it's always the reference, we don't delete values from X1

Example (starting point) DataFrame df_client:

| Index |Name   |email             |city   |X1 |X2 |X3 |X4 |X5 |
---     |---    |---               |---    |---|---|---|---|---|
| 0     |Mary   |[email protected]  |London |AB1|KD2|AB1|   |CM2|
| 1     |john   |[email protected]  |Tokyo  |LK2|LK2|   |IG5|   |
| 2     |karl   |[email protected]  |London |MK6|   |MK6|   |
| 3     |jasmin |[email protected]|Toronto|UH5|FG6|UH5|   |   |
| 4     |Frank  |[email protected] |Paris  |PO4|   |   |PO4|
| 5     |lee    |[email protected]   |Madrid |RT3|RT3|WS1|   |   |

I would like to compare the values X2,X3,X4,X5 always to X1 and that for each row.

When we find a matching value (e.g. row 0 I would like to delete AB1 from X3). In other words, we always keep the value in X1 and delete the matching value from X2 or X3 or X4 or X5.

I would like to add that it's a guarantee that each row will have a value in X2 or X3 or X4 or X5 that matches a value in X1:

The desired result will look like this:

|Index|Name  |email             |city   |X1  |X2|X3 |X4 |X5 |
 ---  |---   |---               |    ---|---|---|---|---|---
| 0   |Mary  |[email protected]  |London |AB1|KD2|   |   |CM2|
| 1   |john  |[email protected]  |Tokyo  |LK2|   |   |IG5|   |
| 2   |karl  |[email protected]  |London |MK6|   |   |   |   |
| 3   |jasmin|[email protected]|Toronto|UH5|FG6|   |   |   |
| 4   |Frank |[email protected] |Paris  |PO4|   |   |   |   |
| 5   |lee   |[email protected]   |Madrid |RT3|WS1|   |   |

It's not important but ideally, I would like to be able to move the values to the left if there are empty cells; something like this :

|Index|Name  |email             |city   |X1 |X2 |X3 |X4 |X5 |
 ---  |---   |               ---|---    |---|---|---|---|---
| 0   |Mary  |[email protected]  |London |AB1|KD2|CM2|   |   |
| 1   |john  |[email protected]  |Tokyo  |LK2|IG5|   |   |   |
| 2   |karl  |[email protected]  |London |MK6|   |   |   |   |
| 3   |jasmin|[email protected]|Toronto|UH5|FG6|   |   |   |
| 4   |Frank |[email protected] |Paris  |PO4|   |   |   |   |
| 5   |lee   |[email protected]   |Madrid |RT3|WS1|   |   |   |

Moving the values to the left is really not important, if you can help me with just deleting the matching values that will be more than enough.

Thank you

Solution 1:^[1]

You can do it with apply, a lambda function, and drop duplicates on axis=1 which operates on rows. This shifts the columns around, but you can store the order beforehand and reassign when you're done.

df = pd.DataFrame({'Name': {0: 'Mary', 1: 'john', 2: 'karl', 3: 'jasmin', 4: 'Frank', 5: 'lee'},
 'email': {0: '[email protected]',
  1: '[email protected]',
  2: '[email protected]',
  3: '[email protected]',
  4: '[email protected]',
  5: '[email protected]'},
 'city': {0: 'London',
  1: 'Tokyo',
  2: 'London',
  3: 'Toronto',
  4: 'Paris',
  5: 'Madrid'},
 'X1': {0: 'AB1', 1: 'LK2', 2: 'MK6', 3: 'FG6', 4: 'PO4', 5: 'RT3'},
 'X2': {0: 'KD2', 1: 'LK2', 2: 'MK6', 3: 'UH5', 4: 'PO4', 5: 'RT3'},
 'X3': {0: 'AB1', 1: 'IG5', 2: None, 3: None, 4: None, 5: 'WS1'},
 'X4': {0: 'CM2', 1: None, 2: None, 3: None, 4: None, 5: None},
 'X5': {0: np.nan, 1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan, 5: np.nan}})

col_order = df.columns
df = df.apply(lambda x: x.drop_duplicates(keep='first'), axis=1)
df = df[col_order]

Output df:

    Name    email               city    X1  X2   X3     X4      X5
0   Mary    [email protected]    London  AB1 KD2  NaN    CM2     NaN
1   john    [email protected]    Tokyo   LK2 NaN  IG5    None    NaN
2   karl    [email protected]    London  MK6 NaN  None   NaN     NaN
3   jasmin  [email protected]  Toronto UH5 FG6  None   None    NaN
4   Frank   [email protected]   Paris   PO4 NaN  None   NaN     NaN
5   lee     [email protected]     Madrid  RT3 NaN  WS1    None    NaN

You can do this if you want to shift the data over to the left. You will need to change the indexing on the last line to match how many columns you have left after some get removed when creating shift_vals.

shift_vals = df.apply(lambda x: x.dropna().tolist(), axis=1)
new_df = pd.DataFrame(shift_vals.to_list())
new_df.columns = df.columns[0:-2]

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1

'Same row 2 Matching Values in 2 different columns. Delete the second matching value

Solution 1:[1]

Sources

Related Questions

Solution 1:^[1]