'Same row 2 Matching Values in 2 different columns. Delete the second matching value
I would like to delete the second matching for each row. Column X1 is the column that we will be matching against, it's always the reference, we don't delete values from X1
Example (starting point) DataFrame df_client:
| Index |Name |email |city |X1 |X2 |X3 |X4 |X5 |
--- |--- |--- |--- |---|---|---|---|---|
| 0 |Mary |[email protected] |London |AB1|KD2|AB1| |CM2|
| 1 |john |[email protected] |Tokyo |LK2|LK2| |IG5| |
| 2 |karl |[email protected] |London |MK6| |MK6| |
| 3 |jasmin |[email protected]|Toronto|UH5|FG6|UH5| | |
| 4 |Frank |[email protected] |Paris |PO4| | |PO4|
| 5 |lee |[email protected] |Madrid |RT3|RT3|WS1| | |
I would like to compare the values X2,X3,X4,X5 always to X1 and that for each row.
When we find a matching value (e.g. row 0 I would like to delete AB1 from X3). In other words, we always keep the value in X1 and delete the matching value from X2 or X3 or X4 or X5.
I would like to add that it's a guarantee that each row will have a value in X2 or X3 or X4 or X5 that matches a value in X1:
The desired result will look like this:
|Index|Name |email |city |X1 |X2|X3 |X4 |X5 |
--- |--- |--- | ---|---|---|---|---|---
| 0 |Mary |[email protected] |London |AB1|KD2| | |CM2|
| 1 |john |[email protected] |Tokyo |LK2| | |IG5| |
| 2 |karl |[email protected] |London |MK6| | | | |
| 3 |jasmin|[email protected]|Toronto|UH5|FG6| | | |
| 4 |Frank |[email protected] |Paris |PO4| | | | |
| 5 |lee |[email protected] |Madrid |RT3|WS1| | |
It's not important but ideally, I would like to be able to move the values to the left if there are empty cells; something like this :
|Index|Name |email |city |X1 |X2 |X3 |X4 |X5 |
--- |--- | ---|--- |---|---|---|---|---
| 0 |Mary |[email protected] |London |AB1|KD2|CM2| | |
| 1 |john |[email protected] |Tokyo |LK2|IG5| | | |
| 2 |karl |[email protected] |London |MK6| | | | |
| 3 |jasmin|[email protected]|Toronto|UH5|FG6| | | |
| 4 |Frank |[email protected] |Paris |PO4| | | | |
| 5 |lee |[email protected] |Madrid |RT3|WS1| | | |
Moving the values to the left is really not important, if you can help me with just deleting the matching values that will be more than enough.
Thank you
Solution 1:[1]
You can do it with apply, a lambda function, and drop duplicates on axis=1 which operates on rows. This shifts the columns around, but you can store the order beforehand and reassign when you're done.
df = pd.DataFrame({'Name': {0: 'Mary', 1: 'john', 2: 'karl', 3: 'jasmin', 4: 'Frank', 5: 'lee'},
'email': {0: '[email protected]',
1: '[email protected]',
2: '[email protected]',
3: '[email protected]',
4: '[email protected]',
5: '[email protected]'},
'city': {0: 'London',
1: 'Tokyo',
2: 'London',
3: 'Toronto',
4: 'Paris',
5: 'Madrid'},
'X1': {0: 'AB1', 1: 'LK2', 2: 'MK6', 3: 'FG6', 4: 'PO4', 5: 'RT3'},
'X2': {0: 'KD2', 1: 'LK2', 2: 'MK6', 3: 'UH5', 4: 'PO4', 5: 'RT3'},
'X3': {0: 'AB1', 1: 'IG5', 2: None, 3: None, 4: None, 5: 'WS1'},
'X4': {0: 'CM2', 1: None, 2: None, 3: None, 4: None, 5: None},
'X5': {0: np.nan, 1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan, 5: np.nan}})
col_order = df.columns
df = df.apply(lambda x: x.drop_duplicates(keep='first'), axis=1)
df = df[col_order]
Output df:
Name email city X1 X2 X3 X4 X5
0 Mary [email protected] London AB1 KD2 NaN CM2 NaN
1 john [email protected] Tokyo LK2 NaN IG5 None NaN
2 karl [email protected] London MK6 NaN None NaN NaN
3 jasmin [email protected] Toronto UH5 FG6 None None NaN
4 Frank [email protected] Paris PO4 NaN None NaN NaN
5 lee [email protected] Madrid RT3 NaN WS1 None NaN
You can do this if you want to shift the data over to the left. You will need to change the indexing on the last line to match how many columns you have left after some get removed when creating shift_vals.
shift_vals = df.apply(lambda x: x.dropna().tolist(), axis=1)
new_df = pd.DataFrame(shift_vals.to_list())
new_df.columns = df.columns[0:-2]
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
