'Reducing id combinations in pandas dataframe
I have a pandas dataframe that has combinations of two id columns such as this:
| ID1 | ID2 |
|---|---|
| A | B |
| A | C |
| A | D |
| A | E |
| B | C |
| B | D |
| B | E |
| C | D |
| C | E |
| D | E |
| F | H |
| I | K |
| K | J |
| G | F |
| G | H |
| I | J |
Here we have the choose 2 combinations for ABCD, FGH, IJK.
I would like to only keep the rows for the value with the most ID1's for a particular set. For ABCD this would be A, for FGH this would be G, and for IJK this would be I. Resulting in the following:
| ID1 | ID2 |
|---|---|
| A | B |
| A | C |
| A | D |
| A | E |
| I | K |
| G | F |
| G | H |
| I | J |
Solution 1:[1]
Calculate the count of unqiue values in ID1, then inside a list comprehension for each set calculate the index of maximum value, finally use these indices to filter the rows in dataframe
c = df['ID1'].value_counts()
i = [c.reindex([*s]).idxmax() for s in ['ABCB', 'FGH', 'IJK']]
df[df['ID1'].isin(i)]
ID1 ID2
0 A B
1 A C
2 A D
3 A E
11 I K
13 G F
14 G H
15 I J
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Shubham Sharma |
