'For each value in a column, how to count the number of unique values in its row?
Suppose I have a data frame:
ID person_1 person_2
ID_001 Aaron Ben
ID_003 Kate Ben
ID_001 Aaron Lou
ID_005 Lee Ben
ID_006 Aaron Cassie
ID_001 Tim Ben
ID_003 Ben Mal
For every ID in the column "ID", I want to count the number of unique names that were associated with the ID
My desired output:
ID Count
ID_001 4
ID_003 3
ID_005 2
ID_006 2
Code for reproduction:
df = pd.DataFrame({
'ID': ["ID_001", "ID_003", "ID_001", "ID_005", "ID_006", "ID_001", "ID_003"],
'person1': ["Aaron","Kate","Aaron","Lee","Aaron","Tim","Ben"],
'person2': ["Ben","Ben","Lou","Ben","Cassie","Ben","Mal"]
})
Solution 1:[1]
Flat your columns person1 and person2 then remove duplicated names and finally count unique value per ID:
out = df.melt('ID').drop_duplicates(['ID', 'value']) \
.value_counts('ID').rename('Count').reset_index()
print(out)
# Output
ID Count
0 ID_001 4
1 ID_003 3
2 ID_005 2
3 ID_006 2
Solution 2:[2]
Melt your dataframe together, drop duplicates, then group by ID
and aggregate over the count of the variables. At least, rename the column variable to Count.
df.melt(["ID"]).drop_duplicates(["ID","value"]).groupby(
["ID"]).agg({"variable":"count"}).reset_index().rename(
columns={"variable":"Count"})
ID Count
ID_001 4
ID_003 3
ID_005 2
ID_006 2
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Corralien |
| Solution 2 |
