How to count the duplicates in pandas?
Say I want to check for duplicates in this df column:
import pandas as pd

df = pd.DataFrame(
    {"column_with_some_duplicates": ['a', 'b', 'b', 'c', 'c']},
    index=[1, 2, 3, 4, 5])
In R I would check for duplicates like this:
table(duplicated(df$column_with_some_duplicates))
which gives me a table of TRUE and FALSE counts for the boolean result of duplicated. How can I view the same thing in pandas? Thanks.
Solution 1:[1]
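To check whether the column contains duplicate values, I suggest writing a function.
You could use the built-in set class, which eliminates duplicates, and compare its size with the length of the column (comparing the resulting lists directly is unreliable, because a set does not preserve order):
def isduplicate(df, col):
    # True if at least one value in df[col] occurs more than once
    return len(set(df[col])) != len(df[col])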
Or you could just use the .duplicated() method, which returns a boolean Series marking each value that repeats an earlier one:
def isduplicate(df, col):
    return df[col].duplicated()
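To get something close to R's table(duplicated(...)), the boolean result of .duplicated() can be tallied with value_counts(). This is only a minimal sketch using the DataFrame from the question; the variable names mask, counts and n_duplicates are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"column_with_some_duplicates": ["a", "b", "b", "c", "c"]},
    index=[1, 2, 3, 4, 5],
)

# Boolean mask: True for every value that repeats an earlier one in the column.
mask = df["column_with_some_duplicates"].duplicated()

# Tally of False/True values, similar in spirit to R's table(duplicated(...)).
counts = mask.value_counts()
print(counts)

# If only the number of duplicated rows is needed, summing the mask is enough.
n_duplicates = int(mask.sum())
print(n_duplicates)
```

For this column the tally should show three False and two True entries, since only the second 'b' and the second 'c' repeat an earlier value.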
A different approach is to compare the number of unique elements with the length of the column:
def is_duplicate(df, col):
    # True if the column has fewer unique values than rows
    return len(df[col].unique()) < len(df[col])
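The length comparison only answers yes or no. As a hedged sketch of how to go one step further, the example below shows the same check with nunique(), then marks every member of a duplicated group with keep=False and counts occurrences per value. The names col, has_duplicates, all_members and repeated are illustrative, and the Series mirrors the column from the question.

```python
import pandas as pd

col = pd.Series(["a", "b", "b", "c", "c"], index=[1, 2, 3, 4, 5])

# Yes/no check: fewer unique values than rows means at least one duplicate.
# Note that nunique() ignores NaN by default, unlike unique().
has_duplicates = col.nunique() < len(col)

# keep=False marks every member of a duplicated group, not only the repeats.
all_members = col.duplicated(keep=False)

# Occurrences per value, filtered to the values that appear more than once.
per_value = col.value_counts()
repeated = per_value[per_value > 1]

print(has_duplicates)
print(all_members.sum())
print(repeated)
```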
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
