How to count the duplicates in pandas?

Say I want to check for duplicates in this df column:

import pandas as pd

df = pd.DataFrame(
    {"column_with_some_duplicates": ['a', 'b', 'b', 'c', 'c']},
    index=[1, 2, 3, 4, 5])

In R I would check for duplicates like:

table(duplicated(df$column_with_some_duplicates))

which gives me a table of TRUE and FALSE counts for the boolean result of duplicated. How can I view the same thing in pandas? Thanks.



Solution 1:[1]

To check whether the column contains duplicate values, I suggest wrapping the check in a function.

You could use the built-in set class, which eliminates duplicates, and then compare its size with the length of the column:

def isduplicate(df, col):
    # A set drops repeated values, so a smaller set means the column has duplicates
    return len(set(df[col])) != len(df[col])

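Or you could just use the .duplicated() method, which returns a boolean Series that is True for every value already seen earlier in the column (chain .any() if you want a single True/False):

def isduplicate(df, col):
    # Boolean Series: True where the value repeats an earlier one
    return df[col].duplicated()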
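
To get the same kind of summary as table(duplicated(...)) in R, the boolean result of .duplicated() can be tallied with .value_counts(). This is a minimal sketch that assumes the df from the question; the names dup_mask, counts, n_duplicates and n_in_dup_groups are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"column_with_some_duplicates": ['a', 'b', 'b', 'c', 'c']},
    index=[1, 2, 3, 4, 5])

# True for every value that repeats an earlier one (first occurrences stay False)
dup_mask = df["column_with_some_duplicates"].duplicated()

# Tally the True/False flags, similar to table(duplicated(...)) in R
counts = dup_mask.value_counts().to_dict()
print(counts)

# Number of rows flagged as duplicates (True counts as 1 when summed)
n_duplicates = int(dup_mask.sum())
print(n_duplicates)

# keep=False flags every member of a repeated group, including the first one
n_in_dup_groups = int(
    df["column_with_some_duplicates"].duplicated(keep=False).sum())
print(n_in_dup_groups)
```

For this df the tally is 3 False and 2 True, since only the second 'b' and the second 'c' repeat an earlier value.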

A different approach is to compare the number of unique elements with the length of the column:

def is_duplicate(df, col):
    # Fewer unique values than rows means at least one value is repeated
    return len(df[col].unique()) < len(df[col])
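
If the goal is to count the duplicates rather than just detect them, the same idea can be extended. The sketch below again assumes the df from the question and uses illustrative names (col, repeated, n_extra); it shows one possible way, not the only one.

```python
import pandas as pd

df = pd.DataFrame(
    {"column_with_some_duplicates": ['a', 'b', 'b', 'c', 'c']},
    index=[1, 2, 3, 4, 5])

col = df["column_with_some_duplicates"]

# How often each distinct value occurs in the column
per_value = col.value_counts()

# Keep only the values that occur more than once
repeated = per_value[per_value > 1]
print(repeated.to_dict())

# Rows beyond the first occurrence of each value: total rows minus unique values
n_extra = len(col) - col.nunique()
print(n_extra)
```

Here 'b' and 'c' each occur twice, so n_extra is 2, which matches the number of True values that .duplicated() reports for this column.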

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1