How to use a list comprehension to subset a DataFrame by value counts

make     year
honda    2011
honda    2011
honda    n/a
toyota   2011
toyota   2022

I'm trying to get a list of the makes whose value count is greater than 2. Below is my code:

list = [I for I in df.make.unique() if df.loc[df.make==I, 'make'].value_counts()>2]

For some reason I get the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
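The error comes from the `if` clause. `df.loc[df.make==I, 'make'].value_counts()` returns a Series (one count per distinct make), so `... > 2` is a boolean Series, and Python can't turn a whole Series into a single True/False. A minimal sketch reproducing this with the sample data, assuming the `n/a` year is a missing value:

```python
import numpy as np
import pandas as pd

# Sample data from the question; "n/a" is assumed to be a missing value (NaN).
df = pd.DataFrame({
    "make": ["honda", "honda", "honda", "toyota", "toyota"],
    "year": [2011, 2011, np.nan, 2011, 2022],
})

# value_counts() returns a Series, so the comparison is a boolean Series too.
mask = df.loc[df.make == "honda", "make"].value_counts() > 2
print(type(mask).__name__)  # Series, not bool

try:
    if mask:  # bool(Series) is ambiguous, so pandas raises here
        pass
except ValueError as exc:
    print("ValueError:", exc)

# Reducing the one-element Series to a scalar makes the condition valid.
print(mask.item())  # True
```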


Solution 1:[1]

vc = df['make'].value_counts()
vc[vc>2].index.to_list()

Output:

['honda']

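As for your error: `value_counts()` returns a Series, so comparing it with `> 2` gives a boolean Series, which `if` can't evaluate as a single True/False. Take the single value out of it instead: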

[I for I in df.make.unique() if (df.loc[df.make==I, 'make'].value_counts()>2).values[0]]
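Since each filtered Series only ever contains one distinct make, any reduction to a scalar works. A sketch comparing a few equivalent conditions; `.iloc[0]` and summing a boolean mask are swapped-in alternatives, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({"make": ["honda", "honda", "honda", "toyota", "toyota"]})

# Original answer: pull the single value out of the boolean Series.
a = [m for m in df.make.unique()
     if (df.loc[df.make == m, "make"].value_counts() > 2).values[0]]

# Same idea using positional indexing on the counts themselves.
b = [m for m in df.make.unique()
     if df.loc[df.make == m, "make"].value_counts().iloc[0] > 2]

# Count matches directly: summing a boolean mask counts the True values.
c = [m for m in df.make.unique() if (df.make == m).sum() > 2]

print(a, b, c)  # ['honda'] ['honda'] ['honda']
```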

Solution 2:[2]

`count()` is enough:

lst = [I for I in df.make.unique() if df.loc[df.make==I, 'make'].count()>2]
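One caveat with `count()`: it counts non-missing values only, whereas `size` counts rows. That makes no difference on the `make` column, but it would if you counted the `year` column, where `n/a` is likely read as missing (`pd.read_csv` treats `n/a` as NaN by default). A sketch, assuming the data was loaded that way:

```python
import io
import pandas as pd

# The question's table as whitespace-separated text; read_csv maps "n/a" to NaN by default.
raw = """make year
honda 2011
honda 2011
honda n/a
toyota 2011
toyota 2022
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

honda_years = df.loc[df.make == "honda", "year"]
print(honda_years.count())  # 2: the NaN year is skipped
print(honda_years.size)     # 3: every row is counted

# Counting the make column itself is safe because it has no missing values.
lst = [m for m in df.make.unique() if df.loc[df.make == m, "make"].count() > 2]
print(lst)  # ['honda']
```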

You can also use `DataFrame.value_counts`:

lst = df.value_counts('make')[df.value_counts('make')>2].index.tolist()
print(lst)
['honda']
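The one-liner above computes `value_counts` twice. A sketch that computes it once and filters with a callable passed to `.loc` (a standard pandas feature, swapped in here for illustration):

```python
import pandas as pd

df = pd.DataFrame({"make": ["honda", "honda", "honda", "toyota", "toyota"]})

# Count each make once, then keep only the counts above the threshold.
lst = df["make"].value_counts().loc[lambda s: s > 2].index.tolist()
print(lst)  # ['honda']
```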

Solution 3:[3]

Here is another way to do it, using `groupby`:

counts = df.groupby("make")['make'].count().to_frame(name='cnt').reset_index()
counts[counts.cnt > 2]['make'].to_list()

This returns a list:

['honda']
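The title asks about subsetting the DataFrame itself rather than just listing the makes. If that is the goal, two common pandas idioms (not from the answers above) keep the matching rows directly: mapping each row's make to its count, or `groupby(...).filter`. A sketch with the sample data, assuming `n/a` is a missing value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "make": ["honda", "honda", "honda", "toyota", "toyota"],
    "year": [2011, 2011, np.nan, 2011, 2022],
})

# Option 1: look up each row's make count and build a boolean mask.
counts = df["make"].map(df["make"].value_counts())
subset_a = df[counts > 2]

# Option 2: keep every group whose row count exceeds 2.
subset_b = df.groupby("make").filter(lambda g: len(g) > 2)

print(subset_a.index.tolist())  # [0, 1, 2]
print(subset_b.index.tolist())  # [0, 1, 2]
```

Both keep the original row index, so the result can be joined back to `df` if needed.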

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Mayur Dhage
Solution 2 Ynjxsjmh
Solution 3 Naveed