Appending iterated rows to new dataframe

I'm new to Python, and I could use a push in the right direction for what I think should be (?) a pretty simple problem. I've got a dataframe (genres_df) with one column:

              0
0        Horror
1        Comedy
2       Fantasy
3     Adventure
4         Drama
5     Animation
6         Crime
...

and a dataframe (df) with 3 columns (one for each genre associated with the film) and one row for each film I'm looking at:

         0       1      2
0   Horror   Short   None
1   Horror   Short   None
2   Comedy  Horror  Short
3   Comedy  Horror  Short
4  Fantasy  Horror  Short
...

I want to count the number of rows in df that contain each item in genres_df. I was able to do this by hand for a single genre, with a sum line:

sum(df[0] == 'Comedy') + sum(df[1] == 'Comedy') + sum(df[2] == 'Comedy')

I know this works because, when I run it for 'Horror', I get 78471: there is a Horror item in each row, and there are 78471 rows in df.

I want to get a dataframe that has two columns: the genre (from genres_df) and the count of rows in which that genre appears, across any of the columns in df. Like so:

  0      1
0 Horror 78471
1 Comedy 9903
...

Here's what I've got so far:

df_counts = pd.DataFrame(columns = ['genre','count'])
for i in genres_df[0]:
    s_row = pd.Series(i,sum(df[0]==i)+sum(df[1]==i)+sum(df[2]==i))
    df_counts.append(s_row,ignore_index=True)

But this doesn't work. It seems to be the closest I've gotten, though. Help?



Solution 1:[1]

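I think your solution is close, but there are two problems. First, append, like most pandas operations, does not mutate the dataframe; it returns a new one, so you have to assign the result back (append has no inplace option). Second, pd.Series(i, count) passes the count as the index rather than as a value, so build the row from a dict keyed by the column names instead. Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so this only runs on older versions.

df_counts = pd.DataFrame(columns = ['genre','count'])
for i in genres_df[0]:
    # label both values so they line up with the 'genre' and 'count' columns
    s_row = pd.Series({'genre': i, 'count': sum(df[0]==i)+sum(df[1]==i)+sum(df[2]==i)})
    # append returns a new dataframe, so reassign the result
    df_counts = df_counts.append(s_row,ignore_index=True)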
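
On pandas 2.0 and later, where DataFrame.append is no longer available, one way to get the same table is to collect the rows in a plain list and build the dataframe once at the end. This is only a minimal sketch: the small genres_df and df below are made-up stand-ins shaped like the samples in the question, and rows is just an illustrative name.

```python
import pandas as pd

# Made-up stand-in data shaped like the frames in the question.
genres_df = pd.DataFrame(["Horror", "Comedy", "Fantasy"])
df = pd.DataFrame([
    ["Horror", "Short", None],
    ["Horror", "Short", None],
    ["Comedy", "Horror", "Short"],
    ["Comedy", "Horror", "Short"],
    ["Fantasy", "Horror", "Short"],
])

# Collect one dict per genre, then build the dataframe in a single step.
rows = []
for genre in genres_df[0]:
    # df.eq(genre) compares every cell to the genre; any(axis=1) flags the
    # rows where it appears in at least one column; sum() counts those rows.
    count = int(df.eq(genre).any(axis=1).sum())
    rows.append({"genre": genre, "count": count})

df_counts = pd.DataFrame(rows, columns=["genre", "count"])
print(df_counts)
```

Counting with any(axis=1) counts rows rather than individual cells, so it matches the three-way sum in the question as long as a genre appears at most once per row. Building the dataframe once also avoids copying it on every loop iteration.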

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Sergio Peñafiel