Appending iterated rows to new dataframe
I'm new to Python and could use a push in the right direction on what I think should be a pretty simple problem. I have a dataframe (genres_df) with one column:
```
           0
0     Horror
1     Comedy
2    Fantasy
3  Adventure
4      Drama
5  Animation
6      Crime
...
```
and a dataframe (df) with three columns, one for each genre associated with a film, and one row for each film I'm looking at:
```
         0       1      2
0   Horror   Short   None
1   Horror   Short   None
2   Comedy  Horror  Short
3   Comedy  Horror  Short
4  Fantasy  Horror  Short
...
```
For each genre in genres_df, I want to count the number of rows in df that contain it. I was able to do this by hand for a single genre with a sum like this:
```python
sum(df[0] == 'Comedy') + sum(df[1] == 'Comedy') + sum(df[2] == 'Comedy')
```
I know this works because every row in df contains Horror, and running the same expression with 'Horror' returns 78471, which is the number of rows in df.
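A minimal sketch of a vectorized alternative to the hand-written sum, assuming df has integer column labels 0, 1, 2 as in the question (the sample rows below are hypothetical). `DataFrame.isin` marks matching cells and `any(axis=1)` collapses each row to a single True/False, so each film is counted at most once:

```python
import pandas as pd

# Hypothetical sample data shaped like df in the question (up to three genres per film)
df = pd.DataFrame([
    ["Horror", "Short", None],
    ["Horror", "Short", None],
    ["Comedy", "Horror", "Short"],
    ["Comedy", "Horror", "Short"],
    ["Fantasy", "Horror", "Short"],
])

# Boolean frame: True wherever a cell equals 'Comedy'
is_comedy = df.isin(["Comedy"])

# A row counts once if any of its columns holds 'Comedy'
comedy_rows = int(is_comedy.any(axis=1).sum())
print(comedy_rows)  # 2

# The hand-written column-by-column sum gives the same total here
hand_sum = sum(df[0] == "Comedy") + sum(df[1] == "Comedy") + sum(df[2] == "Comedy")
print(hand_sum)  # 2

# Sanity check from the question: Horror appears in every row
horror_rows = int(df.isin(["Horror"]).any(axis=1).sum())
print(horror_rows == len(df))  # True
```

The two approaches agree as long as a genre never appears twice in the same row; if it did, the hand-written sum would count that row twice while `any(axis=1)` would count it once.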
What I want is a dataframe with two columns: the genre (from genres_df) and the number of rows in which that genre appears in any of the columns of df, like so:
```
        0      1
0  Horror  78471
1  Comedy   9903
...
```
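One way to build that whole table without a loop, sketched under the assumption that genres_df and df look like the samples in the question (the data below is hypothetical): flatten all genre columns into one Series, count with `Series.value_counts` (which skips None by default), then `reindex` on the genre list so genres that never occur get 0. This swaps the question's loop for a different technique; like the hand-written sum, it counts occurrences, which equals the row count when no genre repeats within a row.

```python
import pandas as pd

# Hypothetical sample data standing in for genres_df and df from the question
genres_df = pd.DataFrame(["Horror", "Comedy", "Fantasy", "Adventure"])
df = pd.DataFrame([
    ["Horror", "Short", None],
    ["Horror", "Short", None],
    ["Comedy", "Horror", "Short"],
    ["Comedy", "Horror", "Short"],
    ["Fantasy", "Horror", "Short"],
])

# Flatten the three genre columns into one Series; value_counts drops None/NaN
occurrences = pd.Series(df.to_numpy().ravel()).value_counts()

# Align to the genre list; genres that never occur get 0 instead of NaN
counts = occurrences.reindex(genres_df[0], fill_value=0)

# Two-column result: genre name and its count
df_counts = pd.DataFrame({
    "genre": genres_df[0].to_numpy(),
    "count": counts.to_numpy(),
})
print(df_counts)
```

Genres that appear in df but not in genres_df (such as "Short" here) are simply dropped by the `reindex` step.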
Here's what I've got so far:
```python
df_counts = pd.DataFrame(columns = ['genre','count'])
for i in genres_df[0]:
    s_row = pd.Series(i,sum(df[0]==i)+sum(df[1]==i)+sum(df[2]==i))
    df_counts.append(s_row,ignore_index=True)
```
But this doesn't work. It seems to be the closest I've gotten, though. Help?
Solution 1:[1]
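Your approach is close, but append, like most pandas operations, does not modify the dataframe in place: it returns a new DataFrame, so you have to reassign the result (DataFrame.append has no inplace option). There is also a second problem: pd.Series(i, count) passes the count as the Series index rather than as a value. Build the row from a dict whose keys match the columns of df_counts instead:

```python
import pandas as pd

df_counts = pd.DataFrame(columns=['genre', 'count'])
for i in genres_df[0]:
    # Row labels must match df_counts' columns
    s_row = pd.Series({'genre': i, 'count': sum(df[0] == i) + sum(df[1] == i) + sum(df[2] == i)})
    # append returns a new DataFrame, so reassign it (pandas < 2.0 only)
    df_counts = df_counts.append(s_row, ignore_index=True)
```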
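DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0. A minimal sketch that works on current versions, assuming the same genres_df and df layout (sample data below is hypothetical): collect each row as a dict in a plain list and build the DataFrame once at the end, which also avoids copying the whole frame on every iteration.

```python
import pandas as pd

# Hypothetical sample data standing in for genres_df and df from the question
genres_df = pd.DataFrame(["Horror", "Comedy", "Fantasy"])
df = pd.DataFrame([
    ["Horror", "Short", None],
    ["Comedy", "Horror", "Short"],
    ["Fantasy", "Horror", "Short"],
])

rows = []
for genre in genres_df[0]:
    # Same per-column comparison as the question's hand-written sum
    n = int((df[0] == genre).sum() + (df[1] == genre).sum() + (df[2] == genre).sum())
    rows.append({"genre": genre, "count": n})

# Build the frame once instead of growing it row by row
df_counts = pd.DataFrame(rows, columns=["genre", "count"])
print(df_counts)
```

If you really do need to add rows to an existing DataFrame, pd.concat([df_counts, pd.DataFrame(rows)], ignore_index=True) is the usual replacement for append.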
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Sergio Peñafiel |
