'Filtering a Pandas pivot table *during* the pivot
Let's assume we have the following data frame df:
df = pd.DataFrame({'food' : ['spam', 'ham', 'eggs', 'ham', 'ham', 'eggs', 'milk'],
'sales' : [10, 15, 12, 5, 14, 3, 8]})
I'd like to pivot this data to show the sum of sales by food, but only if sales is greater than 12. The resulting pivot table would look as follows:
Unfiltered df:
food sum(sales)
spam 10
ham 34
eggs 15
milk 8
Filtered df:
food sum(sales)
ham 34
eggs 15
I can use groupby() as follows:
df_new.groupby(['food'])['sales'].agg('sum') > 12
But, this only gives me the boolean and not the filtered df.
Is this possible to filter a column "on the fly" when using the pd.pivot_table() function? (i.e. without pre-filtering the df)
Solution 1:[1]
You can pass a lambda function .loc which will filter the dataframe for only rows that match the condition that the lambda function returns:
filtered = df.groupby('food')['sales'].sum().reset_index().loc[lambda x: x['sales'] > 12]
Output:
>>> filtered
food sales
0 eggs 15
1 ham 34
(In case you're wondering, the lambda function gets executed for the whole dataframe, not for each individual row, so yes, it's very efficient :)
Solution 2:[2]
groupby produces a series object.
It's not pretty, but you can subset it dynamically using:
df.groupby(['food'])['sales'].agg('sum')[df.groupby(['food'])['sales'].agg('sum')>12]
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | richardec |
| Solution 2 | Alison |
