Check one column with strings and get sum of values from second column (pythonic way)
Given this data frame:
import pandas as pd
d = {'SITE': ['AB', 'ON', 'YO', 'YO', 'AB'],
     'MARK': ['ss', 'ss', 'tt', 'ss', 'tt'],
     'SIZE': [4, 5, 2, 3, 4]}
ex_df = pd.DataFrame(data=d)
To get the SIZE sum for only the rows where SITE == 'AB', one can slice out the AB-only rows with AB_df = ex_df[ex_df.SITE == 'AB'] and then call AB_df.SIZE.sum(), which gives 8.
However, my real data frame is similar but has 10,000+ rows, 12 columns, and over 40 unique SITE strings.
Q1: How can I get the SIZE sum for each SITE without writing 40 lines of the same code as above (changing only the SITE name)?
Q2: How can I add more conditions, such as matching on two columns (SITE and MARK) and then getting the SIZE sum, again without writing 40 lines of repetitive code?
I'd like to save the result in a list containing the sums, in a dictionary mapping each site to its sum {AB:8, ON:5, ...}, or even in a new data frame with that information.
I've tried to use a list of the 40 unique sites to iterate through the data frame column, but without success given length differences, etc.
I'm looking to make this pythonic ideally. Thanks!
Solution 1:[1]
Q1 can be accomplished with a groupby in pandas:
grouped_df = ex_df.groupby('SITE').agg({'SIZE': 'sum'})
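Since the question asks for a dictionary such as {AB:8, ON:5, ...}, here is a minimal sketch of getting from the groupby result to that form. It rebuilds the example frame from the question; the names site_sums and site_sums_dict are only illustrative.

```python
import pandas as pd

ex_df = pd.DataFrame({
    'SITE': ['AB', 'ON', 'YO', 'YO', 'AB'],
    'MARK': ['ss', 'ss', 'tt', 'ss', 'tt'],
    'SIZE': [4, 5, 2, 3, 4],
})

# One sum per unique SITE; the result is a Series indexed by SITE.
site_sums = ex_df.groupby('SITE')['SIZE'].sum()

# Convert to a plain dictionary (site_sums_dict is an illustrative name).
site_sums_dict = site_sums.to_dict()
print(site_sums_dict)  # {'AB': 8, 'ON': 5, 'YO': 5}
```

If a list of the sums is enough, site_sums.tolist() returns them in the same (sorted by SITE) order.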
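To accomplish Q2, you likely need to implement a custom filter function and apply it before the groupby (passing it to .agg would not work, because .agg hands the function one column at a time rather than the whole frame), something like:
def my_filter(df: pd.DataFrame) -> pd.DataFrame:
    # Filters can be modified as needed
    return df[df['SITE'].str.startswith('A') & (df['MARK'] == 'tt')]
grouped_df = my_filter(ex_df).groupby('SITE').agg({'SIZE': 'sum'})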
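If every site should still appear in the result, with a sum of 0 where nothing matched, one option is to zero out the non-matching sizes instead of dropping rows. The sketch below uses the same two conditions as the filter above (SITE starts with 'A' and MARK equals 'tt'); the names mask and conditional_sums are only illustrative.

```python
import pandas as pd

ex_df = pd.DataFrame({
    'SITE': ['AB', 'ON', 'YO', 'YO', 'AB'],
    'MARK': ['ss', 'ss', 'tt', 'ss', 'tt'],
    'SIZE': [4, 5, 2, 3, 4],
})

# Each comparison is parenthesised because & binds tighter than ==.
mask = ex_df['SITE'].str.startswith('A') & (ex_df['MARK'] == 'tt')

# Keep SIZE where the mask holds, use 0 elsewhere, then sum per SITE.
conditional_sums = ex_df['SIZE'].where(mask, 0).groupby(ex_df['SITE']).sum()
print(conditional_sums.to_dict())  # {'AB': 4, 'ON': 0, 'YO': 0}
```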
However, if your goal for Q2 is simply to group the rows by SITE AND MARK, you can do:
grouped_df = ex_df.groupby(['SITE', 'MARK']).agg({'SIZE': 'sum'})
Then you don't have to worry about writing a custom filtering function.
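As a rough sketch of how the two-column grouping can be turned into the outputs the question mentions: as_index=False keeps SITE and MARK as ordinary columns, and to_dict on the grouped Series gives a dictionary keyed by (SITE, MARK) tuples. The names pair_sums and pair_sums_dict are only illustrative.

```python
import pandas as pd

ex_df = pd.DataFrame({
    'SITE': ['AB', 'ON', 'YO', 'YO', 'AB'],
    'MARK': ['ss', 'ss', 'tt', 'ss', 'tt'],
    'SIZE': [4, 5, 2, 3, 4],
})

# Flat data frame with one row per (SITE, MARK) combination.
pair_sums = ex_df.groupby(['SITE', 'MARK'], as_index=False)['SIZE'].sum()
print(pair_sums)

# Dictionary keyed by (SITE, MARK) tuples.
pair_sums_dict = ex_df.groupby(['SITE', 'MARK'])['SIZE'].sum().to_dict()
print(pair_sums_dict[('AB', 'tt')])  # 4
```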
Solution 2:[2]
IIUC this should give you a sum of each SITE for each row:
ex_df['Max'] = ex_df.groupby(['SITE'])['SIZE'].transform('sum')
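To make the effect of transform concrete, here is a small sketch on the example frame from the question. Unlike a plain groupby sum, transform returns one value per original row, so the per-SITE total is repeated on every row of that site. The column name SITE_TOTAL and the variable with_totals are illustrative choices, not part of the answer.

```python
import pandas as pd

ex_df = pd.DataFrame({
    'SITE': ['AB', 'ON', 'YO', 'YO', 'AB'],
    'MARK': ['ss', 'ss', 'tt', 'ss', 'tt'],
    'SIZE': [4, 5, 2, 3, 4],
})

# assign returns a new frame, leaving ex_df itself unchanged.
with_totals = ex_df.assign(
    SITE_TOTAL=ex_df.groupby('SITE')['SIZE'].transform('sum')
)
print(with_totals['SITE_TOTAL'].tolist())  # [8, 5, 5, 5, 8]
```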
If not, please clarify what you need and I can help further.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | whege |
| Solution 2 | ArchAngelPwn |

