Check one column with strings and get sum of values from second column (pythonic way)

Given this data frame:

import pandas as pd

d = {'SITE': ['AB', 'ON', 'YO', 'YO', 'AB'],
     'MARK': ['ss', 'ss', 'tt', 'ss', 'tt'],
     'SIZE': [4, 5, 2, 3, 4]}

ex_df = pd.DataFrame(data=d)


To get the SIZE sum for only the rows where SITE == 'AB', one can slice out the AB rows with AB_df = ex_df[ex_df.SITE == 'AB'] and then call AB_df.SIZE.sum(), which gives 8.

However, I have a similar data frame with 10,000+ rows, 12 columns, and over 40 unique SITE strings.

Q1: How can you get the SIZE sum for each SITE without writing 40 lines of the same code as above (changing only the SITE name)?

Q2: How can you add more conditions, such as requiring a match on two columns (SITE and MARK) before summing SIZE, again without writing 40 lines of repetitive code?

I'd like to save the result either in a list containing the sums or dictionary with the site and sum {AB:8, ON:5, ...} or even a new data frame with that information.

I've tried to use a list of the 40 unique sites to iterate through the data frame column, but without success given length differences, etc.

I'm looking to make this pythonic ideally. Thanks!



Solution 1:[1]

Q1 can be accomplished with a groupby in pandas:

grouped_df = ex_df.groupby('SITE').agg({'SIZE': 'sum'})
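Since the question asks for the result as a dictionary or a new data frame, here is a minimal sketch of both conversions on the sample data. The names site_sums, site_sums_dict and site_sums_df are illustrative and not part of the original answer.

```python
import pandas as pd

ex_df = pd.DataFrame({
    'SITE': ['AB', 'ON', 'YO', 'YO', 'AB'],
    'MARK': ['ss', 'ss', 'tt', 'ss', 'tt'],
    'SIZE': [4, 5, 2, 3, 4],
})

# Series indexed by SITE, holding one SIZE total per site
site_sums = ex_df.groupby('SITE')['SIZE'].sum()

# Dictionary form: {'AB': 8, 'ON': 5, 'YO': 5}
site_sums_dict = site_sums.to_dict()

# Data frame form with SITE and SIZE as ordinary columns
site_sums_df = site_sums.reset_index()
```

Selecting ['SIZE'] before .sum() returns a Series, which is what makes .to_dict() produce a flat site-to-total mapping.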

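To accomplish Q2, you likely need a custom function. Pass it to .apply rather than .agg, since .agg generally hands the function one column at a time instead of the whole group, something like:

def my_filter(group: pd.DataFrame):
    # Filters can be modified as needed
    return group.loc[group['MARK'] == 'tt', 'SIZE'].sum()

# Restrict by SITE first, then sum the matching rows within each group
a_sites = ex_df[ex_df['SITE'].str.startswith('A')]
grouped_df = a_sites.groupby('SITE')[['MARK', 'SIZE']].apply(my_filter)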
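The same conditional sum can also be sketched without a custom function, by building a boolean mask and filtering before grouping. This assumes the intended condition is "SITE starts with 'A' and MARK equals 'tt'"; mask and filtered_sums are illustrative names.

```python
import pandas as pd

ex_df = pd.DataFrame({
    'SITE': ['AB', 'ON', 'YO', 'YO', 'AB'],
    'MARK': ['ss', 'ss', 'tt', 'ss', 'tt'],
    'SIZE': [4, 5, 2, 3, 4],
})

# String methods on a column go through the .str accessor.
# The comparison is parenthesized because & binds more tightly than ==.
mask = ex_df['SITE'].str.startswith('A') & (ex_df['MARK'] == 'tt')

# Keep only the matching rows, then total SIZE per remaining SITE
filtered_sums = ex_df[mask].groupby('SITE')['SIZE'].sum()
```

Note that sites with no matching rows do not appear in the result at all, rather than showing up with a total of 0.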

However, if your goal for Q2 is simply to group the rows by SITE AND MARK, you can do:

grouped_df = ex_df.groupby(['SITE', 'MARK']).agg({'SIZE': 'sum'})

Then you don't have to worry about writing a custom filtering function.
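As a rough illustration of what the two-key grouping produces on the sample data, the sketch below also shows two optional ways to reshape it. The names pair_sums, pair_sums_dict and pair_table are illustrative only.

```python
import pandas as pd

ex_df = pd.DataFrame({
    'SITE': ['AB', 'ON', 'YO', 'YO', 'AB'],
    'MARK': ['ss', 'ss', 'tt', 'ss', 'tt'],
    'SIZE': [4, 5, 2, 3, 4],
})

# Series with a (SITE, MARK) MultiIndex, one total per observed pair
pair_sums = ex_df.groupby(['SITE', 'MARK'])['SIZE'].sum()

# Dictionary keyed by (SITE, MARK) tuples, e.g. ('AB', 'tt') -> 4
pair_sums_dict = pair_sums.to_dict()

# Wide table: one row per SITE, one column per MARK, 0 where a pair never occurs
pair_table = pair_sums.unstack(fill_value=0)
```

A single pair can then be looked up with pair_sums.loc[('AB', 'tt')] or pair_table.loc['AB', 'tt'].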

Solution 2:[2]

IIUC this should add a column that holds, on every row, the total SIZE for that row's SITE:

ex_df['SITE_TOTAL'] = ex_df.groupby(['SITE'])['SIZE'].transform('sum')

If not, please clarify what output you need.
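To make the difference from a plain groupby sum concrete, here is a small sketch on the sample data. transform keeps the original number of rows and repeats each site's total on every row of that site. SITE_TOTAL is an illustrative column name.

```python
import pandas as pd

ex_df = pd.DataFrame({
    'SITE': ['AB', 'ON', 'YO', 'YO', 'AB'],
    'MARK': ['ss', 'ss', 'tt', 'ss', 'tt'],
    'SIZE': [4, 5, 2, 3, 4],
})

# Same length as ex_df: each row receives the SIZE total of its own SITE
ex_df['SITE_TOTAL'] = ex_df.groupby('SITE')['SIZE'].transform('sum')

# SITE_TOTAL is now [8, 5, 5, 5, 8], aligned with the original row order
totals = ex_df['SITE_TOTAL'].tolist()
```

This form is handy when the total is needed alongside the original rows, for example to compute each row's share of its site.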

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 whege
Solution 2 ArchAngelPwn