How to apply a function to the rows of each pandas groupby object without using a for loop (or with something faster than a for loop)?

I have a function

def net_sale(df):
    if df['target'] == -1:
        return float(df['quantity1'] + df['quantity2'])

A groupby object

g = df.sort_values(['date'], ascending=True).groupby('groups-concatenated-string')

I would like to apply the transformation "net_sale" to each group in g without using a for loop.

The following code is my attempt. It works on a small dataset (50 rows) but would take impractically long (an estimated several years) to run on a DataFrame of 800k+ rows.

for name, group in g:
    df['result_column'] = df.apply(net_sale, axis=1)

I am looking for a way to run the function "net_sale" on the rows of each group without using a for loop to iterate through them.

Sample dataframe:

    group   date    target   quant1   quant2   result_column
0    1      2018      0       10        NaN       NaN
1    1      2018     -1        2        -2        0
2    2      2019     -1        3        -3        0
3    2      2019     -1        3        -1        2
4    2      2019      0       10        -1        9


Solution 1:[1]

Your current approach calls apply on the entire DataFrame once per group, and the loop body never uses name or group. Each call iterates over every row, so the run time grows with the number of groups times the number of rows, which approaches quadratic when there are many groups. Since net_sale only looks at a single row, you can drop the loop and call apply once:

def net_sale(row):
    if row['target'] == -1:
        return float(row['quantity1'] + row['quantity2'])
    # do you want to return null if row['target'] != -1? otherwise you should define an else case

df["result_column"] = df.apply(net_sale, axis=1)
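
To make the cost of the original loop concrete, here is a small sketch that counts how often net_sale is invoked when apply sits inside the groupby loop. The sample data, the "group" column name, and the counter are made up for illustration; only the shape of the loop comes from the question.

```python
import pandas as pd

# Hypothetical toy data: 5 rows spread over 3 groups
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "c"],
    "target": [0, -1, -1, -1, 0],
    "quantity1": [10, 2, 3, 3, 10],
    "quantity2": [1, -2, -3, -1, -1],
})

calls = {"n": 0}  # mutable counter so the function can update it

def net_sale(row):
    calls["n"] += 1
    if row["target"] == -1:
        return float(row["quantity1"] + row["quantity2"])

# Same shape as the loop in the question: apply runs over the
# whole DataFrame once per group and ignores `name` and `group`.
for name, group in df.groupby("group"):
    df["result_column"] = df.apply(net_sale, axis=1)

loop_calls = calls["n"]
print(loop_calls)  # at least n_groups * n_rows invocations
```

With 800k rows and many distinct groups, that product is what makes the loop impractical, while a single apply only visits each row once.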

If you provide a more detailed example of what your data looks like and what the required output should be, there are probably faster approaches.
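
One such approach, as a hedged sketch: because net_sale depends only on values within a single row, the same result can probably be computed with column arithmetic and no apply at all. The sketch assumes the columns are named quantity1 and quantity2 as in the function (the sample table calls them quant1 and quant2), and that rows whose target is not -1 should end up as NaN, mirroring the function returning None for them. The sample values are illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative data loosely based on the sample in the question
df = pd.DataFrame({
    "group": [1, 1, 2, 2, 2],
    "date": [2018, 2018, 2019, 2019, 2019],
    "target": [0, -1, -1, -1, 0],
    "quantity1": [10, 2, 3, 3, 10],
    "quantity2": [np.nan, -2, -3, -1, -1],
})

# Add the two columns for every row at once
total = (df["quantity1"] + df["quantity2"]).astype(float)

# Keep the sum only where target == -1; other rows become NaN
df["result_column"] = total.where(df["target"] == -1)

print(df)
```

If the rows still need to be processed in date order within each group, sorting with df.sort_values before this step does not change the per-row result, since no value depends on neighbouring rows.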

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
