'Comparing pandas data frame in groups based on condition in another dataframe

I have a series of grouped data in panada frame 1 like like - print(df_data)

group   id   Person item    amount
group_1 6339    ABC IceCream    16.87
group_1 6339    ABC Candy       71.31
group_1 6339    ABC Pen         9.83
group_1 6422    XYZ Candy       55.46
group_1 6422    XYZ IceCream    10.97
group_1 6422    XYZ Pen         3.53
group_1 6541    LMN Candy       6.16
group_1 6541    LMN IceCream    4.48
group_1 6541    LMN Pen         1.34
group_1 7185    DEF Candy       56.44
group_1 7185    DEF IceCream    8.23
group_1 7185    DEF Pen         5.71
group_2 6422    XYZ Pen         3.53
group_2 6541    LMN Candy       6.16
group_2 6541    LMN IceCream    4.48

another data frame contains rule for grouped data like print(df_rule)

Item      min   max
Candy     100   140
IceCream   20    50
Pen        15    40

So as per the rule the sum of Candy in a group cannot be more than 140 and cannot be less than 100

So when we check group_1 then we see it has

Candy      189.37
IceCream    40.55
Pen         20.41

So group_1 passes and group_2 fails. I want a similar logic that I can apply on df_data based on df_rule and finally, for those groups that pass this criterion I can store them in a data frame and discard those fails



Solution 1:[1]

I'm not sure what is your expected output but you could use a custom function per group to identify the non matching values:

s = df_rule.set_index('Item')
def between(g):
    ref = s.loc[g.name[1]]
    return ref['min'] <= g.sum() <= ref['max']

df_data.groupby(['group', 'item'])['amount'].apply(between)

output:

group    item    
group_1  Candy       False
         IceCream     True
         Pen          True
group_2  Candy       False
         IceCream    False
         Pen         False
Name: amount, dtype: bool

Assuming a group passes if all values are within the bounds, use:

df.groupby(['group', 'item'])['amount'].apply(between).groupby(level=0).all()

output:

group
group_1    False
group_2    False
Name: amount, dtype: bool

But here both fail.

If any value is fine (in this case use any), then group1 passes

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 mozway