'Python: Lambda Function to Return Values Based on Multiple Conditions
I am trying to apply a lambda function to a DataFrame where I check multiple conditions.
The relevant columns in the DataFrame:
| shooter_id | shot_made | player_id_1 | total_points | free_throw_id | free_throw_made | time |
|---|---|---|---|---|---|---|
| NaN | NaN | 42 | NaN | NaN | NaN | 1 |
| 42 | True | 42 | 2 | NaN | NaN | 2 |
| 30 | True | 42 | 2 | NaN | NaN | 3 |
| NaN | NaN | 42 | NaN | 42 | True | 3 |
| NaN | NaN | 42 | NaN | 42 | False | 4 |
| 42 | True | 42 | 5 | NaN | NaN | 5 |
I want to add a column to the DataFrame that has the most recent total_points values, while also adding 1 to the rows where the free_throw_made = True, as the total_points does not reflect these...
| shooter_id | shot_made | player_id_1 | total_points | free_throw_id | free_throw_made | time | player_id_1_total_points |
|---|---|---|---|---|---|---|---|
| NaN | NaN | 42 | NaN | NaN | NaN | 1 | 0 |
| 42 | True | 42 | 2 | NaN | NaN | 2 | 2 |
| 30 | True | 42 | 2 | NaN | NaN | 3 | 2 |
| NaN | NaN | 42 | NaN | 42 | True | 3 | 3 |
| NaN | NaN | 42 | NaN | 42 | False | 4 | 3 |
| 42 | True | 42 | 5 | NaN | NaN | 5 | 5 |
I've tried a few different bits of code, but I can't work out the proper logic or syntax.
For example, I ran this:
def points(x):
if x == df['shooter_id'] & df['shot_made']:
return df['total_points']
elif x == df['free_throw_id'] & df['free_throw_made']:
df['total_points'] += 1
else:
return df['total_points']
df['player_id_1_total_points'] = df['player_id_1'].apply(lambda x: points(x))
Not only did this return an error (unsupported operand types float and bool), as I'm writing this I'm also realizing it would not return the most recent total_points value...
Any guidance would be extremely appreciated!
Solution 1:[1]
First group the dataframe by player_id_1 then for each group forward fill the values in total_points then add with free_throw_made, finally take the cumulative max to calculate most recent total points:
def func(g):
return (
g['total_points'].ffill()
.add(g['free_throw_made'], fill_value=0)
.fillna(0).cummax().to_frame()
)
df['player_id_1_total_points'] = df.groupby('player_id_1').apply(func)
Result
shooter_id shot_made player_id_1 total_points free_throw_id free_throw_made time player_id_1_total_points
0 NaN NaN 42 NaN NaN NaN 1 0.0
1 42.0 True 42 2.0 NaN NaN 2 2.0
2 30.0 True 42 2.0 NaN NaN 3 2.0
3 NaN NaN 42 NaN 42.0 True 3 3.0
4 NaN NaN 42 NaN 42.0 False 4 3.0
5 42.0 True 42 5.0 NaN NaN 5 5.0
Solution 2:[2]
You don't need a function, you can directly use vectorial code:
g = df.assign(free_throw_made=df['free_throw_made'].eq(True)).groupby('player_id_1')
df['player_id_1_total_points'] = (g['total_points'].ffill().fillna(0, downcast='infer')
+g['free_throw_made'].cumsum()
)
output:
shooter_id shot_made player_id_1 total_points free_throw_id \
0 NaN NaN 42 NaN NaN
1 42.0 True 42 2.0 NaN
2 30.0 True 42 2.0 NaN
3 NaN NaN 42 NaN 42.0
4 NaN NaN 42 NaN 42.0
5 42.0 True 42 5.0 NaN
free_throw_made time player_id_1_total_points
0 NaN 1 0
1 NaN 2 2
2 NaN 3 2
3 True 3 3
4 False 4 3
5 NaN 5 6
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Shubham Sharma |
| Solution 2 | mozway |
