Create conditional column considering previous month's value
I have a dataframe with each customer's sales, and I need to classify each customer in each month as churn (lost) or not.
The condition is: if the customer is inactive in the reference (current) month AND was active in the previous month, return "churn", because the customer was lost; otherwise, return "na".
I managed to do that, but I can't build the logic so that, when there is no record for the previous month, it falls back to the most recent earlier month that has one.
For example, Paulo was "on" in 02/2021 and "off" in 10/2021, so that row should return "churn" and not "na".
import pandas as pd
import numpy as np
from datetime import datetime
from datetime import timedelta
df = pd.DataFrame({'month_year':['jan/21', 'feb/21', 'mar/21', 'nov/21', 'oct/21', 'apr/21', 'apr/21', 'jan/21', 'jan/21'],
                   'name':['joao', 'paulo', 'joao', 'joao', 'paulo', 'joao', 'joao', 'paulo', 'paulo'],
                   'sale_value':[10000, 15000, 3000, 6000, 3000, 1500, 9000, 2000, 4000],
                   'status':['on', 'on', 'off', 'off', 'off', 'off', 'on', 'on', 'off']})
list_datetime = [datetime.strptime(x, '%b/%y') for x in list(df['month_year'])]
df['month_year'] = list_datetime
df = df.groupby(['month_year', 'name']).agg({'sale_value':'sum','status':'last'})
# shift(periods=-1) reads the status of the following row in the grouped frame
next_status = df['status'].shift(periods=-1)
# Every status pair listed here maps to 'na'; any other row falls through to 'churn'
cond_churn = [
    ((df['status'] == 'on') & (next_status == 'off'))
    | ((df['status'] == 'off') & (next_status == 'off'))
    | ((df['status'] == 'on') & (next_status == 'on'))]
return_churn = ['na']
df['churn'] = np.select(cond_churn, return_churn, default='churn')
df.head(10)
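For reference, here is a minimal sketch of one way the rule described above could be expressed. It assumes that "previous month" means the customer's most recent earlier row in the data, whatever the gap. The names `monthly` and `prev_status` are illustrative and not part of the original code. The idea is to sort by customer and month, then shift the status within each customer's group, so months with no record are skipped.

```python
from datetime import datetime

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'month_year': ['jan/21', 'feb/21', 'mar/21', 'nov/21', 'oct/21',
                   'apr/21', 'apr/21', 'jan/21', 'jan/21'],
    'name': ['joao', 'paulo', 'joao', 'joao', 'paulo',
             'joao', 'joao', 'paulo', 'paulo'],
    'sale_value': [10000, 15000, 3000, 6000, 3000, 1500, 9000, 2000, 4000],
    'status': ['on', 'on', 'off', 'off', 'off', 'off', 'on', 'on', 'off'],
})
df['month_year'] = [datetime.strptime(x, '%b/%y') for x in df['month_year']]

# One row per customer and month, ordered chronologically within each customer
monthly = (
    df.groupby(['name', 'month_year'], as_index=False)
      .agg(sale_value=('sale_value', 'sum'), status=('status', 'last'))
      .sort_values(['name', 'month_year'])
      .reset_index(drop=True)
)

# Status of the same customer's previous recorded month (NaN for the first row)
prev_status = monthly.groupby('name')['status'].shift(1)

# Churn only when the customer is off now and was on in the previous record
monthly['churn'] = np.where(
    (monthly['status'] == 'off') & (prev_status == 'on'), 'churn', 'na'
)
print(monthly)
```

Because the shift happens inside each customer's group, Paulo's 10/2021 row is compared with his 02/2021 row rather than with another customer's row or a missing month.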
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
