How can I get the difference between two month-year dates in Streamlit?

Here's the situation: I'm building a Streamlit app to compute cohort data, following the approach explained here: https://towardsdatascience.com/a-step-by-step-introduction-to-cohort-analysis-in-python-a2cbbd8460ea. I'm now at the point where I have a dataframe with the cohort month (cohort), the month of the payment (order_month), and the number of customers from that cohort who bought something in that month (n_customers). Next I need a column with the period number, i.e. how many months have passed since the cohort month. In other words, I have this:

cohort        order_month        n_customers
2009-12       2009-12            1045
2009-12       2010-01            392
2009-12       2010-02            358
.
.
.

And I'm trying to get this:

cohort        order_month        n_customers    period_number
2009-12       2009-12            1045           0
2009-12       2010-01            392            1
2009-12       2010-02            358            2
.
.
.

The name of the dataframe is df_cohort.

So in 12/2009, 1045 customers from cohort 12/2009 bought something; in 01/2010, 392 customers from that same cohort bought something; and so on. I need the period_number column to build my heatmap.

I tried running this:

from operator import attrgetter

df_cohort["period_number"] = (
    df_cohort["order_month"] - df_cohort["cohort"]
).apply(attrgetter("n"))

But I got this error:

AttributeError: 'Timedelta' object has no attribute 'n'

I had to build the dataframe a little differently from the tutorial, which is why I get this error. Is there a way to fix it from this point on, without changing any of the earlier steps?

Regarding the data types, both order_month and cohort are datetime64[ns].
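A minimal sketch of one way to get the period number straight from the datetime64[ns] columns, without touching any earlier step: count whole months as 12 times the year difference plus the month difference. For the third row that is 12 × (2010 − 2009) + (2 − 12) = 2. The sample frame is hypothetical and only reproduces the three rows shown above; the day of the month is ignored, so it does not matter which day each date falls on.

```python
import pandas as pd

# Hypothetical sample mirroring the rows shown above; both date columns are datetime64[ns]
df_cohort = pd.DataFrame({
    "cohort": pd.to_datetime(["2009-12", "2009-12", "2009-12"]),
    "order_month": pd.to_datetime(["2009-12", "2010-01", "2010-02"]),
    "n_customers": [1045, 392, 358],
})

# Whole months elapsed = 12 * (year difference) + (month difference)
df_cohort["period_number"] = (
    (df_cohort["order_month"].dt.year - df_cohort["cohort"].dt.year) * 12
    + (df_cohort["order_month"].dt.month - df_cohort["cohort"].dt.month)
)

print(df_cohort)
```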



Solution 1:[1]

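Have you tried specifying the columns? Note that .n only exists when both columns hold monthly periods, as in the tutorial, so with datetime64[ns] columns convert them with .dt.to_period('M') before subtracting:

from operator import attrgetter
df_cohort['period_number'] = (df_cohort['order_month'].dt.to_period('M') - df_cohort['cohort'].dt.to_period('M')).apply(attrgetter('n'))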

Thanks.
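Why attrgetter('n') fails on these columns, as a sketch with hypothetical sample data: the tutorial's cohort and month columns are monthly Period values, while df_cohort stores datetime64[ns]. Subtracting two datetime64 columns yields Timedelta values (a length of time in days), which have no .n attribute. Subtracting monthly Periods instead yields one MonthEnd offset per row, and that offset's .n is the number of months between the two dates.

```python
from operator import attrgetter

import pandas as pd

# Hypothetical sample: cohort and order month stored as datetime64[ns]
cohort = pd.Series(pd.to_datetime(["2009-12", "2009-12", "2009-12"]))
order_month = pd.Series(pd.to_datetime(["2009-12", "2010-01", "2010-02"]))

# datetime64 - datetime64 gives timedelta64 values (days), so each element
# is a Timedelta, which has no month count and no `.n` attribute
day_gap = order_month - cohort
print(day_gap.dtype)  # timedelta64[ns]

# Monthly periods - monthly periods gives one MonthEnd offset per row;
# the offset's `.n` attribute is the number of months between them
month_gap = order_month.dt.to_period("M") - cohort.dt.to_period("M")
period_number = month_gap.apply(attrgetter("n"))
print(period_number.tolist())  # [0, 1, 2]
```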

Solution 2:[2]

You can try applying a function that numbers the periods within each cohort, for example:

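import numpy as np

def cohort_period(df):
    # Number one cohort's rows 0, 1, 2, ... (assumes no month is missing)
    df = df.copy()
    df['period_number'] = np.arange(len(df))
    return df

df_cohort = df_cohort.sort_values(['cohort', 'order_month'])
df_cohort = df_cohort.groupby('cohort', group_keys=False).apply(cohort_period)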
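A shorter variant of the same row-counting idea swaps the custom function for pandas' built-in GroupBy.cumcount(). This is a substituted technique, not what this answer posted, and the sample frame is hypothetical (the question's three rows in shuffled order, to show why sorting matters). Like the function-based approach, it counts rows rather than months: if a cohort has a month with no purchases (and therefore no row), every later period number comes out one too small. Computing the difference from the dates themselves avoids that problem.

```python
import pandas as pd

# Hypothetical sample: the question's three rows, deliberately out of order
df_cohort = pd.DataFrame({
    "cohort": pd.to_datetime(["2009-12", "2009-12", "2009-12"]),
    "order_month": pd.to_datetime(["2010-02", "2009-12", "2010-01"]),
    "n_customers": [358, 1045, 392],
})

# Put each cohort's months in chronological order first,
# then number the rows within each cohort 0, 1, 2, ...
df_cohort = df_cohort.sort_values(["cohort", "order_month"])
df_cohort["period_number"] = df_cohort.groupby("cohort").cumcount()

print(df_cohort)
```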

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Marx Cerqueira
Solution 2 DharmanBot