Count rolling-window unique values with duplicate dates in pandas

If I have a pandas DataFrame like this:

date person_active
22/2 John
22/2 Marie
22/2 Mark
23/2 John
24/2 Mark
24/2 Marie

How do I count the unique values in person_active over a time-based rolling window (for example, a 2-day window), so that it ends up like this:

date person_active people_active
22/2 John 3
22/2 Marie 3
22/2 Mark 3
23/2 John 3
24/2 Mark 3
24/2 Marie 3

The main issue is that there are several entries per date (one per person), so a simple df.rolling('2d', on='date').count() won't do the job: it counts rows in the window, not distinct people.
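A quick way to see the problem is to rebuild the example frame and apply a plain rolling count to integer codes for person_active (the codes stand in for the names because rolling windows operate on numeric data). This is only a small illustration; the year 1900 comes from parsing dates that have no year.

```python
import pandas as pd

# Rebuild the example frame; "%d/%m" has no year, so pandas uses 1900.
df = pd.DataFrame({
    "date": ["22/2", "22/2", "22/2", "23/2", "24/2", "24/2"],
    "person_active": ["John", "Marie", "Mark", "John", "Mark", "Marie"],
})
df["date"] = pd.to_datetime(df["date"], format="%d/%m")

# Integer codes indexed by date so a time-based window can be used.
codes = pd.Series(pd.Categorical(df["person_active"]).codes, index=df["date"])

# count() counts rows in each (t - 2 days, t] window, not distinct people.
naive = codes.rolling("2D").count()
print(naive.astype(int).to_list())  # [1, 2, 3, 4, 2, 3]
```

On 23/2 the window holds John twice, so the row count is 4 while only 3 distinct people are active.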

EDIT: Please consider how the solution scales on a big dataset. Ideally it should be usable in a real-world environment, so an approach that takes too long to compute is not very useful.



Solution 1:[1]

IIUC, try:

import pandas as pd

#convert to datetime if needed (the format has no year, so pandas uses 1900)
df["date"] = pd.to_datetime(df["date"], format="%d/%m")

#convert string names to categorical codes for numerical aggregation
df["people"] = pd.Categorical(df["person_active"]).codes

#compute the rolling unique count per row (df must be sorted by date), then give
#every row on a date the largest count for that date, i.e. its full window
df["people_active"] = (df.rolling("2D", on="date")["people"]
                         .agg(lambda x: x.nunique())
                         .groupby(df["date"])
                         .transform("max")
                       )

#drop the unnecessary helper column
df = df.drop("people", axis=1)

>>> df
        date person_active  people_active
0 1900-02-22          John            3.0
1 1900-02-22         Marie            3.0
2 1900-02-22          Mark            3.0
3 1900-02-23          John            3.0
4 1900-02-24          Mark            3.0
5 1900-02-24         Marie            3.0
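A possible variant of the same idea, not part of the original answer: `Rolling.apply` with `raw=True` passes each window to the function as a NumPy array, which usually avoids building a Series per window. The sketch assumes the frame is sorted by date, and it still inspects every window, so its cost grows with the number of rows times the window size.

```python
import numpy as np
import pandas as pd

# Example frame from the question; dates without a year parse as 1900.
df = pd.DataFrame({
    "date": pd.to_datetime(["22/2", "22/2", "22/2", "23/2", "24/2", "24/2"], format="%d/%m"),
    "person_active": ["John", "Marie", "Mark", "John", "Mark", "Marie"],
})

# Integer codes indexed by date; assumes df is already sorted by date.
codes = pd.Series(pd.Categorical(df["person_active"]).codes, index=df["date"])

# raw=True hands each (t - 2 days, t] window to the lambda as a NumPy array.
per_row = pd.Series(
    codes.rolling("2D").apply(lambda a: len(np.unique(a)), raw=True).to_numpy(),
    index=df.index,
)

# Rows on the same date see partial windows, so keep the largest count per date.
df["people_active"] = per_row.groupby(df["date"]).transform("max")
print(df)
```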

Solution 2:[2]

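Group by date, count the unique people per day, then take a rolling sum of those daily counts. This is fast and returns one row per date (it needs date converted to datetime first), but because it adds up daily counts, a person active on more than one day inside the window is counted more than once; on the example data, 23/2 comes out as 4 instead of 3: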

df.groupby('date').nunique().rolling('2d').sum()
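For the large-dataset concern in the EDIT, here is a sketch of a fully vectorized alternative that is not taken from either answer. It assumes the default right-closed window (t - 2 days, t], so a person active on day t counts towards every date in [t, t + 2 days). After dropping duplicate (person, date) pairs, each of those intervals is clipped at the same person's next active date so a person's intervals never overlap; the count at a date is then the number of intervals already started minus the number already ended, found with `np.searchsorted`. The helper name `rolling_unique_count` and its parameters are hypothetical. The cost is dominated by sorting (roughly O(n log n)), and the input does not need to be sorted.

```python
import numpy as np
import pandas as pd


def rolling_unique_count(df, window="2D", date_col="date", key_col="person_active"):
    """Hypothetical helper: distinct key_col values per row over (date - window, date]."""
    w = pd.Timedelta(window)
    # Duplicate (person, date) pairs cannot change a distinct count, so drop them.
    pairs = (df[[key_col, date_col]].drop_duplicates()
             .sort_values([key_col, date_col]))
    starts = pairs[date_col]
    # A person active at t is counted for every date in [t, t + w).
    ends = starts + w
    # Clip each interval at that person's next active date so intervals never overlap
    # (NaT for the last activity compares False, leaving t + w in place).
    next_active = pairs.groupby(key_col)[date_col].shift(-1)
    ends = ends.mask(next_active < ends, next_active)
    # Active people at d = intervals started by d minus intervals already ended by d.
    start_vals = np.sort(starts.to_numpy())
    end_vals = np.sort(ends.to_numpy())
    query = df[date_col].to_numpy()
    counts = (np.searchsorted(start_vals, query, side="right")
              - np.searchsorted(end_vals, query, side="right"))
    return pd.Series(counts, index=df.index, name="people_active")


df = pd.DataFrame({
    "date": pd.to_datetime(["22/2", "22/2", "22/2", "23/2", "24/2", "24/2"], format="%d/%m"),
    "person_active": ["John", "Marie", "Mark", "John", "Mark", "Marie"],
})
df["people_active"] = rolling_unique_count(df)
print(df)
```

Every row of the example gets 3, matching the expected output in the question, and John's activity on both 22/2 and 23/2 is counted once.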

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 not_speshal
Solution 2 Always Right Never Left