'How to keep observations for individuals who showed up for the first time in week t in the data
I have the following data-frame:
ID date X
0 A 2021-12-15 7
1 A 2022-01-30 6
2 A 2022-02-15 2
3 B 2022-01-30 2
4 B 2022-02-15 2
5 B 2022-02-18 7
6 C 2021-12-01 7
7 C 2021-12-15 4
8 C 2022-01-30 2
9 C 2022-02-15 7
10 D 2021-12-16 5
11 D 2022-01-30 4
12 D 2022-03-15 9
I want to keep the observations for those IDs who first showed up in week, say, 51 of the year (I would like to change this parameter in the future).
For example, IDs A and D showed up first in week 51 in the data, B didn't, C showed up in week 51, but not for the first time.
So I want to keep in this example only the data pertaining to A and D.
Solution 1:[1]
Filter if week match variable week and it is first time by ID in DataFrame by Series.duplicated, then get ID values:
week = 50
df['date'] = pd.to_datetime(df['date'])
s = df.loc[df['date'].dt.isocalendar().week.eq(week) & ~df['ID'].duplicated(), 'ID']
Or:
df1 = df.drop_duplicates(['ID'])
s = df1.loc[df1['date'].dt.isocalendar().week.eq(week) ,'ID']
print (s)
0 A
10 D
Name: ID, dtype: object
Last filter by ID with Series.isin and boolean indexing:
df = df[df['ID'].isin(s)]
print (df)
ID date X
0 A 2021-12-15 7
1 A 2022-01-30 6
2 A 2022-02-15 2
10 D 2021-12-16 5
11 D 2022-01-30 4
12 D 2022-03-15 9
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
