'Pandas index match without merge
So I have two dataframes. activity_log records when a client logs in based on their client_id. A client_id can appear multiple times given the client logs in multiple times over a period.
I need to create a third column in this activity_log looks up the date that the client was created. This created_date is calculated by looking at the earliest user_created date in the user_table.
activity_log
| client_id | activity_date | created_date |
|---|---|---|
| 1 | 12/12/2022 | |
| 1 | 11/12/2022 | |
| 1 | 9/12/2022 | |
| 1 | 8/12/2022 | |
| 2 | 12/12/2022 | |
| 2 | 11/12/2022 | |
| 3 | 10/12/2022 | |
| 3 | 9/12/2022 |
user_table
| client_id | user_id | user_created |
|---|---|---|
| 1 | 12asdasd3 | 12/12/2021 |
| 1 | 1sads23 | 11/12/2021 |
| 1 | asasdsa2 | 10/12/2021 |
| 2 | 32asdasd1 | 12/12/2021 |
| 2 | 3asdasd21 | 11/12/2021 |
| 3 | 1asdsaa22 | 2/12/2021 |
I have tried using pandas merge
activity_log.merge(client_table[['client_id','user_created']], how='inner', on='client_id')
The problem with this is that I end up with a table that is grater in size than the original activity_log because the client_id appears multiple times in the activity_log and multiple times in the user_table.
I want to look up the client_id in the user_table, get the earliest user_created value and put that into the created_date column in the activity_log.
Any ideas on what else I need to do to achieve this?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
