Group sorting longitudinal (panel) data depending on date and asset
I've been trying to sort my panel data (a pandas DataFrame), which contains several assets, each with a different starting date. The idea is to find the asset that starts earliest and place it at the beginning; when its rows end, the asset with the second-earliest start is stacked right after it, and so on.
See the data below (current situation):
| Date (index) | Feature | Asset_id |
|---|---|---|
| 01/01/1999 | feature_asset_1 | Asset_1 |
| 02/01/1999 | feature_asset_1 | Asset_1 |
| 03/01/1999 | feature_asset_1 | Asset_1 |
| 04/01/1999 | feature_asset_1 | Asset_1 |
| ... | ... | ... |
| 01/01/2020 | feature_asset_1 | Asset_1 |
| 02/01/2020 | feature_asset_1 | Asset_1 |
| 03/01/2020 | feature_asset_1 | Asset_1 |
| 01/01/1998 | feature_asset_2 | Asset_2 |
| 02/01/1998 | feature_asset_2 | Asset_2 |
| 03/01/1998 | feature_asset_2 | Asset_2 |
The goal is to move Asset_2 to the beginning, since it starts earlier, like this:
| Date (index) | Feature | Asset_id |
|---|---|---|
| 01/01/1998 | feature_asset_2 | Asset_2 |
| 02/01/1998 | feature_asset_2 | Asset_2 |
| 03/01/1998 | feature_asset_2 | Asset_2 |
| 04/01/1999 | feature_asset_2 | Asset_2 |
| ... | ... | ... |
| 01/01/2020 | feature_asset_2 | Asset_2 |
| 02/01/2020 | feature_asset_2 | Asset_2 |
| 03/01/2020 | feature_asset_2 | Asset_2 |
| 01/01/1999 | feature_asset_1 | Asset_1 |
| 02/01/1999 | feature_asset_1 | Asset_1 |
| 03/01/1999 | feature_asset_1 | Asset_1 |
There are about 10 different assets, each beginning on a different date. How can I sort by date and asset ID so that the starting date of each asset is also taken into account? Using `sort_values` on "Date" and "Asset_id" does not work, because it sorts alphabetically.
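To make the alphabetical-sort problem concrete, here is a minimal sketch with a few hypothetical rows modelled on the tables above. It assumes the dates are strings in DD/MM/YYYY format (the question does not say whether day or month comes first). Stored as strings, the dates are compared character by character by `sort_values`, not chronologically:

```python
import pandas as pd

# Hypothetical sample rows in the question's layout; "Date" holds strings
# assumed to be DD/MM/YYYY.
df = pd.DataFrame({
    "Date": ["01/01/1999", "02/01/1999", "01/01/2020",
             "01/01/1998", "02/01/1998"],
    "Feature": ["feature_asset_1"] * 3 + ["feature_asset_2"] * 2,
    "Asset_id": ["Asset_1"] * 3 + ["Asset_2"] * 2,
})

# Sorting raw strings is lexicographic, so "02/01/1998" lands after "01/01/2020".
by_string = df.sort_values("Date")
print(by_string["Date"].tolist())
# ['01/01/1998', '01/01/1999', '01/01/2020', '02/01/1998', '02/01/1999']

# Parsing with an explicit (assumed) day-first format gives chronological order.
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")
by_date = df.sort_values("Date")
print(by_date["Date"].dt.strftime("%d/%m/%Y").tolist())
# ['01/01/1998', '02/01/1998', '01/01/1999', '02/01/1999', '01/01/2020']
```

If the dates are actually MM/DD/YYYY, the format string would be `"%m/%d/%Y"` instead.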
Solution 1:[1]
You need to convert the column "Date" to datetime type before sorting it:
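```python
import pandas as pd
# Parse the strings so they sort chronologically; for DD/MM/YYYY dates, pass dayfirst=True.
df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values(by=['Date', 'Asset_id'])
df
```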
Result:
| index | Date | Feature | Asset_id |
|---|---|---|---|
| 2 | 1998-01-01 00:00:00 | feature_asset_2 | Asset_1 |
| 0 | 1999-01-01 00:00:00 | feature_asset_1 | Asset_1 |
| 1 | 2000-01-01 00:00:00 | feature_asset_1 | Asset_1 |
| 5 | 2001-01-01 00:00:00 | feature_asset_2 | Asset_1 |
| 3 | 2001-01-01 00:00:00 | feature_asset_1 | Asset_2 |
| 4 | 2001-01-01 00:00:00 | feature_asset_1 | Asset_3 |
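Sorting by `['Date', 'Asset_id']` orders rows by date across all assets and uses `Asset_id` only to break ties, so rows of different assets end up interleaved, as in the result above. To keep each asset as one contiguous block, ordered by when that asset starts (the layout the question asks for), one option is to sort on each asset's start date first. This is a sketch under assumptions: `Date` is an already-parsed datetime column (if it is the index, call `df.reset_index()` first), the sample rows are hypothetical, and `_start` is a hypothetical helper column name:

```python
import pandas as pd

# Hypothetical sample: Asset_1 starts in 1999, Asset_2 starts in 1998.
df = pd.DataFrame({
    "Date": pd.to_datetime(["1999-01-01", "1999-01-02", "2020-01-01",
                            "1998-01-01", "1998-01-02", "1998-01-03"]),
    "Feature": ["feature_asset_1"] * 3 + ["feature_asset_2"] * 3,
    "Asset_id": ["Asset_1"] * 3 + ["Asset_2"] * 3,
})

# Broadcast each asset's earliest date back onto every row of that asset.
start = df.groupby("Asset_id")["Date"].transform("min")

# Order assets by start date (Asset_id breaks ties between assets starting
# on the same day), then order rows chronologically within each asset.
ordered = (
    df.assign(_start=start)
      .sort_values(["_start", "Asset_id", "Date"])
      .drop(columns="_start")
      .reset_index(drop=True)
)
print(ordered)
```

Here all three Asset_2 rows come first, followed by the Asset_1 rows from 1999 onward.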
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Zoe stands with Ukraine |
