'Pulling start date, end date, and mean quantity for unbalanced dataset

I have a dataset (seen in the image) that consists of cities (column "IBGE"), dates, and quantities (column "QTD"). I am trying to extract three things into a new column: start date per "IBGE", end date per "IBGE", and mean per "code".

Also, before doing so, should I change the index of my dataset?

The panel data is unbalanced, so different "IBGE" values have different start and end dates, and mean. How could I go about creating a new data frame with the following information separated in columns? I want the dataframe to look like this:

CODE Start End Mean QTD
10001 2020-01-01 2022-01-01 604
10002 2019-09-01 2021-10-01 1008
10003 2019-02-01 2020-12-01 568
10004 2020-03-01 2021-05-01 223
... ... ... ...
99999 2020-02-01 2022-04-01 9394

Here is how the dataset looks, namely "iver"

I am thinking that maybe a for while loop could potentially take that info, but I am not sure how to write the code.



Solution 1:[1]

Try with groupby and named aggregations:

#convert DATE column to datetime if needed
df["DATE"] = pd.to_datetime(df["DATE"])

output = df.groupby("IBGE").agg(Start=("DATE","min"),
                                End=("DATE","max"),
                                Mean_QTD=("QTD","mean"))

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 not_speshal