'Pulling start date, end date, and mean quantity for unbalanced dataset

I have a dataset (seen in the image) that consists of cities (column "IBGE"), dates, and quantities (column "QTD"). I am trying to extract three things into a new column: start date per "IBGE", end date per "IBGE", and mean per "code".

Also, before doing so, should I change the index of my dataset?

The panel data is unbalanced, so different "IBGE" values have different start and end dates, and mean. How could I go about creating a new data frame with the following information separated in columns? I want the dataframe to look like this:

CODE	Start	End	Mean QTD
10001	2020-01-01	2022-01-01	604
10002	2019-09-01	2021-10-01	1008
10003	2019-02-01	2020-12-01	568
10004	2020-03-01	2021-05-01	223
...	...	...	...
99999	2020-02-01	2022-04-01	9394

Here is how the dataset looks, namely "iver"

I am thinking that maybe a for while loop could potentially take that info, but I am not sure how to write the code.

python r pandas dataframe

Solution 1:^[1]

Try with groupby and named aggregations:

#convert DATE column to datetime if needed
df["DATE"] = pd.to_datetime(df["DATE"])

output = df.groupby("IBGE").agg(Start=("DATE","min"),
                                End=("DATE","max"),
                                Mean_QTD=("QTD","mean"))

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	not_speshal

'Pulling start date, end date, and mean quantity for unbalanced dataset

Solution 1:[1]

Sources

Related Questions

Solution 1:^[1]