'Pandas DataFrame - create new rows of data with condition
I have a pandas DataFrame with the following data and columns. I am trying to create new observations and rows of data.
import pandas as pd
df = pd.DataFrame({
'MSA': ['SF','SF','SF','SD','SD','SD','PHX','PHX','PHX'],
'STATE': ['CA','CA','CA','CA','CA','CA','AZ','AZ','AZ'],
'YR': [21, 21, 21, 21, 21, 21, 21, 21, 21],
'MO': [9, 10, 11, 9, 10, 11, 9, 10, 11],
'VR': [7.0, 8.0, 9.0, 5.0, 6.0, 7.0, 6.0, 7.0, 8.0],
'EMP': [10, 11, 12, 9, 10, 11, 7, 8, 9],
'GEO': ['P1','P1','P1','P2','P2','P2','P3','P3','P3']
})
I'd like to create new rows of data with a conditions for each column. Basically, for each MSA in a State, create new rows of data until the last completed month in current calendar year. So, this means create labels for YR, MO until 04 2022. For numeric columns VR and EMP, fill the rows with rolling average of last 3 observations. For GEO or any other cat variable simply repeat the labels for each new observations for the MSA-STATE group.
Note, that the obs. for 12-2021 are missing, so need to create them with 2022 values.
Solution 1:[1]
I am just learning to panda, so it is not an answer, but I was curious to solve the task... Sadly, - I have failed... but maybe my draft will lead to any ideas... this is my most recent test, - I will update it if I will manage to solve the "rolling mean" problem.
df2 = pd.concat([
pd.DataFrame(
{'STATE': row.STATE,
'MSA': row.MSA,
'YR': 22, # original DF is integer anyway
'MO': range(1,4), # original DF is integer anyway
'VR': row.VR, # cannot find the solution to get .rolling_mean() into this line
'EMP': row.EMP, # cannot find the solution to get .rolling_mean() into this line
'GEO': row.GEO
}
) for i, row in df.iterrows()
], ignore_index=True).loc[:, ['MSA', 'STATE', 'YR', 'MO', 'VR', 'EMP', 'GEO']]
df = pd.concat([df,df2], axis=0)
df
if OP will solve the task, please share the result, very curious....
Out:
MSA STATE YR MO VR EMP GEO
0 SF CA 21 10 7.0 10 P1
1 SF CA 21 11 8.0 11 P1
2 SF CA 21 12 9.0 12 P1
3 SD CA 21 10 5.0 9 P2
4 SD CA 21 11 6.0 10 P2
5 SD CA 21 12 7.0 11 P2
6 PHX AZ 21 10 6.0 7 P3
7 PHX AZ 21 11 7.0 8 P3
8 PHX AZ 21 12 8.0 9 P3
0 SF CA 22 1 7.0 10 P1
1 SF CA 22 2 7.0 10 P1
2 SF CA 22 3 7.0 10 P1
3 SF CA 22 1 8.0 11 P1
4 SF CA 22 2 8.0 11 P1
5 SF CA 22 3 8.0 11 P1
6 SF CA 22 1 9.0 12 P1
7 SF CA 22 2 9.0 12 P1
8 SF CA 22 3 9.0 12 P1
9 SD CA 22 1 5.0 9 P2
10 SD CA 22 2 5.0 9 P2
11 SD CA 22 3 5.0 9 P2
12 SD CA 22 1 6.0 10 P2
13 SD CA 22 2 6.0 10 P2
14 SD CA 22 3 6.0 10 P2
15 SD CA 22 1 7.0 11 P2
16 SD CA 22 2 7.0 11 P2
17 SD CA 22 3 7.0 11 P2
18 PHX AZ 22 1 6.0 7 P3
19 PHX AZ 22 2 6.0 7 P3
20 PHX AZ 22 3 6.0 7 P3
21 PHX AZ 22 1 7.0 8 P3
22 PHX AZ 22 2 7.0 8 P3
23 PHX AZ 22 3 7.0 8 P3
24 PHX AZ 22 1 8.0 9 P3
25 PHX AZ 22 2 8.0 9 P3
26 PHX AZ 22 3 8.0 9 P3
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | NoobVB |
