How to concatenate the values of a dataframe along the column axis and fill missing values?
I have been stuck on this problem for a long time.
I have a data frame, and I want to group the data by id and then concatenate the values for each id into a single row. However, since the ids do not all have the same number of rows, I have to fill the missing values for an id with the values from the day before (or the day after) for that id. I previously asked this question and got an answer.
Here is the previous question: Group the dataframe based on their ids and stick the values of ids to each other with the same values of the day before or the day after
Here is the code that I am using for my data.
import pandas as pd

df = pd.read_csv("df_1.csv")
val_cols = df.filter(like="value_").columns
DF_new = (
    df.pivot(index="id", columns="date", values=val_cols)
    .groupby(level=0, axis=1)
    .apply(lambda x: x.ffill(axis=1).bfill(axis=1))
    .sort_index(axis=1, level=1)
)
DF_new.columns = [f"val{i}" for i in range(DF_new.shape[1])]
DF_new = DF_new.reset_index()
DF_new
But when I test this on my real data, it does not produce the result I want. The ids should start with 1, 2, 3, 4, ..., but here they start with 1, 10, ...
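A likely cause of the 1, 10, ... ordering, assuming the ids in the real CSV were parsed as strings (the real data is not shown, so this is a guess), is lexicographic sorting; casting the column to a numeric dtype restores the expected numeric order:

```python
import pandas as pd

# Hypothetical illustration: ids read as text sort lexicographically,
# so "10" comes before "2".
s = pd.Series(["1", "10", "2", "3"])
print(sorted(s))              # ['1', '10', '2', '3']

# Casting to a numeric dtype gives the expected 1, 2, 3, ... order.
print(sorted(s.astype(int)))  # [1, 2, 3, 10]
```

If that is the case here, running `df["id"] = df["id"].astype(int)` before the pivot should fix the ordering.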
Here is a sample of my data (I will also put the link in a comment).
I have provided a simple example below: for id=1 and id=2 there are two dates, but for id=3 there are three.
df = pd.DataFrame()
df['id'] = [1, 1, 2, 2, 3, 3, 3]
df['date'] = ['2019-01-02', '2019-01-03', '2019-01-01',
              '2019-01-02', '2019-01-01', '2019-01-02', '2019-01-03']
df['val1'] = [10, 100, 20, 30, 40, 50, 60]
df['val2'] = [30, 30, -20, -30, -40, -50, -60]
df['val3'] = [50, 10, 120, 300, 140, 150, 160]
print(df)
   id        date  val1  val2  val3
0   1  2019-01-02    10    30    50
1   1  2019-01-03   100    30    10
2   2  2019-01-01    20   -20   120
3   2  2019-01-02    30   -30   300
4   3  2019-01-01    40   -40   140
5   3  2019-01-02    50   -50   150
6   3  2019-01-03    60   -60   160
The desired output is:
val0 val1 val2 val3 val4 val5 val6 val7 val8
id
1 10.0 30.0 50.0 10.0 30.0 50.0 100.0 30.0 10.0
2 20.0 -20.0 120.0 30.0 -30.0 300.0 30.0 -30.0 300.0
3 40.0 -40.0 140.0 50.0 -50.0 150.0 60.0 -60.0 160.0
The provided code works on this simple example.
Can anybody help me with that? Thanks
Solution 1:[1]
Here is one way to do it:
temp_df = pd.DataFrame(
    {
        # Build the full (id, date) grid; the "3" is hardcoded for the
        # example's three ids and three dates.
        "id": [j for i in sorted(df["id"].unique()) for j in [i] * 3],
        "date": sorted(df["date"].unique()) * 3,
    }
)
temp_df = (
    pd.merge(left=temp_df, right=df, how="left", on=["id", "date"])
    .sort_values(by=["id", "date"])
    .drop(columns="date")
)
# Fill each id's missing days from the day after, then the day before.
dfs = []
for i in sorted(df["id"].unique()):
    dfs.append(temp_df.loc[temp_df["id"] == i, :].bfill().ffill())
temp_df = pd.concat(dfs)
# Lay the rows of each id side by side.
dfs = []
for i in sorted(df["id"].unique()):
    rows = {j: row for j, row in temp_df.loc[temp_df["id"] == i, :].iterrows()}
    dfs.append(pd.concat([pd.DataFrame(row).T for row in rows.values()], axis=1))
final_df = pd.concat(dfs)
final_df.columns = [
    f"val{i}" if col != "id" else col for i, col in enumerate(final_df.columns)
]
final_df = final_df.loc[:, ~final_df.columns.duplicated()].drop_duplicates(keep="last")
final_df["id"] = final_df["id"].ffill().astype(int)
# Fill the remaining gaps within each id and keep one row per id.
dfs = []
for i in sorted(final_df["id"].unique()):
    dfs.append(final_df.loc[final_df["id"] == i, :].bfill().ffill())
final_df = pd.concat(dfs).drop_duplicates().set_index("id")
final_df.columns = [f"val{i}" for i in range(final_df.shape[1])]
print(final_df)
# Output
val0 val1 val2 val3 val4 val5 val6 val7 val8
id
1 10.0 30.0 50.0 10.0 30.0 50.0 100.0 30.0 10.0
2 20.0 -20.0 120.0 30.0 -30.0 300.0 30.0 -30.0 300.0
3 40.0 -40.0 140.0 50.0 -50.0 150.0 60.0 -60.0 160.0
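For comparison, the same result on the sample data can be sketched more compactly with one pivot per value column, which also avoids the `axis=1` groupby used in the question's code (deprecated in recent pandas). The `val_cols` names match the sample; the real data presumably uses different column names:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "id":   [1, 1, 2, 2, 3, 3, 3],
    "date": ["2019-01-02", "2019-01-03", "2019-01-01",
             "2019-01-02", "2019-01-01", "2019-01-02", "2019-01-03"],
    "val1": [10, 100, 20, 30, 40, 50, 60],
    "val2": [30, 30, -20, -30, -40, -50, -60],
    "val3": [50, 10, 120, 300, 140, 150, 160],
})

val_cols = ["val1", "val2", "val3"]  # assumption: adjust for the real data

# Pivot each value column into an (id x date) block, fill gaps from the
# neighbouring days, and label the block with a (column, date) MultiIndex.
parts = []
for c in val_cols:
    block = df.pivot(index="id", columns="date", values=c)
    block = block.ffill(axis=1).bfill(axis=1)
    block.columns = pd.MultiIndex.from_product([[c], block.columns])
    parts.append(block)

# Interleave the blocks date-first so each row reads day 1, day 2, day 3.
filled = pd.concat(parts, axis=1).sort_index(axis=1, level=1)
filled.columns = [f"val{i}" for i in range(filled.shape[1])]
print(filled)
```

This reproduces the output table above: one row per id, with each missing day filled from the nearest available day for that id.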
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Laurent |

