Concatenate data frames over finite index otherwise start a new column - pandas
I need to add new data to the last column of a DataFrame if that column has any empty cells, or create a new column otherwise. Is there a pythonic way to achieve this with pandas functionality (e.g. concat, join, merge)? The example is as follows:
import numpy as np
import pandas as pd
df1 = pd.DataFrame({'0':[8, 9, 3, 5, 0], '1':[9, 6, 6, np.nan, np.nan]})
df2 = pd.DataFrame({'2':[2, 9, 4]}, index = [3,4,0])
desired_output = pd.DataFrame({'0':[8, 9, 3, 5, 0],
'1':[9, 6, 6, 2, 9],
'2':[4, np.nan, np.nan, np.nan, np.nan]})
# df1
   0    1
0  8    9
1  9    6
2  3    6
3  5  NaN
4  0  NaN
# df2
   2
3  2
4  9
0  4
# desired_output
   0  1    2
0  8  9    4
1  9  6  NaN
2  3  6  NaN
3  5  2  NaN
4  0  9  NaN
Solution 1:[1]
Your problem can be broken down into 2 steps:
- Concatenate df1 and df2 based on their indexes.
- For each row of the concatenated dataframe, move the NaN values to the end.
Try this:
# Step 1: concatenate the two dataframes
result = pd.concat([df1, df2], axis=1)
# Step 2a: for each row, sort the elements based on their nan status
# For example: sort [1, 2, nan, 3] based on [False, False, True, False]
# np.argsort will return [0, 1, 3, 2]
# Stable sort is critical here since we don't want to swap elements whose
# sort keys are equal.
arr = result.to_numpy()
idx = np.argsort(np.isnan(arr), kind="stable")
# Step 2b: reconstruct the result dataframe based on the sort order,
# keeping the index of the concatenated dataframe
result = pd.DataFrame(np.take_along_axis(arr, idx, axis=1), index=result.index, columns=result.columns)
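As a self-contained check of the two steps, the sketch below wraps the NaN shift in a helper and compares the result with the desired_output from the question. The helper name push_nans_right is hypothetical (it is not part of pandas or of the original answer), and the sketch assumes every column is numeric so the frame converts cleanly to a float array.

```python
import numpy as np
import pandas as pd


def push_nans_right(frame: pd.DataFrame) -> pd.DataFrame:
    """Shift the non-NaN values of each row to the left, keeping their order.

    Hypothetical helper for illustration; assumes all columns are numeric.
    """
    arr = frame.to_numpy(dtype=float)
    # Sort key per element: False (a value) sorts before True (NaN).
    # kind="stable" keeps elements with equal keys in their original order.
    order = np.argsort(np.isnan(arr), axis=1, kind="stable")
    shifted = np.take_along_axis(arr, order, axis=1)
    return pd.DataFrame(shifted, index=frame.index, columns=frame.columns)


df1 = pd.DataFrame({'0': [8, 9, 3, 5, 0], '1': [9, 6, 6, np.nan, np.nan]})
df2 = pd.DataFrame({'2': [2, 9, 4]}, index=[3, 4, 0])

# Step 1: align on the index; Step 2: move the NaN values to the end of each row
result = push_nans_right(pd.concat([df1, df2], axis=1))

desired_output = pd.DataFrame({'0': [8, 9, 3, 5, 0],
                               '1': [9, 6, 6, 2, 9],
                               '2': [4, np.nan, np.nan, np.nan, np.nan]})

# check_dtype=False: the round trip through NumPy turns every column into float64
pd.testing.assert_frame_equal(result, desired_output, check_dtype=False)
```

Note that a column made up entirely of NaN after the shift is kept rather than dropped; if that is unwanted, result.dropna(axis=1, how="all") removes it.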
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Code Different |
