Remove empty cells and move content to the left in Python
I have an Excel file with the following structure. It has no header or index; the column letters below are only the Excel column labels and are not part of my table:
| A | B | C | D | E | ... | J | K |
|---|---|---|---|---|---|---|---|
| ABC | XLS | 1231341231 | 123123asdad1923 | NaN | NaN | ... | 123123asdad1923 |
| DEF | XLS | 1231231231 | 1231823asda9123 | NaN | askda213 | ... | 123123asdad1923 |
| XYZ | XLS | 1231231233 | 2138820394832sd | NaN | NaN | ... | asdasdq2ew12332 |
I need every cell that contains NaN to be removed, with the remaining cells in that row shifted to the left.
The output should look like:
| A | B | C | D | E | ... | J | K |
|---|---|---|---|---|---|---|---|
| ABC | XLS | 1231341231 | 123123asdad1923 | 123123asdad1923 | | | |
| DEF | XLS | 1231231231 | 1231823asda9123 | askda213 | ... | 123123asdad1923 | |
| XYZ | XLS | 1231231233 | 2138820394832sd | asdasdq2ew12332 | | | |
I found a similar question on Stack Overflow, but the answer there does not change anything in my file:
import pandas as pd
df = pd.read_excel('test.xlsx')
df = df.apply(lambda x: pd.Series(x.dropna().values))
df = df.fillna('')
df.to_excel("test_modified.xlsx", index=False, header=False)
Any ideas how I can achieve this?
Thank you.
Solution 1:[1]
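I propose to transpose the dataset, process it, and transpose it back. DataFrame.apply works on columns by default, so after transposing, each original row is processed as a column: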
import pandas as pd
import numpy as np
df = pd.DataFrame(
{
"A": ["ABC", "DEF", "XYZ"],
"B": ["XLS", "XLS", "XLS"],
"C": ["1231341231", "1231231231", "1231231233"],
"D": ["123123asdad1923", "1231823asda9123", "2138820394832sd"],
"E": [np.nan, np.nan, np.nan],
"F": [np.nan, "askda213", np.nan],
"I": ["blabla", "blabla", np.nan],
"K": ["123123asdad1923", "123123asdad1923", "asdasdq2ew12332"],
}
)
print(df)
# transpose
df_transposed = df.T
# drop NaN in each column (an original row) and pack the remaining values at the top
df_transposed = df_transposed.apply(lambda x: pd.Series(x.dropna().values))
# replace the remaining NaN with empty strings
df_transposed.replace(np.nan, "", inplace=True)
# transpose back
df = df_transposed.T
print()
print(df)
will output:
A B C D E F I K
0 ABC XLS 1231341231 123123asdad1923 NaN NaN blabla 123123asdad1923
1 DEF XLS 1231231231 1231823asda9123 NaN askda213 blabla 123123asdad1923
2 XYZ XLS 1231231233 2138820394832sd NaN NaN NaN asdasdq2ew12332

0 1 2 3 4 5 6
0 ABC XLS 1231341231 123123asdad1923 blabla 123123asdad1923
1 DEF XLS 1231231231 1231823asda9123 askda213 blabla 123123asdad1923
2 XYZ XLS 1231231233 2138820394832sd asdasdq2ew12332
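As a possible alternative to the double transpose, the same packing can be done row by row by passing axis=1 to apply. This is only a minimal sketch: the helper name shift_left is made up for illustration, and the sample frame is a shortened copy of the one used above.

```python
import numpy as np
import pandas as pd


def shift_left(frame: pd.DataFrame) -> pd.DataFrame:
    """Drop NaN cells in each row and pack the remaining values to the left.

    Hypothetical helper, not part of pandas.
    """
    # axis=1 passes each row to the lambda. Returning a Series lets pandas
    # expand the results into columns 0..n-1, padding shorter rows with NaN.
    packed = frame.apply(lambda row: pd.Series(row.dropna().values), axis=1)
    # Replace the padding NaN with empty strings.
    return packed.fillna("")


# Sample data mirroring the frame from the answer above.
df = pd.DataFrame(
    {
        "A": ["ABC", "DEF", "XYZ"],
        "B": ["XLS", "XLS", "XLS"],
        "C": ["1231341231", "1231231231", "1231231233"],
        "D": ["123123asdad1923", "1231823asda9123", "2138820394832sd"],
        "E": [np.nan, np.nan, np.nan],
        "F": [np.nan, "askda213", np.nan],
        "I": ["blabla", "blabla", np.nan],
        "K": ["123123asdad1923", "123123asdad1923", "asdasdq2ew12332"],
    }
)

result = shift_left(df)
print(result)
```

For the Excel file from the question, the frame would presumably be loaded with pd.read_excel('test.xlsx', header=None), because by default the first row is taken as the header, and then written back with to_excel("test_modified.xlsx", index=False, header=False). The snippet in the question calls apply without axis=1, so it drops NaN per column rather than per row, which is likely why the cells did not move to the left.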
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
