'How to delete empty spaces from pandas DataFrame rows until first populated field?
Lets say I imported a really messy data from a PFD and I´m cleaning it. I have something like this:
| Name | Type | Date | other1 | other2 | other3 | 
|---|---|---|---|---|---|
| Name1 | '' | '' | Type1 | '' | Date1 | 
| Name2 | '' | '' | '' | Type2 | Date2 | 
| Name3 | '' | '' | Type3 | Date3 | '' | 
| Name4 | '' | Type4 | '' | '' | Date4 | 
| Name5 | Type5 | '' | Date5 | '' | '' | 
And so on. As you can see, Type is always before date on each row, but I basically need to delete all '' (currently empty strings on the DataFrame) while moving everything to the left so they align with their respective Type and Date columns. Additionally, there's more columns to the right with the same problem, but for structural reasons I cant remove ALL '', the solution I´m looking for would just move 'everything to the left' so to speak (as it happens with pd.shift).
I appreciate your help.
Solution 1:[1]
What worked for me was:
while '' in df['Type'].unique():
    for i,row in df.iterrows():
        if row['Type'] == '':
            df.iloc[i, 1:] = df.iloc[i, 1:].shift(-1, fill_value='')
And the same for next column
Solution 2:[2]
data = df.values.flatten()
pd.DataFrame(data[data != ""].reshape(-1, 3), columns = ['Name','Type', 'Date'])
or:
pd.DataFrame(df.values[df.values != ""].reshape(-1, 3), columns = ['Name','Type', 'Date'])
output:
    Name    Type    Date
0   Name1   Type1   Date1
1   Name2   Type2   Date2
2   Name3   Type3   Date3
3   Name4   Type4   Date4
4   Name5   Type5   Date5
without reshape:
pd.DataFrame(df.apply(lambda x: (a:=np.array(x))[a != ""] , axis = 1).values.tolist())
or:
s = df[0].copy()
for col in df.columns[1:]:
    s += " " + df[col]
pd.DataFrame(s.str.split().values.tolist(), columns = ['Name','Type', 'Date'])
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source | 
|---|---|
| Solution 1 | |
| Solution 2 | 
