'N consecutive rows to form a dataframe columns in python pandas

I have the following dataframe with consecutive rows that needs to be transformed in columns

Input dataframe looks like the given below DF..

data5 = {
'R_T_D_L_X':["Route for v 0:",
 "0 0 C Load(0)",
"31193m",
"Load of the route: 3",
"",
"Route for v 1:",
 "0 0 C 0 Load(0)",
" 0m",
"Load of the route: 0",
"",
"Route for vehicle 2:",
"0 0 0 Load(10)",
"13406m",
"Load  10",""]
}

df5 = pd.DataFrame.from_dict(data5)

input dataframe output:

    df5
Out[52]: 
               R_T_D_L_X
0         Route for v 0:
1          0 0 C Load(0)
2                 31193m
3   Load of the route: 3
4                       
5         Route for v 1:
6        0 0 C 0 Load(0)
7                     0m
8   Load of the route: 0
9                       
10  Route for vehicle 2:
11        0 0 0 Load(10)
12                13406m
13              Load  10
14                  

Output and expected dataframe as below,

data6 = {
'R':["Route for v 0:","Route for v 1:","Route for v 2:"],
'T':["0 0 C Load(0)","0 0 C 0 Load(0)","0 0 0 Load(10)"],
"D":["31193m"," 0m","13406m",],
'L':["Load : 3","Load: 0","Load  10"],
'X':["","",""]
}

df6 = pd.DataFrame.from_dict(data6)

Results as expected:

    df6
Out[54]: 
                R                T       D         L X
0  Route for v 0:    0 0 C Load(0)  31193m  Load : 3  
1  Route for v 1:  0 0 C 0 Load(0)      0m   Load: 0  
2  Route for v 2:   0 0 0 Load(10)  13406m  Load  10  

Thanking you..



Solution 1:[1]

Use reshape with DataFrame constructor for new DataFrame:

df = pd.DataFrame(df5['R_T_D_L_X'].to_numpy().reshape(-1, 5), columns=['R','T','D','L','X'])
print (df)
                      R                T       D                     L X
0        Route for v 0:    0 0 C Load(0)  31193m  Load of the route: 3  
1        Route for v 1:  0 0 C 0 Load(0)      0m  Load of the route: 0  
2  Route for vehicle 2:   0 0 0 Load(10)  13406m              Load  10  

EDIT: If some values missing in the end of Series use:

data5 = {
'R_T_D_L_X':["Route for v 0:",
 "0 0 C Load(0)",
"31193m",
"Load of the route: 3",
"",
"Route for v 1:",
 "0 0 C 0 Load(0)",
" 0m",
"Load of the route: 0",
"",
"Route for vehicle 2:",
"0 0 0 Load(10)",
"13406m"]
}

df5 = pd.DataFrame.from_dict(data5)

arr = np.arange(len(df5))

df = df5.assign(a1=arr // 5, a2 = arr % 5).pivot('a1','a2','R_T_D_L_X')
print (df)
a2                     0                1       2                     3    4
a1                                                                          
0         Route for v 0:    0 0 C Load(0)  31193m  Load of the route: 3     
1         Route for v 1:  0 0 C 0 Load(0)      0m  Load of the route: 0     
2   Route for vehicle 2:   0 0 0 Load(10)  13406m                   NaN  NaN

Solution 2:[2]

Try this with some fancy dataframe reshaping.

import pandas as pd

data5 = {
'R_T_D_L_X':["Route for v 0:",
 "0 0 C Load(0)",
"31193m",
"Load of the route: 3",
"",
"Route for v 1:",
 "0 0 C 0 Load(0)",
" 0m",
"Load of the route: 0",
"",
"Route for vehicle 2:",
"0 0 0 Load(10)",
"13406m",
"Load  10",""]
}

df5 = pd.DataFrame.from_dict(data5)

grp = df5['R_T_D_L_X'].str.startswith('Route').cumsum()

df5.set_index([grp, df5.groupby(grp).cumcount()])\
   .unstack()\
   .droplevel(0, axis=1)\
   .set_axis(df5.columns[0].split('_'), axis=1)\
   .rename_axis(None)

Output:

                      R                T       D                     L X
1        Route for v 0:    0 0 C Load(0)  31193m  Load of the route: 3  
2        Route for v 1:  0 0 C 0 Load(0)      0m  Load of the route: 0  
3  Route for vehicle 2:   0 0 0 Load(10)  13406m              Load  10  

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Scott Boston