Is there an optimized way to turn a large dataframe column of lists into multiple rows (Pandas)?

I'm working on a script in Jupyter Notebook that transforms a dataframe column of lists into rows: for every 9 elements in a list, I want one row with each element in its own column. So far I have it working for a few rows (the first hundred, or, in my example, the three rows with index 7160, 7161 and 7162), but as soon as I run it on the whole dataframe (11535 rows x 6 columns, with some long lists) I hit major performance issues and the script runs indefinitely. Is there a way to optimize my code so that it works on every row of my (large) dataframe and not just a few? I also tried .iterrows, but the result was the same.

To summarize, I would like exactly the same behavior as my current code, but on the entire dataframe, without it running indefinitely or crashing.

Here are two screenshots to help you understand:

Right now my "df" looks like this and my result "newdf" looks like this. My code:

newdf = pd.DataFrame(columns=('index','MCC','MNC','A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'Latitude', 'Longitude', 'Altitude'), dtype=object)

#for z in df.index: <= i want this but runs indefinitely...
for z in [7160,7161,7162]: #<= working smoothly for a few rows
    MCC = df['MCC'][z]
    MNC = df['MNC'][z]
    A = df['Latitude [°]'][z]
    B = df['Longitude [°]'][z]
    C = df['Altitude (m)'][z]
    # chunks() is a helper not shown here; it presumably yields successive 9-element slices
    E = list(chunks(df['List'][z],9))

    i = 0
    while i < len(E):
        values_to_add = {'index':z, 'MCC': MCC, 'MNC': MNC, 'A': E[i][0], 'B': E[i][1], 'C': E[i][2], 'D': E[i][3], 'E': E[i][4], 'F': E[i][5], 'G': E[i][6], 'H': E[i][7], 'I': E[i][8], 'Latitude': A, 'Longitude': B, 'Altitude': C}
        row_to_add = pd.Series(values_to_add, name=i)
        newdf = pd.concat([newdf,pd.DataFrame([row_to_add])])
        i = i + 1
newdf

Thank you so much in advance for your help.



Solution 1:[1]

You can try:

pd.DataFrame(df['List'].tolist()).join(df.drop(columns='List'))

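The basic idea is to use the column of lists as the input to a new dataframe (one column per list element), then join the remaining columns of the original dataframe back on (dropping the List column). Because join aligns on the index, this assumes df has the default 0..n-1 index.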
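Note that the one-liner above gives one row per original row. If you need one row per group of 9 elements, as in the question, the same "build everything at once" idea could be sketched as below. This is only a sketch under assumptions: the helper name explode_chunks, the LETTERS constant and the toy values are made up for illustration, the list elements are assumed to be scalars, and any trailing elements that do not fill a complete group of 9 are dropped.

```python
import numpy as np
import pandas as pd

CHUNK = 9                    # elements per output row, as in the question
LETTERS = list("ABCDEFGHI")  # the nine target column names from the question

# Toy stand-in for the real data (values are made up for illustration).
df = pd.DataFrame(
    {
        "MCC": [208, 208],
        "MNC": [1, 10],
        "Latitude [°]": [48.85, 45.76],
        "Longitude [°]": [2.35, 4.84],
        "Altitude (m)": [35, 170],
        "List": [list(range(18)), list(range(100, 109))],
    },
    index=[7160, 7161],
)


def explode_chunks(df, list_col="List", size=CHUNK):
    """Hypothetical helper: one output row per `size` consecutive list elements."""
    lists = df[list_col].tolist()
    # Number of complete chunks in each source row (leftovers are ignored).
    n_chunks = np.array([len(lst) // size for lst in lists], dtype=int)
    # Flatten every complete chunk into one long list, then reshape to (rows, size).
    flat = [x for lst, n in zip(lists, n_chunks) for x in lst[: n * size]]
    values = np.array(flat, dtype=object).reshape(-1, size)
    chunks = pd.DataFrame(values, columns=LETTERS)
    # Repeat each source row's other columns once per chunk; the original
    # index is kept as an ordinary column named "index".
    meta = df.drop(columns=list_col).reset_index()
    meta = meta.loc[meta.index.repeat(n_chunks)].reset_index(drop=True)
    # Both frames now share a fresh 0..n-1 index, so a single concat lines them up.
    return pd.concat([meta, chunks], axis=1)


newdf = explode_chunks(df)
print(newdf)
```

The point of this layout is that pd.concat is called once for the whole result instead of once per output row, so the growing dataframe is not copied on every iteration.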

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Cainã Max Couto-Silva