Iterating over results of .itertuples() is too slow

df -> ["user_id", "num_posts", "posts" ...]

My df is made of rows containing data for Reddit user accounts; for each row, "posts" contains a series of separate posts by that user.

The number of posts ranges up to 6000 for certain users.

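import pandas as pd

data = pd.DataFrame(columns=["user_id", "posts"])
for row in df.itertuples():
    # row[0] is the index, row[1] is user_id, row[3] is the series of posts
    for post in row[3]:
        new_row = [row[1], post]
        # appends one (user_id, post) row at a time to the growing frame
        data.loc[len(data)] = new_row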

It seems that the inner for-loop, which iterates over the results of itertuples(), makes this terribly slow!

Even if I cap the number of posts grabbed for a single user at 100, the code doesn't return for hours, even when running on a high-powered remote server!

Any thoughts on how to improve the runtime?
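
One possible direction, offered as a sketch under assumptions rather than a definitive answer: the row-by-row `data.loc[len(data)] = ...` assignment is likely at least as costly as the iteration itself, because the frame is grown once per post. Building the result in a single step avoids that. The sample frame, the `MAX_POSTS` name, and the assumption that each "posts" cell holds a list-like of strings are all hypothetical; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; the real df is assumed to hold a list-like
# of posts in each "posts" cell.
df = pd.DataFrame({
    "user_id": ["u1", "u2"],
    "num_posts": [2, 3],
    "posts": [["a", "b"], ["c", "d", "e"]],
})

MAX_POSTS = 100  # hypothetical cap on posts kept per user

# Option 1: truncate each list, then let explode() emit one row per post.
capped = df["posts"].map(lambda posts: list(posts)[:MAX_POSTS])
data = (
    df[["user_id"]]
    .assign(posts=capped)
    .explode("posts")
    .reset_index(drop=True)
)

# Option 2: keep the loop, but collect plain tuples and build the
# DataFrame once at the end instead of growing it row by row.
rows = [
    (row.user_id, post)
    for row in df.itertuples(index=False)
    for post in list(row.posts)[:MAX_POSTS]
]
data_from_loop = pd.DataFrame(rows, columns=["user_id", "posts"])
```

Both options should give one (user_id, post) row per post. One difference to be aware of: explode() keeps a user whose posts list is empty as a single row with a missing value, while the loop version drops that user entirely.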



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
