'Pandas: Multiple rows of a data frame into a single cell
I would like to merge multiple rows of a dataframe into a single Cell, I already tried multiple ways to merge the required input into output format but couldn't able to succeed. below is my input format and desired out put format any help is much Appreciated!
Input Data
Desired Output Data
import pandas as pd
data = pd.DataFrame({"Text1": ["1_1" , "1_1" , "1_1" , "1_2" , "1_2" , "1-3" , "1-3" , "1-3" , "", "1-4" , "1-4" ],
"Text2": ["HI" , "HI" , "HI" , "Hello" , "Hello" , "Hiiii" , "" , "" , "How are You?" , "Hellooo" , ""],
"Text3": ["welcome", "Howdy", "greetings", "good day.", "good morning.", "welcome", "Howdy", "greetings", "" , "good day.", "good morning."],
"Text4": ["S11" , "S11" , "S11" , "S12" , "S12" , "S12" , "S13" , "S13" , "" , "S14" , "S14" ]})
df = pd.DataFrame(data, columns = ['Text1', 'Text2','Text4', 'Text4'])
print(df)
Solution 1:[1]
You need to forward fill the Text1 column first to replace the blanks, then group by that column and aggregate the other column.
import pandas as pd
import numpy as np
df = pd.DataFrame({"Text1": ["HI", "HI", "HI", "Hello", "Hello", "Hiiii", "", "", "How are You?", "Hellooo", ""],
"Text2": ["welcome", "Howdy", "greetings", "good day.", "good morning.", "welcome", "Howdy", "greetings", "", "good day.", "good morning."]})
result = df.groupby(df["Text1"].replace("", np.nan).ffill()).agg({"Text2": "/".join})
Solution 2:[2]
This should work
df['Text2'] = df[['Text1','Text2']].groupby(['Text1'])['Text2'].transform(lambda x: ','.join(x))
df[['Text1','Text2']].drop_duplicates()
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 |


