'Pandas Data frame transformation after certain iterations [duplicate]
I have the following data frame
Col1 Col2 Col3 Col4
A01 CY5 1 2.53204725571834
A01 CY5 2 7.40293083315009
A01 CY5 3 0.089914886654128
A01 FAM 1 -132.859675789172
A01 FAM 2 -43.9435529112998
A01 FAM 3 -19.423769786752
H12 TEX 1 -125.964887745247
H12 TEX 2 -89.0624569094034
H12 TEX 3 -66.0548663619256
I have more than thousand such values. Also, I can get rid of third counting column in the output dataframe. All the values are tab separated. I want to transform this data into the following format:
Col1 Col2 Col3
A01 CY5 2.53204725571834 7.40293083315009 0.089914886654128
A01 FAM -132.859675789172 -43.9435529112998 -19.423769786752
H12 TEX -125.964887745247 -89.0624569094034 66.0548663619256
Solution 1:[1]
Use:
#filter values by `Col3` if necessary
df = df[df['Col3'].lt(4)].copy()
#join values by space
df2 = (df.groupby(['Col1','Col2'])['Col4']
.agg(lambda x: ' '.join(x.astype(str)))
.reset_index())
EDIT:
#filter values by `Col3` if necessary
df = df[df['Col3'].lt(4)].copy()
#pivoting
df3 = df.pivot(index=['Col1','Col2'],columns='Col3', values='Col4')
For oldier pandas versions:
df3 = df.set_index(['Col1','Col2', 'Col3'])['col4'].unstack().reset_index()
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
