'How to replace columns values based on substring order in a pandas dataframe [closed]
I have the following pandas DataFrame:
| A | B | C | D |
|---|---|---|---|
| N,M | N | 1 | N-D, M-D |
| N,M | M | 1 | N-D, M-D |
| N,M | N | 2 | N-D, M-D |
| N,M | M | 2 | N-D, M-D |
| X,Y | Y | 1 | X-D, Y-D |
| X,Y | X | 1 | X-D, Y-D |
And I need to simplify it to:
| B | C | D |
|---|---|---|
| N | 1 | N-D |
| M | 1 | M-D |
| N | 2 | N-D |
| M | 2 | M-D |
| Y | 1 | Y-D |
| X | 1 | X-D |
Basically in column D we need to keep the values that corresponds to the B column-index. It seems more like a logical problem. I've read that for iterations are not recomended because are really slow in pandas but that is the only approach I could think of. I accept any suggestions. Column A indicates the index order in D column values (if it is an N or M value in the first 4 rows, or an X/Y value in the last two). Thanks.
Solution 1:[1]
Please check this:
import pandas as pd
df1 = pd.DataFrame([['N,M', 'N', 1,'N-D, M-D'],['N,M', 'M', 1,'N-D, M-D'],
['N,M', 'N', 2,'N-D, M-D'],['N,M', 'M', 2,'N-D, M-D'],
['X,Y', 'Y', 1,'X-D, Y-D'],['X,Y', 'X', 1,'X-D, Y-D']],columns=['A','B','C','D'])
def f(x):
'''
Function to find the matching substring in column D w.r.t column B
'''
b,d = x[0],x[1]
d = d.split(',',1)
d = [d[0] if b in d[0] else d[1] if b in d[1] else None]
return d[0]
df1['D'] = df1[['B','D']].apply(f, axis=1)
df1 = df1.drop(columns='A',axis=1)
print(df1)
Solution 2:[2]
Use df.explode with Series.str.split:
In [1375]: x = df.assign(D=df.D.str.split(',')).explode('D')
In [1370]: output = x[x.B.eq(x.D.str.split('-').str[0])]
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | prahasanam_boi |
| Solution 2 |
