'Concatenating two fields which have multiple values separated by delimiter using pandas
I have source csv file with structure as:
Col1, Col2, Col3, Col4
123, John, ID1|ID2|ID3, Lisbon|Kite|Dam
456, Tree, ID4|ID9|ID10|ID11, Chart||Cam|Dare
543, Jam, ID12, Jar
I need to concatenate the col3 and col4 values by maintaining the sequence and order of pipe(delimiter). The CombinedCol should also be a pipe separated column like below:
Col1, Col2, CombinedCol
123, John, ID1:Lisbon|ID2:Kite|ID3:Dam
456, Tree, ID4:Chart|ID9|ID10:Cam|ID11:Dare
543, Jam, ID12:Jar
I am trying this using pandas but I am stuck and not able to iterate the field values at the column
df['CombinedCol']=df['Col3'].str.split('|')+':'+df['Col4'].str.split('|')
Solution 1:[1]
csv file :
Col1,Col2,Col3,Col4
123,John,ID1|ID2|ID3,Lisbon|Kite|Dam
456,Tree,ID4|ID9|ID10|ID11,Chart||Cam|Dare
543,Jam,ID12,Jar
Code:
>>> import pandas as pd
>>> df = pd.read_csv('/Users/rpghosh/python-examples/webscraping/weathermap/test1.csv')
>>> import numpy as np
>>> def f1():
r = np.empty(len(df), dtype=object)
for idx, i in enumerate(list(zip(df.Col3, df.Col4))):
lst_ids = i[0].split('|')
lst_vals = i[1].split('|')
s = []
for item in list(zip(lst_ids, lst_vals)):
s.append( item[0] + ':' + item[1] )
r[idx] = '|'. join(s)
#print(r)
return r
>>> f1()
array(['ID1:Lisbon|ID2:Kite|ID3:Dam', 'ID4:Chart|ID9:|ID10:Cam|ID11:Dare',
'ID12:Jar'], dtype=object)
>>> df['Combined'] = f1()
>>> print(df)
Col1 Col2 Col3 Col4 \
0 123 John ID1|ID2|ID3 Lisbon|Kite|Dam
1 456 Tree ID4|ID9|ID10|ID11 Chart||Cam|Dare
2 543 Jam ID12 Jar
Combined
0 ID1:Lisbon|ID2:Kite|ID3:Dam
1 ID4:Chart|ID9:|ID10:Cam|ID11:Dare
2 ID12:Jar
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Rajarshi Ghosh |
