'How to merge rows in pandas with similar data
I got data with columns: startpoint, endpoint, data.
I want to merge the startpoint, endpoint rows if they contain the same data (to both directions) and add another columns of the extra data.
for example starting with:
| startpoint | endpoint | data |
|---|---|---|
| A | B | 1 |
| C | D | 2 |
| B | A | 3 |
| D | C | 4 |
TO:
| startpoint | endpoint | data_1 | data_2 |
|---|---|---|---|
| A | B | 1 | 3 |
| C | D | 2 | 4 |
Is there quick way to do it on pandas?
Thanks.
Solution 1:[1]
If I've understood your question correctly, the following code should do what you want -
data.index = [hash(frozenset([x,y])) for x, y in zip(data["startpoint"], data["endpoint"])]
result = data.groupby(data.index)["data"].apply(list).to_frame()
result = result["data"].apply(pd.Series)
result.columns = ["data1", "data2"]
result = pd.merge(data[["startpoint", "endpoint"]], result, left_index=True, right_index=True)
result = result[~result.index.duplicated(keep='first')]
The variable data is the original DataFrame. Please let me know if anything is unclear.
Solution 2:[2]
Your best bet is to use pd.merge(). Pandas official website shows how to use pd.merge() functions.
Solution 3:[3]
Get matching values values between startpoint and endpoint and do the necessary conversions for your data types. You can then sort the values in your dataframe and use a groupby:
# Matching values between startpoint + endpoint
df['start_end_grouped'] = [sorted(''.join(val).replace(',','')) for val in zip(df['startpoint'], df['endpoint'])]
# Conversions
df['data'] = df['data'].str.replace(',','').astype(float)
df['start_end_grouped'] = df['start_end_grouped'].astype(str)
# Result
df[['data','start_end_grouped']].sort_values(by=['start_end_grouped','data'])\
.groupby('start_end_grouped',as_index=False).agg(list)
start_end_grouped data
0 ['A', 'B'] [1.0, 3.0]
1 ['C', 'D'] [2.0, 4.0]
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Green Noob |
| Solution 2 | kazi.tanvir |
| Solution 3 | sophocles |
