'Iterating over a dataframe twice: which is the ideal way?
I am trying to create a dataframe for Sankey chart in Power BI which needs source and destination like this.
id | Source | Destination |
---|---|---|
1 | Starting a | next point b |
1 | next point b | final point c |
1 | final point c | end |
2 | Starting a | next point b |
2 | next point b | |
3 | Starting a | next point b |
3 | next point b | final point c |
3 | final point c | end |
I have a dataframe like this:
ID | flow |
---|---|
1 | Starting a |
1 | next point b |
1 | final point c |
2 | Starting a |
2 | next point b |
3 | Starting a |
3 | next point b |
3 | final point c |
I tried doing by iterating over the dataframe twice like below:
for index, row in df.iterrows():
for j, r in df.iterrows():
if row['ID'] == r['ID']:
if (index + 1 == j) & ("final point c" not in row['flow']):
df['Destination'][index] = df['flow'][j]
elif "final point c" in row['flow']:
df['Destination'][index] = 'End of flow'
Since it is iterating over the same dataframe twice, when the records are huge, it is taking a lot of time to process.
Is there any better way to do this? I tried looking at the all similar questions, but couldn't find anything that relates to my question.
Solution 1:[1]
You could use groupby
+shift
and a bit of masking:
end = df['flow'].str.startswith('final point')
df2 = (df.assign(destination=df.groupby('ID')['flow'].shift(-1)
.mask(end, end.map({True: 'end'}))
)
.rename(columns={'flow': 'source'})
)
output:
ID source destination
0 1 Starting a next point b
1 1 next point b final point c
2 1 final point c end
3 2 Starting a next point b
4 2 next point b NaN
5 3 Starting a next point b
6 3 next point b final point c
7 3 final point c end
Alternative with combine_first
to fill the NaNs:
end = df['flow'].str.startswith('final point').map({True: 'end', False: ''})
df2 = (df.assign(destination=df.groupby('ID')['flow'].shift(-1).combine_first(end))
.rename(columns={'flow': 'source'})
)
output:
ID source destination
0 1 Starting a next point b
1 1 next point b final point c
2 1 final point c end
3 2 Starting a next point b
4 2 next point b
5 3 Starting a next point b
6 3 next point b final point c
7 3 final point c end
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | mozway |