Create a trip report with end latitude and longitude

Please help. I have a data set structured as below:

ss={'ride_id': {0: 'ride1',1: 'ride1',2: 'ride1',3: 'ride2',4: 'ride2',
  5: 'ride2',6: 'ride2',7: 'ride3',8: 'ride3',9: 'ride3',10: 'ride3'},
 'lat': {0: 5.616526,1: 5.623686, 2: 5.616555,3: 5.616556,4: 5.613834, 5: 5.612899,
  6: 5.610804,7: 5.616614,8: 5.644431,9: 5.650771, 10: 5.610828},

 'long': {0: -0.231901,1: -0.227248,2: -0.23192,3: -0.23168,4: -0.223812,
  5: -0.22869,6: -0.226193,7: -0.231461,8: -0.237549,9: -0.271337,10: -0.226157},

 'distance': {0: 0.0,1: 90.021,2: 138.0751,3: 0.0,4: 90.0041,5: 180.0293,6: 180.562, 7:0.0,8: 90.004,9: 180.0209,10: 189.0702},}

import pandas as pd

df=pd.DataFrame(ss)

The ride_id column groups the rows that make up one ride. Each pair of consecutive rows within a ride_id is one trip.

For example, ride1 consists of 2 trips: the first trip starts at index 0 and ends at index 1, then the second trip starts at index 1 and ends at index 2.

I want to create a new data frame of trip reports, where each row has the trip's start coordinates (lat, long), its end coordinates (end_lat, end_long) taken from the next row of the same ride, and the trip distance. The result should look like the data frame below:

sf={'ride_id': {0: 'ride1',1: 'ride1',2: 'ride2',3: 'ride2',4: 'ride2',},
 'lat': {0: 5.616526,1: 5.623686,2: 5.616556,3: 5.613834, 4: 5.612899},
 'long': {0: -0.231901,1: -0.227248,2: -0.23168,3: -0.223812,4: -0.22869},
 'end_lat':{0: 5.623686,1: 5.616555,2: 5.613834,3: 5.612899,4: 5.610804},
 'end_long':{0: -0.227248,1: -0.23192,2: -0.223812,3: -0.22869,4: -0.226193},
 'distance': {0: 90.02100,1: 138.07510,2: 90.00410,3: 180.02930,4: 180.562},}


df_s=pd.DataFrame(sf)
df_s

OUT:
    ride_id   lat        long        end_lat     end_long   distance
0   ride1   5.616526    -0.231901   5.623686    -0.227248   90.0210
1   ride1   5.623686    -0.227248   5.616555    -0.231920   138.0751
2   ride2   5.616556    -0.231680   5.613834    -0.223812   90.0041
3   ride2   5.613834    -0.223812   5.612899    -0.228690   180.0293
4   ride2   5.612899    -0.228690   5.610804    -0.226193   180.5620

I tried to group the data frame by ride_id to isolate each ride, but I'm stuck. Any ideas are warmly welcomed.



Solution 1:[1]

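We can use groupby with shift, then dropna. Shifting lat and long down by one row within each ride_id gives every row the coordinates of the previous point as its start. The first row of each ride has no previous point, so it gets NaN and is removed by dropna. Note that in the result below, lat and long hold the trip's end coordinates, and start_lat and start_long hold its start coordinates.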

df['start_lat'] = df.groupby('ride_id')['lat'].shift()
df['start_long'] = df.groupby('ride_id')['long'].shift()
df = df.dropna()
df
Out[480]: 
   ride_id       lat      long  distance  start_lat  start_long
1    ride1  5.623686 -0.227248   90.0210   5.616526   -0.231901
2    ride1  5.616555 -0.231920  138.0751   5.623686   -0.227248
4    ride2  5.613834 -0.223812   90.0041   5.616556   -0.231680
5    ride2  5.612899 -0.228690  180.0293   5.613834   -0.223812
6    ride2  5.610804 -0.226193  180.5620   5.612899   -0.228690
8    ride3  5.644431 -0.237549   90.0040   5.616614   -0.231461
9    ride3  5.650771 -0.271337  180.0209   5.644431   -0.237549
10   ride3  5.610828 -0.226157  189.0702   5.650771   -0.271337
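
If you need the exact column names from the question (lat/long as the start, end_lat/end_long as the end), one possible variant is to shift in the other direction with shift(-1). This is only a sketch. It assumes, based on the expected output in the question, that the distance stored on a trip's end row is that trip's distance, so distance is shifted up as well. The name trips is just an illustrative variable.

```python
import pandas as pd

# Same data as in the question, written as plain lists.
df = pd.DataFrame({
    "ride_id": ["ride1"] * 3 + ["ride2"] * 4 + ["ride3"] * 4,
    "lat": [5.616526, 5.623686, 5.616555, 5.616556, 5.613834, 5.612899,
            5.610804, 5.616614, 5.644431, 5.650771, 5.610828],
    "long": [-0.231901, -0.227248, -0.23192, -0.23168, -0.223812, -0.22869,
             -0.226193, -0.231461, -0.237549, -0.271337, -0.226157],
    "distance": [0.0, 90.021, 138.0751, 0.0, 90.0041, 180.0293,
                 180.562, 0.0, 90.004, 180.0209, 189.0702],
})

grouped = df.groupby("ride_id")

# Each row keeps its own coordinates as the trip start.
trips = df[["ride_id", "lat", "long"]].copy()

# End coordinates and distance come from the next row of the same ride
# (assumption: the distance on the end row belongs to that trip).
trips["end_lat"] = grouped["lat"].shift(-1)
trips["end_long"] = grouped["long"].shift(-1)
trips["distance"] = grouped["distance"].shift(-1)

# The last point of each ride starts no trip, so its shifted values are NaN.
trips = trips.dropna(subset=["end_lat"]).reset_index(drop=True)
print(trips)
```

This gives one row per trip (2 for ride1, 3 for ride2, 3 for ride3) and leaves the original df unchanged.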

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 BENY