Can I use Pandas groupby while dropping the first and last entries in each group?

I have a pandas dataframe that represents the trips I have taken for work. Each row is a single trip, with a column for the date and the number of kilometers traveled.

I get reimbursed on a per kilometer basis for every trip besides the first and the last of each day (these are considered ordinary travel to and from work).

So my data frame looks something like this:

day, distance
1, 5
1, 2
1, 7
2, 11
2, 11
3, 4
3, 10
3, 5
3, 12

I would like to add a column that flags all but the first and last trips of each day, like this:

day, distance, claimable
1, 5, 0
1, 2, 1
1, 7, 0
2, 11, 0
2, 11, 0
3, 4, 0
3, 10, 1
3, 5, 1
3, 12, 0

Assuming I have a dataframe with the columns above, is there a way to do something like this?

import pandas as pd

df = pd.DataFrame({'day':(1,1,1,2,2,3,3,3,3),
                   'dist':(5,2,7,11,11,4,10,5,12),
                  },)
df['claim'] = 0

# set the value of the "claimable" column to 1 on all 
# but the first and last trip of the day

df.groupby("day").nth(slice(1,-1)).loc[:, "claim"] = 1

Solution 1:^[1]

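You can use transform with a function that builds the flag array for each group:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'day':(1,1,1,2,2,3,3,3,3),
    'dist':(5,2,7,11,11,4,10,5,12),
})

def is_claimable(group):
    # start with every trip in the group flagged as claimable
    claim = np.ones(len(group), dtype='int8')
    # clear the flag on the first and last trip of the day
    claim[[0, -1]] = 0
    return claim

df['claim'] = df.groupby("day")['dist'].transform(is_claimable)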

Output:

>>> df

   day  dist  claim
0    1     5      0
1    1     2      1
2    1     7      0
3    2    11      0
4    2    11      0
5    3     4      0
6    3    10      1
7    3     5      1
8    3    12      0
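
As a minimal sketch of how this function behaves on short groups, the example below uses a made-up `trips` frame (not data from the question) containing a day with one trip and a day with two trips. Positions 0 and -1 cover every entry of such groups, so none of their rows should end up flagged.

```python
import numpy as np
import pandas as pd

def is_claimable(group):
    # flag every trip, then clear the first and last position
    claim = np.ones(len(group), dtype="int8")
    claim[[0, -1]] = 0
    return claim

# hypothetical data: day 1 has one trip, day 2 has two, day 3 has three
trips = pd.DataFrame({
    "day": [1, 2, 2, 3, 3, 3],
    "dist": [8, 11, 11, 4, 10, 5],
})
trips["claim"] = trips.groupby("day")["dist"].transform(is_claimable)

claim_list = trips["claim"].tolist()
print(claim_list)
```

Only the middle trip of day 3 is flagged here.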

Solution 2:^[2]

You can use transform to find the first and last index position of each group, then flag every row that matches neither:

g  = df.reset_index().groupby('day')['index']
con = (df.index == g.transform('first')) | (df.index == g.transform('last'))
df['new'] = (~con).astype(int)

Output:

>>> df

   day  dist  new
0    1     5    0
1    1     2    1
2    1     7    0
3    2    11    0
4    2    11    0
5    3     4    0
6    3    10    1
7    3     5    1
8    3    12    0

Solution 3:^[3]

If your dataframe is sorted by the day column, you can compare each row's day with the day of the previous and next rows:

df['claim'] = (df['day'].eq(df['day'].shift()) & 
               df['day'].eq(df['day'].shift(-1))).astype(int)

Diff version (credit to Rodalm), which needs a numeric day column:

df['claim'] = (df['day'].diff().eq(0) & df['day'].diff(-1).eq(0)).astype(int)
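
The shift-based comparison depends on rows of the same day being adjacent. As a rough sketch of one way to handle unsorted input, the example below sorts with a stable sort so that trips keep their original order within each day, computes the flag, and writes it back by index alignment. The `unsorted` and `ordered` names and the shuffled data are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# hypothetical input: the question's trips, but with the days interleaved
unsorted = pd.DataFrame({
    "day": [3, 1, 3, 1, 3, 1, 2, 2, 3],
    "dist": [4, 5, 10, 2, 5, 7, 11, 11, 12],
})

# a stable sort keeps the original order of trips within each day
ordered = unsorted.sort_values("day", kind="stable")

same_as_prev = ordered["day"].eq(ordered["day"].shift())
same_as_next = ordered["day"].eq(ordered["day"].shift(-1))
ordered["claim"] = (same_as_prev & same_as_next).astype(int)

# assignment aligns on the index, so the flags return to their original rows
unsorted["claim"] = ordered["claim"]
claim_list = unsorted["claim"].tolist()
print(claim_list)
```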

Or use GroupBy.cumcount, then compare each position with 0 and with the next row's position:

c = df.groupby('day').cumcount()
df['claim'] = (c.ne(0) & c.lt(c.shift(-1))).astype(int)
#df['claim'] = (c.gt(c.shift()) & c.lt(c.shift(-1))).astype(int)
print(df)
   day  dist  claim
0    1     5      0
1    1     2      1
2    1     7      0
3    2    11      0
4    2    11      0
5    3     4      0
6    3    10      1
7    3     5      1
8    3    12      0
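
A variant of the cumcount idea, not taken from the original answer, is to count positions from both ends of each group with `cumcount()` and `cumcount(ascending=False)`. A trip is then claimable when it is neither position 0 from the start nor position 0 from the end. This sketch assumes only that rows are in chronological order within each day; the days themselves do not need to be adjacent.

```python
import pandas as pd

df = pd.DataFrame({
    "day": (1, 1, 1, 2, 2, 3, 3, 3, 3),
    "dist": (5, 2, 7, 11, 11, 4, 10, 5, 12),
})

grouped = df.groupby("day")
from_start = grouped.cumcount()               # 0 for the first trip of a day
from_end = grouped.cumcount(ascending=False)  # 0 for the last trip of a day

df["claim"] = (from_start.gt(0) & from_end.gt(0)).astype(int)
claim_list = df["claim"].tolist()
print(claim_list)
```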

Solution 4:^[4]

One option is to pivot and use first_valid_index and last_valid_index. This fails if the combination of index and columns contains duplicates:

# assumes numpy as np, pandas as pd, and a df with the columns day and distance
positions = df.pivot(columns='day', values='distance')
first = positions.apply(pd.Series.first_valid_index)
last = positions.apply(pd.Series.last_valid_index)
positions = np.concatenate([first.to_numpy(), last.to_numpy()])
df.assign(claimable = np.where(df.index.isin(positions), 0, 1))

   day  distance  claimable
0    1         5          0
1    1         2          1
2    1         7          0
3    2        11          0
4    2        11          0
5    3         4          0
6    3        10          1
7    3         5          1
8    3        12          0

Using transform, as in the other answers, avoids this problem with duplicates.
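
To make the duplicate issue concrete, here is a small sketch on a made-up frame `dup` whose index repeats within a day. The data and the `pivot_failed` flag are assumptions for illustration. `pivot` raises a `ValueError` because the same index and day pair occurs twice, while the transform function from Solution 1 still produces the flags.

```python
import numpy as np
import pandas as pd

def is_claimable(group):
    # same helper as in Solution 1
    claim = np.ones(len(group), dtype="int8")
    claim[[0, -1]] = 0
    return claim

# hypothetical frame with a repeated index label inside each day
dup = pd.DataFrame(
    {"day": [1, 1, 1, 2, 2], "dist": [5, 2, 7, 11, 11]},
    index=[0, 0, 1, 2, 2],
)

try:
    dup.pivot(columns="day", values="dist")
    pivot_failed = False
except ValueError:
    # pivot cannot reshape when an (index, day) pair occurs more than once
    pivot_failed = True

flags = dup.groupby("day")["dist"].transform(is_claimable).tolist()
print(pivot_failed, flags)
```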

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Rodalm
Solution 2	BENY
Solution 3
Solution 4	sammywemmy
