Is there an efficient way for pandas to get tail rows with a condition?

I want to get the tail rows that satisfy a condition.

For example, I want to get all of the trailing negative rows from a column 'A', like:

    test = pd.DataFrame({'A':[-8, -9, -10, 1, 2, 3, 0, -1,-2,-3]})

I expect a 'method' to get new frame like:

        A
    0  -1
    1  -2
    2  -3

Note that the number of negative values in the tail is not known in advance, so I cannot simply run test.tail(3).

It looks like the pandas-provided tail() function only accepts a fixed number of rows.

But my input data frame might be very large, so I don't want to run a plain loop checking rows one by one.

Is there a smart way to do that?



Solution 1:[1]

What's the tail for? It seems like you just need the negative numbers:

    test.query("A < 0")

Update: find where the sign changes, split the array at those points, and take the last piece:

    split_points = (test.A.shift(1) < 0) == (test.A < 0)
    np.split(test, split_points.loc[lambda x: x == False].index.tolist())[-1]

Output:

        A
    7  -1
    8  -2
    9  -3
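The same idea can be expressed without np.split: find the position of the last row that fails the condition and slice everything after it. This is a sketch of my own (the variable names are not from the answer), not the answerer's code:

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({'A': [-8, -9, -10, 1, 2, 3, 0, -1, -2, -3]})

# Positions (0-based) of rows that fail the "negative" condition.
nonneg_pos = np.flatnonzero((test['A'] >= 0).to_numpy())

# The trailing negative block starts right after the last non-negative row;
# if every row is negative, the whole frame is the tail.
tail = test.iloc[nonneg_pos[-1] + 1:] if len(nonneg_pos) else test
print(tail)  # rows 7, 8, 9 -> -1, -2, -3
```

This avoids materializing all the split pieces and only does one boolean scan plus one slice.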

Solution 2:[2]

Just sharing a performance comparison of the two answers above. Thanks Patryk and Marco. (image: performance comparison)

Solution 3:[3]

Is this what you wanted?

    test = pd.DataFrame({'A': [-8, -9, -10, 1, 2, 3, 0, -1, -2, -3]})
    test = test.iloc[::-1]
    test.loc[test.index.max():test[test['A'].ge(0)].index[0]+1]

Output:

        A
    9  -3
    8  -2
    7  -1

Edit: if you want to get it back into the original order:

    test.loc[test.index.max():test[test['A'].ge(0)].index[0]+1].iloc[::-1]

        A
    7  -1
    8  -2
    9  -3

Optionally add .reset_index(drop=True) if you need an index starting at 0.
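Putting the steps from this answer together (reverse, slice up to the first non-negative row, reverse back, reset the index), a minimal end-to-end sketch might look like:

```python
import pandas as pd

test = pd.DataFrame({'A': [-8, -9, -10, 1, 2, 3, 0, -1, -2, -3]})

rev = test.iloc[::-1]  # walk the frame from the bottom up
# Slice from the last label down to just past the first non-negative row
# encountered from the bottom (label-based .loc slicing is inclusive).
tail = rev.loc[rev.index.max():rev[rev['A'].ge(0)].index[0] + 1]
tail = tail.iloc[::-1].reset_index(drop=True)  # original order, fresh 0-based index
print(tail)
```

Note that this assumes at least one row satisfies A >= 0, as in the example; otherwise the `.index[0]` lookup raises an IndexError.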

Solution 4:[4]

I improved my test above and ran another round, since the old sample size felt too small and I was afraid the %%time measurement might not be accurate.

My new test uses a very large head of 10,000,000 positive numbers followed by a tail of 3 negative numbers,

so it can show how the overall data frame size impacts performance.

The code is as below (two separate notebook cells, each starting with %%time):

    %%time
    arr = np.arange(1, 10000000, 1)
    arr = np.concatenate((arr, [-2, -3, -4]))
    test = pd.DataFrame({'A': arr})
    test = test.iloc[::-1]
    test.loc[test.index.max():test[test['A'].ge(0)].index[0]+1].iloc[::-1]

    %%time
    arr = np.arange(1, 10000000, 1)
    arr = np.concatenate((arr, [-2, -3, -4]))
    test = pd.DataFrame({'A': arr})
    split_points = (test.A.shift(1) < 0) == (test.A < 0)
    np.split(test, split_points.loc[lambda x: x == False].index.tolist())[-1]

To account for system noise, I ran the test 10 times; the two methods perform very similarly. In about 50% of the runs Patryk's code is even faster.

(image: benchmark results over 10 runs)
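Since the benchmark image is not reproduced here, the two approaches can be re-timed with timeit. This is a sketch, using a smaller 1,000,000-row head than the post's 10,000,000 so it runs quickly; the function names are mine:

```python
import timeit
import numpy as np
import pandas as pd

arr = np.concatenate((np.arange(1, 1_000_000), [-2, -3, -4]))
test = pd.DataFrame({'A': arr})

def reverse_slice():
    # Solution 3: reverse, slice down to the first non-negative row, reverse back.
    rev = test.iloc[::-1]
    return rev.loc[rev.index.max():rev[rev['A'].ge(0)].index[0] + 1].iloc[::-1]

def split_last():
    # Solution 1: split wherever the sign of A changes and keep the last piece.
    split_points = (test.A.shift(1) < 0) == (test.A < 0)
    return np.split(test, split_points.loc[lambda x: x == False].index.tolist())[-1]

print('reverse_slice:', timeit.timeit(reverse_slice, number=5))
print('split_last:   ', timeit.timeit(split_last, number=5))
```

Both functions return the same three trailing negative rows, so the comparison is apples to apples.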

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution    Source
Solution 1
Solution 2  yunfei
Solution 3
Solution 4  Marco_CH