'pandas groupby and get all null rows till the first non null value in multiple columns

I am trying to get all null rows above a value using group by.

So, for example given following dataframe.

+----+------------+-----------+--------+----------+--------+----------+
| ID | Start Date | End Date  | Date_D | D-Values | Date_R | R-Values |
+----+------------+-----------+--------+----------+--------+----------+
| A  | 2/26/2015  | 5/26/2015 | JAN_15 |        - | 15-Jan |        - |
| A  | 2/26/2015  | 5/26/2015 | FEB_15 |        - | 15-Feb |        - |
| A  | 2/26/2015  | 5/26/2015 | MAR_15 |        - | 15-Mar |        - |
| A  | 2/26/2015  | 5/26/2015 | APR_15 |        - | 15-Apr |        - |
| A  | 2/26/2015  | 5/26/2015 | MAY_15 |      -28 | 15-May |    15000 |
| A  | 2/26/2015  | 5/26/2015 | JUN_15 |        - | 15-Jun |        - |
| A  | 2/26/2015  | 5/26/2015 | JUL_15 |        - | 15-Jul |        - |
| A  | 2/26/2015  | 5/26/2015 | AUG_15 |        - | 15-Aug |        - |
+----+------------+-----------+--------+----------+--------+----------+

My desired output looks like this.

+----+------------+-----------+--------+----------+--------+----------+
| ID | Start Date | End Date  | Date_D | D-Values | Date_R | R-Values |
+----+------------+-----------+--------+----------+--------+----------+
| A  | 2/26/2015  | 5/26/2015 | FEB_15 |        - | 15-Feb |        - |
| A  | 2/26/2015  | 5/26/2015 | MAR_15 |        - | 15-Mar |        - |
| A  | 2/26/2015  | 5/26/2015 | APR_15 |        - | 15-Apr |        - |
| A  | 2/26/2015  | 5/26/2015 | MAY_15 |      -28 | 15-May |    15000 |
+----+------------+-----------+--------+----------+--------+----------+

EDIT

There are multiple IDs so need to implement on multiple customers. Want rows based on start date and end date e.g. start selecting rows from Feb_15 and upto the last non-null value within the date range.



Solution 1:[1]

You can do transform with idxmax

idx = df[['D-Values','R-Values']].notna().all(1).groupby(df["ID"]).transform('idxmax')
out = df[df.index <= idx]

Solution 2:[2]

IIUC, you want to remove the last rows with "-", and assuming 'D-values" the reference column.

You can compute a cummax on the reversed boolean array:

mask = df['D-Values'].ne('-').iloc[::-1].cummax()
# or, for NaNs:
# mask = df['D-Values'].notna().iloc[::-1].cummax()

df2 = df[mask]

output:

  ID Start Date   End Date  Date_D D-Values  Date_R R-Values
0  A  1/26/2015  5/26/2015  JAN_15        -  15-Jan        -
1  A  1/26/2015  5/26/2015  FEB_15        -  15-Feb        -
2  A  1/26/2015  5/26/2015  MAR_15        -  15-Mar        -
3  A  1/26/2015  5/26/2015  APR_15        -  15-Apr        -
4  A  1/26/2015  5/26/2015  MAY_15      -28  15-May    15000

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 BENY
Solution 2 mozway