Replace all except the first row in a MultiIndex

I am using pandas and have loaded some data into a dataframe. I would like to replace the values in the Scenario Frequency column with 0 for every row except the first one in each group.

My data looks like this:

ExplosionID  FireWater  FireID    Scenario Frequency
111          0          213       4.209055e-15
                        214       4.209055e-15
                        215       4.209055e-15
                        217       4.209055e-15
                        219       4.209055e-15
                        220       4.209055e-15
112          0          232       8.388742e-16
                        233       8.388742e-16
                        234       8.388742e-16
                        235       8.388742e-16
                        237       8.388742e-16
                        239       8.388742e-16
                        240       8.388742e-16

I would like to replace every value in the Scenario Frequency column with 0, except the first value in each group, so that I end up with this:

ExplosionID  FireWater  FireID    Scenario Frequency
111          0          213       4.209055e-15
                        214       0
                        215       0
                        217       0
                        219       0
                        220       0
112          0          232       8.388742e-16
                        233       0
                        234       0
                        235       0
                        237       0
                        239       0
                        240       0

The first three columns (ExplosionID, FireWater, FireID) are the levels of a MultiIndex.
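To reproduce the problem, the table above can be rebuilt as a MultiIndex DataFrame. This is a minimal sketch: the values are copied from the printed table, and it assumes the only value column is named "Scenario Frequency" as in the header (the real HL_df_subset may contain more columns).

```python
import pandas as pd

# Rows copied from the printed table; "Scenario Frequency" is assumed to be
# the only value column.
rows = [(111, 0, fire_id, 4.209055e-15) for fire_id in (213, 214, 215, 217, 219, 220)]
rows += [(112, 0, fire_id, 8.388742e-16) for fire_id in (232, 233, 234, 235, 237, 239, 240)]

# set_index turns the first three columns into a three-level MultiIndex
HL_df_subset = pd.DataFrame(
    rows, columns=["ExplosionID", "FireWater", "FireID", "Scenario Frequency"]
).set_index(["ExplosionID", "FireWater", "FireID"])

print(HL_df_subset)
```

Printing the frame shows the repeated ExplosionID and FireWater labels blanked out, which is why the table in the question has empty cells in those columns.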

I've defined a function:

#function to replace all but first value in group with 0
def replace_all_except_first(group):
    group.iloc[1:] = 0
    return group

and have tried the following:

data_to_sum = HL_df_subset.groupby(level=0).apply(replace_all_except_first)

where HL_df_subset is my dataframe. However, this replaces all values with 0.

I'm new to Python and I know I'm completely misunderstanding how groupby works, but I've been trying all sorts of things and can't get it to work.

Thanks for your help.



Solution 1:[1]

  • cumcount: numbers the rows within each group, starting at 0. Comparing the result with .ne(0) gives a boolean Series that is True for every row except the first one in its group.
  • mask: replaces values with NaN wherever the condition is True. In this case, every row whose cumcount isn't zero becomes NaN.
  • fillna: replaces those NaN values with 0.

HL_df_subset.mask(HL_df_subset.groupby(level=0).cumcount().ne(0)).fillna(0)
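The one-liner above can be unpacked into the three steps from the list, which makes it easier to inspect each intermediate result. This sketch uses a small hypothetical frame shaped like the question's data (two ExplosionID groups with a Scenario Frequency column); the variable names order, not_first, masked and result are illustrative only.

```python
import pandas as pd

# Hypothetical two-group frame in the shape of the question's data
df = pd.DataFrame(
    {"Scenario Frequency": [4.209055e-15] * 3 + [8.388742e-16] * 2},
    index=pd.MultiIndex.from_tuples(
        [(111, 0, 213), (111, 0, 214), (111, 0, 215), (112, 0, 232), (112, 0, 233)],
        names=["ExplosionID", "FireWater", "FireID"],
    ),
)

# 1. cumcount numbers the rows within each ExplosionID group: 0, 1, 2, 0, 1
order = df.groupby(level=0).cumcount()

# 2. ne(0) is True for every row except the first of its group
not_first = order.ne(0)

# 3. mask turns the True rows into NaN (the Series condition is broadcast
#    across the frame's columns)
masked = df.mask(not_first)

# 4. fillna replaces those NaN values with 0
result = masked.fillna(0)
print(result)
```

Two side effects are worth knowing: mask applies to every column of the frame, not only Scenario Frequency, and fillna(0) fills every NaN it finds, so a NaN that was already present in a group's first row would also become 0.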

Consider df:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    dict(A=np.arange(100, 116)),
    pd.MultiIndex.from_product(
        [list('ab'), list('xy'), [1, 2, 3, 4]]))
df

         A
a x 1  100
    2  101
    3  102
    4  103
  y 1  104
    2  105
    3  106
    4  107
b x 1  108
    2  109
    3  110
    4  111
  y 1  112
    2  113
    3  114
    4  115

df.mask(df.groupby(level=[0, 1]).cumcount().ne(0)).fillna(0)

           A
a x 1  100.0
    2    0.0
    3    0.0
    4    0.0
  y 1  104.0
    2    0.0
    3    0.0
    4    0.0
b x 1  108.0
    2    0.0
    3    0.0
    4    0.0
  y 1  112.0
    2    0.0
    3    0.0
    4    0.0
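If only the Scenario Frequency column should change, or the data may already contain NaN, one variant of the same idea is Series.where with 0 as the replacement value instead of mask followed by fillna. This is a sketch on a small hypothetical frame; the Other column is invented purely to show that it is left untouched.

```python
import pandas as pd

# Hypothetical frame shaped like the question's data, plus an extra column
# ("Other") that should not be modified.
idx = pd.MultiIndex.from_tuples(
    [(111, 0, 213), (111, 0, 214), (111, 0, 215), (112, 0, 232), (112, 0, 233)],
    names=["ExplosionID", "FireWater", "FireID"],
)
df = pd.DataFrame(
    {
        "Scenario Frequency": [4.209055e-15] * 3 + [8.388742e-16] * 2,
        "Other": [1, 2, 3, 4, 5],
    },
    index=idx,
)

# True only for the first row of each ExplosionID group
is_first = df.groupby(level="ExplosionID").cumcount().eq(0)

# where keeps values where is_first is True and writes 0 elsewhere, so only
# this column changes and no intermediate NaN is created
df["Scenario Frequency"] = df["Scenario Frequency"].where(is_first, 0)
print(df)
```

Because the replacement happens in a single step, pre-existing NaN values in first rows are kept as they are, and other columns keep their original values.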

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1