'Pandas pivot_table replaces nan with 0 aggfunc='sum'

I am using a multi-value pivot_table of this form:

pivot = df.pivot_table(index=[indices], columns=['column'], values=['start_value','end_value','delta','name','unit'], aggfunc='sum')

The dataframe df contains columns ['start_value','end_value','delta','name','unit'] all of dtype object. This is because 'name' & 'unit' are actually string columns, 'start_value', 'end_value' and 'delta' float columns. Object dtype is an attempt to make the pivot_table work, even though dtypes are different (content-wise).

When one of the values is non-nan, any nan value is converted to a 0, instead of a nan.

df:

indices, column, 'start_value','end_value','delta','name','unit'
A,       '1nan',  nan,          1000,      nan,    'test', 'USD'
A,       'other', nan,          nan,       nan,    'test2', 'USD'

Results in pivot:

indices, ('1nan', 'start_value'), ('1nan', 'end_value'), ('1nan', 'delta'),('1nan', 'name'), ('1nan', 'unit'), ('other', 'start_value'), ('other', 'end_value'), ('other', 'delta'), ('other', 'name'), ('other', 'unit')
A, 0 [should be nan], 1000, 0 [should be nan], 'test','USD', nan, nan, nan, 'test2', 'USD'

Any suggestion on how to get nans instead of 0s?



Solution 1:[1]

To get the result you want with a pivot table: You should populate first Nan values by 0:

df.fillna(0)

Then make your pivot_table.

Solution 2:[2]

Pass mysum() function to the aggfunc parameter in pivot_table()

mysum = lambda x: sum(x)

The x being passed to lambda is a pandas series, so another way to do this is as follows:

mysum = lambda x: x.sum(skipna=False)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 mehdidrissi
Solution 2