How to retain NaN values using pandas factorize()?

I have a pandas DataFrame with several columns, some of which contain categorical entries. I convert (encode) these entries to numerical values using factorize() as follows:

for column in df.select_dtypes(['category']):
    df[column] = df[column].factorize(na_sentinel=None)[0]

The columns have several NaN entries, so I set na_sentinel=None to retain them. However, the NaN values are not retained (they are converted to numerical codes), which is not what I want. My pandas version is 1.3.5. Is there something I am missing?



Solution 1:[1]

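By default, factorize() encodes NaN values as -1 (na_sentinel=-1). The missing entries are retained in that sense: every -1 in the codes marks a NaN. Passing na_sentinel=None does the opposite: NaN is then treated as a regular value and receives its own non-negative code, which is why your NaN entries turn into ordinary numbers. You probably want to keep the default, which is: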

na_sentinel=-1

See https://pandas.pydata.org/docs/reference/api/pandas.factorize.html
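
If actual NaN values are needed in the encoded columns rather than the -1 sentinel, one possible approach is to factorize with the default settings and then map -1 back to NaN. The following is a minimal sketch, not a definitive solution: the DataFrame and its column names ("color", "size", "price") are made up for illustration, and it calls factorize() without na_sentinel so that it relies only on the default behavior described above.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; column names are illustrative only.
df = pd.DataFrame({
    "color": pd.Categorical(["red", None, "blue", "red"]),
    "size": pd.Categorical(["S", "M", None, "M"]),
    "price": [1.0, 2.0, 3.0, 4.0],
})

for column in df.select_dtypes(["category"]):
    # With the default settings, missing values receive the code -1.
    codes, uniques = pd.factorize(df[column])
    # NaN is a float, so the codes are cast to float64 before
    # the -1 sentinel is replaced with NaN.
    codes = codes.astype("float64")
    codes[codes == -1] = np.nan
    df[column] = codes

print(df)
```

Note that the encoded columns become float64, because NaN cannot be stored in a plain integer column.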

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 cinderashes