'How to retain NaN values using pandas factorize()?
I have a Pandas data frame with several columns, with some columns comprising categorical entries. I convert (or, encode) these entries to numerical values using factorize() as follows:
for column in df.select_dtypes(['category']):
df[column] = df[column].factorize(na_sentinel=None)[0]
The columns have several NaN entries, so I let na_sentinel=None to retain the NaN entries. However, the NaN values are not retained (they get converted to numerical entries), which is not what I desire. My Pandas version is 1.3.5. Is there something I am missing?
Solution 1:[1]
Factorize converts NaN values by default to -1. The NaN values are retained in this way since the NaN values can be identified by the -1. You would probably want to keep the default which is:
na_sentinel =-1
see https://pandas.pydata.org/docs/reference/api/pandas.factorize.html
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | cinderashes |
