How do I replace missing values with NaN?

I am using the IMDB dataset for machine learning, and it contains a lot of missing values, which are entered as '\N'. Specifically, I want to convert the values in the startYear column, which holds each movie's release year, to integers, and I'm not able to do that right now. I could drop these values, but I wanted to see why they're missing first. I tried several things, but with no success.

This is my latest attempt:

Solution 1:^[1]

Here is a way to do it without using replace, by selecting the '\N' rows with a boolean mask and assigning NaN to them through .loc:

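import pandas as pd
import numpy as np

# Sample data mimicking the column: the '\N' marker, integer years, and year strings with a trailing space
df_basics = pd.DataFrame({'startYear':['\\N']*78760+[2017]*18267 + [2018]*18263+[2016]*17837+[2019]*17769+['1996 ','1993 ','2000 ','2019 ','2029 ']})
print(df_basics.startYear.value_counts())
# The boolean mask selects the '\N' rows; np.nan is used because the np.NaN alias was removed in NumPy 2.0
df_basics.loc[df_basics.startYear == '\\N', 'startYear'] = np.nan
# dropna=False keeps the NaN rows in the counts
print(df_basics.startYear.value_counts(dropna=False))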

Output:

NaN      78760
2017     18267
2018     18263
2016     17837
2019     17769
1996         1
1993         1
2000         1
2019         1
2029         1
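
Replacing '\N' with NaN does not by itself give an integer column, because NaN is a float and some of the years are still strings with a trailing space. As a rough sketch of one possible next step (not part of the original answer, and run on a small made-up sample rather than the real dataset), the values could be stripped, parsed with pd.to_numeric, and stored in pandas' nullable integer dtype:

```python
import pandas as pd

# Hypothetical sample with the same kinds of values as the startYear column:
# the '\N' marker, integer years, and year strings with a trailing space.
df_basics = pd.DataFrame({"startYear": ["\\N", 2017, 2018, "1996 ", "2019 "]})

# Normalise everything to stripped strings so the values parse uniformly.
cleaned = df_basics["startYear"].astype(str).str.strip()

# errors="coerce" turns anything that is not a number (here '\N') into NaN.
years = pd.to_numeric(cleaned, errors="coerce")

# "Int64" (capital I) is the nullable integer dtype: missing values are kept
# as <NA> instead of forcing the whole column to float.
df_basics["startYear"] = years.astype("Int64")

print(df_basics["startYear"].dtype)
print(df_basics["startYear"].isna().sum())
```

Note that errors="coerce" also hides any other unparseable value, so it may be worth inspecting the rows where the result is missing before deciding whether to drop them.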

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	constantstranger
