Missing values in Categorical Variables in CatBoost (Python)
CatBoost can encode categorical variables, which is great. However, when categorical features contain missing values in the form of np.nan, they can't be processed. This is stated in the CatBoost documentation here: cb missing values
However, I read in a GitHub thread that CatBoost can in fact handle categorical variables with missing values: github thread
I tried a minimal example to test it:
import numpy as np
from catboost import CatBoostClassifier
# Initialize data; columns 0 and 1 are categorical
cat_features = [0, 1]
train_data = [["a", np.nan, 1, 4, 5, 6],
              ["a", "b", 4, 5, 6, 7],
              ["c", "d", 30, 40, 50, 60]]
train_labels = [1, 1, -1]
eval_data = [["a", "b", 2, 4, 6, 8],
             ["a", "d", 1, 4, 50, 60]]
# Initialize CatBoostClassifier
model = CatBoostClassifier(iterations=2,
                           learning_rate=1,
                           depth=2)
# Fit model (raises CatBoostError because of the NaN in a categorical column)
model.fit(train_data, train_labels, cat_features)
Here we get an error, because column 1 contains a NaN:
CatBoostError: Invalid type for cat_feature[non-default value idx=0,feature_idx=1]=nan : cat_features must be integer or string, real number values and NaN values should be converted to string.
How can I make this code work without manually filling the missing value?
Solution 1:[1]
It all works fine if you use CatBoost's recommended Pool class to wrap the data. Note that every column in this example is numeric and no cat_features are passed:
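import numpy as np
from catboost import CatBoostClassifier, Pool
# Wrap features, labels, and per-row weights in a Pool
train_data = Pool(data=[[1, np.nan, 5, 6],
                        [4, 5, 6, 7],
                        [30, 40, 50, 60]],
                  label=[1, 1, -1],
                  weight=[0.1, 0.2, 0.3])
model = CatBoostClassifier(iterations=10)
model.fit(train_data)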
Learning rate set to 0.058839
0: learn: 0.6879920 total: 2.32ms remaining: 20.8ms
1: learn: 0.6815428 total: 2.63ms remaining: 10.5ms
2: learn: 0.6765119 total: 2.86ms remaining: 6.67ms
3: learn: 0.6715373 total: 3.86ms remaining: 5.8ms
4: learn: 0.6653022 total: 4.24ms remaining: 4.24ms
5: learn: 0.6591482 total: 5.83ms remaining: 3.88ms
6: learn: 0.6543562 total: 6.11ms remaining: 2.62ms
7: learn: 0.6496176 total: 6.34ms remaining: 1.59ms
8: learn: 0.6436669 total: 6.53ms remaining: 725us
9: learn: 0.6377932 total: 6.75ms remaining: 0us
<catboost.core.CatBoostClassifier at 0x14d60bdd8>
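For the categorical columns in the original question, the error message asks for NaN values to be converted to strings. The following is a minimal sketch of that conversion using only the standard library. The helper name `stringify_missing_categoricals` and the `"nan"` placeholder are hypothetical choices for illustration, not part of the CatBoost API.

```python
import math

def stringify_missing_categoricals(rows, cat_features, placeholder="nan"):
    """Return a copy of rows where missing categorical values become strings.

    Hypothetical helper: only the columns listed in cat_features are touched,
    so NaN values in numeric columns are left as they are.
    """
    cleaned = []
    for row in rows:
        new_row = list(row)
        for idx in cat_features:
            value = new_row[idx]
            # np.nan is a float, so this check also covers NumPy's NaN
            is_nan = isinstance(value, float) and math.isnan(value)
            if value is None or is_nan:
                new_row[idx] = placeholder
        cleaned.append(new_row)
    return cleaned

cat_features = [0, 1]
train_data = [["a", float("nan"), 1, 4, 5, 6],
              ["a", "b", 4, 5, 6, 7],
              ["c", "d", 30, 40, 50, 60]]

cleaned_train_data = stringify_missing_categoricals(train_data, cat_features)
print(cleaned_train_data[0])
```

The cleaned rows can then be passed to the question's call in place of the original data, for example `model.fit(cleaned_train_data, train_labels, cat_features)`. With this approach, missing values simply become one more category, which may or may not suit a given dataset.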
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | user4718221 |
