Implementing a custom loss function in scikit-learn

I want to implement a custom loss function in scikit-learn. I use the following code snippet:

def my_custom_loss_func(y_true, y_pred):
    diff3 = max(abs(y_true - y_pred) * y_true)
    return diff3

score = make_scorer(my_custom_loss_func, greater_is_better=False)
clf = RandomForestRegressor()
mnn = GridSearchCV(clf, score)
knn = mnn.fit(feam, labm)

What arguments should be passed into my_custom_loss_func? My label matrix is called labm. I want to calculate the difference between the actual and the predicted output (by the model), multiplied by the true output. If I use labm in place of y_true, what should I use in place of y_pred?



Solution 1:[1]

Okay, there are three things going on here:

1) there is a loss function used while training to tune your model's parameters

2) there is a scoring function which is used to judge the quality of your model

3) there is hyper-parameter tuning, which uses a scoring function to optimize your hyperparameters.

So... if you are trying to tune hyperparameters, then you are on the right track in defining a loss function for that purpose. If, however, you are trying to tune your whole model to perform well on, let's say, a recall test - then you need a recall optimizer to be part of the training process. It's tricky, but you can do it...

1) Open up your classifier. Let's use an RFC for example: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html

2) click [source]

3) See how it's inheriting from ForestClassifier? Right there in the class definition. Click that word to jump to its parent definition.

4) See how this new object is inheriting from ClassifierMixin? Click that.

5) See how the bottom of that ClassifierMixin class says this?

from .metrics import accuracy_score
return accuracy_score(y, self.predict(X), sample_weight=sample_weight)

That's your model being scored on accuracy. You need to inject at this point if you want your model to be a "recall model" or a "precision model" or whatever model. This accuracy metric is baked into sklearn. Some day, a better man than I will make this a parameter which models accept; in the meantime, you have to go into your sklearn installation and tweak this accuracy_score to be whatever you want.
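A less invasive variant of the same idea, rather than editing the installed package, is to override the inherited score method in a subclass (the class name here is made up for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score

# Sketch: override the score() inherited from ClassifierMixin so that
# model.score(X, y) reports recall instead of the baked-in accuracy.
class RecallScoredRFC(RandomForestClassifier):
    def score(self, X, y, sample_weight=None):
        return recall_score(y, self.predict(X), sample_weight=sample_weight)

X, y = make_classification(n_samples=100, random_state=0)
model = RecallScoredRFC(n_estimators=20, random_state=0).fit(X, y)
print(model.score(X, y))  # recall on the training data
```

Because the subclass adds no new constructor arguments, it still works with sklearn utilities like cross_val_score that clone the estimator.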

Best of luck!

Solution 2:[2]

The documentation for make_scorer goes like this:

sklearn.metrics.make_scorer(score_func, greater_is_better=True, needs_proba=False, 
needs_threshold=False, **kwargs)

So it doesn't need you to pass the arguments yourself when calling the function - GridSearchCV calls your score_func internally and supplies y_true and y_pred for you. Is this what you were asking?
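This can be seen by calling the scorer object directly; a scorer produced by make_scorer takes an estimator and data, not y_true/y_pred (synthetic data below, standing in for the question's arrays):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import make_scorer

def my_custom_loss_func(y_true, y_pred):
    return np.max(np.abs(y_true - y_pred) * y_true)

score = make_scorer(my_custom_loss_func, greater_is_better=False)

X, y = make_regression(n_samples=50, n_features=3, random_state=0)
model = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)

# The scorer is invoked as score(estimator, X, y): scikit-learn computes
# y_pred = estimator.predict(X) itself and passes both arrays on to
# my_custom_loss_func. greater_is_better=False flips the sign, so a
# smaller loss yields a larger (less negative) score.
print(score(model, X, y))
```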

Solution 3:[3]

The arguments of your my_custom_loss_func do not need any manual connection to your true labels (labm). You can keep the function the way it is now.

Internally, GridSearchCV will call the scoring function, so your true labels do not conflict there: y_pred will be the predicted values generated from the model's output, and y_true will be assigned the values of labm.
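Putting this together with the question's snippet, a runnable sketch might look like the following (the data is synthetic, standing in for feam and labm, and the parameter grid is illustrative - GridSearchCV's second positional argument must be a parameter grid, with the scorer passed via scoring=):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV

# Synthetic stand-ins for the question's feam (features) and labm (labels)
feam, labm = make_regression(n_samples=100, n_features=4, random_state=0)

def my_custom_loss_func(y_true, y_pred):
    # |actual - predicted| weighted by the true output, then the maximum
    return np.max(np.abs(y_true - y_pred) * y_true)

score = make_scorer(my_custom_loss_func, greater_is_better=False)

clf = RandomForestRegressor(random_state=0)
mnn = GridSearchCV(clf, {"n_estimators": [10, 30]}, scoring=score, cv=3)
knn = mnn.fit(feam, labm)
print(knn.best_params_)
```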

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: birdmw
Solution 2: Abhishek
Solution 3: Venkatachalam