Can't interpret SHAP value output from custom CatBoost model
I have built a sentiment analysis classifier. I want to get feature (word) importance weights from my model, in a format such as
-0.098432 violent
I want these values for every word in a string the model has not seen before (not training data, so no label), on a per-text basis.
I have built a multiclass classifier (three possible labels) using a heavily customized CatBoost model. Because the model is a custom class object (not an out-of-the-box CatBoostClassifier) that is saved and then re-loaded, it does not expose the native attributes/methods of the CatBoostClassifier class. To that end, I wrote the get_feature_importance method below for my custom CatBoostPipe class:
def get_feature_importance(self, sent):
    from catboost import Pool, EFstrType
    import pandas as pd

    # Wrap the raw string(s) in a one-column DataFrame so the Pool matches the training layout.
    df = pd.DataFrame(sent, columns=['Content'])
    return self.model.get_feature_importance(
        data=Pool(data=df[['Content']], text_features=['Content']),
        type=EFstrType.ShapValues,
        thread_count=-1,
        prettified=False)
When constructing my model class, I structured the train/val/test sets using Pool() as below:
test = Pool(
    data=df_test[['Content']],
    label=df_test['Labels'],
    text_features=['Content']
)
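The train and validation Pools are built the same way (a minimal sketch, assuming df_train and df_valid have the same 'Content' and 'Labels' columns; the DataFrame names are illustrative, not the original code):

from catboost import Pool

# Assumption: each split has one text column ('Content') and one label column ('Labels').
train = Pool(
    data=df_train[['Content']],
    label=df_train['Labels'],
    text_features=['Content']
)
valid = Pool(
    data=df_valid[['Content']],
    label=df_valid['Labels'],
    text_features=['Content']
)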
Here is what happens when I create an instance of my custom class and call the get_feature_importance method I wrote:
cat = CatBoostPipe().load()
sent = [f'''When, in 2018, another woman said her husband was physically violent and emotionally abusive,
Person2344 accused her of “lying” and asked: “Does he put up with you when you’ve been a crazy ****?”''']
shap_values = cat.get_feature_importance(sent)
print(shap_values)
print(shap_values.shape)
Output:
[[[-2.01463078 -0.71423034]
[-2.42286171 -1.44455357]
[ 4.04736766 -1.42623342]]]
(1, 3, 2)
I understand that 1 is the number of data samples and 3 is the number of classes, but what are the two values for each class? I was expecting an array of weights per word, which is what I have gotten before for SHAP values from other models (e.g. an out-of-the-box CatBoostClassifier plus TfidfVectorizer fed into a sklearn Pipeline object).
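For comparison, the earlier setup that did give one SHAP value per word looked roughly like this (a minimal sketch of the idea rather than the original Pipeline code; train_texts, train_labels and the hyperparameters are illustrative assumptions):

from sklearn.feature_extraction.text import TfidfVectorizer
from catboost import CatBoostClassifier, Pool, EFstrType

# Each vocabulary term becomes its own numeric column, so SHAP can assign a weight per word.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)
clf = CatBoostClassifier(loss_function='MultiClassOneVsAll', verbose=False)
clf.fit(X_train, train_labels)

# SHAP values for an unseen text, computed over the TF-IDF columns.
X_new = vectorizer.transform(sent)
shap_values = clf.get_feature_importance(
    data=Pool(X_new),
    type=EFstrType.ShapValues)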
Here is the full class:
from catboost import CatBoostClassifier


class CatBoostPipe:
    def __init__(self):
        self.model = None

    def train(self, train, valid):
        self.fit_model(
            train, valid,
            learning_rate=0.1,
            tokenizers=[
                {'tokenizer_id': 'Sense',
                 'separator_type': 'BySense'}
            ],
            feature_calcers=[
                'NaiveBayes:top_tokens_count=100000'])

    def fit_model(self, train_pool, test_pool, **kwargs):
        self.model = CatBoostClassifier(
            loss_function='MultiClassOneVsAll',
            iterations=1000,
            eval_metric='Accuracy',
            od_type='Iter',
            **kwargs
        )
        self.model.fit(
            train_pool,
            eval_set=test_pool,
            plot=True,
            use_best_model=True)
        self.model.save_model('CatModel.cbm',
                              format="cbm",
                              pool=train_pool)
        return self

    def predict_proba(self, text):
        return self.model.predict_proba(text)

    def get_feature_importance(self, sent):
        from catboost import Pool, EFstrType
        import pandas as pd

        # Wrap the raw string(s) in a one-column DataFrame so the Pool matches the training layout.
        df = pd.DataFrame(sent, columns=['Content'])
        return self.model.get_feature_importance(
            data=Pool(data=df[['Content']], text_features=['Content']),
            type=EFstrType.ShapValues,
            thread_count=-1,
            prettified=False)
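The load() call used earlier is not shown in the class; a minimal sketch of what such a method might look like, assuming it simply restores the model written by fit_model (the file name matches the save_model call above, everything else is an assumption):

    def load(self, path='CatModel.cbm'):
        # Assumption: load() only deserializes the model saved in fit_model and returns self.
        self.model = CatBoostClassifier()
        self.model.load_model(path, format='cbm')
        return self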
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow