How to check the understanding of a trained model?
I'm currently training two models (BERT & MPNet) for a semantic textual similarity (STS) task using the SentenceTransformers library.
Now I want to check whether the base models and/or the trained models understand specific words/names that occur in the training dataset. I tried masked-token prediction and computing the similarity between those words/names and related category labels, but the results were hardly distinguishable.
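For concreteness, here is roughly what those two checks looked like (the word "Foobarix" and the category labels are placeholders for the actual names in my dataset, and the model names are just examples):

```python
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

# Placeholder word and categories; the real ones come from my training dataset.
word = "Foobarix"
categories = ["a medication", "a company", "a person", "a city"]

# Check 1: masked-token prediction with the BERT base model.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill_mask(f"{word} is a [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))

# Check 2: cosine similarity between the word and category labels,
# using a SentenceTransformers model (base or fine-tuned checkpoint).
model = SentenceTransformer("all-mpnet-base-v2")
word_emb = model.encode(word, convert_to_tensor=True)
cat_embs = model.encode(categories, convert_to_tensor=True)
print(util.cos_sim(word_emb, cat_embs))  # scores were hardly distinguishable
```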
Is there any way to check or even prove whether a model understands a specific word or sequence?