Is TabNet's mask explanation biased?
I have been reading about the TabNet model and how its predictions are "explained" through the attentive transformers' mask values.
However, if the input values are not normalized, aren't these mask values simply a normalization scalar rather than a measure of feature importance?
E.g.: if a feature Time is expressed in days and has a mean mask value of 1/365, couldn't that simply mean the mask is rescaling the feature rather than reflecting its importance?
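To make the concern concrete, here is a minimal sketch of the comparison I have in mind (this assumes the pytorch-tabnet implementation; the synthetic features, epoch budget, and variable names are just illustrative): fit the same model on raw and on standardized inputs and compare the aggregated mask importances.

```python
# Minimal sketch, assuming the pytorch-tabnet package
# (https://github.com/dreamquark-ai/tabnet): compare mask-based
# importances on raw vs. standardized inputs.
import numpy as np
from sklearn.preprocessing import StandardScaler
from pytorch_tabnet.tab_model import TabNetClassifier

rng = np.random.default_rng(0)
n = 2000

# Hypothetical data: "time_days" lives on a 0-365 scale,
# "other" is roughly standard normal.
time_days = rng.uniform(0, 365, size=n)
other = rng.normal(size=n)
X = np.column_stack([time_days, other]).astype(np.float32)
# Both features carry comparable signal, so their importances
# "should" come out comparable if the masks are not scale-driven.
y = ((time_days / 365 + other) > 0.5).astype(np.int64)

for name, X_in in [("raw", X),
                   ("standardized", StandardScaler().fit_transform(X))]:
    clf = TabNetClassifier(seed=0, verbose=0)
    clf.fit(X_in.astype(np.float32), y, max_epochs=50)
    # feature_importances_ aggregates the attentive transformer
    # masks over the training data.
    print(name, clf.feature_importances_)
```

If the masks were mostly compensating for scale, I would expect the raw run to assign the day-scaled feature a very different importance than the standardized run.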
Let me know if I wasn't clear in my question.