Is TabNet's mask explanation biased?

I have been reading about the TabNet model and its prediction "explanations" derived from the attentive transformers' mask values.

However, if the input values are not normalized, aren't these mask values simply normalization scalars (and not feature importance values)?

E.g.: if a feature Time is expressed in days and has a mean mask value of 1/365, couldn't that simply mean the mask is normalizing the feature?
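To illustrate the concern with a simpler model, here is a minimal numpy sketch (with hypothetical, synthetic data) showing how a learned coefficient on an unscaled feature absorbs the feature's scale. The analogous worry is that a mask value could likewise reflect scale compensation rather than true importance:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical "Time" feature expressed in days (large, unnormalized scale).
time_days = rng.uniform(0, 365, n)

# Target depends on the *normalized* time (days / 365) with weight 0.5.
y = 0.5 * (time_days / 365) + rng.normal(0, 0.01, n)

# Fit a one-feature linear model on the raw (day-scaled) input.
w_days, *_ = np.linalg.lstsq(time_days[:, None], y[:, None], rcond=None)

# The learned weight absorbs the 1/365 scale: w ~ 0.5 / 365,
# even though the "true" importance of the normalized feature is 0.5.
print(w_days[0, 0])
print(0.5 / 365)
```

The fitted weight comes out close to 0.5/365, not 0.5, which is the same kind of scale-absorption the question is asking about for TabNet's masks.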

Let me know if I wasn't clear in my question.



Sources

This content follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
