Batch normalization vs layer normalization

Please illustrate batch normalisation and layer normalisation using clear tensor notation. Also, comment on when each one is required or recommended.



Solution 1

I think what you're looking for is in Group Normalization, by Yuxin Wu and Kaiming He (IJCV 2020).

See especially Fig. 2:

[Image: Fig. 2 of the Group Normalization paper, not reproduced here]
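A minimal NumPy sketch of the two methods, assuming the (N, C, H, W) feature-map layout used in the Group Normalization paper (N = batch axis, C = channel axis, H and W = spatial axes). Both compute y = γ · (x − μ) / √(σ² + ε) + β; they differ only in which axes μ and σ² are taken over. Batch norm computes one mean/variance per channel over (N, H, W); layer norm computes one mean/variance per sample over (C, H, W). The function names, the ε default, and the shapes chosen for gamma/beta are illustrative assumptions, not part of the answer, and batch norm's running statistics for inference are left out.

```python
import numpy as np


def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch normalization (training-mode statistics) for x of shape (N, C, H, W).

    One mean/variance per channel c, computed over the batch and spatial
    axes (N, H, W). gamma, beta have shape (C,): a learnable scale and shift
    per channel. The running averages BN uses at inference are omitted here.
    """
    mean = x.mean(axis=(0, 2, 3), keepdims=True)  # shape (1, C, 1, 1)
    var = x.var(axis=(0, 2, 3), keepdims=True)    # biased variance, shape (1, C, 1, 1)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)


def layer_norm(x, gamma, beta, eps=1e-5):
    """Layer normalization for x of shape (N, C, H, W).

    One mean/variance per sample n, computed over the feature axes (C, H, W).
    gamma, beta have shape (C, H, W): an elementwise scale and shift
    (an illustrative choice; implementations differ here).
    """
    mean = x.mean(axis=(1, 2, 3), keepdims=True)  # shape (N, 1, 1, 1)
    var = x.var(axis=(1, 2, 3), keepdims=True)    # biased variance, shape (N, 1, 1, 1)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(loc=3.0, scale=2.0, size=(4, 3, 5, 5))  # N=4, C=3, H=W=5
    bn = batch_norm(x, np.ones(3), np.zeros(3))
    ln = layer_norm(x, np.ones((3, 5, 5)), np.zeros((3, 5, 5)))
    # BN: each channel is roughly zero-mean/unit-variance across (N, H, W)
    print(bn.mean(axis=(0, 2, 3)), bn.var(axis=(0, 2, 3)))
    # LN: each sample is roughly zero-mean/unit-variance across (C, H, W)
    print(ln.mean(axis=(1, 2, 3)), ln.var(axis=(1, 2, 3)))
```

One practical consequence of the axis choice: a sample's batch-norm output depends on the other samples in the mini-batch, so its statistics get noisy with small batches (the motivation for Group Normalization in the paper above) and a separate inference-time estimate is needed. Layer norm's per-sample statistics avoid both issues, which is one reason it is the usual choice in RNNs and Transformers, while batch norm remains common in CNNs trained with reasonably large batches.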

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Ivan