Batch normalization vs layer normalization
Please illustrate batch normalization and layer normalization using clear tensor notation. Also comment on when each one is required or recommended.
Solution 1:[1]
I think what you're looking for is in the paper Group Normalization, by Yuxin Wu and Kaiming He (IJCV'20).
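See especially Fig. 2 of that paper, which draws the feature map as a tensor with axes N (batch), C (channel), and (H, W) (spatial), and shades the entries that Batch Norm, Layer Norm, Instance Norm, and Group Norm each normalize with a shared mean and variance.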
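To make the tensor view concrete, here is a minimal NumPy sketch of the two schemes the question asks about. It assumes an input `x` of shape (N, C, H, W). The function names `batch_norm` and `layer_norm` are illustrative only and do not come from the paper or any library. The sketch covers just the normalization step with statistics computed from the current input: it leaves out the learnable scale and shift, and the running averages that batch normalization typically uses at inference time.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Per-channel statistics, pooled over the batch and spatial axes (N, H, W)."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)  # shape (1, C, 1, 1)
    var = x.var(axis=(0, 2, 3), keepdims=True)    # shape (1, C, 1, 1)
    return (x - mean) / np.sqrt(var + eps)

def layer_norm(x, eps=1e-5):
    """Per-sample statistics, pooled over that sample's own axes (C, H, W)."""
    mean = x.mean(axis=(1, 2, 3), keepdims=True)  # shape (N, 1, 1, 1)
    var = x.var(axis=(1, 2, 3), keepdims=True)    # shape (N, 1, 1, 1)
    return (x - mean) / np.sqrt(var + eps)

# Hypothetical input: 8 samples, 4 channels, 5x5 spatial grid.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(8, 4, 5, 5))  # (N, C, H, W)

bn = batch_norm(x)  # each channel has ~zero mean across (N, H, W)
ln = layer_norm(x)  # each sample has ~zero mean across (C, H, W)

# Layer norm of one sample does not depend on the rest of the batch;
# batch norm of the same sample does.
ln_alone = layer_norm(x[:1])
bn_alone = batch_norm(x[:1])
```

The only difference between the two functions is the set of axes reduced over. Because batch normalization pools statistics across N, its output for one sample changes with the other samples in the batch, so it tends to become less reliable with very small batches. Layer normalization uses each sample's own statistics and is independent of batch size.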
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Ivan |

