I have read these threads [1] [2] [3] [4], and this article. I think I understand how batch size and epochs work with DDP, but I am not sure about the learning rate.
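To make the learning-rate question concrete, here is a minimal sketch of the arithmetic involved. It assumes each DDP process uses the same per-process batch size and that gradients are averaged across processes, so one optimizer step effectively sees `per_process_batch * world_size` samples. The "linear scaling" heuristic shown here is one common rule of thumb, not something the threads above confirm, and the function names and numbers are hypothetical.

```python
def effective_batch_size(per_process_batch: int, world_size: int) -> int:
    """Number of samples contributing to one optimizer step across all processes."""
    return per_process_batch * world_size


def linearly_scaled_lr(base_lr: float, base_batch: int,
                       per_process_batch: int, world_size: int) -> float:
    """Scale a learning rate tuned for `base_batch` in proportion to the
    effective batch size (a heuristic, assumed here for illustration)."""
    return base_lr * effective_batch_size(per_process_batch, world_size) / base_batch


if __name__ == "__main__":
    # Hypothetical setup: lr tuned on 1 GPU with batch 32, now running on 4 GPUs.
    print(effective_batch_size(32, 4))
    print(linearly_scaled_lr(0.1, 32, 32, 4))
```

Whether the scaled or the original learning rate works better is an empirical question for the model at hand; the sketch only shows how the two batch sizes relate.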
At every epoch of my training, I need to split my dataset into n batches of t consecutive samples. For example, if my data is [1,2,3,4,5,6,7,8,9,10], n = 2 and t
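The example above is cut off, so the value of t and the exact sampling rule are not known. As a rough sketch of one possible reading, the code below cuts the data into non-overlapping windows of t consecutive samples and draws n of them at random each epoch. The helper name `consecutive_batches`, the choice t = 3, and the non-overlapping rule are all assumptions made for illustration.

```python
import random


def consecutive_batches(data, n, t, rng=None):
    """Return n batches, each holding t consecutive samples from `data`.

    Assumption: batches come from non-overlapping windows that start at
    indices 0, t, 2t, ...; leftover samples at the end are dropped.
    """
    rng = rng or random.Random()
    windows = [data[i:i + t] for i in range(0, len(data) - t + 1, t)]
    if n > len(windows):
        raise ValueError("not enough full windows for the requested n")
    # Sample without replacement so no window is used twice in one epoch.
    return rng.sample(windows, n)


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    for epoch in range(2):
        # t = 3 is a hypothetical value, not the one from the question.
        print(epoch, consecutive_batches(data, n=2, t=3))
```

Calling the function once per epoch gives a fresh selection each time; passing a seeded `random.Random` makes the selection reproducible.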
I am trying to run some example Python 3 code from https://docs.databricks.com/applications/deep-learning/distributed-training/horovod-runner.html on Databricks GPU c