Should ParameterServerStrategy shard input data?
I'm confused about whether input data should be sharded when using ParameterServerStrategy. The language in the distributed training docs suggests that all multi-worker training should shard input data:
> In multi-worker training, dataset sharding is needed to ensure convergence and performance.
This implies that ParameterServerStrategy should shard data. However, the ParameterServerStrategy tutorial states:
> Keras Model.fit with parameter server training assumes that each worker receives the same dataset, except when it is shuffled differently. Therefore, by calling Dataset.shuffle, you ensure more even iterations over the data.
This implies that I should not shard the input data. How do these two recommendations fit together?
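For reference, here is the kind of pipeline I understand the tutorial to be describing. This is a minimal sketch: the toy data, hyperparameters, and cluster setup are placeholders, and it only runs against a real parameter-server cluster (workers and parameter servers defined via the TF_CONFIG environment variable).

```python
import tensorflow as tf

# Sketch of the Model.fit pattern from the ParameterServerStrategy tutorial.
# Assumes the cluster is already described in TF_CONFIG.
strategy = tf.distribute.experimental.ParameterServerStrategy(
    tf.distribute.cluster_resolver.TFConfigClusterResolver())


def dataset_fn(input_context):
    # Every worker builds the SAME full dataset -- note there is no
    # Dataset.shard call. Dataset.shuffle gives each worker a different
    # iteration order, which is what "shuffled differently" refers to.
    x = tf.random.uniform((1000, 10))  # placeholder data
    y = tf.random.uniform((1000,))
    batch_size = input_context.get_per_replica_batch_size(64)
    return (tf.data.Dataset.from_tensor_slices((x, y))
            .shuffle(1000)
            .repeat()
            .batch(batch_size)
            .prefetch(2))


with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    model.compile(optimizer="sgd", loss="mse")

# Model.fit with parameter server training takes a DatasetCreator, which
# invokes dataset_fn once per worker. steps_per_epoch is required because
# the dataset repeats indefinitely.
model.fit(tf.keras.utils.experimental.DatasetCreator(dataset_fn),
          epochs=2, steps_per_epoch=20)
```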
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
