How to check the bandwidth usage of the NCCL backend?
I am using the NCCL backend to send and receive PyTorch tensors on my server, and I am trying to use the following commands to check the bandwidth usage. Is this correct? First, I use
'tcp://127.0.0.1:1224'
as my dist URL (the init method). Then I run
export NCCL_SOCKET_IFNAME=lo
lo is the loopback network interface. Then I use
sudo iftop -i lo -n -P
But the bandwidth it reports is too small. What should I do to check the bandwidth used by dist.send and dist.recv?
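One likely reason iftop shows so little: when both ranks are on the same machine, NCCL normally moves tensor data over GPU peer-to-peer or shared-memory transports, so mostly bootstrap traffic crosses the lo interface. A more direct check is to time the transfers yourself. The following is a minimal sketch, not a definitive benchmark. It assumes dist.init_process_group("nccl", ...) has already been called with world_size=2 and that each rank index matches its GPU index. The names bandwidth_gb_per_s and benchmark_send_recv are hypothetical helpers, not part of PyTorch.

```python
import time


def bandwidth_gb_per_s(num_bytes, seconds):
    """Convert a byte count and elapsed time into GB/s (1 GB = 1e9 bytes)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_bytes / seconds / 1e9


def benchmark_send_recv(rank, numel=64 * 1024 * 1024, iters=20, warmup=5):
    """Time dist.send (rank 0) / dist.recv (rank 1) and return GB/s.

    Assumption: the process group is already initialized with the NCCL
    backend, world_size=2, and rank N uses GPU N.
    """
    # Imported lazily so the pure-Python helper above works without PyTorch.
    import torch
    import torch.distributed as dist

    torch.cuda.set_device(rank)
    tensor = torch.ones(numel, dtype=torch.float32, device="cuda")

    def one_transfer():
        if rank == 0:
            dist.send(tensor, dst=1)
        else:
            dist.recv(tensor, src=0)

    # Warm-up iterations let NCCL set up its communicators and channels.
    for _ in range(warmup):
        one_transfer()
    torch.cuda.synchronize()

    start = time.perf_counter()
    for _ in range(iters):
        one_transfer()
    # Wait for all queued GPU work to finish before stopping the clock.
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

    total_bytes = tensor.numel() * tensor.element_size() * iters
    return bandwidth_gb_per_s(total_bytes, elapsed)
```

Each of the two processes would call benchmark_send_recv(rank) and print the result. Large tensors and several iterations are used so that launch overhead does not dominate the measurement.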
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
