How does torch.cuda.synchronize() behave?

According to the PyTorch documentation, torch.cuda.synchronize() "Waits for all kernels in all streams on a CUDA device to complete." Questions:

  1. Should this say "Waits for all kernels in all streams initiated by this Python session on a CUDA device to complete"? In other words, if Python session A is running CUDA operations and I call torch.cuda.synchronize() in Python session B, that won't wait for what's happening in session A, right? (See the sketch after this list.)

  2. Surely if we don't call torch.cuda.synchronize(), but try to work with any Python code referencing the tensors in the computation graph, then that's like implicitly calling it, right?
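
Q1 in code, a rough sketch of the experiment I have in mind (the two halves go in separate Python processes sharing one GPU, not one script; the tensor size and loop count are arbitrary, just enough to keep the device busy):

# --- Session A: keep the GPU busy with a long queue of kernels ---
import torch

x = torch.randn(8192, 8192, device="cuda")
for _ in range(200):
    x = x @ x  # each matmul is enqueued asynchronously on session A's stream

# --- Session B: run this while session A is still working ---
import torch

torch.cuda.init()          # create this process's own CUDA context
torch.cuda.synchronize()   # does this return immediately, or wait for session A's kernels?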

Q2 in code:

output = model(inputs)  # CUDA starts working here
a = 1 + 1  # CUDA might still be running the previous line; this line can run at the same time
other_model(output)  # this implicitly does the same thing as torch.cuda.synchronize(), then does a forward pass of other_model
b = a + a  # this line can't happen until CUDA is done and the previous line has executed
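
One way I could probe Q2 with timings (a minimal sketch, assuming a CUDA-capable machine; the bare matmuls stand in for model and other_model):

import time
import torch

x = torch.randn(4096, 4096, device="cuda")
x @ x                      # warm-up so the timings below aren't dominated by CUDA init
torch.cuda.synchronize()   # start from an idle queue

t0 = time.time()
y = x @ x                  # kernels are launched asynchronously; Python returns right away
t1 = time.time()
z = y @ y                  # the question: does this just enqueue, or implicitly synchronize first?
t2 = time.time()
_ = z[0, 0].item()         # copying a value to the CPU blocks until the GPU queue drains
t3 = time.time()

print(f"first matmul enqueue: {t1 - t0:.6f}s")
print(f"second matmul:        {t2 - t1:.6f}s  (near zero would mean it only enqueued)")
print(f".item() sync:         {t3 - t2:.6f}s")

If the hypothesis in the comments above holds, the second delta should absorb the first matmul's GPU time; if it's near zero, the sync only happens at the .item() call.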


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
