Display GPU Usage While Code is Running in Colab

I have a program running on Google Colab in which I need to monitor GPU usage while it is running. I am aware that you would usually use nvidia-smi at a command line to display GPU usage, but since Colab only allows one cell to run at a time, this isn't an option. Currently, I am using GPUtil and monitoring GPU and VRAM usage with GPUtil.getGPUs()[0].load and GPUtil.getGPUs()[0].memoryUsed, but I can't find a way for those calls to execute at the same time as the rest of my code, so the usage numbers are much lower than they should actually be. Is there any way to print the GPU usage while other code is running?
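For reference, the pattern I'm trying to achieve looks roughly like this: a background thread that samples GPU stats while the main thread runs the workload. This is only a minimal sketch; in practice `sample_fn` would be something like `lambda: GPUtil.getGPUs()[0].load`, and `time.time` below is just a stand-in so the snippet runs without a GPU:

```python
import threading
import time

def monitor(sample_fn, interval, stop_event, log):
    """Append sample_fn() to log every `interval` seconds until stop_event is set."""
    while not stop_event.is_set():
        log.append(sample_fn())
        stop_event.wait(interval)

# In Colab, sample_fn would be e.g. lambda: GPUtil.getGPUs()[0].load;
# time.time is a placeholder so this sketch runs anywhere.
stop = threading.Event()
readings = []
t = threading.Thread(target=monitor, args=(time.time, 0.1, stop, readings), daemon=True)
t.start()
time.sleep(0.35)          # stand-in for the real workload
stop.set()
t.join()
print(len(readings))      # samples taken while the "work" ran
```

The problem is that this still polls from the same Python process, so I'd prefer an approach that works independently of the running cell.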



Solution 1:[1]

Use wandb to log system metrics:

!pip install wandb
import wandb
wandb.init()

This outputs a URL where you can view graphs of various system metrics.

Solution 2:[2]

If you have Colab Pro, you can open the Terminal, located on the left side and indicated by '>_' on a black background.

You can run commands from there even while a cell is running.

Run this command to see GPU usage in real time:

watch nvidia-smi

Solution 3:[3]

A slightly clearer explanation:

  1. Go to Weights & Biases and create your account.
  2. Run the following commands:
!pip install wandb
import wandb
wandb.init()
  3. Go to the link shown in your notebook for authorization and copy the API key.
  4. Paste the key into the notebook's input field.
  5. After authorization you will find another link in the notebook - view your model and system metrics there.

Solution 4:[4]

There is another way to see GPU usage, but this method only shows memory usage. Click Runtime -> Manage sessions. This lets you see how much memory your session is taking, so you can adjust your batch size accordingly.

Solution 5:[5]

You can run a script in the background to track GPU usage.

Step 1: Create a script that monitors GPU usage, from a Jupyter cell.

%%writefile gpu_usage.sh
#!/bin/bash
# Run for 10 seconds; change this as per your use
end=$((SECONDS+10))

while [ $SECONDS -lt $end ]; do
    nvidia-smi --format=csv --query-gpu=power.draw,utilization.gpu,memory.used,memory.free,fan.speed,temperature.gpu >> gpu.log
    sleep 1  # sample once per second instead of spinning
    # Alternatively, comment out the nvidia-smi line above and use:
    #nvidia-smi dmon -i 0 -s mu -d 1 -o TD >> gpu.log
done

Step 2: Execute the above script in the background in another cell.

%%bash --bg

bash gpu_usage.sh

Step 3: Run the inference.

Note that the script records GPU usage for the first 10 seconds; change this to match your model's running time.

The GPU utilization results will be saved in the gpu.log file.
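Afterwards you can read the log back in Python. A minimal sketch, where the `sample` string mimics the `--format=csv` output (hypothetical values; the real gpu.log repeats the header line once per nvidia-smi invocation):

```python
# Hypothetical sample of the log contents; the real gpu.log repeats this
# header line once per nvidia-smi invocation.
sample = """power.draw [W], utilization.gpu [%], memory.used [MiB], memory.free [MiB], fan.speed [%], temperature.gpu
 35.51 W, 42 %, 2048 MiB, 14000 MiB, [N/A], 55
"""

def parse_gpu_log(text):
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("power.draw"):
            continue  # skip blanks and repeated header lines
        rows.append([field.strip() for field in line.split(",")])
    return rows

rows = parse_gpu_log(sample)
print(rows[0][1])  # the utilization.gpu column
```

From there you can compute, say, the peak utilization or average memory used over the run.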

Solution 6:[6]

You can use Netdata to do this - it's open source and free, and you can monitor much more than just GPU usage on your Colab instance. I've used it to monitor CPU usage while training a large language model.

Just claim your Colab instance as a node; if you're using Netdata Cloud, you can monitor multiple Colab instances simultaneously as well.

Pretty neat.


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Matthew So
Solution 2: Zaccharie Ramzi
Solution 3: Vinay Verma
Solution 4: Gary Ong
Solution 5: kHarshit
Solution 6: egorulz