Setting up GPU support in Airflow containers with docker-compose (GPU support with TensorFlow)

I am having some difficulty starting Airflow with docker-compose with the appropriate GPU libraries to run my machine learning tasks.

The airflow-scheduler throws this error:

    airflow-scheduler_1 | 2022-03-21 12:33:36.919960: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory

Basically, there are no CUDA libraries installed under /usr/local inside the Airflow container, hence the error. I have installed nvidia-container-runtime and set the default runtime in the Docker daemon.json file:

    curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
        sudo apt-key add -
    distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
    curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
        sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
    sudo apt-get update
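For reference, my /etc/docker/daemon.json is roughly the standard nvidia-container-runtime configuration (sketched below; paths may differ on other systems):

    {
        "default-runtime": "nvidia",
        "runtimes": {
            "nvidia": {
                "path": "nvidia-container-runtime",
                "runtimeArgs": []
            }
        }
    }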

And I have managed to use runtime: nvidia in the docker-compose.yaml file. This way, within the Airflow container I can run nvidia-smi. However, the CUDA libraries are still missing.
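The relevant part of the docker-compose.yaml looks roughly like this (only a sketch; the service name and image are placeholders, the real file is the standard Airflow compose file with the runtime added):

    services:
      airflow-scheduler:
        image: apache/airflow:latest   # placeholder; the real file uses the official Airflow image
        runtime: nvidia
        environment:
          NVIDIA_VISIBLE_DEVICES: all  # so the nvidia runtime exposes the GPUs to this container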

Is there a way to install these libraries automatically, ideally by building the Airflow image FROM tensorflow/tensorflow:latest-gpu, since that image already ships the CUDA libraries?
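What I have in mind is something like the following Dockerfile (a rough sketch only; I am assuming Airflow can simply be pip-installed on top of the TensorFlow GPU image, and the version/constraint file below are illustrative):

    FROM tensorflow/tensorflow:latest-gpu

    # Install Airflow on top of the CUDA-enabled TensorFlow image
    # (version and constraints file are illustrative; pin them to match your setup)
    RUN pip install --no-cache-dir "apache-airflow==2.2.4" \
        --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.2.4/constraints-3.8.txt"

The docker-compose.yaml would then build/point at this image instead of the stock Airflow image.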

On the other hand, if I am not using docker-compose I can start a container directly with docker:

    docker run -it --gpus all tensorflow/tensorflow:latest-gpu

This container has all the libraries that I need. However, I would like to use docker-compose, as it makes running multiple containers and setting up the network between them much easier, so I would like to avoid the plain docker run approach.
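As far as I understand, the Compose-level equivalent of --gpus all is a device reservation like the one below (assuming a Compose version that supports it), but that only exposes the GPU device; the CUDA libraries still have to come from the image itself:

    services:
      airflow-scheduler:
        image: apache/airflow:latest   # placeholder
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: all
                  capabilities: [gpu]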

Also, I can use Docker inside Airflow by mounting the Docker socket into the Airflow container, so that I can launch a new container from within Airflow. That way all the CUDA libraries are available too, but it feels very counter-intuitive, and I am having trouble understanding why I cannot set all of this up inside the Airflow container in the first place.

    import docker

    client = docker.from_env()

    # Run a throwaway TensorFlow GPU container from within Airflow
    response = client.containers.run(

        # The image you wish to call
        'tensorflow/tensorflow:latest-gpu',

        # The command to run inside the container
        'find / -name "libcudart.so.11.0"',

        # Pass through GPU access (equivalent of --gpus all)
        device_requests=[
            docker.types.DeviceRequest(count=-1, capabilities=[['gpu']])
        ]
    )

I would appreciate it if you could point me in the right direction.


