'Setting up GPU support in Airflow containers with Docker-compose - (GPU support with Tensorflow)
I am having some difficulties in starting airflow using docker-compose with appropriate GPU libraries to run my machine learning tasks.
The airflow-scheduler throws this error:
airflow-scheduler_1 | 2022-03-21 12:33:36.919960: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
Basically, there is no CUDA libraries installed in the /usr/local within the airflow container hence the error. I have installed nvidia-container runtime and set the deamon default runtime in deamon.json file
curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \ sudo apt-key add - distribution=$(. /etc/os-release;echo $ID$VERSION_ID) curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \ sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list sudo apt-get update
And I have managed to use the runtime:nvidia in the docker-compose.yaml file. This way within the airflow container I can see nvidia-smi. However CUDA libraries are still missing.
Is there a way to install these libraries automatically (ideally FROM tensorflow/tensorflow:latest-gpu) as these set the CUDA libraries within the container?
On the other hand, if I am not using docker-compose I can start a container with docker:
docker run -it --gpus all tensorflow/tensorflow:latest-gpu
This container has all the libraries that I need. However, I would like to use docker-compose as life will be much easier to run multiple containers and setting up all network. So I would like to avoid this approach.
Also I can use the docker in airflow and mount the docker socket to airflow container such that I can initialise a new container from the airflow. This way, I can have all the CUDA libraries also installed however, it sounds very counter-intuitive and I am having difficulties understanding why I can't set all these within the airflow container originally.
client = docker.from_env()
# run the container
response = client.containers.run(
# The container you wish to call
'tensorflow/tensorflow:latest-gpu',
# The command to run inside the container
'find / -name "libcudart.so.11.0"',
# Passing the GPU access
device_requests=[
docker.types.DeviceRequest(count=-1, capabilities=[['gpu']])
]
)
I would appreciate if you can assist me in the right direction.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
