InternalError: cudaGetDevice() failed. Status: initialization error when running TensorFlow
I recently got a new Windows computer for work that came with a GPU (NVIDIA Quadro P4200). I was hoping to run some old code I had, this time taking advantage of the GPU. I am attempting to run an LSTM model for text classification using TensorFlow. Let me apologize in advance for what is probably too much information; I am at a loss and don't know where the issue is. Also, admittedly, I have very little understanding of some of the more technical aspects of this.
I currently have the versions below:

| Name | Version | Build | Channel |
|---|---|---|---|
| tensorflow | 2.6.0 | gpu_py39he88c5ba_0 | |
| tensorflow-base | 2.6.0 | gpu_py39hb3da07e_0 | |
| tensorflow-estimator | 2.6.0 | pyh7b7c402_0 | |
| tensorflow-gpu | 2.6.0 | h17022bd_0 | |
| cudatoolkit | 11.3.1 | h59b6b97_2 | |
| cudnn | 8.2.1 | cuda11.3_0 | |

Also, when I run nvidia-smi, I get:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 419.17 Driver Version: 419.17 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro P4200 WDDM | 00000000:01:00.0 Off | N/A |
| N/A 51C P8 8W / N/A | 111MiB / 8192MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
And when I run nvcc --version, I get:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:52:33_Pacific_Standard_Time_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
When I look at the driver in Windows Device Manager, it lists version 25.21.14.1917.
When I run my code, I get the following error:
InternalError: cudaGetDevice() failed. Status: initialization error
I have googled for solutions and found several suggesting that I use different versions of cudatoolkit, cudnn, and TensorFlow. I have tried several options, including reverting to cudatoolkit 10.1 and cudnn 7.6.5, which required an older version of TensorFlow and Python 3.8 (the setup above uses 3.9). When I made those changes, TensorFlow did not appear to detect my GPU at all.
I think I am going to ask our IT department to update my GPU driver. However, I am a bit worried that this might not solve my problem (and may make it worse?).
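For anyone retracing these steps, a quick way to confirm whether TensorFlow detects the GPU after each version change is to list the devices it can see. The snippet below is a minimal sketch, assuming TensorFlow 2.1 or later; the helper name `visible_gpus` is made up for illustration and is not part of TensorFlow.

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow can see.

    Returns None when TensorFlow is not installed in the active environment,
    and an empty list when TensorFlow is installed but finds no GPU.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # tf.config.list_physical_devices is available in TensorFlow 2.1 and later.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment.")
    elif not gpus:
        print("TensorFlow is installed but does not detect a GPU.")
    else:
        print("TensorFlow detects:", gpus)
```

An empty list here corresponds to the "did not appear to detect my GPU" situation described above.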
Update
I got everything to work! I guess it will be useful to keep this here in case others have similar issues.
So, the problem was indeed that I needed to use an older version of cudatoolkit. This is consistent with the nvidia-smi output above, which reports CUDA Version 10.1 as the highest version the installed 419.17 driver supports, while my environment had cudatoolkit 11.3.1. What I did was install cudatoolkit 10.1. However, when installing tensorflow-gpu 2.3, you need to use pip install and not conda install. I am not sure why.
So, in short, to get the older version of TensorFlow to recognize my GPU, I needed to install all packages with pip. The exceptions were cudatoolkit and cudnn, which I installed with conda; all other packages seemed to require pip.
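The mismatch behind the fix can be checked before installing anything. The sketch below compares the "CUDA Version" field in the nvidia-smi header with a cudatoolkit version string. The function names are hypothetical, and the plain major.minor comparison is a simplifying assumption: it is conservative and ignores NVIDIA's minor-version compatibility within CUDA 11.x.

```python
import re


def driver_max_cuda(smi_text):
    """Extract the 'CUDA Version' field from nvidia-smi text as (major, minor)."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def parse_version(version):
    """Turn a version string such as '11.3.1' into a (major, minor) tuple."""
    parts = version.split(".") + ["0"]  # pad so that a bare '11' still parses
    return int(parts[0]), int(parts[1])


def toolkit_fits_driver(smi_text, toolkit_version):
    """Return True if the toolkit version is not newer than the driver's CUDA version."""
    max_cuda = driver_max_cuda(smi_text)
    if max_cuda is None:
        raise ValueError("no 'CUDA Version' field found in nvidia-smi output")
    return parse_version(toolkit_version) <= max_cuda


# Header line taken from the nvidia-smi output in the question.
header = "| NVIDIA-SMI 419.17 Driver Version: 419.17 CUDA Version: 10.1 |"
print(toolkit_fits_driver(header, "11.3.1"))  # False: toolkit is newer than the driver supports
print(toolkit_fits_driver(header, "10.1"))    # True
```

With the values from the question, cudatoolkit 11.3.1 fails the check and cudatoolkit 10.1 passes, which matches what worked in the end.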
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
