Why is TensorFlow inference time on GPU slower than MATLAB inference time on GPU?
I have trained a DenseNet-based CNN in MATLAB using the Deep Learning Toolbox. It takes an input of shape 1x5000x12. My training data structure has size 1x5000x12xn, with n being the number of images. The CNN takes the i-th 1x5000x12 slice (i.e., the i-th image of the training data) and makes a binary classification.
I use the following code to evaluate the CNN with testing data of the same shape as the training data.
When doing the inference with CPU:
%%
len = 64;
YPred_1 = categorical(zeros(len,1));
scores_1 = zeros(len,2);
net = net_fv;
%%
tic;
for i = 1:len
    [YPred_1(i), scores_1(i,:)] = classify(net, X(:,:,:,i), 'ExecutionEnvironment', 'cpu');
end
toc;
>>Elapsed time is 10.131882 seconds.
When using GPU:
%%
len = 64;
YPred_1 = categorical(zeros(len,1));
scores_1 = zeros(len,2);
net = net_fv;
%%
tic;
for i = 1:len
    [YPred_1(i), scores_1(i,:)] = classify(net, X(:,:,:,i), 'ExecutionEnvironment', 'gpu');
end
toc;
>>Elapsed time is 1.853006 seconds.
GPU inference is faster than CPU inference in MATLAB.
Later, I exported the model to use it with TensorFlow on my PC, and the results were unexpected:
Code:
import os
os.add_dll_directory("C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2/bin")
# The next two lines disable GPU usage so that the CPU is used.
# When they are commented out, the GPU is used.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
import tensorflow as tf
import numpy as np
from tensorflow import keras
import time

if __name__ == "__main__":
    labels = np.load(r'D:\tfg\codigo\proyect\final_version\labels_batch_0.npy')
    predictions = np.zeros((64,))
    pathfile = r'D:\tfg\codigo\proyect\final_version\ECG_net.pb'
    model = tf.saved_model.load(pathfile)
    inference = model.signatures["serving_default"]
    raw_input = np.load(r'D:\tfg\codigo\database\test_data\test_dataset_batch_0.npy')
    input = raw_input.astype(np.float32)
    start = time.time()
    for i in range(64):
        tensor = input[np.newaxis, i, :, :, :]
        # result = np.argmax(np.asarray(inference(imageinput=tensor)['softmax']))
        # print(result)
        # predictions[i] = result
        # print(inference(imageinput=tensor))
        inference(imageinput=tensor)
    end = time.time()
    print("The time of execution of the above program is:", end - start)
When doing inference with CPU:
The time of execution of above program is : 3.842860221862793
When doing inference with GPU:
The time of execution of above program is : 4.381521940231323
Now, inference on the GPU is slower than inference on the CPU.
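One confound I wanted to rule out is that the timed loop includes one-time setup in the first call (e.g., graph tracing or GPU kernel autotuning), which would inflate the GPU measurement. A generic sketch of timing with untimed warm-up iterations, using a hypothetical `timed_loop` helper and a stand-in workload instead of the real TF signature call:

```python
import time

def timed_loop(fn, samples, warmup=3):
    # Hypothetical helper: run a few untimed warm-up calls first, so one-time
    # costs (graph tracing, kernel autotuning) are excluded from the timing.
    for _ in range(warmup):
        fn(samples[0])
    start = time.perf_counter()
    for x in samples:
        fn(x)
    return time.perf_counter() - start

# Stand-in workload: record each call instead of running the real model.
calls = []
elapsed = timed_loop(calls.append, list(range(64)))
print(len(calls))  # 3 warm-up calls on samples[0] plus 64 timed calls -> 67
```

With the real model, `fn` would be something like `lambda x: inference(imageinput=x)`, and only the steady-state per-sample latency would be timed.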
Finally, running the same code on Google Colab shows the following:
When doing inference with CPU:
The time of execution of above program is : 12.010186910629272
When doing inference with GPU:
The time of execution of above program is : 1.1063041687011719
Now, in Colab, the GPU is again faster than the CPU.
I honestly do not understand what is happening: the same code behaves as expected on Colab but does something strange on my PC. Moreover, on the same hardware, MATLAB behaves the same way as Google Colab.
How can I make TensorFlow run faster on the GPU on my PC?
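One idea I have considered is that on a small GPU like the MX150, per-call overhead (host-to-device transfer, kernel launch) may dominate when classifying one sample at a time, so batching all 64 samples into a single call might help. A NumPy-only sketch of the shape change, assuming (as the indexing in my loop suggests) that the saved array has the batch on its first axis:

```python
import numpy as np

# Dummy data with the same layout the loop implies: batch axis first.
raw = np.zeros((64, 1, 5000, 12), dtype=np.float32)

# Per-sample path (what the loop does): 64 separate calls, one slice each.
one = raw[np.newaxis, 0, :, :, :]
print(one.shape)  # (1, 1, 5000, 12)

# Batched path: a single call on the whole array, e.g.
#   inference(imageinput=raw)
# would amortize transfer and launch overhead across all 64 samples.
print(raw.shape)  # (64, 1, 5000, 12)
```

Whether the exported signature accepts a batch dimension larger than 1 is an assumption here; it would depend on how the model was exported from MATLAB.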
Data:
OS: Windows 10 home, 64-bit
GPU: NVIDIA GeForce MX150
GPU driver: 512.15
CUDA version: 11.2.0
cuDNN version: 8.1.0.7
TensorFlow version: 2.6.0
Python version: 3.9.12
MATLAB version: R2021b
Description of my GPU from MATLAB:
ans =
CUDADevice with properties:
Name: 'NVIDIA GeForce MX150'
Index: 1
ComputeCapability: '6.1'
SupportsDouble: 1
DriverVersion: 11.6000
ToolkitVersion: 11
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 2.1474e+09
AvailableMemory: 1.6376e+09
MultiprocessorCount: 3
ClockRateKHz: 1531500
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceAvailable: 1
DeviceSelected: 1
Thanks in advance!
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow