PyTorch Model Prediction on GPU or CPU: Speed Improvement
Running a multi-layer perceptron (MLP) model on the CPU is faster than running it on the GPU. I load the model like this:
device = torch.device("cuda")
MODEL = MLP(num_classes=len(MODEL_META["labels"])).to(device)
checkpoint = torch.load(path, map_location=device)
MODEL.load_state_dict(checkpoint)
I run inference with the following code:
for i in range(0, len(data), 256):  # step through data in 256-sample windows
    v = data[i:i + 256]
    v = np.pad(v, (0, 1600 - len(v)), 'constant')  # zero-pad each window to 1600 features
    x = torch.from_numpy(v).float().view(-1, 1600).to(device=device)
    with torch.no_grad():
        out = MODEL(x)
On the same data, the GPU finishes this loop in ~3.18 seconds while the CPU finishes in ~2.54 seconds.
If I instead load a convolutional neural network trained on the same data, the GPU executes in ~4.28 seconds and the CPU in ~8.11 seconds.
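One caveat with these numbers: CUDA kernel launches are asynchronous, so a plain wall-clock measurement can stop before the GPU has actually finished the work. A small timing helper (a sketch using only the standard library; the `sync` hook is where `torch.cuda.synchronize()` would go when timing GPU code) makes this explicit:

```python
import time

def timed(fn, *args, sync=None, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds).

    For CUDA code, pass sync=torch.cuda.synchronize so that queued GPU
    kernels finish before the clock stops; otherwise the measurement may
    only cover kernel *launch* time, not execution time.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    if sync is not None:
        sync()  # e.g. torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    return result, elapsed

# Example with a CPU-only workload (no sync needed):
result, secs = timed(sum, range(1_000_000))
print(result, secs)
```

The CNN-vs-MLP gap in the measurements above is still plausible either way: a CNN has far more arithmetic per byte of input, which is exactly where a GPU pays off.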
Using multiprocessing I can split the work across CPU cores like this:
jobs = []
for i in range(threads):
    p = multiprocessing.Process(target=my_search_function, args=params)
    jobs.append(p)
    p.start()
for proc in jobs:
    proc.join()
Just by using 2 CPU cores I get nearly GPU-level performance.
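The snippet above leaves out how the data is split and how results come back. A minimal runnable sketch of the same idea, assuming a hypothetical `my_search_function` as a stand-in for the real per-chunk inference, and using `multiprocessing.Pool` instead of raw `Process` objects so return values are collected automatically:

```python
import multiprocessing

def my_search_function(chunk):
    # Hypothetical stand-in for the real per-chunk model inference.
    return [x * 2 for x in chunk]

def run_parallel(data, n_workers=2):
    """Split `data` into one chunk per worker and process the chunks in parallel."""
    size = (len(data) + n_workers - 1) // n_workers        # ceil division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with multiprocessing.Pool(n_workers) as pool:
        results = pool.map(my_search_function, chunks)     # one result list per chunk
    return [item for chunk in results for item in chunk]   # flatten back in order

if __name__ == "__main__":
    print(run_parallel(list(range(8)), n_workers=2))
```

Note that for a CPU-bound workload like this, processes (not threads) are required in CPython because of the GIL.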
Setup: Proxmox virtual machine with a 12-core Ryzen 9 3900X, a GTX 1060 6 GB (GPU passthrough), 8 GB RAM, Ubuntu 20.04.4 LTS.
Am I doing something wrong, or is this the correct behaviour? Any tips to improve performance?
Solution 1:[1]
@Zoom this is my NN model:
class MLP(nn.Module):
    def __init__(self, num_classes=6):
        super(MLP, self).__init__()
        self.fc1 = nn.Linear(1600, 128)
        self.fc2 = nn.Linear(128, 256)
        self.fc3 = nn.Linear(256, 512)
        self.fc4 = nn.Linear(512, 512)    # note: defined but never called in forward()
        self.fc5 = nn.Linear(512, num_classes)
        self.dropout = nn.Dropout(0.2)
        self.relu1 = nn.ReLU()
        self.relu2 = nn.ReLU()
        self.relu3 = nn.ReLU()
        self.softmax = nn.Softmax(dim=1)  # note: defined but never called in forward()

    def forward(self, x):
        x = x.view(-1, 1600)  # flatten input
        x = self.relu1(self.fc1(x))
        x = self.relu2(self.fc2(x))
        x = self.relu3(self.fc3(x))
        x = self.dropout(x)
        x = self.fc5(x)
        return x
and I use it like this:
device = torch.device("cuda")
MODEL = MLP(num_classes=len(MODEL_META["labels"])).to(device)
checkpoint = torch.load(path, map_location=device)
MODEL.load_state_dict(checkpoint)
for i in range(0, len(data), 256):  # step through data in 256-sample windows
    v = data[i:i + 256]
    v = np.pad(v, (0, 1600 - len(v)), 'constant')  # zero-pad each window to 1600 features
    x = torch.from_numpy(v).float().view(-1, 1600).to(device=device)
    with torch.no_grad():
        out = MODEL(x)
By my understanding of batching, I can pass a list of "x" tensors and get back a list of predictions, like this:
L = []
for i in range(0, len(data), 256):
    v = data[i:i + 256]
    v = np.pad(v, (0, 1600 - len(v)), 'constant')
    x = torch.from_numpy(v).float().view(-1, 1600).to(device=device)
    L.append(x)
with torch.no_grad():
    out = MODEL(L)  # note: forward() expects a single tensor, not a Python list
where "out" would be a list of tensors of the same length as the input list?
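Not exactly: a module's `forward()` expects a single tensor, not a Python list, so the per-window tensors have to be stacked into one `(N, 1600)` batch first; the output is then a single `(N, num_classes)` tensor, one row per input. A minimal sketch with random data and a stand-in `nn.Linear` in place of the trained MLP (hypothetical names; the real `MODEL` behaves the same way):

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical stand-in for the trained MLP; any nn.Module batches the same way.
model = nn.Linear(1600, 6)

# Build the padded rows as in the loop above, but collect plain arrays
# instead of separate tensors.
rows = []
for _ in range(10):
    v = np.random.rand(256).astype(np.float32)     # one 256-sample window
    v = np.pad(v, (0, 1600 - len(v)), 'constant')  # zero-pad to 1600 features
    rows.append(v)

batch = torch.from_numpy(np.stack(rows))  # one (10, 1600) tensor

with torch.no_grad():
    out = model(batch)  # (10, 6): one prediction row per input window

print(out.shape)
```

Batching like this is also the main lever for the original GPU-vs-CPU question: one forward pass over a large batch amortizes the per-call transfer and launch overhead that dominates when feeding the GPU one sample at a time.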
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Adiz |
