How to calculate the theoretical inference time of a network on a GPU?
I am trying to estimate how long a GPU would take to run inference on a DL network. However, when testing the method, the theoretical and measured computing times turn out to be completely different.
Here is what I am currently doing:
I obtained the network's FLOPs using https://github.com/Lyken17/pytorch-OpCounter (the thop package) as follows:
from thop import profile  # pip install thop (pytorch-OpCounter)

macs, params = profile(model, inputs=(image,))
tera_flop = macs * 2 * 10 ** -12  # thop reports MACs; 1 MAC = 2 FLOPs
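For context, a self-contained version of this step looks like the following (the resnet18 model and input shape are illustrative placeholders, not my actual network):

import torch
from torchvision.models import resnet18
from thop import profile

model = resnet18()                   # placeholder network
image = torch.randn(1, 3, 224, 224)  # placeholder batched input
macs, params = profile(model, inputs=(image,))
tera_flop = macs * 2 * 10 ** -12     # MACs -> TFLOPs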
This gave 0.0184295 TFLOPs per inference. Then I calculated the peak FLOPS of my GPU (an NVIDIA RTX A3000):
4096 CUDA cores * 1560 MHz * 2 FLOPs/cycle * 10^-6 = 12.78 TFLOPS
Which gave me a theoretical inference time of:
0.0184 TFLOPs / 12.7795 TFLOPS = 0.00144 s
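The same arithmetic as a quick Python check (the factor of 2 assumes one fused multiply-add per CUDA core per cycle, which I am not sure holds for every layer):

peak_tflops = 4096 * 1560 * 2 * 10 ** -6   # cores * MHz * FLOPs/cycle -> ~12.78 TFLOPS
model_tflops = 0.0184295                   # from the profiler above
print(model_tflops / peak_tflops)          # ~0.00144 s theoretical inference time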
Then I measured the real inference time as follows:
import numpy as np
import torch

model.eval()
model.to(device)
image = image.unsqueeze(0).to(device)
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
reps = 300
timings = np.zeros((reps, 1))

with torch.no_grad():
    # GPU warm-up so lazy initialization does not pollute the measurements
    for _ in range(10):
        _ = model(image)
    torch.cuda.synchronize()
    # Measure performance
    for rep in range(reps):
        start.record()
        _ = model(image)
        end.record()
        # Wait for the GPU to finish before reading the timer
        torch.cuda.synchronize()
        timings[rep] = start.elapsed_time(end)  # milliseconds

mean_syn = np.sum(timings) * 10 ** -3 / reps  # mean latency in seconds
This gave a measured inference time of 0.028 s, roughly 20x the theoretical estimate.
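As a sanity check on the event-based timings, I would expect a plain wall-clock measurement with explicit synchronization (a minimal sketch, reusing model, image, and reps from above) to report a similar number:

import time
import torch

with torch.no_grad():
    torch.cuda.synchronize()  # drain any queued work first
    t0 = time.perf_counter()
    for _ in range(reps):
        _ = model(image)
    torch.cuda.synchronize()  # wait for all kernels to finish
    wall_mean = (time.perf_counter() - t0) / reps  # seconds per inference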
Could you please help me figure out what I am doing wrong here?
Sources
Source: Stack Overflow, licensed under CC BY-SA 3.0.
