'Big difference in execution time for first and subsequent run of cupy functions

When I run cupy functions on cupy arrays, the first call of a function takes significantly longer than the second run, even if I run it on a different array the second time.

Why is this?

import cupy as cp
cp.__version__
# 7.5.0

A = cp.random.random((1024, 1024))
B = cp.random.random((1024, 1024))

from time import time
def test(func, *args):
    t = time()
    func(*args)
    print("{}".format(round(time() - t, 4)))
    
test(cp.fft.fft2, A)
test(cp.fft.fft2, B)
# 0.129
# 0.001
test(cp.matmul, A, A.T)
test(cp.matmul, B, B.T)
# 0.171
# 0.0
test(cp.linalg.inv, A)
test(cp.linalg.inv, B)
# 0.259
# 0.002


Solution 1:[1]

As per cupy user guide:

Context Initialization: It may take several seconds when calling a CuPy function for the first time in a process. This is because the CUDA driver creates a CUDA context during the first CUDA API call in CUDA applications.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 prafi