Why is numpy inverse square root "x**(-1/2)" so much slower than "1/np.sqrt(x)"?

In numpy, taking the square root with np.sqrt and raising to the power 1/2 are almost indistinguishable in speed. However, when computing the inverse square root, raising to the power -1/2 is about 10x slower than 1 / np.sqrt.

# Python 3.10.2; numpy 1.22.1; clang-1205.0.22.11; macOS 12.1
import numpy as np

arr = np.random.uniform(0, 1, 10000)


%timeit -n 10000 np.sqrt(arr)
%timeit -n 10000 arr**(1/2)

%timeit -n 10000 1 / np.sqrt(arr)
%timeit -n 10000 arr**(-1/2)

Output:

10.8 µs ± 472  ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
9.97 µs ± 449  ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

18.2 µs ± 673  ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
187  µs ± 13.1 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
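
For anyone who wants to reproduce this outside IPython, here is a minimal sketch of the same comparison using the standard-library timeit module instead of the %timeit magic. The helper name time_expressions, the fixed seed, and the small positive lower bound on the samples are my own assumptions for illustration and are not part of the original measurement. Absolute timings will vary with the numpy version, compiler, and hardware.

```python
import timeit

import numpy as np


def time_expressions(n=10_000, calls=1_000):
    """Time the four expressions from the question (hypothetical helper)."""
    rng = np.random.default_rng(0)
    # Assumption: keep samples strictly positive so the reciprocals stay finite.
    arr = rng.uniform(1e-3, 1.0, n)

    candidates = {
        "np.sqrt(arr)": lambda: np.sqrt(arr),
        "arr**(1/2)": lambda: arr ** (1 / 2),
        "1 / np.sqrt(arr)": lambda: 1 / np.sqrt(arr),
        "arr**(-1/2)": lambda: arr ** (-1 / 2),
    }

    results = {}
    for label, fn in candidates.items():
        # Take the best of 5 repeats and convert to microseconds per call.
        best = min(timeit.repeat(fn, number=calls, repeat=5))
        results[label] = best / calls * 1e6
    return arr, results


if __name__ == "__main__":
    arr, results = time_expressions()
    # Sanity check: both inverse-square-root spellings agree numerically.
    assert np.allclose(1 / np.sqrt(arr), arr ** (-1 / 2))
    for label, micros in results.items():
        print(f"{label:>18}: {micros:8.2f} µs per call")
```

The allclose check is there only to confirm that the two inverse-square-root expressions compute the same values, so any difference in the timings is a performance difference and not a difference in results.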

Can someone more familiar with the source implementation explain the difference?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
