NumPy: random seed and multithreading cause differing results

Tested on Python 3.7, numpy 1.17.3:

It seems that random number generation in numpy does not give consistent results when a fixed seed is combined with multithreading. The issue does not come up with scipy. The following snippet shows the problem:

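import numpy as np
from scipy.stats import nbinom

from concurrent.futures import ThreadPoolExecutor, as_completed
from os import cpu_count  # used for max_workers in the scipy run below


def load_data_np():
    # Reseeds numpy's global random state, which all threads share.
    np.random.seed(0)
    return np.random.negative_binomial(5, 0.3, size=2)


def load_data_scipy():
    # The seed is passed per call instead of through global state.
    return nbinom.rvs(5, 0.3, size=2, random_state=0)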

Both functions should therefore always return the same numbers. But when the data is produced in a threaded loop...

with ThreadPoolExecutor() as executor:
    futures = list(
        (executor.submit(load_data_np)
         for i in range(1000))
    )
    print(np.diff([future.result() for future in as_completed(futures)]))

one can find values like these in the numpy output:

...
 [  4]
 [ -3]
 [-15]
 [ -3]
 [  5]
 [ -6]
 [  0]
 [  6]
 [  1]
 [-13]
 [ -7]
 [  3]
 [  6]
 [ -2]
 [ -1]
 [-11]
 [  3]
...

This must mean that, somewhere between seeding and finishing the draw of the 2 samples (size=2), another thread reset or advanced the shared random state, which throws the other threads off in their RNG sequence. For comparison, the same loop with scipy:

with ThreadPoolExecutor(max_workers=cpu_count()) as executor:
    futures = list(
        (executor.submit(load_data_scipy)
         for i in range(1000))
    )
    print(np.diff([future.result() for future in as_completed(futures)]))

yields the same values on every iteration:

...
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
...
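
A numpy-only sketch of what the scipy call seems to be doing: assuming scipy turns an integer random_state into a fresh RandomState for each call (as its documentation describes), a private np.random.RandomState(0) created inside the function should keep the shared global state out of the picture. The name load_data_local is made up for illustration.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def load_data_local():
    # Hypothetical helper: a private RandomState per call, so no other
    # thread can reseed or advance it between seeding and drawing.
    rng = np.random.RandomState(0)
    return rng.negative_binomial(5, 0.3, size=2)


with ThreadPoolExecutor() as executor:
    futures = [executor.submit(load_data_local) for _ in range(1000)]
    results = np.array([future.result() for future in futures])

# Every call starts from the same private state, so all rows are
# identical and only one distinct row remains.
print(len(np.unique(results, axis=0)))
```

This gives every call the same two numbers, which matches the intent of load_data_np, at the cost of constructing a new generator per call.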

So what is the proper way to do thread-safe random number generation with a fixed seed in numpy? Googling the issue has only led me back to np.random.seed.
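
One possible direction, sketched under assumptions rather than offered as the definitive answer: numpy 1.17 ships np.random.default_rng and np.random.SeedSequence, and a root SeedSequence can spawn one child seed per task. Each task then builds its own Generator, so nothing is shared between threads. Unlike the snippets above, every task gets a different but reproducible stream. The names N_TASKS, load_data_stream and run_once are hypothetical.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

N_TASKS = 1000  # hypothetical task count, matching the loops above


def load_data_stream(seed_seq):
    # Each task owns its Generator, so no state is shared between threads.
    rng = np.random.default_rng(seed_seq)
    return rng.negative_binomial(5, 0.3, size=2)


def run_once():
    # Spawn one child SeedSequence per task from the same root seed.
    children = np.random.SeedSequence(0).spawn(N_TASKS)
    with ThreadPoolExecutor() as executor:
        # executor.map returns results in submission order.
        return np.array(list(executor.map(load_data_stream, children)))


first = run_once()
second = run_once()

# Two full runs with the same root seed should agree element by element.
print(np.array_equal(first, second))
```

Results are collected with executor.map instead of as_completed so that the row order does not depend on thread scheduling.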

Cheers, Michael



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
