Why is it thread-safe to perform lazy initialization in Python?

I just read this blog post about a recipe to lazily initialize an object property. I am a recovering Java programmer, and if this code were translated into Java, it would be considered a race condition (double-checked locking). Why does it work in Python? I know there is a threading module in Python. Are locks added surreptitiously by the interpreter to make this thread-safe?

How does canonical thread-safe initialisation look in Python?



Solution 1:[1]

  1. No, no locks are added automatically.
  2. That's why this code is not thread-safe.
  3. If it seems to work in a multi-threaded program without problems, it's probably due to the Global Interpreter Lock, which makes the hazard less likely to occur.
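
To make the hazard in point 2 concrete, here is a minimal sketch. The blog post's recipe is not reproduced in this thread, so `NaiveLazyProperty` is a hypothetical stand-in for an unsynchronized check-then-set descriptor, and `Resource`, `calls` and `both_inside` are names invented for the demo. The barrier only exists to force both threads to overlap inside the getter; in real code the overlap depends on timing.

```python
import threading


class NaiveLazyProperty:
    """Hypothetical stand-in for an unsynchronized lazy-property recipe."""

    def __init__(self, func):
        self._func = func
        self.__name__ = func.__name__

    def __get__(self, obj, klass=None):
        if obj is None:
            return self
        # Check-then-act with no lock: two threads can both pass this check
        # before either of them stores the value.
        if self.__name__ not in obj.__dict__:
            obj.__dict__[self.__name__] = self._func(obj)
        return obj.__dict__[self.__name__]


calls = []                                      # one entry per getter run
both_inside = threading.Barrier(2, timeout=5)   # demo-only: forces the overlap


class Resource:
    @NaiveLazyProperty
    def value(self):
        calls.append(threading.get_ident())
        both_inside.wait()   # wait until the other thread is in here too
        return object()


res = Resource()


def reader():
    res.value


threads = [threading.Thread(target=reader) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(calls))  # the getter ran once per thread, not once per instance
```

Both threads find the attribute missing and both run the getter, so the "initialize once" guarantee is gone. The GIL does not prevent this, because a thread switch can happen between the membership check and the store.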

Solution 2:[2]

This code is not thread-safe.

Determining thread safety

You could check thread-safety by stepping through the bytecode, like:

from dis import dis

dis('a = [] \n'
    'a.append(5)')
# Under CPython's GIL this is thread-safe: the append itself happens
# inside the single CALL_FUNCTION instruction
##  1           0 BUILD_LIST               0
##              3 STORE_NAME               0 (a)
##
##  2           6 LOAD_NAME                0 (a)
##              9 LOAD_ATTR                1 (append)
##             12 LOAD_CONST               0 (5)
##             15 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
##             18 POP_TOP
##             19 LOAD_CONST               1 (None)
##             22 RETURN_VALUE

dis('a = [] \n'
    'a = a + [5]')
# And this one isn't: `a` is read at offset 6 and written back at offset 16,
# and another thread may run in between
##  1           0 BUILD_LIST               0
##              3 STORE_NAME               0 (a)
##
##  2           6 LOAD_NAME                0 (a)
##              9 LOAD_CONST               0 (5)
##             12 BUILD_LIST               1
##             15 BINARY_ADD
##             16 STORE_NAME               0 (a)
##             19 LOAD_CONST               1 (None)
##             22 RETURN_VALUE

Note, however, that bytecode can change between releases, and thread safety can depend on the Python implementation you use (CPython, Jython, IronPython, etc.).
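
Since offsets and opcode names differ between releases, one way to look at this without relying on a printed listing is `dis.get_instructions`. The sketch below assumes CPython and only filters for the instructions that touch the name `a`; the `<demo>` filename and the variable names are placeholders.

```python
import dis

source = "a = []\na = a + [5]"
code = compile(source, "<demo>", "exec")

# Keep only the instructions that read or write the name `a`.
touching_a = [
    ins.opname for ins in dis.get_instructions(code) if ins.argval == "a"
]
print(touching_a)
```

On the CPython versions I am aware of, the second statement shows up as a load of `a` followed later by a separate store to `a`. Anything another thread does to `a` between those two instructions is overwritten by the store.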

So the general recommendation is: if you ever need thread safety, use synchronization mechanisms such as Locks, Queues, and Semaphores.
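
As a minimal sketch of that recommendation, here is a read-modify-write on a shared counter protected by a `threading.Lock`. The names `counter`, `counter_lock` and `add_many` are made up for the example.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_many(n):
    global counter
    for _ in range(n):
        # The lock makes "read counter, add one, write counter" one unit,
        # so no increment can be lost to a thread switch in the middle.
        with counter_lock:
            counter += 1


threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```

With the lock in place the result no longer depends on the interpreter or on how a particular release compiles `counter += 1`.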

Thread-safe version of LazyProperty

The descriptor you mentioned could be made thread-safe like this:

from threading import Lock

class LazyProperty(object):

    def __init__(self, func):
        self._func = func
        self.__name__ = func.__name__
        self.__doc__ = func.__doc__
        self._lock = Lock()

    def __get__(self, obj, klass=None):
        if obj is None: return None
        # __get__ may be called concurrently
        with self._lock:
            # another thread may have computed property value
            # while this thread was in __get__
            # line below added, thx @qarma for correction
            if self.__name__ not in obj.__dict__: 
                # none computed `_func` yet, do so (under lock) and set attribute
                obj.__dict__[self.__name__] = self._func(obj)
        # by now, attribute is guaranteed to be set,
        # either by this thread or another
        return obj.__dict__[self.__name__]
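
A quick way to convince yourself that this holds is to hammer one instance from several threads. The sketch below repeats a condensed copy of the descriptor so that it runs on its own; `Report`, `build_calls` and `reader` are hypothetical names, and the `sleep` is only there to widen the window in which the other threads arrive.

```python
import threading
import time


class LazyProperty:
    def __init__(self, func):
        self._func = func
        self.__name__ = func.__name__
        self._lock = threading.Lock()

    def __get__(self, obj, klass=None):
        if obj is None:
            return None
        with self._lock:
            # Re-check under the lock: another thread may have stored it.
            if self.__name__ not in obj.__dict__:
                obj.__dict__[self.__name__] = self._func(obj)
        return obj.__dict__[self.__name__]


class Report:
    def __init__(self):
        self.build_calls = 0

    @LazyProperty
    def data(self):
        self.build_calls += 1   # only ever runs while the lock is held
        time.sleep(0.05)        # simulate slow initialization
        return [1, 2, 3]


report = Report()
seen = []


def reader():
    seen.append(report.data)


threads = [threading.Thread(target=reader) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(report.build_calls)                      # 1
print(all(item is seen[0] for item in seen))   # True
```

Note that the lock lives on the descriptor, so it is shared by all instances of `Report`: two different instances initializing `data` at the same time will take turns.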

Canonical thread-safe initialization

For a canonical thread-safe initialization, you can write a metaclass that acquires a lock when an instance is about to be created and releases it once the instance has been created:

from threading import Lock

class ThreadSafeInitMeta(type):
    def __new__(metacls, name, bases, namespace, **kwds):
        # here we add lock to !!class!! (not instance of it)
        # class could refer to its lock as: self.__safe_init_lock
        # see name mangling for details
        namespace['_{}__safe_init_lock'.format(name)] = Lock()
        return super().__new__(metacls, name, bases, namespace, **kwds)

    def __call__(cls, *args, **kwargs):
        lock = getattr(cls, '_{}__safe_init_lock'.format(cls.__name__))
        with lock:
            retval = super().__call__(*args, **kwargs)
        return retval


class ThreadSafeInit(metaclass=ThreadSafeInitMeta):
    pass

######### Use as follows #########
# class MyCls(..., ThreadSafeInit):
#     def __init__(self, ...):
#         ...
##################################

'''
class Tst(ThreadSafeInit):
    def __init__(self, val):
        print(val, self.__safe_init_lock)
'''
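
To see the effect, the sketch below uses a condensed, hypothetical variant of the metaclass (`SerializedInitMeta`, with the lock stored as a plain `_init_lock` class attribute instead of a mangled name) and counts how many constructors run at the same time. `Connection`, `active` and `max_active` are invented for the demo.

```python
import threading
import time


class SerializedInitMeta(type):
    """Condensed variant of the idea above: one creation lock per class."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        cls._init_lock = threading.Lock()

    def __call__(cls, *args, **kwargs):
        # Instance creation (__new__ plus __init__) happens under the lock.
        with cls._init_lock:
            return super().__call__(*args, **kwargs)


class Connection(metaclass=SerializedInitMeta):
    active = 0       # constructors currently running
    max_active = 0   # highest overlap observed

    def __init__(self):
        cls = type(self)
        cls.active += 1
        cls.max_active = max(cls.max_active, cls.active)
        time.sleep(0.01)   # simulate slow setup
        cls.active -= 1


threads = [threading.Thread(target=Connection) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(Connection.max_active)  # 1
```

One caveat with a plain `Lock`: if `__init__` creates another instance of the same class, the second creation waits for a lock the first one still holds and the thread deadlocks. A `threading.RLock` avoids that particular case.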

Something completely different from the metaclass solution

And finally, if you need a simpler solution, just create a common init lock and create instances while holding it:

from threading import Lock
MyCls._inst_lock = Lock()  # monkey patching (or subclass if you dislike that)
...
with MyCls._inst_lock:
    myinst = MyCls()

However, it's easy to forget to take the lock, which can lead to some very interesting debugging sessions. A class decorator is also possible, but in my opinion it would be no better than the metaclass solution.

Solution 3:[3]

To expand on @thodnev's answer, here's how to protect lazy property initialisation:

import threading

class LazyProperty(object):

    def __init__(self, func):
        self._func = func
        self.__name__ = func.__name__
        self.__doc__ = func.__doc__
        self.lock = threading.Lock()

    def __get__(self, obj, klass=None):
        if obj is None: return None
        # __get__ may be called concurrently
        with self.lock:
            # another thread may have computed property value
            # while this thread was in __get__
            if self.__name__ not in obj.__dict__:
                # none computed `_func` yet, do so (under lock) and set attribute
                obj.__dict__[self.__name__] = self._func(obj)
        # by now, attribute is guaranteed to be set,
        # either by this thread or another
        return obj.__dict__[self.__name__]

Solution 4:[4]

A more performant solution based on @DimaTisnek's answer (see the comments for the reasoning):

import threading

class LazyProperty(object):
    def __init__(self, func):
        self._func = func
        self.__name__ = func.__name__
        self.__doc__ = func.__doc__
        self.lock = threading.Lock()

    def __get__(self, obj, klass=None):
        if obj is None:
            return None

        # Fast path: if the value is already cached, return it without
        # taking the lock. The lock belongs to this descriptor, so it is
        # shared by every instance of the owning class: while one instance
        # is slowly initializing, a thread that is already inside __get__
        # for an instance whose value has just been stored need not wait.
        if self.__name__ not in obj.__dict__:
            # __get__ may be called concurrently
            with self.lock:
                # another thread may have computed property value
                # while this thread was in __get__
                if self.__name__ not in obj.__dict__:
                    # none computed `_func` yet, do so (under lock) and set attribute
                    obj.__dict__[self.__name__] = self._func(obj)

        # by now, attribute is guaranteed to be set,
        # either by this thread or another
        return obj.__dict__[self.__name__]

Another version:

import threading

class LazyGetter(object):
    def __init__(self, func):
        self._func = func
        self._data_map = {}
        self.lock = threading.Lock()

    def get(self, obj):
        # Fast path: if the value for `obj` is already cached, skip the lock.
        # The lock is shared by every object served by this getter, so without
        # this check `get(x)` for an already-initialized `x` would block while
        # some other object `y` is slowly initializing.
        if obj not in self._data_map:
            with self.lock:
                # another thread may have computed the value
                # while this thread was waiting for the lock
                if obj not in self._data_map:
                    # none computed `_func` yet, do so (under lock) and set attribute
                    self._data_map[obj] = self._func(obj)

        # by now, attribute is guaranteed to be set,
        # either by this thread or another
        return self._data_map[obj]
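
The answer does not show how `LazyGetter` is meant to be used, so here is one possible sketch. `load_config`, `Service`, `config_of` and `load_count` are hypothetical names, and the class is repeated in condensed form so the example runs on its own.

```python
import threading
import time


class LazyGetter:
    def __init__(self, func):
        self._func = func
        self._data_map = {}
        self.lock = threading.Lock()

    def get(self, obj):
        if obj not in self._data_map:           # fast path, no lock
            with self.lock:
                if obj not in self._data_map:   # re-check under the lock
                    self._data_map[obj] = self._func(obj)
        return self._data_map[obj]


load_count = 0


def load_config(owner):
    global load_count
    load_count += 1     # only ever runs while the getter's lock is held
    time.sleep(0.05)    # simulate slow initialization
    return {"owner": owner.name}


class Service:
    def __init__(self, name):
        self.name = name


config_of = LazyGetter(load_config)
svc = Service("billing")
results = []


def reader():
    results.append(config_of.get(svc))


threads = [threading.Thread(target=reader) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(load_count)                                     # 1
print(all(cfg is results[0] for cfg in results))      # True
```

Two things to keep in mind with this variant: `obj` is used as a dictionary key, so it must be hashable, and `_data_map` holds a strong reference to every object it has seen, which keeps those objects alive for as long as the getter exists.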

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1
Solution 2
Solution 3    Dima Tisnek
Solution 4