Why is it thread-safe to perform lazy initialization in Python?
I just read this blog post about a recipe to lazily initialize an object property. I am a recovering Java programmer, and if this code were translated into Java it would be considered a race condition (double-checked locking). Why does it work in Python? I know there is a threading module in Python. Are locks added surreptitiously by the interpreter to make this thread-safe?
What does canonical thread-safe initialisation look like in Python?
Solution 1:[1]
- No, no locks are added automatically.
- That's why this code is not thread-safe.
- If it seems to work in a multi-threaded program without problems, it's probably due to the Global Interpreter Lock, which makes the hazard less likely to occur.
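The hazard can be made visible with a small sketch. The blog recipe is not reproduced in the question, so UnsafeLazyProperty below is a hypothetical stand-in for it, and Config, calls and both_inside are illustrative names. A Barrier forces both threads to be inside the property function at the same time, which is exactly the overlap that normally happens only occasionally:

```python
import threading


class UnsafeLazyProperty:
    """Hypothetical lazy property with no locking (stand-in for the recipe)."""

    def __init__(self, func):
        self._func = func
        self.__name__ = func.__name__

    def __get__(self, obj, klass=None):
        if obj is None:
            return self
        # No lock: every thread that arrives before the value is stored
        # computes it again.
        value = self._func(obj)
        obj.__dict__[self.__name__] = value
        return value


calls = []
both_inside = threading.Barrier(2, timeout=5)


class Config:
    @UnsafeLazyProperty
    def data(self):
        calls.append(threading.current_thread().name)
        both_inside.wait()  # hold each thread until the other one is here too
        return object()


cfg = Config()
results = []


def worker():
    results.append(cfg.data)


threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(calls))                # 2: the "once only" function ran twice
print(results[0] is results[1])  # False: the threads saw different objects
```

Without the barrier the same interleaving is still possible, just rare, which is why such code can appear to work.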
Solution 2:[2]
This code is not thread-safe.
Determining thread safety
You can check thread safety by stepping through the bytecode, for example:
from dis import dis

dis('a = [] \n'
    'a.append(5)')
# Here you can see that it's thread-safe: the append itself happens
# inside a single CALL_FUNCTION instruction
##  1           0 BUILD_LIST               0
##              3 STORE_NAME               0 (a)
##
##  2           6 LOAD_NAME                0 (a)
##              9 LOAD_ATTR                1 (append)
##             12 LOAD_CONST               0 (5)
##             15 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
##             18 POP_TOP
##             19 LOAD_CONST               1 (None)
##             22 RETURN_VALUE

dis('a = [] \n'
    'a = a + [5]')
# And this one isn't: a thread switch is possible between
# BINARY_ADD (offset 15) and STORE_NAME (offset 16)
##  1           0 BUILD_LIST               0
##              3 STORE_NAME               0 (a)
##
##  2           6 LOAD_NAME                0 (a)
##              9 LOAD_CONST               0 (5)
##             12 BUILD_LIST               1
##             15 BINARY_ADD
##             16 STORE_NAME               0 (a)
##             19 LOAD_CONST               1 (None)
##             22 RETURN_VALUE
However, be warned that bytecode changes between versions (the offsets and opcode names above come from an older CPython release), and thread safety can also depend on the Python implementation you use (CPython, Jython, IronPython, etc.).
So the general recommendation is: if you ever need thread safety, use synchronization mechanisms such as Locks, Queues and Semaphores.
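As a minimal sketch of that recommendation (counter, counter_lock and increment are illustrative names, not part of the answer), a Lock turns a read-modify-write update into a step that only one thread at a time can execute:

```python
import threading

counter = 0
counter_lock = threading.Lock()


def increment(times):
    global counter
    for _ in range(times):
        # Only one thread at a time may read, add and store.
        with counter_lock:
            counter += 1


threads = [threading.Thread(target=increment, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```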
Thread-safe version of LazyProperty
The descriptor you mentioned can be made thread-safe like this:
from threading import Lock

class LazyProperty(object):
    def __init__(self, func):
        self._func = func
        self.__name__ = func.__name__
        self.__doc__ = func.__doc__
        self._lock = Lock()

    def __get__(self, obj, klass=None):
        if obj is None:
            return None
        # __get__ may be called concurrently
        with self._lock:
            # another thread may have computed property value
            # while this thread was in __get__
            # line below added, thx @qarma for correction
            if self.__name__ not in obj.__dict__:
                # nobody has computed `_func` yet, do so (under lock) and set attribute
                obj.__dict__[self.__name__] = self._func(obj)
        # by now, attribute is guaranteed to be set,
        # either by this thread or another
        return obj.__dict__[self.__name__]
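A quick way to check the locking descriptor is to hit one property from several threads and count how often the wrapped function runs. The sketch below repeats the descriptor in compact form so that it runs on its own; Report, rows and calls are hypothetical names used only for the check:

```python
import threading
import time


class LazyProperty:
    def __init__(self, func):
        self._func = func
        self.__name__ = func.__name__
        self._lock = threading.Lock()

    def __get__(self, obj, klass=None):
        if obj is None:
            return None
        with self._lock:
            # Re-check under the lock: another thread may have stored the value.
            if self.__name__ not in obj.__dict__:
                obj.__dict__[self.__name__] = self._func(obj)
        return obj.__dict__[self.__name__]


calls = []


class Report:
    @LazyProperty
    def rows(self):
        calls.append(1)
        time.sleep(0.05)  # widen the window in which threads could collide
        return [1, 2, 3]


report = Report()
results = []
threads = [
    threading.Thread(target=lambda: results.append(report.rows))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(calls))                            # 1
print(all(r is results[0] for r in results)) # True
```

Note that the lock lives on the descriptor, so it is shared by every instance of the class that uses the property.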
Canonical thread-safe initialization
For a canonical thread-safe initialization, you can write a metaclass that acquires a lock when an instance is about to be created and releases it once the instance has been created:
from threading import Lock

class ThreadSafeInitMeta(type):
    def __new__(metacls, name, bases, namespace, **kwds):
        # here we add lock to !!class!! (not instance of it)
        # class could refer to its lock as: self.__safe_init_lock
        # see name mangling for details
        namespace['_{}__safe_init_lock'.format(name)] = Lock()
        return super().__new__(metacls, name, bases, namespace, **kwds)

    def __call__(cls, *args, **kwargs):
        lock = getattr(cls, '_{}__safe_init_lock'.format(cls.__name__))
        with lock:
            retval = super().__call__(*args, **kwargs)
        return retval

class ThreadSafeInit(metaclass=ThreadSafeInitMeta):
    pass

######### Use as follows #########
# class MyCls(..., ThreadSafeInit):
#     def __init__(self, ...):
#         ...
##################################
'''
class Tst(ThreadSafeInit):
    def __init__(self, val):
        print(val, self.__safe_init_lock)
'''
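To see what the metaclass buys you, the following sketch counts how many __init__ calls are in flight at once. The metaclass is repeated in compact form so the example runs on its own; Connection, active and max_active are hypothetical names introduced for the demonstration:

```python
import threading
import time


class ThreadSafeInitMeta(type):
    def __new__(metacls, name, bases, namespace, **kwds):
        # One lock per class, stored under the mangled attribute name.
        namespace['_{}__safe_init_lock'.format(name)] = threading.Lock()
        return super().__new__(metacls, name, bases, namespace, **kwds)

    def __call__(cls, *args, **kwargs):
        lock = getattr(cls, '_{}__safe_init_lock'.format(cls.__name__))
        with lock:
            return super().__call__(*args, **kwargs)


class Connection(metaclass=ThreadSafeInitMeta):
    active = 0       # __init__ calls currently running
    max_active = 0   # highest value of `active` ever observed

    def __init__(self):
        cls = type(self)
        cls.active += 1
        cls.max_active = max(cls.max_active, cls.active)
        time.sleep(0.01)  # pretend construction is slow
        cls.active -= 1


threads = [threading.Thread(target=Connection) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(Connection.max_active)  # 1: constructions never overlapped
```

Because a plain Lock is not reentrant, an __init__ that creates another instance of the same class would block forever; threading.RLock is the usual alternative in that case.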
Something completely different from the metaclass solution
And finally, if you need a simpler solution, just create a common init lock and create instances while holding it:
from threading import Lock

MyCls._inst_lock = Lock() # monkey patching | or subclass if you hate it
...
with MyCls._inst_lock:
    myinst = MyCls()
However, it's easy to forget to take the lock, which can lead to some very interesting debugging sessions. It is also possible to write a class decorator, but in my opinion it would be no better than the metaclass solution.
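For comparison, here is a rough sketch of what such a class decorator could look like. The name thread_safe_init and the Worker class are hypothetical, and unlike the metaclass this version only guards __init__, not __new__:

```python
import functools
import threading
import time


def thread_safe_init(cls):
    """Hypothetical decorator: run cls.__init__ under a per-class lock."""
    lock = threading.Lock()
    original_init = cls.__init__

    @functools.wraps(original_init)
    def locked_init(self, *args, **kwargs):
        with lock:
            original_init(self, *args, **kwargs)

    cls.__init__ = locked_init
    return cls


@thread_safe_init
class Worker:
    active = 0       # __init__ calls currently running
    max_active = 0   # highest value of `active` ever observed

    def __init__(self):
        Worker.active += 1
        Worker.max_active = max(Worker.max_active, Worker.active)
        time.sleep(0.01)  # pretend construction is slow
        Worker.active -= 1


threads = [threading.Thread(target=Worker) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(Worker.max_active)  # 1
```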
Solution 3:[3]
To expand on @thodnev's answer, here's how to protect lazy property initialisation:
import threading

class LazyProperty(object):
    def __init__(self, func):
        self._func = func
        self.__name__ = func.__name__
        self.__doc__ = func.__doc__
        self.lock = threading.Lock()

    def __get__(self, obj, klass=None):
        if obj is None:
            return None
        # __get__ may be called concurrently
        with self.lock:
            # another thread may have computed property value
            # while this thread was in __get__
            if self.__name__ not in obj.__dict__:
                # nobody has computed `_func` yet, do so (under lock) and set attribute
                obj.__dict__[self.__name__] = self._func(obj)
        # by now, attribute is guaranteed to be set,
        # either by this thread or another
        return obj.__dict__[self.__name__]
Solution 4:[4]
A more performant solution based on @DimaTisnek: (See comments for reasons)
import threading

class LazyProperty(object):
    def __init__(self, func):
        self._func = func
        self.__name__ = func.__name__
        self.__doc__ = func.__doc__
        self.lock = threading.Lock()

    def __get__(self, obj, klass=None):
        if obj is None:
            return None
        # If the value is already cached, there is no need to take the lock.
        # The lock belongs to the descriptor, so it is shared by every instance
        # of the class: while one instance is slowly initialising the property,
        # this check lets an access that already sees a cached value return
        # without waiting for the lock.
        if self.__name__ not in obj.__dict__:
            # __get__ may be called concurrently
            with self.lock:
                # another thread may have computed property value
                # while this thread was in __get__
                if self.__name__ not in obj.__dict__:
                    # nobody has computed `_func` yet, do so (under lock) and set attribute
                    obj.__dict__[self.__name__] = self._func(obj)
        # by now, attribute is guaranteed to be set,
        # either by this thread or another
        return obj.__dict__[self.__name__]
Another version:
import threading

class LazyGetter(object):
    def __init__(self, func):
        self._func = func
        self._data_map = {}
        self.lock = threading.Lock()

    def get(self, obj):
        # If the value is already cached, there is no need to take the lock.
        # For example, suppose the value for `a` is cached but the value for `b`
        # is not. `b` may be initialising slowly while holding the lock when
        # someone calls `get(a)`. Without this additional `if`, `get(a)` would
        # block until `b` finishes initialising, which is unnecessarily slow.
        if obj not in self._data_map:
            with self.lock:
                # another thread may have computed the value
                # while this thread was in get()
                if obj not in self._data_map:
                    # nobody has computed `_func` yet, do so (under lock) and store it
                    self._data_map[obj] = self._func(obj)
        # by now, the value is guaranteed to be stored,
        # either by this thread or another
        return self._data_map[obj]
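The effect of the outer check can be shown with a small sketch. LazyGetter is repeated in compact form so the example runs on its own; Item, load and the two Events are hypothetical names used only to hold one initialisation open while another, already cached value is read:

```python
import threading


class LazyGetter:
    def __init__(self, func):
        self._func = func
        self._data_map = {}
        self.lock = threading.Lock()

    def get(self, obj):
        if obj not in self._data_map:          # fast path: no lock when cached
            with self.lock:
                if obj not in self._data_map:  # re-check under the lock
                    self._data_map[obj] = self._func(obj)
        return self._data_map[obj]


started = threading.Event()
release = threading.Event()


class Item:
    def __init__(self, name, slow=False):
        self.name = name
        self.slow = slow


def load(item):
    if item.slow:
        started.set()            # tell the main thread we are inside the lock
        release.wait(timeout=5)  # simulate a slow initialisation
    return item.name.upper()


getter = LazyGetter(load)
fast, slow = Item("fast"), Item("slow", slow=True)

getter.get(fast)  # cache the fast item first
worker = threading.Thread(target=getter.get, args=(slow,))
worker.start()
started.wait(timeout=5)  # the worker now holds the lock inside load()

lock_was_held = getter.lock.locked()
value = getter.get(fast)  # served from the cache, no lock needed
release.set()
worker.join()

print(lock_was_held, value)  # True FAST
```

Keep in mind that the dictionary keys are the objects themselves, so they must be hashable and they stay alive for as long as the getter does.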
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
| Solution 3 | Dima Tisnek |
| Solution 4 | |
