Serializing lambdas and functions with dill. Is there a better (faster) way?
I'm writing an async scraper called scrubber on top of lxml, httpx and asyncio. From the beginning I wanted to outsource the CPU-bound parsing to a separate process. After failing several times and learning about pickle and other multiprocessing pitfalls, I found a way to let a worker process consume all that work, and I'm a little proud of this solution.
But it turns out that the serialization I need is terribly slow, and in the end I'm not sure whether this brings any advantage (I know, everybody warned me that serialization for multiprocessing could be a show stopper). One of the problems is the serialization with dill. I need the dill package because pickle cannot serialize lambdas, and I need lambdas to keep the namespace clean and to offer a smart, easy-to-use interface.
The test I ran looks like this:
import pickle
import dill
import timeit
# emulates the lambda a bit
def serialize_me(x):
    return 42 + x
p_t = timeit.timeit("pickle.dumps(serialize_me)",globals=globals(),number=10_000)
d_t = timeit.timeit("dill.dumps(serialize_me)",globals=globals(),number=10_000)
dl_t = timeit.timeit("dill.dumps(lambda x: 42)", globals=globals(), number=10_000)
dumped_p = pickle.dumps(serialize_me)
dumped_d = dill.dumps(serialize_me)
dumped_dl = dill.dumps(lambda x: 42)
ls_p = timeit.timeit("pickle.loads(dumped_p)", globals=globals(), number=10_000)
ls_d = timeit.timeit("dill.loads(dumped_d)", globals=globals(), number=10_000)
ls_dl = timeit.timeit("dill.loads(dumped_dl)", globals=globals(), number=10_000)
print("--serialize a function--")
print("pickle: ",p_t)
print("dill: ", d_t)
print("dill pure lambda: ", dl_t)
print("relative p/d: ", str(d_t/p_t))
print("--Compose that function--")
print("pickle: ", ls_p)
print("dill: ", ls_d)
print("dill lambda: ", ls_dl)
Resulting in:
--serialize a function--
pickle: 0.018466599998646416
dill: 5.7938681000014185
dill pure lambda: 5.5841117999989365
relative p/d: 313.7485027252501
--Compose that function--
pickle: 0.017203299998072907
dill: 0.19668400000227848
dill lambda: 0.1924564000000828
Under real conditions, measured with pytest-benchmark, the gap is not that big: dill is 3-10 times slower, but it will not get better than that.
That is still far from what one would expect. Yes, I read "Why is dill slow?", and yes, this was clear from the beginning, which is why I avoided dill at first. But in the end the lambdas came in and turned out to be very useful.
Now the question is whether I should drop the multiprocessing approach and hope that parsing will not block the event loop, dive into the internals of Python to write a pickler for lambdas, or do something else.
Meanwhile I'm using asyncio's `loop.run_in_executor` and have removed the producer/consumer design.
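A minimal sketch of that executor-based setup, assuming the parsing step can live in a module-level function (the name `parse_document` and the sample pages are hypothetical, not taken from scrubber). Module-level functions are pickled by reference, so the standard C pickler handles them and dill is not needed on this path:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


def parse_document(html):
    """Hypothetical stand-in for CPU-bound parsing: count opening tags."""
    return html.count("<") - html.count("</")


async def main():
    loop = asyncio.get_running_loop()
    # Sample input; a real scraper would pass the downloaded page text.
    pages = ["<html><body><p>a</p></body></html>", "<div><span>b</span></div>"]
    with ProcessPoolExecutor(max_workers=2) as pool:
        # Each call is pickled as (function reference, arguments) and
        # executed in a worker process, keeping the event loop free.
        futures = [loop.run_in_executor(pool, parse_document, page) for page in pages]
        return await asyncio.gather(*futures)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Only a reference to the function plus its arguments and result cross the process boundary here, so the serialization cost depends mostly on the size of the page text rather than on the callable.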
I also found a way to serialize lambdas, but I have not found a way to register it with pickle.
I will update this post as soon as I have it ready on GitHub.
Possible Answer
I made a small package, currently available only through GitHub. It works like a proxy around Python's marshal serializer and is about 100 times faster than dill for this isolated case.
https://github.com/cloasdata/lambdser
pip install lambdser
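The general idea of serializing a lambda through marshal can be sketched as follows. This is an illustration under stated assumptions, not the actual lambdser implementation: the helper names `dumps_lambda` and `loads_lambda` are hypothetical, closures are rejected, and the marshalled bytes are only valid for the Python version that produced them.

```python
import builtins
import marshal
import types


def dumps_lambda(func):
    """Serialize a closure-free function by marshalling its code object."""
    if func.__closure__ is not None:
        raise ValueError("closures are not supported in this sketch")
    return marshal.dumps((func.__code__, func.__name__, func.__defaults__))


def loads_lambda(data, namespace=None):
    """Rebuild the function; any globals it uses must exist in `namespace`."""
    code, name, defaults = marshal.loads(data)
    if namespace is None:
        # Give the rebuilt function access to builtins only.
        namespace = {"__builtins__": builtins}
    return types.FunctionType(code, namespace, name, defaults)


restored = loads_lambda(dumps_lambda(lambda x: 42 + x))
print(restored(1))
```

Because marshal is implemented in C and only the code object is written, this avoids most of the work that a general-purpose function pickler has to do, at the price of dropping globals and closures.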
Solution 1:[1]
I'm the dill author. Writing a new "python" serializer isn't the answer. In fact, when you compare dill to the pure-Python implementation of pickle, the speed is roughly the same:
Python 3.9.10 (main, Jan 15 2022, 12:09:07)
[Clang 10.0.1 (clang-1001.0.46.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> import dill
>>> import timeit
>>>
>>> def doit(x): return x
...
>>> timeit.timeit("dill.dumps(doit)", globals=globals(), number=10000)
1.315841937000016
>>> timeit.timeit("pickle._dumps(doit)", globals=globals(), number=10000)
1.1476000689999637
You'll note that I've used pickle._dumps, which uses the Python-based pickler. In recent versions of Python, pickle uses a C-based pickler by default (formerly known as cPickle), and it's roughly 100x faster, as you experienced.
>>> timeit.timeit("pickle.dumps(doit)", globals=globals(), number=10000)
0.01674603300000399
So, you'll need to write a C-based pickler to compete with the speed of pickle. That also means that all of the registered pickling solutions need to be written in C as well if you want roughly the same speed-up. And then you'll need to modify or fork multiprocessing to use your new serializer instead of pickle.
There is currently an effort in dill (and other serializers like cloudpickle) to better leverage the C pickling interface. If you'd like to contribute to either, you are more than welcome to do so.
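One hook that the standard library already exposes for this is `pickle.Pickler.reducer_override` (Python 3.8+), which the C-accelerated pickler consults before it falls back to pickling functions by name. The sketch below is an illustration under stated assumptions, not a description of how dill or cloudpickle are implemented: `LambdaPickler`, `_rebuild_function` and `dumps` are hypothetical names, only closure-free lambdas are handled, and the marshalled code object is tied to the Python version that produced it.

```python
import builtins
import io
import marshal
import pickle
import types


def _rebuild_function(code_bytes, name, defaults):
    """Recreate a function from marshalled code (runs on the loading side)."""
    code = marshal.loads(code_bytes)
    return types.FunctionType(code, {"__builtins__": builtins}, name, defaults)


class LambdaPickler(pickle.Pickler):
    """C pickler subclass that special-cases closure-free lambdas."""

    def reducer_override(self, obj):
        if (
            isinstance(obj, types.FunctionType)
            and obj.__name__ == "<lambda>"
            and obj.__closure__ is None
        ):
            args = (marshal.dumps(obj.__code__), obj.__name__, obj.__defaults__)
            return _rebuild_function, args
        # Everything else goes through the regular pickle machinery.
        return NotImplemented


def dumps(obj):
    buffer = io.BytesIO()
    LambdaPickler(buffer, protocol=pickle.HIGHEST_PROTOCOL).dump(obj)
    return buffer.getvalue()


payload = dumps({"transform": lambda x: 42 + x})
restored = pickle.loads(payload)
print(restored["transform"](1))
```

The receiving side only needs plain `pickle.loads`, provided the module that defines `_rebuild_function` is importable there. Plugging such a pickler into multiprocessing is a separate step that this sketch does not cover.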
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Mike McKerns |
