Convert Python pickle from protocol 5 to protocol 4
Problem:
Under Python 3.8, MLflow logs scikit-learn models using pickle protocol 5. However, Python 3.7 supports pickle protocols only up to 4, so scikit-learn cannot load these models under Python 3.7.
I wonder whether it's possible for a Python 3.8 script to automatically convert pickle v5 data to pickle v4. Basically, I want to do pickle.dumps(pickle.loads(data), protocol=4). The problem is that this requires every package and module referenced inside the pickled data to be installed. But maybe there is a lower-level way to transcode the data without reconstructing all the objects, just by converting the pickle data stream.
Is it possible to convert a pickle v5 data stream to a pickle v4 data stream without reconstructing all the pickled objects?
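For reference, the straightforward round trip mentioned above can be sketched as follows. The helper name repickle and the sample payload are made up for illustration; the approach only works when every class referenced by the pickle can be imported in the converting environment.

```python
import pickle


def repickle(data: bytes, protocol: int = 4) -> bytes:
    """Load a pickle and dump it again with an older protocol.

    This reconstructs every object, so all modules referenced by the
    pickle must be importable here.
    """
    return pickle.dumps(pickle.loads(data), protocol=protocol)


# Hypothetical payload standing in for a logged model.
original = {"weights": bytearray(b"\x00\x01"), "name": "model"}
v5 = pickle.dumps(original, protocol=5)
v4 = repickle(v5)
```

The first two bytes of the result are the PROTO opcode and the protocol number, so v4[:2] can be used as a quick check of which protocol a stream declares.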
Update 1:
I've checked the pickle disassembly of models logged by MLflow autologging under Python 3.7 and Python 3.8:
In Python 3.7:
# 0: \x80 PROTO 4
# 2: \x95 FRAME 65694
# 11: \x8c SHORT_BINUNICODE 'sklearn.ensemble._forest'
# 37: \x94 MEMOIZE (as 0)
# 38: \x8c SHORT_BINUNICODE 'RandomForestRegressor'
...
In Python 3.8+:
# 0: \x80 PROTO 5
# 2: \x95 FRAME 70185
# 11: \x8c SHORT_BINUNICODE 'sklearn.ensemble._forest'
# 37: \x94 MEMOIZE (as 0)
# 38: \x8c SHORT_BINUNICODE 'RandomForestRegressor'
...
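Listings like these can be produced with the standard pickletools module, which walks the opcode stream without importing or constructing anything. A minimal sketch, where the disassemble helper and the sample list are placeholders rather than the actual MLflow model file:

```python
import io
import pickle
import pickletools


def disassemble(data: bytes) -> str:
    """Return the symbolic disassembly of a pickle as text."""
    out = io.StringIO()
    pickletools.dis(data, out=out)
    return out.getvalue()


# Placeholder data; in practice `data` would be the bytes of the model file.
listing = disassemble(pickle.dumps(["example"], protocol=4))
print(listing)
```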
The Python 3.8 pickle uses the BYTEARRAY8 opcode, which does not exist in protocol 4.
(Side question: why does BYTEARRAY8 need to exist when there is already BINBYTES8? Is it just to save the time and memory of converting from bytes to bytearray?)
Maybe it's possible to convert the BYTEARRAY8 opcode to BINBYTES8 plus a call to the bytearray constructor.
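A rough sketch of that idea, using pickletools.genops to walk the stream without reconstructing any objects. The function name downgrade_to_protocol4 is made up, and the sketch rests on several assumptions: the pickle uses in-band data only (it gives up on NEXT_BUFFER and READONLY_BUFFER), FRAME opcodes are dropped because the rewritten lengths would no longer match and unframed protocol 4 data is still loadable, and no MEMOIZE is added so existing memo indices stay valid. It has not been tested against real MLflow models.

```python
import pickle
import pickletools
import struct

# Protocol <= 4 opcodes that push the builtins.bytearray class on the stack.
PUSH_BYTEARRAY_CLASS = (
    b"\x8c" + bytes([len(b"builtins")]) + b"builtins"      # SHORT_BINUNICODE
    + b"\x8c" + bytes([len(b"bytearray")]) + b"bytearray"  # SHORT_BINUNICODE
    + b"\x93"                                               # STACK_GLOBAL
)


def downgrade_to_protocol4(data: bytes) -> bytes:
    """Rewrite a protocol 5 pickle stream into protocol 4 opcodes."""
    ops = list(pickletools.genops(data))
    out = bytearray()
    for i, (op, arg, pos) in enumerate(ops):
        # The last opcode is always the one-byte STOP.
        end = ops[i + 1][2] if i + 1 < len(ops) else pos + 1
        if op.name == "PROTO":
            out += b"\x80\x04"
        elif op.name == "FRAME":
            continue  # framing is optional; drop it instead of recomputing
        elif op.name == "BYTEARRAY8":
            # bytearray(<bytes>) expressed as BINBYTES8 + TUPLE1 + REDUCE.
            out += PUSH_BYTEARRAY_CLASS
            out += b"\x8e" + struct.pack("<Q", len(arg)) + bytes(arg)
            out += b"\x85" + b"R"
        elif op.proto > 4:
            raise ValueError(f"cannot transcode opcode {op.name}")
        else:
            out += data[pos:end]  # copy the opcode and its argument verbatim
    return bytes(out)


# Small self-check with made-up data.
sample = {"buf": bytearray(b"abc"), "n": [1, 2, 3]}
converted = downgrade_to_protocol4(pickle.dumps(sample, protocol=5))
```

Even if the stream converts cleanly, the target environment still has to be able to import every function the pickle refers to, so this only removes the opcode-level incompatibility.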
Update 2:
It looks like NumPy arrays are pickled/unpickled differently:
Protocol 4 uses numpy.core.multiarray._reconstruct
Protocol 5 uses numpy.core.numeric._frombuffer
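Differences like this can be spotted without unpickling by listing the module-level names a stream refers to. The following is a sketch only; referenced_globals is a hypothetical helper that approximates the unpickler's stack by remembering recently pushed strings and the memo entries that hold strings.

```python
import collections
import pickle
import pickletools

STRING_OPS = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}


def referenced_globals(data: bytes) -> set:
    """Collect 'module.name' references without loading the pickle."""
    found = set()
    strings = []  # strings pushed so far, most recent last
    memo = {}     # memo index -> string (or None for non-strings)
    last = None   # value pushed by the previous opcode, if it was a string
    for op, arg, _ in pickletools.genops(data):
        if op.name in STRING_OPS:
            strings.append(arg)
            last = arg
        elif op.name == "MEMOIZE":
            memo[len(memo)] = last
        elif op.name in ("BINPUT", "LONG_BINPUT", "PUT"):
            memo[arg] = last
        elif op.name in ("BINGET", "LONG_BINGET", "GET"):
            last = memo.get(arg)
            if isinstance(last, str):
                strings.append(last)
        elif op.name == "GLOBAL":
            # Older protocols store "module name" as one argument.
            found.add(arg.replace(" ", ".", 1))
            last = None
        elif op.name == "STACK_GLOBAL":
            # Protocol 4+ pushes module and name as two strings.
            if len(strings) >= 2:
                found.add(strings[-2] + "." + strings[-1])
            last = None
        else:
            last = None
    return found


# Stand-in object; a NumPy array would list its reconstruct function here.
names = referenced_globals(pickle.dumps(collections.OrderedDict(a=1), protocol=4))
```

Running this on the same model pickled with protocol 4 and with protocol 5 would show which reconstruct functions each stream expects the loading environment to provide.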
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
