'How to pickle a namedtuple instance correctly
I'm learning how to use pickle. I've created a namedtuple object, appended it to a list, and tried to pickle that list. However, I get the following error:
pickle.PicklingError: Can't pickle <class '__main__.P'>: it's not found as __main__.P
I found that if I ran the code without wrapping it inside a function, it works perfectly. Is there an extra step required to pickle an object when wrapped inside a function?
Here is my code:
from collections import namedtuple
import pickle
def pickle_test():
P = namedtuple("P", "one two three four")
my_list = []
abe = P("abraham", "lincoln", "vampire", "hunter")
my_list.append(abe)
with open('abe.pickle', 'wb') as f:
pickle.dump(abe, f)
pickle_test()
Solution 1:[1]
Create the named tuple outside of the function:
from collections import namedtuple
import pickle
P = namedtuple("P", "one two three four")
def pickle_test():
my_list = []
abe = P("abraham", "lincoln", "vampire", "hunter")
my_list.append(abe)
with open('abe.pickle', 'wb') as f:
pickle.dump(abe, f)
pickle_test()
Now pickle can find it; it is a module global now. When unpickling, all the pickle module has to do is locate __main__.P again. In your version, P is a local, to the pickle_test() function, and that is not introspectable or importable.
Note that pickle stores just the module and the class name, as taken from the class's __name__ attribute. Make sure that the first argument of the namedtuple() call matches the global variable you are assigning to; P.__name__ must be "P"!
It is important to remember that namedtuple() is a class factory; you give it parameters and it returns a class object for you to create instances from. pickle only stores the data contained in the instances, plus a string reference to the original class to reconstruct the instances again.
Solution 2:[2]
I found this answer in another thread. This is all about the naming of the named tuple. This worked for me:
group_t = namedtuple('group_t', 'field1, field2') # this will work
mismatched_group_t = namedtuple('group_t', 'field1, field2') # this will throw the error
Solution 3:[3]
After I added my question as a comment to the main answer I found a way to solve the problem of making a dynamically created namedtuple pickle-able. This is required in my case because I'm figuring out its fields only at runtime (after a DB query).
All I do is monkey patch the namedtuple by effectively moving it to the __main__ module:
def _CreateNamedOnMain(*args):
import __main__
namedtupleClass = collections.namedtuple(*args)
setattr(__main__, namedtupleClass.__name__, namedtupleClass)
namedtupleClass.__module__ = "__main__"
return namedtupleClass
Mind that the namedtuple name (which is provided by args) might overwrite another member in __main__ if you're not careful.
Solution 4:[4]
Alternatively, you can use cloudpickle or dill for serialization:
from collections import namedtuple
import cloudpickle
import dill
def dill_test(dynamic_names):
P = namedtuple('P', dynamic_names)
my_list = []
abe = P("abraham", "lincoln", "vampire", "hunter")
my_list.append(abe)
with open('deleteme.cloudpickle', 'wb') as f:
cloudpickle.dump(abe, f)
with open('deleteme.dill', 'wb') as f:
dill.dump(abe, f)
dill_test("one two three four")
Solution 5:[5]
The issue here is the child processes aren't able to import the class of the object -in this case, the class P-, in the case of a multi-model project the Class P should be importable anywhere the child process get used
a quick workaround is to make it importable by affecting it to globals()
globals()["P"] = P
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Michael |
| Solution 3 | |
| Solution 4 | |
| Solution 5 | rachid el kedmiri |
