Changing the function of a Pickled Object

As I understand pickle, you can send objects between files and projects, as long as those objects' classes exist in both namespaces. I have two applications that will pass around a Prime object.

class Prime():
    def __init__(self):
        self.a = 1
    def func(self):
        print(self.a)

Both applications start with the above version of Prime. But the first application will change the functionality of Prime, such that func will print("hello world"). The second application will then receive the first's version of Prime through pickle and use it such that:

Second.py:

import pickle

i = Prime()
i.func()
with open("temp.txt", "rb") as text:  # pickle data is binary
    o = pickle.load(text)
    o.func()

Output:

1
hello world

My two-part question is this. If the second application only has the original version of Prime in its namespace, will it be able to work with the first's, as long as the class and method names are not changed? If so, how can I go about changing the functionality of the first's Prime.func?



Solution 1:[1]

The only right way to do this is to make both applications use the exact same version of whichever module defines the Prime class, importable under the exact same qualified name. As the "What can be pickled and unpickled?" section of the pickle docs explains:

… functions (built-in and user-defined) are pickled by “fully qualified” name reference, not by value. This means that only the function name is pickled, along with the name of the module the function is defined in. Neither the function’s code, nor any of its function attributes are pickled. Thus the defining module must be importable in the unpickling environment, and the module must contain the named object, otherwise an exception will be raised.

In other words, when you unpickle a Prime object in the second app, it will be an instance of the second app's version of the Prime class, even if it came from the first app.
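A minimal sketch of that behavior. The module name mod, the helper install_mod, and the in-memory modules are assumptions made for illustration; they stand in for two applications that each ship their own mod.py. func returns its value here instead of printing it, so the result is easy to check.

```python
import pickle
import sys
import types

def install_mod(source):
    """Register `source` as the importable module 'mod' (stand-in for a mod.py file)."""
    mod = types.ModuleType("mod")
    exec(source, mod.__dict__)
    sys.modules["mod"] = mod
    return mod

FIRST_APP = '''
class Prime:
    def __init__(self):
        self.a = 1
    def func(self):
        return "hello world"
'''

SECOND_APP = '''
class Prime:
    def __init__(self):
        self.a = 1
    def func(self):
        return self.a
'''

# First application: pickle an instance of its own Prime.
first = install_mod(FIRST_APP)
payload = pickle.dumps(first.Prime())

# Second application: the name 'mod' now resolves to its own version of the module.
second = install_mod(SECOND_APP)
obj = pickle.loads(payload)

print(type(obj) is second.Prime)  # the receiver's class, not the sender's
print(obj.func())                 # so the receiver's implementation runs
```

Only the name mod.Prime and the instance attributes are in the payload, which is why the sender's "hello world" version never reaches the second app.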


More generally, pickling is designed to serialize data to be read back in the same application, or at least in a very tightly coupled application that shares all of the relevant code. If you want a more decoupled interchange mechanism, consider JSON or YAML.
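For comparison, a rough sketch of a decoupled exchange using the standard json module. Only the data crosses the boundary. The field a comes from the question; to_json and from_json are hypothetical helper names, and func returns instead of printing.

```python
import json

class Prime:
    def __init__(self, a=1):
        self.a = a

    def func(self):
        return self.a

def to_json(obj):
    # Serialize only the data; no class or module names are recorded.
    return json.dumps({"a": obj.a})

def from_json(text):
    # The receiver decides which class to build from the data.
    return Prime(a=json.loads(text)["a"])

wire = to_json(Prime(a=7))
restored = from_json(wire)
print(wire)
print(restored.func())
```

Each application is free to attach whatever behavior it likes to the data it reads, since nothing in the payload names a class.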


But let's say you know all of that, and you really do want to pickle a method implementation for some reason. Can you do that?

Sure you can. It's just going to be a lot of work, and a little hacky. Since you're trying to do something Python explicitly tries not to do, you kind of have to expect that.

First, you need to write a code pickler that passes enough information that you can call the types.CodeType constructor with it. That's on the borderline between an implementation detail and a deep part of the language, so the only place you can see the constructor arguments is by calling help(types.CodeType) at the interactive console, and the only way you can tell what those arguments mean is by looking at the table in the inspect docs and guessing which argument goes with which member. (That's pretty simple: argcount goes with co_argcount, etc.)

You'll note that some of those code members may not actually make sense. For example, do you really want to pass co_filename and co_firstlineno if the receiver isn't going to have those files at the same path in the filesystem? (That would lead to errors generating tracebacks, instead of just tracebacks without source info.)

Anyway, the pickler just creates and pickles a tuple of whichever members you want, and the unpickler does the reverse. But you probably want to cram sys.version_info or some other marker in there so you don't try to unpickle bytecode that you can't run. (Look at how .pyc files work for details.)

Next you have to do the same thing with function types—which will of course call the code pickler for their code object.
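The two steps above might look roughly like the sketch below. Note the swap: instead of feeding a hand-built tuple to the types.CodeType constructor, it uses marshal to serialize the code object, which avoids depending on the constructor's version-specific argument list. dump_function, load_function, and Holder are hypothetical names, and closures and keyword-only defaults are not handled.

```python
import marshal
import pickle
import sys
import types

def dump_function(fn):
    """Turn a plain function into bytes: version marker, name, defaults, bytecode."""
    if fn.__closure__:
        raise ValueError("closures are not handled in this sketch")
    payload = (
        tuple(sys.version_info[:2]),  # bytecode is only valid for this Python version
        fn.__name__,
        fn.__defaults__,
        marshal.dumps(fn.__code__),
    )
    return pickle.dumps(payload)

def load_function(data, namespace):
    """Rebuild the function, binding its global lookups to `namespace`."""
    version, name, defaults, code_bytes = pickle.loads(data)
    if version != tuple(sys.version_info[:2]):
        raise ValueError("bytecode was produced by Python %d.%d" % version)
    code = marshal.loads(code_bytes)
    return types.FunctionType(code, namespace, name, defaults)

def func(self, greeting="hello"):
    return "%s from %s" % (greeting, self.a)

class Holder:
    a = 1

rebuilt = load_function(dump_function(func), {})
print(rebuilt(Holder()))
```

The version check plays the role of the marker mentioned above: it refuses bytecode from a different interpreter version rather than trying to run it.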

So, now you have code that can pickle and unpickle a function (including its code). What good does that do you?

When you write your Prime.__getstate__ and/or Prime.__reduce__, you can override the normal pickling mechanism to consider the implementation of the func method as part of the object's state.
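A hedged sketch of the __getstate__ route, with a matching __setstate__ on the receiving side. Both applications are assumed to share these hooks and differ only in the body of func. The module name mod, the TEMPLATE string, and install_mod are illustrative assumptions standing in for each app's real module, and marshal again replaces a hand-written code pickler.

```python
import pickle
import sys
import types

TEMPLATE = '''
import marshal, sys, types

class Prime:
    def __init__(self):
        self.a = 1

    def func(self):
        return RESULT

    def __getstate__(self):
        # Treat the bytecode of func as part of the instance state.
        attrs = {k: v for k, v in self.__dict__.items() if k != "func"}
        code = self.func.__func__.__code__
        return {"attrs": attrs,
                "version": tuple(sys.version_info[:2]),
                "func_code": marshal.dumps(code)}

    def __setstate__(self, state):
        if state["version"] != tuple(sys.version_info[:2]):
            raise ValueError("func bytecode came from another Python version")
        self.__dict__.update(state["attrs"])
        code = marshal.loads(state["func_code"])
        fn = types.FunctionType(code, globals(), code.co_name)
        # Bound method stored on the instance; the class itself is untouched.
        self.func = types.MethodType(fn, self)
'''

def install_mod(result_expr):
    """Register an in-memory module 'mod' whose Prime.func returns `result_expr`."""
    mod = types.ModuleType("mod")
    exec(TEMPLATE.replace("RESULT", result_expr), mod.__dict__)
    sys.modules["mod"] = mod
    return mod

sender = install_mod('"hello world"')
payload = pickle.dumps(sender.Prime())

receiver = install_mod("self.a")
local = receiver.Prime()
remote = pickle.loads(payload)

print(local.func())   # the receiver's own implementation
print(remote.func())  # the implementation that travelled with the pickle
```

Because the rebuilt method lives in the instance's own dict, local and unpickled objects can coexist, each with its own func.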

If you really want, you can pickle the metaclass, so the func method can be pickled as part of the class's state instead, which means you'll end up with a normal instance-method descriptor in the class dict instead of a bound method crammed into the object's dict. But that won't work if you can have both "local" and unpickled Prime objects at the same time.

Solution 2:[2]

To the first question: No. That's not how pickling works. But to your second question, there is an easy way to do this. You use a module called marshal which allows you to serialize the code of a function, and you can just store that with your copy of the object somewhere.

import marshal
import pickle
class A:
    def __init__(self):
        self.a = 1
    def func(self):
        print(self.a)
firstA = A()
s = pickle.dumps(firstA)
# __code__ is the function's code object (spelled func_code in old Python 2 code)
sf = marshal.dumps(A.func.__code__)

Somewhere else, you have a different A class:

class A:
    def __init__(self):
        self.a = 2
    def func(self):
        print("Hello world")

When you load just the object, it will still be using the incorrect function:

secondA = pickle.loads(s)
secondA.func() #prints "Hello world"

You can then use the value of the function's code to call the correct function:

#change the function to match what was stored
secondA.func.__func__.__code__ = marshal.loads(sf)
secondA.func() #prints 1

Solution 3:[3]

This is 8 years late, but this question still came up in a Google search, so someone may benefit from this answer.

By default, pickling an object will store that object's class's __module__ and __name__, and that object's own __dict__. Unpickling finds and imports a local module with the same name (which may or may not be the module that you used when pickling!), then finds a class within that module with the right __name__, creates a new instance of that class, and gives it the stored __dict__ attributes.
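To see what actually gets stored, you can call the hook that pickle itself uses for ordinary instances. This is only a quick inspection sketch; the Prime class is the one from the question, with func returning rather than printing.

```python
class Prime:
    def __init__(self):
        self.a = 1

    def func(self):
        return self.a

# __reduce_ex__ is the hook pickle calls to decide what to store for an instance.
constructor, args, state = Prime().__reduce_ex__(2)[:3]

print(args)                              # the class, which pickle writes out by name
print(Prime.__module__, Prime.__name__)  # the names used for that reference
print(state)                             # the instance __dict__; func is not in it
```

Nothing about the body of func appears anywhere in that tuple, which is why the unpickling side has to supply its own.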

The key point here is that the unpickler won't necessarily open exactly the same module that the original object's class was defined in, just whatever local module it can find with the same name. Let's pretend the module that contains your class Prime is called mod. To give the OP's second application different functionality for its Prime objects, all you need to do is put the second application in a different location from the first, so that when it looks for a local module called mod, it finds a like-named module whose func has the second, "Hello World" sort of functionality rather than the original module's. The original Prime objects in the first app then have one sort of functionality, and the unpickled replicas of them in the second app have another.

(If you ever want both sorts in the very same app, as the OP sometimes seems to want, then you'd need to resort to other trickery, and I doubt it'd be worth using pickle for it. E.g., you could just create a subclass of Prime with a new func, and then change an object's __class__ to switch which sort of functionality it will have. Or you could use copy() to clone an object, and morph the __class__ of the clone.)
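A small sketch of that subclass trick, with no pickle involved. HelloPrime is a hypothetical name for the subclass, and func returns its value instead of printing.

```python
import copy

class Prime:
    def __init__(self):
        self.a = 1

    def func(self):
        return self.a

class HelloPrime(Prime):
    # Same data, different behavior.
    def func(self):
        return "hello world"

original = Prime()
clone = copy.copy(original)   # shallow copy with the same attributes
clone.__class__ = HelloPrime  # switch which func the clone uses

print(original.func())
print(clone.func())
```

The original keeps behaving as a plain Prime, while the clone carries the same attribute values with the subclass's func.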

Anyway, unpickling with like-named modules can let you switch functionality across apps, but of course you'd be inviting confusion having two parallel file structures with like-named modules, so I'm not sure I'd actually encourage it! Probably the most common way for something like this to happen is that you save the pickle, then edit your module, then unpickle the saved file into what is now effectively a new module with the old name! So this question may be best to bear in mind as an object lesson for why you usually need to be careful to reload pickles using the same modules as you created them with, not other like-named modules, and not even significantly altered versions of the original module.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 abarnert
Solution 2
Solution 3 JustinFisher