How to load a spaCy object (pickled on Ubuntu) on a Windows machine?

I develop on an Ubuntu machine and deploy on Windows. I pickled a spaCy vectorizer object with dill on the Ubuntu machine, and now I am trying to load it (unpickle it) on a Windows machine, but every attempt fails with the error below.

I have tried converting the paths to PurePath, PurePosixPath, etc. before pickling, but I still get the same error.

On the Ubuntu machine:

import dill as pickle
import numpy as np
import spacy
...

class SpacyVectorizer(object):

    def __init__(self):
        self.nlp = spacy.load('en_core_web_md')

    def fit(self, X, y=None):
        return self    

    def transform(self, X):
        doc_vector = [self.nlp(doc).vector for doc in X]
        doc_vector = np.array(doc_vector)
        return doc_vector

    def fit_transform(self, X, y=None):
        return self.transform(X)



MyVectorizer = SpacyVectorizer()

# here I have tried PurePath, & other pathlib functions but none works
pickle.dump(MyVectorizer, open(r'some_path/path/Vect.pkl', 'wb'))

On the Windows machine:

import dill as pickle

obj = pickle.load(open(r'somepath/path/Vect.pkl', 'rb'))

Error:

NotImplementedError                       Traceback (most recent call last)
<ipython-input-32-8553881dccfa> in <module>
----> 1 obj = pickle.load(open(r'somepath/path/Vect.pkl', 'rb'))

D:\Python\envs\Python_BASE\lib\site-packages\dill\_dill.py in load(file, ignore)
    303     # apply kwd settings
    304     pik._ignore = bool(ignore)
--> 305     obj = pik.load()
    306     if type(obj).__module__ == getattr(_main_module, '__name__', '__main__'):
    307         if not ignore:

D:\Python\envs\Python_BASE\lib\pathlib.py in __new__(cls, *args, **kwargs)
    970         if not self._flavour.is_supported:
    971             raise NotImplementedError("cannot instantiate %r on your system"
--> 972                                       % (cls.__name__,))
    973         self._init()
    974         return self

NotImplementedError: cannot instantiate 'PosixPath' on your system

I know this can be avoided by loading a fresh spaCy model on Windows, since this vectorizer is not fitted on any training data. However, I would like to understand what causes the error and how to fix it. Pickling other vectorizers the same way (including ones that are fitted on training data, such as TF-IDF) works; only this one fails.

I found a related question, "pathlib.py: Instantiating 'PosixPath' on Windows", but it doesn't help.



Solution 1:[1]

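The reason you received the error is explained in @Pranzell's answer. In short, the pickle contains pathlib.PosixPath objects held by the loaded spaCy model (self.nlp), and pathlib refuses to instantiate PosixPath on Windows, which is exactly what the last frame of the traceback shows.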
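The PosixPath part of the traceback can be reproduced without spaCy. The sketch below is illustrative only: it uses the standard pickle and pickletools modules instead of dill, a plain dict with a hypothetical model path stands in for the loaded spaCy pipeline, and it is meant to run on Linux, where pathlib.Path creates a PosixPath. It lists the classes a pickle refers to by name, showing that a path object is stored as a reference to the PosixPath class rather than as a plain string.

```python
import pathlib
import pickle
import pickletools


def referenced_globals(data: bytes) -> list:
    """Return the 'module name' strings of every GLOBAL opcode in a pickle."""
    return [arg for opcode, arg, _pos in pickletools.genops(data)
            if opcode.name == "GLOBAL"]


# Hypothetical stand-in for the state of a loaded spaCy pipeline.
# Run on Linux: there, pathlib.Path(...) creates a PosixPath instance.
state = {"model_path": pathlib.Path("/opt/models/en_core_web_md")}

# Protocol 2 writes each class reference as a GLOBAL opcode, which is easy to inspect.
payload = pickle.dumps(state, protocol=2)

# Shows that the pickle names pathlib's PosixPath class; unpickling on Windows
# imports that class and tries to instantiate it, which fails there.
print(referenced_globals(payload))
```

The traceback in the question shows the same thing happening with the file written by dill: loading it on Windows ends in PosixPath.__new__.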

The solution I used is to add a make_pickle_able method to the class that drops the spaCy model before pickling, and to reload the model the first time transform is called:

import dill as pickle
import numpy as np
import spacy
...

class SpacyVectorizer(object):

    def __init__(self):
        self.nlp = spacy.load('en_core_web_md')

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Reload the spaCy model on first use after unpickling
        if self.nlp is None:
            self.reinitiate_spacy()
        doc_vector = [self.nlp(doc).vector for doc in X]
        doc_vector = np.array(doc_vector)
        return doc_vector

    def fit_transform(self, X, y=None):
        return self.transform(X)

    def make_pickle_able(self):
        # Drop the loaded model so its PosixPath objects are not pickled
        self.nlp = None

    def reinitiate_spacy(self):
        self.nlp = spacy.load('en_core_web_md')

Then use it as follows:

MyVectorizer = SpacyVectorizer()
MyVectorizer.make_pickle_able()  # returns None, so don't reassign the result
with open(r'some_path/path/Vect.pkl', 'wb') as f:
    pickle.dump(MyVectorizer, f)

Note also the reinitiate_spacy method, which transform calls the first time it runs after unpickling, so the model is loaded again on the Windows machine:

if self.nlp is None:
    self.reinitiate_spacy()
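A variant of the same idea, sketched here as an assumption rather than taken from the answer, moves the "drop the model" step into __getstate__, which pickle calls automatically (dill builds on pickle, so it should apply there too). That way there is no make_pickle_able call to forget. The class name LazySpacyVectorizer, the model_name attribute and the model property are hypothetical; spaCy is imported only when the model is first needed.

```python
import pickle


class LazySpacyVectorizer:
    """Sketch: pickles only the model name and reloads the spaCy model on demand."""

    def __init__(self, model_name="en_core_web_md"):
        self.model_name = model_name
        self.nlp = None  # loaded pipeline; deliberately kept out of the pickle

    def _load(self):
        # Imported lazily: spaCy is only needed once the model is actually used.
        import spacy
        return spacy.load(self.model_name)

    @property
    def model(self):
        # Load the pipeline the first time it is needed (e.g. after unpickling).
        if self.nlp is None:
            self.nlp = self._load()
        return self.nlp

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Returns a list of vectors; wrap it in numpy.array(...) to match the original.
        return [self.model(doc).vector for doc in X]

    def fit_transform(self, X, y=None):
        return self.transform(X)

    def __getstate__(self):
        # Copy the instance dict and drop the pipeline, so no PosixPath objects
        # from the Linux-side spaCy model are written into the pickle.
        state = self.__dict__.copy()
        state["nlp"] = None
        return state


# Round trip without loading spaCy: nlp stays None until transform() is called.
restored = pickle.loads(pickle.dumps(LazySpacyVectorizer()))
```

After loading the file on Windows, the first transform call runs spacy.load there, so en_core_web_md must be installed on that machine, just as with the answer's approach.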

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Joel