How to retrieve and apply a CatBoost binary model file saved with GridFS in MongoDB
I am able to save a large CatBoost model in MongoDB using GridFS with the following code:
import gridfs
from bson.binary import Binary

# Export and save the model as a file
model.save_model(file_path_model, format='cbm')
# Load the exported file and insert it into Mongo
with open(file_path_model, mode='rb') as file:  # 'b' is important -> binary
    file_content_model = file.read()
binary_model = Binary(file_content_model)
# Get the GridFS object (chunks will be saved in fs.chunks)
fs = gridfs.GridFS(self.database)
# Store the binary file in MongoDB using GridFS
new_id = fs.put(binary_model)
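As a side note, `fs.put` accepts either bytes or a file-like object with a `read()` method, so the exported file can be handed to GridFS directly instead of being read into memory and wrapped in `Binary`. A minimal sketch, assuming `fs` is a `gridfs.GridFS` instance; the helper name `store_model_file` and the `filename` metadata value are hypothetical and not part of the code above:

```python
import os


def store_model_file(fs, file_path_model, filename=None):
    """Stream a model file into GridFS and return the new file's _id.

    `fs` is assumed to be a gridfs.GridFS instance (any object exposing a
    compatible put() works). The helper name is hypothetical.
    """
    if filename is None:
        filename = os.path.basename(file_path_model)
    # Open in binary mode: a .cbm file is raw bytes, not text.
    with open(file_path_model, mode="rb") as model_file:
        # put() reads from the file-like object; `filename` is stored as metadata.
        return fs.put(model_file, filename=filename)
```

It would be called as `new_id = store_model_file(fs, file_path_model)`.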
I am also able to retrieve the binary file from MongoDB using its GridFS ObjectId:
from bson.objectid import ObjectId

db = modelDBStorageManager.database
fs = gridfs.GridFS(db)
bin_model = fs.get(ObjectId(document['_id'])).read()
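One way to confirm that the GridFS round trip itself is lossless is to compare a checksum of the retrieved bytes with the file that was originally exported. This is only a sanity-check sketch; the helper names `sha256_hex` and `same_content` are hypothetical, and it assumes the original `.cbm` file is still available at `file_path_model`:

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hex digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def same_content(original_path: str, retrieved: bytes) -> bool:
    """True if the bytes read back match the file that was exported."""
    with open(original_path, "rb") as original_file:
        return sha256_hex(original_file.read()) == sha256_hex(retrieved)
```

If `same_content(file_path_model, bin_model)` returns `True`, the bytes stored in MongoDB are intact and any later problem comes from how they are written back to disk.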
What I want to do now is turn the retrieved binary content back into a model, so that I can apply it to new data.
I tried writing the content to a file and loading it with CatBoost's .load_model() method:
from catboost import CatBoostClassifier

# Saving the model
def save_binary_file(bin_model):
    model1 = str(bin_model)
    fo = open("./Catboost_binary_files/binary.cbm", "w")
    fo.write(model1)
    fo.close()

save_binary_file(bin_model)
# Trying to load back the model
from_file = CatBoostClassifier()
model = from_file.load_model("./Catboost_binary_files/binary.cbm", format = 'cbm')
I get the following error:
---------------------------------------------------------------------------
CatBoostError Traceback (most recent call last)
<ipython-input-21-35e2109c72ed> in <module>
1 from_file = CatBoostClassifier()
2
----> 3 model = from_file.load_model("./Catboost_binary_files/binary.cbm", format = 'cbm')
~/opt/anaconda2/envs/fsbo-fraud-catboost-py37/lib/python3.7/site-packages/catboost/core.py in load_model(self, fname, format)
2587 if not isinstance(fname, STRING_TYPES):
2588 raise CatBoostError("Invalid fname type={}: must be str().".format(type(fname)))
-> 2589 self._load_model(fname, format)
2590 return self
2591
~/opt/anaconda2/envs/fsbo-fraud-catboost-py37/lib/python3.7/site-packages/catboost/core.py in _load_model(self, model_file, format)
1313
1314 def _load_model(self, model_file, format):
-> 1315 self._object._load_model(model_file, format)
1316 self._set_trained_model_attributes()
1317 for key, value in iteritems(self._get_params()):
_catboost.pyx in _catboost._CatBoost._load_model()
_catboost.pyx in _catboost._CatBoost._load_model()
CatBoostError: catboost/libs/model/model.cpp:648: Incorrect model file descriptor
There seems to be an issue with the file format.
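A likely cause is the way the file is written in `save_binary_file`: `str(bin_model)` produces the Python repr of the bytes object (text beginning with `b'`), and mode `"w"` writes that text rather than the original bytes, so the file no longer starts with the header CatBoost expects. A minimal sketch of a possible fix, assuming `bin_model` is the `bytes` object returned by `fs.get(...).read()`; `load_catboost_model` is a hypothetical helper name, and `catboost` is imported lazily so the file-writing part can be run on its own:

```python
import os
import tempfile


def save_binary_file(bin_model: bytes, path: str) -> str:
    """Write the bytes returned by GridFS to disk without any conversion."""
    # "wb" keeps the payload byte-for-byte: no str() call, no text encoding.
    with open(path, "wb") as model_file:
        model_file.write(bin_model)
    return path


def load_catboost_model(bin_model: bytes):
    """Hypothetical helper: write the bytes to a temporary .cbm file and load it."""
    from catboost import CatBoostClassifier  # requires the catboost package

    fd, path = tempfile.mkstemp(suffix=".cbm")
    os.close(fd)
    try:
        save_binary_file(bin_model, path)
        model = CatBoostClassifier()
        model.load_model(path, format="cbm")
        return model
    finally:
        os.remove(path)


# Why the original attempt fails: str() on bytes gives the repr, not the data.
payload = b"\x00\x01binary"
as_text = str(payload)  # the characters b'\x00\x01binary', quotes included
```

With this in place, `model = load_catboost_model(bin_model)` followed by `model.predict(...)` would be the intended usage. Recent CatBoost releases also document a `blob` argument to `load_model` that takes the bytes directly; the version shown in the traceback only accepts `fname` and `format`, which is why the temporary-file route is sketched here.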
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
