Saving trained pyspark pipeline
The pipeline consists of one-hot encoding and a min-max scaler:
stages = asmbler + mm_scaler + str_indexer + ohe
pp_pl = Pipeline(stages=stages).fit(X)
After fitting the model, I'm trying to save it for later use.
The documentation at https://spark.apache.org/docs/latest/ml-pipeline.html#ml-persistence-saving-and-loading-pipelines tells me this is possible but gives no guide. The question "Pyspark ML - How to save pipeline and RandomForestClassificationModel" says I can save it by executing the following:
pp_pl.save(path)
But no matter which path I try, I cannot save it. (I've tried multiple paths; some produce an error saying the file already exists.)
Py4JJavaError: An error occurred while calling o6061.save.
: java.lang.RuntimeException: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset
I don't understand why we don't give the path a file type such as .pkl. Also, where does the path start from?
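For context on the two questions above, here is a minimal sketch of how the save call could be made explicit. As far as I understand, Spark ML persistence writes a directory of metadata and stage files rather than a single file, which is why no extension such as .pkl is given, and a path with no scheme is resolved against Spark's default filesystem. The helper name `save_pipeline` and the directory `models/pp_pl` are hypothetical; `write().overwrite().save(...)` and `PipelineModel.load(...)` are the standard PySpark calls. This only addresses the path and "already exists" parts; the HADOOP_HOME message points at the local Hadoop setup and is not fixed by changing the path.

```python
from pathlib import Path


def save_pipeline(model, dirname):
    """Save a fitted PipelineModel under an explicit local directory URI.

    `save_pipeline` is a hypothetical helper, not part of PySpark.
    """
    # Resolve against the current working directory so the destination is
    # unambiguous, then express it as a file:// URI.
    uri = Path(dirname).resolve().as_uri()
    # write().overwrite() replaces an existing directory instead of failing
    # with a "path already exists" error.
    model.write().overwrite().save(uri)
    return uri


# Usage, assuming pp_pl is the fitted PipelineModel from the question:
#   uri = save_pipeline(pp_pl, "models/pp_pl")   # hypothetical directory
#   from pyspark.ml import PipelineModel
#   restored = PipelineModel.load(uri)
```

The function returns the URI it saved to, so the same string can be passed to `PipelineModel.load` later.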
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
