RuntimeError: Java gateway process exited before sending its port number when deploying a PySpark model to Azure Container Instances
I am trying to deploy a PySpark model, trained in Azure Databricks with MLflow, to an Azure Container Instance (ACI) in Azure Machine Learning.
I am following the steps in this link:
but I get this error:
SPARK_HOME not set. Skipping PySpark Initialization.
Initializing logger
2022-02-21 09:29:30,269 | root | INFO | Starting up app insights client
logging socket was found. logging is available.
logging socket was found. logging is available.
2022-02-21 09:29:30,270 | root | INFO | Starting up request id generator
2022-02-21 09:29:30,270 | root | INFO | Starting up app insight hooks
2022-02-21 09:29:30,270 | root | INFO | Invoking user's init function
JAVA_HOME is not set
2022-02-21 09:29:31,267 | root | ERROR | User's init function failed
2022-02-21 09:29:31,268 | root | ERROR | Encountered Exception Traceback (most recent call last):
File "/var/azureml-server/aml_blueprint.py", line 191, in register
main.init()
File "/var/azureml-app/execution_script.py", line 15, in init
model = load_model(model_path)
File "/azureml-envs/azureml_5d25bdfadca034daea176336163db1e0/lib/python3.8/site-packages/mlflow/pyfunc/__init__.py", line 667, in load_model
model_impl = importlib.import_module(conf[MAIN])._load_pyfunc(data_path)
File "/azureml-envs/azureml_5d25bdfadca034daea176336163db1e0/lib/python3.8/site-packages/mlflow/spark.py", line 703, in _load_pyfunc
pyspark.sql.SparkSession.builder.config("spark.python.worker.reuse", True)
File "/azureml-envs/azureml_5d25bdfadca034daea176336163db1e0/lib/python3.8/site-packages/pyspark/sql/session.py", line 228, in getOrCreate
sc = SparkContext.getOrCreate(sparkConf)
File "/azureml-envs/azureml_5d25bdfadca034daea176336163db1e0/lib/python3.8/site-packages/pyspark/context.py", line 392, in getOrCreate
SparkContext(conf=conf or SparkConf())
File "/azureml-envs/azureml_5d25bdfadca034daea176336163db1e0/lib/python3.8/site-packages/pyspark/context.py", line 144, in __init__
SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
File "/azureml-envs/azureml_5d25bdfadca034daea176336163db1e0/lib/python3.8/site-packages/pyspark/context.py", line 339, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway(conf)
File "/azureml-envs/azureml_5d25bdfadca034daea176336163db1e0/lib/python3.8/site-packages/pyspark/java_gateway.py", line 108, in launch_gateway
raise RuntimeError("Java gateway process exited before sending its port number")
RuntimeError: Java gateway process exited before sending its port number
My code looks like this:
import mlflow
from mlflow.deployments import get_deploy_client

# Build the deployment client from the current tracking URI
client = get_deploy_client(mlflow.get_tracking_uri())

# Path of the model artifact within the run
model_path = "k_means_model"

# The model is registered automatically; "name" becomes the service name
client.create_deployment(model_uri='runs:/{}/{}'.format(run_id, model_path),
                         name='k-means-model-ml-flow')
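For context, the `model_uri` string above follows MLflow's `runs:/<run_id>/<artifact_path>` scheme. A small helper (hypothetical, not part of the original snippet) makes the construction explicit:

```python
def runs_uri(run_id: str, artifact_path: str) -> str:
    """Build an MLflow 'runs:/' model URI from a run id and an artifact path."""
    return "runs:/{}/{}".format(run_id, artifact_path)

# Using the run id and artifact path from the question:
uri = runs_uri("c0090fa9-b382-45b8-be08-d05e16f3cd62", "k_means_model")
# → "runs:/c0090fa9-b382-45b8-be08-d05e16f3cd62/k_means_model"
```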
While my model settings (the MLmodel file) are:
artifact_path: k_means_model
databricks_runtime: 10.3.x-cpu-ml-scala2.12
flavors:
  python_function:
    data: sparkml
    env: conda.yaml
    loader_module: mlflow.spark
    python_version: 3.8.10
  spark:
    model_data: sparkml
    pyspark_version: 3.2.1
model_uuid: 76ba9dfb01e1428ab8145a161ec3cf32
run_id: c0090fa9-b382-45b8-be08-d05e16f3cd62
utc_time_created: '2022-02-21 08:47:34.967167'
Can someone help please?
Solution 1:[1]
The logs point to two separate problems:
Error – SPARK_HOME not set. Skipping PySpark Initialization.
Error – JAVA_HOME is not set.
Both indicate that the container image built for the ACI deployment has no Java runtime and no Spark installation, while the mlflow.spark flavor starts a local SparkSession during init, which in turn launches a JVM gateway process. Make sure the inference environment has Java installed and that the SPARK_HOME and JAVA_HOME environment variables are configured.
If that does not help, try running the code with an earlier Spark version. If the problem still persists, you can raise a ticket here.
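The checks above can be sketched as a small pre-flight function (a hypothetical helper, not from the original answer) that mirrors what PySpark looks for when it launches the JVM gateway:

```python
import os
import shutil

def spark_env_report():
    """Summarize whether the JVM prerequisites for the mlflow.spark flavor are met.

    pyspark launches a Java gateway process when the SparkContext is created;
    if no 'java' executable can be found via JAVA_HOME or PATH, the gateway
    exits before reporting its port -- the exact RuntimeError in the traceback.
    """
    java_on_path = shutil.which("java") is not None
    report = {
        "JAVA_HOME": os.environ.get("JAVA_HOME"),
        "SPARK_HOME": os.environ.get("SPARK_HOME"),
        "java_on_path": java_on_path,
    }
    # A usable setup needs at least one way to locate the JVM.
    report["jvm_reachable"] = bool(report["JAVA_HOME"]) or java_on_path
    return report
```

Running this inside the scoring container would presumably show all three unset. One common remedy (an assumption on my part, not stated in the answer) is to add a JDK to the conda environment used to build the inference image, for example the conda-forge `openjdk` package, whose activation scripts set JAVA_HOME; the pip-installed `pyspark` package bundles its own Spark distribution, so an explicit SPARK_HOME is often unnecessary once Java is available.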
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | AbhishekKhandave-MT |