Unable to run Hop pipelines on Spark running on Kubernetes

I am looking for help running Hop pipelines on a Spark cluster that runs on Kubernetes.

  1. I have a Spark master deployed with 3 worker nodes on Kubernetes.
  2. I am using the hop-run.sh command to run the pipeline on that Spark cluster.

I am getting the exception below:

java.lang.NoClassDefFoundError: Could not initialize class com.amazonaws.services.s3.AmazonS3ClientBuilder

It looks like the fat jar is not being picked up by Spark when I run the hop-run.sh command.


I also tried running the same pipeline with the spark-submit command. I am able to add the fat jar to the classpath (this can be seen in the logs), but I am not sure how to pass references to the pipelines and workflows to Spark running on Kubernetes.

Any kind of help is appreciated. Thanks.



Solution 1:[1]

Could it be that you are using version 1.0? That version was missing a jar for S3 VFS, which has been resolved in 1.1: https://issues.apache.org/jira/browse/HOP-3327

For more information on how to use spark-submit you can take a look at the following documentation: https://hop.apache.org/manual/latest/pipeline/pipeline-run-configurations/beam-spark-pipeline-engine.html#_running_with_spark_submit

The locations of the fat jar, the pipeline, and the required metadata export can all be VFS locations, so there is no need to place those files on the cluster itself.
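As a rough sketch of how the pieces mentioned above (fat jar, pipeline, metadata export) could be passed to spark-submit: the argument order and the main class `org.apache.hop.beam.run.MainBeam` are given here as I understand them from the linked documentation, so please verify them against the page for your Hop version. The master URL, all paths, and the run configuration name `Spark` are hypothetical placeholders, not values from the question.

```shell
#!/bin/sh
# Sketch only: every value below is a hypothetical placeholder.
SPARK_MASTER="spark://spark-master:7077"     # assumed master URL
FAT_JAR="/path/to/hop-fat.jar"               # Hop fat jar (may be a VFS location per the answer)
PIPELINE="/path/to/pipeline.hpl"             # pipeline to run
METADATA_JSON="/path/to/hop-metadata.json"   # exported Hop metadata
RUN_CONFIG="Spark"                           # name of the Spark pipeline run configuration

# Main class as I understand it from the Hop documentation; verify for your version.
MAIN_CLASS="org.apache.hop.beam.run.MainBeam"

# Runs spark-submit with the Hop arguments.
# Any arguments passed to this function are used as a command prefix,
# so `hop_spark_submit echo` only prints the command (dry run).
hop_spark_submit() {
  "$@" spark-submit \
    --master "$SPARK_MASTER" \
    --class "$MAIN_CLASS" \
    "$FAT_JAR" \
    "$PIPELINE" \
    "$METADATA_JSON" \
    "$RUN_CONFIG"
}

# Dry run: print the command instead of executing it.
hop_spark_submit echo
```

Calling `hop_spark_submit` without the `echo` prefix would execute the command for real. Printing it first makes it easy to check that the fat jar, pipeline, and metadata references are the ones you expect before anything is submitted to the cluster.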

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 HansVA