"No FileSystem for scheme: abfss" with spark-on-k8s Operator
I am trying to run a very simple Spark job that extracts some data from my Azure Data Lake and prints it to the screen, using the spark-on-k8s operator. To do that, I built an image from a Dockerfile that looks like this:
FROM gcr.io/spark-operator/spark-py:v3.1.1
USER root:root
RUN mkdir -p /app
WORKDIR /app
COPY jars/ /opt/spark/jars
COPY simple-etl-job.py /app
WORKDIR /app
USER 1001
When I launch it as a job on Kubernetes, it fails with this error:
py4j.protocol.Py4JJavaError: An error occurred while calling o56.load.
: java.io.IOException: No FileSystem for scheme: abfss
The strange thing is that I am copying into the /opt/spark/jars directory the same jars that a local spark-submit job uses. That local job does the same thing as my K8s code and runs successfully.
Those jars are:
- hadoop-azure-3.2.0.jar
- wildfly-openssl-1.0.4.Final.jar
- hadoop-azure-datalake-3.2.0.jar
What else could I possibly be doing wrong?
P.S.: Here is my spark CRD:
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: simple-spark-etl-job
  namespace: spark-operator
spec:
  type: Python
  mode: cluster
  image: "<my-org>/<my-image>:<my-tag>"
  imagePullPolicy: Always
  mainApplicationFile: "local:///app/simple-etl-job.py"
  sparkVersion: "3.1.1"
  restartPolicy:
    type: OnFailure
    onFailureRetries: 3
    onFailureRetryInterval: 10
    onSubmissionFailureRetries: 5
    onSubmissionFailureRetryInterval: 20
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.1.1
    serviceAccount: default
  executor:
    cores: 1
    instances: 2
    memory: "512m"
    labels:
      version: 3.1.1
Solution 1:[1]
The issue may be occurring because the installed OpenSSL version is not compatible with wildfly-openssl-*.jar. This can happen on a new machine or environment, or when the hadoop-azure package is added to a Docker image.
Please check whether upgrading wildfly-openssl-*.Final.jar to the latest version helps. Also check for a JDK version mismatch.
Also see whether the order of the jars makes any difference.
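Before swapping jar versions, it may help to confirm what the image actually contains. The following is a minimal diagnostic sketch, not part of the original answer. It uses only the Python standard library and assumes the jars live in a single directory such as /opt/spark/jars (the path from the Dockerfile in the question). The helper names jars_providing and hadoop_versions are hypothetical. The class name is the one hadoop-azure uses for the abfss scheme. The idea that a mix of Hadoop versions on the classpath could be behind the incompatibility is an assumption worth ruling out, not a confirmed diagnosis.

```python
import re
import zipfile
from pathlib import Path

# Class that hadoop-azure registers for the abfss:// scheme.
ABFSS_CLASS = "org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem"


def jars_providing(jar_dir, class_name=ABFSS_CLASS):
    """Return the names of jars in jar_dir that contain class_name."""
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(jar_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that are not valid archives (e.g. a truncated copy).
            continue
    return hits


def hadoop_versions(jar_dir):
    """Group hadoop-*.jar file names by the version in the file name."""
    pattern = re.compile(r"^(hadoop-[a-z0-9-]+?)-(\d+(?:\.\d+)+)\.jar$")
    versions = {}
    for jar in sorted(Path(jar_dir).glob("hadoop-*.jar")):
        match = pattern.match(jar.name)
        if match:
            versions.setdefault(match.group(2), []).append(jar.name)
    return versions
```

Calling jars_providing("/opt/spark/jars") inside the driver container should list exactly one jar if the ABFS classes were copied correctly. An empty list suggests the COPY step did not land where Spark looks. If hadoop_versions("/opt/spark/jars") returns more than one version key, the added jars may not match the Hadoop build bundled with the base image.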
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | kavyasaraboju-MT |
