No FileSystem for scheme 'abfss' with spark-on-k8s Operator

I am trying to use the spark-on-k8s operator to run a very simple Spark job that extracts some data from my Azure Data Lake and prints it to the screen. To do that, I built an image from a Dockerfile that looks like this:

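FROM gcr.io/spark-operator/spark-py:v3.1.1

# Switch to root so files can be written to /app and /opt/spark/jars
USER root:root

RUN mkdir -p /app
WORKDIR /app

# Put the Azure connector jars on Spark's default classpath and add the job script
COPY jars/ /opt/spark/jars
COPY simple-etl-job.py /app

# Run the job as a non-root user
USER 1001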

When I launch it as a job on Kubernetes, it fails with this error:

py4j.protocol.Py4JJavaError: An error occurred while calling o56.load.
: java.io.IOException: No FileSystem for scheme: abfss

The strange thing is that I am copying into /opt/spark/jars the same jars I use for a local spark-submit job, which does the same thing as my Kubernetes job and runs successfully. Those jars are:

  • hadoop-azure-3.2.0.jar
  • wildfly-openssl-1.0.4.Final.jar
  • hadoop-azure-datalake-3.2.0.jar
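
One way to rule out a packaging problem is to check, from inside the running image, that these jars really are present in /opt/spark/jars and that the hadoop-azure version matches the core Hadoop jars the base image already ships. The following is a minimal sketch, not a definitive diagnosis: the helper names find_jar_problems and check_jars_dir are hypothetical, and it assumes the base image bundles a hadoop-common-<version>.jar or hadoop-client-<version>.jar in the same directory.

```python
import os
import re

# The connector jars copied into the image (from the list above).
REQUIRED_JARS = [
    "hadoop-azure-3.2.0.jar",
    "wildfly-openssl-1.0.4.Final.jar",
    "hadoop-azure-datalake-3.2.0.jar",
]

# Matches e.g. "hadoop-common-3.2.0.jar"; captures the component and its version.
# "hadoop-azure-datalake-3.2.0.jar" deliberately does not match.
HADOOP_JAR_RE = re.compile(r"^hadoop-(common|client|azure)-(\d+(?:\.\d+)*)\.jar$")


def find_jar_problems(jar_names):
    """Return human-readable problems for a list of jar file names (hypothetical helper)."""
    present = set(jar_names)
    problems = [f"missing {jar}" for jar in REQUIRED_JARS if jar not in present]

    # Group the versions found for each core Hadoop component.
    versions = {}
    for name in present:
        m = HADOOP_JAR_RE.match(name)
        if m:
            versions.setdefault(m.group(1), set()).add(m.group(2))

    core = versions.get("common", set()) | versions.get("client", set())
    azure = versions.get("azure", set())
    if not core:
        problems.append("no hadoop-common/hadoop-client jar found")
    elif azure and azure != core:
        # A connector built for one Hadoop line rarely works against another.
        problems.append(
            f"hadoop-azure {sorted(azure)} does not match core Hadoop {sorted(core)}"
        )
    return problems


def check_jars_dir(jars_dir="/opt/spark/jars"):
    """List jars_dir and check its contents (assumed default Spark jars location)."""
    if not os.path.isdir(jars_dir):
        return [f"{jars_dir} does not exist"]
    return find_jar_problems(os.listdir(jars_dir))


if __name__ == "__main__":
    issues = check_jars_dir()
    print("OK" if not issues else "\n".join(issues))
```

Run with the image's Python interpreter inside the container, it prints OK or a list of problems, for example a version mismatch between hadoop-azure and the Hadoop jars the base image already contains.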

What else could I possibly be doing wrong?

P.S.: Here is my SparkApplication custom resource:

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: simple-spark-etl-job
  namespace: spark-operator
spec:
  type: Python
  mode: cluster
  image: "<my-org>/<my-image>:<my-tag>"
  imagePullPolicy: Always
  mainApplicationFile: "local:///app/simple-etl-job.py"
  sparkVersion: "3.1.1"
  restartPolicy:
    type: OnFailure
    onFailureRetries: 3
    onFailureRetryInterval: 10
    onSubmissionFailureRetries: 5
    onSubmissionFailureRetryInterval: 20
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.1.1
    serviceAccount: default
  executor:
    cores: 1
    instances: 2
    memory: "512m"
    labels:
      version: 3.1.1


Solution 1:[1]

This issue may be occurring because the OpenSSL version installed on the new machine or environment is not compatible with wildfly-openssl-*.jar, or it may be introduced when the hadoop-azure package is added to the Docker image.

Please check whether upgrading wildfly-openssl-*.Final.jar to the latest version helps. Also check for a JDK version mismatch between the environment where the job works and the Docker image.

Also see whether the order of the jars makes any difference.
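
To act on the JDK suggestion, run `java -version` both on the machine where the local spark-submit works and inside the container, then compare the major versions. The sketch below is one way to do that; the helper names parse_java_major and local_java_major are hypothetical, and it only assumes that `java` is on the PATH.

```python
import re
import shutil
import subprocess


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output.

    Handles the legacy "1.8.0_292" scheme (-> 8) as well as the newer
    "11.0.11" and "17" schemes. Returns None if no version string is found.
    """
    m = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if not m:
        return None
    first = int(m.group(1))
    # Java 8 and earlier report themselves as "1.x".
    if first == 1 and m.group(2) is not None:
        return int(m.group(2))
    return first


def local_java_major():
    """Run `java -version` (which writes to stderr) and parse the result."""
    if shutil.which("java") is None:
        return None
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr + result.stdout)


if __name__ == "__main__":
    print("Java major version:", local_java_major())
```

If the two environments report different major versions (for example 8 locally and 11 in the image), that difference is worth ruling out before digging further.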

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 kavyasaraboju-MT