How to connect Spark workers to the Spark driver in Kubernetes (standalone cluster)

I created a Dockerfile with just Debian and Apache Spark downloaded from the main website. I then created Kubernetes Deployments so that one pod runs the Spark driver and another runs a Spark worker:

NAME                            READY   STATUS    RESTARTS      AGE
spark-driver-54446998ff-2rz5h   1/1     Running   0             45m
spark-worker-5d55b54d8d-9vfs7   1/1     Running   2 (69m ago)   16h

Here is what I have tested to work so far.

I am able to launch the Spark driver with ./start-master.sh, located in /spark-dir/sbin/. This is the log generated by start-master.sh:

22/04/28 04:34:25 INFO Utils: Successfully started service 'sparkMaster' on port 7077.
22/04/28 04:34:25 INFO Master: Starting Spark master at spark://10.244.1.148:7077
22/04/28 04:34:25 INFO Master: Running Spark version 3.2.1
22/04/28 04:34:26 INFO Utils: Successfully started service 'MasterUI' on port 8080.
22/04/28 04:34:27 INFO MasterWebUI: Bound MasterWebUI to 0.0.0.0, and started at http://spark-driver-54446998ff-2rz5h:8080
22/04/28 04:34:28 INFO Master: I have been elected leader! New state: ALIVE

This is what /etc/hosts in the spark-driver pod contains:

# Kubernetes-managed hosts file.
127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
10.244.1.148    spark-driver-54446998ff-2rz5h

Now I can connect my worker, running in another pod in the same namespace, with ./start-worker.sh spark://10.244.1.148:7077. This is shown to succeed because the spark-driver log contains:

22/04/28 04:34:52 INFO Master: Registering worker 10.244.2.134:44413 with 4 cores, 1024.0 MiB RAM

My question: in order to do this dynamically, the worker pod needs to be able to discover the IP address of spark-driver on its own before it can connect. I have read that one potential way of doing this is to use a Service and cluster DNS, but so far I have been unsuccessful in getting it to work.

This is my deployment.yaml file, which contains the Service as well, but I am unable to understand how the pieces are supposed to work together:

apiVersion: v1
kind: Service
metadata:
  name: spark-driver
spec:
  type: ClusterIP
#  type: NodePort
  selector:
    app.kubernetes.io/name: spark-3.2.1
    app.kubernetes.io/instance: spark-driver
  ports:
    - name: service
      protocol: TCP
      port: 80
      targetPort: service-port
    - name: spark-master
      protocol: TCP
      port: 8080
      targetPort: spark-ui-port
    - name: spark-worker
      protocol: TCP
      port: 7077
      targetPort: spark-wkr-port
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spark-driver
  labels:
    app.kubernetes.io/name: spark-3.2.1
    app.kubernetes.io/instance: spark-driver
    app.kubernetes.io/version: 0.0.4
    app.kubernetes.io/managed-by: kubernetes-standalone-cluster
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spark-3.2.1-driver
  template:
    metadata:
      labels:
        app: spark-3.2.1-driver
    spec:
      containers:
        - name: spark-driver
          image: zzzzzzzzzzz
          ports:
            - containerPort: 80
              name: service-port
            - containerPort: 8080
              name: spark-ui-port
            - containerPort: 7077
              name: spark-wkr-port
          resources:
            requests:
              cpu: "2"
              memory: "2Gi"
            limits:
              cpu: "4"
              memory: "3Gi"
          env:
            - name: SPARK_MASTER_HOST
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: SPARK_MASTER_PORT
              value: "7077"
            - name: SPARK_MODE
              value: driver
            - name: TERM
              value: xterm
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spark-worker
  labels:
    app.kubernetes.io/name: spark-3.2.1-worker
    app.kubernetes.io/instance: spark-worker
    app.kubernetes.io/version: 0.0.4
    app.kubernetes.io/managed-by: kubernetes-standalone-cluster
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spark-3.2.1-worker
  template:
    metadata:
      labels:
        app: spark-3.2.1-worker
    spec:
      containers:            
        - name: spark-worker
          image: zzzzzzzzzzz
          resources:
            requests:
              cpu: "2"
              memory: "1Gi"
            limits:
              cpu: "4"
              memory: "2Gi"
          env:
            - name: SPARK_MODE
              value: worker
            - name: TERM
              value: xterm
---

How should I be configuring the Service, or the Spark environment, so that the worker can use DNS to connect to the spark-driver pod?
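
For reference, my rough understanding of what the DNS approach should look like is sketched below. It assumes the Service selector has to match the pod template labels (app: spark-3.2.1-driver) rather than the labels on the Deployment object itself:

# sketch only: a Service whose selector matches the driver pod's labels, so that
# cluster DNS resolves the name "spark-driver" to the master's port 7077
apiVersion: v1
kind: Service
metadata:
  name: spark-driver
spec:
  type: ClusterIP
  selector:
    app: spark-3.2.1-driver        # matches .spec.template.metadata.labels of the driver Deployment
  ports:
    - name: spark-master
      protocol: TCP
      port: 7077
      targetPort: spark-wkr-port   # the named containerPort 7077 on the driver pod

The worker would then be started with ./start-worker.sh spark://spark-driver:7077 (or the fully qualified spark://spark-driver.<namespace>.svc.cluster.local:7077), but so far I have not managed to make this registration work.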



Solution 1:[1]

I am trying to accomplish the same thing as you, so far unsuccessfully. However, this may be useful to you:

You can expose the hostIP of the driver pod as an env variable like so:

env:
  - name: "SPARK_DRIVER_HOST_IP"
    valueFrom:
      fieldRef:
        apiVersion: "v1"
        fieldPath: "status.hostIP"

This works for me, as I go ahead and set the spark.driver.host property to the value of SPARK_DRIVER_HOST_IP. I am actually doing this from within the container that runs the Spark application, in my SparkConf settings.
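
(For completeness, a sketch of an alternative wiring that avoids touching application code: Kubernetes expands $(VAR) references in container args, so the same environment variable can be handed straight to spark-submit with --conf. The container name, paths and application file below are hypothetical.)

# hypothetical sketch: pass the host IP to spark-submit via Kubernetes variable expansion
containers:
  - name: spark-app                            # hypothetical container name
    image: zzzzzzzzzzz                         # placeholder image, as in the question
    env:
      - name: SPARK_DRIVER_HOST_IP
        valueFrom:
          fieldRef:
            fieldPath: status.hostIP           # the IP of the node the pod is scheduled on
    command: ["/spark-dir/bin/spark-submit"]   # assumes Spark is installed under /spark-dir
    args:
      - "--master"
      - "spark://spark-driver:7077"            # the Service name from the question's manifest
      - "--conf"
      - "spark.driver.host=$(SPARK_DRIVER_HOST_IP)"   # $(VAR) is expanded by Kubernetes
      - "local:///path/to/app.py"                     # hypothetical application file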

However, my issue is that the executor gets Connection refused when trying to connect to [driverHostIP]:[PORT]. I suspect this may be because I need a Service to expose this IP, like you have in your YAML, but I am not sure.
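
(A sketch of what such a Service might look like, assuming the driver's RPC and block manager ports are pinned to fixed values so they can be listed in the Service; the names and port numbers here are hypothetical:)

# hypothetical sketch: expose the driver on fixed ports so executors can connect back to it
apiVersion: v1
kind: Service
metadata:
  name: spark-app-driver            # hypothetical Service name
spec:
  clusterIP: None                   # headless: DNS resolves the name directly to the pod IP
  selector:
    app: spark-app                  # must match the labels of the pod running the driver
  ports:
    - name: driver-rpc
      protocol: TCP
      port: 7078
      targetPort: 7078              # matches --conf spark.driver.port=7078
    - name: block-manager
      protocol: TCP
      port: 7079
      targetPort: 7079              # matches --conf spark.blockManager.port=7079

With that in place, spark.driver.host would point at spark-app-driver instead of the raw host IP.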

I am hoping that between us we have the two pieces of this solution, and that combining the exposed driver IP address with the spark-driver Service will work. Let me know whether having the driver IP handy is helpful or not.

Sources

[1] Source: Stack Overflow. This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
