MinIO in Docker cluster is not reachable from Spark container
I created a network:
docker network create app-tier --driver bridge
and used this Docker Compose file:
networks:
  default:
    external:
      name: app-tier

services:
  minio:
    image: 'bitnami/minio:latest'
    container_name: my-minio-server
    environment:
      - MINIO_ROOT_USER=theroot
      - MINIO_ROOT_PASSWORD=theroot123
    ports:
      - '9000:9000'
      - '9001:9001'
    volumes:
      - ${HOME}/minio/data:/data
  spark:
    image: docker.io/bitnami/spark:3
    environment:
      - SPARK_MODE=master
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
    ports:
      - '8080:8080'
      - '7077:7077'
    volumes:
      - ./conf/spark-defaults.conf:/opt/bitnami/spark/conf/spark-defaults.conf
  spark-worker1:
    image: docker.io/bitnami/spark:3
    links:
      - "spark:spark"
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark:7077
      - SPARK_WORKER_MEMORY=1G
      - SPARK_WORKER_CORES=1
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
    ports:
      - '7181:8081'
    volumes:
      - ./work1:/opt/bitnami/spark/work
      - ./conf/spark-defaults.conf:/opt/bitnami/spark/conf/spark-defaults.conf
  spark-worker2:
    image: docker.io/bitnami/spark:3
    links:
      - "spark:spark"
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark:7077
      - SPARK_WORKER_MEMORY=1G
      - SPARK_WORKER_CORES=1
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
    ports:
      - '7182:8082'
    volumes:
      - ./work2:/opt/bitnami/spark/work
      - ./conf/spark-defaults.conf:/opt/bitnami/spark/conf/spark-defaults.conf
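As a sanity check (assuming the stack was started with docker compose up -d), any throwaway container attached to app-tier should resolve the MinIO container by name:
docker run --rm --network app-tier busybox nslookup my-minio-server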
I connected to the MinIO console at http://127.0.0.1:9001 with the above credentials and created a service account and an "asiatrip" bucket.
The service account has the following keys:
s3accessKeyAws = "n1Z8USynE2uOBJmc"
s3secretKeyAws = "RjK4uL35tFNTROo2WsPVZhA77AJ5qJEx"
I can successfully connect to it via the MinIO client:
docker run -it --rm --name minio-client \
  --env MINIO_SERVER_HOST="my-minio-server" \
  --env MINIO_SERVER_ACCESS_KEY="theroot" \
  --env MINIO_SERVER_SECRET_KEY="theroot123" \
  --network app-tier --volume $HOME/mcconf:/.mc \
  bitnami/minio-client alias set minio http://my-minio-server:9000 n1Z8USynE2uOBJmc RjK4uL35tFNTROo2WsPVZhA77AJ5qJEx --api S3v4
and then list the buckets:
docker run -it --rm --name minio-client \
  --env MINIO_SERVER_HOST="my-minio-server" \
  --env MINIO_SERVER_ACCESS_KEY="theroot" \
  --env MINIO_SERVER_SECRET_KEY="theroot123" \
  --network app-tier --volume $HOME/mcconf:/.mc \
  bitnami/minio-client ls minio
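For reference, addresses.csv can be copied into the bucket with the same client along these lines; the /files mount and the local file path are illustrative assumptions, not part of the setup above:
docker run -it --rm --name minio-client \
  --network app-tier --volume $HOME/mcconf:/.mc \
  --volume $PWD:/files \
  bitnami/minio-client cp /files/addresses.csv minio/asiatrip/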
I can also use MinIO from a Jupyter container on the same network:
docker run -it --network app-tier -p 8888:8888 jupyter/scipy-notebook:latest
after installing the minio package with
!pip install minio
and executing this Python script:
from minio import Minio
from minio.error import S3Error

client = Minio(
    "my-minio-server:9000",
    access_key="n1Z8USynE2uOBJmc",
    secret_key="RjK4uL35tFNTROo2WsPVZhA77AJ5qJEx",
    secure=False,
)

# Make the 'asiatrip' bucket if it does not exist.
found = client.bucket_exists("asiatrip")
if not found:
    client.make_bucket("asiatrip")
else:
    print("Bucket 'asiatrip' already exists")

list(client.list_objects("asiatrip"))
So everything seems set up.
I installed hadoop-3.3.2 and spark-3.2.1-bin-without-hadoop on the host and set up my environment as follows:
export HADOOP_HOME=$HOME/Downloads/hadoop-3.3.2
export SPARK_HOME=$HOME/Downloads/spark-3.2.1-bin-without-hadoop
export PATH=$SPARK_HOME/bin:$HADOOP_HOME/bin:$PATH
export HADOOP_OPTIONAL_TOOLS="hadoop-aws"
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
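Since spark-3.2.1-bin-without-hadoop brings no Hadoop jars of its own, S3A only works if hadoop-aws (pulled in via HADOOP_OPTIONAL_TOOLS) actually lands on that classpath; a quick way to check the exports took effect:
hadoop classpath | tr ':' '\n' | grep -i aws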
When I run this Python file
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Test json") \
    .getOrCreate()

s3accessKeyAws = "n1Z8USynE2uOBJmc"
s3secretKeyAws = "RjK4uL35tFNTROo2WsPVZhA77AJ5qJEx"
connectionTimeOut = "1000"
s3endPointLoc = "http://127.0.0.1:9000"
sourceBucket = "asiatrip"

hadoopConf = spark.sparkContext._jsc.hadoopConfiguration()
hadoopConf.set("fs.s3a.endpoint", s3endPointLoc)
hadoopConf.set("fs.s3a.access.key", s3accessKeyAws)
hadoopConf.set("fs.s3a.secret.key", s3secretKeyAws)
hadoopConf.set("fs.s3a.connection.timeout", connectionTimeOut)
hadoopConf.set("spark.sql.debug.maxToStringFields", "100")
hadoopConf.set("fs.s3a.path.style.access", "true")
hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
hadoopConf.set("fs.s3a.connection.ssl.enabled", "false")

inputPath = f"s3a://{sourceBucket}/addresses.csv"
outputPath = f"s3a://{sourceBucket}/output_survey.csv"

df = spark.read.option("header", "true").format("s3selectCSV").csv(inputPath)
df.write.mode("overwrite").parquet(outputPath)

spark.stop()
as
spark-submit miniospark.py
it works fine for this addresses.csv file in the asiatrip bucket:
a,b
1,2
3,4
6,7
8,9
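As an aside, the same fs.s3a.* settings can be passed at submit time with Spark's spark.hadoop.* prefix instead of going through the private _jsc handle, for example:
spark-submit \
  --conf spark.hadoop.fs.s3a.endpoint=http://127.0.0.1:9000 \
  --conf spark.hadoop.fs.s3a.access.key=n1Z8USynE2uOBJmc \
  --conf spark.hadoop.fs.s3a.secret.key=RjK4uL35tFNTROo2WsPVZhA77AJ5qJEx \
  --conf spark.hadoop.fs.s3a.path.style.access=true \
  --conf spark.hadoop.fs.s3a.connection.ssl.enabled=false \
  miniospark.py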
When I submit it as
spark-submit --master spark://127.0.0.1:7077 miniospark.py
with
s3endPointLoc = "http://my-minio-server:9000"
it gives up after some time because it cannot resolve my-minio-server:
2022-05-18 15:12:32,246 WARN streaming.FileStreamSink: Assume no metadata directory. Error while looking for metadata directory in the path: s3a://asiatrip/addresses.csv.
org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on s3a://asiatrip/addresses.csv: com.amazonaws.SdkClientException: Unable to execute HTTP request: my-minio-server: nodename nor servname provided, or not known: Unable to execute HTTP request: my-minio-server: nodename nor servname provided, or not known
I am on an x64 Mac with Docker Desktop.
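My understanding is that my-minio-server only exists in Docker's embedded DNS on app-tier, so the Spark containers can resolve it but the driver process on the macOS host cannot. Since port 9000 is published to the host, I assume one workaround would be to map the name to loopback on the host (a guess, based on the driver being the component that fails to resolve):
echo "127.0.0.1 my-minio-server" | sudo tee -a /etc/hosts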