How can I use the Snowflake JAR in a Bitnami Spark Docker container?
I was able to create a Docker-based Bitnami standalone Spark instance and run Spark jobs on it. However, I'm not able to write data to Snowflake from a Spark DataFrame.
I created a Dockerfile that copies the Snowflake JAR into the image, but Spark still doesn't find the Snowflake plugin, even though the JAR file is present in the jars folder. I get the following error:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 8.0 failed 4 times, most recent failure:
Lost task 0.3 in stage 8.0 (TID 10) (172.19.0.3 executor 0): java.lang.ClassNotFoundException:
net.snowflake.spark.snowflake.io.SnowflakeResultSetPartition
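The missing class is part of the spark-snowflake connector, so it has to be resolvable on the executors as well as on the driver. For comparison, the connector can also be resolved at submit time from Maven Central instead of being baked into the image; this is only a sketch, and the master URL, main class, and application JAR below are placeholders rather than values from this post:

# resolve the connector (and its transitive snowflake-jdbc dependency) from Maven Central
spark-submit \
  --master spark://spark-master:7077 \
  --packages net.snowflake:spark-snowflake_2.12:2.10.0-spark_3.2 \
  --class com.test.Main \
  target/test-spark-1.0.0.jar

When the connector is resolved this way, spark-submit also pulls in its transitive dependencies, such as the Snowflake JDBC driver, so they do not need to be copied into the image by hand.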
Here's my Dockerfile:
FROM docker.io/bitnami/spark
USER root
COPY *.jar /opt/bitnami/spark/jars
What other settings do I need for the Snowflake plugin to be recognized?
Here are my Maven dependencies:
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.test</groupId>
<artifactId>test-spark</artifactId>
<version>1.0.0</version>
<properties>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
</properties>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-bom</artifactId>
<version>1.11.837</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>3.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.12</artifactId>
<version>3.2.1</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>net.snowflake</groupId>
<artifactId>spark-snowflake_2.12</artifactId>
<version>2.10.0-spark_3.2</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>3.3.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>3.3.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-aws</artifactId>
<version>3.3.1</version>
</dependency>
</dependencies>
Solution 1:[1]
It is a dependency issue caused by mismatched versions. Use the Bitnami Spark Docker image tag that matches the Spark version your job is built against (3.2.1 here), rather than the latest tag.
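A minimal sketch of the same Dockerfile with the image pinned, assuming the Bitnami tag matching the Spark version in the pom (3.2.1) is published:

# pin the image to the same Spark version the job is compiled against
FROM docker.io/bitnami/spark:3.2.1
USER root
# copy the Snowflake connector (and any other extra jars) into Spark's jars directory
COPY *.jar /opt/bitnami/spark/jars/

The same pinned tag should be used for the master and worker containers so that the driver, the executors, and the copied connector JAR all run against the same Spark version.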
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source
---|---
Solution 1 | jamescampbell