How to add Hadoop to dependencies via Maven?
I'm trying to build an Apache Flink job that has to access files via HDFS. It runs fine locally, but when I submit the job to a Flink cluster, I get the error:
Hadoop is not in the classpath/dependencies.
I'm using the Maven shade plugin to build my job.jar. The Flink cluster has no Hadoop jars, so I have to bundle all of them into the job jar itself.
Locally, I had to enable the IDE run-configuration option "Add dependencies with 'provided' scope to classpath" to make it work, but I have no idea how to do the equivalent with Maven.
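One common pattern (a sketch, not from the question) is to drive the scope from a Maven property and flip it to `compile` in a profile for local runs. The profile id `local-run` and the property name `flink.scope` are illustrative; the fragments below belong in the corresponding sections of the pom:

```xml
<!-- Illustrative sketch: a property controls the Flink scope... -->
<properties>
    <flink.scope>provided</flink.scope>
</properties>

<!-- ...the dependency references it... -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients_2.11</artifactId>
    <version>${dep.flink.version}</version>
    <scope>${flink.scope}</scope>
</dependency>

<!-- ...and a profile flips it to compile for local runs:
     mvn exec:java -Plocal-run -->
<profiles>
    <profile>
        <id>local-run</id>
        <properties>
            <flink.scope>compile</flink.scope>
        </properties>
    </profile>
</profiles>
```

With this setup the cluster build keeps Flink in `provided` scope (the cluster supplies those jars), while `-Plocal-run` puts them on the classpath for IDE or `exec:java` runs.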
pom.xml
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>${dep.flink.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients_2.11</artifactId>
    <version>${dep.flink.version}</version>
    <scope>provided</scope>
</dependency>
...
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>${plugin.maven-compiler.version}</version>
            <configuration>
                <source>${project.build.targetJdk}</source>
                <target>${project.build.targetJdk}</target>
            </configuration>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>${plugin.maven-shade.version}</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <minimizeJar>true</minimizeJar>
                        <relocations>
                            <relocation>
                                <pattern>org.apache.commons.cli</pattern>
                                <shadedPattern>org.test.examples.thirdparty.commons_cli</shadedPattern>
                            </relocation>
                        </relocations>
                        <filters>
                            <!-- Filters out signed files to avoid a SecurityException when integrating a signed jar into the resulting jar. -->
                            <filter>
                                <artifact>*:*</artifact>
                                <excludes>
                                    <exclude>META-INF/*.SF</exclude>
                                    <exclude>META-INF/*.DSA</exclude>
                                    <exclude>META-INF/*.RSA</exclude>
                                </excludes>
                            </filter>
                        </filters>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
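Since the cluster ships no Hadoop jars, one option (a sketch; the version number is an assumption — match it to your cluster's Hadoop version) is to declare a Hadoop client dependency in the default `compile` scope so the shade plugin packs it into job.jar:

```xml
<!-- Sketch: hadoop-client in compile scope is bundled by the shade plugin.
     The version 2.8.5 is illustrative; use the version your HDFS cluster runs. -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.8.5</version>
</dependency>
```

Note that `<minimizeJar>true</minimizeJar>` removes classes that are not directly referenced, which can strip classes loaded reflectively — Hadoop's `FileSystem` implementations are discovered via service loading — so you may need to disable minimization or add filters that keep those classes.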
Solution 1:[1]
If you are running on YARN, read the Flink YARN deployment documentation.
Make sure the HADOOP_CLASSPATH environment variable is set (you can check by running echo $HADOOP_CLASSPATH). If it is not, set it with:
export HADOOP_CLASSPATH=`hadoop classpath`
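The export above can be wrapped in a small guard so it only runs where the hadoop CLI exists (a sketch; the fallback text `<unset>` is illustrative):

```shell
# Sketch: populate HADOOP_CLASSPATH from the hadoop CLI when it is available.
if command -v hadoop >/dev/null 2>&1; then
    export HADOOP_CLASSPATH=$(hadoop classpath)
fi
# Print the result; "<unset>" is shown if hadoop was not found.
echo "HADOOP_CLASSPATH=${HADOOP_CLASSPATH:-<unset>}"
```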
To add a Maven dependency, search a Maven repository site for the artifact you want, choose your version, and the page will show a dependency snippet to copy into your pom.xml:
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-hadoop-compatibility_2.11</artifactId>
    <version>1.13.1</version>
    <scope>test</scope>
</dependency>
Add it between the <dependencies> and </dependencies> tags. Note that test scope puts the artifact on the test classpath only; to have it bundled into the job jar by the shade plugin, use the default compile scope instead.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Niko |
