How to add Hadoop to dependencies via Maven?

I'm trying to build an Apache Flink job that has to access files via HDFS. It runs fine locally, but when I submit the job to a Flink cluster, it fails with:

Hadoop is not in the classpath/dependencies.

I'm using the Maven Shade Plugin to build my job jar. The Flink cluster ships no Hadoop jars, so I have to bundle all of them into the job itself.

Locally, I had to enable the IDE setting "Add dependencies with 'provided' scope to classpath" to make it work, but I have no idea how to achieve the same with Maven.
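(For reference, Flink's quickstart pom solves this exact IDE problem with a Maven profile that re-adds the "provided" dependencies at compile scope when the build is run from IntelliJ; the sketch below assumes the `idea.version` property as the activation trigger and reuses the artifact from the pom above:)

```xml
<!-- Sketch of the profile used by Flink's quickstart pom; adjust artifact ids
     to match your own "provided"-scoped dependencies. -->
<profiles>
    <profile>
        <id>add-dependencies-for-IDEA</id>
        <activation>
            <property>
                <!-- IntelliJ sets this property when it invokes Maven -->
                <name>idea.version</name>
            </property>
        </activation>
        <dependencies>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-clients_2.11</artifactId>
                <version>${dep.flink.version}</version>
                <scope>compile</scope>
            </dependency>
        </dependencies>
    </profile>
</profiles>
```

With this in place, running from the IDE picks up the compile-scoped copies, while `mvn package` on the command line still treats them as provided and keeps them out of the shaded jar.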

pom.xml


        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>${dep.flink.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_2.11</artifactId>
            <version>${dep.flink.version}</version>
            <scope>provided</scope>
        </dependency>
...
<build>
        <plugins>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>${plugin.maven-compiler.version}</version>
                <configuration>
                    <source>${project.build.targetJdk}</source>
                    <target>${project.build.targetJdk}</target>
                </configuration>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>${plugin.maven-shade.version}</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <minimizeJar>true</minimizeJar>
                            <relocations>
                                <relocation>
                                    <pattern>org.apache.commons.cli</pattern>
                                    <shadedPattern>org.test.examples.thirdparty.commons_cli</shadedPattern>
                                </relocation>
                            </relocations>
                            <filters>
                                <!-- Filters out signed files to avoid SecurityException when integrating a signed jar in the resulting jar. -->
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>



Solution 1:[1]

If you are deploying on YARN, read the Flink YARN deployment documentation.

Make sure that the HADOOP_CLASSPATH environment variable is set (check by running echo $HADOOP_CLASSPATH). If it is not, set it with:

    export HADOOP_CLASSPATH=`hadoop classpath`

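The check above can be scripted so a submission fails fast instead of hitting the classpath error on the cluster; a small sketch (the message wording is illustrative):

```shell
# Sketch: warn before submitting if the Hadoop classpath is not visible to Flink
# (message text is illustrative).
if [ -z "$HADOOP_CLASSPATH" ]; then
    echo "HADOOP_CLASSPATH is empty - Flink will report 'Hadoop is not in the classpath'"
else
    echo "HADOOP_CLASSPATH is set"
fi
```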
To add a Maven dependency, search for the artifact in the Maven repository, choose your version, and copy the dependency snippet shown on the page into your pom.xml:

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-hadoop-compatibility_2.11</artifactId>
    <version>1.13.1</version>
    <scope>test</scope>
</dependency>

Add it between the <dependencies> and </dependencies> tags.
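Note that the test scope in the snippet above only puts Hadoop on the test classpath, not into the shaded job jar. If the cluster really ships no Hadoop jars at all, another option (a sketch; the version shown is one published release of this artifact, so pick the one matching your Hadoop and Flink versions) is to bundle Flink's pre-shaded Hadoop uber jar so the Shade Plugin packs it into the job jar:

```xml
<!-- Bundled at default (compile) scope so the Maven Shade Plugin
     includes it in the job jar. Version is illustrative. -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-shaded-hadoop-2-uber</artifactId>
    <version>2.8.3-10.0</version>
</dependency>
```

If the cluster does provide Hadoop (e.g. via HADOOP_CLASSPATH on YARN), prefer provided scope instead, so the job jar stays small and avoids version clashes with the cluster's Hadoop.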

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Niko