Wrong JAVA_HOME in Hadoop for spark-shell

I needed to install Hadoop in order to have Spark running on my WSL2 Ubuntu for school projects. I installed Hadoop 3.3.1 and Spark 3.2.1 following these two tutorials:

Hadoop Tutorial on Kontext.tech
Spark Tutorial on Kontext.tech

I set up the environment variables in my .bashrc:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
export PATH=$PATH:$JAVA_HOME
export HADOOP_HOME=~/hadoop/hadoop-3.3.1
export SPARK_HOME=~/hadoop/spark-3.2.1
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:/usr/local/hadoop/bin/
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$SPARK_HOME/bin:$PATH
# Configure Spark to use Hadoop classpath
export SPARK_DIST_CLASSPATH=$(hadoop classpath)

as well as in ~/hadoop/spark-3.2.1/conf/spark-env.sh.template:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/

However, when I launch spark-shell, I get this error:

/home/adrien/hadoop/spark-3.2.1/bin/spark-class: line 71: /usr/lib/jvm/java-8-openjdk-amd64/jre/bin//bin/java: No such file or directory
/home/adrien/hadoop/spark-3.2.1/bin/spark-class: line 96: CMD: bad array subscript

There seems to be a mix-up in a redefinition of the $PATH variable, but I can't figure out where it is. Can you help me solve it, please? I don't know Hadoop; I know Spark well, but I have never had to install either of them before.



Solution 1:[1]

First, certain Spark packages come bundled with Hadoop, so you don't need to download it separately. More specifically, Spark is currently built against Hadoop 3.2, so using the latest Hadoop version may cause its own problems.

For your problem: JAVA_HOME should not end in /bin or /bin/java. Check the linked post again...
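For instance, with the OpenJDK 8 package at the path shown in the question, the two exports could look like this (a sketch; adjust the path to wherever your JVM actually lives):

```shell
# JAVA_HOME points at the installation root, not at the java binary
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
# Put the JDK's bin directory (not JAVA_HOME itself) on the PATH
export PATH=$JAVA_HOME/bin:$PATH
```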

If you installed Java via apt install, you shouldn't really need to set JAVA_HOME or add Java to the PATH at all, as the package manager does this for you. Alternatively, you can use https://sdkman.io

Note: Java 11 is preferred
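If you still want to set JAVA_HOME explicitly, one way to find where an apt-installed Java lives is to resolve the symlink chain behind the java on your PATH. This is a sketch assuming a Debian/Ubuntu-style update-alternatives layout:

```shell
if command -v java >/dev/null 2>&1; then
  # Follow the update-alternatives symlinks to the real binary,
  # e.g. /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
  java_bin="$(readlink -f "$(command -v java)")"
  # Strip the trailing /bin/java to get a directory suitable for JAVA_HOME
  export JAVA_HOME="${java_bin%/bin/java}"
  echo "JAVA_HOME=$JAVA_HOME"
else
  echo "java not found on PATH" >&2
fi
```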


You also need to remove .template from any config files for them to actually be used... However, JAVA_HOME is automatically detected by spark-submit, so setting it in spark-env.sh is completely optional.
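Concretely, that means copying (or renaming) the template to the name Spark actually loads; assuming the install path from the question:

```shell
# Spark ignores *.template files; only spark-env.sh is sourced at startup
cp ~/hadoop/spark-3.2.1/conf/spark-env.sh.template \
   ~/hadoop/spark-3.2.1/conf/spark-env.sh
```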

The same applies to hadoop-env.sh.


Also, remove /usr/local/hadoop/bin/ from your PATH, since it doesn't appear you've put anything in that location.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
