Windows Spark Error java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.storage.StorageUtils
I downloaded Apache Spark 3.2.0 (the latest release) along with the Hadoop binaries, and Java SE Development Kit 17.0.1 is installed as well. I am not even able to initialize a session.
Input:
import pyspark
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df = spark.sql('''select 'spark' as hello ''')
df.show()
Output:
Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.storage.StorageUtils$
at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:110)
at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:348)
at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:287)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:336)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:191)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)
Solution 1:[1]
Spark 3.2 only supports Java versions 8 to 11. I had the same issue on Linux, and switching to Java 11 instead of 17 fixed it in my case.
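On Linux, the switch can be sketched as the following shell snippet. The JDK path here is an assumption (a typical Debian/Ubuntu location); find yours with `update-alternatives --list java` or under /usr/lib/jvm:

```shell
# Assumption: Java 11 is installed at this path; adjust for your distribution.
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export PATH="$JAVA_HOME/bin:$PATH"

# Verify the JVM Spark will pick up, then start PySpark.
java -version
pyspark
```

Putting the two exports in your shell profile makes the change permanent; running them in a single terminal keeps JDK 17 as the default everywhere else.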
Solution 2:[2]
I faced the same issue today, but fixed it by changing the JDK from 17 to 8 (only for starting Spark), as below. My setup:
- Spark 3.2.1
- Hadoop 3.2
- Python 3.10
File "D:\sw.1\spark-3.2.1-bin-hadoop3.2\python\lib\py4j-0.10.9.3-src.zip\py4j\protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.storage.StorageUtils$
The JAVA_HOME environment variable was pointing to JDK 17.
Quick fix (in case you want to keep the environment variable unchanged but use JDK 8 for Spark only):
(1) Create a batch file (start-pyspark.bat) in D:\
(2) Add the lines below:
set JAVA_HOME=D:\sw.1\jdk1.8.0_332
set PATH=%PATH%;%JAVA_HOME%\bin;%SPARK_HOME%\bin;%HADOOP_HOME%\bin;
pyspark
(3) On cmd, type start-pyspark.bat and press Enter.
d:\>start-pyspark.bat
d:\>set JAVA_HOME=D:\sw.1\jdk1.8.0_332
d:\>set PATH=D:\sw.1\py.3.10\Scripts\;D:\sw.1\py.3.10\;C:\Program Files\Zulu\zulu-17\bin;C:\Program Files\Zulu\zulu-17-jre\bin;C:\windows\system32;....;D:\sw.1\jdk1.8.0_332\bin;D:\sw.1\spark-3.2.1-bin-hadoop3.2\bin;D:\sw.1\hadoop\bin;
d:\>pyspark
Python 3.10.4 (tags/v3.10.4:9d38120, Mar 23 2022, 23:13:41) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/05/27 18:29:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.2.1
      /_/
(4) If you close this Spark prompt and the cmd window and restart, the environment is back in a clean state, with JDK 17 set as JAVA_HOME.
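As an alternative to the batch file, the same redirection can be done from Python itself, since pyspark reads JAVA_HOME when it launches the JVM. A minimal sketch, assuming the JDK 8 path from the answer above (substitute your own Java 8/11 location); it must run before the SparkSession is created:

```python
import os

# Run this BEFORE pyspark starts the JVM (i.e. before getOrCreate()).
# Assumption: JDK 8 is installed at D:\sw.1\jdk1.8.0_332, the path used
# in the answer above; point this at your own Java 8/11 install.
JDK8 = r"D:\sw.1\jdk1.8.0_332"
os.environ["JAVA_HOME"] = JDK8
os.environ["PATH"] = os.path.join(JDK8, "bin") + os.pathsep + os.environ["PATH"]

# The question's code should now work unchanged:
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.getOrCreate()
# spark.sql("select 'spark' as hello").show()
```

This keeps JDK 17 as the system default while the script's own subprocesses see JDK 8.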
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Tomasz |
| Solution 2 | ramindroid |
