Windows Spark Error java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.storage.StorageUtils

I downloaded Apache Spark 3.2.0 (the latest release) along with the Hadoop binaries; Java SE Development Kit 17.0.1 is installed as well.

I am not even able to initialize a SparkSession.

Input:

import pyspark
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df = spark.sql('''select 'spark' as hello ''')
df.show()

Output:

Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.storage.StorageUtils$
at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:110)
at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:348)
at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:287)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:336)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:191)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)


Solution 1:[1]

Spark 3.2 only supports Java versions 8-11. I had the same issue on Linux, and switching to Java 11 instead of 17 fixed it in my case.
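To check in advance whether the Java install Spark will pick up falls inside that supported range, you can parse its `java.version` string. A minimal sketch (the 8-11 range is Spark 3.2's documented support window; the helper names are my own):

```python
import re

def java_major_version(version_string: str) -> int:
    """Parse the major version out of a java.version string.

    Handles both the legacy "1.8.0_332" style and the modern
    "11.0.15" / "17.0.1" style.
    """
    match = re.match(r"(\d+)(?:\.(\d+))?", version_string)
    if not match:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    first = int(match.group(1))
    if first == 1 and match.group(2):
        return int(match.group(2))  # "1.8.0_332" -> 8
    return first                    # "11.0.15" -> 11, "17.0.1" -> 17

def supported_by_spark_3_2(version_string: str) -> bool:
    """Spark 3.2 supports Java 8 through 11; Java 17 support arrived in Spark 3.3."""
    return 8 <= java_major_version(version_string) <= 11
```

Running `java -version` on the command line shows the version string for whichever JDK is first on your PATH, which is what the Spark launcher will use.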

Solution 2:[2]

I faced the same issue today, but fixed it by switching the JDK from 17 to 8 (only for starting Spark), as below.

  • spark-3.2.1
  • hadoop3.2
  • python 3.10
 File "D:\sw.1\spark-3.2.1-bin-hadoop3.2\python\lib\py4j-0.10.9.3-src.zip\py4j\protocol.py", line 326, in get_return_value
   raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.storage.StorageUtils$

The JAVA_HOME environment variable was pointing to JDK 17.

Quick fix (in case you want to keep the environment variable the same but use JDK 8 for Spark only):

(1) Create a batch file (start-pyspark.bat) in D:\
(2) Add the lines below:

set JAVA_HOME=D:\sw.1\jdk1.8.0_332
set PATH=%PATH%;%JAVA_HOME%\bin;%SPARK_HOME%\bin;%HADOOP_HOME%\bin;
pyspark 

(3) In cmd, run start-pyspark.bat:

d:\>start-pyspark.bat

d:\>set JAVA_HOME=D:\sw.1\jdk1.8.0_332

d:\>set PATH=D:\sw.1\py.3.10\Scripts\;D:\sw.1\py.3.10\;C:\Program Files\Zulu\zulu-17\bin;C:\Program Files\Zulu\zulu-17-jre\bin;C:\windows\system32;....;D:\sw.1\jdk1.8.0_332\bin;D:\sw.1\spark-3.2.1-bin-hadoop3.2\bin;D:\sw.1\hadoop\bin;

d:\>pyspark
Python 3.10.4 (tags/v3.10.4:9d38120, Mar 23 2022, 23:13:41) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/05/27 18:29:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.2.1

(4) If you close this Spark prompt and the cmd window and restart, you will be back in a clean state, with JDK 17 set as JAVA_HOME from the environment.
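The same per-process trick can also be done from Python itself, by overriding JAVA_HOME in `os.environ` before pyspark is imported. A sketch, reusing the JDK 8 path from the batch file above as a placeholder you would replace with your own install location:

```python
import os

# JDK 8 location -- replace with your actual install path.
JDK8_HOME = r"D:\sw.1\jdk1.8.0_332"

# Override JAVA_HOME for this process only; the system-wide environment
# variable (still pointing at JDK 17) is left untouched.
os.environ["JAVA_HOME"] = JDK8_HOME
os.environ["PATH"] = (
    os.path.join(JDK8_HOME, "bin") + os.pathsep + os.environ.get("PATH", "")
)

# Import pyspark only after JAVA_HOME is set, so the launcher picks up JDK 8:
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.getOrCreate()
```

Like the batch file, this affects only the current process; opening a fresh shell puts you back on the JDK 17 environment.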

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Tomasz
Solution 2: ramindroid