PyCharm and PySpark link

I am trying to get PySpark to run in my PyCharm IDE (macOS). It had been running OK in the terminal, but now won't. I have read the other PyCharm threads, so I have added the Content Root and defined SPARK_HOME for the interpreter, but I still receive the error below and I am struggling to decipher the issue.
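
For context, the script fails on line 5, which is just the usual SparkContext boilerplate, something like this minimal sketch (the exact setMaster argument is not visible in the output below, so treat it as an assumption; line 5 and the app name come from the traceback and log):

    from pyspark import SparkConf, SparkContext

    # Sketch of the top of total-spend-per-customer.py. The traceback below
    # shows that line 5 is `sc = SparkContext(conf = conf)` and the log shows
    # the app name "TotalCustomerSpend"; "local[*]" is an assumed master.
    conf = SparkConf().setMaster("local[*]").setAppName("TotalCustomerSpend")
    sc = SparkContext(conf=conf)

Running it in PyCharm gives: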

22/03/15 20:14:43 WARN Utils: Your hostname, Toms-MacBook-Air-3.local resolves to a loopback address: 127.94.0.1; using 192.168.0.6 instead (on interface en0)
22/03/15 20:14:43 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/03/15 20:14:44 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3319, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-1fb8b80eea10>", line 3, in <module>
    runfile('/Users/Tom/PycharmProjects/SparkCourse/total-spend-per-customer.py', wdir='/Users/Tom/PycharmProjects/SparkCourse')
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 198, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/Tom/PycharmProjects/SparkCourse/total-spend-per-customer.py", line 5, in <module>
    sc = SparkContext(conf = conf)
  File "/usr/local/Cellar/apache-spark/3.2.1/libexec/python/lib/pyspark.zip/pyspark/context.py", line 146, in __init__
    self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
  File "/usr/local/Cellar/apache-spark/3.2.1/libexec/python/lib/pyspark.zip/pyspark/context.py", line 209, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
  File "/usr/local/Cellar/apache-spark/3.2.1/libexec/python/lib/pyspark.zip/pyspark/context.py", line 329, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
  File "/usr/local/Cellar/apache-spark/3.2.1/libexec/python/lib/py4j-0.10.9.3-src.zip/py4j/java_gateway.py", line 1585, in __call__
    return_value = get_return_value(
  File "/usr/local/Cellar/apache-spark/3.2.1/libexec/python/lib/py4j-0.10.9.3-src.zip/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x3c60b7e7) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x3c60b7e7
    at org.apache.spark.storage.StorageUtils$.<init>(StorageUtils.scala:213)
    at org.apache.spark.storage.StorageUtils$.<clinit>(StorageUtils.scala)
    at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:110)
    at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:348)
    at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:287)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:336)
    at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:191)
    at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:460)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.base/java.lang.Thread.run(Thread.java:833)

Frustratingly, this has also broken it in the terminal as well; the error I get there is:

22/03/15 20:15:44 WARN Utils: Your hostname, Toms-MacBook-Air-3.local resolves to a loopback address: 127.0.0.1; using 192.XXX.X.X instead (on interface en0)
22/03/15 20:15:44 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
22/03/15 20:15:45 INFO SparkContext: Running Spark version 3.2.1
22/03/15 20:15:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/03/15 20:15:45 INFO ResourceUtils: ==============================================================
22/03/15 20:15:45 INFO ResourceUtils: No custom resources configured for spark.driver.
22/03/15 20:15:45 INFO ResourceUtils: ==============================================================
22/03/15 20:15:45 INFO SparkContext: Submitted application: TotalCustomerSpend
22/03/15 20:15:45 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
22/03/15 20:15:45 INFO ResourceProfile: Limiting resource is cpu
22/03/15 20:15:45 INFO ResourceProfileManager: Added ResourceProfile id: 0
22/03/15 20:15:45 INFO SecurityManager: Changing view acls to: Tom
22/03/15 20:15:45 INFO SecurityManager: Changing modify acls to: Tom
22/03/15 20:15:45 INFO SecurityManager: Changing view acls groups to: 
22/03/15 20:15:45 INFO SecurityManager: Changing modify acls groups to: 
22/03/15 20:15:45 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(Tom); groups with view permissions: Set(); users  with modify permissions: Set(Tom); groups with modify permissions: Set()
22/03/15 20:15:46 INFO Utils: Successfully started service 'sparkDriver' on port 59597.
22/03/15 20:15:46 INFO SparkEnv: Registering MapOutputTracker
22/03/15 20:15:46 INFO SparkEnv: Registering BlockManagerMaster
22/03/15 20:15:46 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
22/03/15 20:15:46 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
Traceback (most recent call last):
  File "/Users/Tom/PycharmProjects/SparkCourse/total-spend-per-customer.py", line 5, in <module>
    sc = SparkContext(conf = conf)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pyspark/python/lib/pyspark.zip/pyspark/context.py", line 146, in __init__
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pyspark/python/lib/pyspark.zip/pyspark/context.py", line 209, in _do_init
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pyspark/python/lib/pyspark.zip/pyspark/context.py", line 329, in _initialize_context
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pyspark/python/lib/py4j-0.10.9.3-src.zip/py4j/java_gateway.py", line 1585, in __call__
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pyspark/python/lib/py4j-0.10.9.3-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x3c60b7e7) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x3c60b7e7
    at org.apache.spark.storage.StorageUtils$.<init>(StorageUtils.scala:213)
    at org.apache.spark.storage.StorageUtils$.<clinit>(StorageUtils.scala)
    at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:110)
    at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:348)
    at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:287)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:336)
    at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:191)
    at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:460)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.base/java.lang.Thread.run(Thread.java:833)

22/03/15 20:15:46 INFO ShutdownHookManager: Shutdown hook called
22/03/15 20:15:46 INFO ShutdownHookManager: Deleting directory /private/var/folders/k1/g12y2m210zlb4bsh_f3w6tm40000gn/T/spark-cdf0bd15-d0cf-446c-8136-1e9ee768ce9e

Any help would be really appreciated.



Solution 1:[1]

I managed to eventually solve this. I needed to add the JAVA_HOME path to both my terminal and my PyCharm environment variables. Once that was done, it seems to run OK! The root cause is visible in the error itself: Spark 3.2.x supports Java 8 and 11, and when it is launched under Java 16 or newer, java.base no longer exports sun.nio.ch to unnamed modules, hence the IllegalAccessError. Pointing JAVA_HOME at a Java 8 or 11 JDK avoids it.
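
For anyone who wants something concrete, here is a minimal sketch of the same idea done inside the script itself; the JDK path below is an example, not a universal location, so substitute whatever /usr/libexec/java_home -v 11 prints on your machine:

    import os

    # Spark 3.2.x is built for Java 8/11; launched under Java 16+ it hits the
    # sun.nio.ch IllegalAccessError shown above. Point JAVA_HOME at an older
    # JDK before the SparkContext is created. The path below is an example;
    # use the output of `/usr/libexec/java_home -v 11` for your machine.
    os.environ["JAVA_HOME"] = "/Library/Java/JavaVirtualMachines/temurin-11.jdk/Contents/Home"

    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setMaster("local[*]").setAppName("TotalCustomerSpend")
    sc = SparkContext(conf=conf)

In PyCharm the equivalent is the Environment variables field under Run > Edit Configurations..., and for the terminal, adding export JAVA_HOME=$(/usr/libexec/java_home -v 11) to ~/.zshrc does the same job.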

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1: Tom_Scott