PyCharm and PySpark link
I am trying to get PySpark to run in my PyCharm IDE (macOS). It had been running fine in the terminal, but now it won't. I have read the other PyCharm threads, so I have added the content root and defined SPARK_HOME for the interpreter, but I still receive an error that I am struggling to decipher.
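For context, here is a minimal sketch of what the top of the script presumably looks like, reconstructed from the traceback (the failing call is `sc = SparkContext(conf = conf)` on line 5 of the script) and the `TotalCustomerSpend` app name in the logs below; the exact contents are an assumption:

```python
from pyspark import SparkConf, SparkContext

# Master URL and app name are inferred from the logs below; adjust as needed
conf = SparkConf().setMaster("local[*]").setAppName("TotalCustomerSpend")
sc = SparkContext(conf=conf)  # the call that raises the Py4JJavaError
```

The error in PyCharm: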
22/03/15 20:14:43 WARN Utils: Your hostname, Toms-MacBook-Air-3.local resolves to a loopback address: 127.94.0.1; using 192.168.0.6 instead (on interface en0)
22/03/15 20:14:43 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/03/15 20:14:44 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3319, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-2-1fb8b80eea10>", line 3, in <module>
runfile('/Users/Tom/PycharmProjects/SparkCourse/total-spend-per-customer.py', wdir='/Users/Tom/PycharmProjects/SparkCourse')
File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 198, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/Users/Tom/PycharmProjects/SparkCourse/total-spend-per-customer.py", line 5, in <module>
sc = SparkContext(conf = conf)
File "/usr/local/Cellar/apache-spark/3.2.1/libexec/python/lib/pyspark.zip/pyspark/context.py", line 146, in __init__
self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
File "/usr/local/Cellar/apache-spark/3.2.1/libexec/python/lib/pyspark.zip/pyspark/context.py", line 209, in _do_init
self._jsc = jsc or self._initialize_context(self._conf._jconf)
File "/usr/local/Cellar/apache-spark/3.2.1/libexec/python/lib/pyspark.zip/pyspark/context.py", line 329, in _initialize_context
return self._jvm.JavaSparkContext(jconf)
File "/usr/local/Cellar/apache-spark/3.2.1/libexec/python/lib/py4j-0.10.9.3-src.zip/py4j/java_gateway.py", line 1585, in __call__
return_value = get_return_value(
File "/usr/local/Cellar/apache-spark/3.2.1/libexec/python/lib/py4j-0.10.9.3-src.zip/py4j/protocol.py", line 326, in get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x3c60b7e7) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x3c60b7e7
at org.apache.spark.storage.StorageUtils$.<init>(StorageUtils.scala:213)
at org.apache.spark.storage.StorageUtils$.<clinit>(StorageUtils.scala)
at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:110)
at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:348)
at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:287)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:336)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:191)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:460)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:833)
Frustratingly, this has also broken it in my terminal; the error I get there is:
22/03/15 20:15:44 WARN Utils: Your hostname, Toms-MacBook-Air-3.local resolves to a loopback address: 127.0.0.1; using 192.XXX.X.X instead (on interface en0)
22/03/15 20:15:44 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
22/03/15 20:15:45 INFO SparkContext: Running Spark version 3.2.1
22/03/15 20:15:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/03/15 20:15:45 INFO ResourceUtils: ==============================================================
22/03/15 20:15:45 INFO ResourceUtils: No custom resources configured for spark.driver.
22/03/15 20:15:45 INFO ResourceUtils: ==============================================================
22/03/15 20:15:45 INFO SparkContext: Submitted application: TotalCustomerSpend
22/03/15 20:15:45 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
22/03/15 20:15:45 INFO ResourceProfile: Limiting resource is cpu
22/03/15 20:15:45 INFO ResourceProfileManager: Added ResourceProfile id: 0
22/03/15 20:15:45 INFO SecurityManager: Changing view acls to: Tom
22/03/15 20:15:45 INFO SecurityManager: Changing modify acls to: Tom
22/03/15 20:15:45 INFO SecurityManager: Changing view acls groups to:
22/03/15 20:15:45 INFO SecurityManager: Changing modify acls groups to:
22/03/15 20:15:45 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Tom); groups with view permissions: Set(); users with modify permissions: Set(Tom); groups with modify permissions: Set()
22/03/15 20:15:46 INFO Utils: Successfully started service 'sparkDriver' on port 59597.
22/03/15 20:15:46 INFO SparkEnv: Registering MapOutputTracker
22/03/15 20:15:46 INFO SparkEnv: Registering BlockManagerMaster
22/03/15 20:15:46 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
22/03/15 20:15:46 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
Traceback (most recent call last):
File "/Users/Tom/PycharmProjects/SparkCourse/total-spend-per-customer.py", line 5, in <module>
sc = SparkContext(conf = conf)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pyspark/python/lib/pyspark.zip/pyspark/context.py", line 146, in __init__
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pyspark/python/lib/pyspark.zip/pyspark/context.py", line 209, in _do_init
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pyspark/python/lib/pyspark.zip/pyspark/context.py", line 329, in _initialize_context
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pyspark/python/lib/py4j-0.10.9.3-src.zip/py4j/java_gateway.py", line 1585, in __call__
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pyspark/python/lib/py4j-0.10.9.3-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x3c60b7e7) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x3c60b7e7
at org.apache.spark.storage.StorageUtils$.<init>(StorageUtils.scala:213)
at org.apache.spark.storage.StorageUtils$.<clinit>(StorageUtils.scala)
at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:110)
at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:348)
at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:287)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:336)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:191)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:460)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:833)
22/03/15 20:15:46 INFO ShutdownHookManager: Shutdown hook called
22/03/15 20:15:46 INFO ShutdownHookManager: Deleting directory /private/var/folders/k1/g12y2m210zlb4bsh_f3w6tm40000gn/T/spark-cdf0bd15-d0cf-446c-8136-1e9ee768ce9e
Any help would be really appreciated.
Solution 1
I managed to eventually solve this. I needed to add the JAVA_HOME path to both my terminal environment and my PyCharm environment variables. Once that was done, it runs fine!
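For anyone decoding the underlying cause: the `IllegalAccessError` on `sun.nio.ch.DirectBuffer` is the classic symptom of running Spark 3.2.x on Java 17, whose module system no longer exports that internal package to Spark (Java 17 support only arrived in Spark 3.3.0). Pointing JAVA_HOME at a Java 8 or 11 JDK avoids it. A sketch of what that might look like on macOS (the version number is an assumption; pick whichever compatible JDK you have installed):

```bash
# List the JDKs installed on this Mac
/usr/libexec/java_home -V

# Point JAVA_HOME at a Java 11 JDK for the current shell session
export JAVA_HOME=$(/usr/libexec/java_home -v 11)

# Persist it for future terminal sessions (assuming zsh, the macOS default)
echo 'export JAVA_HOME=$(/usr/libexec/java_home -v 11)' >> ~/.zshrc
```

In PyCharm, the same JAVA_HOME value can be added under Run > Edit Configurations > Environment variables, so that runs launched from the IDE see it too.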
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source
---|---
Solution 1 | Tom_Scott