Import Error for SparkSession in Pyspark
I have Spark 2.0 installed and am using PySpark on Python 2.7. I can create a SparkContext without any difficulty, but for some reason I am unable to import SparkSession. Does anyone know what I am doing wrong?
```python
import pyspark
import pyspark.sql
from pyspark.sql import SparkSession
```

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name SparkSession
```
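Before trying the fixes below, it can help to confirm which `pyspark` Python actually finds, since this error usually means an older (pre-2.0) copy of the bindings is shadowing the intended one. The following is a minimal diagnostic sketch; the function name is my own, and it only reports what it can discover:

```python
import importlib


def check_sparksession_available():
    """Return a short diagnostic about whether SparkSession can be imported."""
    try:
        sql_mod = importlib.import_module("pyspark.sql")
    except ImportError:
        return "pyspark is not importable at all; check sys.path / SPARK_HOME"
    if hasattr(sql_mod, "SparkSession"):
        pyspark_mod = importlib.import_module("pyspark")
        version = getattr(pyspark_mod, "__version__", "unknown")
        return "SparkSession is available (pyspark version: %s)" % version
    # pyspark imports, but it predates 2.0, so SparkSession does not exist.
    return "pyspark found, but it is older than 2.0: no SparkSession"


print(check_sparksession_available())
```

If this reports a pre-2.0 version even though you installed Spark 2.x, the interpreter is picking up a stale copy of the bindings, which is exactly the situation the solutions below address.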
Solution 1:[1]
Oddly enough, the same import worked perfectly from a different directory. Running the files from this path did not produce the error:

```
/Users/.../spark-2.1.0-bin-hadoop2.7/python/
```
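This works because Spark ships its Python bindings inside `$SPARK_HOME/python`, and running from that directory puts them first on the module search path. A more portable sketch of the same idea, assuming `SPARK_HOME` points at your Spark installation (the `/opt/spark` fallback here is purely a hypothetical placeholder):

```python
import os
import sys

# Hypothetical default; replace with your actual Spark installation root
# or set the SPARK_HOME environment variable instead.
spark_home = os.environ.get("SPARK_HOME", "/opt/spark")

# Spark keeps its Python bindings in $SPARK_HOME/python. Putting that
# directory at the front of sys.path makes `import pyspark` resolve
# against the installed Spark 2.x rather than a stale copy elsewhere.
spark_python = os.path.join(spark_home, "python")
sys.path.insert(0, spark_python)
```

After adjusting `sys.path` this way (before any `import pyspark`), the `from pyspark.sql import SparkSession` import should resolve against the Spark 2.x bindings.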
Solution 2:[2]
SparkSession was introduced in Apache Spark 2.0. To use it, make sure the right major version of Spark is selected before running pyspark:

```
export SPARK_MAJOR_VERSION=2
```
Solution 3:[3]
Export the Spark version that matches your installation; this worked for me with version 2.3:

```
export SPARK_VERSION=2.3
```
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Emily.SageMaker.AWS |
| Solution 2 | Rania ZYANE |
| Solution 3 | Abhijeet Sondkar |
