How do I create a single SparkSession in one file and reuse it in other files?

I have two .py files:
com/demo/DemoMain.py
com/demo/Sample.py
In both of the above files I am recreating the SparkSession object. In PySpark, how do I create a SparkSession in one file and reuse it in other .py files? In Scala this is easy: create the session in one object and import that object everywhere.
DemoMain.py
```python
from pyspark.sql.types import StringType, StructType, StructField
from pyspark.sql import SparkSession
from pyspark.sql import Row

def main():
    spark = SparkSession \
        .builder \
        .appName("Python Spark SQL basic example") \
        .getOrCreate()
    sc = spark.sparkContext
    data = ["surender,34", "ajay,21"]
    lines = sc.parallelize(data)
    parts = lines.map(lambda l: l.split(","))
    people = parts.map(lambda p: Row(name=p[0], age=int(p[1])))
    df = spark.createDataFrame(people)
    df.show()

if __name__ == '__main__':
    main()
```
sample.py
```python
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .getOrCreate()

rdd = spark.sparkContext.parallelize(["surender", "raja"])
rdd.collect()
```
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
