'structured spark streaming with parquet file
I want to execute this code https://towardsdatascience.com/sentiment-analysis-on-streaming-twitter-data-using-spark-structured-streaming-python-fc873684bfe3 but the problem is i don't know how can i do with parquet! the first time i try spark streaming. spark version: 3.2.1 the error is shown below:
StreamingQueryException Traceback (most recent call last)
<ipython-input-1-f061cfa92c68> in <module>
47 .option("checkpointLocation", "./check")\
48 .trigger(processingTime='60 seconds').start()
---> 49 query.awaitTermination()
C:\spark\spark\python\pyspark\sql\streaming.py in awaitTermination(self, timeout)
99 return self._jsq.awaitTermination(int(timeout * 1000))
100 else:
--> 101 return self._jsq.awaitTermination()
102
103 @property
C:\spark\spark\python\lib\py4j-0.10.9.3-src.zip\py4j\java_gateway.py in __call__(self, *args)
1319
1320 answer = self.gateway_client.send_command(command)
-> 1321 return_value = get_return_value(
1322 answer, self.gateway_client, self.target_id, self.name)
1323
C:\spark\spark\python\pyspark\sql\utils.py in deco(*a, **kw)
115 # Hide where the exception came from that shows a non-Pythonic
116 # JVM exception message.
--> 117 raise converted from None
118 else:
119 raise
StreamingQueryException: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
=== Streaming Query ===
Identifier: all_tweets [id = 541f763b-861e-4964-8bbc-c0076dc08e1f, runId = 8fa35c87-c7af-4139-90b2-20c7513845ed]
Current Committed Offsets: {}
Current Available Offsets: {}
Current State: ACTIVE
Thread State: RUNNABLE
Logical Plan:
Repartition 1, true
+- Project [word#17, polarity#20, subjectivity_detection(word#17) AS subjectivity#24]
+- Project [word#17, polarity_detection(word#17) AS polarity#20]
+- Project [regexp_replace(word#15, :, , 1) AS word#17]
+- Project [regexp_replace(word#13, RT, , 1) AS word#15]
+- Project [regexp_replace(word#11, #, , 1) AS word#13]
+- Project [regexp_replace(word#9, @\w+, , 1) AS word#11]
+- Project [regexp_replace(word#6, http\S+, , 1) AS word#9]
+- Filter atleastnnonnulls(1, word#6)
+- Project [CASE WHEN (word#3 = ) THEN cast(null as string) ELSE word#3 END AS word#6]
+- Project [word#3]
+- Generate explode(split(value#0, t_end, -1)), false, [word#3]
+- StreamingDataSourceV2Relation [value#0], org.apache.spark.sql.execution.streaming.sources.TextSocketTable$$anon$1@69457fe6, TextSocketV2[host: 127.0.0.1, port: 5555]
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
