'structured spark streaming with parquet file

I want to execute this code https://towardsdatascience.com/sentiment-analysis-on-streaming-twitter-data-using-spark-structured-streaming-python-fc873684bfe3 but the problem is i don't know how can i do with parquet! the first time i try spark streaming. spark version: 3.2.1 the error is shown below:

StreamingQueryException                   Traceback (most recent call last)
<ipython-input-1-f061cfa92c68> in <module>
     47         .option("checkpointLocation", "./check")\
     48         .trigger(processingTime='60 seconds').start()
---> 49     query.awaitTermination()

C:\spark\spark\python\pyspark\sql\streaming.py in awaitTermination(self, timeout)
     99             return self._jsq.awaitTermination(int(timeout * 1000))
    100         else:
--> 101             return self._jsq.awaitTermination()
    102 
    103     @property

C:\spark\spark\python\lib\py4j-0.10.9.3-src.zip\py4j\java_gateway.py in __call__(self, *args)
   1319 
   1320         answer = self.gateway_client.send_command(command)
-> 1321         return_value = get_return_value(
   1322             answer, self.gateway_client, self.target_id, self.name)
   1323 

C:\spark\spark\python\pyspark\sql\utils.py in deco(*a, **kw)
    115                 # Hide where the exception came from that shows a non-Pythonic
    116                 # JVM exception message.
--> 117                 raise converted from None
    118             else:
    119                 raise

StreamingQueryException: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
=== Streaming Query ===
Identifier: all_tweets [id = 541f763b-861e-4964-8bbc-c0076dc08e1f, runId = 8fa35c87-c7af-4139-90b2-20c7513845ed]
Current Committed Offsets: {}
Current Available Offsets: {}

Current State: ACTIVE
Thread State: RUNNABLE

Logical Plan:
Repartition 1, true
+- Project [word#17, polarity#20, subjectivity_detection(word#17) AS subjectivity#24]
   +- Project [word#17, polarity_detection(word#17) AS polarity#20]
      +- Project [regexp_replace(word#15, :, , 1) AS word#17]
         +- Project [regexp_replace(word#13, RT, , 1) AS word#15]
            +- Project [regexp_replace(word#11, #, , 1) AS word#13]
               +- Project [regexp_replace(word#9, @\w+, , 1) AS word#11]
                  +- Project [regexp_replace(word#6, http\S+, , 1) AS word#9]
                     +- Filter atleastnnonnulls(1, word#6)
                        +- Project [CASE WHEN (word#3 = ) THEN cast(null as string) ELSE word#3 END AS word#6]
                           +- Project [word#3]
                              +- Generate explode(split(value#0, t_end, -1)), false, [word#3]
                                 +- StreamingDataSourceV2Relation [value#0], org.apache.spark.sql.execution.streaming.sources.TextSocketTable$$anon$1@69457fe6, TextSocketV2[host: 127.0.0.1, port: 5555]

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source

'structured spark streaming with parquet file

Sources

Related Questions