spark.readStream vs KafkaUtils.createDirectStream
I was wondering whether anyone knows the difference between these two syntaxes. I know both are used to read data from Kafka, but what differentiates them?
- spark.readStream.format("kafka")
- KafkaUtils.createDirectStream(__)
Solution 1:[1]
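For one, they come from different dependencies: spark.readStream.format("kafka") needs the spark-sql-kafka-0-10 module, while KafkaUtils.createDirectStream lives in spark-streaming-kafka-0-10 (or the older spark-streaming-kafka-0-8).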
The first is for Structured Streaming. It returns a streaming DataFrame and is considered the preferred streaming API in Spark.
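A minimal Structured Streaming sketch in Scala, assuming the `spark-sql-kafka-0-10` artifact is on the classpath, a broker at `localhost:9092`, and a hypothetical topic named `events`. The Kafka source gives every row a fixed schema (`key`, `value`, `topic`, `partition`, `offset`, `timestamp`, `timestampType`), with `key` and `value` as binary, so they are cast to strings here:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object StructuredKafkaSketch {
  // Builds a streaming DataFrame from Kafka.
  // "localhost:9092" and the topic "events" are placeholder assumptions.
  def kafkaFrame(spark: SparkSession): DataFrame =
    spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .option("startingOffsets", "latest")
      .load()
      // key and value arrive as binary columns; cast them to strings
      .selectExpr(
        "CAST(key AS STRING) AS key",
        "CAST(value AS STRING) AS value",
        "topic", "partition", "offset")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("structured-kafka-sketch")
      .master("local[2]")
      .getOrCreate()

    // Print each micro-batch to stdout
    val query = kafkaFrame(spark).writeStream
      .format("console")
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}
```

With this API, Spark tracks the consumed offsets in the query's checkpoint rather than committing them to a Kafka consumer group.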
The second is for the older, RDD-based Spark Streaming (DStream) API. It is useful when the data has no consistent structure (schema), or when you want more direct access to the lower-level ConsumerRecord objects from the Kafka client library.
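A comparable DStream sketch in Scala, assuming the `spark-streaming-kafka-0-10` artifact, the same placeholder broker, the hypothetical `events` topic, and a hypothetical consumer group `sketch-group`. Each element of the stream is a raw `ConsumerRecord[String, String]`, and the offset ranges of each batch can be read and committed by hand:

```scala
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka010.{CanCommitOffsets, ConsumerStrategies, HasOffsetRanges, KafkaUtils, LocationStrategies}

object DirectKafkaSketch {
  // Placeholder consumer settings; broker, group id and topic are assumptions.
  val kafkaParams: Map[String, Object] = Map(
    "bootstrap.servers" -> "localhost:9092",
    "key.deserializer" -> classOf[StringDeserializer],
    "value.deserializer" -> classOf[StringDeserializer],
    "group.id" -> "sketch-group",
    "auto.offset.reset" -> "latest",
    "enable.auto.commit" -> (false: java.lang.Boolean)
  )

  // Each element is a Kafka ConsumerRecord, not a DataFrame Row.
  def directStream(ssc: StreamingContext): InputDStream[ConsumerRecord[String, String]] =
    KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams)
    )

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("direct-kafka-sketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))
    val stream = directStream(ssc)

    stream.foreachRDD { rdd =>
      // Offset ranges are only available on the RDD produced directly by the stream
      val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      // ConsumerRecord is not serializable, so convert to strings before collecting
      rdd.map(r => s"${r.partition}@${r.offset}: ${r.value}").collect().foreach(println)
      // Commit the processed offsets back to Kafka
      stream.asInstanceOf[CanCommitOffsets].commitAsync(ranges)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Unlike the DataFrame version, no schema is applied; any structure has to be imposed by the code that handles each ConsumerRecord.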
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | OneCricketeer |
