Category "spark-structured-streaming"

Structured Streaming to Save JSON to HDFS

My Spark Structured Streaming program reads JSON data from Kafka and writes it to HDFS in JSON format. I am able to save JSON to HDFS, but it saves the JSON st

Distinct Count on Column in Dataset in Structured Streaming

I am new to Structured Streaming, so I am facing an issue while calculating a distinct count on a column in a Dataset/DataFrame. //DataFrame val readFromKafka = sparks

How to stream data from MongoDB in Structured Streaming?

Is it possible to use Spark Structured Streaming to read data from MongoDB with a readStream? For standard use of Structured Streaming, I usually do so: va

PySpark UDF monitoring with Prometheus

I am trying to monitor some logic in a UDF using counters, i.e. counter = Counter(...).labels("value") @udf def do_smthng(col): if col: counter.label(