I use Spark Structured Streaming 3.1.2. I need to use S3 for storing checkpoint metadata (I know, it's not an optimal storage for checkpoint metadata). Compaction in
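A minimal sketch of what such a setup tends to look like in PySpark, assuming a hypothetical s3a bucket path and the built-in rate source standing in for the real stream:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("s3-checkpoint-sketch").getOrCreate()

    # Built-in rate source used only so the example is self-contained.
    stream_df = spark.readStream.format("rate").option("rowsPerSecond", 1).load()

    query = (stream_df.writeStream
             .format("parquet")
             .option("path", "s3a://my-bucket/output/")             # hypothetical path
             .option("checkpointLocation", "s3a://my-bucket/chk/")  # checkpoint metadata kept on S3
             .start())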
Problem screenshot: :14: error: not found: value spark import spark.implicits._ ^ :14: error: not found: value spark import spark.sql ^ Here is my environment con
I'm querying data from Cassandra in Spark using SCC 2.5 and Spark 2.4.7 (pyspark). The table I'm reading from has a composite partition key (partition_key_1, pa
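For reference, a hedged sketch of reading such a table with the Spark Cassandra Connector from PySpark, assuming an existing spark session with the connector on the classpath; the keyspace, table and key values are placeholders, and restricting both partition key columns lets the connector push the filter down:

    df = (spark.read
          .format("org.apache.spark.sql.cassandra")
          .options(keyspace="my_ks", table="my_table")   # hypothetical names
          .load()
          .filter("partition_key_1 = 'a' AND partition_key_2 = 'b'"))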
I have a dataframe that looks like the one below: id pub_date version unique_id c_id p_id type source lni001 20220301 1
I need help in converting the below function into an SQL query: start_time :- 1649289600, end_time :- 1649375999 test_data = df.withColumn("from_timestamp", to_t
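Without the full function this is only a guess, but one common translation of such a withColumn/to_timestamp expression into plain Spark SQL looks like the following; the source column name ts is an assumption:

    df.createOrReplaceTempView("df")
    test_data = spark.sql("""
        SELECT *,
               to_timestamp(from_unixtime(ts)) AS from_timestamp
        FROM df
        WHERE ts BETWEEN 1649289600 AND 1649375999
    """)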
Does computeSVD() use map/reduce, since it is a predefined function? I couldn't find the code of the function. from pyspark.mllib.linalg import Vectors from py
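computeSVD() is a method of RowMatrix in pyspark.mllib, so it operates on a distributed matrix; a small self-contained call, for reference:

    from pyspark.mllib.linalg import Vectors
    from pyspark.mllib.linalg.distributed import RowMatrix

    rows = spark.sparkContext.parallelize([
        Vectors.dense(1.0, 0.0, 7.0),
        Vectors.dense(2.0, 3.0, 4.0),
        Vectors.dense(4.0, 5.0, 6.0),
    ])
    svd = RowMatrix(rows).computeSVD(2, computeU=True)  # top-2 singular values, U computed too
    print(svd.s)                                        # singular values as a local vector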
I'm storing the prices of products in a Delta table. The schema of the table is like this: id | price | updated 1 | 3 | 2022-03-21 2 | 4 | 2022-03-20
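One common need with such a table is the latest price per product; a hedged sketch using a window function, assuming the table is registered under the hypothetical name prices:

    from pyspark.sql import functions as F, Window

    w = Window.partitionBy("id").orderBy(F.col("updated").desc())
    latest = (spark.table("prices")                       # hypothetical table name
              .withColumn("rn", F.row_number().over(w))
              .filter("rn = 1")
              .drop("rn"))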
I have a log file in CSV which has a column that contains a list of filepaths separated by commas. I want to split those filepaths into new rows using PySpark (or exce
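A minimal PySpark sketch of that split-into-rows step, assuming a header row and a hypothetical paths column holding the comma-separated list:

    from pyspark.sql import functions as F

    df = spark.read.option("header", True).csv("log.csv")                # hypothetical file
    exploded = df.withColumn("filepath", F.explode(F.split("paths", ",")))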
I am running a Python project through a DAG in Airflow, and I encounter the following exception when the DAG runs this line from the project - df = spark.sql(quer
I am trying to extract a value from an array in SparkSQL, but getting the error below: Example column customer_details: {"original_customer_id":"ch_382820","fi
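If customer_details is held as a JSON string, one hedged way to pull out a single field is get_json_object; only original_customer_id comes from the snippet, the rest of the names are assumptions:

    from pyspark.sql import functions as F

    out = df.withColumn(
        "original_customer_id",
        F.get_json_object("customer_details", "$.original_customer_id"),
    )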
I have two DataFrames: facts (columns: data, start_date and end_date) and holidays (column: holiday_date). What I want is a way to produce another DataFrame that has
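One hedged reading of the requirement is counting the holidays that fall inside each fact row's date range; a sketch with a range join, where the column names come from the snippet and facts/holidays are the existing DataFrames:

    from pyspark.sql import functions as F

    joined = facts.join(
        holidays,
        (holidays.holiday_date >= facts.start_date) &
        (holidays.holiday_date <= facts.end_date),
        "left",
    )
    result = (joined.groupBy("data", "start_date", "end_date")
                    .agg(F.count("holiday_date").alias("holiday_count")))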
I have seen methods for inserting into a Hive table, such as insertInto(table_name, overwrite=True), but I couldn't work out how to handle the scenario below. For
I am using Spark 2.4.4 and Hive 2.3 ... Using Spark, I am loading a dataframe as a Hive table using DF.insertInto(hiveTable); if a new table is created during the run (o
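A hedged sketch of the usual pattern for this situation: saveAsTable() on the first run when the table does not exist yet, insertInto() afterwards (which matches columns by position, not by name); the database and table names are placeholders:

    db, tbl = "mydb", "my_table"                                    # hypothetical names
    exists = tbl in [t.name for t in spark.catalog.listTables(db)]

    if exists:
        df.write.insertInto(f"{db}.{tbl}")        # appends; columns matched by position
    else:
        df.write.saveAsTable(f"{db}.{tbl}")       # creates the table with df's schema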
Getting the following issue in PySpark when performing a display()/collect() operation on top of a generated dataframe. The df contains a single column & row (JSON d
I found a similar question (link), but no answer was provided on how to fix the issue. I want to make a UDF that would extract words from a column for me. So, I want to cr
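A minimal sketch of such a word-extracting UDF, assuming the input column is called text and that "words" means runs of letters:

    import re
    from pyspark.sql import functions as F
    from pyspark.sql.types import ArrayType, StringType

    @F.udf(returnType=ArrayType(StringType()))
    def extract_words(s):
        return re.findall(r"[A-Za-z]+", s) if s is not None else []

    df = df.withColumn("words", extract_words("text"))   # "text" is a hypothetical column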
This question, although it may seem previously answered, is not. All transposing examples seem to relate to one column and pivoting the data in that column. I want to ma
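For transposing several columns at once (an unpivot rather than a pivot), stack() is the usual tool; a hedged sketch with placeholder column names id, c1, c2 and c3:

    unpivoted = df.selectExpr(
        "id",
        "stack(3, 'c1', c1, 'c2', c2, 'c3', c3) AS (col_name, col_value)",
    )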
I am trying to create a schema to parse JSON into a Spark dataframe. I have a column value in the JSON which could be either a struct or a string: "value": { "entity-type":
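One hedged workaround when a field is sometimes a struct and sometimes a string: declare value as StringType so both shapes load as raw text, then apply from_json only where the text looks like an object; the file path and the inner schema below are assumptions:

    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType

    schema = StructType([StructField("value", StringType(), True)])  # other fields omitted
    raw = spark.read.schema(schema).json("data.json")                # hypothetical path

    entity_schema = StructType([StructField("entity-type", StringType(), True)])
    parsed = raw.withColumn(
        "value_struct",
        F.when(F.col("value").startswith("{"), F.from_json("value", entity_schema)),
    )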
This is my piece of code. There is a lot of business logic happening here. I have tried to explain it in as understandable a manner as possible. I have
I have a program that runs every hour; it receives streaming data and writes it in Parquet format in batches into a data lake every time it runs, to be later pro
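A hedged sketch of what each hourly batch write might look like, assuming a batch_df already built from the received data and a hypothetical partitioning by date and hour:

    (batch_df.write
        .mode("append")
        .partitionBy("date", "hour")
        .parquet("s3a://datalake/events/"))   # hypothetical data lake path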
I am having this error: Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkConte
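That exception usually means code running on executors (inside a UDF, map or foreach) is touching the SparkContext/SparkSession, which only lives on the driver; a hedged sketch of the shape of the problem and one fix, with a hypothetical table and RDD:

    # Problematic shape: the lambda runs on executors, where `spark` does not exist.
    # rdd.map(lambda x: spark.sql("SELECT count(*) FROM some_table").first()[0])

    rdd = spark.sparkContext.parallelize([1, 2, 3])
    # Driver-side alternative: resolve the value once on the driver, ship it as plain data.
    n = spark.sql("SELECT count(*) AS n FROM some_table").first()["n"]  # hypothetical table
    result = rdd.map(lambda x: (x, n))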