I am using Spark 2.4.4 and Hive 2.3. Using Spark, I am loading a DataFrame into a Hive table using DF.insertInto(hiveTable); if a new table is created during the run (…
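A minimal sketch of the usual pattern: create the table with saveAsTable() on the first run (insertInto() requires the table to exist), then append with insertInto() afterwards. The database, table, and column names here are placeholders, not from the question.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
df = spark.createDataFrame([(1, "a")], ["id", "val"])  # placeholder data

# insertInto() requires the target table to exist; create it on the first run
existing = [t.name for t in spark.catalog.listTables("db")]
if "hive_table" not in existing:
    df.write.saveAsTable("db.hive_table")   # creates the table from df's schema
else:
    # insertInto matches columns by POSITION, not by name -- keep order stable
    df.write.insertInto("db.hive_table")
```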
Getting the following issue in PySpark when performing a display()/collect() operation on top of a generated DataFrame. The df contains a single column and Row (JSON d…
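If the single column holds a JSON string, one common way to make it usable before collecting is to parse it with from_json; the schema and column name below are assumptions, not taken from the question.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([('{"name": "x"}',)], ["json_str"])  # placeholder

schema = StructType([StructField("name", StringType())])        # assumed schema
parsed = (df.select(from_json(col("json_str"), schema).alias("data"))
            .select("data.*"))
parsed.collect()
```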
I found a similar question (link), but no answer was provided on how to fix the issue. I want to make a UDF that would extract words from a column. So, I want to cr…
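A minimal sketch of a word-extracting UDF, assuming whitespace/punctuation-separated text (the built-in split() function is usually faster, but this shows the UDF route the question asks about):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, StringType
import re

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("hello, spark world",)], ["text"])

@udf(returnType=ArrayType(StringType()))
def extract_words(s):
    # naive tokenizer: pull out runs of letters; adjust the regex as needed
    return re.findall(r"[A-Za-z]+", s) if s else []

df.withColumn("words", extract_words("text")).show(truncate=False)
```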
This question, although it may seem previously answered, is not. All transposing answers seem to relate to one column and pivoting the data in that column. I want to ma…
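For turning several columns into rows (rather than pivoting values within one column), the usual PySpark trick is the stack() SQL function; the column names below are placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import expr

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, 10, 20, 30)], ["id", "a", "b", "c"])

# stack(n, label1, col1, ...) emits one row per (label, value) pair
long_df = df.select(
    "id",
    expr("stack(3, 'a', a, 'b', b, 'c', c) as (col_name, col_value)"),
)
long_df.show()
```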
I am trying to create a schema to parse JSON into a Spark DataFrame. I have a column value in the JSON which could be either a struct or a string: "value": { "entity-type":…
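One workable approach when a field's type varies is to declare it as StringType in the read schema, so both shapes survive as raw text, and then parse the struct-shaped rows with from_json. The path and the inner schema here are assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.getOrCreate()

# Read "value" as a plain string so both variants parse without corruption;
# Spark's JSON reader returns the raw JSON text when an object meets StringType.
outer = StructType([StructField("value", StringType())])
df = spark.read.schema(outer).json("s3://bucket/path")  # placeholder path

inner = StructType([StructField("entity-type", StringType())])  # assumed
df = df.withColumn("value_struct", from_json(col("value"), inner))
# rows where "value" was a bare string simply leave value_struct null
```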
This is my piece of code. There is a lot of business logic happening here. I have tried to explain it in as understandable a manner as possible. I have…
I have a program that runs every hour; it receives streaming data and writes it in Parquet format in batches into a data lake every time it runs, to be later pro…
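A minimal sketch of that hourly batch-append pattern, partitioning by date so downstream jobs can prune what they read; paths and the source format are placeholders:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_date

spark = SparkSession.builder.getOrCreate()
batch = spark.read.json("s3://landing/incoming/")  # placeholder source

(batch.withColumn("dt", current_date())
      .write.mode("append")            # each hourly run adds new files
      .partitionBy("dt")               # date partitions for later processing
      .parquet("s3://datalake/events/"))
```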
I am having this error: Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkConte…
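That exception is raised when code that runs on executors (a UDF, or a function passed to map/foreach) closes over the SparkContext or an RDD/DataFrame; only the driver may touch those. A sketch of the broken shape and one fix, with placeholder names:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
lookup_df = spark.createDataFrame([(1, "a")], ["id", "val"])
data = spark.sparkContext.parallelize([1, 2, 3])

# BROKEN: the lambda shipped to executors references a DataFrame, which
# drags the SparkContext into the closure and triggers the exception:
# data.map(lambda x: lookup_df.filter(lookup_df.id == x).count()).collect()

# FIX: pull the lookup data to the driver first, then close over plain objects
lookup = {row["id"]: row["val"] for row in lookup_df.collect()}
result = data.map(lambda x: lookup.get(x)).collect()
```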
I have this existing table tb1 in my database. Now new data comes in and is stored in another table, tb2. Earlier, Account_Number 9988 was Level 2, but now…
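This reads like an upsert: rows in tb2 should override matching Account_Numbers in tb1. A sketch of doing that with a left anti join plus union, assuming both tables share the same columns (the key name is taken from the question):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
tb1 = spark.table("tb1")
tb2 = spark.table("tb2")

# keep tb1 rows whose Account_Number was NOT updated, then add the new rows;
# assumes tb1 and tb2 have identical schemas
unchanged = tb1.join(tb2, on="Account_Number", how="left_anti")
merged = unchanged.unionByName(tb2)
```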
I have created a Glue table which converts the JSON to Parquet files. In one of the columns, which is defined as Map&lt;String,String&gt;, there is a nested JSON…
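If the nested JSON sits inside a map value, one way to get at it in PySpark is to index the map by key and parse the resulting string with from_json; the path, key, and inner schema are all assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("s3://bucket/parquet/")  # placeholder path

inner = StructType([StructField("field", StringType())])  # assumed schema
# map_col["some_key"] pulls one value out of the Map<String,String> column
df = df.withColumn("nested", from_json(col("map_col")["some_key"], inner))
```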
I have 12 Spark streaming jobs, and each receives a small amount of data at any time. These scripts have Spark transformations and joins. What is the right memory alloca…
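There is no single right number, but for small-volume streams the usual starting point is modest executors and explicitly set knobs rather than defaults. A sketch of where those knobs live; the values are illustrative, not recommendations:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("small-stream")
         .config("spark.executor.memory", "2g")        # illustrative value
         .config("spark.executor.cores", "2")
         .config("spark.executor.instances", "2")      # per job, on YARN
         .config("spark.sql.shuffle.partitions", "8")  # small data => few partitions
         .getOrCreate())
```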
I have an application that does the following: reads URLs from a Hive table, creates HTTP requests from those URLs, hits a server with them and parses the responses. W…
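A common shape for this is mapPartitions with a per-partition HTTP session, so connections are reused and the driver never makes the calls itself. The table and column names are assumptions, and the requests package must be installed on the executors:

```python
from pyspark.sql import SparkSession
import requests

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
urls = spark.table("url_table").select("url")  # assumed table/column

def fetch_partition(rows):
    session = requests.Session()  # one session per partition, reused connections
    for row in rows:
        try:
            resp = session.get(row["url"], timeout=10)
            yield (row["url"], resp.status_code, resp.text[:1000])
        except requests.RequestException as e:
            yield (row["url"], -1, str(e))

results = urls.rdd.mapPartitions(fetch_partition).toDF(["url", "status", "body"])
```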
When I tried to search from Spark to Elasticsearch, an error occurred. The code that I use is the following: from pyspark import SparkContext from pyspark.sql impor…
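For reference, a minimal write/read through the ES-Hadoop connector (the org.elasticsearch.spark.sql data source), whose jar must be supplied via --jars or --packages; the host, port, and index are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "doc")], ["id", "text"])

(df.write.format("org.elasticsearch.spark.sql")
   .option("es.nodes", "localhost")   # placeholder host
   .option("es.port", "9200")
   .mode("append")
   .save("my_index/_doc"))            # index/type resource

read_back = (spark.read.format("org.elasticsearch.spark.sql")
             .option("es.nodes", "localhost")
             .load("my_index/_doc"))
```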
I'm trying to create a Spark DataFrame from a dictionary which has data in the format {'33_45677': 0, '45_3233': 25, '56_4599': 43524}, etc. dict_pairs={'33…
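A dict's items() give (key, value) pairs, which createDataFrame accepts directly; the column names below are assumptions:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
dict_pairs = {'33_45677': 0, '45_3233': 25, '56_4599': 43524}

df = spark.createDataFrame(list(dict_pairs.items()), ["key", "value"])
df.show()
```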
I have a dataset as below:

col1  extension_col1
2345  2246
2246  2134
2134  2091
2091  Null
1234  1111
1111  Null

I need to find the number of extensions available fo…
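One hedged way to count how far each extension chain runs is to iteratively self-join until no more hops resolve; a sketch assuming the chains are short and acyclic:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()
data = [(2345, 2246), (2246, 2134), (2134, 2091), (2091, None),
        (1234, 1111), (1111, None)]
edges = spark.createDataFrame(data, ["col1", "extension_col1"])

# directed edges, dropping the null "end of chain" markers
step = (edges.where(col("extension_col1").isNotNull())
             .select(col("col1").alias("src"), col("extension_col1").alias("dst")))

# expand reachability one hop per iteration until no new hops appear
reach = step.select(col("src").alias("start"), col("dst").alias("node"))
frontier = reach
for _ in range(10):  # assumed upper bound on chain length
    frontier = (frontier.join(step, frontier["node"] == step["src"])
                        .select(frontier["start"], step["dst"].alias("node")))
    if not frontier.take(1):
        break
    reach = reach.union(frontier)

# every node reachable downstream of a number counts as one extension
reach.groupBy("start").count().show()   # e.g. 2345 -> 3, 1234 -> 1
```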
I am working on writing a framework that basically does a data sanity check. I have a set of inputs like { "check_1": [ sql_query_1, sql_query_2 ], "check_2":…
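A bare-bones sketch of such a driver loop, assuming each query returns rows that violate the check (zero rows meaning the check passed); the structure mirrors the input shown, and the queries themselves are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

checks = {
    "check_1": ["SELECT * FROM t WHERE id IS NULL",      # placeholder queries
                "SELECT * FROM t WHERE amount < 0"],
    "check_2": ["SELECT * FROM t GROUP BY id HAVING COUNT(*) > 1"],
}

results = {}
for name, queries in checks.items():
    # a check passes only if every one of its queries returns no rows
    results[name] = all(spark.sql(q).count() == 0 for q in queries)

print(results)  # e.g. {'check_1': True, 'check_2': False}
```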
I'm working on an ETL job in a SageMaker notebook that uses Spark 2.4.0. After joining a couple of tables, I keep getting the following errors. Update: I was…
I am new to Oozie and trying to understand dataset.xml. I have the following dataset and am trying to understand what exactly Oozie is trying to validate here. What is…
Can someone help me with the below? I have an input dataframe:

ID  process_type   STP_stagewise
1   loan_creation  Manual
1   loan creation  NSTP
1   reimbursement  STP
2   …
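If the goal is one row per ID with process types spread across columns (a common ask for data of this shape, though the question is cut off), a hedged pivot sketch using the sample values above:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import first

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(1, "loan_creation", "Manual"), (1, "reimbursement", "STP")],
    ["ID", "process_type", "STP_stagewise"],
)

# one column per process_type, holding its STP_stagewise value
pivoted = df.groupBy("ID").pivot("process_type").agg(first("STP_stagewise"))
pivoted.show()
```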
This is what cmd said, and I don't know how to fix it. I saw similar cases on Stack Overflow, but their suggestions didn't fix my problem. I hope you…