I am trying to join column values to a list of values df1= name | department| state | id| -----+-----------+-------+---+ James|Sales |NY |101 Maria|F
I need help in converting the below function into an SQL query: start_time :- 1649289600 end_time :- 1649375999 test_data = df.withColumn("from_timestamp",to_t
Does computeSVD() use map and reduce, given that it is a predefined function? I couldn't find the code of the function. from pyspark.mllib.linalg import Vectors from py
I'm storing in a delta table the prices of products. The schema of the table is like this: id | price | updated 1 | 3 | 2022-03-21 2 | 4 | 2022-03-20
I'm using Glue 3.0 data = [("Java", "6241499.16943521594684385382059800664452")] rdd = spark.sparkContext.parallelize(data) df = rdd.toDF() df.show() df.select(
I have a log file in CSV format which has a column containing a list of file paths separated by commas. I want to split those file paths into new rows using pyspark(or exce
I have two Dataframes facts: columns: data, start_date and end_date holidays: column: holiday_date What I want is a way to produce another Dataframe that has
I have seen methods for inserting into a Hive table, such as insertInto(table_name, overwrite=True), but I couldn't work out how to handle the scenario below. For
I am getting the following issue in PySpark when performing a display()/collect() operation on a generated DataFrame. The df contains a single column and row (JSON d
I found a similar question (link), but no answer explained how to fix the issue. I want to make a UDF that would extract words from a column. So, I want to cr
Although this question may seem to have been answered previously, it has not. All transposing answers seem to relate to one column and pivoting the data in that column. I want to ma
I have a program that runs every hour. It receives streaming data and writes it in Parquet format in batches into a data lake every time it runs, to be later pro
I am getting this error: Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkConte
I have 12 Spark streaming jobs, and they receive a small amount of data at any time. These scripts have Spark transformations and joins. What is the right memory alloca
I am trying to process JSON data in a column on Databricks. Below is the sample data from a table (it's weather device record info) JSON_Info {"sampleData":"dataD
When I tried to search Elasticsearch from Spark, an error occurred. The code that I use is the following: from pyspark import SparkContext from pyspark.sql impor
Is there an elegant, easy and fast way to move data out of HBase into MongoDB? I want to migrate HBase to MongoDB. I am new to MongoDB. Could someone please hel
I'm new to Python and PySpark. When I run PySpark code in PyCharm, it always generates the information below. I want to know the reason and solut
I have a table that looks like this common_id table1_address table2_address table3_address table4_address 123 null null stack building12 null 157 123road stree
I'm trying to create a Spark DataFrame from a dictionary which has data in the format {'33_45677': 0, '45_3233': 25, '56_4599': 43524} .. etc. dict_pairs={'33