Not able to remove white space from SQL query output used in PySpark code. I tried trim, ltrim, rtrim, replace (also nested multiple times) and regex replace. Any other
I'm trying to filter the data frame by salary values and then save the results as CSV files using PySpark. spark = SparkSession.builder.appName('SparkByExamples.com')
I am trying to validate the date received in a file against a configured date format (using to_timestamp/to_date). schema = StructType([ \ StructField("date",String
I want to create a Spark DataFrame from a JSON string without using a schema in Python. The JSON is multilevel nested and may contain arrays. I had used below for
I am stuck with a spark.sql error that I couldn't solve with answers on Stack Overflow. I tried "first_value, collected_list" and they are not solving erro
Python doesn't like the ampersand below. I get the error: & is not a supported operation for types str and str. Please review your code. Any idea how to get
Question: In an Apache Spark DataFrame, using Python, how can we get the data type and length of each column? I'm using the latest version of Python. Using pandas data
Duplicate records occurred when spark-sql overwrote a Hive table, in a Spark job that had failed stages, but the DataFrame has no duplicate records? When I run the jo
Problem screenshot: :14: error: not found: value spark import spark.implicits._ ^ :14: error: not found: value spark import spark.sql ^ Here is my environment con
I am trying to join column values to a list of values. df1 = name | department| state | id| -----+-----------+-------+---+ James|Sales |NY |101 Maria|F
I need help converting the function below into an SQL query: start_time :- 1649289600 end_time :- 1649375999 test_data = df.withColumn("from_timestamp",to_t
I have a log file in CSV that has a column containing a list of file paths separated by commas. I want to split those file paths into new rows using PySpark (or exce
I am trying to extract a value from an array in Spark SQL, but I am getting the error below: Example column customer_details {"original_customer_id":"ch_382820","fi
I have two DataFrames. facts: columns: data, start_date and end_date. holidays: column: holiday_date. What I want is a way to produce another DataFrame that has
I have seen methods for inserting into a Hive table, such as insertInto(table_name, overwrite=True), but I couldn't work out how to handle the scenario below. For
I found a similar question (link), but no answer explained how to fix the issue. I want to make a UDF that would extract words from a column. So, I want to cr
I have Glue DBs (db1 and db2) and tables (tbl1 and tbl2) available in different AWS regions (eu-west-1 and us-east-1) respectively. My Glue job in eu-west-1 needs
This is my piece of code. There is a good deal of business logic happening here. I have tried to explain it as understandably as possible. I have
I have this existing table tb1 in my database. Now new data comes in and is stored in another table, tb2. Earlier, Account_Number 9988 was Level 2, but now
Trying to process JSON data in a column on Databricks. Below is the sample data from a table (it's weather device record info). JSON_Info {"sampleData":"dataD