I need to split a column into several columns depending on the number of fields each record has. For example, if I have the following DF: +---+--------------
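The DF preview above is cut off, but here is a minimal sketch of the usual approach, assuming a single pipe-delimited string column named value (the column name and delimiter are hypothetical): split into an array, then pull fields out with getItem, which yields null for records with fewer fields.

```python
from pyspark.sql import functions as F

# Hypothetical input: one string column with pipe-delimited fields
df = spark.createDataFrame([("a|b|c",), ("d|e",)], ["value"])

parts = F.split(F.col("value"), r"\|")
df = (df
      .withColumn("field_1", parts.getItem(0))
      .withColumn("field_2", parts.getItem(1))
      .withColumn("field_3", parts.getItem(2)))  # records with fewer fields get null here
```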
Hive has a metastore, and HiveServer2 listens for SQL requests; with the help of the metastore, the query is executed and the result is passed back. The Thrift frame
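From Spark, this wiring is normally exercised by enabling Hive support so the session resolves tables through the same metastore; a minimal sketch, assuming a hive-site.xml with the metastore URIs is available to Spark:

```python
from pyspark.sql import SparkSession

# Assumes hive-site.xml (metastore URIs) is on Spark's classpath
spark = (SparkSession.builder
         .appName("hive-metastore-example")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("SHOW DATABASES").show()  # resolved via the Hive metastore
```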
I have a dataframe with a column of type MapType<StringType, StringType>. |-- identity: map (nullable = true) | |-- key: string | |-- value: st
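A hedged sketch of the two common ways to work with such a column (the sample data and the key "a" are made up): explode the map into key/value rows, or look a key up directly.

```python
from pyspark.sql import functions as F

df = spark.createDataFrame([({"a": "1", "b": "2"},)], ["identity"])

# One output row per map entry
exploded = df.select(F.explode("identity").alias("key", "value"))

# Or fetch one (hypothetical) key directly; missing keys yield null
picked = df.select(F.col("identity").getItem("a").alias("a_value"))
```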
We have a 1-day table aggregated with group by call_date, tdlinx_id, work_request_id, category_name; in another table we have 1-week-level data aggregated w
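A hedged sketch of rolling the daily aggregate up to the week level, assuming the daily table is loaded as day_df and carries a hypothetical numeric measure called total:

```python
from pyspark.sql import functions as F

# day_df: the 1-day aggregate; `total` is a hypothetical measure column
week_df = (day_df
           .withColumn("week_start", F.date_trunc("week", F.col("call_date")))
           .groupBy("week_start", "tdlinx_id", "work_request_id", "category_name")
           .agg(F.sum("total").alias("total")))
```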
Given a date, I create a column in ISO 8601 week date format: from pyspark.sql import functions as F df = spark.createDataFrame([('2019-03-
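The snippet above is cut off; a minimal sketch under the assumption that the ISO week number and weekday are what's needed (Spark's weekofyear already follows ISO 8601 numbering):

```python
from pyspark.sql import functions as F

df = spark.createDataFrame([("2019-03-18",)], ["d"]).withColumn("d", F.to_date("d"))

df = (df
      .withColumn("iso_week", F.weekofyear("d"))                 # ISO week number, 1..53
      .withColumn("iso_dow", (F.dayofweek("d") + 5) % 7 + 1))    # ISO weekday, Monday = 1
# Caveat: near January 1 the ISO week-based year can differ from year("d").
```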
I am working in PySpark with the flatMap function, using split inside it, but I am getting an error which says: AttributeError: 'NoneTy
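The usual culprit is a None record reaching .split(); a hedged sketch of the guard, assuming an RDD of comma-delimited strings:

```python
rdd = spark.sparkContext.parallelize(["a,b,c", None, "d,e"])

# Skip None records so .split() is never called on a NoneType
tokens = rdd.flatMap(lambda line: line.split(",") if line is not None else [])
print(tokens.collect())  # ['a', 'b', 'c', 'd', 'e']
```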
I'm looking for an inexpensive way to distinguish duplicates and/or uniquely identify rows. I've been looking at Spark built-ins such as monotonically_increasing_id
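A hedged sketch combining the two ideas, assuming a dataframe df with a duplicate-defining column key_col (a hypothetical name): monotonically_increasing_id gives a unique but non-consecutive id, and a window rank over it flags duplicates.

```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window

df = df.withColumn("row_id", F.monotonically_increasing_id())  # unique, not consecutive

# Number rows within each duplicate key; dup_rank > 1 marks a duplicate
w = Window.partitionBy("key_col").orderBy("row_id")
df = df.withColumn("dup_rank", F.row_number().over(w))
```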
The table below would be the input dataframe:

col1  col2      col3
1     12;34;56  Aus;SL;NZ
2     31;54;81  Ind;US;UK
3     null      Ban
4     Ned       null

Expected output dataframe: [values of c
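Since the expected output is cut off above, here is a hedged sketch of the most common transformation for this shape: split both delimited columns, zip them positionally, and explode the pairs (the handling of the null rows is illustrative only).

```python
from pyspark.sql import functions as F

df = spark.createDataFrame(
    [(1, "12;34;56", "Aus;SL;NZ"), (2, "31;54;81", "Ind;US;UK"),
     (3, None, "Ban"), (4, "Ned", None)],
    ["col1", "col2", "col3"])

out = (df
       .withColumn("a2", F.split("col2", ";"))
       .withColumn("a3", F.split("col3", ";"))
       .withColumn("pair", F.explode_outer(F.arrays_zip("a2", "a3")))  # outer keeps null rows
       .select("col1",
               F.col("pair.a2").alias("col2"),
               F.col("pair.a3").alias("col3")))
```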
I have the following PySpark dataframe and I want to find the percentile row-wise.

value  col_a   col_b  col_c
row_a  5.0     0.0    11.0
row_b  3394.0  0
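A hedged sketch of one way to get a row-wise percentile, assuming df holds the numeric columns shown above: collect them into an array, sort it, and index into it by nearest rank (the median, i.e. the 50th percentile, here).

```python
from pyspark.sql import functions as F

cols = ["col_a", "col_b", "col_c"]
df = df.withColumn("vals", F.array_sort(F.array(*cols)))

# 50th percentile by nearest rank; change 0.5 for other percentiles
df = df.withColumn(
    "p50",
    F.expr("element_at(vals, cast(ceil(size(vals) * 0.5) as int))"))
```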
I am new to Azure Databricks. I am trying to write a dataframe output to a Delta table that contains a TIMESTAMP column. But strangely it changes the TIMESTAMP pa
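Timestamp shifts on write are usually a session time zone effect rather than Delta rewriting the data; a hedged sketch of pinning the session time zone before writing, with a hypothetical target table name:

```python
# Timestamps are stored internally as UTC instants; display and conversion
# follow the session time zone, so pin it explicitly before writing.
spark.conf.set("spark.sql.session.timeZone", "UTC")

(df.write
   .format("delta")
   .mode("append")
   .saveAsTable("my_schema.events"))  # hypothetical table name
```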
I am trying to add a new column to my Spark Dataframe. The new column will be of a size based on a variable (say salt), after which I will use that column to ex
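A minimal sketch of the usual salting pattern, assuming df is the dataframe in question: build an array column whose length is the salt variable, then explode it so each input row fans out into salt rows.

```python
from pyspark.sql import functions as F

salt = 4  # number of salt buckets (the variable mentioned above)

# Array column of size `salt`, then one output row per salt value
df = (df
      .withColumn("salt_arr", F.array(*[F.lit(i) for i in range(salt)]))
      .withColumn("salt", F.explode("salt_arr"))
      .drop("salt_arr"))
```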
I am stuck in a very odd situation related to HBase design, I would say. HBase version: 2.1.0-cdh6.2.1. So, the problem statement is: in HBase, w
I am currently trying to flatten data in a Databricks table. Since some of the columns are deeply nested and of 'String' type, I couldn't use the explode f
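Since explode needs an array or map rather than a JSON string, the usual first step is from_json with an explicit schema; a hedged sketch with a hypothetical nested column named payload and a made-up schema:

```python
from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType, StringType, StructField, StructType

# Hypothetical schema for the stringified JSON held in `payload`
schema = ArrayType(StructType([
    StructField("id", StringType()),
    StructField("name", StringType()),
]))

df = (df
      .withColumn("payload", F.from_json("payload", schema))  # string -> array<struct>
      .withColumn("item", F.explode("payload"))
      .select("item.id", "item.name"))
```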
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("Spark SQL basic example")
  .config("spark.master", "local")
  .getOrCreate()
import spark.implicits._

case class Someth
I'm using a Databricks notebook to extract all field occurrences from a text column using the regexp_extract_all function. Here is the input: field_map#'IFDSIMP.7
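regexp_extract_all is available as a Spark SQL function (Spark 3.1+), so it can be called through expr even where a Python wrapper is missing; a hedged sketch in which both the sample input beyond the cut-off and the pattern are illustrative only:

```python
from pyspark.sql import functions as F

df = spark.createDataFrame([("field_map#'IFDSIMP.7'#'IFDSDEP.2'",)], ["txt"])

# Illustrative pattern: grab WORD.DIGIT tokens; adapt to the real format
df = df.withColumn(
    "fields",
    F.expr(r"regexp_extract_all(txt, '([A-Z]+\\.[0-9]+)', 1)"))
```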
I have a huge Oracle table to process, so I define a list of WHERE clauses so that each Spark node reads its own slice. In the middle of the processing I need to join the data
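For the "list of WHERE clauses" part, Spark's JDBC reader accepts exactly that via the predicates argument, one partition per clause; a hedged sketch with hypothetical connection details, table name, and ranges:

```python
# Each predicate becomes one partition, i.e. one task reading its own slice
predicates = [
    "ID BETWEEN 1 AND 1000000",
    "ID BETWEEN 1000001 AND 2000000",  # illustrative ranges
]

df = spark.read.jdbc(
    url="jdbc:oracle:thin:@//dbhost:1521/service",  # hypothetical URL
    table="SCHEMA.BIG_TABLE",                       # hypothetical table
    predicates=predicates,
    properties={"user": "scott", "password": "***",
                "driver": "oracle.jdbc.OracleDriver"},
)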
Given a PySpark SQL call such as spark.sql('''select 10%4 as hello ''') what is the best way to throw an exception anytime an operator % is used?
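A hedged sketch of the blunt approach: wrap spark.sql in a checker that rejects queries containing %. Note this naive string check would also trip on legitimate uses such as LIKE '%x%'; a real guard would inspect the parsed plan instead.

```python
def checked_sql(spark, query: str):
    """Run spark.sql, but refuse queries that use the modulo operator."""
    if "%" in query:  # naive: also flags LIKE patterns containing %
        raise ValueError("The % operator is not allowed in this SQL.")
    return spark.sql(query)

checked_sql(spark, "select 10 % 4 as hello")  # raises ValueError
```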
Long story short, I'm tasked with converting files from SparkSQL to PySpark as my first task at my new job. However, I'm unable to see many differences outside
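In practice the two styles are thin wrappers over the same engine and compile to the same plan; a hedged sketch of one query in both forms, with a hypothetical employees table:

```python
from pyspark.sql import functions as F

# Spark SQL form
sql_df = spark.sql("""
    SELECT dept, COUNT(*) AS n
    FROM employees
    GROUP BY dept
""")

# Equivalent DataFrame-API (PySpark) form
api_df = (spark.table("employees")
          .groupBy("dept")
          .agg(F.count("*").alias("n")))
```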
I can use: SHOW COLUMNS IN table_name but this does not allow me to use the output in a query. This throws an error: SELECT * FROM show columns in table_name
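SHOW COLUMNS can't appear in a FROM clause, but it does return an ordinary DataFrame, so one workaround is to register that result as a temp view and query it; a minimal sketch (the view name is arbitrary):

```python
# SHOW COLUMNS yields a DataFrame with a single `col_name` column
cols_df = spark.sql("SHOW COLUMNS IN table_name")
cols_df.createOrReplaceTempView("table_name_cols")

spark.sql("SELECT * FROM table_name_cols WHERE col_name LIKE 'a%'").show()
```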
I am reading a MariaDB table from Spark which has date and datetime fields. Spark throws an error while reading. Below is the schema of the MariaDB table: Spar
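A hedged sketch of a common workaround when the failure comes from unmappable date values (e.g. zeroed dates): read the offending columns as strings via Spark's customSchema JDBC option and cast later; the URL, table, and column names are hypothetical, and zeroDateTimeBehavior is a Connector/J driver option that may vary by driver version.

```python
df = (spark.read.format("jdbc")
      # zeroDateTimeBehavior=convertToNull maps 0000-00-00 values to NULL (driver option)
      .option("url", "jdbc:mariadb://dbhost:3306/mydb?zeroDateTimeBehavior=convertToNull")
      .option("dbtable", "my_table")                       # hypothetical table
      .option("user", "app").option("password", "***")
      # Read the problem columns as strings, then cast in Spark
      .option("customSchema", "created_date STRING, updated_at STRING")
      .load())
```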