Category "azure-databricks"

Is there an elegant, easy and fast way to move data out of HBase into MongoDB?

I want to migrate from HBase to MongoDB, and I am new to MongoDB. Could someone please help…

Azure Databricks: write to a Parquet file using spark.sql with unions and subqueries

Issue: I'm trying to write to a Parquet file using spark.sql; however, I run into problems when the query contains unions or subqueries. I know there's some syntax I can't seem…

Databricks REST API call to update a branch fails with error: "User Settings > Git Integration to set up an Azure DevOps personal access token"

I am getting the error below when updating the repo to a different branch using the Databricks REST API, as described at https://docs.databricks.com/dev-tools/api/latest/
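One common cause of this error, offered here as an assumption rather than a diagnosis, is that the identity whose token calls the Repos API has no Azure DevOps Git credential registered in Databricks. The sketch below uses only Python's standard library to build (but not send) two requests: a POST to the Git Credentials API (/api/2.0/git-credentials) and a PATCH to the Repos API (/api/2.0/repos/{repo_id}) that switches the branch. The host, token, repo ID and branch values are placeholders, and the helper names are hypothetical; check the field names against the current Databricks REST API reference before relying on them.

```python
import json
import urllib.request


def _databricks_request(host: str, token: str, path: str, method: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON request to a Databricks REST endpoint."""
    return urllib.request.Request(
        url=host.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        method=method,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )


def git_credential_request(host: str, token: str, git_username: str, azure_devops_pat: str) -> urllib.request.Request:
    """Register an Azure DevOps PAT for the identity that owns `token` (Git Credentials API)."""
    return _databricks_request(host, token, "/api/2.0/git-credentials", "POST", {
        "git_provider": "azureDevOpsServices",
        "git_username": git_username,
        "personal_access_token": azure_devops_pat,
    })


def update_branch_request(host: str, token: str, repo_id: int, branch: str) -> urllib.request.Request:
    """Check out `branch` in an existing Databricks repo (Repos API)."""
    return _databricks_request(host, token, f"/api/2.0/repos/{repo_id}", "PATCH", {"branch": branch})
```

To actually send a request, pass it to urllib.request.urlopen. The credential request only needs to run once per identity, and it should use the same token that later calls the Repos API, since Git credentials are stored per user or service principal.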

Can I iterate through the widgets in a Databricks notebook?

Something like this pseudocode?
# NB - not valid
inputs = {widget.name: widget.value for widget in…
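A minimal sketch, assuming you are willing to list the widget names yourself: keep them in one dictionary, create the widgets from it, and read every value back with dbutils.widgets.get. WIDGET_DEFAULTS (including its default values) and collect_widget_values are hypothetical names, and the getter is passed in as a parameter so the helper also runs outside a notebook.

```python
from typing import Callable, Dict, Iterable

# Hypothetical single source of truth for the widgets this notebook defines.
WIDGET_DEFAULTS = {"env": "dev", "run_date": "2021-01-01"}


def collect_widget_values(names: Iterable[str], get: Callable[[str], str]) -> Dict[str, str]:
    """Build {widget name: current value}; in a notebook, pass dbutils.widgets.get as `get`."""
    return {name: get(name) for name in names}


# In a Databricks notebook (not executed here):
#   for name, default in WIDGET_DEFAULTS.items():
#       dbutils.widgets.text(name, default)
#   inputs = collect_widget_values(WIDGET_DEFAULTS, dbutils.widgets.get)
```

Iterating over the dictionary yields its keys, so the same WIDGET_DEFAULTS drives both widget creation and the final inputs dictionary.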

Add comments to Delta

If a PySpark DataFrame reads data from a table and writes it to Azure Delta Lake, can we add comments to the newly written file? For example: Df = sql("se…

Reading Databricks tables in Azure

Please clear up my confusion: I keep hearing that we need to read every Parquet file created by a Databricks Delta table to get the latest data in the case of an SCD2 table.
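Delta readers do not scan every Parquet file under the table path. Each commit in the _delta_log directory records add and remove actions, and a reader replays them to work out which files belong to the latest version; files that an SCD2 MERGE has superseded stay on storage (until VACUUM) but are skipped. The sketch below is a deliberately simplified, hypothetical model of that replay: it ignores checkpoints, metadata and protocol actions, and deletion vectors, and each commit is passed in as its list of JSON lines. active_files is a hypothetical name.

```python
import json
from typing import Iterable, List, Set


def active_files(commits: Iterable[List[str]]) -> Set[str]:
    """Replay Delta-style commits in order and return the data files in the latest version."""
    live: Set[str] = set()
    for commit in commits:
        for line in commit:
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])        # file becomes part of the table
            elif "remove" in action:
                live.discard(action["remove"]["path"])  # file is logically deleted
            # Other actions (commitInfo, metaData, protocol, ...) do not change the file set here.
    return live
```

In this model, an SCD2 MERGE that rewrites part-001.parquet as part-002.parquet leaves only the new file live, which is why a reader needs the log rather than a directory listing.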

How to get list of all leaf folders from ADLS Gen2 path via Scala code?

We have folders with subfolders for year, month, and day. How can we list only the leaf-level folders using the dbutils.fs.ls utility? Example…
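A minimal recursive sketch, written in Python (dbutils is also available from Python notebooks, and the same recursion ports directly to Scala). It assumes, as dbutils.fs.ls output does, that directory paths end with "/", and it treats a folder as a leaf when it has no subfolders. leaf_dirs and the injected ls function are hypothetical names; in a notebook you could pass ls = lambda p: [f.path for f in dbutils.fs.ls(p)].

```python
from typing import Callable, List


def leaf_dirs(root: str, ls: Callable[[str], List[str]]) -> List[str]:
    """Return the folders under `root` (inclusive) that contain no subfolders.

    `ls(path)` must return child paths, with directories ending in "/".
    """
    subdirs = [p for p in ls(root) if p.endswith("/")]
    if not subdirs:
        return [root]  # no subfolders: this folder is a leaf (it may still hold files)
    leaves: List[str] = []
    for d in subdirs:
        leaves.extend(leaf_dirs(d, ls))  # depth-first, preserving listing order
    return leaves
```

Each call to dbutils.fs.ls is a remote listing, so for a year/month/day layout the cost grows with the number of folders rather than the number of files.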

In a PySpark DataFrame, when I rename a column, the previous name can still be used for filtering. Bug or feature?

I work on Databricks with a PySpark DataFrame containing string-type columns. I use .withColumnRenamed() to rename one of them. Later in the process I use a .filt…

How to flatten a nested JSON struct using Python in Databricks

I'm trying to flatten a nested JSON response using a Python DataFrame in Databricks. I was able to flatten the "survey" struct successfully, but I get errors when I try…
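One common pattern, shown here as a sketch rather than the asker's code, is to walk the DataFrame schema, generate one dotted column path per leaf field, and then select those paths with aliases. The helper below works on the dictionary returned by StructType.jsonValue(), so it runs without Spark; flat_columns is a hypothetical name. Arrays are deliberately left as single columns, because arrays of structs must be exploded (for example with pyspark.sql.functions.explode) before their fields can be flattened.

```python
from typing import Any, Dict, List


def flat_columns(struct: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return dotted paths to every non-struct field of a StructType.jsonValue() dict."""
    paths: List[str] = []
    for field in struct["fields"]:
        path = prefix + field["name"]
        ftype = field["type"]
        if isinstance(ftype, dict) and ftype.get("type") == "struct":
            # Nested struct: descend and prefix its children with "parent."
            paths.extend(flat_columns(ftype, path + "."))
        else:
            # Primitive types appear as strings ("long"); arrays and maps as dicts. Keep them whole.
            paths.append(path)
    return paths
```

In a notebook, with F being pyspark.sql.functions, the flattened DataFrame would be df.select([F.col(p).alias(p.replace(".", "_")) for p in flat_columns(df.schema.jsonValue())]). Field names that themselves contain dots would need backtick quoting inside F.col.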

SQL in Azure Databricks

We have a 1-day table aggregated with GROUP BY call_date, tdlinx_id, work_request_id, category_name, and another table with 1-week-level data aggregated with…

Databricks Error: AnalysisException: Incompatible format detected. with Delta

I'm getting the following error when I attempt to write to my data lake with Delta on Databricks:
fulldf = spark.read.format("csv").option("header", True).option…

Split corresponding column values in PySpark

The table below is the input DataFrame:

col1 | col2     | col3
1    | 12;34;56 | Aus;SL;NZ
2    | 31;54;81 | Ind;US;UK
3    | null     | Ban
4    | Ned      | null

Expected output DataFrame [values of c…
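Because the expected output is cut off, this sketch assumes the goal is to pair the values of col2 and col3 by position (12 with Aus, 34 with SL, and so on), padding the shorter side with null. It is plain Python over tuples so the pairing logic is easy to check; split_pairs is a hypothetical name. In PySpark, the analogous approach would be F.explode(F.arrays_zip(F.split("col2", ";"), F.split("col3", ";"))), but verify how it treats rows 3 and 4, since split returns null for a null column.

```python
from itertools import zip_longest
from typing import List, Optional, Tuple

Row = Tuple[int, Optional[str], Optional[str]]


def split_pairs(rows: List[Row], sep: str = ";") -> List[Row]:
    """Explode col2 and col3 in lockstep: the i-th piece of col2 pairs with the i-th piece of col3."""
    out: List[Row] = []
    for col1, col2, col3 in rows:
        left = col2.split(sep) if col2 is not None else []
        right = col3.split(sep) if col3 is not None else []
        # The shorter side is padded with None; a row where both columns are null yields nothing.
        for a, b in zip_longest(left, right):
            out.append((col1, a, b))
    return out
```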

Azure Databricks Delta Table modifies the TIMESTAMP format while writing from Spark DataFrame

I am new to Azure Databricks. I am trying to write a DataFrame to a Delta table that contains a TIMESTAMP column, but strangely it changes the TIMESTAMP pa…

How can I execute and schedule a Databricks notebook from an Azure DevOps pipeline using YAML?

I want to set up CI/CD for my Azure Databricks notebook using a YAML file. I have followed the flow below:
- Pushed my code from the Databricks notebook to Azure Repos.
- Crea…

Databricks: ConcurrentAppendException

I'm running about 20 notebooks concurrently, and they all update the same Delta table (though different rows). I get the exception below whenever any two notebo…
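Two mitigations are commonly suggested for this exception. The primary one is to partition the table so that writers stay disjoint and to make each notebook's MERGE or UPDATE condition explicitly reference the partitions it touches, so Delta can tell the transactions do not conflict. A complementary safeguard is to retry the write with backoff, sketched below. with_retries is a hypothetical helper; in a notebook you might pass retry_on=(ConcurrentAppendException,) imported from delta.exceptions, assuming your runtime ships that module.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    action: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action(); on an exception listed in retry_on, back off and try again.

    Exceptions not listed in retry_on propagate immediately.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last conflict
            # Exponential backoff (1x, 2x, 4x, ...) plus jitter so parallel notebooks desynchronize.
            sleep(base_delay_s * 2 ** (attempt - 1) + random.uniform(0, base_delay_s))
    raise AssertionError("unreachable")
```

For example, with_retries(run_merge, retry_on=(ConcurrentAppendException,)), where run_merge is a hypothetical function that builds and executes your DeltaTable.merge(...) chain. Retrying only helps when conflicts are occasional; if most runs collide, fix the partition predicates first.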

Azure ADLS Gen2 file created by Azure Databricks doesn't inherit ACL

I have a Databricks notebook that writes a DataFrame to a file in ADLS Gen2 storage. It creates a temp folder, outputs the file, and then copies that file to…

How to loop through folders in Azure Blob Containers

I have the following code, written in Visual Studio Code, that I now want to run in Azure Databricks. I have uploaded the entire folder to my Azure Blob…
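Blob Storage without a hierarchical namespace is flat, so "folders" are just the prefixes of blob names up to the last "/". A minimal sketch of looping over folders is therefore to group blob names by that prefix. The names could come from ContainerClient.list_blobs() in the azure-storage-blob package (each returned item has a .name), or from dbutils.fs.ls on a mounted path; group_by_folder is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_folder(blob_names: Iterable[str]) -> Dict[str, List[str]]:
    """Group flat blob names like 'a/b/file.csv' by their virtual folder ('a/b')."""
    folders: Dict[str, List[str]] = defaultdict(list)
    for name in blob_names:
        folder, _, _ = name.rpartition("/")  # blobs at the container root get folder ""
        folders[folder].append(name)
    return dict(folders)
```

Looping over group_by_folder(names).items() then gives one iteration per virtual folder, with that folder's blobs in listing order.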

Spark binary file and Delta Table

I receive binary files (~3 MB each) in batches of ~20,000 files at a time. These files are used downstream for further processing, but I wa…

Load Data Using Azure Batch Service and Spark Databricks

I have a file in Azure Blob Storage that I need to load daily into the Data Lake. I am not sure which approach I should use (Azure Batch Account, Custom Activity…

Read Outlook emails in Databricks

I would like to read emails from Microsoft Outlook using Python and run the script on a Databricks cluster. I'm using win32com on my local machine and am able to…