Category "azure-databricks"

SQL order of execution

I wonder how this query executes successfully. As we know, the 'having' clause is evaluated before the 'select' one, so how can an alias defined in the 'select' statement be used here?
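
A minimal sketch of what is going on, using a hypothetical temp view: the engine resolves the alias in HAVING back to its aggregate expression during analysis, before anything runs, which is why the query succeeds even though HAVING is logically evaluated before SELECT. Spark SQL (and e.g. MySQL) accept the alias form; stricter engines require the explicit aggregate.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical toy data for illustration.
spark.createDataFrame(
    [("a", 1), ("a", 2), ("b", 3)], ["grp", "val"]
).createOrReplaceTempView("t")

# Alias form: Spark SQL rewrites `total` to SUM(val) during analysis.
spark.sql("""
    SELECT grp, SUM(val) AS total
    FROM t
    GROUP BY grp
    HAVING total > 2
""").show()

# Portable form with the aggregate repeated, valid everywhere.
spark.sql("""
    SELECT grp, SUM(val) AS total
    FROM t
    GROUP BY grp
    HAVING SUM(val) > 2
""").show()
```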

What is the best way to clean up and recreate a Databricks Delta table?

I am trying to clean up and recreate a Databricks Delta table for integration tests. I want to run the tests on a DevOps agent, so I am using JDBC (Simba driver) bu
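
A hedged sketch of the usual reset pattern between test runs; table and schema names are hypothetical, and the same SQL statements can be sent over the Simba JDBC connection instead of spark.sql.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Drop the old table metadata, then recreate an empty Delta table.
spark.sql("DROP TABLE IF EXISTS test_db.events")
spark.sql("""
    CREATE TABLE test_db.events (
        id BIGINT,
        payload STRING
    ) USING DELTA
""")

# For an external (path-based) table, also clear the old files first, e.g.:
# dbutils.fs.rm("/mnt/tests/events", recurse=True)
```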

How to give input to a prompt asked in a cell in a Databricks notebook?

As you can see, the library I'm using is asking me to make an entry, but there's no box/window where I can make the entry. How do I make an entry here amongst y/n/u/
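
Notebook cells do not attach an interactive stdin, so a library's y/n prompt cannot be answered in place. A common workaround, sketched below with a hypothetical widget name, is to supply the choice up front via a widget and pass it to the library as an argument or flag if it supports non-interactive use.

```python
# Collect the answer before the call instead of typing it at the prompt.
dbutils.widgets.dropdown("confirm", "y", ["y", "n", "u"], "Answer for the prompt")
answer = dbutils.widgets.get("confirm")

# ...then pass `answer` to the library call (e.g. a force/confirm parameter)
# rather than letting it call input(), which hangs in a notebook cell.
```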

How to check which records are non-numeric in a String column in a Delta table

I am working on a Delta table using Databricks on Azure. The Delta table contains about 100 million records with many columns. One column, the data type of which is S
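
A sketch of one way to surface the offending rows, with hypothetical table and column names: casting a non-numeric string yields NULL, so filter for rows where the value is present but the cast fails.

```python
from pyspark.sql import functions as F

# `spark` is the session provided by the Databricks notebook.
df = spark.table("my_db.my_delta_table")

non_numeric = df.filter(
    F.col("amount_str").isNotNull()
    & F.col("amount_str").cast("double").isNull()
)
non_numeric.select("amount_str").show(20, truncate=False)
```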

Delete multiple rows from a delta table/pyspark data frame given a list of IDs

I need to find a way to delete multiple rows from a delta table/pyspark data frame given a list of IDs to identify the rows. As far as I can tell there isn't a
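
A sketch of the delete-by-list pattern, with hypothetical table and column names, using either the DeltaTable API or plain SQL.

```python
from delta.tables import DeltaTable
from pyspark.sql import functions as F

ids_to_delete = [101, 205, 307]

# DeltaTable API: delete rows whose id is in the list.
delta_tbl = DeltaTable.forName(spark, "my_db.my_delta_table")
delta_tbl.delete(F.col("id").isin(ids_to_delete))

# Equivalent SQL:
# spark.sql(
#     f"DELETE FROM my_db.my_delta_table WHERE id IN ({','.join(map(str, ids_to_delete))})"
# )
```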

Error while running Scala code - Databricks 7.3 LTS and above

I am running Databricks 7.3 LTS and getting errors while trying to use the Scala bulk copy. The error is: object sqldb is not a member of package com.microsoft. I hav

Spark job as a web service?

A peer of mine has created code that opens a RESTful API web service within an interactive Spark job. The intent of our company is to use his code as a means o

Azure Databricks keep long-running notebook alive when closing browser

I am working with Azure Databricks Jupyter notebooks and have time-consuming jobs (complex queries, model training, loops over many items, etc.). Every time I c

cURL request not working in GitHub actions, works locally

I'm attempting to make a cURL PATCH request via a GitHub Action that executes when I push. The cURL call works perfectly when I execute it in Windo

How to share code between two projects on Azure Databricks

I have two ML projects on Azure Databricks that work almost the same except that they are for different clients. Essentially I want to use some management syste

Py4JJavaError in an Azure Databricks notebook pipeline

I have a curious issue when launching a Databricks notebook from a caller notebook through dbutils.notebook.run (I am working in Azure Databricks). One intere
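
For context, a minimal sketch of the calling pattern involved; the child notebook path, timeout and parameter are hypothetical. The Py4JJavaError raised by notebook.run typically wraps whatever exception the child notebook actually threw, so the child's own error output is where to look.

```python
# Run a child notebook with a timeout (seconds) and a parameter map.
result = dbutils.notebook.run(
    "/Repos/project/child_notebook",
    3600,
    {"run_date": "2021-01-01"},
)

# Whatever the child returned via dbutils.notebook.exit(...)
print(result)
```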

Spark SQL - org.apache.spark.sql.AnalysisException

The error described below occurs when I run the Spark job on Databricks a second time (less often on the first run). The SQL query just performs a create table as select
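
A hedged sketch, assuming the second-run failure is the common "table already exists" case; table names are hypothetical. Making the CTAS idempotent avoids tripping over the object created on the first run.

```python
# Replace the table (and its data) on every run.
spark.sql("""
    CREATE OR REPLACE TABLE reporting.daily_summary
    USING DELTA
    AS SELECT id, COUNT(*) AS cnt
       FROM raw.events
       GROUP BY id
""")

# Or, if the table should only be created once and then left alone:
# spark.sql("CREATE TABLE IF NOT EXISTS reporting.daily_summary USING DELTA AS SELECT ...")
```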

Azure Purview Data Lineage with Databricks

I am using Azure Purview for Data Governance and Data Lineage. We use Databricks in our Data Architecture, but there isn't any native support for capturing Dat

Databricks init script is failing to install packages but reporting as "Succeeded" regardless?

I have been attempting to set up 'init scripts' on Databricks, so I can install all of my Python libraries and keep the environment controlled. Tried yesterday u

Databricks Spark: java.lang.OutOfMemoryError: GC overhead limit exceeded i

I am executing a Spark job on a Databricks cluster. I am triggering the job via an Azure Data Factory pipeline and it executes at a 15-minute interval, so after the su

How can I access a Python variable in Spark SQL?

I have a Python variable created under %python in my Jupyter notebook file in Azure Databricks. How can I access the same variable to make comparisons under %sql?
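
A sketch of two common patterns, with hypothetical variable, view and table names: expose the Python value to SQL through a temp view, or push a scalar into a Spark conf and reference it from the %sql cell via variable substitution (which relies on spark.sql.variable.substitute being enabled, the default).

```python
threshold = 100

# 1) Register the value as a temp view and query it from %sql.
spark.createDataFrame([(threshold,)], ["threshold"]).createOrReplaceTempView("params")
# %sql
# SELECT * FROM my_table WHERE amount > (SELECT threshold FROM params)

# 2) Or set a Spark conf and reference it with ${...} substitution.
spark.conf.set("my.threshold", str(threshold))
# %sql
# SELECT * FROM my_table WHERE amount > ${my.threshold}
```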

How to pass a parameter to a Python script from a pipeline [closed]

I am building an Azure Data Factory pipeline and I would like to know how to get this parameter into the Python script. The Python script is l
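
With the Databricks Python activity, pipeline parameters typically arrive as command-line arguments, so the script can read them with sys.argv or argparse. A sketch, with a hypothetical argument name that would be listed in the activity's parameters (e.g. "--input_date", "2021-01-01"):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--input_date", required=True)
args = parser.parse_args()

# Use the value passed in from the pipeline.
print(f"running for {args.input_date}")
```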

Databricks Cluster terminated. Reason: Cloud Provider Launch Failure

I'm using Azure Databricks with a custom configuration that uses VNet injection, and I am unable to start a cluster in my workspace. The error message being give

Define an environment variable in a Databricks init script

I want to define an environment variable in a Databricks init script and then read it in a PySpark notebook. I wrote this: dbutils.fs.put("/databricks/scripts/i
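
A heavily hedged sketch of one commonly suggested workaround, with hypothetical variable name and script path: an `export` inside the init script only lives in that script's own shell, so the script has to persist the value somewhere the driver process picks it up (here /etc/environment), after which the notebook can read it from os.environ once the cluster has restarted with the script attached.

```python
# Write a cluster init script that persists the variable.
dbutils.fs.put(
    "/databricks/scripts/set-env.sh",
    """#!/bin/bash
echo MY_APP_ENV=staging >> /etc/environment
""",
    True,
)

# After attaching the script to the cluster and restarting it:
# import os
# print(os.environ.get("MY_APP_ENV"))
```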

Azure Databricks workspace using Terraform

Trying to create a Databricks workspace using Terraform but getting unsupported arguments: resource "azurerm_databricks_workspace" "workspace" { name = "