Category "azure-databricks"

How to Check Which Record is non-numeric in a String Column in Delta Table

I am working with a Delta table in Databricks on Azure. The Delta table contains about 100 million records with many columns. One column, the data type of which is S
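One way to surface such rows is to compare the column before and after a numeric cast; below is a minimal PySpark sketch, assuming the Delta table is registered as my_table and the string column is called amount (both names are placeholders).

```python
from pyspark.sql import functions as F

# "spark" is predefined in a Databricks notebook; table and column names are placeholders.
df = spark.table("my_table")

# cast() returns NULL when the string cannot be parsed as a number, so rows that
# are non-null before the cast but null after it hold non-numeric values.
non_numeric = df.filter(
    F.col("amount").isNotNull() & F.col("amount").cast("double").isNull()
)

non_numeric.show(20, truncate=False)
```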

Delete multiple rows from a Delta table/PySpark data frame given a list of IDs

I need to find a way to delete multiple rows from a Delta table/PySpark data frame given a list of IDs to identify the rows. As far as I can tell there isn't a
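A hedged sketch using the Delta Lake Python API, assuming the table lives at /mnt/delta/my_table and is keyed by an id column (both placeholders):

```python
from delta.tables import DeltaTable

ids_to_delete = [101, 102, 103]  # example list of IDs

# Load the Delta table by path (a table name via DeltaTable.forName would also work).
delta_table = DeltaTable.forPath(spark, "/mnt/delta/my_table")

# Build an IN (...) predicate from the Python list and delete the matching rows.
predicate = "id IN ({})".format(",".join(str(i) for i in ids_to_delete))
delta_table.delete(predicate)
```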

Error while running Scala code - Databricks 7.3 LTS and above

I am running Databricks 7.3 LTS and getting errors while trying to use the Scala bulk copy. The error is: object sqldb is not a member of package com.microsoft. I hav

Spark job as a web service?

A peer of mine has created code that opens a RESTful API web service within an interactive Spark job. Our company intends to use his code as a means o

Azure Databricks keep long-running notebook alive when closing browser

I am working with Jupyter notebooks in Azure Databricks and have time-consuming jobs (complex queries, model training, loops over many items, etc.). Every time I c

cURL request not working in GitHub actions, works locally

I'm attempting to make a cURL PATCH request via a GitHub Action that executes when I push. The cURL call works perfectly when I execute it in Windo

How to share code between two projects on Azure Databricks

I have two ML projects on Azure Databricks that are almost identical except that they are for different clients. Essentially, I want to use some management syste

Py4JJavaError in an Azure Databricks notebook pipeline

I have a curious issue when launching a Databricks notebook from a caller notebook through dbutils.notebook.run (I am working in Azure Databricks). One intere
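For context, a minimal sketch of the caller pattern described here, with a placeholder notebook path, timeout, and arguments (dbutils is available inside Databricks notebooks):

```python
try:
    # Run the child notebook with a 1-hour timeout and an example parameter.
    result = dbutils.notebook.run("/Shared/child_notebook", 3600, {"run_date": "2021-01-01"})
    print("Child notebook returned:", result)
except Exception as e:
    # A Py4JJavaError raised while running the child surfaces here in the caller.
    print("Child notebook run failed:", e)
```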

Spark SQL - org.apache.spark.sql.AnalysisException

The error described below occurs when I run a Spark job on Databricks a second time (less often the first time). The SQL query just performs a create table as select
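For reference, a hedged sketch of an idempotent variant of that CTAS pattern, run from Python with spark.sql; the table and source names are placeholders, and CREATE OR REPLACE (or a DROP TABLE IF EXISTS beforehand) is a common way to avoid "table already exists" analysis errors on re-runs, assuming that is what the second run trips over.

```python
# Placeholder schema, table, and source names; re-runnable form of the CTAS statement.
spark.sql("""
    CREATE OR REPLACE TABLE reporting.daily_summary
    USING DELTA
    AS
    SELECT customer_id, SUM(amount) AS total_amount
    FROM raw.transactions
    GROUP BY customer_id
""")
```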

Azure Purview Data Lineage with Databricks

I am using Azure Purview for data governance and data lineage. We use Databricks in our data architecture, but there isn't any native support for capturing Dat

Databricks init script is failing to install packages but reporting as "Succeeded" regardless?

I have been attempting to set up 'init scripts' on Databricks, so I can install all of my Python libraries and keep the environment controlled. Tried yesterday u

Databricks Spark: java.lang.OutOfMemoryError: GC overhead limit exceeded i

I am executing a Spark job in a Databricks cluster. I am triggering the job via an Azure Data Factory pipeline and it executes at a 15-minute interval, so after the su

How can I access python variable in Spark SQL?

I have a Python variable created under %python in my Jupyter notebook file in Azure Databricks. How can I access the same variable to make comparisons under %sql?
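Two commonly used patterns, sketched below with placeholder names: interpolate the Python value into a query run with spark.sql, or push the value into the Spark conf and reference it from a %sql cell via variable substitution.

```python
# Placeholder variable and table names; "spark" is predefined in Databricks notebooks.
threshold = 100

# Option 1: build the query string in Python and run it with spark.sql.
df = spark.sql(f"SELECT * FROM my_table WHERE amount > {threshold}")
df.show()

# Option 2 (assumes SQL variable substitution is enabled, which is the default):
# store the value in the Spark conf, then in a %sql cell reference it as
#   SELECT * FROM my_table WHERE amount > ${my_threshold}
spark.conf.set("my_threshold", str(threshold))
```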

How to pass a parameter to a Python script from a pipeline [closed]

I am building an Azure Data Factory pipeline and I would like to know how to get this parameter into the Python script. The Python script is l
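A hedged sketch of the script side, assuming the pipeline hands parameters over as command-line arguments (the usual approach); the parameter name --input_date is just an example.

```python
import argparse

# Parse the arguments the pipeline passes on the command line.
parser = argparse.ArgumentParser()
parser.add_argument("--input_date", required=True)
args = parser.parse_args()

print(f"Pipeline passed input_date = {args.input_date}")
```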

Databricks Cluster terminated. Reason: Cloud Provider Launch Failure

I'm using Azure Databricks with a custom configuration that uses VNet injection, and I am unable to start a cluster in my workspace. The error message being give

Define environment variable in Databricks init script

I want to define an environment variable in a Databricks init script and then read it in a PySpark notebook. I wrote this: dbutils.fs.put("/databricks/scripts/i
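A sketch continuing that dbutils.fs.put(...) idea, under the assumption that appending the variable to /etc/environment from the init script makes it visible to the notebook's Python process after a cluster restart; the path and variable name are placeholders.

```python
# Write an init script (placeholder path) that defines the variable on every node.
dbutils.fs.put(
    "/databricks/scripts/set-env.sh",
    """#!/bin/bash
echo "MY_ENV_VAR=some_value" >> /etc/environment
""",
    True,  # overwrite any existing script
)

# After attaching the script to the cluster and restarting it, read the variable:
import os
print(os.environ.get("MY_ENV_VAR"))
```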

Azure Databricks workspace using terraform

Trying to create a Databricks workspace using Terraform but getting unsupported arguments: resource "azurerm_databricks_workspace" "workspace" { name = "

Azure DevOps CD Pipeline to Deploy Library to Databricks DBFS 403 Forbidden Error

I'm following the tutorial Continuous integration and delivery on Azure Databricks using Azure DevOps to automate the process of deploying and installing a library on a

Update using JOIN or CTE in Databricks

I am trying to update a Delta table in Databricks, using the Databricks documentation here as an example. That document only talks about updating a literal value
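For reference, the usual substitute for an UPDATE ... JOIN on a Delta table is MERGE INTO; below is a minimal sketch driven from Python, with placeholder table and column names.

```python
# "target" and "updates" are placeholder table/view names; join on the key column
# and update only the rows that match.
spark.sql("""
    MERGE INTO target t
    USING updates u
    ON t.id = u.id
    WHEN MATCHED THEN
      UPDATE SET t.amount = u.amount
""")
```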

How to run an Azure Databricks notebook and get its result via REST API

I have a use case where I need to run a set of notebooks developed in Azure Databricks (that perform several queries and calculations), but the end user (non-t
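A hedged sketch of one way to do this from plain Python with the Jobs API (runs/submit, runs/get, runs/get-output); the workspace URL, token, notebook path, and cluster spec are all placeholders.

```python
import time
import requests

HOST = "https://<your-workspace>.azuredatabricks.net"          # placeholder workspace URL
HEADERS = {"Authorization": "Bearer <personal-access-token>"}  # placeholder PAT

# Submit a one-time run of the notebook on a small job cluster.
run = requests.post(
    f"{HOST}/api/2.0/jobs/runs/submit",
    headers=HEADERS,
    json={
        "run_name": "adhoc-notebook-run",
        "notebook_task": {"notebook_path": "/Shared/my_notebook"},  # placeholder path
        "new_cluster": {
            "spark_version": "7.3.x-scala2.12",
            "node_type_id": "Standard_DS3_v2",
            "num_workers": 1,
        },
    },
).json()
run_id = run["run_id"]

# Poll until the run reaches a terminal state.
while True:
    state = requests.get(
        f"{HOST}/api/2.0/jobs/runs/get", headers=HEADERS, params={"run_id": run_id}
    ).json()["state"]
    if state["life_cycle_state"] in ("TERMINATED", "SKIPPED", "INTERNAL_ERROR"):
        break
    time.sleep(30)

# Fetch whatever the notebook returned via dbutils.notebook.exit(...).
output = requests.get(
    f"{HOST}/api/2.0/jobs/runs/get-output", headers=HEADERS, params={"run_id": run_id}
).json()
print(output.get("notebook_output"))
```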