Category "databricks"

Get the list of loaded files from Databricks Autoloader

We can use Autoloader to track which files have been loaded from an S3 bucket. My question about Autoloader: is there a way to read the Autoloader databa

Spark SQL - org.apache.spark.sql.AnalysisException

The error described below occurs when I run a Spark job on Databricks the second time (the first less often). The SQL query just performs create table as select

Using Databricks/Python3.x ZipFile to extract 7gb file from zip

I've got a large NPI zipfile which includes a 7.3gb csv. (file can be located on NPI site here: http://download.cms.gov/nppes/NPI_Files.html -- the Full Replac

How to avoid zipfile error with python-pptx saving files

I am using the python-pptx package to create a number of .pptx files from a series of dataframes. All works well with adding slides and such until it comes time

Issues installing gdal-bin (libmysqlclient21 dependency) on 20.04.3 (databricks job clusters)

I've had, in the past, gdal utilities installed successfully on a Databricks Cluster running 20.04.3 LTS (focal). $ cat /etc/os-release NAME="Ubuntu" VERSION="2

How to filter files in Databricks Autoloader stream

I want to set up an S3 stream using Databricks Auto Loader. I have managed to set up the stream, but my S3 bucket contains different types of JSON files. I want

How to slice a PySpark dataframe in two row-wise

I am working in Databricks. I have a dataframe which contains 500 rows. I would like to create two dataframes, one containing 100 rows and the other containing t

How to execute an Azure Databricks notebook from Excel

Is there any way to trigger an Azure Databricks notebook from Excel? If there is, please help me with how. Many thanks.

I am trying to connect to Databricks through the CLI and want to replicate the same in Azure DevOps

On my local system I am running the commands: pip install databricks-cli, databricks configure --token, then the token value and later the token. Now the thing is, in Azure DevOps I

Databricks Spark: java.lang.OutOfMemoryError: GC overhead limit exceeded i

I am executing a Spark job in a Databricks cluster. I am triggering the job via an Azure Data Factory pipeline and it executes at 15-minute intervals, so after the su

How to read data from multiple folders in ADLS into a Databricks dataframe

file path format is data/year/weeknumber/no of day/data_hour.parquet data/2022/05/01/00/data_00.parquet data/2022/05/01/01/data_01.parquet data/2022/05/01/02/da

How can I access a Python variable in Spark SQL?

I have a Python variable created under %python in my Jupyter notebook file in Azure Databricks. How can I access the same variable to make comparisons under %sql?

Databricks display() function equivalent or alternative to Jupyter

I'm in the process of migrating current Databricks Spark notebooks to Jupyter notebooks. Databricks provides a convenient and beautiful display(data_frame) functi

Spark Delta table restore to version

I am trying to restore a Delta table to its previous version via Spark Java, and I am using a local IDE. The code is as below: import io.delta.tables.*; DeltaTable deltaTa

Databricks Cluster terminated. Reason: Cloud Provider Launch Failure

I'm using Azure Databricks with a custom configuration that uses vnet injection and I am unable to start a cluster in my workspace. The error message being give

Is there any way to unnest BigQuery columns in Databricks in a single PySpark script?

I am trying to connect to BigQuery using the latest Databricks version (7.1+, Spark 3.0) with PySpark as the script editor/base language. We ran the PySpark script below to f

How to add a select all option in a Databricks SQL parameter? Or if the parameter value is null make it select all?

So I want to create a select all button in a parameter. The actual parameter has around 200 options because of the size of the database. However, if I want a ge

Update using JOIN or CTE in Databricks

I am trying to update a Delta table in Databricks using the Databricks documentation here as an example. This document talks only about updating a literal value

Databricks: Z-order vs partitionBy

I am learning Databricks and I have some questions about z-order and partitionBy. When I am reading about both functions it sounds pretty similar. Both function

Why is the time format changing in Azure Databricks?

I have a file with a timestamp in the format 2017-01-20 16:53:05.212 (yyyy-MM-dd HH:mm:ss.SSS). I have uploaded this file to Azure Data Lake Gen2 and acc