Is it possible to set up automatic execution and download of Databricks scripts?
At my company I have developed a few tables in Databricks that pull data from our Google BigQuery and Azure Data Lake data.
Those tables feed Excel dashboards and SQL tables, but it is a huge pickle that everything depends on me manually running and downloading them daily or weekly. Is there any way to set up jobs so the scripts run and the tables are downloaded to my drive automatically? Then I could use PowerShell to move and rename the files.
I have been consulting coworkers and our Data Science consultant, who has it on his roadmap, but there must be a somewhat approachable method. I have tried googling it, with no real success.
Solution 1:
Yes, you can use the Databricks Notebook activity in Azure Data Factory (ADF) to run Databricks notebooks and schedule the execution based on an event or at a particular time and interval.
To create an ADF pipeline that runs a notebook, you perform the following steps:
Create a data factory.
Create Linked Service to make connection to the Databricks Notebook.
Create a pipeline that uses Databricks Notebook Activity.
Trigger a pipeline run.
Monitor the pipeline run.
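Once the linked service exists, the pipeline definition itself is small. Below is a minimal sketch of the pipeline JSON, assuming a linked service named AzureDatabricksLinkedService and a notebook at /Shared/export_tables (both hypothetical names you would replace with your own):

```json
{
  "name": "RunDatabricksNotebookPipeline",
  "properties": {
    "activities": [
      {
        "name": "RunNotebook",
        "type": "DatabricksNotebook",
        "typeProperties": {
          "notebookPath": "/Shared/export_tables",
          "baseParameters": {
            "run_date": "@formatDateTime(utcnow(), 'yyyy-MM-dd')"
          }
        },
        "linkedServiceName": {
          "referenceName": "AzureDatabricksLinkedService",
          "type": "LinkedServiceReference"
        }
      }
    ]
  }
}
```

The baseParameters map is optional; it shows how to pass values (here, the run date via an ADF expression) into the notebook as widget parameters.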
Refer to the official Microsoft tutorial, Run a Databricks notebook with the Databricks Notebook Activity in Azure Data Factory, for step-by-step deployment instructions.
Once done, you can trigger your pipeline immediately to check that it works. To automate the pipeline execution, create a Schedule Trigger, which makes the pipeline run at a particular time and at a set interval. Refer to Create a trigger that runs a pipeline on a schedule.
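As a sketch, a daily Schedule Trigger definition looks like the JSON below, assuming the pipeline is named RunDatabricksNotebookPipeline (a hypothetical name) and should run every morning at 06:00 UTC:

```json
{
  "name": "DailyNotebookTrigger",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Day",
        "interval": 1,
        "startTime": "2022-06-01T06:00:00Z",
        "timeZone": "UTC"
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "RunDatabricksNotebookPipeline",
          "type": "PipelineReference"
        }
      }
    ]
  }
}
```

Changing frequency to "Week" (with a schedule of weekdays) would give the weekly cadence mentioned in the question; the trigger must be started (activated) after creation before it fires.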
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | UtkarshPal-MT |
