How to create a variable from a SQL statement's output in Databricks
I would like to create a variable that I can use later in multiple SQL queries within the same notebook. The variable itself is simply an integer produced by a SQL statement such as:
select count(people) from persons_table;
Then, in subsequent SQL command cells, I would simply use the variable in calculations such as:
select (count(cars) / $variable) as car_rate from table_x;
The only variables I've used in Databricks have been simple widgets set from a Python script (dbutils.widgets.text('name', 'value')).
I already have a workaround that stores the output in a temp table, but I would rather use a variable if that's possible.
Solution 1:[1]
I found this while looking for a solution to the same problem. Since your question isn't answered yet, I'll post the one I found for others. I'm not sure it's the best approach, but it works.
Use a python notebook.
Then you can use python variables.
peopleCount = spark.sql("select count(people) from persons_table").collect()[0][0]
spark.sql runs the SQL and returns a PySpark DataFrame; without the call to collect(), the variable would hold a DataFrame object rather than a value. collect() pulls the result rows from the DataFrame into a Python list. The [0][0] selects the first column of the first row; you only need to worry about those offsets if you have more than one column or row.
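To see why the [0][0] indexing is needed: collect() returns a list of Row objects, one per result row, so a single-value aggregate such as count(people) comes back as a one-row, one-column list. A minimal sketch of that indexing, using plain tuples to stand in for Spark Row objects (Row supports the same positional indexing), with a made-up count of 42:

```python
# What collect() would return for "select count(people) from persons_table":
# a list containing one row, which contains one column.
rows = [(42,)]  # stand-in for [Row(count(people)=42)]

people_count = rows[0][0]  # first row, first column -> the bare integer
print(people_count)        # 42
```

If the query returned several columns or rows, you would adjust the two indices accordingly, e.g. rows[2][1] for the second column of the third row.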
Then in the next cell
%sql
select (count(cars) / '$peopleCount') as car_rate from table_x;
The %sql magic tells Databricks this is a SQL cell, and the variable is referenced with a $ prefix and enclosed in quotes.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | JonB65 |
