How to configure a Databricks token inside a Dockerfile

I have a Dockerfile where I want to:

  1. Download the Databricks CLI
  2. Configure the CLI by adding a host and token
  3. Run a Python file that calls the Databricks API

I am able to install the CLI in the Docker image, and I have a working Python file that can submit a job to the Databricks API, but I'm unsure how to configure the CLI within Docker.

Here is what I have

FROM python
MAINTAINER nope

# Creating Application Source Code Directory
RUN mkdir -p /src

# Setting Home Directory for containers
WORKDIR /src

# Installing python dependencies
RUN pip install databricks_cli

# Not sure how to do this part???
# databricks token kicks off the config via CLI
RUN databricks configure --token

# Copying src code to Container
COPY . /src

# Start Container
CMD echo $(databricks --version)

# Kicks off Python job
CMD ["python", "get_run.py"]

If I were to run databricks configure --token in the CLI, it would prompt for the configuration values like this:

databricks configure --token
Databricks Host (should begin with https://): 


Solution 1:[1]

It's better not to do it this way, for multiple reasons:

  1. It's insecure - configuring the Databricks CLI this way generates a credentials file inside the container that can be read by anyone with access to the image
  2. Tokens have a time-to-live (90 days by default), so you would need to rebuild your containers regularly

Instead, just pass two environment variables to the container and they will be picked up by the databricks command: DATABRICKS_HOST and DATABRICKS_TOKEN, as described in the documentation.
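With the environment-variable approach, the container is started with something like docker run -e DATABRICKS_HOST=... -e DATABRICKS_TOKEN=... image, and the same variables can be read by the Python script too. A minimal sketch of that idea (the function name, the run_id parameter, and the specific endpoint are illustrative assumptions, not taken from the question's get_run.py):

```python
import os

# Hypothetical sketch: read the credentials injected via
# `docker run -e DATABRICKS_HOST=... -e DATABRICKS_TOKEN=...`
# and build a request for the Databricks Jobs REST API.
def build_get_run_request(run_id):
    host = os.environ["DATABRICKS_HOST"]   # e.g. https://adb-123.azuredatabricks.net
    token = os.environ["DATABRICKS_TOKEN"]
    url = f"{host}/api/2.1/jobs/runs/get?run_id={run_id}"
    headers = {"Authorization": f"Bearer {token}"}
    return url, headers
```

Because nothing is baked into the image, the token can be rotated without rebuilding the container.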

Solution 2:[2]

When databricks configure is run successfully, it writes the information to the file ~/.databrickscfg:

[DEFAULT]
host = https://your-databricks-host-url
token = your-api-token

One way you could set this in the container is with a startup command (syntax here for docker-compose.yml). Note that printf is used rather than echo, since bash's echo does not expand \n escapes by default:

/bin/bash -c 'printf "[DEFAULT]\nhost = %s\ntoken = %s\n" "${HOST_URL}" "${TOKEN}" > ~/.databrickscfg'
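The same file can be produced by a short shell snippet run at container startup; printf is used so the \n escapes are actually expanded. HOST_URL and TOKEN are placeholder values, and a temp path is used here so the sketch can be run safely outside a container (inside the container it would be ~/.databrickscfg):

```shell
# Write a Databricks CLI config file from environment variables.
# HOST_URL and TOKEN are placeholders, not real credentials.
HOST_URL="https://example.cloud.databricks.com"
TOKEN="dapiXXXXXXXX"

# In the container this would be "$HOME/.databrickscfg".
CFG="${TMPDIR:-/tmp}/databrickscfg"

printf '[DEFAULT]\nhost = %s\ntoken = %s\n' "$HOST_URL" "$TOKEN" > "$CFG"
cat "$CFG"
```

Quoting the format string in single quotes keeps the \n escapes intact for printf, while the variables are expanded as arguments.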

Solution 3:[3]

It is not very secure to put your token in the Dockerfile. However, if you want to pursue this approach, you can use the code below.

RUN export DATABRICKS_HOST=XXXXX && \
    export DATABRICKS_API_TOKEN=XXXXX && \
    export DATABRICKS_ORG_ID=XXXXX && \
    export DATABRICKS_PORT=XXXXX && \
    export DATABRICKS_CLUSTER_ID=XXXXX && \
    echo "{\"host\": \"${DATABRICKS_HOST}\",\"token\": \"${DATABRICKS_API_TOKEN}\",\"cluster_id\":\"${DATABRICKS_CLUSTER_ID}\",\"org_id\": \"${DATABRICKS_ORG_ID}\", \"port\": \"${DATABRICKS_PORT}\" }" >> /root/.databricks-connect

Make sure to chain all the commands in a single RUN instruction. Otherwise, variables such as DATABRICKS_HOST or DATABRICKS_API_TOKEN will not propagate: each RUN instruction starts a fresh shell in a new layer, so exported variables do not survive into later instructions.

If you want to connect to a Databricks Cluster within a docker container you need more configuration. You can find the required details in this article: How to Connect a Local or Remote Machine to a Databricks Cluster
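The hand-escaped JSON in the echo command above is easy to get wrong; as an alternative, the same file can be generated with Python's json module, which handles the quoting automatically. A sketch with placeholder values, writing to a temp path instead of /root/.databricks-connect so it can run anywhere:

```python
import json
import os
import tempfile

# Build the databricks-connect config from environment variables.
# All fallback values are placeholders, not real credentials.
config = {
    "host": os.environ.get("DATABRICKS_HOST", "https://example.cloud.databricks.com"),
    "token": os.environ.get("DATABRICKS_API_TOKEN", "dapiXXXXXXXX"),
    "cluster_id": os.environ.get("DATABRICKS_CLUSTER_ID", "0000-000000-xxxxxxxx"),
    "org_id": os.environ.get("DATABRICKS_ORG_ID", "0"),
    "port": os.environ.get("DATABRICKS_PORT", "15001"),
}

# In the container this path would be /root/.databricks-connect.
path = os.path.join(tempfile.gettempdir(), ".databricks-connect")
with open(path, "w") as f:
    json.dump(config, f)
```

Serializing with json.dump avoids the nested-quote escaping that makes the echo version fragile.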

Solution 4:[4]

The number of personal access tokens per user is limited to 600, but via bash it is easy to answer the configuration prompts non-interactively. The $(...) placeholders here appear to be CI pipeline variables (expanded before bash runs), not shell command substitutions, and each answer must arrive on its own line:

echo "y
$(WORKSPACE-REGION-URL)
$(CSE-DEVELOP-PAT)
$(EXISTING-CLUSTER-ID)
$(WORKSPACE-ORG-ID)
15001" | databricks-connect configure

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Alex Ott
Solution 2 jonchar
Solution 3 Pedram
Solution 4 Romerito Morais