Databricks notebook command not recognizing installed packages

I'm trying to programmatically monitor the Python libraries that are installed on a cluster from a Databricks notebook. For this I have been relying on the following code snippet:

import requests

def get_context():
    return dbutils.notebook.entry_point.getDbutils().notebook().getContext()

def get_host_name():
    host_name = get_context().tags().get("browserHostName").get()
    return host_name

def get_host_token():
    return get_context().apiToken().get()
  
def get_cluster_id():
    cluster_id = get_context().tags().get("clusterId").get()
    return cluster_id

def get_installed_libraries():
    response = requests.get(
      f'https://{get_host_name()}/api/2.0/libraries/cluster-status?cluster_id={get_cluster_id()}',
      headers={'Authorization': f'Bearer {get_host_token()}'}
    ).json()
    
    return [x['library']['pypi']['package'] for x in response['library_statuses']]
  
get_installed_libraries()

This should print all libraries installed on the cluster. However, the output of get_installed_libraries() is identical before and after running a pip command such as pip install spacy. Even though pip installs spacy successfully, it doesn't show up in the output of the function above.

Edit: I mention installation via pip because the goal is to install libraries on the cluster programmatically, e.g. from the contents of a requirements.txt file. I have a method that performs the installation, but I can't see where the installed libraries end up: they apparently aren't registered on the cluster, since the function above doesn't print any of them.

What am I missing?



Solution 1:[1]

You can use the following simple code snippet to manage libraries programmatically.

import requests

class BearerAuth(requests.auth.AuthBase):
    def __init__(self, token):
        self.token = token
    def __call__(self, r):
        r.headers["authorization"] = "Bearer " + self.token
        return r
      
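# Placeholder workspace URL; replace with your own instance
var_source_db_instance = 'https://instance.azuredatabricks.net'

# Query the Libraries API for the cluster's library status
# (the cluster ID and 'token' are placeholders)
var_lib_response = requests.get(
    f'{var_source_db_instance}/api/2.0/libraries/cluster-status',
    params={'cluster_id': '0412-063619-dummy'},
    auth=BearerAuth('token'),
).json()
print(var_lib_response)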
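
The raw response can be awkward to read, and indexing it with ['library']['pypi']['package'] fails for entries that are not PyPI packages. It also fails if the 'library_statuses' key is missing, which seems to happen when no cluster libraries are installed. Below is a minimal sketch of a more defensive parser. The function name extract_library_names and the sample dictionary are made up for illustration; the field names follow the Libraries API response format as I understand it.

```python
def extract_library_names(status_response):
    """Return (kind, name, status) tuples from a cluster-status response dict."""
    results = []
    # 'library_statuses' may be missing when no cluster libraries are installed
    for entry in status_response.get("library_statuses", []):
        library = entry.get("library", {})
        status = entry.get("status")
        for kind, spec in library.items():
            # pypi/cran/maven specs are dicts; jar/egg/whl specs are plain path strings
            if isinstance(spec, dict):
                name = spec.get("package") or spec.get("coordinates")
            else:
                name = spec
            results.append((kind, name, status))
    return results


# Hypothetical sample response, for illustration only
sample = {
    "cluster_id": "0412-063619-dummy",
    "library_statuses": [
        {"library": {"pypi": {"package": "spacy"}}, "status": "INSTALLED"},
        {"library": {"jar": "dbfs:/FileStore/jars/example.jar"}, "status": "PENDING"},
    ],
}
print(extract_library_names(sample))
```

Passing var_lib_response to the function instead of sample should give the same kind of list for a real cluster.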

Testing:

Pre-image: the spacy library is not present on this cluster. [screenshots omitted]

Post-image: [screenshots omitted]
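
As far as I can tell, the cluster-status endpoint only reports libraries registered as cluster libraries (through the Libraries UI or API). A pip install run from a notebook goes into the notebook's Python environment, which would explain why it never shows up there. If the goal is to install from a requirements.txt file and have the packages appear in cluster-status, one option is to send them to the /api/2.0/libraries/install endpoint. The sketch below only builds the request body. The helper name build_install_payload and the sample requirement lines are my own, and the parsing is deliberately simplistic (it skips pip options and treats everything after # as a comment).

```python
import json


def build_install_payload(cluster_id, requirement_lines):
    """Build a request body for POST /api/2.0/libraries/install."""
    libraries = []
    for line in requirement_lines:
        # Drop inline comments and surrounding whitespace
        spec = line.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "-r other.txt"
        if not spec or spec.startswith("-"):
            continue
        libraries.append({"pypi": {"package": spec}})
    return {"cluster_id": cluster_id, "libraries": libraries}


# Hypothetical requirements.txt contents, for illustration only
requirements = ["spacy==3.2.0", "", "# comment", "requests  # http client"]
payload = build_install_payload("0412-063619-dummy", requirements)
print(json.dumps(payload, indent=2))

# Sending it would look roughly like this (host and token are placeholders):
# requests.post(f"{host}/api/2.0/libraries/install",
#               headers={"Authorization": f"Bearer {token}"}, json=payload)
```

Installation through this endpoint is asynchronous, so the new entries may show a status such as PENDING or INSTALLING in cluster-status before they reach INSTALLED. To see what the notebook's own interpreter has available, including pip-installed packages, importlib.metadata.distributions() from the standard library can be used instead.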

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Karthikeyan Rasipalay Durairaj