'Gather cluster info from multiple databricks instances
I have admin access in our azure portal, which I can use to log in to any databricks instance we have. I want to programmatically gather info from each workspace, but I have not logged in to every databricks instance like most guides expect you to do (to get an access token). I want to identify databricks clusters across all workspaces that are not using VM SKUs in our reserved instances.
How can I gather a json file for analysis that contains:
- all databricks workspace names
- all databricks clusters in each workspace, especially the cluster VM SKU
- all VMs running for each cluster, including VM SKU and utilization metrics
without having to log in to each workspace, instead just using my MS credential?
Solution 1:[1]
The az-cli databricks module, and the databricks CLI itself won't be enough. Thankfully, we can use an access token to access the REST api for each workspace without having to manually log in to each.
Here's a script that loops through each subscription, finds the workspaces, and gathers the VM info.
It will dump it to a data.json file which can be used for analysis.
Write-Progress "Acquiring access token";
# https://stackoverflow.com/a/62707371/11141271
$databricksResourceId="2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"; # Programmatic ID for Azure Databricks
$token = az account get-access-token --resource $databricksResourceId --query accessToken --output tsv;
$subs = az account list | ConvertFrom-Json;
$total=New-Object -TypeName System.Collections.ArrayList;
$subIndex=0;
foreach ($sub in $subs)
{
$subIndex++;
Write-Progress "Gathering databricks instances" -Status $sub.name -PercentComplete ($subIndex / $subs.Count * 100) -Id 0;
$workspaces = az databricks workspace list --subscription $sub.id --output json | ConvertFrom-Json;
$i=0;
foreach ($workspace in $workspaces)
{
$i++;
Write-Progress "Gathering instance data" -Status $workspace.name -PercentComplete ($i/$workspaces.Count*100) -Id 1 -ParentId 0;
$headers=@{
"Authorization"= "Bearer " + $token;
"X-Databricks-Azure-Workspace-Resource-Id" = $workspace.Id;
}
$url="https://$($workspace.workspaceUrl)/api/2.0/clusters/list"
Write-Progress "Gathering clusters" -ParentId 1 -Id 2;
$clusters=Invoke-RestMethod -Method Get -Uri $url -Headers $headers -ContentType "application/json" -UseBasicParsing
Write-Progress "Gathering VMs" -ParentId 1 -Id 2;
$workspace.managedResourceGroupId -match "/resourceGroups/(.+)$" > $null;
$managedResourceGroupName = $matches.1;
$vms = az vm list --resource-group $managedResourceGroupName --subscription $sub.id | ConvertFrom-Json;
$vmData=New-Object -TypeName System.Collections.ArrayList;
$j=0;
foreach ($vm in $vms)
{
$j++;
Write-Progress "Gathering VM metrics" -ParentId 2 -Id 3 -PercentComplete ($j / $vms.Count * 100);
$metrics = az monitor metrics list `
--resource $vm.id `
--metrics "Percentage CPU" "Available Memory Bytes" "Disk Write Operations/sec" "Disk Read Operations/sec" "Disk Read Bytes" "Disk Write Bytes" `
--aggregation "Average" "Maximum" "Minimum" "Total" `
--interval "15m" `
--offset "1h" `
--subscription $sub.id `
| ConvertFrom-Json;
$vmData.Add(
@{
vm = $vm
metrics = $metrics
}
) > $null;
}
Write-Progress "Writing progress" -ParentId 1 -Id 2;
$total.Add(
@{
subscriptionId=$sub.id
subscriptionName=$sub.name
workspace=$workspace
clusters=$clusters.clusters
vms=$vmData
}
) > $null;
$total | ConvertTo-Json -Depth 12 > data.json;
}
}
See-also
Error 403 User not authorized when trying to access Azure Databricks API through Active Directory
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | TeamDman |
