azurerm_kubernetes_cluster - AKS encountered an internal error while attempting the requested operation
I'm trying to create an AKS cluster with a default node pool.
Terraform version: 1.1.7
AzureRM provider version: 2.99 (after upgrading, the behavior is the same on AzureRM 3.5)
The first error message:
Failure sending request: StatusCode=400 -- Original Error: Code="SubnetsAssociatedWithNATgatewayWhenOutboundTypeIsStandardLoadBalancer"
After that error (the default outbound type of loadBalancer conflicts with a subnet that already has a NAT Gateway attached) I set outbound_type = "userAssignedNATGateway".
Unfortunately it doesn't help. I get the following error message:
waiting for creation of Cluster: (Managed Cluster Name "managed-cluster" / Resource Group "managed-cluster-rg"): Code="CreateVMSSAgentPoolFailed" Message="AKS encountered an internal error while attempting the requested Creating operation. AKS will continuously retry the requested operation until successful or a retry timeout is hit. Check back to see if the operation requires resubmission."
I tried it many times. I also checked the resource group activity log, but everything looks fine.
The code without the outbound type worked around mid-April. Has something changed on the Azure side?
resource "azurerm_kubernetes_cluster" "aks" {
depends_on = [azurerm_subnet_nat_gateway_association.aks_nat_gw]
name = var.prefix
location = azurerm_resource_group.rg.location
resource_group_name = azurerm_resource_group.rg.name
node_resource_group = "${var.prefix}-nodes-rg"
dns_prefix = var.prefix
sku_tier = "Paid"
private_cluster_enabled = false
kubernetes_version = data.azurerm_kubernetes_service_versions.current.latest_version
automatic_channel_upgrade = "patch"
auto_scaler_profile {
balance_similar_node_groups = true
max_graceful_termination_sec = "600"
scale_down_utilization_threshold = "0.7"
skip_nodes_with_local_storage = false # Nodes should use local storage only as cache or temporary not as persistent storage
skip_nodes_with_system_pods = false
}
addon_profile {
http_application_routing {
enabled = false
}
kube_dashboard {
enabled = false
}
azure_policy {
enabled = true
}
}
default_node_pool {
name = "system"
os_disk_size_gb = var.system_pool_config.disk_size
os_disk_type = var.system_pool_config.os_disk_type
vm_size = var.system_pool_config.vm_size
min_count = var.system_pool_config.min_count
max_count = var.system_pool_config.max_count
vnet_subnet_id = azurerm_subnet.subnet.id
tags = var.tags
only_critical_addons_enabled = true
enable_auto_scaling = true
enable_host_encryption = true
max_pods = 30 # Default | Changing this will require more IPS check the subnet and change the max node count accordingly
availability_zones = [1, 2, 3]
orchestrator_version = var.system_pool_config.orchestrator_version
upgrade_settings {
max_surge = var.system_pool_config.max_surge
}
}
identity {
type = "SystemAssigned"
}
linux_profile {
admin_username = "kubernetes"
ssh_key {
key_data = tls_private_key.node-ssh-key.public_key_openssh
}
}
role_based_access_control {
enabled = true
azure_active_directory {
managed = "true"
admin_group_object_ids = var.admin_group_object_ids
}
}
network_profile {
network_plugin = "azure"
network_mode = "transparent"
network_policy = "calico"
service_cidr = "10.0.0.0/16"
dns_service_ip = "10.0.0.10"
docker_bridge_cidr = "172.17.0.1/16"
load_balancer_sku = "standard"
load_balancer_profile {
outbound_ports_allocated = 0
idle_timeout_in_minutes = 4
managed_outbound_ip_count = 1
}
outbound_type = "userAssignedNATGateway"
}
tags = var.tags
}
There is a NAT Gateway already attached to the subnet.
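For context, the NAT Gateway is wired up roughly as follows. This is a minimal sketch: only azurerm_subnet_nat_gateway_association.aks_nat_gw and azurerm_subnet.subnet are names actually referenced in the cluster configuration above; the public IP and NAT Gateway resource names are illustrative.

# Public IP for the NAT Gateway (name is illustrative)
resource "azurerm_public_ip" "nat_gw_ip" {
  name                = "${var.prefix}-natgw-ip"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  allocation_method   = "Static"
  sku                 = "Standard"
}

# The NAT Gateway itself (name is illustrative)
resource "azurerm_nat_gateway" "nat_gw" {
  name                = "${var.prefix}-natgw"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  sku_name            = "Standard"
}

# Attach the public IP to the NAT Gateway
resource "azurerm_nat_gateway_public_ip_association" "nat_gw_ip_assoc" {
  nat_gateway_id       = azurerm_nat_gateway.nat_gw.id
  public_ip_address_id = azurerm_public_ip.nat_gw_ip.id
}

# Associate the NAT Gateway with the AKS subnet (referenced by depends_on above)
resource "azurerm_subnet_nat_gateway_association" "aks_nat_gw" {
  subnet_id      = azurerm_subnet.subnet.id
  nat_gateway_id = azurerm_nat_gateway.nat_gw.id
}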
I also opened a GitHub issue, but there has been no response: https://github.com/hashicorp/terraform-provider-azurerm/issues/16712
Solution 1:[1]
Please follow the workflow that describes how to create an AKS cluster with a user-assigned NAT Gateway; you just need to translate it into Terraform.
Basically, userAssignedNATGateway requires a user-assigned managed identity (see step 2 in the workflow) instead of a SystemAssigned identity. You additionally need to assign the new managed identity the Network Contributor and Monitoring Metrics Publisher roles:
# Create a user-assigned managed identity for the cluster
resource "azurerm_user_assigned_identity" "example" {
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location
  name                = "natclusterid"
}

# Role assignment so the identity can manage the virtual network
resource "azurerm_role_assignment" "aks_vnet_contributor" {
  scope                            = azurerm_resource_group.example.id
  role_definition_name             = "Network Contributor"
  principal_id                     = azurerm_user_assigned_identity.example.principal_id
  skip_service_principal_aad_check = true
}

# Role assignment so the identity can publish metrics
resource "azurerm_role_assignment" "aks_metrics_publisher" {
  scope                            = azurerm_kubernetes_cluster.aks.id
  role_definition_name             = "Monitoring Metrics Publisher"
  principal_id                     = azurerm_user_assigned_identity.example.principal_id
  skip_service_principal_aad_check = true
}

# Create AKS with the user-assigned managed identity
resource "azurerm_kubernetes_cluster" "aks" {
  # ... all other arguments as in the original configuration ...

  identity {
    type         = "UserAssigned"
    identity_ids = [azurerm_user_assigned_identity.example.id]
  }
}
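One additional note, as an assumption on top of the original answer rather than part of it: since AKS needs the identity's Network Contributor permission on the virtual network at creation time, it can help to make the cluster wait explicitly for that role assignment, for example:

resource "azurerm_kubernetes_cluster" "aks" {
  # Wait for the Network Contributor assignment before creating the cluster,
  # so the user-assigned identity can join the NAT-Gateway-attached subnet.
  # (The Monitoring Metrics Publisher assignment is scoped to the cluster
  # itself, so it cannot be listed here without creating a dependency cycle.)
  depends_on = [
    azurerm_subnet_nat_gateway_association.aks_nat_gw,
    azurerm_role_assignment.aks_vnet_contributor,
  ]

  # ... identity block and remaining configuration as above ...
}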
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Philip Welz |
