Airflow - Why is the externalDatabase configuration breaking helm upgrade?
I am trying to deploy Airflow using Helm charts, for a personal POC, but I have been facing some issues with the deployment and could not find clear instructions to solve my issue - which is why I am seeking help here.
Context around the issue
A bit of background on the POC first: I want to deploy a K8S cluster that hosts Airflow, connect it to a git repo that hosts the DAGs, and keep the metastore and cache hosted outside the K8S cluster.
I have successfully deployed Airflow to a local Kubernetes cluster using kind and Airflow's default helm chart. On the helm chart I have specified that the executor mode to be used must be KubernetesExecutor.
I have also configured Airflow to sync up the DAGs to/from a bitbucket repository.
The issue and current implementation
I am having issues connecting Airflow to the external services. I have created an Azure PostgreSQL server, created an airflow database, and created an admin user via psql as follows:
CREATE DATABASE airflow;
CREATE USER aflw_admin WITH PASSWORD 'some_password';
GRANT ALL PRIVILEGES ON DATABASE airflow TO aflw_admin;
ALTER USER aflw_admin SET search_path = public;
Since I am using helm to deploy, I have my values.yaml as follows:
postgresql:
  enabled: false
externalDatabase:
  type: postgres
  host: dbname.postgres.database.azure.com
  port: 5432
  database: airflow
  user: aflw_admin
  passwordSecretKey: "postgresql-password"
data:
  metadataSecretName: ~
  resultBackendSecretName: ~
  metadataConnection:
    user: aflw_admin
    pass: some_password
    protocol: postgresql
    host: dbname.postgres.database.azure.com
    port: 5432
    db: airflow
    sslmode: require
  resultBackendConnection:
    user: aflw_admin
    pass: some_password
    protocol: postgresql
    host: dbname.postgres.database.azure.com
    port: 5432
    db: airflow
    sslmode: require
The secret airflow-postgresql, which holds the key postgresql-password, was created as follows:
kubectl create secret generic airflow-postgresql --from-literal=postgresql-password=$(openssl rand -base64 13) --namespace airflow
I deployed the solution using:
kubectl apply -f ./helm/variables.yaml
helm upgrade --install airflow apache-airflow/airflow -n airflow -f ./values.yaml --debug
What I have tried and problem details
After some back and forth, I figured out that by reverting the configuration (that is, setting postgresql enabled to true and removing the metadataConnection, resultBackendConnection and externalDatabase sections from the values.yaml file) I could deploy the postgres service successfully. The tradeoff is that postgresql is then NOT an external service, but this at least helps to partially isolate the problem.
So, if I go back to the initial configuration and try to deploy it, the results I get are :
- first I get a timeout --> in order to face this issue, I naturally increased the timeout duration to a bigger value like 20m0s;
- after I increase the timeout I get an error BackoffLimitExceeded and nothing is deployed.
Here's a log of the helm deployment in question:
history.go:56: [debug] getting history for release airflow
upgrade.go:142: [debug] preparing upgrade for airflow
upgrade.go:150: [debug] performing update for airflow
upgrade.go:322: [debug] creating upgraded release for airflow
client.go:218: [debug] checking 20 resources for changes
client.go:501: [debug] Looks like there are no changes for ServiceAccount "airflow-create-user-job"
client.go:501: [debug] Looks like there are no changes for ServiceAccount "airflow-migrate-database-job"
client.go:501: [debug] Looks like there are no changes for ServiceAccount "airflow-scheduler"
client.go:501: [debug] Looks like there are no changes for ServiceAccount "airflow-statsd"
client.go:501: [debug] Looks like there are no changes for ServiceAccount "airflow-triggerer"
client.go:501: [debug] Looks like there are no changes for ServiceAccount "airflow-webserver"
client.go:501: [debug] Looks like there are no changes for ServiceAccount "airflow-worker"
client.go:501: [debug] Looks like there are no changes for Secret "airflow-airflow-metadata"
client.go:501: [debug] Looks like there are no changes for Secret "airflow-webserver-secret-key"
client.go:501: [debug] Looks like there are no changes for ConfigMap "airflow-airflow-config"
client.go:501: [debug] Looks like there are no changes for Role "airflow-pod-launcher-role"
client.go:501: [debug] Looks like there are no changes for Role "airflow-pod-log-reader-role"
client.go:501: [debug] Looks like there are no changes for RoleBinding "airflow-pod-launcher-rolebinding"
client.go:501: [debug] Looks like there are no changes for RoleBinding "airflow-pod-log-reader-rolebinding"
client.go:501: [debug] Looks like there are no changes for Service "airflow-statsd"
client.go:501: [debug] Looks like there are no changes for Service "airflow-webserver"
client.go:510: [debug] Patch Deployment "airflow-scheduler" in namespace airflow
client.go:510: [debug] Patch Deployment "airflow-statsd" in namespace airflow
client.go:510: [debug] Patch Deployment "airflow-triggerer" in namespace airflow
client.go:510: [debug] Patch Deployment "airflow-webserver" in namespace airflow
client.go:267: [debug] Deleting Secret "airflow-postgresql" in namespace airflow...
client.go:270: [debug] Unable to get obj "airflow-postgresql", err: secrets "airflow-postgresql" not found
client.go:267: [debug] Deleting Service "airflow-postgresql-headless" in namespace airflow...
client.go:270: [debug] Unable to get obj "airflow-postgresql-headless", err: services "airflow-postgresql-headless" not found
client.go:267: [debug] Deleting Service "airflow-postgresql" in namespace airflow...
client.go:270: [debug] Unable to get obj "airflow-postgresql", err: services "airflow-postgresql" not found
client.go:267: [debug] Deleting StatefulSet "airflow-postgresql" in namespace airflow...
client.go:270: [debug] Unable to get obj "airflow-postgresql", err: statefulsets.apps "airflow-postgresql" not found
client.go:299: [debug] Starting delete for "airflow-run-airflow-migrations" Job
client.go:128: [debug] creating 1 resource(s)
client.go:529: [debug] Watching for changes to Job airflow-run-airflow-migrations with timeout of 20m0s
client.go:557: [debug] Add/Modify event for airflow-run-airflow-migrations: ADDED
client.go:596: [debug] airflow-run-airflow-migrations: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:557: [debug] Add/Modify event for airflow-run-airflow-migrations: MODIFIED
client.go:596: [debug] airflow-run-airflow-migrations: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:557: [debug] Add/Modify event for airflow-run-airflow-migrations: MODIFIED
upgrade.go:433: [debug] warning: Upgrade "airflow" failed: post-upgrade hooks failed: job failed: BackoffLimitExceeded
Error: UPGRADE FAILED: post-upgrade hooks failed: job failed: BackoffLimitExceeded
helm.go:84: [debug] post-upgrade hooks failed: job failed: BackoffLimitExceeded
UPGRADE FAILED
main.newUpgradeCmd.func2
helm.sh/helm/v3/cmd/helm/upgrade.go:199
github.com/spf13/cobra.(*Command).execute
github.com/spf13/[email protected]/command.go:856
github.com/spf13/cobra.(*Command).ExecuteC
github.com/spf13/[email protected]/command.go:974
github.com/spf13/cobra.(*Command).Execute
github.com/spf13/[email protected]/command.go:902
main.main
helm.sh/helm/v3/cmd/helm/helm.go:83
runtime.main
runtime/proc.go:255
runtime.goexit
runtime/asm_amd64.s:1581
make: *** [Makefile:46: deploy-airflow] Error 1
This behavior leads me to think that it is some sort of configuration error, but I can't pinpoint what.
What can I be misconfiguring in my helm chart that could have broken the helm upgrade?
The versions for helm/airflow/psql are below:
- Airflow -> apache/airflow:2.2.3
- Helm chart -> default image with version 1.4.0 (https://artifacthub.io/packages/helm/apache-airflow/airflow)
- PSQL (on azure) -> Azure Database for PostgreSQL flexible server, PSQL version 13.4
Solution 1:[1]
Such problems are sometimes difficult to diagnose because there are so many moving parts. For context, I have set up Airflow on both Azure AKS (Postgres sslmode: require) and AWS EKS (RDS Postgres sslmode: disable), each with its own issues.
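One way to narrow things down is to look at the failing hook job itself, since BackoffLimitExceeded only says that the job's pods kept failing. The sketch below is a minimal example, assuming the namespace airflow and the job name airflow-run-airflow-migrations taken from the helm debug log in the question. The run helper and the DRY_RUN switch are hypothetical conveniences, not part of kubectl or the chart. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: inspect the failed migration hook job.
# Assumptions: namespace "airflow" and the job name from the helm debug log.
set -eu

NAMESPACE="${NAMESPACE:-airflow}"
JOB="${JOB:-airflow-run-airflow-migrations}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

inspect_migration_job() {
  # Job status and events, including why its pods failed
  run kubectl describe job "$JOB" -n "$NAMESPACE"
  # Pods created by the job; Kubernetes labels them with job-name=<job>
  run kubectl get pods -n "$NAMESPACE" -l "job-name=$JOB"
  # Container logs, where a connection or authentication error would show up
  run kubectl logs "job/$JOB" -n "$NAMESPACE"
}

inspect_migration_job
```

Running it with DRY_RUN=0 executes the commands against the current kubectl context. The logs of the migration pod are usually the quickest way to tell a network or SSL problem from a wrong user or password.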
Perhaps remove the config for externalDatabase and resultBackendConnection. Why? Because resultBackendConnection falls back to metadataConnection if it is not configured. I haven't seen an externalDatabase key in my current config file (v2.2.4). Are you using -f values.yaml to override the helm install with the correct values.yaml?
If you disable postgresql
postgresql:
  enabled: false
as you have, then you need to configure metadataConnection for your external database.
I only configured the metadataSecretName once I had the connection to Postgres working properly.
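For reference, once the plain metadataConnection works, moving to metadataSecretName could look like the sketch below. To my knowledge the official chart expects that secret to carry the full connection URI under a key named connection, rather than a bare password under a key such as postgresql-password, but treat that as an assumption to verify against your chart version. The secret name airflow-metadata-uri and the build_uri helper are hypothetical, and all values are the placeholders from the question. The kubectl command is printed rather than executed.

```shell
#!/usr/bin/env bash
# Sketch: build the metadata DB URI and show the matching secret command.
# Assumptions: the chart reads the URI from the "connection" key of the
# secret named in data.metadataSecretName; "airflow-metadata-uri" is a
# made-up secret name.
set -eu

# Hypothetical helper. Args: user password host port db sslmode
build_uri() {
  printf 'postgresql://%s:%s@%s:%s/%s?sslmode=%s' "$1" "$2" "$3" "$4" "$5" "$6"
}

URI="$(build_uri aflw_admin some_password dbname.postgres.database.azure.com 5432 airflow require)"

# Printed only, so the sketch is safe to run as-is.
echo "kubectl create secret generic airflow-metadata-uri --from-literal=connection='$URI' --namespace airflow"
```

With such a secret in place, data.metadataSecretName in values.yaml would be set to airflow-metadata-uri instead of ~. A password containing characters such as @ or / would need URL-encoding before it goes into the URI.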
Also, try sslmode: disable in the metadataConnection config.
Once I got the config file the way I wanted it, I un-installed Airflow then re-installed:
- helm delete airflow
- dropped airflow db and re-created it
- kubectl delete secrets [airflow-xxxxx-xxxxx] (all the secrets in the namespace, because the db migration didn't work properly in between the errors)
- kubectl delete pvc (then made sure the pv's were also deleted)
After that I re-installed and it was all good, which is not a lot of work, but ensures I can re-deploy with the correct values.
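The clean-up steps above can be sketched roughly as follows, assuming the release and namespace are both called airflow and that values.yaml sits in the current directory, as in the question. The run helper and DRY_RUN switch are hypothetical. Dropping and re-creating the database is left as a comment because it happens on the Postgres server, not in the cluster.

```shell
#!/usr/bin/env bash
# Sketch: clean re-install of the Airflow release.
# Assumptions: release "airflow", namespace "airflow", ./values.yaml present.
set -eu

NAMESPACE="${NAMESPACE:-airflow}"
RELEASE="${RELEASE:-airflow}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

clean_reinstall() {
  # Remove the release ("helm delete" is an alias of "helm uninstall" in Helm 3)
  run helm delete "$RELEASE" -n "$NAMESPACE"
  # Manual step, not shown: drop and re-create the airflow database on the server.
  # Delete every secret and PVC in the namespace. This also removes secrets
  # created by hand, which then have to be re-created.
  run kubectl delete secrets --all -n "$NAMESPACE"
  run kubectl delete pvc --all -n "$NAMESPACE"
  # Check that no persistent volumes from the old install are left behind
  run kubectl get pv
  # Re-install with the corrected values file
  run helm upgrade --install "$RELEASE" apache-airflow/airflow -n "$NAMESPACE" -f ./values.yaml
}

clean_reinstall
```

Keeping dry-run as the default is deliberate: the delete commands are destructive, so it seems safer to read the printed plan first and only then set DRY_RUN=0.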
Oh, and do remember to set up PSQL and make sure you can actually connect from your command line as an extra check.
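A connectivity check from the command line might look like the sketch below, assuming psql is installed locally and using the placeholder host, database and user from the question. The DB_* variables and the RUN_CHECK switch are hypothetical names. psql reads the password from the PGPASSWORD environment variable or prompts for it. By default the script only prints the command it would run.

```shell
#!/usr/bin/env bash
# Sketch: verify that the external Postgres accepts connections.
# Assumptions: psql is installed; values are the placeholders from the question.
set -eu

DB_HOST="${DB_HOST:-dbname.postgres.database.azure.com}"
DB_PORT="${DB_PORT:-5432}"
DB_NAME="${DB_NAME:-airflow}"
DB_USER="${DB_USER:-aflw_admin}"
DB_SSLMODE="${DB_SSLMODE:-require}"

# libpq keyword/value connection string
CONNINFO="host=$DB_HOST port=$DB_PORT dbname=$DB_NAME user=$DB_USER sslmode=$DB_SSLMODE"

check_connection() {
  # A trivial query; success means network, SSL and credentials all work
  psql "$CONNINFO" -c 'SELECT 1;'
}

if [ "${RUN_CHECK:-0}" = "1" ]; then
  check_connection
else
  echo "would run: psql \"$CONNINFO\" -c 'SELECT 1;'"
fi
```

If this works from a laptop but the migration job still fails, the difference is likely on the cluster side, for example a firewall rule on the Azure server that does not allow the cluster's outbound IP.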
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
