How to pull my own private repository's Docker image using the Airflow DockerOperator?
I'm using Airflow installed via a Docker image on an AWS instance. I have created a Docker image of my project and pushed it to the GitLab container registry. Now I want Airflow to pull this image and run it daily. I know that pulling a private image requires authentication, so how can I log in from an Airflow DAG, a file, or any other method? My code:
stripetos3_scheduler = DockerOperator(
    task_id='stripe-to-s3',
    image='registry.gitlab.com/mobinalhassan/stripetos3dags:latest',
    auto_remove=True,
    force_pull=True,
    dag=dag
)
Solution 1:[1]
Use imagePullSecrets, as described in the Kubernetes provider documentation: https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/operators.html
Create a Secret like this:
apiVersion: v1
kind: Secret
metadata:
  name: {{.Release.Name}}-image-pull-secret
  namespace: {{.Release.Namespace}}
data:
  .dockerconfigjson: {{ template "dockerConfigTemplate" . }}
type: kubernetes.io/dockerconfigjson
Create a template that is used by the above Secret:
{{- define "dockerConfigTemplate" }}
{{- if .Values.images.airflow.registry }}
{{- $url := .Values.images.airflow.registry.url }}
{{- $name := .Values.images.airflow.registry.username }}
{{- $password := .Values.images.airflow.registry.password }}
{{- $email := .Values.images.airflow.registry.email }}
{{- $auth := (printf "%s:%s" $name $password | b64enc) }}
{{- printf "{\"auths\":{\"%s\":{\"username\":\"%s\",\"password\":\"%s\",\"email\":\"%s\",\"auth\":\"%s\"}}}" $url $name $password $email $auth | b64enc }}
{{- end }}
{{- end }}
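To make the template's output more concrete, here is a small Python sketch that builds the same .dockerconfigjson value outside of Helm; the registry URL and credentials below are placeholders, not values from the original question.

import base64
import json

# Placeholder registry values (assumptions for illustration only)
url = "registry.gitlab.com"
username = "<registry-username>"
password = "<registry-token>"
email = "<registry-email>"

# $auth in the template above is base64("username:password")
auth = base64.b64encode(f"{username}:{password}".encode()).decode()

docker_config = {
    "auths": {
        url: {
            "username": username,
            "password": password,
            "email": email,
            "auth": auth,
        }
    }
}

# The Secret's data field expects the whole JSON base64-encoded once more,
# which is what the final "| b64enc" in the template does.
print(base64.b64encode(json.dumps(docker_config).encode()).decode())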
In my case, I'm deploying Airflow (a custom image which also contains the DAGs) from my private registry.
I'm exporting this Secret's name as an environment variable so it can be used by other components:
apiVersion: v1
kind: Pod
metadata:
  name: worker-pod
spec:
  containers:
    - args: []
      command: []
      env:
        - name: OI_DATAPIPELINE_IMAGE_PULL_SECRET
          value: {{.Release.Name}}-pipeline-image-pull-secret
Now we need to use the secret created above in our DAGs, as follows:
import os

from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator
from kubernetes.client import models as k8s

data_pipeline = KubernetesPodOperator(
    namespace='default',
    name="DataPipeline",
    task_id="data_pipeline",
    image='*********.jfrog.io/*****:latest',
    # the env variable set on the worker pod holds the name of the pull secret
    image_pull_secrets=[k8s.V1LocalObjectReference(os.environ['OI_DATAPIPELINE_IMAGE_PULL_SECRET'])],
    env_from=env_from,  # defined elsewhere in the DAG file
    cmds=["./deployments/data_pipeline/start.sh"],
    get_logs=True,
    is_delete_operator_pod=True,
    dag=dag
)
Solution 2:[2]
You should create a new connection of the Docker type via the Airflow UI and provide the necessary data there:
- Registry server (for GitLab this is the registry host from your image URL, registry.gitlab.com; for Docker Hub it would be docker.io)
- Username
- Password
Then, in your DAG definition, pass the connection's name to the docker_conn_id parameter:
stripetos3_scheduler = DockerOperator(
    task_id='stripe-to-s3',
    image='registry.gitlab.com/mobinalhassan/stripetos3dags:latest',
    auto_remove=True,
    force_pull=True,
    dag=dag,
    docker_conn_id="my-connection",
)
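If you would rather not create the connection by hand in the UI, a rough sketch of registering it programmatically against the Airflow metadata database is shown below; the connection id, username, and token are placeholders, and the conn_id must match what you pass to docker_conn_id.

from airflow.models.connection import Connection
from airflow.settings import Session

# Sketch: register the GitLab registry credentials as an Airflow connection.
# All values are placeholders; conn_id must match docker_conn_id in the DAG.
conn = Connection(
    conn_id="my-connection",
    conn_type="docker",
    host="registry.gitlab.com",        # the registry server
    login="<gitlab-username>",
    password="<gitlab-access-token>",  # a personal access or deploy token
)

session = Session()
if not session.query(Connection).filter(Connection.conn_id == conn.conn_id).first():
    session.add(conn)
    session.commit()

The same connection can also be created with the airflow connections add CLI command instead of the UI.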
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | dzejeu |
