Download from S3 into a GitHub Actions workflow
I'm working on two GitHub Actions workflows:
- Train a model and save it to s3 (monthly)
- Download the model from s3 and use it in predictions (daily)
Using https://github.com/jakejarvis/s3-sync-action I was able to complete the first workflow: I train a model and then sync a directory, 'models', with a bucket on S3.
I had planned on using the same action to download the model for use in prediction, but it looks like this action is one-directional: upload only, no download.
I found out the hard way by creating a workflow and attempting to sync the bucket down to the runner:
retrieve-model-s3:
  runs-on: ubuntu-latest
  steps:
    - name: checkout current repo
      uses: actions/checkout@master
    - name: make dir to sync with s3
      run: mkdir models
    - name: checkout s3 sync action
      uses: jakejarvis/s3-sync-action@master
      with:
        args: --follow-symlinks
      env:
        AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }}
        AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
        AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        AWS_S3_ENDPOINT: ${{ secrets.AWS_S3_ENDPOINT }}
        AWS_REGION: 'us-south' # optional: defaults to us-east-1
        SOURCE_DIR: 'models' # optional: defaults to entire repository
    - name: dir after
      run: |
        ls -l
        ls -l models
    - name: Upload model as artifact
      uses: actions/upload-artifact@v2
      with:
        name: xgb-model
        path: models/regression_model_full.rds
At the time of running, when I log in to the S3 UI I can see that the object regression_model_full.rds is indeed there; it's just not being downloaded. I'm still unsure whether this is expected or not (the name of the action, 'sync', is what's confusing me).
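Digging into the action's entrypoint seems to confirm this: as far as I can tell it shells out to the AWS CLI with the local directory hard-coded as the source and the bucket as the destination, roughly

    aws s3 sync ${SOURCE_DIR:-.} s3://${AWS_S3_BUCKET}/${DEST_DIR}

so 'sync' here only ever pushes local files up to the bucket; there's no flag to reverse the direction.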
For our S3 we must use the AWS_S3_ENDPOINT parameter. I found another action, AWS S3 (keithweaver/aws-s3-github-action), but unlike the sync action I started out with, it has no option to set AWS_S3_ENDPOINT. The repo also looks stale: it's two years old apart from an update to the readme 8 months ago.
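One workaround I'm considering: the AWS CLI comes preinstalled on the ubuntu-latest runners, and its global --endpoint-url flag would cover our custom endpoint, so a plain run step could pull the models down directly. A rough sketch (untested; secret names assumed to match the ones above):

    - name: download models from s3
      run: aws s3 sync "s3://$AWS_S3_BUCKET/models" ./models --endpoint-url "$AWS_S3_ENDPOINT"
      env:
        AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
        AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        AWS_DEFAULT_REGION: 'us-south'
        AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }}
        AWS_S3_ENDPOINT: ${{ secrets.AWS_S3_ENDPOINT }}

That avoids the action entirely, but it feels like reinventing what an action should already do.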
What's the 'prescribed' or conventional way to download from S3 during a workflow?
Solution 1:
So, I had the same problem as you: I was trying to download from S3 to update a directory in my GitHub repo.
What I learned is that if you're updating files in the repo, you must follow the same approach as if you were doing it locally, e.g. checkout, make changes, push.
So for your particular workflow you must check out your repo using actions/checkout@master, sync the directory from S3, and then (the step I was missing) push the changes back to the repo. This allowed me to update my folder daily.
Anyway, here is my script; I hope you find it useful. I am using the AWS S3 action you mention towards the end.
# This is a basic workflow to help you get started with Actions
name: Fetch data

# Controls when the workflow will run
on:
  schedule:
    # Runs "at hour 6 past every day" (see https://crontab.guru)
    - cron: '00 6 * * *'
  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

# A workflow run is made up of one or more jobs that can run sequentially or in parallel
jobs:
  # This workflow contains a single job called "build"
  build:
    # The type of runner that the job will run on
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: keithweaver/[email protected] # Verifies the recursive flag
        name: sync folder
        with:
          command: sync
          source: ${{ secrets.S3_BUCKET }}
          destination: ./data/
          aws_access_key_id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws_secret_access_key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws_region: ${{ secrets.AWS_REGION }}
          flags: --delete
      - name: Commit changes
        run: |
          git config --local user.email "[email protected]"
          git config --local user.name "GitHub Action"
          git add .
          # Commit only when there are staged changes, so the job doesn't fail on no-op days
          git diff-index --quiet HEAD || git commit -m "{commit message}" -a
          git push origin main:main
Sidenote: the --delete flag keeps your local folder up to date with your S3 folder by deleting any files that are no longer present in the S3 folder.
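As a follow-up: if, like the daily prediction job in the question, you only need the files while the workflow is running, you can drop the commit/push step entirely and just consume the downloaded directory later in the same job. A minimal sketch, assuming the same action and a hypothetical predict.R as the prediction entry point (R is available on the hosted Ubuntu runners):

    steps:
      - uses: actions/checkout@v2
      - uses: keithweaver/[email protected]
        name: download models
        with:
          command: sync
          source: ${{ secrets.S3_BUCKET }}
          destination: ./models/
          aws_access_key_id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws_secret_access_key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws_region: ${{ secrets.AWS_REGION }}
      - name: run daily predictions
        # predict.R is a placeholder for whatever loads models/regression_model_full.rds
        run: Rscript predict.R

The repo stays untouched, which also means no daily commits cluttering the history.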
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow