Download from S3 into a GitHub Actions workflow

I'm working on two GitHub Actions workflows:

  1. Train a model and save it to S3 (monthly)
  2. Download the model from S3 and use it for predictions (daily)

Using https://github.com/jakejarvis/s3-sync-action, I was able to complete the first workflow: I train a model and then sync a directory, 'models', with a bucket on S3.

I had planned on using the same action to download the model for use in prediction, but it looks like this action is one-directional: upload only, no download.

I found this out the hard way by creating a workflow and attempting to sync to the runner:

  retrieve-model-s3:
    runs-on: ubuntu-latest
    steps:
      - name: checkout current repo
        uses: actions/checkout@master
      - name: make dir to sync with s3
        run: mkdir models
      - name: checkout s3 sync action
        uses: jakejarvis/s3-sync-action@master  # syncs SOURCE_DIR up to the bucket: upload only, no download
        with:
          args: --follow-symlinks
        env:
          AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          AWS_S3_ENDPOINT: ${{ secrets.AWS_S3_ENDPOINT }}
          AWS_REGION: 'us-south'  # optional: defaults to us-east-1
          SOURCE_DIR: 'models'    # optional: defaults to entire repository
      - name: dir after
        run: |
          ls -l
          ls -l models
      - name: Upload model as artifact
        uses: actions/upload-artifact@v2
        with:
          name: xgb-model
          path: models/regression_model_full.rds

When I log in to the UI after the run, I can see that the object regression_model_full.rds is indeed there; it's just not being downloaded. I'm still unsure whether this is expected (the action's name, 'sync', is what's confusing me).

For our S3 we must use the AWS_S3_ENDPOINT parameter. I found another action, AWS S3, but unlike the sync action I started out with, it has no option to set AWS_S3_ENDPOINT. Looking at its repo, it's also two years old, apart from an update to the README eight months ago.

What's the 'prescribed' or conventional way to download from S3 during a workflow?
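For the custom-endpoint constraint described above, one common alternative to a third-party action is to call the AWS CLI directly: its aws s3 commands accept the global --endpoint-url option, and the CLI comes preinstalled on GitHub-hosted Ubuntu runners. The job below is a minimal sketch under a few assumptions: it reuses the question's secret names, assumes AWS_S3_ENDPOINT holds a full URL (e.g. https://...), and assumes the model sits at the bucket root, since the question synced 'models' without a destination prefix. Like the question's snippet, it is meant to sit under jobs:.

```yaml
  retrieve-model-s3:
    runs-on: ubuntu-latest
    env:
      # Standard AWS CLI credential variables, fed from the question's secrets
      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
      AWS_DEFAULT_REGION: us-south   # region value taken from the question
      AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }}
      AWS_S3_ENDPOINT: ${{ secrets.AWS_S3_ENDPOINT }}
    steps:
      - name: download model from s3 (S3 -> runner)
        run: |
          mkdir -p models
          # Assumption: the object is stored at the bucket root under this key
          aws s3 cp "s3://${AWS_S3_BUCKET}/regression_model_full.rds" models/ \
            --endpoint-url "$AWS_S3_ENDPOINT"
          ls -l models
      - name: Upload model as artifact
        uses: actions/upload-artifact@v4
        with:
          name: xgb-model
          path: models/regression_model_full.rds
```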



Solution 1:[1]

So, I had the same problem as you: I was trying to download from S3 to update a folder in my GitHub repo.

What I learned about Actions is that if you're updating files in the repo, you must follow the same approach as you would locally: check out, make changes, commit, push.

So for your particular workflow, you must check out your repo in the workflow using actions/checkout and, after syncing a particular directory, push the changes back to the repo. That last step was the one I was missing! Once I added it, my folder was updated daily.

Anyway, here is my script; I hope you find it useful. I'm using the AWS S3 action you mention towards the end of your question.

# This is a basic workflow to help you get started with Actions

name: Fetch data.

# Controls when the workflow will run
on:
  schedule:
    # Runs daily at 06:00 UTC (see https://crontab.guru)
    - cron: '00 6 * * *'
    
  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

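# The default GITHUB_TOKEN can be read-only; the commit step below needs write access
permissions:
  contents: write

# A workflow run is made up of one or more jobs that can run sequentially or in parallel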
jobs:
  # This workflow contains a single job called "build"
  build:
    # The type of runner that the job will run on
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v2
      - uses: keithweaver/aws-s3-github-action@v1.0.0 # Verifies the recursive flag
        name: sync folder
        with:
          command: sync
          source: ${{ secrets.S3_BUCKET }}  # must be a full S3 URI, e.g. s3://your-bucket/data/
          destination: ./data/
          aws_access_key_id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws_secret_access_key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws_region: ${{ secrets.AWS_REGION }}
          flags: --delete
      - name: Commit changes
        run: |
          git config --local user.email "action@github.com"
          git config --local user.name "GitHub Action"
          git add .
          # only commit when the sync actually changed something
          git diff-index --quiet HEAD || git commit -m "{commit message}" -a
          git push origin main:main

Sidenote: the --delete flag keeps your local folder in line with your S3 folder by deleting any local files that are no longer present in the S3 folder.
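The confusion about the word 'sync' in the question comes down to argument order: aws s3 sync copies from its first argument to its second, so the same command uploads or downloads depending on which side is the S3 URI. The keithweaver action appears to pass its command, source, destination and flags inputs through to that CLI call, which is why the script above downloads. Below is a minimal sketch of both directions as plain run steps, assuming the AWS CLI on the runner and that the S3_BUCKET secret holds a full URI such as s3://your-bucket/data/ (your-bucket is a placeholder). In practice each direction would live in its own workflow, as in the question (monthly upload, daily download).

```yaml
      # Upload (runner -> S3): local directory first, S3 URI second
      - name: push data folder to s3
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          AWS_DEFAULT_REGION: ${{ secrets.AWS_REGION }}
        run: aws s3 sync ./data/ "${{ secrets.S3_BUCKET }}"

      # Download (S3 -> runner): S3 URI first, local directory second.
      # --delete removes local files that no longer exist under the S3 prefix.
      - name: pull data folder from s3
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          AWS_DEFAULT_REGION: ${{ secrets.AWS_REGION }}
        run: aws s3 sync "${{ secrets.S3_BUCKET }}" ./data/ --delete
```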

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1