How to pass the output of a Glue job to a Step Function and use it as the input parameter of another Glue job in the same Step Function

My first Glue job has this code:

    import boto3
    from awsglue.utils import getResolvedOptions
    import sys
    
    s3_path="s3://bucketname/filename"

My second Glue job:

    import boto3
    from awsglue.utils import getResolvedOptions
    import sys
    
    args = getResolvedOptions(sys.argv,['s3_path'])
    s3_path = args['s3_path']
    print(args,s3_path)

My Step Function is defined as:

    {
      "Comment": "A description of my state machine",
      "StartAt": "Glue StartJobRun",
      "States": {
        "Glue StartJobRun": {
          "Type": "Task",
          "Resource": "arn:aws:states:::glue:startJobRun.sync",
          "Parameters": {
            "JobName": "test1",
            "Arguments": {
              "--s3_path.$": "$.s3_path"
            }
          },
          "Next": "Glue StartJobRun2"
        },
        "Glue StartJobRun2": {
          "Type": "Task",
          "Resource": "arn:aws:states:::glue:startJobRun.sync",
          "Parameters": {
            "JobName": "test2",
            "Arguments": {
              "--s3_path.$": "$.s3_path"
            }
          },
          "End": true
        }
      }
    }

The first Glue job gets its input from a Lambda via the Step Function, but the second Glue job needs to get its input from the first Glue job once that job has run. How can I pass the first Glue job's output to the Step Function and then run the second Glue job in the same Step Function with that output as its input?



Solution 1:[1]

> For second glue job code I need to get input from first glue job after running it

It seems you want job 1 and job 2 to receive the same input, so that both can use the same s3_path argument. The easiest way to do that is to set job 1's ResultPath to null: job 1's result is then discarded, its original input is passed through unchanged, and job 2 can read the same $.s3_path as job 1.
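A minimal sketch of the first state with its result discarded, reusing the job name and argument from the question. The only addition is the "ResultPath": null line; with it, the state's input becomes its output, so "$.s3_path" still resolves in the second state.

```json
{
  "Glue StartJobRun": {
    "Type": "Task",
    "Resource": "arn:aws:states:::glue:startJobRun.sync",
    "Parameters": {
      "JobName": "test1",
      "Arguments": {
        "--s3_path.$": "$.s3_path"
      }
    },
    "ResultPath": null,
    "Next": "Glue StartJobRun2"
  }
}
```

The second state ("Glue StartJobRun2") can stay exactly as it is in the question.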

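You can also pick the --s3_path argument out of job 1's result with ResultSelector (the key contains dashes, so the path needs bracket notation, e.g. $.Arguments['--s3_path']) and put it back under s3_path in the state's output, but the result would be the same as with the easier method above.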
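A hedged sketch of this alternative. It assumes that the result of the .sync Glue integration contains the job run's Arguments map, and that the dashed key is addressed with bracket notation. Because ResultSelector has to produce a JSON object, the sketch builds {"s3_path": ...} and leaves ResultPath at its default ("$") rather than pointing it at $.s3_path; that replaces the whole state input with the selected object, so any other input fields are dropped.

```json
{
  "Glue StartJobRun": {
    "Type": "Task",
    "Resource": "arn:aws:states:::glue:startJobRun.sync",
    "Parameters": {
      "JobName": "test1",
      "Arguments": {
        "--s3_path.$": "$.s3_path"
      }
    },
    "ResultSelector": {
      "s3_path.$": "$.Arguments['--s3_path']"
    },
    "Next": "Glue StartJobRun2"
  }
}
```

Either way the second state sees the value that was passed to job 1, not something job 1 computed.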

> Can anyone please answer how to ... passing first glue job output as input?

So you probably want to discard the first Glue job's output rather than pass it downstream. Hope it helps.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 user13451354