AWS Step Functions - Resume from failed step function activity instead of starting a new execution

I created a step function with 4 different activities that run one after another, and I trigger this step function from a Java application. The flow looks like this:

Start -> Activity1 -> Activity2 -> Activity3 -> Activity4 -> Stop

When an execution fails during some activity, let's say Activity2, the execution is marked as failed.

Now, is there any way to resume this failed execution from the activity (Activity2) during which it failed, instead of starting a new execution?

I went through the operations available through AWSStepFunctions, but none seems to meet this requirement. https://docs.aws.amazon.com/step-functions/latest/apireference/API_Operations.html

Solution 1:[1]

The public documentation for error handling with AWS Step Functions contains a section about fallback states. Fallback states let you specify how Task failures are handled and proceed to another state based on the error observed.

The following Catch field redirects the flow of the execution based on the error observed from the associated Task state:

"Catch": [ {
   "ErrorEquals": [ "java.lang.Exception" ],
   "ResultPath": "$.error-info",
   "Next": "RecoveryState"
}, {
   "ErrorEquals": [ "States.ALL" ],
   "Next": "EndState"
} ]
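
To show where the Catch field sits, here is a minimal Python sketch that attaches the catchers above to a Task state definition and prints the result. The state name Activity2 comes from the question; the activity ARN is a placeholder, the helper name with_catch is made up for illustration, and RecoveryState and EndState are assumed to be defined elsewhere in the state machine.

```python
import copy
import json

# Catchers from the answer above, expressed as Python data.
CATCHERS = [
    {
        "ErrorEquals": ["java.lang.Exception"],
        "ResultPath": "$.error-info",
        "Next": "RecoveryState",
    },
    # States.ALL matches any error, so it goes last.
    {"ErrorEquals": ["States.ALL"], "Next": "EndState"},
]


def with_catch(state, catchers):
    """Return a copy of a state definition with a Catch field attached."""
    if state.get("Type") not in ("Task", "Parallel", "Map"):
        raise ValueError("Catch is only valid on Task, Parallel and Map states")
    patched = copy.deepcopy(state)
    patched["Catch"] = copy.deepcopy(catchers)
    return patched


# Hypothetical Activity2 definition; the ARN is a placeholder, not a real resource.
activity2 = {
    "Type": "Task",
    "Resource": "arn:aws:states:REGION:ACCOUNT_ID:activity:Activity2",
    "Next": "Activity3",
}

print(json.dumps(with_catch(activity2, CATCHERS), indent=2))
```

Note that a catcher keeps the current execution running by routing it to another state; it does not restart an execution that has already been marked as failed.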

Solution 2:[2]

This should solve your issue.

This is described in detail here:

Parse the execution history of the failed execution to find the name of the state at which it failed, as well as the JSON input to that state.
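
The history-parsing step can be sketched in Python as follows. This is a minimal sketch, not the blog's implementation: it assumes the events are shaped like the `events` list returned by the GetExecutionHistory API (each with `id`, `previousEventId`, `type`, and, for state-entered events, `stateEnteredEventDetails`), that the state input is a JSON document, and that the state machine is a simple linear one without Parallel branches. The function name find_failed_state and the sample events are hypothetical.

```python
import json


def find_failed_state(events):
    """Return (state_name, state_input) for the state in which the execution failed.

    Walks back from the ExecutionFailed event through previousEventId links
    until it reaches the event that recorded entry into a state.
    """
    by_id = {event["id"]: event for event in events}
    failure = next(
        (event for event in reversed(events) if event["type"] == "ExecutionFailed"),
        None,
    )
    if failure is None:
        raise ValueError("history contains no ExecutionFailed event")

    current = failure
    while current is not None:
        details = current.get("stateEnteredEventDetails")
        if details is not None:
            return details["name"], json.loads(details["input"])
        current = by_id.get(current.get("previousEventId"))
    raise ValueError("no state-entered event precedes the failure")


# Trimmed, made-up history for the Activity2 failure described in the question.
sample_events = [
    {"id": 1, "previousEventId": 0, "type": "ExecutionStarted"},
    {"id": 2, "previousEventId": 0, "type": "TaskStateEntered",
     "stateEnteredEventDetails": {"name": "Activity1", "input": '{"order": 7}'}},
    {"id": 3, "previousEventId": 2, "type": "ActivitySucceeded"},
    {"id": 4, "previousEventId": 3, "type": "TaskStateExited"},
    {"id": 5, "previousEventId": 4, "type": "TaskStateEntered",
     "stateEnteredEventDetails": {"name": "Activity2",
                                  "input": '{"order": 7, "step1": "done"}'}},
    {"id": 6, "previousEventId": 5, "type": "ActivityFailed"},
    {"id": 7, "previousEventId": 6, "type": "ExecutionFailed"},
]

print(find_failed_state(sample_events))
```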

Create a new state machine that adds an additional state, called “GoToState”, to the failed state machine. “GoToState” is a Choice state at the beginning of the state machine that branches execution directly to the failed state, allowing you to skip states that succeeded in the previous execution.
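
A minimal Python sketch of the “GoToState” idea, operating on a state machine definition loaded as a dict. The helper name add_goto_state and the placeholder ARNs are hypothetical, and the choice rule assumes every input carries a boolean `resuming` field; the referenced blog post may build the state differently.

```python
import copy


def add_goto_state(definition, failed_state):
    """Return a new definition that starts at a Choice state named GoToState."""
    if failed_state not in definition["States"]:
        raise ValueError("unknown state: " + failed_state)
    new_definition = copy.deepcopy(definition)
    new_definition["States"]["GoToState"] = {
        "Type": "Choice",
        "Choices": [
            # When resuming, jump straight to the state that failed.
            {"Variable": "$.resuming", "BooleanEquals": True, "Next": failed_state}
        ],
        # Otherwise run the workflow from its original first state.
        "Default": definition["StartAt"],
    }
    new_definition["StartAt"] = "GoToState"
    return new_definition


# Hypothetical four-activity definition matching the flow in the question.
names = ["Activity1", "Activity2", "Activity3", "Activity4"]
original = {"StartAt": "Activity1", "States": {}}
for index, name in enumerate(names):
    state = {
        "Type": "Task",
        "Resource": "arn:aws:states:REGION:ACCOUNT_ID:activity:" + name,
    }
    if index + 1 < len(names):
        state["Next"] = names[index + 1]
    else:
        state["End"] = True
    original["States"][name] = state

resumable = add_goto_state(original, "Activity2")
print(resumable["StartAt"], resumable["States"]["GoToState"])
```

The original definition is left untouched, so the resumable copy can be registered as a separate state machine.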

Solution 3:[3]

Check out this resource, which implements the solution from the blog post linked by Atharv:

python3 gotostate.py --failedExecutionArn 'my-arn'

You use the script to generate a new Step Function that will start off at the failed step. The new Step Function takes the input to the failed step as its input with an additional "resuming": true field.
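
As a rough illustration of that input, the sketch below builds the resume payload and passes it to a Step Functions client. It assumes the failed step's input is a JSON object and that `sfn_client` is a boto3 Step Functions client (`boto3.client("stepfunctions")`) or anything else exposing `start_execution`; the function names and the ARN are placeholders, not part of the script.

```python
import json


def build_resume_input(failed_state_input):
    """Return the failed state's input plus "resuming": true, as a JSON string."""
    payload = dict(failed_state_input)
    payload["resuming"] = True
    return json.dumps(payload)


def start_resumed_execution(sfn_client, state_machine_arn, failed_state_input):
    """Start the generated state machine with the resume input."""
    return sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=build_resume_input(failed_state_input),
    )


# Input captured from the failed step (made-up values).
print(build_resume_input({"order": 7, "step1": "done"}))
```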


This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 aws-jordan
Solution 2 Atharv Thakur
Solution 3