'Handling job/step exceptions with asynchronous TaskManager

I can't find a proper way to handle spring-batch exception in asynchronous context.

When I set a ThreadPoolTaskManager to my JobLauncher, the real job/step exception is not logged anymore. Instead the log will be something like:

org.springframework.batch.core.JobInterruptedException: Job interrupted by step execution
at org.springframework.batch.core.job.SimpleStepHandler.handleStep(SimpleStepHandler.java:165)
at ...

I tried to resolve this adding a JobExecutionListener like this:

@Override
public void afterJob(JobExecution jobExecution) {
    List<Throwable> jobExceptions = jobExecution.getFailureExceptions();
    if (CollectionUtils.isNotEmpty(jobExceptions)) {
        Throwable lastJobException = jobExceptions.get(jobExceptions.size() - 1);
        LOGGER.error("Spring-Batch error at job level", lastJobException);
        String lastJobExceptionMessage = ExceptionUtils.getRootCauseMessage(lastJobException);

        // storing message in ExecutionContext for the batch-admin webapp
        String message = "";
        if (jobExecution.getExecutionContext().get(Consts.JOB_EXECUTION_MESSAGE_KEY) != null) {
            message = jobExecution.getExecutionContext().getString(Consts.JOB_EXECUTION_MESSAGE_KEY);
        }
        message += "\n" + lastJobExceptionMessage;
        jobExecution.getExecutionContext().put(Consts.JOB_EXECUTION_MESSAGE_KEY, message);
    }

}

But I still end with a JobInterruptedException. Is there a way to retrieve the initial cause of the interruption (might be a error in the reader/processor/writer code?



Solution 1:[1]

I don't think your diagnosis is correct. That exception is thrown with that error message only in SimpleStepHandler:

if (currentStepExecution.getStatus() == BatchStatus.STOPPING
        || currentStepExecution.getStatus() == BatchStatus.STOPPED) {
    // Ensure that the job gets the message that it is stopping
    execution.setStatus(BatchStatus.STOPPING);
    throw new JobInterruptedException("Job interrupted by step execution");
}

and only if the step itself didn't throw JobInterruptedException. The most obvious case where this can happen is if the job was stopped. See this example, whose output ends with

INFO: Executing step: [step1]
Feb 24, 2016 1:25:02 PM org.springframework.batch.core.repository.support.SimpleJobRepository checkForInterruption
INFO: Parent JobExecution is stopped, so passing message on to StepExecution
Feb 24, 2016 1:25:02 PM org.springframework.batch.core.step.ThreadStepInterruptionPolicy isInterrupted
INFO: Step interrupted through StepExecution
Feb 24, 2016 1:25:02 PM org.springframework.batch.core.step.AbstractStep execute
INFO: Encountered interruption executing step step1 in job myJob : Job interrupted status detected.
Feb 24, 2016 1:25:02 PM org.springframework.batch.core.repository.support.SimpleJobRepository checkForInterruption
INFO: Parent JobExecution is stopped, so passing message on to StepExecution
Feb 24, 2016 1:25:02 PM org.springframework.batch.core.job.AbstractJob execute
INFO: Encountered interruption executing job: Job interrupted by step execution
Feb 24, 2016 1:25:02 PM org.springframework.batch.core.launch.support.SimpleJobLauncher$1 run
INFO: Job: [SimpleJob: [name=myJob]] completed with the following parameters: [{}] and the following status: [STOPPED]
Status is: STOPPED

This other example shows that throwing an exception when using a thread pool changes nothing. The final output is

INFO: Executing step: [step1]
Feb 24, 2016 1:28:44 PM org.springframework.batch.core.step.AbstractStep execute
SEVERE: Encountered an error executing step step1 in job myJob
java.lang.RuntimeException: My exception
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
(...)

Feb 24, 2016 1:28:44 PM org.springframework.batch.core.launch.support.SimpleJobLauncher$1 run
INFO: Job: [SimpleJob: [name=myJob]] completed with the following parameters: [{}] and the following status: [FAILED]
Status is: FAILED, job execution id 0
  #1 step1 FAILED
Step step1
java.lang.RuntimeException: My exception
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
(...)

Solution 2:[2]

So the answer was so simple than I felt really stupid when I understood it : @Artefacto was right. The job was stopped. By the end of the process. Because it reached the end of the main() method.

When I switch to an asynchronous mode with my ThreadPoolTaskManager, I forgot to add one very important line to my main method:

    // Wait for the end of the JobExecution
    main.endOfJobLatch.await();

Hope this answer will help someone else...

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Artefacto
Solution 2 Cyril Dejonghe