Under which circumstances does Python wait for sub-processes of a started process?

Context

I have a backup application implemented in Python which allows running shell scripts before, after etc. a backup is processed. This allows, e.g., mounting file system snapshots that are then backed up. I have one use case in which I need to start sub-shells or additional processes in one of those BEFORE hooks, and those need to stay alive for the whole time the backup is being processed. The hook script itself, however, finishes at some point, and the Python app really needs to wait for the hook script itself to finish, because only then is my setup complete and the backup possible at all.
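
For illustration, here is a minimal sketch of that intended flow, with sleep 60 standing in for the long-lived helper and the echo standing in for the snapshot setup:

    import subprocess

    # 'sleep 60' stands in for the long-lived helper; the shell (standing
    # in for the hook script) backgrounds it and then exits immediately.
    hook = subprocess.Popen('echo mounting snapshot; sleep 60 &', shell=True)
    hook.wait()  # returns as soon as the hook itself exits,
                 # even though the backgrounded helper is still alive
    print('backup would start here while the helper keeps running')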

Problem

The Python app never returns after starting the hook and never actually continues processing the backup. I'm fairly sure that the hook script itself really finishes, because its PID vanishes at some point and trying to kill that PID results in error messages about a missing process.

Additionally, I can see the running sub-shells, and when I kill ALL of those, the Python app does continue processing the backup. But because the background processes are then missing, the results are not what I need.
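
For what it's worth, the following standalone sketch shows, as far as I can tell, the same symptom: the shell (standing in for the hook) exits right away, but reading its captured output only returns once the backgrounded child is gone, just like killing the sub-shells unblocks my app:

    import subprocess

    # The shell finishes immediately, but backgrounds a child that lives on.
    proc = subprocess.Popen(
        'sleep 30 & echo hook finished',
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    print(proc.stdout.read())  # prints b'hook finished\n', but only after
                               # the backgrounded 'sleep 30' has exited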

Research

I've found one user having pretty much the same problem, with Python seemingly waiting for sub-processes of the started process to finish. That user claims that adding the special argument shell=True solved the problem. However, my app seems to provide that argument already, yet still appears to wait for child processes.
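
For reference, my understanding of what shell=True actually changes, since the backgrounding via & inside a hook is only possible with a shell in the first place:

    import subprocess

    # Without shell=True, the argument list is executed directly.
    subprocess.run(['echo', 'no shell involved'])

    # With shell=True, '/bin/sh -c' interprets the command string, which
    # is what makes shell features such as '&' backgrounding available.
    subprocess.run('echo run via /bin/sh -c', shell=True)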

Code

The following is how the hook script gets executed:

    execute.execute_command(
        [command],
        output_log_level=logging.ERROR
        if description == 'on-error'
        else logging.WARNING,
        shell=True,
    )

The following is how the process gets started:

    process = subprocess.Popen(
        command,
        stdin=input_file,
        stdout=None if do_not_capture else (output_file or subprocess.PIPE),
        stderr=None if do_not_capture else (subprocess.PIPE if output_file else subprocess.STDOUT),
        shell=shell,
        env=environment,
        cwd=working_directory,
    )

    if not run_to_completion:
        return process

    log_outputs(
        (process,), (input_file, output_file), output_log_level, borg_local_path=borg_local_path
    )

The following is an excerpt of how the output of the processes gets read:

    buffer_last_lines = collections.defaultdict(list)
    process_for_output_buffer = {
        output_buffer_for_process(process, exclude_stdouts): process
        for process in processes
        if process.stdout or process.stderr
    }
    output_buffers = list(process_for_output_buffer.keys())
    # Log output for each process until they all exit.
    while True:
        if output_buffers:
            (ready_buffers, _, _) = select.select(output_buffers, [], [])
            for ready_buffer in ready_buffers:
                ready_process = process_for_output_buffer.get(ready_buffer)
                # The "ready" process has exited, but it might be a pipe destination with other
                # processes (pipe sources) waiting to be read from. So as a measure to prevent
                # hangs, vent all processes when one exits.
                if ready_process and ready_process.poll() is not None:
                    for other_process in processes:
                        if (
                            other_process.poll() is None
                            and other_process.stdout
                            and other_process.stdout not in output_buffers
                        ):
                            # Add the process's output to output_buffers to ensure it'll get read.
                            output_buffers.append(other_process.stdout)
                line = ready_buffer.readline().rstrip().decode()
                if not line or not ready_process:
                    continue
[...]
        still_running = False
        for process in processes:
            exit_code = process.poll() if output_buffers else process.wait()
[...]
        if not still_running:
            break
    # Consume any remaining output that we missed (if any).
    for process in processes:
        output_buffer = output_buffer_for_process(process, exclude_stdouts)
        if not output_buffer:
            continue
[...]

Educated guess

Looking at the above code, there are two possibilities from my point of view: either invoking the shell script somehow returns multiple process objects, including those of the children, and that would be the problem already; or, if only one process object for the hook script itself is returned, reading process output somehow unintentionally accesses the output of the child processes as well, which by design in my case never produce any output or finish at all.
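
The first possibility seems easy to rule out: Popen returns exactly one object, and log_outputs above receives a one-element tuple. The second one I tried to probe with the following sketch. If I understand pipes correctly, select() only reports a buffer as ready on data or EOF, and EOF only arrives once every process holding the inherited write end of the pipe has exited:

    import select
    import subprocess
    import time

    # The backgrounded child inherits the pipe's write end, so select()
    # and readline() in the parent only see EOF once the child exits,
    # not when the hook (the shell) itself exits.
    proc = subprocess.Popen('sleep 5 &', shell=True, stdout=subprocess.PIPE)

    start = time.time()
    ready, _, _ = select.select([proc.stdout], [], [])  # blocks ~5 seconds
    print('select() returned after', round(time.time() - start, 1), 'seconds')
    print('readline() yields EOF:', proc.stdout.readline())  # b''
    print('exit code of the shell was available all along:', proc.poll())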

Question

So, under which circumstances does Python wait for sub-processes of a started process?

Is that the case by default, only when not providing shell=True, even when providing that argument, depending on other conditions...?

Thanks!



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
