Go exec.CommandContext is not being terminated after context timeout

In Go, I can usually combine context.WithTimeout() with exec.CommandContext() so that a command is automatically killed (with SIGKILL) after the timeout.

But I'm running into a strange issue: if I wrap the command with sh -c AND buffer the command's output by setting cmd.Stdout = &bytes.Buffer{}, the timeout no longer works and the command runs forever.

Why does this happen?

Here is a minimal reproducible example:

package main

import (
    "bytes"
    "context"
    "os/exec"
    "time"
)

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
    defer cancel()

    cmdArgs := []string{"sh", "-c", "sleep infinity"}
    bufferOutputs := true

    // Uncommenting *either* of the next two lines will make the issue go away:

    // cmdArgs = []string{"sleep", "infinity"}
    // bufferOutputs = false

    cmd := exec.CommandContext(ctx, cmdArgs[0], cmdArgs[1:]...)
    if bufferOutputs {
        cmd.Stdout = &bytes.Buffer{}
    }
    _ = cmd.Run()
}

I've tagged this question with Linux because I've only verified that this happens on Ubuntu 20.04 and I'm not sure whether it would reproduce on other platforms.



Solution 1:[1]

My issue was that the child sleep process was not being killed when the context timed out. The sh parent process was being killed, but the child sleep was being left around.

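This would normally still allow the cmd.Wait() call to return, but the problem is that cmd.Wait() waits both for the process to exit and for its output to be copied. Because we've assigned cmd.Stdout to something that is not an *os.File, os/exec creates a pipe and copies from it in a goroutine. Wait has to wait for that copy to reach EOF, which only happens once every write end of the pipe is closed. The orphaned sleep inherited the write end as its stdout, so the pipe never closes while sleep is still running.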
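To check that the hang comes from the output copy rather than from the kill itself, here is a minimal sketch that hands the child an *os.File (the null device) instead of a bytes.Buffer. In that case os/exec passes the descriptor straight to the child and has nothing to copy, so Run should return as soon as sh is killed. The helper name runWithFileStdout is made up for this illustration, and the script uses sleep 5 instead of sleep infinity so the orphaned child cleans itself up.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"time"
)

// runWithFileStdout (hypothetical helper) runs `sh -c script` with Stdout set
// to an *os.File. os/exec hands the file descriptor directly to the child, so
// there is no pipe and no copy goroutine for Wait to block on.
func runWithFileStdout(timeout time.Duration, script string) (time.Duration, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	devNull, err := os.OpenFile(os.DevNull, os.O_WRONLY, 0)
	if err != nil {
		return 0, err
	}
	defer devNull.Close()

	cmd := exec.CommandContext(ctx, "sh", "-c", script)
	cmd.Stdout = devNull

	start := time.Now()
	err = cmd.Run()
	return time.Since(start), err
}

func main() {
	elapsed, err := runWithFileStdout(100*time.Millisecond, "sleep 5")
	// Run returns shortly after the timeout even if the sleep child outlives sh.
	fmt.Printf("returned after %v: %v\n", elapsed.Round(time.Millisecond), err)
}
```

Note that this only avoids the blocked Wait. Any child that sh forked is still left running, which is why the process-group approach is still needed.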

To kill child processes as well, we can start the process as its own process group leader by setting the Setpgid field of syscall.SysProcAttr. We can then send the signal to the negative PID, which kills the process together with any subprocesses in its group.

Here is a drop-in replacement for exec.CommandContext I came up with that does exactly this:

import (
    "context"
    "os/exec"
    "syscall"
)

type Cmd struct {
    ctx context.Context
    *exec.Cmd
}

// NewCommand is like exec.CommandContext but ensures that subprocesses
// are killed when the context times out, not just the top level process.
func NewCommand(ctx context.Context, command string, args ...string) *Cmd {
    return &Cmd{ctx, exec.Command(command, args...)}
}

func (c *Cmd) Start() error {
    // Force-enable setpgid bit so that we can kill child processes when the
    // context times out or is canceled.
    if c.Cmd.SysProcAttr == nil {
        c.Cmd.SysProcAttr = &syscall.SysProcAttr{}
    }
    c.Cmd.SysProcAttr.Setpgid = true
    err := c.Cmd.Start()
    if err != nil {
        return err
    }
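    go func() {
        // Note: this goroutine lives until ctx is done, so always pass a
        // context that is eventually canceled.
        <-c.ctx.Done()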
        p := c.Cmd.Process
        if p == nil {
            return
        }
        // Kill by negative PID to kill the process group, which includes
        // the top-level process we spawned as well as any subprocesses
        // it spawned.
        _ = syscall.Kill(-p.Pid, syscall.SIGKILL)
    }()
    return nil
}

func (c *Cmd) Run() error {
    if err := c.Start(); err != nil {
        return err
    }
    return c.Wait()
}
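As a shorter variant of the same idea, on Go 1.20 or later exec.Cmd has a Cancel field that replaces the default kill action of exec.CommandContext. The sketch below assumes a Unix-like system and uses a made-up helper name, runGroupKilled. It reruns the reproducer from the question with a buffered Stdout and kills the whole process group on timeout.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// runGroupKilled (hypothetical helper) runs `sh -c script` with a buffered
// Stdout and kills the entire process group when the timeout expires.
func runGroupKilled(timeout time.Duration, script string) (time.Duration, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	cmd := exec.CommandContext(ctx, "sh", "-c", script)
	cmd.Stdout = &bytes.Buffer{}

	// Make the child the leader of a new process group (pgid == pid).
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}

	// Cancel (Go 1.20+) runs when ctx is done, in place of Process.Kill.
	// A negative PID addresses the whole group, including the sleep child.
	cmd.Cancel = func() error {
		return syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)
	}

	start := time.Now()
	err := cmd.Run()
	return time.Since(start), err
}

func main() {
	elapsed, err := runGroupKilled(100*time.Millisecond, "sleep 5")
	fmt.Printf("returned after %v: %v\n", elapsed.Round(time.Millisecond), err)
}
```

Because the sleep child dies with the group, the write end of the stdout pipe closes and Wait can finish copying. Go 1.20 also added a Cmd.WaitDelay field that bounds how long Wait keeps waiting on the I/O pipes after the context is done, which may be worth a look if the children cannot be placed in their own group.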

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1