'Why does xtrace show piped commands executing out of order?

Here's a simple reproducer:

cat >sample_pipeline.sh << EOF                                                                                 
set -x
head /dev/urandom | awk '{\$2=\$3=""; print \$0}' | column -t | grep 123 | wc -l >/dev/null
EOF
watch -g -d -n 0.1 'bash sample_pipeline.sh 2>&1 | tee -a /tmp/watchout'
wc -l /tmp/watchout
tail /tmp/watchout

As expected, usually the commands execute in the order they are written in:

+ head /dev/urandom
+ awk '{$2=$3=""; print $0}'
+ column -t
+ grep 123
+ wc -l

...but some of the time the order is different, e.g. awk before head:

+ awk '{$2=$3=""; print $0}'
+ head /dev/urandom
+ column -t
+ grep 123
+ wc -l

I can understand if the shell pre-spawns processes waiting for input, but why wouldn't it spawn them in order?

Replacing bash with dash (the default Ubuntu shell) seems to have the same behavior.



Solution 1:[1]

When you write a pipeline like that, the shell will fork off a bunch of child processes to run all of the subcommands. Each child process will then print the command it is executing just before it calls exec. Since each child is an independent process, the OS might schedule them in any order, and various things going on (cache misses/thrashing between CPU cores) might delay some children and not others. So the order the messages come out is unpredictable.

Solution 2:[2]

This happens because of how pipelines are implement. The shell will first fork N subshells, and each subshell will (when scheduled) print out its xtrace output and invoke its command. The order of the output is therefore the result of a race.

You can see a similar effect with

for i in {1..5}; do echo "$i" & done

Even though each echo command is spawned in order, the output may still be 3 4 5 2 1.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 that other guy