bash or zsh: how to pass multiple inputs to interactive piped parameters?

I have 3 different files that I want to compare

words_freq words_freq_deduped words_freq_alpha

For each file, I run a command like so, which I iterate on constantly to compare the results.

For example, I would do this:

$ cat words_freq | grep -v '[soe]'
$ cat words_freq_deduped | grep -v '[soe]'
$ cat words_freq_alpha | grep -v '[soe]'

and then review the results, and then do it again, with an additional filter

$ cat words_freq | grep -v '[soe]' | grep a | grep r | head -n20
a

$ cat words_freq_deduped | grep -v '[soe]' | grep a | grep r | head -n20
b

$ cat words_freq_alpha | grep -v '[soe]' | grep a | grep r | head -n20
c

This continues on until I've analyzed my data.

I would like to write a script that takes the piped portion of the command and applies it to each of these files, so that I only have to iterate on the grep/head part.

e.g. The following would dump the results of running the 3 commands above AND also compare the 3 results, and dump additional calculations on them

$ myScript | grep -v '[soe]' | grep a | grep r | head -n20
the letters were in all 3 runs, and it took 5 seconds
a
b
c

How can I do this using bash/python or zsh for the myScript part?

EDIT: After asking the question, it occurred to me that I could use eval to do it, like so, which I've added as an answer as well

The following approach allows me to process multiple files by using eval, which I know is frowned upon - any other suggestions are greatly appreciated!

$ myScript "grep -v '[soe]' | grep a | grep r | head -n20"

myScript

#!/usr/bin/env bash
function doIt(){
  FILE=$1
  CMD="cat $1 | $2"
  echo processing file "$FILE"
  eval "$CMD"
  echo
}

doIt words_freq "$@" 
doIt words_freq_deduped "$@" 
doIt words_freq_alpha "$@"


Solution 1:[1]

You can't stop your shell from interpreting the pipes itself, so calling the script that way isn't very practical. You'd need to either quote the whole pipeline and then eval it, which makes it hard to pass arguments containing spaces, or quote every pipe individually and eval the result. Both solutions are kind of hacky.

I'd suggest doing one of these two:

  1. Keep your editor open and put whatever you want to run inside the doIt function itself. Then run the script from your shell without any arguments:
#!/usr/bin/env bash

doIt() {
  # grep -v '[soe]' < "$1"
  grep -v '[soe]' < "$1" | grep a | grep r | head -n20
}

doIt words_freq
doIt words_freq_deduped
doIt words_freq_alpha

  2. Use a "for" loop directly in your shell. You can recall it from your history with Ctrl+r whenever you want to run it again:

$ for f in words_freq*; do grep -v '[soe]' < "$f" | grep a | grep r | head -n20; done
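
A variation on the loop that avoids both retyping and eval is to keep the pipeline in a shell function and apply that function to each file. This is only a minimal sketch: the names each_file and my_filter are made up for illustration and are not part of the original answer.

```bash
#!/usr/bin/env bash

# Hypothetical helper: run a filter function once per file,
# feeding each file to the filter on stdin.
# Usage: each_file FILTER_FUNCTION FILE...
each_file() {
  local filter="$1" f
  shift
  for f in "$@"; do
    echo "processing file $f"
    "$filter" < "$f"
    echo
  done
}

# Hypothetical filter: edit this function as you iterate on the pipeline.
my_filter() {
  grep -v '[soe]' | grep a | grep r | head -n20
}
```

After sourcing the file, it could be called as: each_file my_filter words_freq words_freq_deduped words_freq_alpha. Because the pipeline lives in an ordinary function, the shell parses it normally and no extra quoting is needed.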

If you really want your approach, I tried to make it accept arguments with spaces, but it ended up even hackier:

#!/usr/bin/env bash

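doIt() {
  local FILE=$1
  shift
  echo processing file "$FILE"
  local args=() arg

  # Rebuild the command line: keep each bare | as a pipe, and shell-escape
  # every other argument so that eval sees it as a single word.
  for arg in "$@"; do
    if [[ $arg == '|' ]]; then
      args+=('|')
    else
      args+=("$(printf '%q' "$arg")")
    fi
  done
  eval "cat $(printf '%q' "$FILE") | ${args[*]}"
}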

doIt words_freq "$@" 
doIt words_freq_deduped "$@" 
doIt words_freq_alpha "$@"

With this version you can use it like this:

$ ./myScript grep "a a" "|" head -n1

Notice that you need to quote the |, and that the script now handles arguments with spaces.
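
The question also asks for a comparison of the three results and a timing. One way this could be sketched is shown below, assuming the whole pipeline is passed as a single quoted string, as in the question's own eval version. The function name compare_runs and the output wording are hypothetical, and "common" here means lines that appear in every file's output. Like eval, bash -c runs whatever string it is given, so this is only suitable for commands you type yourself.

```bash
#!/usr/bin/env bash

# Hypothetical helper: run one pipeline string against several files, then
# print the lines common to every result and the elapsed time.
# Usage: compare_runs 'PIPELINE' FILE...
compare_runs() {
  local pipeline="$1" f tmp start
  shift
  [ "$#" -gt 0 ] || { echo "usage: compare_runs 'PIPELINE' FILE..." >&2; return 2; }
  start=$(date +%s)
  tmp=$(mktemp -d) || return 1

  for f in "$@"; do
    echo "processing file $f"
    # bash -c parses the pipeline string; the file arrives on its stdin.
    bash -c "$pipeline" < "$f" | tee "$tmp/out"
    echo
    sort -u "$tmp/out" > "$tmp/sorted"
    if [ -f "$tmp/common" ]; then
      # comm -12 keeps only the lines present in both sorted inputs.
      comm -12 "$tmp/common" "$tmp/sorted" > "$tmp/next"
      mv "$tmp/next" "$tmp/common"
    else
      cp "$tmp/sorted" "$tmp/common"
    fi
  done

  echo "lines common to all $# runs:"
  cat "$tmp/common"
  echo "elapsed: $(( $(date +%s) - start )) seconds"
  rm -rf "$tmp"
}
```

After sourcing the file, it could be used as: compare_runs "grep -v '[soe]' | grep a | grep r | head -n20" words_freq words_freq_deduped words_freq_alpha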

Solution 2:[2]

I'm not sure I fully understood the problem.

As I understand it, you want a script that contains the filtering logic itself, so that no pipes are needed, and that takes the filtering patterns as arguments.

Here is a gawk script (gawk is the default awk on many Linux distributions).

It processes the 3 input files in one sweep, without piping.

script.awk

BEGIN {
  # Set the record separator to something unlikely to occur, so that each
  # file is read entirely as a single record. Each pattern below is
  # therefore tested against the whole file, not against individual lines.
  RS = "!@!@!@!@!@!@!@"
}
$0 !~ excludeRegEx &&      # if the file does not match excludeRegEx
$0 ~ includeRegEx1 &&      # and matches includeRegEx1
$0 ~ includeRegEx2 {       # and matches includeRegEx2
  system("head -n20 " FILENAME)  # run the shell command "head -n20" on the current file
}

Running script.awk

   awk -v excludeRegEx='[soe]' \
       -v includeRegEx1='a' \
       -v includeRegEx2='r' \
       -f script.awk words_freq words_freq_deduped words_freq_alpha
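
The script above reads each file as a single record, so the patterns are tested against whole files rather than individual lines. If the goal is to reproduce the per-line behaviour of the grep pipeline from the question, a line-based variant might look like the sketch below. It reuses the same awk variable names; the wrapper name filter_files and the limit variable are hypothetical additions.

```bash
#!/usr/bin/env bash

# Hypothetical wrapper around a line-based awk filter.
# Usage: filter_files EXCLUDE INCLUDE1 INCLUDE2 LIMIT FILE...
filter_files() {
  local exclude="$1" include1="$2" include2="$3" limit="$4"
  shift 4
  awk -v excludeRegEx="$exclude" \
      -v includeRegEx1="$include1" \
      -v includeRegEx2="$include2" \
      -v limit="$limit" '
    # First line of each file: reset the counter and print a header.
    FNR == 1 { shown = 0; print "processing file " FILENAME }
    # Print a line only if it passes all three tests and the limit is not reached.
    $0 !~ excludeRegEx && $0 ~ includeRegEx1 && $0 ~ includeRegEx2 && shown < limit {
      print
      shown++
    }
  ' "$@"
}
```

After sourcing the file, it could be used as: filter_files '[soe]' a r 20 words_freq words_freq_deduped words_freq_alpha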

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution     Source
Solution 1   Leonardo Dagnino
Solution 2   Dudi Boy