How to create symlinks in a single directory with the lowest number of forks?

How to create symlinks in a single directory when:

  1. The common way fails:
ln -s /readonlyShare/mydataset/*.mrc .
-bash: /bin/ln: Argument list too long
  2. The find command doesn't allow the following syntax:
find /readonlyShare/mydataset -maxdepth 1 -name '*.mrc' -exec ln -s {} . +
  3. Forking one ln per file takes hours to complete:
find /readonlyShare/mydataset -maxdepth 1 -name '*.mrc' -exec ln -s {} . ';'


Solution 1:[1]

I was in a rush when I needed this, so I didn't explore every possibility, but I have since worked something out.

Thanks to @WeihangJian's answer, I now know that find ... | xargs -I {} ... is as bad as find ... -exec ... {} ';'.

A correct answer to my question would be:

find /readonlyShare/mydataset -maxdepth 1 -name '*.mrc' \
    -exec sh -c 'ln -s "$0" "$@" .' {} +
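To try the batched approach safely before pointing it at the real share, here is a minimal sketch that runs against a scratch directory. The temporary directories, the sample_N.mrc names, and the count of 2000 are made up for illustration. It uses a variant of the command above that passes a placeholder sh as $0, so every path ends up in "$@".

```shell
#!/bin/sh
# Hypothetical scratch layout standing in for /readonlyShare/mydataset.
src=$(mktemp -d)
dest=$(mktemp -d)

# Create 2000 empty sample files: sample_1.mrc ... sample_2000.mrc.
i=1
while [ "$i" -le 2000 ]; do
    : > "$src/sample_$i.mrc"
    i=$((i + 1))
done

cd "$dest" || exit 1

# find starts one sh (which starts one ln) per batch of paths,
# instead of one ln per file. The extra "sh" fills $0.
find "$src" -maxdepth 1 -name '*.mrc' -exec sh -c 'ln -s "$@" .' sh {} +

# Count the symlinks that now exist in the destination directory.
linked=$(find . -maxdepth 1 -type l | wc -l | tr -d ' ')
echo "created $linked symlinks"
```

If the count matches the number of source files, the same find line should work with the real source path in place of "$src".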

Solution 2:[2]

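find readonlyShare/mydataset -name '*.mrc' -maxdepth 1 -exec ln -s -t . '{}' '+'

Note that find requires '+' to come immediately after '{}', so the target directory is given with -t, which is an option of GNU ln.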

or if you prefer xargs:

find readonlyShare/mydataset -name '*.mrc' -maxdepth 1 -print0 |
  xargs -0 -P0 sh -c 'ln -s "$@" .' sh

If you are using BSD xargs instead of GNU xargs, it can be simpler:

find readonlyShare/mydataset -name '*.mrc' -maxdepth 1 -print0 |
  xargs -0 -J@ -P0 ln -s @ .

Why '{}' '+'?

Quoted from man find:

-exec utility [argument ...] {} +
             Same as -exec, except that “{}” is replaced with as many pathnames as possible for each invocation of utility.  This behaviour is similar
             to that of xargs(1).  The primary always returns true; if at least one invocation of utility returns a non-zero exit status, find will
             return a non-zero exit status.

find is good at splitting a large number of arguments:

find readonlyShare/mydataset -name '*.mrc' -maxdepth 1 -exec ruby -e 'pp ARGV.size' '{}' '+'
15925
15924
15925
15927
1835
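If ruby isn't installed, the same effect can be observed with plain sh. This is only a sketch: the scratch directory and the 300 sample files are hypothetical, and the exact batch sizes depend on your system's argument-length limit. Each invocation prints one line, so counting the lines counts the forks.

```shell
#!/bin/sh
# Hypothetical scratch directory with 300 empty sample files.
src=$(mktemp -d)
i=1
while [ "$i" -le 300 ]; do
    : > "$src/f$i.mrc"
    i=$((i + 1))
done

# ';' form: the utility runs once per file.
per_file=$(find "$src" -maxdepth 1 -name '*.mrc' \
    -exec sh -c 'echo "$#"' sh {} ';' | wc -l | tr -d ' ')

# '+' form: the utility runs once per batch of files.
batched=$(find "$src" -maxdepth 1 -name '*.mrc' \
    -exec sh -c 'echo "$#"' sh {} + | wc -l | tr -d ' ')

echo "';' form: $per_file invocations"
echo "'+' form: $batched invocations"
```

With this few files the '+' form normally needs very few invocations, while the ';' form needs one per file.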

Why not xargs -I?

It is slow and inefficient because -I runs the utility once per argument. For example:

printf 'foo\0bar' | xargs -0 -I@ ruby -e 'pp ARGV' @
["foo"]
["bar"]
printf 'foo\0bar' | xargs -0 ruby -e 'pp ARGV'
["foo", "bar"]

xargs is also good at splitting a large number of arguments:

seq 65536 | tr '\n' '\0' | xargs -0 ruby -e 'pp ARGV.size'
5000
5000
5000
5000
5000
5000
5000
5000
5000
5000
5000
5000
5000
536

Why sh -c?

Only BSD xargs has the -J flag, which places the arguments in the middle of a command. With GNU xargs, we need to combine sh -c with "$@" to achieve the same thing.
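One detail that is easy to miss is the trailing sh after the quoted script. The first argument after the script becomes $0, not part of "$@", so a placeholder is needed to avoid losing one path per batch. A small sketch with three made-up arguments (a, b, c):

```shell
#!/bin/sh
# Without a placeholder, "a" becomes $0 and only b and c reach "$@".
without=$(printf 'a\0b\0c\0' | xargs -0 sh -c 'echo "$#"')

# With a placeholder ("sh" here), all three arguments reach "$@".
with=$(printf 'a\0b\0c\0' | xargs -0 sh -c 'echo "$#"' sh)

echo "without placeholder: $without arguments in \$@"
echo "with placeholder: $with arguments in \$@"
```

The placeholder can be any word. Using sh is just a convention, since $0 is where the shell's own name normally goes.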

find -exec vs find | xargs

It depends, but I would suggest using xargs when you want to utilize all your CPUs: xargs can run the utility in parallel with -P, while find can't.
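If you would rather cap the parallelism than use -P0, something like the following sketch might do. The scratch directories, the 500 sample files, the batch size of 100, and the limit of 4 processes are arbitrary choices for illustration, not tuned values.

```shell
#!/bin/sh
# Hypothetical scratch layout with 500 empty sample files.
src=$(mktemp -d)
dest=$(mktemp -d)
i=1
while [ "$i" -le 500 ]; do
    : > "$src/f$i.mrc"
    i=$((i + 1))
done

cd "$dest" || exit 1

# -n 100 caps each batch at 100 paths; -P 4 runs up to 4 batches at once.
find "$src" -maxdepth 1 -name '*.mrc' -print0 |
    xargs -0 -n 100 -P 4 sh -c 'ln -s "$@" .' sh

# Count the symlinks that now exist in the destination directory.
linked=$(find . -maxdepth 1 -type l | wc -l | tr -d ' ')
echo "created $linked symlinks"
```

Whether parallel ln is actually faster depends on the filesystem, so it is worth timing both variants on your own data.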

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2