How to get full paths recursively in UNIX? [closed]

I'm looking for a way to recursively get paths to all files in a given directory in UNIX. (without using find)

EXAMPLE:

Given a tree like this

lab_assignment:
file1.txt
file2.txt
subdir1
subdir2
./subdir1:
file11.txt
./subdir2:
file21.txt

I need a command which would list paths to all files contained in lab_assignment recursively.

./file1.txt
./file2.txt
./subdir1/file11.txt
./subdir2/file21.txt

I found this in an assignment, so the toolset was purposely limited. I'm aware you can do it easily with the find command, but this assignment didn't allow the use of find, so there must be a way to do it without find, but I couldn't come up with one.

Teacher told us it was possible to achieve this using only ls, quotation, and maybe pipes and grep.

UPDATE:

I faced this problem in a recent assignment, although it wasn't the primary focus. Because of this I managed to avoid the problem altogether, but later found myself curious about what the proper solution was.

A solution to this problem is needed in tasks like:
Recursively output contents of files whose names end with .txt
Recursively count the number of lines in all files whose names start with f

Utilities like cat and wc take filenames as command-line arguments and have no recursive functionality built-in, so you have to provide them a list of paths to files.

The Ugly Way

I decided to avoid the problem if possible and did this:

cat *.txt */*.txt */*/*.txt
wc -l f* */f* */*/f*
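
The same fixed-depth idea can be written a bit more compactly with bash brace expansion; a sketch that inherits the same caveat, namely that a depth with no matches leaves the literal pattern behind and makes cat or wc complain:

```shell
# brace expansion builds the same three-depth pattern list:
# {,*/,*/*/}*.txt  ->  *.txt  */*.txt  */*/*.txt
cat {,*/,*/*/}*.txt
wc -l {,*/,*/*/}f*
```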

This worked. The teacher seemed quite displeased, calling the method messy and ugly, but he accepted my report. I was left curious about how I should have done it.

The Broken Way

After bugging the teacher for over a month, he agreed to show me the correct way one would have done this.

He typed this:

cat `ls -R $PWD`

This seemed to only cause errors and didn't create anything like the required result.

He then came up with:

cat $PWD/`ls -R`

This thing did at least something, but still - not even close to the required result.
The teacher then told me that it was his first year giving this course, which had been designed a long time ago by a different division of the university, and that he, as a UNIX user, would just do it with find and didn't know the solution himself,
but he swore he must have seen it somewhere in the design docs for the course, or somewhere...

So, is there a way to get a recursive list of filepaths without find? What clever piece of UNIX-trickery and mind gymnastics is the key for this?



Solution 1:[1]

——— Using globstar ———

I need a command which would list paths to all files [...] recursively.
[...]
The command should be as simple as possible.

When you have bash 4.0 or newer and there is at least one file in the current directory, you can use

shopt -s globstar
printf './%s\n' **

When the working directory can be empty, use

shopt -s globstar nullglob
a=(**)
(( ${#a[@]} > 0 )) && printf './%s\n' "${a[@]}"

And to solve the explicit assignments

Recursively output contents of files whose names end with .txt

shopt -s globstar
cat **/*.txt

Recursively count the number of lines in all files whose names start with f

shopt -s globstar
wc -l **/f*

Note that **/* also matches files in the working directory. The expanded list may or may not have paths with / inside.
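
When only regular files should be listed (the exercise never asks for the directories themselves), the globstar expansion can be filtered with a test; a sketch:

```shell
# filter the globstar expansion down to regular files, since **
# also matches the directories themselves
shopt -s globstar nullglob
for f in **; do
  [ -f "$f" ] && printf './%s\n' "$f"
done
```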


——— Using ls/grep ———

Teacher told us it was possible to achieve this using only ls, quotation, and maybe pipes and grep

I don't think so, at least not reliably. If any file/directory name contains a line break, there is no way to make it work using only the mentioned mechanisms.

If you can make assumptions like »no path contains a newline« or even »no path contains whitespaces« then the assignment becomes solvable. However, I couldn't find a solution that uses ls, since ls never outputs full paths and we are missing the tools (for instance sed, recursion, or a loop) to build full paths from its output.

List paths of all files (but not directories)

grep -RLE '$^'

-R applies grep to all files recursively. -E '$^' is a regex that never matches. -L prints all files that did not match.

Print contents of all files ending with .txt

cat $(grep -RLE '$^' | grep -E '\.txt$')

Count lines of all files starting with f

wc -l $(grep -RLE '$^' | grep -E '(^|/)f[^/]*$')
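
If the toolset may be widened slightly beyond ls/grep to include xargs, the whitespace caveat above can be worked around by NUL-terminating the names; in GNU grep, -Z pairs with -L to print a zero byte after each file name. A sketch:

```shell
# GNU grep + xargs: -Z NUL-terminates the names printed by -L, and
# xargs -0 reads them back, so paths containing whitespace (or even
# newlines) survive the hand-off to wc
grep -RLZE '$^' . | xargs -0 wc -l
```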

——— Closing Remarks ———

In my opinion, this assignment is bad, not so much because it may not be solvable but rather because it teaches bad practices (e.g. not using the right tools, relying on assumptions, ...).

Solution 2:[2]

Summary: You can do it using only the shell, no external tools. That's below. You can also do it using only ls -R plus some shell, or using only tools. See my other answer.

I'm genuinely interested in how would one do this the correct way.

The "correct" way is find. That's the tool for this job. It's defined in POSIX:

The find utility shall recursively descend the directory hierarchy from each file specified by path, evaluating a Boolean expression composed of the primaries described in the OPERANDS section for each file encountered.

I'll give your instructor the benefit of the doubt and assume this isn't some trivial academic exercise. I'll assume the assignment has some practicality, like:

"You've been dropped into a damaged UNIX system that has had most of its toolset removed, including its find command. You need to triage the directory structure. All you've got is ls, grep and a classic Bourne shell. You know that file names are conventional: no spaces in them, no leading dash in them, no control characters in them, etc. How would you do this?" (1)

(This isn't so far fetched. I once triaged a system whose /usr/bin was missing thanks to a mistaken mount directive. I had to diagnose and recover it using only shell built-ins like echo.)

Given this:

$ tree
.
├── file1.txt
├── file2.txt
├── subdir1
│   ├── file11.txt
│   ├── file12.c
│   └── subdira
│       ├── file1a1.c
│       └── file1a1.txt
└── subdir2
    └── file21.txt

First, the "correct" way. This is our target output:

$ find . -name '*.txt'
./file2.txt
./file1.txt
./subdir1/file11.txt
./subdir1/subdira/file1a1.txt
./subdir2/file21.txt

So, is there a way to get a recursive list of filepaths without find?

Yes. We can solve it under these conditions with just the shell built-ins:

$ r() {
    d=${1:-.}
    for f in *
    do
        if test -f "$f"; then
            case "$f" in *.txt)
                echo "$d/$f"
                ;;
            esac
        elif test -d "$f"; then
            ( cd "$f"; r "$d/$f" )
        fi
    done
}
$ r
./file1.txt
./file2.txt
./subdir1/file11.txt
./subdir1/subdira/file1a1.txt
./subdir2/file21.txt

No external programs, just shell built-ins. It is easily extensible: instead of echoing the match, you can call a program like wc. Since it is all shell, you can keep tracking variables for summation, etc.
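
The "tracking variables" idea might be sketched like this: recurse in the current shell (cd in, then cd back out) instead of a subshell, so a running total survives across levels; wc is the one external program, as suggested above:

```shell
# sketch: same recursive walk, but accumulating a total line count
# over *.txt files; recursing in the current shell (cd in, cd ..)
# keeps $total updated, unlike a subshell
total=0
count() {
  d=${1:-.}
  for f in *; do
    if test -f "$f"; then
      case "$f" in *.txt)
        # arithmetic expansion tolerates wc's leading whitespace
        total=$((total + $(wc -l < "$f")))
        ;;
      esac
    elif test -d "$f"; then
      if cd "$f"; then
        count "$d/$f"
        cd ..
      fi
    fi
  done
}
count
echo "$total"
```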

But, this is hardly performant, and it's subject to the exclusion of "weird" file names. Also, it's not identical to the find solution: find output is in inode order, while my shell solution is in locale order. These may differ, as in my example.

This also isn't the only way to do recursive descent, it's just an obvious way. For an alternative version to recursive descent without find, see Rich's POSIX sh tricks.


(1) If your instructor believes this can be correctly done with esoteric file names containing spaces, control characters, dashes, and so on, I suggest your instructor read David Wheeler's treatise (rant) on the subject.

Solution 3:[3]

If you're looking for a pure tool solution (vs. a pure shell solution as in my other answer), then a few options:

tar cvf /dev/null . | grep '\.txt$'
du -a | grep '\.txt$' | cut -f2-

If you're looking for a hybrid solution, both tool and shell, then:

ls -R . | while IFS= read -r l; do case $l in *:) d=${l%:};; "") d=;; *.txt) echo "$d/$l";; esac; done
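
Unrolled for readability, the loop reads (same logic; IFS= and read -r keep each line intact instead of letting read strip blanks and backslashes):

```shell
# ls -R prints "dir:" headers, the entries beneath them, then a blank
# separator; track the current header and prefix matching entries
ls -R . | while IFS= read -r l; do
  case $l in
    *:)    d=${l%:} ;;      # "dir:" header line: remember the directory
    "")    d= ;;            # blank separator between blocks: reset
    *.txt) echo "$d/$l" ;;  # a matching entry: prefix its directory
  esac
done
```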

This latter one is the closest I can get to the parameters your instructor gave.

Solution 4:[4]

CAVEAT!
In the answer https://stackoverflow.com/a/53109541/16881092 above, nota bene:

echo "" | grep -Ec '$^'  
1

This is not 0! Yet a zero match count is exactly what the "solution" relies on:

 grep -RLE '$^'

Indeed, as seen, this statement is erroneous:

-E '$^' is a regex that never matches.

In GNU grep, '$^' does match empty lines, so -L silently omits any file that contains one; the regex provides no reliable way to list every file.
Compare:

echo -e "$^"    | grep -Ec '$^'  
0
echo -e "$^\n"  | grep -Ec '$^'  
1

However, some further hand-waving can salvage the technique by making TWO lists of files, those that have a match and those that do not, and presumably concatenating the two lists with a following sort.
The environment in use:

uname -a  
Linux ubuntu 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:00 UTC 2019 i686 i686 i686 GNU/Linux

grep --version
grep (GNU grep) 3.1

There is some virtue in the pedagogy, pedantic though it is.

Specifically, ls does provide a way to decipher file and path names unambiguously:

ls --help
-D, --dired                generate output designed for Emacs' dired mode

as detailed in man ls:

'-D'
'--dired'
     With the long listing ('-l') format, print an additional line after
     the main output:

          //DIRED// BEG1 END1 BEG2 END2 ...

     The BEGN and ENDN are unsigned integers that record the byte
     position of the beginning and end of each file name in the output.
     This makes it easy for Emacs to find the names, even when they
     contain unusual characters such as space or newline, without fancy
     searching.

This output is parsable by other utilities besides emacs ("editing macros"), such as sed, though my motivation to pursue this is severely lacking.


From the comments of socowi:
How to get full paths recursively in UNIX?
this script has great potential

 ls -R | sed -n -E '/:$/h;/[^:]$/{G;s|(.*)\n(.*):|\2/\1|p}'

though pathological cases still need to be filtered out, as stated.
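
Spread out with comments, that sed pipeline reads as follows (GNU sed; the assumption that no file name contains a newline still applies, and directories are listed alongside files):

```shell
# ls -R emits "dir:" headers followed by entries; keep the latest
# header in the hold space and glue it onto each entry
ls -R | sed -n -E '
  # a line ending in ":" is a "dir:" header: stash it in the hold space
  /:$/h
  # a line NOT ending in ":" (and not blank) is an entry
  /[^:]$/{
    G
    # pattern space is now "entry\ndir:"; rewrite it to "dir/entry"
    s|(.*)\n(.*):|\2/\1|p
  }'
```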

It is worth noting (or so I believe; to be tested) that the only byte values not allowed in file names are \x0 (NUL) and /.

Techniques that avoid --dired may involve ls -p -Q and traditional name globbing, man -s 7 glob.

To be completed (maybe unsuccessfully) ... stay tuned, same time, same channel ...

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 halfer
Solution 3 bishop
Solution 4 ekim