How to count the number of tracked files in each sub-directory of the repository?
In a git repo, I want to list the directories (and sub-directories) that contain tracked items, along with the number of items (tracked files only) in each of them.
The following command gives the list of directories:
$ git ls-files | xargs -n 1 dirname | uniq
and this one counts all tracked items in the repository:
$ git ls-files | wc -l
The following command counts files in all sub-directories:
$ find . -type d -exec sh -c "echo '{}'; ls -1 '{}' | wc -l" \; | xargs -n 2 | awk '{print $1" "$2}'
But it also counts the directories themselves and, of course, it does not care if files are tracked. Take a look at the example below for more explanation:
C:\ROOT
│   tracked1.txt
│
├───Dir1
│   ├───Dir11
│   │       tracked111.txt
│   │       tracked112.txt
│   │
│   └───Dir12
│           ignored121.tmp
│           tracked121.txt
│
└───Dir2
    │   ignored21.tmp
    │   Tracked21.txt
    │
    └───Dir21
            ignored211.tmp
            ignored212.tmp
Running $ find root -type d -exec sh -c "echo '{}'; ls -1 '{}' | wc -l" \; | xargs -n 2 | awk '{print $2", "$1}' command gives the following result:
3, root
2, root/Dir1
2, root/Dir1/Dir11
2, root/Dir1/Dir12
3, root/Dir2
2, root/Dir2/Dir21
What I need is:
1, root
2, root/Dir1/Dir11
1, root/Dir1/Dir12
1, root/Dir2
Here, sub-directories and ignored items are not counted, and directories with no tracked items are not included. But I don't know how to pipe these commands together to get this result.
Solution 1:[1]
git ls-files | awk '{$NF="";print}' FS=/ OFS=/ | sort | uniq -c
or, shorter,
git ls-files | sed 's,[^/]*$,,' | sort | uniq -c
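Both one-liners strip the last path component from each line and count the duplicates, so files in the repository root end up under an empty directory name. A minimal sketch of the sed variant that labels the root as ./ is below. The function name count_by_dir is made up for illustration, and LC_ALL=C is only there to keep the sort order predictable.

```shell
# Hypothetical helper (not part of git): reads one path per line on stdin
# and prints "count directory/" for each distinct directory.
count_by_dir() {
  # 1st expression: drop the file name, keeping "dir/" (or nothing for root files)
  # 2nd expression: label the now-empty root entries as "./"
  sed -e 's,[^/]*$,,' -e 's,^$,./,' | LC_ALL=C sort | uniq -c
}

# Usage inside a repository:
#   git ls-files | count_by_dir
```

Note that uniq -c pads the count with leading spaces, and the amount of padding can differ between implementations.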
Solution 2:[2]
The following code lists all files, groups them by their directory names and prints the size of each group:
$ git ls-files | xargs -n 1 dirname | awk ' { filescount[$1] += 1 }
    END {
        # asorti() is a gawk extension: it sorts the array indices (the paths)
        n = asorti(filescount, sortedpath);
        for (i = 1; i <= n; i++) print filescount[sortedpath[i]], sortedpath[i]
    }'
1 .
1 Root
2 Root/Dir1/Dir11
1 Root/Dir1/Dir12
1 Root/Dir2
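Since asorti() is only available in gawk, here is a rough sketch of the same grouping that should also work with other awk implementations: the counting stays in awk and the ordering is handed to sort. The function name count_tracked is hypothetical, and the "count, directory" layout is chosen to match the format asked for in the question.

```shell
# Hypothetical helper: reads one path per line on stdin and prints
# "count, directory", with "." standing for the repository root.
count_tracked() {
  awk -F/ '{
    dir = "."                      # default: file sits in the root
    if (NF > 1) {                  # path contains at least one "/"
      dir = $0
      sub("/[^/]*$", "", dir)      # drop "/filename" to keep the directory
    }
    count[dir]++
  }
  END { for (d in count) print count[d] ", " d }' |
    LC_ALL=C sort -t, -k2          # order by directory name, not by count
}

# Usage inside a repository:
#   git ls-files | count_tracked
```

Because the whole line is used rather than $1, directory names containing spaces are kept intact.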
If you also need the total number of lines of code in each directory:
$ git ls-files | xargs -n1 wc -l | awk ' { sub("/[^/]*$", "/") } 1' |
    awk ' { filescount[$2] += 1; linescount[$2] += $1 }
    END {
        n=asorti(filescount, sortedpath);
        for (i = 1; i <= n; i++)
            print filescount[sortedpath[i]], linescount[sortedpath[i]], sortedpath[i]
    }'
1 260 .gitignore
1 5 Root/
2 1 Root/Dir1/Dir11/
1 2 Root/Dir1/Dir12/
1 4 Root/Dir2/
The second command does not group the files of the root directory; it prints a separate line for each of them instead. The problem is in the awk ' { sub("/[^/]*$", "/") } 1' part, which tries to extract the directory from each path. When a path has no parent directory (e.g. .gitignore), the pattern does not match and the whole path is left unchanged.
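One possible way around this, as a sketch rather than a tested fix for every repository: separate the line count from the path first, then treat any path without a slash as belonging to ./. The function name lines_by_dir is hypothetical, and it assumes each input line looks like the output of wc -l for a single file (a count followed by the path).

```shell
# Hypothetical helper: reads "linecount path" lines (as printed by `wc -l FILE`)
# and prints "files lines directory/", grouping root files under "./".
lines_by_dir() {
  awk '{
    n = $1
    path = $0
    sub(/^[ \t]*[0-9]+[ \t]+/, "", path)   # strip the leading line count
    dir = "./"                             # default for paths without "/"
    if (path ~ "/") {
      dir = path
      sub("[^/]*$", "", dir)               # drop the file name, keep "dir/"
    }
    files[dir]++
    lines[dir] += n
  }
  END { for (d in files) print files[d], lines[d], d }' |
    LC_ALL=C sort -k3                      # order by directory name
}

# Usage inside a repository:
#   git ls-files | xargs -n1 wc -l | lines_by_dir
```

As with the original pipeline, xargs splits on whitespace, so paths containing spaces would still need separate handling.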
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | jthill |
| Solution 2 | saastn |
