'Print each column count alongwith header name
I have a coma delemiated file. I am interested to count the number of rows (column count/length) in each column with their header name.
Example Dataset:
ID, IB, IM, IZ
0.05, 0.02, 0.01, 0.09
0.06, 0.01, , 0.08
0.02, 0.06,
Coumn ID:3
Column IB:3
Column IM: 1
Column IZ:2
I have tried quite few option:
I can split these columns into seperate files and then can count number of lines in each file using wc -l File_name command.
This Command is very close to what I am interested in but stillunable to get header name. Any help will be highly appreciated.
Solution 1:[1]
I would use GNU AWK for this task following way, let file.txt content be
ID, IB, IM, IZ
0.05, 0.02, 0.01, 0.09
0.06, 0.01, , 0.08
0.02, 0.06,
then
awk 'BEGIN{FS=","}NR==1{split($0,names);next}{for(i=1;i<=NF;i+=1){counts[i]+=$i~/[^[:space:]]/}}END{for(i=1;i<=length(names);i+=1){print "Column",names[i]": "counts[i]}}' file.txt
output
Column ID: 3
Column IB: 3
Column IM: 1
Column IZ: 2
Explanation: I inform GNU AWK that , is field separator, when processing 1st record split whole lines ($0) into array names, so ID becomes names[1], IB becomes names[2], IM becomes names[3] and so on. After doing that go to next line. For all but 1st line iterate over columns using for loop, for every line increase value of counts[i] (where i is number of column) by does that column contain non-whitespace character? which is 0 for false and 1 for true. In other words increase by 1 if non-whitespace character found else increase by 0. After processing all lines iterate over names and print name with corresponding value of counts.
(tested in gawk 4.2.1)
Solution 2:[2]
With GNU awk:
awk -F'[[:space:]]*,[[:space:]]*' 'NR == 1 {header = $0; next} \
{for(i = 1; i <= NF; i++) n[i] += ($i ~ /\S/)} \
END {$0 = header; for(i = 1; i <= NF; i++) print "Column " $i ": " n[i]}' file
Column ID: 3
Column IB: 3
Column IM: 2
Column IZ: 1
Solution 3:[3]
Using any awk in any shell on every Unix box:
$ cat tst.awk
BEGIN { FS=" *, *" }
NR == 1 {
numCols = split($0,tags)
next
}
{
for ( i=1; i<=NF; i++ ) {
if ( $i ~ /./ ) {
cnt[i]++
}
}
}
END {
for ( i=1; i<=numCols; i++ ) {
printf "Column %s:%d\n", tags[i], cnt[i]
}
}
$ awk -f tst.awk file
Column ID:3
Column IB:3
Column IM:1
Column IZ:2
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Daweo |
| Solution 2 | Renaud Pacalet |
| Solution 3 | Ed Morton |
