'Print column contents by column name
I want to input a string name (i.e. "COL2") to an awk or cut command and print the column that matches that column header string.
the datafile looks like this:
COL1 COL2 COL3 COL4 COL5 COL6
a a b d c f
a d g h e f
c v a s g a
If I pass in COL3, I want it to print the third column, etc. I'm thinking awk might be the easiest thing to use, but cut may also work. I'm just not sure how to go about doing this.
Solution 1:[1]
a slight modification of anubhava post on top, for multiple columns
awk -vcol1="COL2" -vcol2="COL6" 'NR==1{for(i=1;i<=NF;i++){if($i==col1)c1=i; if ($i==col2)c2=i;}} NR>0{print $c1 " " $c2}' file.txt
when NR>1 does not print the column headers. This was modified to NR>0 which should print the columns with header names.
Solution 2:[2]
Note that the first solution prints out the whole file if the named column does not exist. To output a warning message if this occurs try
awk -v col=NoneSuch 'NR==1{for(i=1;i<=NF;i++){if($i==col){c=i;break}} if (c > 0) {print $c}} else {print "Column " col "does not exist"} NR>1 && c > 0 {print $c}' file1.txt
Solution 3:[3]
It's a little unclear what you're trying to do.
If you want to get the single column from the data, use substr().
If you want to use an argument to choose the column use something like
BEGIN { mycol = ARGV[1] ; }
{ print $mycol }
Update
Hmmm, so you want generalized column names?
Okay, we'll assume that your data is organized like this:
XXXXX YYYYY ZZZZZ
and you want to name the columns "harpo", "groucho" and "zeppo", and the column name is in ARGV[1]:
BEGIN { cols["harpo"] = 1; cols["groucho"] = 2; cols["zeppo"] = 3; }
{ print $cols[ARGV[1]] }
Second update
Yup, this trick will do it. Replace "harpo" etc with "COL1", "Col2", and so on.
Solution 4:[4]
say column is the variable you declared that is the column you want from the shell. You pass it in using awk's -v option
column=3
awk -vcol="$column" '{print $col}' file
Solution 5:[5]
I'm about a decade late, but here is another simple way to extract a column from .csv file using column name:
column_number=$(awk -v RS=',' '/column_name/{print NR; exit}' file.csv)
column_output=$(awk -F "\"*,\"*" -vcol="$column_number" '{print $col}' file.csv)
echo $column_output
You can change the -F flag in the second command from comma to tab for a .tsv file
Solution 6:[6]
When you say "pass a string" to awk, I guess you want to give the string on the command line. One option is to use the -v feature for defining variables
$ gawk -f columnprinter.awk -v col=thecolumnnameyouwant
Alternately you can use the built-in variable ARGV as Charlie explains.
That only leaves the matter for forming an array to associate column names with column numbers. If the first line of the input contains the column names (a common convention) this becomes pretty easy.
Use
NR==1{...}
to process the first column to get the mapping
NR==1{
colnum=-1;
for(i=1; i<=NF; i++)
if ($i == col) {
colnum=i
break
}
}
which you can use like
{
print $colnum
}
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Community |
| Solution 2 | AaronB |
| Solution 3 | |
| Solution 4 | ghostdog74 |
| Solution 5 | Komalharini Tiwari |
| Solution 6 | Community |
