Xargs and wc -c
I have something like this:
grep -v ">" $subfolder/assembly/contigs_1L.fasta | xargs -d \n wc >> $subfolder/N50_analysis/NC_len.txt
With this I want to remove the identifiers from a FASTA file and then count the characters on each remaining line.
xargs does what it is supposed to and splits on \n, but wc does not count the characters. If I run it without the \n, everything ends up on one big line.
EDIT:
Input
>C1
AGATGATGAGGATGAGATTGACTACGATCGATCGATGCATCGATCGGCATCGATCGACTGATCGATCGATCGATCGATCGATCGTACGATCGGCTACGCGCGATCGACGCGCGCGATCGATCGATCGTCGATCGGCGCGCTACGATCG
>C2
AGATGATGAGGATGAGATTGACTACGATCGATCGATGCATCGATCGGCATCGATCGACTGATCGATCGATCGATCGATCGATCGTACGATCGGCTACGCGCGATCGACGCGCGCGATCGATCGATCGTCGATCGGCGCGCTACGATCG
I only need the length of each AGCT sequence, so I use grep to take every line in the file that doesn't start with >. Then I want to count the length of each sequence independently, so that at the end I get something like this:
150
100
.
.
.
Cn
Solution 1:[1]
grep -v ">" testfile | awk '{ print length }' >> charcount.txt
may be what you are looking for.
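As a quick check of this approach, here is a minimal sketch on a throwaway file. The shortened sequences and the temporary file are made up for illustration and are not the asker's data.

```shell
# Hypothetical sample file with shortened sequences (illustration only).
tmpfile=$(mktemp)
printf '>C1\nAGATGA\n>C2\nAGCT\n' > "$tmpfile"

# Drop the header lines, then print the length of each remaining line.
lengths=$(grep -v '>' "$tmpfile" | awk '{ print length }')
printf '%s\n' "$lengths"

rm -f "$tmpfile"
```

This prints 6 and 4, one per line. awk's length counts the characters of the line without the trailing newline, so no extra trimming is needed.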
Solution 2:[2]
Take your input file for example:
cat >input.txt <<EOS
>C1
AGATGATGAGGATGAGATTGACTACGATCGATCGATGCATCGATCGGCATCGATCGACTGATCGATCGATCGATCGATCGATCGTACGATCGGCTACGCGCGATCGACGCGCGCGATCGATCGATCGTCGATCGGCGCGCTACGATCG
>C2
AGATGATGAGGATGAGATTGACTACGATCGATCGATGCATCGATCGGCATCGATCGACTGATCGATCGATCGATCGATCGATCGTACGATCGGCTACGCGCGATCGACGCGCGCGATCGATCGATCGTCGATCGGCGCGCTACGATCG
EOS
GNU
grep -v '>' input.txt |
tr -s '\n' |
xargs -d '\n' -n1 sh -c 'printf %s "$@" | wc -c' sh
148
148
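Note that the delimiter above is quoted as '\n', whereas the command in the question passes \n unquoted. The sketch below (variable names are only for illustration) shows what the shell hands to a program in each case: unquoted, the backslash is removed and only n arrives.

```shell
# Unquoted: the shell strips the backslash, so the program receives "n".
unquoted=$(printf '%s' \n)
# Quoted: the backslash survives, so the program receives the two characters \ and n.
quoted=$(printf '%s' '\n')

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted: %s\n' "$quoted"
```

With the unquoted form, xargs is therefore asked to split on the letter n rather than on newlines.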
BSD
grep -v '>' input.txt |
tr '\n' '\0' | tr -s '\0' |
xargs -0 -n1 sh -c 'printf %s "$@" | wc -c' sh
148
148
Explanation
- tr -s is required because there are empty lines in the input file that would cause xargs to generate empty strings as arguments.
- There is no -d flag for BSD xargs; an idiomatic solution is to combine it with tr.
- We use printf instead of echo because the latter prints an extra trailing newline character.
- We use printf %s "$@" instead of printf "$@" so that characters in the argument are not interpreted as format specifiers or escape sequences.
- When using sh -c, an extra argument is required for $0. Here we use sh, but you can use any other sensible name.
  -c string If the -c option is present, then commands are read from string. If there are arguments after the string, they are assigned to the positional parameters, starting with $0.
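The points about printf and $0 can be checked directly. This is a small sketch; the string AGCT and the variable names are arbitrary choices for illustration.

```shell
# echo appends a newline, so wc -c counts one extra byte.
with_echo=$(echo "AGCT" | wc -c | tr -d ' ')
# printf %s writes only the argument itself.
with_printf=$(printf %s "AGCT" | wc -c | tr -d ' ')
printf 'echo: %s, printf: %s\n' "$with_echo" "$with_printf"

# The first argument after the command string becomes $0, the next one $1.
args=$(sh -c 'printf "%s|%s" "$0" "$1"' sh AGCT)
printf '%s\n' "$args"
```

The first line printed is "echo: 5, printf: 4" and the second is "sh|AGCT". The tr -d ' ' is there because some wc implementations pad the count with spaces.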
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | sjsam |
| Solution 2 | Weihang Jian |
