Xargs and wc -c

I have something like this:

  grep -v ">" $subfolder/assembly/contigs_1L.fasta | xargs -d \n wc   >> $subfolder/N50_analysis/NC_len.txt

With this I want to remove the identifiers from a FASTA file and then count the characters on each remaining line.

xargs does what it is supposed to, splitting on \n, but wc doesn't count the characters. If I run it without the \n, then everything ends up in one big line.

EDIT:

Input

>C1
AGATGATGAGGATGAGATTGACTACGATCGATCGATGCATCGATCGGCATCGATCGACTGATCGATCGATCGATCGATCGATCGTACGATCGGCTACGCGCGATCGACGCGCGCGATCGATCGATCGTCGATCGGCGCGCTACGATCG

>C2
AGATGATGAGGATGAGATTGACTACGATCGATCGATGCATCGATCGGCATCGATCGACTGATCGATCGATCGATCGATCGATCGTACGATCGGCTACGCGCGATCGACGCGCGCGATCGATCGATCGTCGATCGGCGCGCTACGATCG

I only need the length of each sequence (the A, G, C, T characters), so I use grep to take every line in the file that doesn't start with >. Then I want to count the length of each sequence independently, so at the end I get something like this:

 150
 100
  .
  .
  .
  Cn


Solution 1:[1]

grep -v ">" testfile | awk '{ print length }' >> charcount.txt

may be what you are looking for.
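
Note that the sample input has blank lines between records, and `length` prints 0 for each of them. A minimal sketch, assuming the same input layout, that drops both the identifier lines and the blank lines in awk alone (the function name `seq_lengths` is hypothetical, not part of the answer):

```shell
# Hypothetical helper: print the length of every non-header, non-empty line.
seq_lengths() {
  # !/^>/ drops identifier lines; NF is 0 on blank lines, so they are dropped too.
  awk '!/^>/ && NF { print length($0) }' "$1"
}
```

It would be used the same way as the pipeline above, for example `seq_lengths testfile >> charcount.txt`.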

Solution 2:[2]

Take your input file for example:

cat >input.txt <<EOS
>C1
AGATGATGAGGATGAGATTGACTACGATCGATCGATGCATCGATCGGCATCGATCGACTGATCGATCGATCGATCGATCGATCGTACGATCGGCTACGCGCGATCGACGCGCGCGATCGATCGATCGTCGATCGGCGCGCTACGATCG

>C2
AGATGATGAGGATGAGATTGACTACGATCGATCGATGCATCGATCGGCATCGATCGACTGATCGATCGATCGATCGATCGATCGTACGATCGGCTACGCGCGATCGACGCGCGCGATCGATCGATCGTCGATCGGCGCGCTACGATCG
EOS

GNU

grep -v '>' input.txt |
  tr -s '\n' |
  xargs -d '\n' -n1 sh -c 'printf %s "$@" | wc -c' sh
148
148

BSD

grep -v '>' input.txt |
  tr '\n' '\0' | tr -s '\0' |
  xargs -0 -n1 sh -c 'printf %s "$@" | wc -c' sh
     148
     148
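
Both variants above start one sh process per sequence. As an alternative that is not part of the original answer, the same counts can probably be obtained with a plain POSIX shell loop that behaves the same on GNU and BSD systems. This is only a sketch: the function name `count_lengths` is hypothetical, and `${#line}` counts characters, which equals the byte count only for ASCII input such as these sequences.

```shell
# Hypothetical helper: print the length of each sequence line using only POSIX shell.
count_lengths() {
  grep -v '>' "$1" |
  while IFS= read -r line; do
    # Skip the blank lines that separate records.
    if [ -n "$line" ]; then
      printf '%s\n' "${#line}"
    fi
  done
}
```

For example, `count_lengths input.txt` prints one length per sequence.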

Explanation

  1. tr -s is required because the input file contains empty lines, which would otherwise make xargs pass empty strings as arguments.
  2. BSD xargs has no -d flag; the idiomatic workaround is to convert newlines to NUL bytes with tr and use xargs -0.
  3. We use printf instead of echo because echo prints an extra trailing newline character, which wc -c would count.
  4. We use printf %s "$@" instead of printf "$@" so that the argument is printed literally rather than interpreted as a format string, in which % and backslash sequences are special.
  5. When using sh -c, an extra argument is required to fill $0. Here we use sh, but any sensible name works.

    -c string If the -c option is present, then commands are read from string. If there are arguments after the string, they are assigned to the positional parameters, starting with $0.
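
Points 3 to 5 can be checked with a small experiment. This is an illustrative sketch: the variable names are made up for this example, and the sample strings are not taken from the input file.

```shell
# Point 3: echo appends a newline, so wc -c counts one byte too many.
with_echo=$(echo ACGT | wc -c)         # 5 bytes: A C G T plus newline
with_printf=$(printf %s ACGT | wc -c)  # 4 bytes

# Point 4: used as a format string, \n becomes a real newline;
# passed as data to %s, the backslash and the n are kept literally.
as_format=$(printf 'A\nC' | wc -c)     # 3 bytes: A, newline, C
as_data=$(printf %s 'A\nC' | wc -c)    # 4 bytes: A, \, n, C

# Point 5: the first argument after the command string becomes $0.
with_name=$(sh -c 'printf "%s|%s" "$0" "$1"' sh ACGT)    # sh|ACGT
without_name=$(sh -c 'printf "%s|%s" "$0" "$1"' ACGT)    # ACGT|  (the data landed in $0)
```

Without the placeholder, the sequence ends up in $0 and "$@" is empty, so nothing would be counted.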

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    sjsam
Solution 2    Weihang Jian