Filtering out DNA repeats with vcftools

I am trying to filter out repeats from DNA sequence reads. For this, I have:

  • grw_vcf_filtered_2.vcf as the input file,
  • grw_repeatmasker_runner_combined.bed with the repeat positions I want to filter out, and
  • grw_repeats_removed.vcf as the output file I will generate.

However, after about 4 minutes of running I keep getting the error: grw_repeats_removed.vcf: No such file or directory.

Here is my code:

module load bioinfo-tools vcftools

cd $SNIC_TMP # move into the job's temporary directory

cp /proj/snic2020-2-25/nobackup/violeta/grw_vcf_filtered_2.vcf /proj/snic2020-2-25/nobackup/violeta/grw_repeatmasker_runner_combined.bed ./

vcftools --vcf grw_vcf_filtered_2.vcf --out grw_repeats_removed.vcf --exclude-positions grw_repeatmasker_runner_combined.bed

# copy the result from the temporary directory back to my project directory

cp ./grw_repeats_removed.vcf /proj/snic2020-2-25/nobackup/violeta/


Solution 1:[1]

vcftools is deprecated; use bcftools instead, with this option:

-T, --targets-file [^]FILE        Similar to -R but streams rather than index-jumps. Exclude regions with "^" prefix

Invoke it as:

bcftools view -O v -o grw_repeats_removed.vcf --targets-file ^grw_repeatmasker_runner_combined.bed grw_vcf_filtered_2.vcf
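
As a rough sketch of how this could slot into the job script from the question: the final cp fails whenever the filtering step never writes the expected file. That is probably what happened with vcftools, since --out takes a filename prefix and no VCF is written unless --recode is also given. The helper below wraps the bcftools call and checks that the output exists before anything is copied. Note that filter_repeats is a made-up name for illustration, not a bcftools command, and the module name in the comment is an assumption about the cluster setup.

```shell
# Hypothetical helper: filter_repeats INPUT.vcf REPEATS.bed OUTPUT.vcf
# Assumes bcftools is on PATH (e.g. after `module load bioinfo-tools bcftools`;
# that module name is an assumption, check `module avail` on your cluster).
filter_repeats() {
    # "^" in front of the BED path makes --targets-file exclude those regions.
    # Quoting keeps the shell from interpreting "^" or spaces in the path.
    bcftools view -O v -o "$3" --targets-file "^$2" "$1" || return 1

    # Stop here if nothing was written, instead of failing later at cp.
    if [ ! -s "$3" ]; then
        echo "error: $3 was not created" >&2
        return 1
    fi
}

# Usage inside the job script, after copying the inputs to $SNIC_TMP:
# filter_repeats grw_vcf_filtered_2.vcf grw_repeatmasker_runner_combined.bed grw_repeats_removed.vcf \
#     && cp grw_repeats_removed.vcf /proj/snic2020-2-25/nobackup/violeta/
```

If you would rather stay with vcftools, the closest equivalent is likely --exclude-bed (which takes a BED file, unlike --exclude-positions) together with --recode. The result is then written to <prefix>.recode.vcf rather than to the name given to --out, and as far as I know vcftools expects the BED file to start with a header line.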

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Pierre