Filtering out DNA repeats with vcftools
I am trying to filter out variants that fall in repeat regions of my DNA sequencing data. For this, I have:
- grw_vcf_filtered_2.vcf as the input file,
- grw_repeatmasker_runner_combined.bed with the repeat positions I want to filter out,
- grw_repeats_removed.vcf as the output file I will generate.
However, after about 4 minutes of running, I keep getting the error "grw_repeats_removed.vcf: No such file or directory".
Here is my code:
```shell
module load bioinfo-tools vcftools
cd "$SNIC_TMP"  # move to the node-local temporary directory
cp /proj/snic2020-2-25/nobackup/violeta/grw_vcf_filtered_2.vcf /proj/snic2020-2-25/nobackup/violeta/grw_repeatmasker_runner_combined.bed ./
vcftools --vcf grw_vcf_filtered_2.vcf --out grw_repeats_removed.vcf --exclude-positions grw_repeatmasker_runner_combined.bed
# copy the result from the temporary directory back to my project directory
cp ./grw_repeats_removed.vcf /proj/snic2020-2-25/nobackup/violeta/
```
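The error most likely comes from the final `cp`, not from vcftools itself. Assuming standard vcftools 0.1.x behaviour, `--out` sets an output filename *prefix*, so the command above writes only a log file (`grw_repeats_removed.vcf.log`) and, because `--recode` is not given, no filtered VCF at all. Separately, `--exclude-positions` expects a list of CHROM/POS pairs, whereas `--exclude-bed` accepts BED intervals. A sketch of the same job with those options corrected is below; the `SRC` variable is an illustrative addition, and the module names and paths are assumed to be the same as in the original.

```shell
#!/bin/bash
# Sketch: same workflow, vcftools options corrected (assumes vcftools 0.1.x).
module load bioinfo-tools vcftools
set -euo pipefail

SRC=/proj/snic2020-2-25/nobackup/violeta   # illustrative variable for the project directory
cd "$SNIC_TMP"                             # work in the node-local temporary directory

cp "$SRC/grw_vcf_filtered_2.vcf" "$SRC/grw_repeatmasker_runner_combined.bed" ./

# --out is a PREFIX: with --recode this writes grw_repeats_removed.recode.vcf
# (and grw_repeats_removed.log). --recode-INFO-all keeps the INFO column,
# which --recode drops by default. --exclude-bed takes BED intervals.
vcftools --vcf grw_vcf_filtered_2.vcf \
         --exclude-bed grw_repeatmasker_runner_combined.bed \
         --recode --recode-INFO-all \
         --out grw_repeats_removed

# Copy the filtered VCF back under the file name originally intended.
cp grw_repeats_removed.recode.vcf "$SRC/grw_repeats_removed.vcf"
```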
Solution 1:[1]
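vcftools is deprecated; use bcftools instead. Its `-T` option reads a regions file, and a `^` prefix turns that file into an exclusion list:

> `-T, --targets-file [^]FILE` Similar to -R but streams rather than index-jumps. Exclude regions with "^" prefix

Invoke it as follows (`-O v` writes uncompressed VCF, `-o` names the output file):

```shell
bcftools view -O v -o grw_repeats_removed.vcf --targets-file ^grw_repeatmasker_runner_combined.bed grw_vcf_filtered_2.vcf
```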
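To make the exclusion concrete, here is a toy sketch in plain awk of what removing records inside BED intervals means. It is a swapped-in illustration, not how bcftools is implemented, and is far too slow for genome-scale files. BED intervals are 0-based and half-open while VCF `POS` is 1-based, so a record at position p lies inside an interval with start s and end e when s < p <= e. As I understand the bcftools documentation, recent versions treat a regions file with a `.bed` extension as 0-based BED. The file names and records below are hypothetical, and only `POS` is checked, not the full span of the REF allele.

```shell
#!/bin/bash
# Toy illustration: drop VCF records whose POS falls inside a BED interval.
# Hypothetical files in a throwaway directory; awk only, no bcftools needed.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical repeat intervals: BED is 0-based, half-open [start, end).
printf 'chr1\t100\t200\nchr2\t0\t50\n' > repeats.bed

# Hypothetical VCF records: POS is 1-based.
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t100\t.\tA\tG\t.\tPASS\t.\n'  # 0-based 99: just before [100,200), kept
  printf 'chr1\t101\t.\tC\tT\t.\tPASS\t.\n'  # first base of the interval, excluded
  printf 'chr1\t200\t.\tG\tA\t.\tPASS\t.\n'  # last base of the interval, excluded
  printf 'chr1\t201\t.\tT\tC\t.\tPASS\t.\n'  # just after the interval, kept
  printf 'chr2\t50\t.\tA\tC\t.\tPASS\t.\n'   # last base of chr2 [0,50), excluded
  printf 'chr3\t10\t.\tG\tT\t.\tPASS\t.\n'   # no intervals on chr3, kept
} > toy.vcf

# Pass 1 (NR == FNR) stores the BED intervals per chromosome.
# Pass 2 prints header lines and every record not covered by an interval.
awk -F'\t' '
  NR == FNR { n[$1]++; s[$1, n[$1]] = $2 + 0; e[$1, n[$1]] = $3 + 0; next }
  /^#/      { print; next }
  {
    p = $2 + 0
    for (i = 1; i <= n[$1]; i++)
      if (s[$1, i] < p && p <= e[$1, i]) next
    print
  }
' repeats.bed toy.vcf > toy.repeats_removed.vcf

# Show CHROM and POS of the surviving records.
grep -v '^#' toy.repeats_removed.vcf | cut -f1,2
```

The surviving records are chr1:100, chr1:201 and chr3:10. Position 100 on chr1 survives because the interval starting at 0-based 100 begins at 1-based 101, which is exactly the off-by-one that mixing BED and CHROM/POS files can introduce.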
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Pierre |
