How can I filter my R dataframe based on a column match AND ranging values?

I am trying to filter a VCF file based on the chromosome and genomic ranges present in my annotation file.

My annotation file looks like this:

CHROM START END
chr1 64833245 65067732
chr10 6010689 6062367
chr11 36591943 36598236
chr11 36568007 36579762

And my VCF file:

CHROM POS ID REF ALT
chr1 3 . A G
chr10 6020671 . T C
chr11 36591872 . T G
chr11 36567002 . G A

So, I need to filter my VCF so that it keeps only the variants whose CHROM matches an annotation row and whose position (POS) falls between that row's "START" and "END" values.

Is there an easy way?



Solution 1:[1]

Not in R, but since the annotation file is in BED format and the other file is a VCF, you can use

bedtools intersect

the documentation: https://bedtools.readthedocs.io/en/latest/content/tools/intersect.html

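the command line: bedtools intersect -a <FILEA> -b <FILEB>

Here <FILEA> is the VCF and <FILEB> is the BED file, so the output contains the VCF records that overlap an annotated interval. Adding -header keeps the VCF header in the output, and -u reports each variant only once even if it overlaps several intervals.

If you need sorted input (for example, for the -sorted option), sort the BED file by chromosome and then by start position: sort -k1,1 -k2,2n in.bed > in.sorted.bed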
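If bedtools is not installed, the same filter can be sketched with plain awk for small files. This is only an illustration under some assumptions, not part of the original answer: the file names (annotation.bed, variants.vcf, filtered.vcf) are made up, both files are tab-separated, the annotation file has no header line, and START and END are treated as inclusive positions as described in the question (bedtools itself reads BED starts as 0-based).

```shell
#!/bin/sh
set -eu

# Hypothetical working directory and file names, used only for this sketch.
workdir=$(mktemp -d)

# Sample intervals from the question, tab-separated, without a header line.
{
    printf 'chr1\t64833245\t65067732\n'
    printf 'chr10\t6010689\t6062367\n'
    printf 'chr11\t36591943\t36598236\n'
    printf 'chr11\t36568007\t36579762\n'
} > "$workdir/annotation.bed"

# Sample variants from the question, tab-separated, with the header line.
{
    printf 'CHROM\tPOS\tID\tREF\tALT\n'
    printf 'chr1\t3\t.\tA\tG\n'
    printf 'chr10\t6020671\t.\tT\tC\n'
    printf 'chr11\t36591872\t.\tT\tG\n'
    printf 'chr11\t36567002\t.\tG\tA\n'
} > "$workdir/variants.vcf"

awk -F'\t' '
    # First file (intervals): store every range, grouped by chromosome.
    NR == FNR { n[$1]++; lo[$1, n[$1]] = $2; hi[$1, n[$1]] = $3; next }

    # Second file (variants): pass header lines through unchanged.
    /^#/ || $1 == "CHROM" { print; next }

    # Keep a variant if POS lies inside any interval on the same chromosome.
    # Assumption: START and END are both inclusive.
    {
        for (i = 1; i <= n[$1]; i++)
            if ($2 + 0 >= lo[$1, i] + 0 && $2 + 0 <= hi[$1, i] + 0) { print; next }
    }
' "$workdir/annotation.bed" "$workdir/variants.vcf" > "$workdir/filtered.vcf"

cat "$workdir/filtered.vcf"
```

With the sample data from the question, only the chr10 variant at position 6020671 falls inside an annotated interval, so the output is the header line followed by that single record. For real-sized VCF files, bedtools is likely the better choice, since this sketch checks every interval on a chromosome for every variant.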

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 ekerde