How can I filter my R dataframe based on a column match AND a range of values?
I am trying to filter a VCF file based on the chromosome and genomic ranges present in my annotation file.
My annotation file looks like this:
| CHROM | START | END |
|---|---|---|
| chr1 | 64833245 | 65067732 |
| chr10 | 6010689 | 6062367 |
| chr11 | 36591943 | 36598236 |
| chr11 | 36568007 | 36579762 |
And my VCF file:
| CHROM | POS | ID | REF | ALT |
|---|---|---|---|---|
| chr1 | 3 | . | A | G |
| chr10 | 6020671 | . | T | C |
| chr11 | 36591872 | . | T | G |
| chr11 | 36567002 | . | G | A |
So, I need to filter my VCF based on a CHROM match and on the variant position falling between the annotation values "START" and "END".
Is there an easy way?
Solution 1:[1]
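Not in R, but the annotation file is essentially in BED format and the other file is a VCF, so you can use bedtools intersect.
Documentation: https://bedtools.readthedocs.io/en/latest/content/tools/intersect.html
Command line: bedtools intersect -a <FILEA> -b <FILEB>
Here the VCF goes in as -a and the annotation (BED) file as -b, so the output contains the VCF records that overlap an annotated range.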
To sort the BED file by chromosome and then by start position (required if you use the -sorted option):
sort -k1,1 -k2,2n in.bed > in.sorted.bed
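If bedtools is not at hand, the filter described in the question (same CHROM, and POS between START and END) can be sketched with plain awk. This is only an illustration under some assumptions: the file names are hypothetical, both files are tab-separated, the annotation file has no header line, and START and END are treated as inclusive bounds on POS. bedtools itself interprets BED coordinates as 0-based and half-open, so results can differ at interval edges.

```shell
#!/bin/sh
# Hypothetical working files; their contents mirror the tables in the question.
workdir=$(mktemp -d)

# Annotation intervals: CHROM, START, END (tab-separated, header line omitted).
printf '%s\t%s\t%s\n' \
    chr1  64833245 65067732 \
    chr10 6010689  6062367 \
    chr11 36591943 36598236 \
    chr11 36568007 36579762 > "$workdir/annotation.bed"

# VCF column header plus the four example variants.
printf '%s\t%s\t%s\t%s\t%s\n' \
    '#CHROM' POS ID REF ALT \
    chr1  3        . A G \
    chr10 6020671  . T C \
    chr11 36591872 . T G \
    chr11 36567002 . G A > "$workdir/variants.vcf"

# filter_vcf ANNOTATION VCF
# Prints VCF header lines and every variant whose CHROM matches an interval
# and whose POS lies within [START, END] (inclusive, by assumption).
filter_vcf() {
    awk -F '\t' '
        # First file: remember every interval.
        NR == FNR { n++; chrom[n] = $1; lo[n] = $2 + 0; hi[n] = $3 + 0; next }
        # Second file: pass header lines through unchanged.
        /^#/ { print; next }
        # Second file: print a variant once if any interval contains it.
        {
            for (i = 1; i <= n; i++)
                if ($1 == chrom[i] && $2 + 0 >= lo[i] && $2 + 0 <= hi[i]) { print; next }
        }
    ' "$1" "$2"
}

filtered=$(filter_vcf "$workdir/annotation.bed" "$workdir/variants.vcf")
printf '%s\n' "$filtered"

rm -rf "$workdir"
```

On the example data from the question, only the chr10 variant at position 6020671 falls inside an annotated range, so the output is the header line followed by that single record. The loop checks every interval for every variant, which is fine for small annotation files; for large ones, bedtools is the better tool.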
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | ekerde |
