Creating a per-sample table from a VCF using bcftools

I have a multi-sample VCF file and I want a table with the sample IDs in the left column, each followed by the variants for which that sample carries an alternate allele. It should look like this:

ID1 chr2:87432:A:T_0/1 chr10:43234:C:G_1/1
ID2 chr2:87432:A:T_1/1
ID3 chr11:432434:T:G chr14:34234234:C:G chr20:34324234:T:C

The table will then be read into R.

I have tried combinations of:

bcftools query -f '[%SAMPLE\t] %CHROM:%POS:%REF:%ALT[%GT]\n'

but I keep getting all the sample IDs on the same line, and I can't quite figure out the syntax.

Your help would be much appreciated.



Solution 1:[1]

You cannot achieve what you want with a single BCFtools command, because BCFtools processes a VCF one variant record at a time and so cannot gather all of a sample's variants onto one line. However, you can use a command like this to split the file by sample first:

bcftools +split -i 'GT="0/1" | GT="1/1"' -Ob -o DIR input.vcf

This will create one small .bcf file for each sample in DIR, and you can then run one instance of bcftools query per file to get what you want.
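
The follow-up step could be sketched roughly as below. This is only an illustration, not part of the original answer: the function name per_sample_table is made up, and it assumes that +split names each output file <sample>.bcf and that each file contains only the records that passed the -i filter for that sample. The format string mirrors the layout requested in the question.

```shell
#!/bin/sh
# Hypothetical helper: print one tab-separated line per sample,
# "<sample>\t<chrom:pos:ref:alt_GT>\t...", from a directory written by
# `bcftools +split`. Assumes files are named <sample>.bcf (an assumption,
# check the actual file names in your output directory).
per_sample_table() {
    dir=$1
    for f in "$dir"/*.bcf; do
        [ -e "$f" ] || continue          # nothing to do if the glob matched no files
        sample=$(basename "$f" .bcf)     # sample ID taken from the file name
        # One "CHROM:POS:REF:ALT_GT" string per record; awk joins them
        # onto a single line that starts with the sample ID.
        bcftools query -f '%CHROM:%POS:%REF:%ALT[_%GT]\n' "$f" |
            awk -v id="$sample" 'BEGIN { line = id }
                                 { line = line "\t" $0 }
                                 END { print line }'
    done
}
```

It would be called as per_sample_table DIR > table.tsv. Because the rows have different numbers of fields, in R the file is probably easier to read with readLines() followed by strsplit() than with read.table().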

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 freeseek