A standard variation file format for human genome sequences

Reese, Martin G; Moore, Barry; Batchelor, Colin; Salas, Fidel; Cunningham, Fiona; Marth, Gabor T; Stein, Lincoln; Flicek, Paul; Yandell, Mark; Eilbeck, Karen

doi:10.1186/gb-2010-11-8-r88

Table 1. A summary of the GVF tag-value pairs and whether each is required

From: A standard variation file format for human genome sequences

Tag	Value	Necessity	Description
ID	String	Mandatory	While the GFF3 specification considers the ID tag to be optional, GVF requires it. As in GFF3 this ID must be unique within the file and is not required to have meaning outside of the file
			ID = chr1:Soap:SNP:12345;
			ID = rs10399749;
Variant_seq	String	Optional	All sequences found in this individual (or group of individuals) at a variant location are given with the Variant_seq tag. If the sequence is longer than 50 nucleotides, the sequence may be abbreviated as '~'. In the case where the variant represents a deletion of sequence relative to the reference, the Variant_seq is given as '-'
			Variant_seq = A,T;
Reference_seq	String	Optional	The reference sequence corresponding to the start and end coordinates of this feature
			Reference_seq = G;
Variant_reads	Integer	Optional	The number of reads supporting each variant at this location
			Variant_reads = 34, 23;
Total_reads	Integer	Optional	The total number of reads covering a variant
			Total_reads = 57;
Genotype	String	Optional	The genotype of this variant: one of heterozygous, homozygous, or hemizygous
			Genotype = heterozygous;
Variant_freq	Real number between 0 and 1	Optional	A real number describing the frequency of the variant in a population. The details of the source of the frequency should be described in an attribute-method pragma as discussed above. The order of the values given must be in the same order that the corresponding sequences occur in the Variant_seq tag
			Variant_freq = 0.05;
Variant_effect	[1] String: SO term sequence_variant; [2] Integer: index; [3] String: SO sequence_feature; [4] String: feature ID	Optional	The effect of a variant on the sequence features that overlap it. It is a four-part, space-delimited tag. The sequence_variant describes the effect of the alteration on the sequence features that follow. Both the sequence_variant and the sequence feature are typed by SO. The 0-based index corresponds to the causative sequence in the Variant_seq tag. The feature ID lists the IDs of the affected features. A variant may have more than one variant effect, depending on the features it intersects
			Variant_effect = sequence_variant 0 mRNA NM_012345, NM_098765;
Variant_copy_number	Integer	Optional	For regions on the variant genome that exist in multiple copies, this tag represents the copy number of the region as an integer value
			Variant_copy_number = 7;
Reference_copy_number	Integer	Optional	For regions on the reference genome that exist in multiple copies, this tag represents the copy number of the region as an integer value
			Reference_copy_number = 5;
Nomenclature	String	Optional	A tag to capture the given nomenclature of the variant, as described by an authority such as the Human Genome Variation Society
			Nomenclature = HGVS: p.Trp26Cys;
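
The tag-value pairs in Table 1 can be read with a few lines of code. The following is a minimal sketch in Python, not a reference parser: it assumes the attributes arrive as one string of `tag=value` pairs separated by `;`, that multiple values are comma-separated, and that whitespace around `=` and `,` can be ignored. The function names (`parse_attributes`, `split_values`, `parse_variant_effect`, `check_consistency`) are hypothetical, and the rule that Variant_reads and Variant_freq carry one value per Variant_seq entry is a reading of the descriptions above rather than a stated requirement. The example string is assembled from the example rows of the table.

```python
from typing import Dict, List


def parse_attributes(column: str) -> Dict[str, str]:
    """Split an attribute string into {tag: raw value}."""
    attributes: Dict[str, str] = {}
    for pair in column.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate the trailing ';'
        tag, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"not a tag=value pair: {pair!r}")
        attributes[tag.strip()] = value.strip()
    return attributes


def split_values(value: str) -> List[str]:
    """Split a comma-separated value list, dropping surrounding spaces."""
    return [item.strip() for item in value.split(",")]


def parse_variant_effect(value: str) -> dict:
    """Split the four space-delimited parts of a Variant_effect value."""
    parts = value.split(None, 3)  # the fourth part keeps its commas
    if len(parts) != 4:
        raise ValueError(f"expected four parts in Variant_effect: {value!r}")
    effect, index, feature_type, feature_ids = parts
    return {
        "effect": effect,              # SO sequence_variant term
        "index": int(index),           # 0-based index into Variant_seq
        "feature_type": feature_type,  # SO sequence_feature term
        "feature_ids": split_values(feature_ids),
    }


def check_consistency(attributes: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means none were found."""
    problems: List[str] = []
    if "ID" not in attributes:
        problems.append("missing mandatory ID tag")
    if "Variant_seq" in attributes:
        seqs = split_values(attributes["Variant_seq"])
        # Assumption: per-variant tags list one value per Variant_seq entry.
        for tag in ("Variant_reads", "Variant_freq"):
            if tag in attributes and len(split_values(attributes[tag])) != len(seqs):
                problems.append(f"{tag} count differs from Variant_seq count")
        if "Variant_effect" in attributes:
            index = parse_variant_effect(attributes["Variant_effect"])["index"]
            if not 0 <= index < len(seqs):
                problems.append("Variant_effect index is outside Variant_seq")
    return problems


example = (
    "ID=chr1:Soap:SNP:12345;Variant_seq=A,T;Reference_seq=G;"
    "Variant_reads=34,23;Total_reads=57;Genotype=heterozygous;"
    "Variant_effect=sequence_variant 0 mRNA NM_012345,NM_098765;"
)
attrs = parse_attributes(example)
print(attrs["ID"])
print(parse_variant_effect(attrs["Variant_effect"]))
print(check_consistency(attrs))
```

Values are kept as raw strings and split only on request because the comma means different things in different tags: it separates alleles in Variant_seq but feature IDs in the last part of Variant_effect.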

For Dbxrefs, the format of each type of ID varies from database to database. An authoritative list of databases, their DBTAGs, and the URL transformation rules that can be used to fetch the objects given their IDs is available [45], and further details are given in [46]. In addition, a Dbxref can be given as a stable Uniform Resource Identifier (URI).
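
As a rough illustration of the two Dbxref forms mentioned here, the sketch below separates a `DBTAG:ID` reference from a URI. It is an assumption-laden example: the function name `parse_dbxref` is hypothetical, the test for a URI (presence of `://`) is a simple heuristic chosen for this sketch, and the sample values are illustrative only. It does not apply any of the URL transformation rules from the database list.

```python
def parse_dbxref(value: str) -> dict:
    """Classify a Dbxref value as a URI or as a DBTAG:ID pair."""
    value = value.strip()
    # Heuristic (assumption): anything containing '://' is treated as a URI.
    if "://" in value:
        return {"kind": "uri", "uri": value}
    # Otherwise split on the first ':' only, since IDs may contain colons.
    dbtag, sep, identifier = value.partition(":")
    if not sep or not dbtag or not identifier:
        raise ValueError(f"expected DBTAG:ID, got {value!r}")
    return {"kind": "dbtag", "dbtag": dbtag, "id": identifier}


# Illustrative values, not taken from the table.
print(parse_dbxref("dbSNP:rs10399749"))
print(parse_dbxref("http://example.org/variant/123"))
```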


ISSN: 1474-760X


Genome Biology
