
Table 1 A summary of the tag-value pairs, and their requirement for GVF

From: A standard variation file format for human genome sequences

Tag

Value

Necessity

Description

ID

String

Mandatory

While the GFF3 specification considers the ID tag to be optional, GVF requires it. As in GFF3, this ID must be unique within the file and is not required to have meaning outside of the file

   

ID = chr1:Soap:SNP:12345;

   

ID = rs10399749;

Variant_seq

String

Optional

All sequences found in this individual (or group of individuals) at a variant location are given with the Variant_seq tag. If the sequence is longer than 50 nucleotides, the sequence may be abbreviated as '~'. In the case where the variant represents a deletion of sequence relative to the reference, the Variant_seq is given as '-'

   

Variant_seq = A,T;

Reference_seq

String

Optional

The reference sequence corresponding to the start and end coordinates of this feature

   

Reference_seq = G;

Variant_reads

Integer

Optional

The number of reads supporting each variant at this location

   

Variant_reads = 34, 23;

Total_reads

Integer

Optional

The total number of reads covering a variant

   

Total_reads = 57;

Genotype

String

Optional

The genotype of this variant, either heterozygous, homozygous, or hemizygous

   

Genotype = heterozygous;

Variant_freq

Real number between 0 and 1

Optional

A real number describing the frequency of the variant in a population. The details of the source of the frequency should be described in an attribute-method pragma as discussed above. The values must be given in the same order as the corresponding sequences in the Variant_seq tag

   

Variant_freq = 0.05;

Variant_effect

[1] String: SO term sequence_variant

[2] Integer: index

[3] String: SO sequence_feature

[4] String: feature ID

Optional

The effect of a variant on sequence features that overlap it. It is a four-part, space-delimited tag. The sequence_variant term describes the effect of the alteration on the sequence features that follow. Both the variant term and the feature type are typed by SO. The 0-based index corresponds to the causative sequence in the Variant_seq tag. The feature ID lists the IDs of affected features. A variant may have more than one variant effect, depending on the intersected features

   

Variant_effect = sequence_variant 0 mRNA NM_012345, NM_098765;

Variant_copy_number

Integer

Optional

For regions on the variant genome that exist in multiple copies, this tag represents the copy number of the region as an integer value

   

Variant_copy_number = 7;

Reference_copy_number

Integer

Optional

For regions on the reference genome that exist in multiple copies, this tag represents the copy number of the region as an integer value

   

Reference_copy_number = 5;

Nomenclature

String

Optional

A tag to capture the given nomenclature of the variant, as described by an authority such as the Human Genome Variation Society

   

Nomenclature = HGVS: p.Trp26Cys;

  1. For Dbxrefs, the format of each type of ID varies from database to database. An authoritative list of databases, their DBTAGs, and the URL transformation rules that can be used to fetch the objects given their IDs can be found at this location [45]. Further details can be found here [46]. In addition, a Dbxref can be given as a stable Uniform Resource Identifier (URI).
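To make the tag-value layout in Table 1 concrete, the following is a minimal Python sketch of how an attribute column built from the example values above might be read. It is an illustration under stated assumptions, not a reference GVF parser: the function names (`parse_attributes`, `split_values`, `parse_variant_effect`) are hypothetical, the sketch assumes semicolon-separated `tag = value` pairs with comma-separated multiple values as in the examples, it ignores character escaping, and it handles only a single Variant_effect entry per feature.

```python
import re


def parse_attributes(column):
    """Split an attribute column into a {tag: raw_value} dictionary.

    Assumes semicolon-separated tag=value pairs; whitespace around tags
    and values is stripped so the spaced examples in Table 1 are accepted.
    """
    attributes = {}
    for pair in column.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing semicolon
        tag, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"not a tag=value pair: {pair!r}")
        attributes[tag.strip()] = value.strip()
    return attributes


def split_values(raw):
    """Split a comma-separated value such as 'A,T' or '34, 23' into a list."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def parse_variant_effect(raw, variant_seqs):
    """Read the four-part, space-delimited Variant_effect value.

    The second part is a 0-based index into the Variant_seq list.
    Feature IDs are split on commas and/or whitespace (an assumption
    that accepts the comma-separated form shown in Table 1).
    """
    parts = raw.split(None, 3)
    if len(parts) != 4:
        raise ValueError(f"expected four parts in Variant_effect: {raw!r}")
    effect, index, feature_type, ids = parts
    return {
        "effect": effect,
        "variant_seq": variant_seqs[int(index)],
        "feature_type": feature_type,
        "feature_ids": [i for i in re.split(r"[,\s]+", ids) if i],
    }


# Attribute column assembled from the example values in Table 1.
column = (
    "ID = chr1:Soap:SNP:12345; Variant_seq = A,T; Reference_seq = G; "
    "Variant_reads = 34, 23; Total_reads = 57; Genotype = heterozygous; "
    "Variant_effect = sequence_variant 0 mRNA NM_012345, NM_098765;"
)

attrs = parse_attributes(column)
if "ID" not in attrs:
    raise ValueError("GVF requires the ID tag")  # ID is the mandatory tag

variant_seqs = split_values(attrs["Variant_seq"])
variant_reads = [int(n) for n in split_values(attrs["Variant_reads"])]
effect = parse_variant_effect(attrs["Variant_effect"], variant_seqs)

# Variant_reads follows the same order as Variant_seq.
for seq, reads in zip(variant_seqs, variant_reads):
    print(seq, reads)
print(effect)
```

Keeping the raw value per tag and splitting it only on request is a deliberate choice here, because the Variant_effect value mixes spaces and commas and would be damaged by a blanket comma split.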