
Table 1 A summary of the tag-value pairs and their requirements in GVF

From: A standard variation file format for human genome sequences

Tag Value Necessity Description
ID String Mandatory While the GFF3 specification considers the ID tag optional, GVF requires it. As in GFF3, the ID must be unique within the file and need not have any meaning outside the file
    ID = chr1:Soap:SNP:12345;
    ID = rs10399749;
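Because every GVF record must carry an ID that is unique within the file, a reader can check this while parsing the attribute column. The sketch below assumes GFF3-style `tag=value` pairs separated by semicolons, with commas separating multiple values. It tolerates the spaces around `=` used in the examples above and decodes GFF3 percent-escapes with `urllib.parse.unquote`. The function names `parse_attributes` and `check_ids` are hypothetical.

```python
from collections import Counter
from urllib.parse import unquote


def parse_attributes(column: str) -> dict[str, list[str]]:
    """Parse an attribute column such as 'ID=rs10399749;Variant_seq=A,T;'.

    Splits on ';' into tag=value pairs and on ',' into multiple values,
    then decodes percent-escapes (e.g. '%3B' for ';').
    """
    attributes: dict[str, list[str]] = {}
    for pair in column.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # empty segment left by a trailing ';'
        tag, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"attribute without '=': {pair!r}")
        attributes[unquote(tag.strip())] = [unquote(v.strip()) for v in value.split(",")]
    return attributes


def check_ids(records: list[dict[str, list[str]]]) -> None:
    """Raise ValueError if a record lacks an ID or an ID repeats in the file."""
    ids = []
    for rec in records:
        if "ID" not in rec:
            raise ValueError(f"missing mandatory ID tag: {rec}")
        ids.append(rec["ID"][0])
    duplicates = [i for i, n in Counter(ids).items() if n > 1]
    if duplicates:
        raise ValueError(f"duplicate IDs: {duplicates}")
```

For example, `parse_attributes("ID = rs10399749; Variant_seq = A,T;")` returns `{"ID": ["rs10399749"], "Variant_seq": ["A", "T"]}`. Uniqueness is checked only within the file, matching the requirement that IDs need no meaning outside it.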
Variant_seq String Optional All sequences found in this individual (or group of individuals) at a variant location are given with the Variant_seq tag. Sequences longer than 50 nucleotides may be abbreviated as '~'. Where the variant is a deletion of sequence relative to the reference, the Variant_seq is given as '-'
    Variant_seq = A,T;
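The two special values for Variant_seq ('-' for a deletion and '~' for an abbreviated long sequence) can be applied when writing the tag. This is a minimal sketch. It assumes a deletion is passed in as the empty string, and it writes the compact GFF3 form without spaces around `=`. The names `encode_variant_seq` and `variant_seq_attribute` are hypothetical.

```python
MAX_LITERAL_LENGTH = 50  # sequences longer than this may be abbreviated as '~'


def encode_variant_seq(seq: str, abbreviate: bool = True) -> str:
    """Return the Variant_seq value for one observed sequence.

    An empty string (deletion relative to the reference) becomes '-';
    a sequence longer than 50 nt becomes '~' when abbreviate is True.
    """
    if seq == "":
        return "-"
    if abbreviate and len(seq) > MAX_LITERAL_LENGTH:
        return "~"
    return seq


def variant_seq_attribute(seqs: list[str]) -> str:
    """Join every sequence seen at one location into a Variant_seq attribute."""
    return "Variant_seq=" + ",".join(encode_variant_seq(s) for s in seqs) + ";"
```

Abbreviation is optional in the description above ("may be abbreviated"), so the sketch exposes it as a flag rather than always applying it.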
Reference_seq String Optional The reference sequence corresponding to the start and end coordinates of this feature
    Reference_seq = G;
Variant_reads Integer Optional The number of reads supporting each variant at this location
    Variant_reads = 34, 23;
Total_reads Integer Optional The total number of reads covering a variant
    Total_reads = 57;
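Variant_reads and Total_reads together give the share of covering reads that support each variant. That share is a derived quantity, not a GVF tag. It is also distinct from Variant_freq, which describes a population. The sketch assumes, as a sanity check of our own, that each read supports at most one variant, so the counts cannot sum to more than Total_reads. The function name `read_fractions` is hypothetical.

```python
def read_fractions(variant_reads: list[int], total_reads: int) -> list[float]:
    """Fraction of the reads covering a location that support each variant.

    Raises ValueError if Total_reads is not positive or if the supporting
    reads add up to more than Total_reads (an assumed consistency rule).
    """
    if total_reads <= 0:
        raise ValueError("Total_reads must be positive")
    if sum(variant_reads) > total_reads:
        raise ValueError("supporting reads exceed Total_reads")
    return [n / total_reads for n in variant_reads]
```

With the example values above (34 and 23 supporting reads out of 57 total), the fractions are about 0.60 and 0.40.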
Genotype String Optional The genotype of this variant, either heterozygous, homozygous, or hemizygous
    Genotype = heterozygous;
Variant_freq Real number between 0 and 1 Optional A real number describing the frequency of the variant in a population. The source of the frequency should be described in an attribute-method pragma, as discussed above. Values must be given in the same order as the corresponding sequences in the Variant_seq tag
    Variant_freq = 0.05;
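Because Variant_freq values are matched to Variant_seq entries by position, a reader can pair them and check the 0 to 1 range in one pass. This sketch assumes one frequency per Variant_seq entry, passed as the raw strings from the attribute column. The function name `pair_frequencies` is hypothetical.

```python
def pair_frequencies(variant_seqs: list[str], freq_values: list[str]) -> list[tuple[str, float]]:
    """Pair each Variant_freq value with the Variant_seq entry at the same position.

    Raises ValueError on a count mismatch or a value outside [0, 1].
    """
    if len(freq_values) != len(variant_seqs):
        raise ValueError("Variant_freq needs one value per Variant_seq entry")
    pairs = []
    for seq, raw in zip(variant_seqs, freq_values):
        freq = float(raw)
        if not 0.0 <= freq <= 1.0:  # also rejects NaN
            raise ValueError(f"frequency out of range [0, 1]: {raw!r}")
        pairs.append((seq, freq))
    return pairs
```

The result is a list of pairs rather than a dict, so repeated Variant_seq values (for example several '~' entries) are not collapsed.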
Variant_effect [1] String: SO term (sequence_variant)
[2] Integer: index
[3] String: SO term (sequence_feature)
[4] String: feature ID
Optional The effect of a variant on sequence features that overlap it. It is a four-part, space-delimited tag. The sequence_variant term describes the effect of the alteration on the sequence features that follow, and the sequence_feature term gives the type of the affected feature; both are Sequence Ontology (SO) terms. The 0-based index identifies the causative sequence in the Variant_seq tag. The feature ID lists the IDs of the affected features. A variant may have more than one Variant_effect, depending on the features it intersects
    Variant_effect = sequence_variant 0 mRNA NM_012345, NM_098765;
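The four parts of a Variant_effect value can be unpacked into a small record, and the index can be used to look up the causative sequence in Variant_seq. This is a sketch under stated assumptions. It handles one Variant_effect value at a time and, following the example above, accepts feature IDs separated by commas or spaces. A full parser would have to reconcile that with the GFF3 use of commas between multiple values of one tag. The names `VariantEffect`, `parse_variant_effect` and `causative_sequence` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VariantEffect:
    effect: str             # SO sequence_variant term, e.g. 'sequence_variant'
    index: int              # 0-based index into the Variant_seq values
    feature_type: str       # SO sequence_feature term, e.g. 'mRNA'
    feature_ids: list[str]  # IDs of the affected features


def parse_variant_effect(value: str) -> VariantEffect:
    """Split one Variant_effect value into its four parts.

    Assumption: feature IDs may be separated by commas or whitespace.
    """
    parts = value.replace(",", " ").split()
    if len(parts) < 4:
        raise ValueError(f"expected at least four fields: {value!r}")
    effect, index, feature_type, *feature_ids = parts
    return VariantEffect(effect, int(index), feature_type, feature_ids)


def causative_sequence(ve: VariantEffect, variant_seqs: list[str]) -> str:
    """Return the Variant_seq entry that the effect's index points to."""
    if not 0 <= ve.index < len(variant_seqs):
        raise IndexError(f"index {ve.index} outside Variant_seq")
    return variant_seqs[ve.index]
```

Parsing the example above gives effect `sequence_variant`, index 0, feature type `mRNA` and feature IDs `NM_012345` and `NM_098765`. With `Variant_seq = A,T`, index 0 refers to `A`.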
Variant_copy_number Integer Optional For regions on the variant genome that exist in multiple copies, this tag represents the copy number of the region as an integer value
    Variant_copy_number = 7;
Reference_copy_number Integer Optional For regions on the reference genome that exist in multiple copies, this tag represents the copy number of the region as an integer value
    Reference_copy_number = 5;
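Comparing Variant_copy_number with Reference_copy_number shows whether a region has gained or lost copies in the sequenced genome. The sketch below is illustrative only. Its "gain"/"loss" labels are our own wording, not GVF values, and the function name `copy_number_change` is hypothetical.

```python
def copy_number_change(reference_copy_number: int, variant_copy_number: int) -> str:
    """Describe how the variant copy number differs from the reference copy number."""
    if reference_copy_number < 0 or variant_copy_number < 0:
        raise ValueError("copy numbers must be non-negative")
    delta = variant_copy_number - reference_copy_number
    if delta > 0:
        return f"gain of {delta}"
    if delta < 0:
        return f"loss of {-delta}"
    return "no change"
```

Taking the two example rows together (5 copies in the reference, 7 in the variant genome), the region shows a gain of 2 copies.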
Nomenclature String Optional The name of the variant under a nomenclature defined by an authority such as the Human Genome Variation Society (HGVS)
    Nomenclature = HGVS: p.Trp26Cys;
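The example value pairs the naming authority with the name itself, separated by a colon. The sketch below assumes the `AUTHORITY: name` layout seen in that example. It splits only on the first colon, because HGVS names can themselves contain colons. The function name `split_nomenclature` is hypothetical.

```python
def split_nomenclature(value: str) -> tuple[str, str]:
    """Split 'AUTHORITY: name' into (authority, name) at the first colon only."""
    authority, sep, name = value.partition(":")
    if not sep:
        raise ValueError(f"no authority prefix in {value!r}")
    return authority.strip(), name.strip()
```

For the example above, `split_nomenclature("HGVS: p.Trp26Cys")` returns `("HGVS", "p.Trp26Cys")`.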
  1. For Dbxrefs, the ID format varies from database to database. An authoritative list of databases, their DBTAGs, and the URL transformation rules for fetching objects by ID is available [45], with further details in [46]. A Dbxref can also be given as a stable Uniform Resource Identifier (URI).