Volume 11 Supplement 1

Beyond the Genome: The true gene count, human evolution and disease genomics

Open Access

Genes and genomes, an imperfect world: comparison of gene annotations of two Bos taurus draft assemblies

  • Liliana Florea (1),
  • Alexander Souvorov (2) and
  • Steven L Salzberg (1)
Genome Biology 2010, 11(Suppl 1):P13


Published: 11 October 2010


Gene annotation is the first and most important step in analyzing a genome. As an increasing number of species are being sequenced and assembled to various degrees of completion, two key questions are: how does the quality of the assembled sequence affect gene annotation? What is the impact on scientists using the annotation?


We compared the gene annotations produced with the GNOMON [1] annotation pipeline for two Bos taurus genome assemblies produced at the University of Maryland [2, 3]. We find that the changes to the assembly from one release to the next, which were quite substantial, made significant differences in the gene sets and gene structures, and by implication in the predicted proteins. The annotation was affected by the availability of new gene evidence and by seemingly rare genome mis-assembly events and local sequence variations. For instance, although the later assembly is generally superior, hundreds of protein-coding genes in the earlier assembly are missing from the annotation of the later genome, and ~15% (~3600) of the genes have complex structural differences between the two assemblies. In addition, 15-20% of the predicted proteins have relatively large sequence differences when compared to their RefSeq models.
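As a rough illustration of the bookkeeping such a comparison involves, the sketch below sorts genes from an earlier annotation into those missing from, identical in, or structurally changed in a later one. It is not the authors' procedure: the function name, the gene identifiers, and the exon lengths are all hypothetical, and it assumes genes have already been matched by a shared identifier, which in practice requires mapping models between assemblies.

```python
from typing import Dict, List

# Hypothetical simplification: a gene model is its list of exon lengths.
# Lengths are used instead of coordinates because coordinates shift
# between assembly versions even when the gene structure is unchanged.
GeneModels = Dict[str, List[int]]


def compare_annotations(old: GeneModels, new: GeneModels) -> Dict[str, List[str]]:
    """Classify each gene of the earlier annotation against the later one."""
    report: Dict[str, List[str]] = {
        "missing_in_new": [],
        "identical": [],
        "changed": [],
    }
    for gene_id, old_exons in sorted(old.items()):
        if gene_id not in new:
            report["missing_in_new"].append(gene_id)
        elif new[gene_id] == old_exons:
            report["identical"].append(gene_id)
        else:
            report["changed"].append(gene_id)
    return report


# Toy data, invented for illustration only.
old_models = {"geneA": [120, 85, 240], "geneB": [300], "geneC": [90, 150]}
new_models = {"geneA": [120, 85, 240], "geneC": [90, 60, 150], "geneD": [410]}

report = compare_annotations(old_models, new_models)
changed_fraction = len(report["changed"]) / len(old_models)
```

In this toy input, one gene is absent from the later annotation and one gains an exon. A real comparison would also need to handle split, merged, and partially overlapping models, which is presumably where the "complex structural differences" reported above arise.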


These findings highlight the consequences of genome quality on gene annotation, and argue for continued improvements in a genome sequence until it is truly finished. They also demonstrate the challenges that confront a biologist when tracking a gene of interest between different assembly versions. In addition, these analyses have helped us identify specific loci for improvement, which will benefit the user community of the Bos taurus genome.

Authors’ Affiliations

  1. Center for Bioinformatics and Computational Biology, University of Maryland
  2. National Center for Biotechnology Information, National Institutes of Health


  1. The NCBI GNOMON eukaryotic gene prediction tool. [http://www.ncbi.nlm.nih.gov/genome/guide/gnomon.shtml]
  2. Zimin AV, Delcher AL, Florea L, Kelley DR, Schatz MC, Puiu D, Hanrahan F, Pertea G, Van Tassell CP, Sonstegard TS, et al: A whole-genome assembly of the domestic cow, Bos taurus. Genome Biol. 2009, 10: R42. doi:10.1186/gb-2009-10-4-r42.
  3. Bos taurus assembly site at the University of Maryland. [http://www.cbcb.umd.edu/research/bos_taurus_assembly.shtml]


© Florea et al; licensee BioMed Central Ltd. 2010

This article is published under license to BioMed Central Ltd.