The GENCODE human gene set

Searle, S; Frankish, A; Bignell, A; Aken, B; Derrien, T; Diekhans, M; Harte, R; Howald, C; Kokocinski, F; Lin, M; Tress, M; Van Baren, M; Barnes, I; Hunt, T; Carvalho-Silva, D; Davidson, C; Donaldson, S; Gilbert, J; Kay, M; Lloyd, D; Loveland, J; Mudge, J; Snow, C; Vamathevan, J; Wilming, L; Brent, M; Gerstein, M; Guigó, R; Kellis, M; Reymond, A; Zadissa, A; Valencia, A; Harrow, J; Hubbard, T

doi:10.1186/gb-2010-11-s1-p36

Volume 11 Supplement 1

Beyond the Genome: The true gene count, human evolution and disease genomics

Poster presentation
Published: 11 October 2010

Genome Biology volume 11, Article number: P36 (2010)

The GENCODE consortium is a subgroup of the ENCODE consortium. Its aim is to provide complete, experimentally supported annotation of the genes in the human genome, including protein-coding loci, non-coding loci and pseudogenes. The ultimate goal is for the HAVANA team to annotate the complete genome manually, a time-consuming process that will be completed over the course of the ENCODE project. In the meantime, to provide annotation covering the complete genome rather than only the regions manually annotated so far, the manual annotation from HAVANA is merged with the Ensembl automatically annotated gene set. The merge also adds unique full-length CDS predictions from the Ensembl protein-coding set to manually annotated genes, giving the most complete and up-to-date annotation of the genome possible. The set also includes short and long ncRNA genes predicted by the Ensembl prediction pipelines, and a consensus set of pseudogene predictions agreed between HAVANA, Yale and UCSC. The CCDS set is fully represented within the GENCODE set. The GENCODE set is the default annotation in Ensembl and is also available in the UCSC Genome Browser. All annotation is tagged to indicate whether it was produced by manual annotation alone, automatic annotation alone, or both approaches. We are currently working to assign confidence levels to the annotation, based on the depth and type of supporting evidence.
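The three-way tagging of manual, automatic and shared annotation can be illustrated with a minimal Python sketch. This is not the GENCODE or Ensembl merge code. It assumes, purely for illustration, that a manual and an automatic model count as "the same" only when chromosome, strand and every exon coordinate match exactly; the helper names `structure` and `merge_and_tag` and the tag strings `"manual"`, `"automatic"` and `"both"` are hypothetical. The real merge also does more than this (for example, adding Ensembl CDS predictions into manually annotated genes), which the sketch does not attempt.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical representation: an exon is (start, end); a transcript model is
# (chromosome, strand, exons sorted by coordinate) so it can be compared and hashed.
Exon = Tuple[int, int]
Structure = Tuple[str, str, Tuple[Exon, ...]]


def structure(chrom: str, strand: str, exons: Iterable[Exon]) -> Structure:
    """Normalise a transcript model to a hashable key with exons sorted by position."""
    return (chrom, strand, tuple(sorted(exons)))


def merge_and_tag(manual: Iterable[Structure],
                  automatic: Iterable[Structure]) -> Dict[Structure, str]:
    """Take the union of two annotation sets and tag each model by its origin.

    Assumption for illustration only: two models are identical when chromosome,
    strand and every exon coordinate match exactly.
    """
    manual_set = set(manual)
    auto_set = set(automatic)
    tags: Dict[Structure, str] = {}
    for model in manual_set | auto_set:
        if model in manual_set and model in auto_set:
            tags[model] = "both"        # produced by both approaches
        elif model in manual_set:
            tags[model] = "manual"      # manual annotation alone
        else:
            tags[model] = "automatic"   # automatic annotation alone
    return tags


if __name__ == "__main__":
    havana = [structure("chr1", "+", [(100, 200), (300, 400)])]
    ensembl = [
        structure("chr1", "+", [(300, 400), (100, 200)]),  # same model, exons listed in another order
        structure("chr1", "+", [(500, 600)]),              # model found only by the automatic pipeline
    ]
    for model, tag in sorted(merge_and_tag(havana, ensembl).items()):
        print(model, tag)
    # ('chr1', '+', ((100, 200), (300, 400))) both
    # ('chr1', '+', ((500, 600),)) automatic
```

Exact matching is the simplest possible criterion; a production pipeline would probably need to tolerate differences such as UTR extent when deciding that two models agree.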

Several analysis groups in the GENCODE consortium run pipelines that help the manual annotators build models in unannotated regions and identify missing or incorrect manual annotation, including entirely missing loci, missing alternative isoforms, incorrect splice sites and incorrect biotypes. These findings are fed back to the manual annotators through a tracking system. Some of these pipelines use data from other ENCODE subgroups, including RNA-seq, histone modification, CAGE and ditag data. RNA-seq data are an important new source of evidence, but generating complete gene models from them is a difficult problem. As part of GENCODE, a competition was run to assess the quality of the predictions produced by various RNA-seq prediction pipelines. To confirm uncertain models, GENCODE also has an experimental validation pipeline using RNA sequencing and RACE.

Author information

Authors and Affiliations

Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
S Searle, A Frankish, A Bignell, B Aken, F Kokocinski, I Barnes, T Hunt, D Carvalho-Silva, C Davidson, S Donaldson, J Gilbert, M Kay, D Lloyd, J Loveland, J Mudge, C Snow, J Vamathevan, L Wilming, A Zadissa, J Harrow & T Hubbard
Spanish National Cancer Research Centre (CNIO), Madrid, Spain
M Tress & A Valencia
MIT Computer Science and AI Laboratory, Broad Institute, Cambridge, MA, USA
M Lin & M Kellis
Laboratory for Computational Genomics and Department of Computer Science, Washington University, St. Louis, Missouri, USA
M Van Baren & M Brent
Centre for Genomic Regulation, Barcelona, Catalonia, Spain
T Derrien & R Guigó
Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
M Gerstein
Center for Biomolecular Science and Engineering, University of California, Santa Cruz, CA, USA
M Diekhans & R Harte
Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
A Reymond

About this article

Cite this article

Searle, S., Frankish, A., Bignell, A. et al. The GENCODE human gene set. Genome Biol 11 (Suppl 1), P36 (2010). https://doi.org/10.1186/gb-2010-11-s1-p36
