Defining the human reference protein-coding gene set

Balasubramanian, Suganthi; Habegger, Lukas; Frankish, Adam; MacArthur, Daniel; Harte, Rachel; Tyler-Smith, Chris; Harrow, Jennifer; Gerstein, Mark

doi:10.1186/gb-2010-11-s1-o5

Volume 11 Supplement 1

Beyond the Genome: The true gene count, human evolution and disease genomics

Selected oral presentation
Published: 11 October 2010

Genome Biology volume 11, Article number: O5 (2010)

The number of coding genes in the human genome is still under debate [1]. Here, we present a proposal to define the human reference gene set that takes into account inter-individual differences in gene number arising from gene inactivation events, such as premature termination caused by nonsense SNPs or aberrant splicing caused by SNPs at essential splice sites. We analyzed both classes of SNPs in 23 personal genomes and exomes. The number of SNPs in each of these categories spans a wide range, and a large fraction of the SNPs are singletons. Using a data set of high-confidence SNPs obtained by intersecting SNPs from dbSNP with those from the personal genomes, we identify a common set of 279 genes predicted to be pseudogenic (non-functional) in some individuals and functional in others.
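The intersection step and the search for genes that are inactivated in only some individuals can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Snp` record layout, the effect labels, the function names and the toy data are all assumptions made for illustration, and zygosity is ignored, so a single disrupting SNP is treated as inactivating the gene in that individual.

```python
from collections import defaultdict
from typing import NamedTuple


class Snp(NamedTuple):
    """Hypothetical SNP record; field names are illustrative assumptions."""
    chrom: str
    pos: int
    alt: str
    gene: str
    effect: str  # e.g. "nonsense", "splice_site", "missense"


# The two SNP categories discussed in the abstract (labels are assumed).
LOF_EFFECTS = {"nonsense", "splice_site"}


def high_confidence_snps(dbsnp, personal):
    """Keep gene-disrupting SNPs that appear in both dbSNP and a personal genome."""
    dbsnp_keys = {(s.chrom, s.pos, s.alt) for s in dbsnp}
    return {
        individual: [
            s for s in snps
            if s.effect in LOF_EFFECTS and (s.chrom, s.pos, s.alt) in dbsnp_keys
        ]
        for individual, snps in personal.items()
    }


def variably_functional_genes(kept, n_individuals):
    """Genes disrupted in at least one individual but not in all of them."""
    carriers = defaultdict(set)
    for individual, snps in kept.items():
        for s in snps:
            carriers[s.gene].add(individual)
    return {gene for gene, who in carriers.items() if 0 < len(who) < n_individuals}


# Toy data, invented purely to exercise the functions.
dbsnp = [
    Snp("1", 100, "T", "GENE_A", "nonsense"),
    Snp("2", 200, "A", "GENE_B", "splice_site"),
]
personal = {
    "ind1": [Snp("1", 100, "T", "GENE_A", "nonsense"),
             Snp("3", 300, "G", "GENE_C", "nonsense")],  # GENE_C SNP is not in dbSNP
    "ind2": [Snp("2", 200, "A", "GENE_B", "splice_site")],
    "ind3": [],
}

kept = high_confidence_snps(dbsnp, personal)
variable = variably_functional_genes(kept, len(personal))
```

In this toy input, the `GENE_C` SNP is dropped because it is absent from the dbSNP list, which mirrors the idea of trading sensitivity for confidence when calls from two sources are intersected.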

We focused on two key questions arising from these considerations: (i) Which criteria should be used for inclusion and exclusion of genes from the reference set? (ii) What sequence should be used as the reference for genes that are non-functional in some humans? For the first question, we propose to include every gene that is functional in at least one individual, producing a maximally inclusive set of genes. For the second, we propose the use of the ancestral allele as the reference allele. This will provide a uniform basis for gene annotation and ensure that the reference gene set and sequence remain relatively stable as more individual genomes are sequenced. In the few cases where an ancestral state assignment is unavailable or ambiguous, we propose that genes be annotated using the functional allele.
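The two proposed rules can be written down as a short Python sketch. The function names, the dictionary input and the convention of marking an unavailable or ambiguous ancestral state with `None` or a non-ACGT symbol are assumptions made here for illustration; the abstract does not specify any data representation.

```python
from typing import Mapping, Optional

VALID_BASES = {"A", "C", "G", "T"}


def include_in_reference(functional_by_individual: Mapping[str, bool]) -> bool:
    """Rule (i): include a gene if it is functional in at least one individual."""
    return any(functional_by_individual.values())


def reference_allele(ancestral: Optional[str], functional: str) -> str:
    """Rule (ii): prefer the ancestral allele; fall back to the functional allele.

    An ancestral state of None, or any symbol outside A/C/G/T (for example "N"),
    is treated as unavailable or ambiguous. That encoding is an assumption.
    """
    if ancestral in VALID_BASES:
        return ancestral
    return functional


# Toy examples, invented for illustration.
gene_kept = include_in_reference({"ind1": False, "ind2": True})
gene_dropped = include_in_reference({"ind1": False, "ind2": False})
ref_known = reference_allele("C", "T")
ref_missing = reference_allele(None, "T")
ref_ambiguous = reference_allele("N", "T")
```

Because the inclusion rule only asks whether any individual carries a functional copy, adding more genomes can add genes to the set but cannot remove one, which is consistent with the stability argument made above.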

References

1. Pertea M, Salzberg SL: Between a chicken and a grape: estimating the number of human genes. Genome Biol. 2010, 11: 206. doi:10.1186/gb-2010-11-5-206.

Author information

Authors and Affiliations

Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
Suganthi Balasubramanian, Lukas Habegger & Mark Gerstein
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK
Adam Frankish, Daniel MacArthur, Chris Tyler-Smith & Jennifer Harrow
Department of Biomolecular Engineering, University of California, Santa Cruz, 1156 High Street, Santa Cruz, California, 95064, USA
Rachel Harte

About this article

Cite this article

Balasubramanian, S., Habegger, L., Frankish, A. et al. Defining the human reference protein-coding gene set. Genome Biol 11 (Suppl 1), O5 (2010). https://doi.org/10.1186/gb-2010-11-s1-o5
