Volume 11 Supplement 1

Beyond the Genome: The true gene count, human evolution and disease genomics

Open Access

Defining the human reference protein-coding gene set

  • Suganthi Balasubramanian (1),
  • Lukas Habegger (1),
  • Adam Frankish (2),
  • Daniel MacArthur (2),
  • Rachel Harte (3),
  • Chris Tyler-Smith (2),
  • Jennifer Harrow (2) and
  • Mark Gerstein (1)
Genome Biology 2010, 11(Suppl 1):O5

https://doi.org/10.1186/gb-2010-11-s1-o5

Published: 11 October 2010

The number of protein-coding genes in the human genome is still under debate [1]. Here, we present a proposal to define the human reference gene set that takes into account inter-individual differences in gene number arising from gene inactivation events, such as premature termination or aberrant splicing due to nonsense SNPs or SNPs at essential splice sites, respectively. We have analyzed both classes of SNPs in 23 personal genomes and exomes. The number of SNPs in each category varies widely between individuals, and a large fraction of these SNPs are singletons. Using a data set of high-confidence SNPs obtained by intersecting SNPs from dbSNP with those from the personal genomes, we identify a common set of 279 genes predicted to be pseudogenic (non-functional) in some individuals and functional in others.
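The filtering and classification steps described above can be sketched as follows. This is only an illustrative outline, not the pipeline used in the study: the function names, the input layout (a set of SNP identifiers per individual and a SNP-to-gene lookup) and the simplification that carrying an inactivating SNP means the gene is non-functional in that individual (zygosity is ignored) are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def high_confidence(personal_snps, dbsnp_ids):
    """Keep, per individual, only the SNPs that are also present in dbSNP."""
    return {person: snps & dbsnp_ids for person, snps in personal_snps.items()}


def singletons(personal_snps):
    """Return the SNPs observed in exactly one individual."""
    counts = Counter(s for snps in personal_snps.values() for s in snps)
    return {snp for snp, n in counts.items() if n == 1}


def variably_inactivated_genes(personal_snps, dbsnp_ids, snp_to_gene):
    """Genes hit by a high-confidence inactivating SNP in some, but not all, individuals.

    Assumption: every SNP passed in is already a nonsense or essential
    splice-site SNP, and carrying it is treated as loss of the gene.
    """
    carriers = defaultdict(set)
    for person, snps in high_confidence(personal_snps, dbsnp_ids).items():
        for snp in snps:
            carriers[snp_to_gene[snp]].add(person)
    total = len(personal_snps)
    return {gene for gene, people in carriers.items() if 0 < len(people) < total}
```

A gene carried in inactivated form by every sampled individual is excluded here because, in this simplified model, no sampled individual has a functional copy.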

We focused on two key questions arising from these considerations: (i) Which criteria should be used for inclusion and exclusion of genes from the reference set? (ii) What sequence should be used as the reference for genes that are non-functional in some humans? For the first question, we propose to include every gene that is functional in at least one individual, producing a maximally inclusive set of genes. For the second, we propose the use of the ancestral allele as the reference allele. This will provide a uniform basis for gene annotation and ensure that the reference gene set and sequence remain relatively stable as more individual genomes are sequenced. In the few cases where an ancestral state assignment is unavailable or ambiguous, we propose that genes be annotated with the functional allele.
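The two proposed rules can be written down as a short decision procedure. The sketch below is a hedged illustration only: the function names are hypothetical, and representing an unavailable ancestral state as `None` and an ambiguous one as any value other than a single A, C, G or T is an assumption of this example, not something specified in the abstract.

```python
from typing import Mapping, Optional

UNAMBIGUOUS_BASES = {"A", "C", "G", "T"}


def include_in_reference(functional_by_individual: Mapping[str, bool]) -> bool:
    """Rule (i): include a gene if it is functional in at least one individual."""
    return any(functional_by_individual.values())


def choose_reference_allele(ancestral: Optional[str], functional: str) -> str:
    """Rule (ii): prefer the ancestral allele; fall back to the functional allele.

    Assumption: `ancestral` is None when no assignment exists, and any value
    outside A/C/G/T (for example "N") marks an ambiguous assignment.
    """
    if ancestral in UNAMBIGUOUS_BASES:
        return ancestral
    return functional
```

For example, `choose_reference_allele("T", "C")` returns the ancestral `"T"`, while `choose_reference_allele(None, "C")` falls back to the functional `"C"`.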

Authors’ Affiliations

(1) Department of Molecular Biophysics and Biochemistry, Yale University
(2) The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus
(3) Department of Biomolecular Engineering, University of California, Santa Cruz

References

  1. Pertea M, Salzberg SL: Between a chicken and a grape: estimating the number of human genes. Genome Biol. 2010, 11: 206. doi:10.1186/gb-2010-11-5-206.

Copyright

© Gerstein et al; licensee BioMed Central Ltd. 2010

This article is published under license to BioMed Central Ltd.
