What transcripts are found in a human cell?
- Hamish Scott
© BioMed Central Ltd 2000
Received: 14 February 2000
Published: 27 April 2000
Serial analysis of gene expression (SAGE) has been used to analyze the complete human 'transcriptome' - the number, identity and level of expression of genes in humans.
Significance and context
Understanding how the spatial and temporal expression patterns of individual genes relate to expression patterns of other genes and to changes in the organism's behavioral state, onset of disease and response to drugs should provide valuable insight into molecular physiology. Hence the great interest in high-throughput differential gene-expression technologies that can quantitatively analyze all mRNAs expressed by a cell or tissue type at any given time. The number, identity and level of expression of the entire set of genes expressed from a eukaryotic genome for a defined population of cells is defined as a 'transcriptome'. An alternative to the much-hyped cDNA microarray technology for this purpose is serial analysis of gene expression (SAGE), which, curiously, was originally described in the same issue of Science as the microarray methodology from Stanford. As with expressed sequence tags (ESTs), SAGE relies on sequencing to identify genes and can be considered as a variant of EST analysis. Unlike EST analysis, however, SAGE identifies only a short sequence from a defined position within the transcript. The use of short tags enables approximately 40 times as many transcripts to be identified by SAGE as can be identified in an EST project for the same sequencing effort. Theoretically, the defined position of the tag within the transcript enables unambiguous transcript identification, in contrast to ESTs. Also, SAGE does not depend on prior knowledge of transcript sequence; experimental data are electronically stored and can be reanalyzed as genome projects advance or are completed; and SAGE provides absolute rather than relative expression levels. The analysis by Velculescu et al. provides a true snapshot of how many transcripts are needed to make up a human cell's biochemical machinery and the levels of expression of these transcripts, and provides a wealth of data for analysis in other laboratories.
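The idea of a short tag taken from a defined position can be illustrated with a minimal sketch. It is not the published protocol or any existing software: it assumes the anchoring site is the four-base sequence CATG and that the tag is the 10 bases immediately 3' of the last such site in the transcript, values that are not stated in this report. The function names `sage_tag` and `count_tags` are hypothetical.

```python
from collections import Counter

def sage_tag(transcript, anchor="CATG", tag_length=10):
    """Return the tag next to the 3'-most anchor site, or None.

    Assumptions (not taken from the report): the anchor site is CATG
    and the tag is the tag_length bases directly downstream of it.
    """
    seq = transcript.upper()
    pos = seq.rfind(anchor)          # last (3'-most) occurrence of the anchor
    if pos == -1:
        return None                  # transcript has no anchor site
    start = pos + len(anchor)
    tag = seq[start:start + tag_length]
    # Discard tags that run off the end of the sequence.
    return tag if len(tag) == tag_length else None

def count_tags(transcripts):
    """Count how often each tag is observed across a list of transcripts."""
    tags = (sage_tag(t) for t in transcripts)
    return Counter(t for t in tags if t is not None)
```

Because every copy of a given transcript yields the same tag under these assumptions, counting tags gives a direct, digital readout of relative transcript abundance.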
More information about SAGE is available from the Serial analysis of gene expression homepage (http://www.sagenet.org) and the SAGEmap site at the National Center for Biotechnology Information (NCBI). For information on cDNA microarray technology visit the Gene Chips (DNA Microarrays) website (http://www.gene-chips.com).
Velculescu et al. have provided a database of transcripts that is ripe for the picking. There are known and unknown transcripts and ones that are ubiquitously expressed or specific to certain cell types. By making comparisons with this published human transcriptome, SAGE analyses of human tissue will be able to quickly identify transcripts unique to other biological situations. Perhaps the most troubling aspect of these data for future studies in all mammals is the quantification of the large number of transcripts that are identified as being present at fewer than five copies per cell. These rare transcripts comprise 25% of cellular mRNA by mass but 94% of unique transcripts, and only 50% match transcript sequences (mRNAs and ESTs) in GenBank or EMBL (Figure 1). Many of these transcripts will be at least partially identified by genomic sequencing. But will microarray technologies be sensitive enough not only to detect but also to quantitate these low-abundance transcripts?
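To make the 'fewer than five copies per cell' threshold concrete, the following sketch converts a tag count into an estimated copy number per cell. It is illustrative only: the figure of 300,000 mRNA molecules per cell is an assumption supplied here, not a value given in this report, and the function names and example numbers are hypothetical.

```python
def copies_per_cell(tag_count, total_tags, mrna_per_cell=300_000):
    """Estimate transcript copies per cell from a SAGE tag count.

    mrna_per_cell is an assumed total number of mRNA molecules per
    cell; change it if a better estimate is available.
    """
    if total_tags <= 0:
        raise ValueError("total_tags must be positive")
    return tag_count * mrna_per_cell / total_tags

def is_rare(tag_count, total_tags, threshold=5, mrna_per_cell=300_000):
    """True if the estimated abundance is below `threshold` copies per cell."""
    return copies_per_cell(tag_count, total_tags, mrna_per_cell) < threshold

# Hypothetical example: a tag seen 10 times among 1,000,000 sequenced tags.
estimate = copies_per_cell(10, 1_000_000)
```

Under these assumed numbers a transcript must be sampled many times in a very large tag library before it rises above the rare class, which is why sequencing depth matters so much for low-abundance transcripts.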
Many researchers are already combining different techniques, including SAGE and cDNA microarrays, in an attempt to get the best of both. Probes representing a substantial group of genes of known different expression levels (from SAGE) and an additional labeled target RNA (from a cell line for which SAGE data exist) could be added to microarray experiments as an additional set of controls to allow conversion of microarray data to more absolute expression levels.
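One way such controls might be used is sketched below. This is an assumption about how the conversion could be done, not a method described in the report: it fits a straight line, by ordinary least squares, between log10 microarray intensity and log10 SAGE-derived copies per cell for the control genes, then applies that line to other probes. The function names and all numbers are hypothetical.

```python
import math

def fit_log_calibration(intensities, copies):
    """Fit log10(copies) = slope * log10(intensity) + intercept.

    intensities: microarray signals for control genes (all > 0)
    copies: SAGE-derived copies per cell for the same genes (all > 0)
    """
    xs = [math.log10(v) for v in intensities]
    ys = [math.log10(v) for v in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("control intensities must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def intensity_to_copies(intensity, slope, intercept):
    """Convert one microarray intensity to estimated copies per cell."""
    return 10 ** (slope * math.log10(intensity) + intercept)

# Hypothetical control genes: array intensity versus SAGE copies per cell.
slope, intercept = fit_log_calibration([100, 1000, 10000], [1, 10, 100])
estimated = intensity_to_copies(5000, slope, intercept)
```

A log-log fit is chosen here only because expression levels span several orders of magnitude; whether real microarray signals follow such a relationship, particularly for rare transcripts, would need to be checked against the control data.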