LongSAGE profiling of nine human embryonic stem cell lines
- Martin Hirst1,
- Allen Delaney1,
- Sean A Rogers1,
- Angelique Schnerch1,
- Deryck R Persaud1,
- Michael D O'Connor2,
- Thomas Zeng1,
- Michelle Moksa1,
- Keith Fichter1,
- Diana Mah1,
- Anne Go1,
- Ryan D Morin1,
- Agnes Baross1,
- Yongjun Zhao1,
- Jaswinder Khattra1,
- Anna-Liisa Prabhu1,
- Pawan Pandoh1,
- Helen McDonald1,
- Jennifer Asano1,
- Noreen Dhalla1,
- Kevin Ma1,
- Stephanie Lee1,
- Adrian Ally1,
- Neil Chahal1,
- Stephanie Menzies1,
- Asim Siddiqui1,
- Robert Holt1,
- Steven Jones1,
- Daniela S Gerhard3,
- James A Thomson4,
- Connie J Eaves2 and
- Marco A Marra1
© Hirst et al.; licensee BioMed Central Ltd. 2007
Received: 18 December 2006
Accepted: 14 June 2007
Published: 14 June 2007
To facilitate discovery of novel human embryonic stem cell (ESC) transcripts, we generated 2.5 million LongSAGE tags from 9 human ESC lines. Analysis of these data revealed that ESCs express proportionately more RNA binding proteins than terminally differentiated cells, and identified novel ESC transcripts, at least one of which may represent a marker of the pluripotent state.
Embryonic stem cells (ESCs) can be derived from the inner cell mass of blastocysts and are defined by their ability to be propagated indefinitely as undifferentiated cells with the potential, upon appropriate stimulation, to generate cell types representing all three embryonic germ layers [1]. Since the first reported isolation of human cells with these properties [2], the derivation of more than 150 such lines has been described. This large collection of human ESC lines provides opportunities for understanding the earliest stages of human embryo and tissue development, as well as for elucidating the mechanisms that can permanently maintain pluripotency. Studies of mouse ESCs have defined a number of genes that appear to play key roles in this process, including those encoding Oct4 [3], Nanog [4, 5], Sox2 [6], FoxD3 [7] and fibroblast growth factor-4 [8, 9]. Comparisons of mouse and human ESCs have also revealed a number of conserved signaling pathways, including those involving JAK/STAT, transforming growth factor-β and fibroblast growth factor [10–12]. However, cross-species analyses of microarray data [13, 14] and expressed sequence tag (EST) resources [15–18] suggest that additional molecular regulators of ESC self-renewal may exist and that likely candidates are heterochronic genes, microRNAs, genes involved in telomeric regulation and polycomb group repressors.
Microarray-based approaches have been used to define the transcriptomes of numerous human ESC lines, including BG01, BG02, WA01, WA07, WA09, WA13, WA14, TE06, UC01 and UC06 [19–22]. These studies provide a rich resource for cell line comparisons; however, incomplete annotation of the genome and inherent biases in the microarray technology limit interpretation to well characterized, abundantly expressed transcripts [23–25]. A number of DNA sequence-based approaches have also been used to study the human ESC transcriptome, including EST analysis, serial analysis of gene expression (SAGE) and massively parallel signature sequencing (MPSS) [16, 18]. Comparisons of these datasets have been used to search for genes that might be required for maintenance of pluripotency [13, 15, 16, 22] but, interestingly, exhibit limited overlap between datasets, in some cases as low as 1% [26–28], possibly because of the different technologies employed in different studies. The fact that a large proportion of transcripts expressed in ESCs do not correspond to annotated genes has further confounded the yields of such comparisons. To generate a transcript discovery resource complementary to previous work, we undertook a large scale gene expression analysis of nine different human ESC lines, maintained as undifferentiated cells, using the long serial analysis of gene expression (LongSAGE) approach.
Results and discussion
Digital gene expression profiling of nine human ESC lines reveals an enrichment of RNA binding proteins
Table 1. Human embryonic stem cell lines profiled in this study. [Table body not recovered. Surviving column headers: total no. of tags; bFGF-2 concentration (ng/ml). Surviving row entries: mouse embryonic fibroblasts (CF-1 or B-81).]
Table 2. Expression of undifferentiated and differentiated ESC markers in the embryonic stem cell libraries. [Table body not recovered.]
A previous analysis of SAGE data generated using ES03 and ES04 cells showed that Rex1 was within the top 25 differentially expressed transcripts, with no Rex1 tags detected in the ES04 line and an absence of Rex1 expression in ES04 cells confirmed by quantitative and semi-quantitative real time (RT)-PCR. Interestingly, in our LongSAGE libraries, tags for Rex1 were present in all nine ESC libraries, including the library prepared from ES04 cells, and there was less than a three-fold difference in Rex1 expression between ES03 and ES04 (Table 2).
To generate a list of transcripts common to all libraries (excluding the HSF-6 library because of the differentiation markers found therein), we first identified tags from each library that uniquely mapped to transcripts within RefSeq and the Mammalian Gene Collection (MGC). This analysis identified a set of 4,337 LongSAGE tags present in all libraries (Additional data file 5). Comparison of this list to those generated by previous MPSS and SAGE approaches revealed extensive (80%) concordance between the SAGE-based transcriptomes. In contrast, 52% of genes identified by MPSS were not found in either of the SAGE common gene lists. Some of this lack of concordance may be explained by differences in the tagging restriction enzyme used by the two protocols (NlaIII for SAGE and Dpn1 for MPSS) and the fact that different mRNA preparations were used in each study. To further explore this lack of concordance, we compared the LongSAGE and MPSS-derived gene lists to a common gene list derived from Affymetrix expression arrays generated from the same RNAs used to construct our LongSAGE libraries. The Affymetrix common gene set contained more than 80% of the LongSAGE common gene list (Additional data file 5) while MPSS contained only 68% of the genes on this list.
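The concordance figures quoted above are, in essence, set overlaps between gene lists. The following is a minimal sketch of that calculation, not the pipeline used in the study; the function name and the toy gene lists are hypothetical and chosen only for illustration.

```python
def concordance(reference, other):
    """Return the fraction of genes in `reference` that also occur in `other`."""
    reference, other = set(reference), set(other)
    if not reference:
        raise ValueError("reference gene list is empty")
    return len(reference & other) / len(reference)


# Hypothetical toy gene lists standing in for the platform-specific common gene sets.
longsage_common = {"POU5F1", "NANOG", "SOX2", "LIN28", "DNMT3B"}
affymetrix_common = {"POU5F1", "NANOG", "SOX2", "LIN28", "GAPDH"}
mpss_common = {"POU5F1", "NANOG", "TDGF1"}

print(f"Array list covers {concordance(longsage_common, affymetrix_common):.0%} of the LongSAGE list")
print(f"MPSS list covers {concordance(longsage_common, mpss_common):.0%} of the LongSAGE list")
```

Note that the measure is asymmetric: the fraction of list A found in list B generally differs from the fraction of list B found in list A, so the choice of reference list should be stated whenever a concordance value is reported.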
Identification of novel ESC-specific transcripts
LongSAGE offers opportunities for discovering novel transcripts. These can be identified as tags that map uniquely to the genome but not to any available transcript resources. To look for these, we used the 2.5 million tag meta-library, which contained 379,645 unique tag sequences. Grouping LongSAGE tags that mapped to genomic locations in close proximity to one another resulted in the identification of 24,593 transcription units. Of these, 14,588 did not overlap with known genes and were classified as novel. Most tags were expressed at low levels with 46% (6,672) identified by a single LongSAGE tag. Even though singletons are enriched for artifacts, many of these are likely to represent real transcripts, for two reasons: first, they map to the genome; and second, we and others have shown previously that at least 70% of novel, singleton, high quality LongSAGE tags identify rare transcripts whose expression can be confirmed in RNA-dependent RT-PCR experiments.
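Grouping tags by genomic proximity can be sketched as a single pass over sorted tag positions. The example below is an illustration only: the text does not state the distance threshold used, so `max_gap=1000` is an assumed value, and the function name and tuple layout are hypothetical.

```python
from collections import defaultdict


def group_into_units(tag_positions, max_gap=1000):
    """Cluster tag mappings into putative transcription units.

    tag_positions: iterable of (chromosome, strand, position) tuples.
    max_gap: assumed maximum distance (bp) between neighbouring tags in one unit.
    Returns a list of (chromosome, strand, start, end, n_tags) tuples.
    """
    by_locus = defaultdict(list)
    for chrom, strand, pos in tag_positions:
        by_locus[(chrom, strand)].append(pos)

    units = []
    for (chrom, strand), positions in sorted(by_locus.items()):
        positions.sort()
        start = end = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - end <= max_gap:
                # Close enough to the previous tag: extend the current unit.
                end = pos
                count += 1
            else:
                # Gap too large: close the current unit and open a new one.
                units.append((chrom, strand, start, end, count))
                start = end = pos
                count = 1
        units.append((chrom, strand, start, end, count))
    return units


# Hypothetical tag mappings.
tags = [("chr1", "+", 100), ("chr1", "+", 600), ("chr1", "+", 5000),
        ("chr1", "-", 650), ("chr2", "+", 42)]
for unit in group_into_units(tags):
    print(unit)
```

Units supported by a single tag (`n_tags == 1`) correspond to the singleton class discussed above; overlap with known gene annotations would be tested in a separate step.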
To further characterize these putative novel, low-abundance ESC library specific transcripts, we compared the ESC meta-library to publicly available data derived from 247 non-ESC SAGE libraries that together contained 654,491 unique tag sequences. This comparison identified 20,047 tag sequences found only in the human ESC meta-library (Additional data file 6). For subsequent analyses, we focused on those tags that uniquely mapped at least 2 kb away from any known gene. This analysis reduced the number of tags to 634 (Additional data file 7), of which 301 were found within genomic regions exhibiting sequence conservation between human and mouse or rat (Additional data file 8). We used rapid amplification of cDNA ends (RACE) [47, 48] to clone the 5' ends of 52 of these (Additional data file 9). Alignment of the resulting sequences to the human genome revealed that 22 (40%) were spliced. An open reading frame (ORF) scan of the 52 RACE clone sequences using Bioperl tools and custom scripts identified 6 transcripts that encoded peptides longer than 100 amino acids. However, with the exception of one transcript (HA_003333) that overlapped the 3' end of the MAPK2 gene, none of the identified ORFs demonstrated Ka/Ks ratios suggestive of purifying selection. Hence, these transcripts may not encode proteins but may instead represent non-coding RNAs (ncRNAs).
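An ORF scan of this kind can be illustrated with a six-frame translation and a length filter. The sketch below is written in plain Python and is not the Bioperl-based procedure used in the study; it assumes the standard genetic code, treats any methionine-initiated stretch without a stop codon as a candidate (whether or not a terminal stop is present in the clone), and all function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate a DNA sequence in frame 0; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def find_orfs(seq, min_aa=100):
    """Return (strand, frame, peptide) for peptides longer than min_aa residues."""
    seq = seq.upper()
    orfs = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            protein = translate(strand_seq[frame:])
            # Each stop-free segment is searched for its first methionine.
            for segment in protein.split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start > min_aa:
                    orfs.append((strand, frame, segment[start:]))
    return orfs


# Hypothetical clone sequence: ATG, 120 alanine codons, then a stop codon.
clone = "ATG" + "GCT" * 120 + "TAA"
for strand, frame, peptide in find_orfs(clone):
    print(strand, frame, len(peptide))
```

A length threshold alone does not establish coding potential, which is why the analysis above goes on to examine Ka/Ks ratios for the candidate ORFs.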
Many pseudogenes have been identified in the human genome using homology-based approaches [56–58]. Pseudogenes are generally not transcribed due to their lack of functional promoters [59, 60]. However, there are examples of pseudogenes that have retained or acquired functional promoters, leading to their transcription. Because of the low levels of expression of the 52 novel transcripts (on average, only 3 tags per million) we asked whether the 5' RACE clones were derived from expressed pseudogenes. Comparison of the RACE clone sequences to three computationally generated lists of known human pseudogenes [56–58] revealed only one clone (HA_003350) with a predicted pseudogene contained within its exon. Furthermore, with the exception of HA_003333, none of the novel transcript sequences showed significant sequence similarity to any known ORF (using a 70% ORF threshold). Taken together, these analyses do not support the notion that the novel genes identified by our analysis are enriched for expressed pseudogenes.
Comparison of the 5' RACE clone sequences to publicly available ESTs revealed 36 (69%) with matches to other ESTs, of which 7 were found only in data derived from pluripotent human ESC lines. One RACE clone that overlapped an EST derived from pluripotent human ESC lines (HA_003152) was also found to be expressed in all nine ESC lines studied here. BLAT alignment of the 5' RACE clone sequence to the human reference genome sequence revealed that HA_003152 contained two introns and resided within a genomic region that exhibited sequence similarity to long interspersed nuclear elements. An ORF scan revealed a 129 amino acid peptide encoded in the second exon with homology to the carboxyl terminus of the LINE p40 ORF.
As part of the ongoing effort to elucidate mechanisms regulating ESC self-renewal, we generated 2.5 million LongSAGE tags from nine human ESC lines. Comparison of these data to libraries prepared from differentiated tissues identified a group of ESC-library specific transcripts and an enrichment of transcripts encoding mitochondrial and RNA binding proteins (by comparison to differentiated cells). RNA binding proteins play a role in the regulation of mRNA processing, and examination of non-canonical LongSAGE tags in the human ESC libraries suggests that these cells express a distinct collection of gene isoforms. One such isoform may bypass translational down regulation through the expression of a transcript lacking predicted miRNA target sequences.
An emerging theme in digital gene expression profiling is the identification of a large class of transcripts that map uniquely to the genome, but cannot be localized to any known or computationally predicted transcripts. Tags in this class are predominantly found at relatively low levels. Analysis of the 2.5 million LongSAGE tags generated in the course of this study revealed 14,588 such tag sequences, a subset of which were found exclusively in human ESCs. As a first step towards understanding the relevance of these transcripts to ESC biology we generated 5' RACE clones for 52 novel apparently ESC-specific transcripts. Analyses of these transcripts revealed that the majority do not appear to encode proteins and do not overlap existing pseudogene predictions. One transcript was found to be expressed across all nine ESC lines we profiled and matched ESTs generated by others from ESCs. Its restricted expression pattern suggests that it may represent a novel transcriptional marker for the maintenance of pluripotentiality. In addition to the discovery of this potential marker, we also identified four novel transcripts that may participate in the regulation of expression of known genes, one of which is known to play a direct role in differentiation. Our analyses indicate that there are many previously undiscovered transcripts expressed in human ESCs and support the contention that sampling of SAGE libraries to depths beyond currently accepted practice is required to fully explore the coding potential of the mammalian transcriptome. To assess possible functions associated with such rare transcripts, we are actively pursuing the cloning and characterization of the remaining novel human ESC-specific transcripts identified in this study.
Materials and methods
Cell culture and RNA isolation
Detailed information regarding the human ESC lines used in this study can be found at the NIH Stem Cell Information website. The passage numbers of the cells analyzed in this study are presented in Table 1. Total RNA was prepared using Trizol reagent (Invitrogen, Burlington, ON, USA) following the manufacturer's protocol and was assayed for quality and quantified using an Agilent 2100 Bioanalyzer (Agilent Technologies) and RNA 6000 Nano LabChip kit (Caliper Technologies, Hopkinton, MA, USA).
LongSAGE library construction
Nine LongSAGE libraries were constructed from 5-20 μg of DNase I-treated total RNA as described (DNase I from Invitrogen). LongSAGE data generated for this study are available through our embryonic stem cell transcriptomes website and through the CGAP web portal.
Novel transcript identification
LongSAGE tags of at least 99.9% accuracy (calculated using Phred [66, 67] quality scores) from the meta-library were compared to 247 publicly available human SAGE libraries (GEO, Discovery db). To allow direct comparison of the LongSAGE data to the 14 bp SAGE tags available in the public libraries, the 3' ends of the 21 bp tags were truncated in silico to form 14 bp tags. A total of 2,508,608 tags corresponding to 222,337 unique 14 bp tag sequences (379,465; 21 bp parental sequences) were utilized in this analysis. These tags were directly compared to all unique tags from the human SAGE libraries to generate a list of tags found solely in the ESC meta-library.
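The filtering, truncation and comparison steps can be sketched as follows. A Phred score Q corresponds to a per-base error probability of 10^(-Q/10); treating tag accuracy as the product of the per-base accuracies is an assumption made here for illustration, since the exact calculation is not spelled out in the text. All function names and the toy data are hypothetical.

```python
import math


def tag_accuracy(phred_scores):
    """Probability that every base in the tag is correct (assumed: product of per-base accuracies)."""
    return math.prod(1 - 10 ** (-q / 10) for q in phred_scores)


def truncate_tag(tag, length=14):
    """Trim the 3' end of a 21 bp LongSAGE tag to a 14 bp SAGE-compatible tag."""
    return tag[:length]


def esc_only_tags(esc_tags, public_tags_14bp, min_accuracy=0.999):
    """Return truncated ESC tags that pass the quality filter and are absent from public libraries.

    esc_tags: dict mapping each 21 bp tag to its list of per-base Phred scores.
    public_tags_14bp: iterable of 14 bp tags pooled from the public SAGE libraries.
    """
    kept = {truncate_tag(tag) for tag, quals in esc_tags.items()
            if tag_accuracy(quals) >= min_accuracy}
    return kept - set(public_tags_14bp)


# Hypothetical data: two high-quality tags and one low-quality tag.
esc_tags = {
    "CATG" + "A" * 17: [50] * 21,
    "CATG" + "C" * 17: [50] * 21,
    "CATG" + "G" * 17: [30] * 21,   # fails the 99.9% accuracy filter
}
public = {"CATG" + "C" * 10}
print(esc_only_tags(esc_tags, public))
```

Because several 21 bp tags can collapse onto the same 14 bp sequence, the number of unique truncated tags is smaller than the number of parental sequences, as reflected in the counts reported above.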
Tag-to-gene mapping was performed using the comprehensive mapping of SAGE tags (CMOST) software as follows. Tags were mapped to various publicly available transcript databases in a hierarchical fashion with the highest quality transcript databases used first. As tags were mapped to a known transcript in a higher quality database, they were excluded from further analysis with subsequent lower quality databases to mitigate redundancies arising from lower quality DNA sequence resources. The following databases were used for CMOST tag-to-gene mapping in this order: MGC, RefSeq, Ensembl transcripts (exon sequences only), Genbank Human Mitochondrial Sequence (accession AY289102.1), Genbank Non-coding sequences, Ensembl genes (1,000 bp UTR and intron sequences included), Ensembl ESTs, and Golden path genomic contigs (Genbank Human Genome Assembly Contigs build 34, January 2004). In addition to allowing perfect matches, the CMOST approach attempts to account for single base permutations, insertions and deletions, improving the rate of tag-to-gene mapping.
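The hierarchical strategy described above can be illustrated with a short sketch. This is not the CMOST software: it handles exact matches only (the tolerance for single-base changes is not modelled), and the function name, database contents and identifiers are hypothetical.

```python
def map_hierarchically(tags, databases):
    """Assign each tag to the first (highest quality) database that contains it.

    tags: iterable of tag sequences.
    databases: list of (name, {tag: transcript_id}) ordered from highest to lowest quality.
    Returns (assignments, unmapped), where assignments maps tag -> (database name, transcript id).
    """
    assignments = {}
    unmapped = set(tags)
    for name, index in databases:
        hits = {tag for tag in unmapped if tag in index}
        for tag in hits:
            assignments[tag] = (name, index[tag])
        # Mapped tags are withheld from all lower quality databases.
        unmapped -= hits
    return assignments, unmapped


# Hypothetical toy indices, listed in decreasing order of quality.
databases = [
    ("MGC", {"CATGAAAA": "mgc_1"}),
    ("RefSeq", {"CATGAAAA": "refseq_9", "CATGCCCC": "refseq_2"}),
    ("Ensembl ESTs", {"CATGGGGG": "est_7"}),
]
assignments, unmapped = map_hierarchically(
    ["CATGAAAA", "CATGCCCC", "CATGGGGG", "CATGTTTT"], databases)
print(assignments)
print(unmapped)
```

In this toy run the first tag is present in both of the top two indices but is reported only against the higher quality one, which is the redundancy-limiting behaviour the ordering is meant to achieve.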
SAGE tag-to-gene mapping
LongSAGE tags were mapped to known and computationally predicted transcripts using versions of the following databases available as of March, 2005: RefSeq, RefSeqX, Mammalian Gene Collection, and RefSeqGS. Tags were also mapped to human genomic sequence using the NCBI Reference Sequence Genome database, release 35, August 2004. From the genome sequence, a table was generated containing all 27.4 million potential SAGE tags adjacent to genomic NlaIII restriction sites (CATG). Of these, our analysis defined a subset of 19.4 million genomic tag sequences that were unique within the genome.
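Building a table of potential genomic tags amounts to scanning both strands for CATG and recording the fixed-length sequence that begins at each site. The sketch below illustrates the idea on a toy sequence; the function names are hypothetical, the 21 bp default is taken from the tag length given elsewhere in the text, and a genome-scale implementation would need a more memory-efficient design.

```python
from collections import Counter


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def virtual_tags(seq, tag_len=21, site="CATG"):
    """Yield every potential tag (site included) found on either strand of seq."""
    seq = seq.upper()
    for strand_seq in (seq, reverse_complement(seq)):
        start = strand_seq.find(site)
        while start != -1:
            tag = strand_seq[start:start + tag_len]
            if len(tag) == tag_len:      # skip sites too close to the sequence end
                yield tag
            start = strand_seq.find(site, start + 1)


def unique_tags(seq, tag_len=21):
    """Return the potential tags that occur exactly once in the sequence."""
    counts = Counter(virtual_tags(seq, tag_len))
    return {tag for tag, n in counts.items() if n == 1}


# Toy sequence with a shortened tag length so the result is easy to inspect.
toy = "CATGAAAACATGAAAACATGCCCC"
print(sorted(virtual_tags(toy, tag_len=8)))
print(unique_tags(toy, tag_len=8))
```

Only tags that are unique in the genome allow an observed LongSAGE tag to be placed at a single location, which is why the unique subset is the one used for mapping.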
A second table was generated that stored information about exons: genome sequence contig, transcript orientation, exon number, exon boundary type and nucleotide positions of exon boundaries for all approximately 267,000 exons annotated on release 35 of the Reference Sequence genome. The LongSAGE tag sequences were compared to the unique genomic tag table, yielding sets of genomic positions for all tags in the library. These in turn were compared to the table of exon information, producing a mapping for each tag relative to annotated exons.
For the GO category comparisons, a standard t-test comparing two samples was used. The null hypothesis was that the two samples arose from populations with the same mean and standard deviation. The values within each sample were the number of GO categories represented in each library of the set, nine in the ESC set and four in the normal set. To account for variation due to library size, only the transcripts with the top 1,000 expression values were included. A one-sided p value was reported. Microsoft Excel was used to perform the computation.
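For readers who want to reproduce the comparison outside a spreadsheet, the pooled-variance (equal standard deviation) two-sample t statistic can be computed as sketched below. This is an illustration rather than the original Excel calculation; the function name and sample values are hypothetical, and the one-sided p value would still have to be obtained from a t distribution with the returned degrees of freedom.

```python
import math
import statistics


def pooled_t_statistic(sample_a, sample_b):
    """Return (t, degrees of freedom) for a two-sample t-test assuming equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two values")
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    df = n_a + n_b - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df


# Hypothetical per-library values for two sets of libraries.
t, df = pooled_t_statistic([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t, df)
```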
To select differentially expressed LongSAGE tags, the ESC and CGN meta-libraries were compared on a tag per tag basis to obtain a p value for the null hypothesis that the two tag frequencies arose from Poisson distributions with the same mean. This was derived using a normal approximation to the Poisson as described by Kal et al. All transcripts that showed differences with a significance of p < 0.05 were selected. Tag counts were converted to tags per million, and transcripts that differed by less than three-fold were eliminated. All pairs of tags existing within the same transcript were then listed if the differential expression for the two tags was in the opposite direction.
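The selection procedure can be sketched as a two-proportion Z-test followed by a fold-change filter. The Z statistic below uses the pooled-proportion form commonly attributed to Kal et al.; it is a sketch under that assumption rather than the exact code used in the study, and the function names, the two-sided p value and the handling of zero counts are choices made here for illustration.

```python
import math


def kal_z_test(count_a, total_a, count_b, total_b):
    """Return (Z, two-sided p) for the difference between two tag proportions."""
    p_a = count_a / total_a
    p_b = count_b / total_b
    p0 = (count_a + count_b) / (total_a + total_b)   # pooled proportion under the null
    se = math.sqrt(p0 * (1 - p0) * (1 / total_a + 1 / total_b))
    if se == 0:
        return 0.0, 1.0
    z = (p_a - p_b) / se
    return z, math.erfc(abs(z) / math.sqrt(2))


def tags_per_million(count, total):
    """Normalise a raw tag count to tags per million."""
    return count * 1_000_000 / total


def is_differential(count_a, total_a, count_b, total_b, alpha=0.05, min_fold=3.0):
    """Apply the significance cut-off, then discard changes smaller than min_fold."""
    _, p_value = kal_z_test(count_a, total_a, count_b, total_b)
    if p_value >= alpha:
        return False
    low, high = sorted((tags_per_million(count_a, total_a),
                        tags_per_million(count_b, total_b)))
    # A tag absent from one library is treated as exceeding any fold threshold.
    return low == 0 or high / low >= min_fold


# Hypothetical counts: 300 tags in a 2.5 million tag library versus 20 in a 1 million tag library.
print(kal_z_test(300, 2_500_000, 20, 1_000_000))
print(is_differential(300, 2_500_000, 20, 1_000_000))
```

Applying the fold-change filter after the significance test removes tags whose difference is statistically detectable only because of very deep sampling.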
First strand 5' and 3' RACE ready cDNA was synthesized from 2.0 μg of DNase I (DNA-free™ kit; Ambion, Austin, TX, USA) treated RNA using the BD SMART RACE cDNA Amplification kit following the manufacturer's recommended protocol (BD Biosciences Clontech, Mountain View, CA, USA). Gene specific 5' RACE primers were designed using custom scripts and Primer 3 to lie downstream of the target LongSAGE tag with an optimal Tm of 68°C (Additional data file 10). For 3' RACE reactions a series of primers was designed manually based on the 5' RACE clone sequence (Additional data file 10). The cDNA was amplified using the Phusion™ High-Fidelity PCR Kit (MJ Research, Inc., Waltham, MA, USA) following the manufacturer's recommended protocol with the addition of DMSO to a final concentration of 3%. The cycling conditions consisted of an initial denaturation at 98°C for 30 seconds followed by 10 touchdown PCR cycles starting with 98°C for 10 seconds, 72°C (decreased by 1°C in each subsequent cycle) for 15 seconds, 72°C for 30 seconds; then 29 cycles of 98°C for 10 seconds, 62°C for 15 seconds, 72°C for 30 seconds; followed by an extension at 72°C for 10 minutes. PCR product for each sample (10 μl) was loaded on a 1.2% agarose gel and subjected to electrophoresis for 3.5 hours at 110 mA in 1× TBE buffer (Tris/Boric Acid/EDTA). The gel was stained with SYBR Green (Mandel, Guelph, ON, Canada) and visualized using a Typhoon 9400 Variable Mode Imager (Amersham, Baie d'Urfe, PQ, Canada). Amplicons were extracted from the gel, purified and cloned into the pCR4®-TOPO® vector using the TOPO TA Cloning® Kit for Sequencing (Invitrogen). Plasmid vectors were electroporated into bacterial cells, and recombinant clones were selected on agar plates containing appropriate antibiotics as described. Glycerol stocks were prepared from 12 individual clone isolates per amplicon and stored in 384-well plates.
Clone inserts were sequenced on an ABI PRISM 3730 XL DNA Analyzer using BigDye primer cycle sequencing reagents (Applied Biosystems, Foster City, CA, USA).
RNA was obtained from H9 cells before and after induction of differentiation using a 30-day embryoid body protocol. Undifferentiated H9 cells maintained for 7 days on Matrigel (BD Biosciences, San Jose, CA, USA) in media conditioned by mouse embryonic fibroblasts and supplemented with 4 ng/ml fibroblast growth factor (bFGF-2) were harvested for embryoid body formation. Briefly, the cells were incubated with TrypLE (Invitrogen) for 10 minutes at 37°C and then collected by scraping. Resultant cell aggregates were subsequently cultured in non-adherent dishes using KOSR-based media without FGF2 for 15 to 30 days. At appropriate time-points RNA was extracted into Trizol. cDNA was synthesized from 2.0 μg of DNase I (DNA-free™ kit, Ambion) treated total RNA using the SuperScript Choice System following the manufacturer's recommended protocol (Invitrogen). Gene specific primer pairs were designed using custom scripts and Primer 3 to amplify approximately 150 bp of the target gene with an optimal Tm of 68°C (Additional data file 10). Whenever possible amplicons were designed to cross exon/intron boundaries. Amplification was performed in a 10 μl reaction mixture containing 5 μl of 2× SYBR Green PCR Master Mix (Applied Biosystems), 2 μl of template cDNA, and 250 pmol of the forward and reverse primer pair. After preparation of the reaction mixtures in 96-well plates, the plates were centrifuged at 800 rpm for 1 minute in an Eppendorf 5810 swing rotor centrifuge (Eppendorf, Westbury, NY, USA). Amplification and detection were performed on an ABI Prism 7600 Sequence Detection System (Applied Biosystems). The PCR protocol consisted of the following: a single cycle of 10 minutes at 95°C and 40 two-step cycles, with one cycle consisting of 15 seconds at 95°C and 60 seconds at 60°C. Results were analyzed as described using a GAPDH probe for normalization.
Additional data files
The following additional data are available with the online version of this paper. Additional data file 1 is a summary of mouse specific tag types identified. Additional data file 2 is a table of genomic mappings for 268,515 unique tag sequences found in nine independent human embryonic stem cell lines. Additional data file 3 is a Gene Ontology analysis of nine independent human embryonic stem cells. Tag counts are expressed for each GO category for the top 1,000 by tag count. Additional data file 4 lists statistically significant differentially expressed LongSAGE tags found between embryonic stem cells and terminally differentiated tissues. Additional data file 5 is a table listing the 4,337 genes found in common across 8 undifferentiated human embryonic stem cell lines. Additional data file 6 is a table listing the 20,047 LongSAGE tags exclusively expressed in embryonic stem cell lines. Additional data file 7 is a table listing the 634 LongSAGE tags exclusively expressed in ESCs that uniquely map to the human genome at least 2 kb away from an annotated transcript. Additional data file 8 is a table listing the 301 LongSAGE tags exclusively expressed in ESCs that uniquely map to species conserved regions of the human genome at least 2 kb away from an annotated transcript. Additional data file 9 is a table listing the 52 ESC specific transcripts identified by 5' RACE. Additional data file 10 lists the RACE and qPCR primer sequences used in this study.
We are grateful to MF Pera (Monash Institute of Medical Research, Monash University and the Australian Stem Cell Center, Clayton, Victoria, Australia), MT Firpo (Department of Obstetrics, Gynecology and Reproductive Sciences, University of California San Francisco, San Francisco, CA) and BresaGen Inc. (Athens, GA), for providing human ESC RNA samples. This project was supported by funds from the National Cancer Institute, National Institutes of Health, under Contract No. N01-C0-12400 and by grants from Genome Canada, Genome British Columbia and the Canadian Stem Cell Network to MAM and CE. MAM is a Scholar of the Michael Smith Foundation for Health Research and is a Terry Fox Young Investigator of the National Cancer Institute of Canada. The content of this publication does not necessarily reflect the views or policies of the US Department of Health and Human Services, nor does mention of trade names, commercial products, or organization imply endorsement by the US Government.
- Evans MJ, Kaufman MH: Establishment in culture of pluripotential cells from mouse embryos. Nature. 1981, 292: 154-156. 10.1038/292154a0.
- Thomson JA, Itskovitz-Eldor J, Shapiro SS, Waknitz MA, Swiergiel JJ, Marshall VS, Jones JM: Embryonic stem cell lines derived from human blastocysts. Science. 1998, 282: 1145-1147. 10.1126/science.282.5391.1145.
- Scholer HR, Balling R, Hatzopoulos AK, Suzuki N, Gruss P: Octamer binding proteins confer transcriptional activity in early mouse embryogenesis. EMBO J. 1989, 8: 2551-2557.
- Mitsui K, Tokuzawa Y, Itoh H, Segawa K, Murakami M, Takahashi K, Maruyama M, Maeda M, Yamanaka S: The homeoprotein Nanog is required for maintenance of pluripotency in mouse epiblast and ES cells. Cell. 2003, 113: 631-642. 10.1016/S0092-8674(03)00393-3.
- Chambers I, Colby D, Robertson M, Nichols J, Lee S, Tweedie S, Smith A: Functional expression cloning of Nanog, a pluripotency sustaining factor in embryonic stem cells. Cell. 2003, 113: 643-655. 10.1016/S0092-8674(03)00392-1.
- Avilion AA, Nicolis SK, Pevny LH, Perez L, Vivian N, Lovell-Badge R: Multipotent cell lineages in early mouse development depend on SOX2 function. Genes Dev. 2003, 17: 126-140. 10.1101/gad.224503.
- Sutton J, Costa R, Klug M, Field L, Xu D, Largaespada DA, Fletcher CF, Jenkins NA, Copeland NG, Klemsz M, et al: Genesis, a winged helix transcriptional repressor with expression restricted to embryonic stem cells. J Biol Chem. 1996, 271: 23126-23133. 10.1074/jbc.271.38.23126.
- Wilder PJ, Kelly D, Brigman K, Peterson CL, Nowling T, Gao QS, McComb RD, Capecchi MR, Rizzino A: Inactivation of the FGF-4 gene in embryonic stem cells alters the growth and/or the survival of their early differentiated progeny. Dev Biol. 1997, 192: 614-629. 10.1006/dbio.1997.8777.
- Yuan H, Corbi N, Basilico C, Dailey L: Developmental-specific activity of the FGF-4 enhancer requires the synergistic action of Sox2 and Oct-3. Genes Dev. 1995, 9: 2635-2645. 10.1101/gad.9.21.2635.
- Xu RH, Chen X, Li DS, Li R, Addicks GC, Glennon C, Zwaka TP, Thomson JA: BMP4 initiates human embryonic stem cell differentiation to trophoblast. Nat Biotech. 2002, 20: 1261-1264. 10.1038/nbt761.
- Thomson JA, Odorico JS: Human embryonic stem cell and embryonic germ cell lines. Trends Biotechnol. 2000, 18: 53-57. 10.1016/S0167-7799(99)01410-9.
- Sato N, Meijer L, Skaltsounis L, Greengard P, Brivanlou AH: Maintenance of pluripotency in human and mouse embryonic stem cells through activation of Wnt signaling by a pharmacological GSK-3-specific inhibitor. Nat Med. 2004, 10: 55-63. 10.1038/nm979.
- Sato N, Sanjuan IM, Heke M, Uchida M, Naef F, Brivanlou AH: Molecular signature of human embryonic stem cells and its comparison with the mouse. Dev Biol. 2003, 260: 404-413. 10.1016/S0012-1606(03)00256-2.
- Rao M: Conserved and divergent paths that regulate self-renewal in mouse and human embryonic stem cells. Dev Biol. 2004, 275: 269-286. 10.1016/j.ydbio.2004.08.013.
- Richards M, Tan SP, Tan JH, Chan WK, Bongso A: The transcriptome profile of human embryonic stem cells as defined by SAGE. Stem Cells. 2004, 22: 51-64. 10.1634/stemcells.22-1-51.
- Brimble SN, Zeng X, Weiler DA, Luo Y, Liu Y, Lyons IG, Freed WJ, Robins AJ, Rao MS, Schulz TC: Karyotypic stability, genotyping, differentiation, feeder-free maintenance, and gene expression sampling in three human embryonic stem cell lines derived prior to August 9, 2001. Stem Cells Dev. 2004, 13: 585-597. 10.1089/scd.2004.13.585.
- Brandenberger R, Wei H, Zhang S, Lei S, Murage J, Fisk GJ, Li Y, Xu C, Fang R, Guegler K, et al: Transcriptome characterization elucidates signaling networks that control human ES cell growth and differentiation. Nat Biotechnol. 2004, 22: 707-716. 10.1038/nbt971.
- Brandenberger R, Khrebtukova I, Thies RS, Miura T, Jingli C, Puri R, Vasicek T, Lebkowski J, Rao M: MPSS profiling of human embryonic stem cells. BMC Dev Biol. 2004, 4: 10. 10.1186/1471-213X-4-10.
- Sperger JM, Chen X, Draper JS, Antosiewicz JE, Chon CH, Jones SB, Brooks JD, Andrews PW, Brown PO, Thomson JA: Gene expression patterns in human embryonic stem cells and human pluripotent germ cell tumors. Proc Natl Acad Sci USA. 2003, 100: 13350-13355. 10.1073/pnas.2235735100.
- Ginis I, Luo Y, Miura T, Thies S, Brandenberger R, Gerecht-Nir S, Amit M, Hoke A, Carpenter MK, Itskovitz-Eldor J, et al: Differences between human and mouse embryonic stem cells. Dev Biol. 2004, 269: 360-380. 10.1016/j.ydbio.2003.12.034.
- Bhattacharya B, Miura T, Brandenberger R, Mejido J, Luo Y, Yang AX, Joshi BH, Ginis I, Thies RS, Amit M, et al: Gene expression in human embryonic stem cell lines: unique molecular signature. Blood. 2004, 103: 2956-2964. 10.1182/blood-2003-09-3314.
- Abeyta MJ, Clark AT, Rodriguez RT, Bodnar MS, Pera RA, Firpo MT: Unique gene expression signatures of independently-derived human embryonic stem cell lines. Human Mol Genet. 2004, 13: 601-608. 10.1093/hmg/ddh068.
- Mah N, Thelin A, Lu T, Nikolaus S, Kuhbacher T, Gurbuz Y, Eickhoff H, Kloppel G, Lehrach H, Mellgard B, et al: A comparison of oligonucleotide and cDNA-based microarray systems. Physiol Genomics. 2004, 16: 361-370. 10.1152/physiolgenomics.00080.2003.
- Kuo WP, Jenssen TK, Butte AJ, Ohno-Machado L, Kohane IS: Analysis of matched mRNA measurements from two different microarray technologies. Bioinformatics. 2002, 18: 405-412. 10.1093/bioinformatics/18.3.405.
- Jenssen TK, Langaas M, Kuo WP, Smith-Sorensen B, Myklebost O, Hovig E: Analysis of repeatability in spotted cDNA microarrays. Nucleic Acids Res. 2002, 30: 3235-3244. 10.1093/nar/gkf441.
- Ramalho-Santos M, Yoon S, Matsuzaki Y, Mulligan RC, Melton DA: "Stemness": transcriptional profiling of embryonic and adult stem cells. Science. 2002, 298: 597-600. 10.1126/science.1072530.
- Ivanova NB, Dimos JT, Schaniel C, Hackney JA, Moore KA, Lemischka IR: A stem cell molecular signature. Science. 2002, 298: 601-604. 10.1126/science.1073823.PubMedView ArticleGoogle Scholar
- Fortunel NO, Otu HH, Ng HH, Chen J, Mu X, Chevassut T, Li X, Joseph M, Bailey C, Hatzfeld JA, et al: Comment on "'Stemness': transcriptional profiling of embryonic and adult stem cells" and "a stem cell molecular signature". Science. 2003, 302: 393-10.1126/science.1086384.PubMedView ArticleGoogle Scholar
- Saha S, Sparks AB, Rago C, Akmaev V, Wang CJ, Vogelstein B, Kinzler KW, Velculescu VE: Using the transcriptome to annotate the genome. Nat Biotechnol. 2002, 20: 508-512. 10.1038/nbt0502-508.
- Siddiqui AS, Khattra J, Delaney AD, Zhao Y, Astell C, Asano J, Babakaiff R, Barber S, Beland J, Bohacec S, et al: A mouse atlas of gene expression: large-scale digital gene-expression profiles from precisely defined developing C57BL/6J mouse tissues and cells. Proc Natl Acad Sci USA. 2005, 102: 18485-18490. 10.1073/pnas.0509455102.
- Pearson K: Mathematical contributions to the theory of evolution III. Regression, heredity and panmixia. Phil Trans R Soc Lond Series A. 1896, 187: 253-318. 10.1098/rsta.1896.0007.
- The Cancer Genome Anatomy Project. [http://cgap.nci.nih.gov]
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25: 25-29. 10.1038/75556.
- John B, Enright AJ, Aravin A, Tuschl T, Sander C, Marks DS: Human MicroRNA Targets. PLOS Biol. 2004, 2: e363. 10.1371/journal.pbio.0020363.
- Houbaviy HB, Murray MF, Sharp PA: Embryonic stem cell-specific MicroRNAs. Dev Cell. 2003, 5: 351-358. 10.1016/S1534-5807(03)00227-2.
- Dravid G, Ye Z, Hammond H, Chen G, Pyle A, Donovan P, Yu X, Cheng L: Defining the role of Wnt/β-catenin signaling in the survival, proliferation and self-renewal of human embryonic stem cells. Stem Cells Express. 2005, 23: 1489-1501. 10.1634/stemcells.2005-0034.
- Nichols J, Zevnik B, Anastassiadis K, Niwa H, Klewe-Nebenius D, Chambers I, Scholer H, Smith A: Formation of pluripotent stem cells in the mammalian embryo depends on the POU transcription factor Oct4. Cell. 1998, 95: 379-391. 10.1016/S0092-8674(00)81769-9.
- Baldassarre G, Romano A, Armenante F, Rambaldi M, Paoletti I, Sandomenico C, Pepe S, Staibano S, Salvatore G, De Rosa G, et al: Expression of teratocarcinoma-derived growth factor-1 (TDGF-1) in testis germ cell tumors and its effects on growth and differentiation of embryonal carcinoma cell line NTERA2/D1. Oncogene. 1997, 15: 927-936. 10.1038/sj.onc.1201260.
- Henderson JK, Draper JS, Baillie HS, Fishel S, Thomson JA, Moore H, Andrews PW: Preimplantation human embryos and embryonic stem cells show comparable expression of stage-specific embryonic antigens. Stem Cells. 2002, 20: 329-337. 10.1634/stemcells.20-4-329.
- Wong RC, Pebay A, Nguyen LT, Koh KL, Pera MF: Presence of functional gap junctions in human embryonic stem cells. Stem Cells. 2004, 22: 883-889. 10.1634/stemcells.22-6-883.
- Rao RR, Stice SL: Gene expression profiling of embryonic stem cells leads to greater understanding of pluripotency and early developmental events. Biol Reprod. 2004, 71: 1772-1778. 10.1095/biolreprod.104.030395.
- Besser D: Expression of nodal, lefty-a, and lefty-B in undifferentiated human embryonic stem cells requires activation of Smad2/3. J Biol Chem. 2004, 279: 45076-45084. 10.1074/jbc.M404979200.
- Pruitt KD, Maglott DR: RefSeq and LocusLink: NCBI gene-centered resources. Nucleic Acids Res. 2001, 29: 137-140. 10.1093/nar/29.1.137.
- Strausberg RL, Feingold EA, Klausner RD, Collins FS: The mammalian gene collection. Science. 1999, 286: 455-457. 10.1126/science.286.5439.455.
- Embryonic Stem Cell Transcriptomes. [http://www.transcriptomes.org]
- Chen J, Sun M, Lee S, Zhou G, Rowley JD, Wang SM: Identifying novel transcripts and novel genes in the human genome by using novel SAGE tags. Proc Natl Acad Sci USA. 2002, 99: 12257-12262. 10.1073/pnas.192436499.
- Frohman MA, Dush MK, Martin GR: Rapid production of full-length cDNAs from rare transcripts: amplification using a single gene-specific oligonucleotide primer. Proc Natl Acad Sci USA. 1988, 85: 8998-9002. 10.1073/pnas.85.23.8998.
- Chenchik A, Diachenko L, Moqadam F, Tarabykin V, Lukyanov S, Siebert PD: Full-length cDNA cloning and determination of mRNA 5' and 3' ends by amplification of adaptor-ligated cDNA. Biotechniques. 1996, 21: 526-534.
- Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, Fuellen G, Gilbert JG, Korf I, Lapp H, et al: The Bioperl toolkit: Perl modules for the life sciences. Genome Res. 2002, 12: 1611-1618. 10.1101/gr.361602.
- Yang Z, Nielsen R, Goldman N, Pedersen AM: Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics. 2000, 155: 431-449.
- Alvarez-Bolado G, Zhou X, Cecconi F, Gruss P: Expression of Foxb1 reveals two strategies for the formation of nuclei in the developing ventral diencephalon. Dev Neurosci. 2000, 22: 197-206. 10.1159/000017442.
- Alvarez-Bolado G, Zhou X, Voss AK, Thomas T, Gruss P: Winged helix transcription factor Foxb1 is essential for access of mammillothalamic axons to the thalamus. Development. 2000, 127: 1029-1038.
- Labosky PA, Winnier GE, Jetton TL, Hargett L, Ryan AK, Rosenfeld MG, Parlow AF, Hogan BL: The winged helix gene, Mf3, is required for normal development of the diencephalon and midbrain, postnatal growth and the milk-ejection reflex. Development. 1997, 124: 1263-1274.
- Uptain SM, Kane CM, Chamberlin MJ: Basic mechanisms of transcript elongation and its regulation. Annu Rev Biochem. 1997, 66: 117-172. 10.1146/annurev.biochem.66.1.117.
- Kuroda T, Tada M, Kubota H, Kimura H, Hatano SY, Suemori H, Nakatsuji N, Tada T: Octamer and Sox elements are required for transcriptional cis regulation of Nanog gene expression. Mol Cell Biol. 2005, 25: 2475-2485. 10.1128/MCB.25.6.2475-2485.2005.
- Ohshima K, Hattori M, Yada T, Gojobori T, Sakaki Y, Okada N: Whole-genome screening indicates a possible burst of formation of processed pseudogenes and Alu repeats by particular L1 subfamilies in ancestral primates. Genome Biol. 2003, 4: R74. 10.1186/gb-2003-4-11-r74.
- Torrents D, Suyama M, Zdobnov E, Bork P: A genome-wide survey of human pseudogenes. Genome Res. 2003, 13: 2559-2567. 10.1101/gr.1455503.
- Zhang Z, Harrison PM, Liu Y, Gerstein M: Millions of years of evolution preserved: a comprehensive catalog of the processed pseudogenes in the human genome. Genome Res. 2003, 13: 2541-2558. 10.1101/gr.1429003.
- Balakirev ES, Ayala FJ: Pseudogenes: are they "junk" or functional DNA?. Annu Rev Genet. 2003, 37: 123-151. 10.1146/annurev.genet.37.040103.103949.
- Mighell AJ, Smith NR, Robinson PA, Markham AF: Vertebrate pseudogenes. FEBS Lett. 2000, 468: 109-114. 10.1016/S0014-5793(00)01199-6.
- Harrison PM, Zheng D, Zhang Z, Carriero N, Gerstein M: Transcribed processed pseudogenes in the human genome: an intermediate form of expressed retrosequence lacking protein-coding ability. Nucleic Acids Res. 2005, 33: 2374-2383. 10.1093/nar/gki531.
- Bortvin A, Eggan K, Skaletsky H, Akutsu H, Berry DL, Yanagimachi R, Page DC, Jaenisch R: Incomplete reactivation of Oct4-related genes in mouse embryos cloned from somatic nuclei. Development. 2003, 130: 1673-1680. 10.1242/dev.00366.
- Boyer L, Lee TI, Cole MF, Johnstone SE, Zucker JP, Young RA: Core transcriptional regulatory circuitry in human embryonic stem cells. Cell. 2005, 122: 947-956. 10.1016/j.cell.2005.08.020.
- Kent WJ: BLAT - the BLAST-like alignment tool. Genome Res. 2002, 12: 656-664. 10.1101/gr.229202. Article published online before March 2002.
- Stem Cell Information. [http://stemcells.nih.gov]
- Ewing B, Green P: Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 1998, 8: 186-194.
- Ewing B, Hillier L, Wendl MC, Green P: Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 1998, 8: 175-185.
- Gene Expression Omnibus. [http://www.ncbi.nlm.nih.gov/geo]
- Discovery Space. [http://www.bcgsc.ca/bioinfo/software/discoveryspace/]
- The Mammalian Gene Collection. [http://mgc.nci.nih.gov]
- NCBI Reference Sequence. [http://www.ncbi.nlm.nih.gov/RefSeq]
- Ensembl Genome Browser. [http://www.ensembl.org]
- GenBank. [http://www.ncbi.nlm.nih.gov/Genbank]
- Kal AJ, van Zonneveld AJ, Benes V, van den Berg M, Koerkamp MG, Albermann K, Strack N, Ruijter JM, Richter A, Dujon B, et al: Dynamics of gene expression revealed by comparison of serial analysis of gene expression transcript profiles from yeast grown on two different carbon sources. Mol Biol Cell. 1999, 10: 1859-1872.
- Rozen S, Skaletsky H: Primer3 on the WWW for general users and for biologist programmers. Methods Mol Biol. 2000, 132: 365-386.
- Baross A, Butterfield YS, Coughlin SM, Zeng T, Griffith M, Griffith OL, Petrescu AS, Smailus DE, Khattra J, McDonald HL, et al: Systematic recovery and analysis of full-ORF human cDNA clones. Genome Res. 2004, 14: 2083-2092. 10.1101/gr.2473704.
- Muller PY, Janovjak H, Miserez AR, Dobbie Z: Processing of gene expression data generated by quantitative real-time RT-PCR. Biotechniques. 2002, 32: 1372-1374.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.