Heart-specific genes revealed by expressed sequence tag (EST) sampling
© Mégy et al., licensee BioMed Central Ltd 2002
Received: 31 July 2002
Accepted: 11 October 2002
Published: 25 November 2002
Cardiovascular diseases are the primary cause of death worldwide; the identification of genes specifically expressed in the heart is thus of major biomedical interest. We carried out a comprehensive analysis of gene-expression profiles using expressed sequence tags (ESTs) to identify genes overexpressed in the human adult heart. The initial set of genes expressed in the heart was constructed by clustering and assembling ESTs from heart cDNA libraries. Expression profiles were then generated for each gene by counting their cognate ESTs in all libraries. Differential expression was assessed by applying a previously published statistical procedure to these profiles.
We identified 35 cardiac-specific genes overexpressed in the heart, some of which displayed significant coexpression. Some genes had no previously recognized cardiac function. Of the 35 genes, 32 were mapped back onto the human genome sequence. According to Online Mendelian Inheritance in Man (OMIM), five genes were previously known as heart-disease genes and one gene was located in the locus of a bleeding disorder. Analysis of the promoter regions of this collection of genes provides the first list of putative regulatory elements associated with differential cardiac expression.
This study shows that ESTs are still a powerful tool to identify differentially expressed genes. We present a list of genes specifically expressed in the human heart, one of which is a candidate for a bleeding disorder. In addition, we provide the first set of putative regulatory elements, the combination of which appears correlated with heart-specific gene expression.
The notion of differential expression is central to the identification of target genes of biomedical interest. At the transcriptome level, differentially expressed genes are defined as those exhibiting markedly different amounts of cognate mRNA in one particular tissue, disease or developmental stage, compared with other tissues. Differential expression is involved in various processes, such as development, metabolism, cellular differentiation or pathological states. The identification of differentially expressed genes directs the development of functional assays to understand those biological processes. The analysis of coexpression of differentially expressed genes is also of biomedical interest. Coexpressed genes are defined by their correlated expression patterns in time or in space. Genes involved in a common cellular function (for example, within a metabolic pathway or encoding interacting proteins) often display significant coexpression. Coexpressed genes might be co-regulated through common regulatory elements (transcription factor, transcription signal). Coexpression can be used to guide the comparative analyses of promoter regions by enhancing our capacity to identify the weak signature of regulatory elements.
The most convenient approaches to study differential expression and coexpression use measurements of mRNA abundance. Although mRNA is not the ultimate product of a gene, and mature protein levels in a cell do not correlate perfectly with mRNA abundance [3,4], transcriptional activity and its variation are useful indicators of the involvement of a given gene in a given physiological process. Gene-expression profiling is increasingly carried out using hybridization on various types of microarrays and DNA chips. These techniques are, however, not yet widely available, and have intrinsic limitations of cost and reproducibility. They are not convenient for the comparison of multiple tissues, as they usually require normalization to the same control sample. Moreover, they require access to human tissue samples and can only measure the expression level of the predetermined set of genes spotted on the array.
In 1992, Okubo et al.  proposed the use of large-scale random 3'-end cDNA library sequencing (3' expressed sequence tags; ESTs) as a way of estimating the level of gene expression. The abundance of mRNA is simply estimated from the number of cognate ESTs found in each library, under the assumption that it is proportional to the transcript frequencies. This 'digital' approach has become very popular (reviewed in ); it has the advantage that the expression data for each gene in every tissue tested can be stored in easily accessible databases, once and for all. The detection of mRNA does not depend on a common control, in contrast to the popular microarray protocol , nor does it require access to actual tissue samples. In addition, the EST approach allows the detection of novel genes (or splice variants) expressed in a given sample. As we do not know in advance which genes are to be found expressed in the heart, EST data is well suited to our project.
As cardiovascular diseases are the leading cause of death worldwide, the analysis of cardiac genes is of major interest. In this study we have identified genes differentially expressed in the human heart, using their EST frequency in various human adult libraries. We followed a four-step protocol: first, the ESTs detected in the various heart cDNA libraries were assembled in separate contigs, representing genes expressed in the heart; second, we calculated the probability of differential expression in the heart for each contig, using our previously published statistical test; third, we computed the pairwise correlation of the expression profiles of the contigs found to be differentially expressed in the heart; fourth, we clustered the contigs according to their coexpression level, and represented them as a dendrogram.
On the basis of this approach, we identified a set of new genes specifically overexpressed in the heart and with no previously recognized cardiac function. By locating these genes in the human genome sequence, we identified candidate genes for cardiovascular diseases and gathered the first collection of cardiac gene promoter sequences. The analysis of this collection for the first time suggests a pattern of regulatory elements that appear to characterize the promoters of at least a subset of heart-specific genes.
Heart expression of contigs
We first generated contigs representing genes expressed in the heart, as described in Materials and methods. Cardiac ESTs were grouped into 194 clusters, from which 220 contigs were constructed. Such contigs represent heart transcripts.
Heart-specific expression of contigs
Statistics on EST clusters and contigs
Human heart ESTs
Human heart ESTs cleaned
Contigs differentially expressed
Genes differentially expressed = unique contigs
The expression profile was then derived for each of the 68 contigs. For each of the 438 EST libraries, the contig expression level was computed as the fraction of ESTs matching the contig, relative to the total number of ESTs in the library. The final gene-expression data were thus arranged into a matrix of 68 contigs × 438 libraries. This matrix was the basis of all subsequent computations.
Coexpression of contigs and dendrogram representation
Coexpression of contigs
Function of contigs
Accession number in GenBank and function of the heart-specific genes from Figure 1
Gene name in Figure 1
Accession number in GenBank
Heart physiology relation
Myosin heavy chain 7 (MYH 7)
Myosin light chain 2 (MYL 2)
Myosin light chain 2a (MYL 2a)
Myosin light chain polypeptide 4 (MYL 4)
HSP 17 kD
HSP 90 kD
Alpha-enolase 1 (ENO1)
Voltage-dependent anion channel 1 (VDAC-1)
NADH dehydrogenase ubiquinone 1, alpha subcomplex, 4
Protein phosphatase 1, regulatory (inhibitor) subunit (PPP1 R)
Protein phosphatase 1, catalytic subunit - beta isoform (PPP1 CB)
Procollagen-proline, 2-oxoglutarate 4-dioxygenase
Phosphatidyl glycerophosphate synthase
Glutamic oxaloacetic transaminase (GOT1)
Prostaglandin D synthase
Elongation factor eiF 4A
Exostose 2 (EXT2)
Crystallin alpha B
'CGI 146 protein'
Analysis of function
Among these genes, 15 corresponded to functions expected to be highly expressed in the heart, such as NADH dehydrogenase ubiquinone (NM_002489) for energy production, myosin (XM_027060, XM_033374, XM_032189 and XM_004995), tropomyosin (NM_000366) and actin (BC009978), involved in muscle contraction, and to proteins such as colligin (NM_001235) that interact with the abundant heart collagen. Thirteen of the remaining genes exhibited no obvious relationship to the physiology of the heart. Finally, 11 genes had no functional attribute (noted as X, NHF, KIAA or HSP).
Analysis of the dendrogram
Interestingly, most of the genes with related functions clustered together in the dendrogram. Four clusters were obvious: one of contractile proteins (isoforms of tropomyosin - TG72_14, TG114, TG131 - and myosin TG132_7), a second of troponin isoforms (TG13 and TG154_9), a third of genes without a match in GenBank (NHF), and a fourth due to contamination by Escherichia coli sequences. These vector sequences were not masked in the previous step because they were not included in the RepeatMasker and RepBase databases. These four E. coli sequences were removed from further consideration.
Muscle contribution of cardiac-specific genes
The goal of this study is to identify genes specifically related to the physiology of the heart. However, this organ is a muscle, and is likely to share similar gene expression with other muscles. The protocol we used to identify heart-specific genes involved the comparison of EST frequencies computed in the merged heart libraries versus the frequencies computed in all other tissue types. The latter pool included muscle libraries, but their contribution was diluted by other tissue types. It was thus possible that genes generally overexpressed in muscles were not eliminated in the initial identification of the heart genes. To address this problem, muscle-specific genes were identified by comparing EST frequencies between all non-muscular tissues and non-cardiac muscle libraries. Using the same p-value threshold of 0.03, 1,156 contigs were found to be overexpressed in muscle. Five of the previously identified heart-specific genes were found within this list (Figure 1). Unsurprisingly, they all corresponded to contractile proteins and four of them clustered together on the dendrogram.
To determine the genomic location of the 35 remaining candidate heart-specific genes, they were used as query sequences to search the human genome sequence. A cognate match was found for 32/35 (91%) of the genes. As expected, most of the genes had several partial matches in close proximity along the human genome, separated by some hundreds of nucleotides, corresponding to introns. Most of these genes have been previously linked to a specific chromosome by different techniques. To avoid mapping our candidate genes to potential pseudogenes, we only retained matches on the specific chromosome. The absence of matches can be attributed to the incomplete status of the human genome sequence and/or to the difficulty of identifying short segments of similarity spread over long genomic regions.
Genes linked to monogenic diseases with associated cardiovascular defects
OMIM reference for the disease
Identification of the responsible gene
160781 (heart formation defect)
150000 (enzyme deficiency)
601628 (bleeding disorder)
Modifier of von Willebrand factor
Collection and analysis of promoter sequences
As the 32 genes all appear (statistically) to be specifically expressed in the heart, one might suspect that they share some regulatory elements. Our final step was to analyze their regulatory regions. Core promoter regions were operationally defined as the 1,000 bases upstream of the transcription start site (TSS), if known, up to the end of the 5'-UTR (that is, to the site of translation initiation). The core promoter regions of 17 cardiac genes with known TSSs were extracted. This collection of promoters is the first database of heart promoter sequences and is available at . Three types of regulatory element were searched for: polymerase II promoter elements, known transcription factor sites and new motifs common to most of these promoter sequences.
Over-represented 'words' in the 17 cardiac promoters
The present analysis of cardiac ESTs identified 35 genes as being differentially expressed in the heart. After clustering these genes on the basis of the correlation of their expression profiles, genes with known related function appeared as close neighbors on the resulting tree. They might thus share regulatory regions. The initial analysis was done with a dbEST release of September 1999. To ensure the specificity of the originally identified 32 heart-specific genes, we revalidated them in the light of the most recent dbEST release (February 2002).
Studies based on ESTs and on variation in expression require rigorous statistical validation. As the identification of changes in expression level on the basis of quotients of very small relative abundance is not very meaningful, many methods have been developed to evaluate variation in expression level (see  for a review). We used a previously published test  estimating the probability of differential expression for a gene between two pools of ESTs (cardiac versus non-cardiac) and able to detect a weak differential expression provided the absolute number of tag counts is large enough. Assuming a probabilistic model, this test calculates the probability of observing y ESTs in library B given that we observed x ESTs in library A, a low probability indicating a high differential expression of the cognate gene over the two libraries. This test was independently validated by others and, in a comparative analysis of statistical tests evaluating differential gene expression, it was found to be the most appropriate for pair-wise comparisons of EST libraries .
Our approach suffers from several obvious limitations, shared by all EST-based analyses. First, mRNA level does not always correlate with protein abundance in the cell; thus the EST analyses are not representative of the proteome of a cell. However, these problems also apply to the more expensive and sophisticated microarray techniques. Second, the abundance of transcripts detected depends on the initial EST number: starting with 4,303 cardiac ESTs, only highly expressed genes are expected to show up in our final list of genes. Nevertheless, highly expressed genes are expected to have a significant impact on the physiology and pathology of the heart. Unlike other studies [15,16], we removed from our study genes represented by a single EST (singletons), thus decreasing the overall number of ESTs in consideration, but considerably reducing the danger of taking in the artifacts induced by this unreliable data.
Using an EST approach, Hwang et al.  characterized gene transcription and identified genes overexpressed in cardiac hypertrophy. They generated about 77,000 ESTs, half of them corresponding to 5,000 unique known genes and expressed in the heart. A large fraction of those genes may be represented by very-low-copy ESTs (singletons) that may arise from tissue contamination (for example by blood). Moreover, this count cannot be used to estimate the expression level of cardiac genes because the EST frequency is not given. A larger number of ESTs would thus be required to increase the sensitivity of our study. Although more cardiac ESTs were recently generated, none of them was used in our study because they were all generated from normalized libraries.
Protocols using EST numbers to estimate gene-expression levels were successfully used in previous studies [17,18,19]. For instance, Bortoluzzi et al.  reconstructed the human adult skeletal muscle transcriptional profile where they found a good agreement between their results and a SAGE (serial analysis of gene expression) experiment.
We compared our results to previous EST analyses of the heart transcriptome [15,16,20]. Each cardiac gene (except TG106, TG110, TG90 and TG94) given in our list was found in at least one of these studies. We also noticed that these lists are not exactly similar and that some genes were detected in one study only.
Expression in the heart of the 15 genes without any previous known cardiac function
Gene name and function
Crystallin alpha B
RSU - RSP 1
Prostaglandin D synthase
PPP1-CB (Regulation of myosin)
PPP1-R12A (Regulation of myosin)
KIAA 0471_IDN4 (Similar to the actin-binding protein of the fly)
Procollagen proline-2 oxoglutarate-4-dioxygenase
We generated genes starting from ESTs found in cardiac libraries, thus focusing on genes expressed in the heart. In general, these genes can be expressed in other tissues as well. For example, genes involved in energy metabolism and in contraction are highly expressed in the heart, as well as in muscle and sperm. Those cell types require energy and involve muscle contraction or motility. As the heart is a vascular and a muscular organ, identifying 'truly' heart-specific genes (that is, those involved in vascular function) requires elimination of genes specifically overexpressed in generic muscular tissues. Some genes encode proteins with multiple splice isoforms (such as the contractile proteins); some of these are specific to skeletal muscle and others to heart muscle. Such genes were analyzed more precisely to find out exactly which isoform was involved. The isoforms found in this study are known to be expressed in the heart or in all muscle cells: no isoforms specific to skeletal muscle were found.
We identified 35 genes likely to be involved in heart-specific functions, either vascular or neuro-muscular. Five were previously known as disease genes with cardiovascular symptoms, and one gene, clustering close to other disease-linked genes in the dendrogram, lies near a locus associated with a bleeding disorder (OMIM: 601628). This direct validation of our approach allows us to propose that the remaining heart-specific genes identified in this study might be of biomedical interest. Such genes are candidates for further linkage analysis.
Through a computational analysis of the promoter regions of these cardiac genes, we identified a combination of five main motifs that could participate in their specific expression. These motifs should not be considered as individual elements but as a module of organized elements (in position, order and/or distance) controlling gene expression. As not all the genes specifically expressed in a tissue are expected to be regulated by the same elements, the combination revealed in this study may be involved in the regulation of a subset of cardiac genes. Of 17 genes, five have this combination in their promoters and may constitute this subset, or a part of this subset. The combination was searched for in the human sequences in the Eukaryotic Promoter Database (EPD). Eighteen promoters from EPD have this combination, half of them being promoters of cardiac genes and thus possibly co-regulated with the genes identified in this study. Others may be promoters of genes whose cardiac activity has not previously been recognized.
Homologous genes in related organisms often share the same regulatory elements. As most of the mouse genome is now available, the motif combination was searched for in the promoters of mouse orthologs. No identical combination of elements was found in the orthologous mouse promoter regions, probably because of evolutionary divergence. The absence of identifiable conservation of these motifs suggests that they may be involved in the fine regulation of a subset of cardiac genes rather than in 'constitutive' cardiac expression.
Materials and methods
EST databases and contigs
Human heart EST libraries represented in dbEST
dbEST library identifier
Number of ESTs
Human heart cDNA library
Human heart cDNA library
Atrium cDNA library human heart
Atrium cDNA library human heart
Fetal heart normalized
Human heart cDNA normalized
Pooled human melanocyte, fetal heart, and pregnant uterus
Pooled human melanocyte, fetal heart, and pregnant uterus
Adult heart, male 25 years
Adult heart, subtracted
Fetal heart, subtracted
Fetal heart + brain + liver
Adult heart muscle
Differential gene expression
To assess its differential expression, every contig was compared to the total EST set at high stringency (P-value < 10^-20). For each contig, the hit list of cognate matches was then separated into two groups: ESTs from cardiac libraries versus any other libraries. Given the count of cognate ESTs and the total number of ESTs in both groups, we evaluated the differential expression of the contig in heart.
The statistical significance of the difference in frequencies (x/N1, y/N2) between these two groups was computed according to the previously published test:
$$p(y \mid x) = \left(\frac{N_2}{N_1}\right)^{y} \frac{(x+y)!}{x!\, y!\, \left(1 + \frac{N_2}{N_1}\right)^{x+y+1}}$$
where x and y are the numbers of ESTs, respectively, from cardiac libraries and from others matching the contig, and N1 and N2 are the total numbers of ESTs, respectively, from cardiac libraries and from others. A threshold of p < 0.03 was used to classify contigs as selectively expressed in the heart. This limit was chosen as a compromise between the predicted number of false positives (0.03 × 220 = 6.6) and retaining enough potential candidates for the remainder of the study. The p-value here is not taken as a way to assess the actual statistical significance of the result, but as a rational way to prioritize our study.
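The test described above can be sketched in a few lines of Python. The sketch assumes the closed form p(y|x) = (N2/N1)^y (x+y)! / (x! y! (1 + N2/N1)^(x+y+1)) from the previously published test; the function names and the one-sided lower-tail summation used to flag heart overexpression are illustrative assumptions, not the code used in this study.

```python
import math

def ac_prob(y, x, n1, n2):
    """Probability of observing y ESTs in pool 2 given x ESTs in pool 1.

    n1 and n2 are the total EST counts of the two pools. The computation
    is done in log space so that large counts do not overflow.
    """
    ratio = n2 / n1
    log_p = (y * math.log(ratio)
             + math.lgamma(x + y + 1)
             - math.lgamma(x + 1)
             - math.lgamma(y + 1)
             - (x + y + 1) * math.log1p(ratio))
    return math.exp(log_p)

def heart_enrichment_pvalue(x, y, n1, n2):
    """Assumed one-sided tail: P(Y <= y | x).

    A small value means that the non-cardiac count y is surprisingly low
    given the cardiac count x, i.e. the contig looks overexpressed in heart.
    """
    return sum(ac_prob(k, x, n1, n2) for k in range(y + 1))

def is_heart_specific(x, y, n1, n2, threshold=0.03):
    """Apply the 0.03 threshold quoted in the text."""
    return heart_enrichment_pvalue(x, y, n1, n2) < threshold
```

With hypothetical pool sizes of 4,000 cardiac and 400,000 non-cardiac ESTs, `is_heart_specific(20, 5, 4000, 400000)` flags a contig seen 20 times in heart and only 5 times elsewhere, whereas a contig seen twice in heart and 200 times elsewhere is not flagged.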
Expression profile and correlation
The expression profile of each contig was computed from the number of cognate ESTs in each library constituting the total EST set relative to the total number of ESTs in the library. All expression profiles were stored in a matrix with rows corresponding to contigs and columns corresponding to libraries. Element M ij of the matrix corresponds to the relative frequency of cognate ESTs for contig i in library j.
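As a minimal sketch of this step, the matrix M can be built from raw EST counts as follows. The function name and the toy input layout (a list of per-contig count rows plus a list of library sizes) are assumptions made for illustration.

```python
def expression_matrix(counts, library_sizes):
    """Build M where M[i][j] is the relative frequency of contig i in library j.

    counts[i][j]     -- number of ESTs from library j that match contig i
    library_sizes[j] -- total number of ESTs in library j
    Empty libraries (size 0) yield a frequency of 0.0 by convention here.
    """
    return [
        [c / n if n else 0.0 for c, n in zip(row, library_sizes)]
        for row in counts
    ]
```

For two hypothetical contigs and two libraries of 100 and 50 ESTs, `expression_matrix([[2, 0], [1, 5]], [100, 50])` gives rows `[0.02, 0.0]` and `[0.01, 0.1]`.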
The similarity of expression profile between contigs was estimated by computing the value of Pearson's r coefficient in a pairwise manner between each row. This coefficient takes values within the [-1, +1] range. Values close to 0 indicate no correlation, positive values denote a positive correlation (contigs going up and down together), and negative values denote opposite patterns of contig expression. A matrix of pairwise gene distances was then derived from the correlation matrix.
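The pairwise correlation step can be sketched in plain Python as below. The helper names are hypothetical, and returning 0.0 for a profile with no variation is a convention chosen for this sketch rather than something stated in the study.

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # flat profile: correlation undefined, treated as none
    return cov / math.sqrt(var_a * var_b)

def correlation_matrix(profiles):
    """Correlate every row of the expression matrix with every other row."""
    return [[pearson_r(p, q) for q in profiles] for p in profiles]
```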
Distance matrix and tree representation
The hierarchical classification of objects requires the calculation of a matrix of their pairwise distances. The contig correlation matrix constructed previously was turned into such a distance matrix by computing the Euclidean distance d between genes X and Y from the columns of the correlation matrix:
$$d(X, Y) = \sqrt{\sum_{k} \left(r_{Xk} - r_{Yk}\right)^{2}}$$
where r_{Xk} denotes the correlation coefficient between gene X and gene k.
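A minimal sketch of this conversion, assuming the standard Euclidean distance taken over the entries of the two genes' correlation vectors (the correlation matrix is symmetric, so rows and columns are interchangeable). The function name is hypothetical.

```python
import math

def distance_matrix(corr):
    """Turn a square correlation matrix into a matrix of Euclidean distances.

    The distance between genes i and j is computed from their full
    correlation vectors, so two genes are close when they correlate
    similarly with every other gene.
    """
    n = len(corr)
    return [
        [
            math.sqrt(sum((corr[i][k] - corr[j][k]) ** 2 for k in range(n)))
            for j in range(n)
        ]
        for i in range(n)
    ]
```

Comparing whole correlation vectors rather than using 1 - r directly means that the distance reflects how two genes relate to the entire gene set, not only to each other.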
Identification of homologous sequences
Contigs were functionally annotated by querying GenBank (release 128.0) with the program BLAST, version 2.0.8. Cognate matches were initially identified using a threshold of 98% sequence identity, followed by an extensive bibliographical analysis of the matching entry.
To locate the genes on the human genome, we compared their sequences to the draft human genome sequence, which is updated daily and available online at the National Center for Biotechnology Information (NCBI). If 70% of the query length matched the genomic sequence with a score > 200, the query sequence was considered as being successfully located in the human genome. As genes were built from ESTs, they represent gene transcripts. As expected, many of the genomic matches were found to be separated by intron sequences.
Collection and analysis of promoter sequences
We further analyzed the putative promoter/regulatory sequences of the candidate genes for which a transcription start site (TSS) was previously identified. Seventeen core promoter regions were thus extracted from the human genome sequence assembly as the 1,000 bases upstream of the TSS and the 5'-UTR. These sequences constitute the first collection of putative heart specific promoters and are available at . We analyzed these regulatory regions by locating polymerase II promoter elements and by searching for known and new regulatory elements common to all the sequences. As we were searching for new elements, we did not consider the known cardiac sites (such as GATA and Sp1).
Polymerase II promoter elements were determined with Tfbind, considering the TRANSFAC matrices V$CAAT, V$GC, V$TATA and V$CAP only. We retained only the matches found at the expected locations: in the -105 to -70 region for the CAAT box, in the -74 to -45 region for the GC box, in the -30 to -20 region for the TATA box, and in the -5 to +5 region for the CAP box, +1 being the transcription start site (TSS). Known transcription factor sites (limited to the vertebrate matrix group) were searched for using MatInspector and the TRANSFAC database. Default parameters were used.
Over-represented words were identified using RSA-tools. Oligonucleotides (4 to 8 residues in size) were counted on both strands, to detect orientation-insensitive elements. The expected frequency was calculated from the human promoters of the EPD release 70 [34,35]. Only slight differences were observed when changing pseudo-weights from 0.10 to 0.20. Other parameters were kept at their default values.
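The positional filter applied to the Tfbind matches can be illustrated as follows. This is only a sketch: the input format (pairs of matrix name and a single position relative to the TSS) and the inclusive window bounds are assumptions, and the code does not call Tfbind itself.

```python
# Expected windows relative to the TSS (+1), taken from the text above.
EXPECTED_WINDOWS = {
    "V$CAAT": (-105, -70),
    "V$GC": (-74, -45),
    "V$TATA": (-30, -20),
    "V$CAP": (-5, 5),
}

def keep_expected(matches):
    """Keep only matches that fall inside the window expected for their matrix.

    matches -- iterable of (matrix_name, position) pairs, where position is
               the match coordinate relative to the TSS (hypothetical format).
    Matches for matrices not listed in EXPECTED_WINDOWS are discarded.
    """
    kept = []
    for name, pos in matches:
        window = EXPECTED_WINDOWS.get(name)
        if window is not None and window[0] <= pos <= window[1]:
            kept.append((name, pos))
    return kept
```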
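Counting words on both strands can be sketched as below: each word is grouped with its reverse complement, so that an element is counted regardless of its orientation. This is an illustrative re-implementation of the counting step only, not the RSA-tools code, and it omits the comparison with expected frequencies; the function names are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(word):
    """Return the reverse complement of an uppercase DNA word."""
    return word.translate(_COMPLEMENT)[::-1]

def count_words(sequences, k_min=4, k_max=8):
    """Count oligonucleotides of length k_min..k_max, strand-insensitively.

    A word and its reverse complement share one counter, keyed by whichever
    of the two sorts first alphabetically. Words containing characters other
    than A, C, G or T (for example N) are skipped.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for k in range(k_min, k_max + 1):
            for i in range(len(seq) - k + 1):
                word = seq[i:i + k]
                if set(word) <= set("ACGT"):
                    counts[min(word, reverse_complement(word))] += 1
    return counts
```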
MEME version 3.0 [36,37] was used to reveal new motifs that might be common to all the promoter sequences. Motifs with a number of sites between 2 and 300 and a width from 6 to 50 nucleotides were searched for.
The EPD database release 71 was used to get the background sequences.
K.M. was supported by a grant from AVENTIS Pharma and the region Provence-Alpes-Côte d'Azur. We thank F. Gosse, C. Notredame, H. Ogata and K. Suhre for critically reading the manuscript.
- Claverie JM: Computational methods for the identification of differential and coordinated gene expression. Hum Mol Genet. 1999, 8: 1821-1832. 10.1093/hmg/8.10.1821.
- Brazma A, Jonassen I, Vilo J, Ukkonen E: Predicting gene regulatory elements in silico on a genomic scale. Genome Res. 1998, 8: 1202-1215.
- Anderson L, Seilhamer J: A comparison of selected mRNA and protein abundances in human liver. Electrophoresis. 1997, 18: 533-537.
- Gygi SP, Rochon Y, Franza BR, Aebersold R: Correlation between protein and mRNA abundance in yeast. Mol Cell Biol. 1999, 19: 1720-1730.
- Kurian KM, Watson CJ, Wyllie AH: DNA chip technology. J Pathol. 1999, 187: 267-271. 10.1002/(SICI)1096-9896(199902)187:3<267::AID-PATH275>3.3.CO;2-R.
- Okubo K, Hori N, Matoba R, Niiyama T, Fukushima A, Kojima Y, Matsubara K: Large scale cDNA sequencing for analysis of quantitative and qualitative aspects of gene expression. Nat Genet. 1992, 2: 173-179.
- Schena M, Shalon D, Davis RW, Brown PO: Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science. 1995, 270: 467-470.
- Audic S, Claverie JM: The significance of digital gene expression profiles. Genome Res. 1997, 7: 986-995.
- OMIM. [http://www3.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM]
- Cardiac gene database. [http://igs-server.cnrs-mrs.fr/Card_Gene/]
- Tsunoda T, Takagi T: Estimating transcription factor bindability on DNA. Bioinformatics. 1999, 15: 622-630. 10.1093/bioinformatics/15.7.622.
- Wingender E, Chen X, Hehl R, Karas H, Liebich I, Matys V, Meinhardt T, Pruss M, Reuter I, Schacherer F: TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res. 2000, 28: 316-319. 10.1093/nar/28.1.316.
- van Helden J, Andre B, Collado-Vides J: Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J Mol Biol. 1998, 281: 827-842. 10.1006/jmbi.1998.1947.
- Romualdi C, Bortoluzzi S, Danieli GA: Detecting differentially expressed genes in multiple tag sampling experiments: comparative evaluation of statistical tests. Hum Mol Genet. 2001, 10: 2133-2141. 10.1093/hmg/10.19.2133.
- Bortoluzzi S, d'Alessi F, Danieli GA: A computational reconstruction of the adult human heart transcriptional profile. J Mol Cell Cardiol. 2000, 32: 1931-1938. 10.1006/jmcc.2000.1227.
- Hwang DM, Dempsey AA, Lee CY, Liew CC: Identification of differentially expressed genes in cardiac hypertrophy by analysis of expressed sequence tags. Genomics. 2000, 66: 1-14. 10.1006/geno.2000.6171.
- Schmitt AO, Specht T, Beckmann G, Dahl E, Pilarsky CP, Hinzmann B, Rosenthal A: Exhaustive mining of EST libraries for genes differentially expressed in normal and tumour tissues. Nucleic Acids Res. 1999, 27: 4251-4260. 10.1093/nar/27.21.4251.
- Stekel DJ, Git Y, Falciani F: The comparison of gene expression from multiple cDNA libraries. Genome Res. 2000, 10: 2055-2061. 10.1101/gr.GR-1325RR.
- Bortoluzzi S, d'Alessi F, Romualdi C, Danieli GA: The human adult skeletal muscle transcriptional profile reconstructed by a novel computational approach. Genome Res. 2000, 10: 344-349. 10.1101/gr.10.3.344.
- Liew CC, Hwang DM, Fung YW, Laurenssen C, Cukerman E, Tsui S, Lee CY: A catalogue of genes in the cardiovascular system as identified by expressed sequence tags. Proc Natl Acad Sci USA. 1994, 91: 10645-10649.
- Su AI, Cooke MP, Ching KA, Hakak Y, Walker JR, Wiltshire T, Orth AP, Vega RG, Sapinoso LM, Moqrich A, et al: Large-scale analysis of the human and mouse transcriptomes. Proc Natl Acad Sci USA. 2002, 99: 4465-4470. 10.1073/pnas.012025199.
- Aydin S, Yilmaz Y, Odabas O, Sekeroglu R, Tarakcioglu M, Atilla MK: A further study of seminal plasma: lactate dehydrogenase and lactate dehydrogenase-X activities and diluted semen absorbance. Eur J Clin Chem Clin Biochem. 1997, 35: 261-264.
- Fessele S, Maier H, Zischek C, Nelson PJ, Werner T: Regulatory context is a crucial part of gene function. Trends Genet. 2002, 18: 60-63. 10.1016/S0168-9525(02)02591-X.
- Boguski MS, Lowe TM, Tolstoshev CM: dbEST - database for "expressed sequence tags". Nat Genet. 1993, 4: 332-333.
- Jurka J: Repbase update: a database and an electronic journal of repetitive elements. Trends Genet. 2000, 16: 418-420. 10.1016/S0168-9525(00)02093-X.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-410. 10.1006/jmbi.1990.9999.
- Huang X: A contig assembly program based on sensitive detection of fragment overlaps. Genomics. 1992, 14: 18-25.
- Sokal R, Michener C: A statistical method for evaluating systematic relationship. Univ Kansas Sci Bull. 1958, 28: 1409-1438.
- Kuhner MK, Felsenstein J: A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates [erratum: Mol Biol Evol 1995 May;12(3):525]. Mol Biol Evol. 1994, 11: 459-468.
- Entrez Genome. [http://www.ncbi.nlm.nih.gov/cgi-bin/Entrez/map_search]
- Tfbind. [http://tfbind.ims.u-tokyo.ac.jp/]
- TRANSFAC - the transcription factor database. [http://transfac.mirror.edu.cn/TRANSFAC/]
- Regulatory sequence analysis tools. [http://rsat.ulb.ac.be/rsat/]
- Praz V, Perier R, Bonnard C, Bucher P: The Eukaryotic Promoter Database, EPD: new entry types and links to gene expression data. Nucleic Acids Res. 2002, 30: 322-324. 10.1093/nar/30.1.322.
- Eukaryotic promoter database. [http://www.epd.isb-sib.ch/]
- Bailey TL, Gribskov M: Methods and statistics for combining motif match scores. J Comput Biol. 1998, 5: 211-221.
- MEME. [http://meme.sdsc.edu/meme/website]