Large scale comparison of global gene expression patterns in human and mouse
© Zheng-Bradley et al.; licensee BioMed Central Ltd. 2010
Received: 10 September 2010
Accepted: 23 December 2010
Published: 23 December 2010
It is widely accepted that orthologous genes between species are conserved at the sequence level and perform similar functions in different organisms. However, the level of conservation of gene expression patterns of the orthologous genes in different species has been unclear. To address the issue, we compared gene expression of orthologous genes based on 2,557 human and 1,267 mouse samples with high quality gene expression data, selected from experiments stored in the public microarray repository ArrayExpress.
In a principal component analysis (PCA) of combined data from human and mouse samples merged on orthologous probesets, samples largely form distinctive clusters based on their tissue sources when projected onto the top principal components. The most prominent groups are the nervous system, muscle/heart tissues, liver and cell lines. Despite the great differences in sample characteristics and experiment conditions, the overall patterns of these prominent clusters are strikingly similar for human and mouse. We further analyzed data for each tissue separately and found that the most variable genes in each tissue are highly enriched with human-mouse tissue-specific orthologs and the least variable genes in each tissue are enriched with human-mouse housekeeping orthologs.
The results indicate that the global patterns of tissue-specific expression of orthologous genes are conserved in human and mouse. The expression of groups of orthologous genes co-varies in the two species, both for the most variable genes and the most ubiquitously expressed genes.
Over the past two decades, both tissue specificity and the conservation of expression between orthologous genes have been much discussed, but comparative analysis at the transcriptome level has produced ambiguous results. While some studies suggested that orthologous genes do not share similar expression patterns [1–5], other groups reported the opposite [6–9]. Gene-specific expression regulation is known to differ between mouse and human. For instance, it has been shown that even for highly conserved and tissue-specific transcription factors, promoter-binding events are highly species specific, and binding patterns do not align between species . We took advantage of the vast amount of human and mouse gene expression data deposited in ArrayExpress to investigate whether global expression patterns of mouse and human orthologous genes are correlated.
The challenge of comparing expression patterns of orthologous genes in different species is mainly due to different affinities of probes on different chips, leading to difficulties in comparing data from different platforms. Different approaches have been tried to compare gene expression patterns in different organisms (reviewed in ). Some studies used the same microarray for cross-hybridization in samples from different species to eliminate the variations in hybridization and scanning protocols. This approach typically used either a single-species array, to which samples from closely related species or subspecies were hybridized and expression levels of orthologous genes were measured [12, 13], or a custom-designed chip that contained probes from different species [14, 15]. Alternatively, many other studies made use of species-specific arrays to identify co-expressed groups of orthologous genes [4–6, 16, 17]. In such studies, how to minimize the platform effects was the key to meaningful comparison of the cross-species data. Some studies identified differentially expressed genes within species; then the resulting significant gene lists were compared cross-species to look for patterns of conservation [3, 18]. A few other studies used more sophisticated algorithms and analyzed combined data from different species at the same time to identify cell cycle genes with conserved expression patterns between species [19–21].
Our study used data generated on species-specific microarray platforms. Only human data from the Affymetrix HG-U133A array and mouse data from the Affymetrix MG_U74Av2 array were considered, to exclude between-array variability within each species. These two whole genome arrays were selected because they have been used for the highest number of human and mouse samples in ArrayExpress. Raw data consisting of 5,372 human and 1,323 mouse high quality CEL files were selected from ArrayExpress. Each CEL file corresponds to the hybridization of one biological sample. Since the data matrices are extremely large and the information content is very rich, we first normalized and filtered for human-mouse orthologous probesets, then used principal component analysis (PCA) to reduce the data dimensions. PCA has often been used to study high-dimensional data generated by genome-wide gene expression studies [22–25]. In an earlier PCA of the 5,372 human hybridizations it was found that, on PCA scatter plots, samples in general clustered together based on tissue types. Despite the great diversity, the samples predominantly clustered into the following classes of distinctive biological characteristics: hematopoietic system, malignancy samples including cell lines, neoplastic samples and non-neoplastic primary tissues, and nervous system. Specific classes of genes are expressed in different clusters . The study suggested that samples of similar physiological attributes have similar gene expression profiles globally and therefore tend to group together on PCA scatter plots.
An intriguing question is whether these major gene expression patterns are conserved across evolutionarily diverse species such as human and mouse. We answer this question positively and report a similar PCA of the 1,323 mouse hybridizations. Similar to what was observed in the previous study of human data , the mouse samples also clustered on PCA scatter plots. The samples were loosely partitioned into a nervous system cluster, a muscle/heart cluster, a liver cluster and a cluster of samples with lower variability, including cell line samples. Since the distribution of samples on the scatter plots is driven by the underlying transcriptome, we anticipate that samples in each cluster have distinctive gene expression profiles. To compare gene expression profiles between human and mouse, the data from the two species were normalized and merged into a single data matrix based on orthologous gene pairings. The merged data matrix was subjected to PCA. We observed that the clustering of samples in individual species is well preserved in the multi-species analysis; more interestingly, human and mouse share a very similar pattern of sample clustering. The resemblance of the human and mouse sample clusters was also observed in hierarchical clustering of Pearson correlations between human and mouse tissues. All observations suggest that, for at least a fraction of orthologous genes, the expression profiles are largely conserved between the two species. This inference is supported by elevated gene expression correlation coefficients between human and mouse orthologous genes compared with a randomized negative control. Additional investigations allowed us to identify orthologous genes whose expression levels co-vary in the two species.
Results and discussion
Sample clustering analysis of the mouse dataset
Table: Summary of probesets and probeset annotations for the platforms used in the study (fields: number of probesets; number of annotated probesets; number of Ensembl genes).
Further analysis demonstrated that samples of a particular tissue type are always represented by multiple experiments (Additional files 1 and 2), suggesting that lab effects did not drive the tissue clustering. We conclude that, similarly to what has been observed in human, mouse samples from a given tissue class share similar global gene expression patterns, causing the samples to cluster together when they are projected onto the top principal components. When profiling the transcriptome of thousands of samples from different tissues and different conditions, the subtle variations within the same class of samples are outweighed by the large differences between sample classes.
Sample clustering analysis of combined human and mouse datasets
In the combined analysis, we observe the same cluster pattern as in the mouse-only analysis. The four predominant groups are a central cluster of mostly cell line samples, and three tissue-specific clusters: muscle/heart, nervous system, and liver samples (Figure 2). Human samples and mouse samples form the same major clusters, and the tissue-specific clusters of samples from each species are adjacent in the PCA plot. Similar sample clustering patterns were observed in scatter plots of other principal components; one example is components 1 and 2 in Additional file 4. Since the distance between two samples when projected onto the principal components is determined by the covariance of their gene expression profiles, we believe the similarity of the human and mouse tissue clusters reflects the correlation between the transcriptomes of human and mouse tissues. Our hypothesis is that, in the same types of tissues, orthologous genes are expressed in a correlated fashion at the global level in both species. The systematic shift in location between corresponding human and mouse tissue clusters may be explained by platform effects that remain after data normalization, or it may reflect a genuine difference in expression patterns between the species.
Samples such as mammary gland and hematopoietic system were removed from the analysis presented in Figure 2 and Additional file 4 because they were represented in only one species. Our initial PCA studies included these samples; the overall landscape of the PCA plot was different from what we have seen so far, but the clustering of samples from nervous system and of samples from muscle and heart, as well as the resemblance of such clusters between human and mouse, are still evident (Additional file 5). Thus, we believe that the cross-species global gene expression similarity we observed is not due to sample filtering.
It is interesting to observe that all mouse clusters are closer to the center than their human counterparts (Figure 2; Additional files 4 and 5). This may reflect that the expression values on the mouse chip are not as widely diversified as those on the human chip, or it may simply reflect that the mouse dataset was scaled differently from the human dataset during normalization.
How the data were normalized before they were merged into a combined matrix has a profound impact on the PCA landscape. In all PCA results we presented so far, the data were normalized by probeset across all samples to minimize the platform differences among samples; thus, the data are more comparable across species. If we normalized the human and mouse data matrices by sample, then in the combined matrix the platform difference is the largest variance captured in the top principal component (Additional file 3b), separating mouse samples and human samples into two distinctive areas (Additional file 6a). Within each species cluster, the tissue clusters are still preserved and the relative order of the tissue clusters is the same in the two species (Additional file 6b), reflecting the global gene expression resemblance of the two species.
Identification of expression correlation between orthologous genes of different species
Cross-platform comparison of gene expression data is always a challenge. Even for the same tissue type, human and mouse samples differ in many ways; thus, it is difficult to take a pair of orthologous genes between the two species and compare their expression levels directly. A condition that induces or suppresses the expression of a gene in one species may not be applicable to another species. To minimize sample and platform variations, we used a measurement called correlation of correlation coefficient or corCor . It compares transcriptome-wide correlation in two groups of corresponding probesets by calculating the vector of correlation coefficients for one probeset to all other probesets in each of the two groups separately, then calculating the correlation coefficient between these two vectors. In our study, the mouse data matrix of 1,267 samples and 6,180 probesets and the human data matrix of 2,557 samples and 6,180 probesets were compared by calculating corCor for every probeset (see Materials and methods). As a negative control, the expression values in the mouse and human data matrices were randomized and the corCor for each probeset was calculated between mouse and human.
In contrast to what we observed in Figure 4b, when corCor was measured between mouse and human samples within specific tissues, the corCor distributions do not deviate strongly from the negative control (Additional file 8). We believe that when samples are of a single tissue type and relatively homogeneous, the platform effects and laboratory effects become more dominant and can mask the tissue-specific global expression patterns observed in analyses using much larger and more heterogeneous datasets.
Comparison of the lists of genes that display the evolutionarily conserved expression patterns in different tissues as identified by us and by Chan and colleagues 
The functions of the enriched human-mouse orthologs were examined by studying Gene Ontology (GO) term over-representation in the gene list using ONTO-EXPRESS . ONTO-EXPRESS uses the ontology tree and calculates statistical significance for each biological process as P-values. We found that the most variable genes shared by human and mouse tend to be genes with tissue-specific functions. For instance, for nervous system samples, the shared gene list contains genes involved in nervous system development and synaptic transmission (Additional file 10a). For muscle and heart samples, the over-represented GO terms in the most variable genes are muscle development, regulation of striated muscle contraction, ventricular cardiac muscle morphogenesis, cardiac muscle contraction, muscle filament sliding, and actin filament-based movement (Additional file 10b). For liver samples, liver-specific GO terms such as oxidation-reduction, lipid metabolic process, response to mercury ion, and cholesterol homeostasis are enriched (Additional file 10c). This leads to the conclusion that genes with evolutionarily conserved expression patterns across species are mostly the ones performing highly tissue-specific functions and are expressed in specific tissues with limited cell types. This explains the observation made by others  and us that tissues with relatively homogeneous composition of cell types, such as heart/muscle, liver, and nervous system, would be segregated when profiling large-scale gene expression data. On the other hand, the shared orthologs among the least variable genes tend to be housekeeping genes, such as genes controlling transcription, apoptosis, cell adhesion, cell differentiation and protein amino acid phosphorylation (Additional file 10d). Not surprisingly, the housekeeping genes are also expressed in a similar manner across species.
With large amounts of gene expression data obtained from public repositories, we investigated the transcriptomes of human and mouse across a large variety of experimental conditions. Whereas single experiments benefit from reducing experimental variability to discover gene-specific expression regulation, selecting as wide a variety of experimental and sample conditions as possible allows us to gain insights into regulation at a higher level of complexity. When analyzing samples from a large variety of tissues, such large-scale studies revealed that the patterns of global gene expression are strong enough to segregate samples based on key biological properties, despite vast variations in experimental conditions, genetic background, age, sex and other sample characteristics. The results confirmed the common belief that samples of similar tissue types share similarities at the transcriptome level. At the same time, the patterns of this segregation, as detected by PCA, are similar between mouse and human and indicate that, on a global level, the signals driving tissue specificity are similar between the species. This supports previous findings [6–9] that although mechanisms of individual gene regulation may be different between the species, global functional patterns are similar and identifiable with whole transcriptome analysis. In particular, as in our study, Chan and colleagues  observed in a cross-species comparison of five different vertebrates ranging from human to pufferfish that the expression profiles of orthologous genes in related tissues were conserved across the five species; among other tissues, they also identified heart/muscle, central nervous system and liver as tissues with evolutionarily conserved gene expression profiles .
Our results provide strong evidence that, on a global level, gene expression patterns of human-mouse orthologs are conserved. The cross-species conservation of expression profiles of tissue-specific genes and housekeeping genes is the foundation for the similar landscapes of sample clustering between human and mouse in large-scale transcriptome comparison. A recent publication  documents that approximately half of measured subnetworks of transcription factors are conserved between human and mouse; this may at least partially explain the conservation of global gene expression patterns we observed in this study.
Materials and methods
Creating an integrated mouse gene expression dataset
We identified 2,290 CEL files generated on Affymetrix chip MG_U74Av2 from ArrayExpress; these are all from publicly available experiments deposited to ArrayExpress before May 2008. The quality of the CEL files was evaluated individually using the R simpleaffy package and four quality control measurements were produced: average background (AvgBg), scale factors (sfs), percent present (PP) and RNA degradation slope (RNAdeg). Arrays were selected for inclusion in this study based on these quantities using the following ranges: AvgBg, 20 to 150; PP, 25 to 65; RNAdeg, <1.7; sfs, 0.1 to 2.5 (suggested by ).
In addition to the simpleaffy assessments, the selected CEL files were further evaluated by probe level model (PLM) fitting using Bioconductor's affyPLM package. Two quality assessments were derived from the PLM fitting output: normalized unscaled standard error (nuse) and relative log expression (rle). The cutoffs were set as: nuse, 0.97 to 1.05; rle, -0.15 to 0.15. Arrays not passing these criteria were discarded from further analysis.
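As an illustration only, the array selection rule can be sketched in Python as follows. This is not the code used in the study, which relied on the R packages simpleaffy and affyPLM. The function name `passes_qc`, the dictionary layout, the use of a single summary value per array for each metric, and the treatment of range endpoints as inclusive are all assumptions made for the sketch; only the numeric thresholds come from the text.

```python
# Thresholds are taken from the text; everything else is a hypothetical layout.
QC_RANGES = {
    "AvgBg": (20.0, 150.0),  # average background
    "PP": (25.0, 65.0),      # percent present
    "sfs": (0.1, 2.5),       # scale factor
    "nuse": (0.97, 1.05),    # normalized unscaled standard error
    "rle": (-0.15, 0.15),    # relative log expression
}
RNADEG_MAX = 1.7  # RNA degradation slope must be strictly below this value


def passes_qc(metrics):
    """Return True if one array's summary metrics fall inside every range.

    `metrics` maps metric names to one number per array (an assumption).
    Range endpoints are treated as inclusive (also an assumption).
    """
    if metrics["RNAdeg"] >= RNADEG_MAX:
        return False
    return all(low <= metrics[name] <= high
               for name, (low, high) in QC_RANGES.items())


example = {"AvgBg": 60, "PP": 45, "RNAdeg": 1.2, "sfs": 1.0,
           "nuse": 1.0, "rle": 0.01}
print(passes_qc(example))
```

Arrays for which `passes_qc` returns False would correspond to the arrays discarded from further analysis.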
The resulting 1,323 CEL files were pre-processed using Bioconductor's RMA package  to create an integrated, normalized data matrix. Annotations for each sample were retrieved from the database and manually curated to ensure uniform representation and minimal redundancy. For instance, when in some experiments samples were originally annotated as 'hepatocyte samples', we would change the annotation to 'liver' for consistency. The annotations of the 1,323 samples were generalized so the whole dataset contains a limited number of unique categories of tissue type annotation, such as nervous system, reproductive system, immune system and so on. The integrated dataset was submitted to ArrayExpress and assigned accession [E-MTAB-27].
Merging human and mouse gene expression datasets
The high quality CEL files of 5,372 human samples tested on the HG-U133A microarray were selected and prepared as previously described . The high quality CEL files for mouse samples were selected as described above. The data were normalized separately for human and mouse in R using the justRMA function. In the resulting matrices, each column contains data for one sample and each row data for one probeset. The two matrices were then reduced to a subset of probesets representing orthologous genes between mouse and human. The pairing of these orthologous probesets was done based on gene orthologs obtained from Ensembl Compara . Since the probe effect is well known to be very significant in all microarray analyses, we chose to identify orthologous probesets by maximizing the number of probes with similar sequences as follows. For each orthologous gene pair, data for all probesets and their associated probes and probe sequences were retrieved from Affymetrix. Probes for each human gene were BLASTed against mouse probes of the corresponding orthologous gene using bl2seq, and the best one-to-one match was retained. Default settings were used with bl2seq except -W 7, -G 5, -E 2, -F = F. The human-mouse probeset pair with the most probe-probe top matches was selected to represent the ortholog pair on the probeset level.
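The final selection step can be illustrated with a short Python sketch. It assumes that the best one-to-one probe matches for one ortholog pair have already been computed (for example, from bl2seq output) and are available as (human probeset, mouse probeset) tuples, one tuple per matched probe pair. The function name, the input layout, the probeset identifiers in the example, and the alphabetical tie-break are hypothetical; the text does not state how ties were resolved.

```python
from collections import Counter


def pick_probeset_pair(top_matches):
    """Choose the probeset pair supported by the most probe-probe top matches.

    `top_matches` holds one (human_probeset, mouse_probeset) tuple per
    best probe-to-probe match for a single ortholog pair.
    """
    counts = Counter(top_matches)
    if not counts:
        return None
    # Sorting first makes ties resolve alphabetically (an assumption).
    best_pair, _ = max(sorted(counts.items()), key=lambda item: item[1])
    return best_pair


# Hypothetical identifiers, for illustration only.
matches = [("hs_ps1", "mm_ps1"), ("hs_ps1", "mm_ps1"),
           ("hs_ps2", "mm_ps1"), ("hs_ps1", "mm_ps2")]
print(pick_probeset_pair(matches))
```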
After we discarded rows with non-orthologous probesets from the human and mouse matrices, the remaining data on each matrix were normalized either by probeset or by sample. To normalize by probeset, we first centered data row by row on median zero by subtracting the row median from each value in the row. Then the centered values were divided by median absolute deviation to scale the data. To normalize by sample, we used the same procedure but centered and scaled the data by columns instead of by rows; column median was used to center the data and column median absolute deviation was used to scale the data. After normalization either by probeset or by sample, the two data matrices of centered and scaled values were merged into one matrix by concatenating the sample columns of orthologous probesets. In the merged matrix, the rows are probesets and the columns are human and mouse samples.
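The two normalization schemes and the merge can be sketched in Python as follows. This is a minimal illustration rather than the code used in the study. Matrices are plain lists of rows (probesets) by columns (samples), the function names are hypothetical, and the median absolute deviation is used without any consistency constant, which is an assumption (R's `mad` function, for example, multiplies by 1.4826 by default).

```python
from statistics import median


def center_and_scale(values):
    """Subtract the median, then divide by the median absolute deviation."""
    mid = median(values)
    spread = median([abs(v - mid) for v in values])
    if spread == 0:
        raise ValueError("median absolute deviation is zero")
    return [(v - mid) / spread for v in values]


def normalize_by_probeset(matrix):
    """Center and scale each row (one row per probeset)."""
    return [center_and_scale(row) for row in matrix]


def normalize_by_sample(matrix):
    """Center and scale each column (one column per sample)."""
    scaled_columns = [center_and_scale(column) for column in zip(*matrix)]
    return [list(row) for row in zip(*scaled_columns)]


def merge(human, mouse):
    """Concatenate sample columns; rows must list orthologous probesets in the same order."""
    if len(human) != len(mouse):
        raise ValueError("matrices must have the same number of probesets")
    return [h_row + m_row for h_row, m_row in zip(human, mouse)]


human = normalize_by_probeset([[1.0, 2.0, 3.0], [2.0, 4.0, 9.0]])
mouse = normalize_by_probeset([[0.0, 4.0], [1.0, 3.0]])
print(merge(human, mouse))
```

Normalizing by probeset before merging puts each probeset on a common scale within each species, which is why it reduces the platform difference between the human and mouse columns of the merged matrix.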
Principal component analysis
PCA is a technique that transforms a dataset onto a linear space spanned by a number of orthogonal components, ordered by decreasing variance of the data when projected onto them. The technique facilitates dimensionality reduction and noise filtering by the projection of data onto a number of the principal components, maximizing the variance retained. The function prcomp from the R stats package, with default settings, was used to perform PCA on different data matrices throughout this study. The results were visualized by scatter plots.
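To make the idea of projecting samples onto a principal component concrete, the following Python sketch computes the first component of a small matrix by power iteration on the covariance matrix. It is a teaching example under stated assumptions, not the method used in the study: prcomp relies on a singular value decomposition, whereas power iteration is swapped in here to keep the example free of external libraries. Rows are assumed to be samples and columns variables, columns are centered but not scaled, and the function name and starting vector are hypothetical.

```python
import math


def first_principal_component(samples, iterations=200):
    """Return (loadings, scores) for the first principal component.

    `samples` is a list of rows, one per sample, one column per variable.
    Power iteration converges to the top eigenvector of the covariance
    matrix unless the starting vector happens to be orthogonal to it.
    """
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in samples]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    vec = [1.0 / (j + 1) for j in range(p)]  # arbitrary non-uniform start
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break
        vec = [x / norm for x in nxt]
    scores = [sum(r[j] * vec[j] for j in range(p)) for r in centered]
    return vec, scores


loadings, scores = first_principal_component(
    [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
print(loadings, scores)
```

Plotting the scores of two such components against each other gives the kind of scatter plot described above; the sign of a component is arbitrary, so mirrored plots are equivalent.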
The combined data matrix of 2,557 human samples and 1,267 mouse samples created as described above was used for hierarchical clustering. The matrix contains gene expression values centered and scaled by probeset. Each sample in the matrix is assigned to one of 13 general tissue categories that are well represented in both species, so the total number of annotation types is 26 (each tissue combined with each species). We extracted 26 submatrices containing data from samples of the 26 different annotation types; Pearson correlation coefficients were calculated for all 26 × 26 pairs of submatrices; for each pair of submatrices, a mean correlation coefficient was taken and placed in a 26 × 26 matrix. Hierarchical clustering of this matrix was performed using the R function heatmap.2.
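The construction of the matrix of mean correlations can be sketched in Python as follows. The sketch assumes that correlations are computed between individual sample profiles and averaged over all sample pairs drawn from the two submatrices, including a sample paired with itself on the diagonal; the text does not specify these details. The function names and the group labels in the example are hypothetical, and the clustering step itself (heatmap.2 in the study) is not reproduced.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_group_correlation(groups):
    """Mean pairwise sample correlation for every pair of annotation groups.

    `groups` maps a label (tissue combined with species) to a list of
    sample profiles, each profile listing values in the same probeset order.
    """
    labels = sorted(groups)
    table = {}
    for a in labels:
        for b in labels:
            table[(a, b)] = mean(pearson(x, y)
                                 for x in groups[a] for y in groups[b])
    return labels, table


groups = {
    "human liver": [[1.0, 2.0, 3.0, 4.0]],
    "mouse liver": [[2.0, 4.0, 6.0, 8.0]],
    "human brain": [[4.0, 3.0, 2.0, 1.0]],
}
labels, table = mean_group_correlation(groups)
print(table[("human liver", "mouse liver")])
```

With 26 annotation types, `table` would hold the 26 × 26 values that are then clustered.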
Calculation of corCor
For a gene A on the human array composed of n genes, we computed its pairwise Spearman correlation coefficient with every other gene on the same chip, giving a vector v(A) of length n - 1. Given that gene A' is the ortholog of gene A on the mouse array, we similarly computed its pairwise correlation coefficient with every other mouse gene, giving v(A') of length n - 1. The correlation coefficient between v(A) and v(A'), corCor, provides an indication of whether A and A' are correlated in mouse and human on the transcriptome level, regardless of the vast sample variations. The higher the absolute corCor value, the stronger the correlation of the orthologous genes; a negative corCor indicates negative correlation. The R package MergeMaid was used for this analysis .
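The calculation can be illustrated with the following Python sketch. It is not the MergeMaid implementation. Both matrices are assumed to list orthologous genes in the same row order, with one column per sample; the within-species step uses Spearman correlation as stated above, while the use of Pearson correlation for the final step between v(A) and v(A') is an assumption, since the text does not name the coefficient. All function names are hypothetical.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def cor_cor(human, mouse, index):
    """corCor for the gene in row `index` of both matrices."""
    v_human = [spearman(human[index], row)
               for i, row in enumerate(human) if i != index]
    v_mouse = [spearman(mouse[index], row)
               for i, row in enumerate(mouse) if i != index]
    return pearson(v_human, v_mouse)  # final-step coefficient is an assumption


human = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [1, 3, 2, 4]]
mouse = [[1, 2, 3], [1, 5, 9], [9, 5, 1], [1, 3, 2]]
print(cor_cor(human, mouse, 0))
```

Because each species contributes only a vector of within-species correlations, the two matrices can have different numbers of samples, which is what makes the measure usable across platforms.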
AB is a senior team leader and senior scientist at EMBL-EBI and serves on the board of FGED (Functional Genomics Data) Society.
corCor: correlation of correlation coefficient
PCA: principal component analysis
PLM: probe level model
The study is funded by the MUGEN consortium (grant LSHG-CT-2005-005203) and the ENGAGE consortium (grant HEALTH-F4-2007-201413 from the European Commission FP7 program). We thank Margus Lukk for sharing his experience in analyzing large-scale expression data, and Wolfgang Huber, Richard Bourgon, Misha Kapushesky, Nils Gehlenborg, and Angela Goncalves for discussions and technical help.
- Yanai I, Graur D, Ophir R: Incongruent expression profiles between human and mouse orthologous genes suggest widespread neutral evolution of transcription control. Omics. 2004, 8: 15-24. 10.1089/153623104773547462.
- Jordan IK, Marino-Ramirez L, Koonin EV: Evolutionary significance of gene expression divergence. Gene. 2005, 345: 119-126. 10.1016/j.gene.2004.11.034.
- Han ES, Hickey M: Microarray evaluation of dietary restriction. J Nutr. 2005, 135: 1343-1346.
- Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, Zhang J, Soden R, Hayakawa M, Kreiman G, Cooke MP, Walker JR, Hogenesch JB: A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci USA. 2004, 101: 6062-6067. 10.1073/pnas.0400782101.
- Rustici G, Mata J, Kivinen K, Lio P, Penkett CJ, Burns G, Hayles J, Brazma A, Nurse P, Bahler J: Periodic gene expression program of the fission yeast cell cycle. Nat Genet. 2004, 36: 809-817. 10.1038/ng1377.
- Chan ET, Quon GT, Chua G, Babak T, Trochesset M, Zirngibl RA, Aubin J, Ratcliffe MJ, Wilde A, Brudno M, Morris QD, Hughes TR: Conservation of core gene expression in vertebrate tissues. J Biol. 2009, 8: 33. 10.1186/jbiol130.
- Xing Y, Ouyang ZQ, Kapur K, Scott MP, Wong WH: Assessing the conservation of mammalian gene expression using high-density exon arrays. Mol Biol Evol. 2007, 24: 1283-1285. 10.1093/molbev/msm061.
- Liao BY, Zhang JZ: Low rates of expression profile divergence in highly expressed genes and tissue-specific genes during mammalian evolution. Mol Biol Evol. 2006, 23: 1119-1128. 10.1093/molbev/msj119.
- Liao BY, Zhang JZ: Evolutionary conservation of expression profiles between human and mouse orthologous genes. Mol Biol Evol. 2006, 23: 530-540. 10.1093/molbev/msj054.
- Odom DT, Dowell RD, Jacobsen ES, Gordon W, Danford TW, MacIsaac KD, Rolfe PA, Conboy CM, Gifford DK, Fraenkel E: Tissue-specific transcriptional regulation has diverged significantly between human and mouse. Nat Genet. 2007, 39: 730-732. 10.1038/ng2047.
- Lu Y, Huggins P, Bar-Joseph Z: Cross species analysis of microarray expression data. Bioinformatics. 2009, 25: 1476-1483. 10.1093/bioinformatics/btp247.
- Whiteford CC, Bilke S, Greer BT, Chen QR, Braunschweig TA, Cenacchi N, Wei JS, Smith MA, Houghton P, Morton C, Reynolds CP, Lock R, Gorlick R, Khanna C, Thiele CJ, Takikita M, Catchpoole D, Hewitt SM, Khan J: Credentialing preclinical pediatric xenograft models using gene expression and tissue microarray analysis. Cancer Res. 2007, 67: 32-40. 10.1158/0008-5472.CAN-06-0610.
- Nuzhdin SV, Wayne ML, Harmon KL, McIntyre LM: Common pattern of evolution of gene expression level and protein sequence in Drosophila. Mol Biol Evol. 2004, 21: 1308-1317. 10.1093/molbev/msh128.
- Vallee M, Robert C, Methot S, Palin MF, Sirard MA: Cross-species hybridizations on a multi-species cDNA microarray to identify evolutionarily conserved genes expressed in oocytes. BMC Genomics. 2006, 7: 113. 10.1186/1471-2164-7-113.
- Oshlack A, Chabot AE, Smyth GK, Gilad Y: Using DNA microarrays to study gene expression in closely related species. Bioinformatics. 2007, 23: 1235-1242. 10.1093/bioinformatics/btm111.
- Bergmann S, Ihmels J, Barkai N: Similarities and differences in genome-wide expression data of six organisms. PLoS Biol. 2004, 2: E9. 10.1371/journal.pbio.0020009.
- Stuart JM, Segal E, Koller D, Kim SK: A gene-coexpression network for global discovery of conserved genetic modules. Science. 2003, 302: 249-255. 10.1126/science.1087447.
- Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA. 2005, 102: 15545-15550. 10.1073/pnas.0506580102.
- Alter O, Brown PO, Botstein D: Generalized singular value decomposition for comparative analysis of genome-scale expression data sets of two different organisms. Proc Natl Acad Sci USA. 2003, 100: 3351-3356. 10.1073/pnas.0530258100.
- Lu Y, Rosenfeld R, Bar-Joseph Z: Identifying cycling genes by combining sequence homology and expression data. Bioinformatics. 2006, 22: e314-322. 10.1093/bioinformatics/btl229.
- Lu Y, Mahony S, Benos PV, Rosenfeld R, Simon I, Breeden LL, Bar-Joseph Z: Combined analysis reveals a core set of cycling genes. Genome Biol. 2007, 8: R146. 10.1186/gb-2007-8-7-r146.
- Ringner M: What is principal component analysis?. Nat Biotechnol. 2008, 26: 303-304. 10.1038/nbt0308-303.
- Alter O, Brown PO, Botstein D: Singular value decomposition for genome-wide expression data processing and modeling. Proc Natl Acad Sci USA. 2000, 97: 10101-10106. 10.1073/pnas.97.18.10101.
- Khan J, Wei JS, Ringner M, Saal LH, Ladanyi M, Westermann F, Berthold F, Schwab M, Antonescu CR, Peterson C, Meltzer PS: Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat Med. 2001, 7: 673-679. 10.1038/89044.
- Lukk M, Kapushesky M, Nikkila J, Parkinson H, Goncalves A, Huber W, Ukkonen E, Brazma A: A global map of human gene expression. Nat Biotechnol. 2010, 28: 322-324. 10.1038/nbt0410-322.
- ArrayExpress Archive. [http://www.ebi.ac.uk/arrayexpress/]
- Large scale comparison of global gene expression patterns in human and mouse, supplementary data. [http://www.ebi.ac.uk/~zheng/Genome_Biology_Paper/]
- The Integrative Correlation Coefficient: a Measure of Cross-study Reproducibility for Gene Expression Array Data. [http://www.bepress.com/jhubiostat/paper152]
- Draghici S, Khatri P, Martins RP, Ostermeier GC, Krawetz SA: Global functional profiling of gene expression. Genomics. 2003, 81: 98-104. 10.1016/S0888-7543(02)00021-6.
- Ravasi T, Suzuki H, Cannistraci CV, Katayama S, Bajic VB, Tan K, Akalin A, Schmeier S, Kanamori-Katayama M, Bertin N, Carninci P, Daub CO, Forrest AR, Gough J, Grimmond S, Han JH, Hashimoto T, Hide W, Hofmann O, Kamburov A, Kaur M, Kawaji H, Kubosaki A, Lassmann T, van Nimwegen E, MacPherson CR, Ogawa C, Radovanovic A, Schwartz A, Teasdale RD, et al: An atlas of combinatorial transcriptional regulation in mouse and man. Cell. 2010, 140: 744-752. 10.1016/j.cell.2010.01.044.
- Bolstad BM, Collin F, Brettschneider J, Simpson K, Cope L, Irizarry RA, Speed TP: Quality assessment of Affymetrix GeneChip data in bioinformatics and computational biology solutions using R and Bioconductor. Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Edited by: Gentleman R, Carey V, Huber W, Irizarry R, Dudoit S. 2005, Springer, 33-49.
- Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 2003, 4: 249-264. 10.1093/biostatistics/4.2.249.
- Vilella AJ, Severin J, Ureta-Vidal A, Heng L, Durbin R, Birney E: EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome Res. 2009, 19: 327-335. 10.1101/gr.073585.107.
- Cope L, Zhong X, Garrett E, Parmigiani G: MergeMaid: R tools for merging and cross-study validation of gene expression data. Stat Appl Genet Mol Biol. 2004, 3: Article29.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.