Profound effect of normalization on detection of differentially expressed genes in oligonucleotide microarray data analysis
- Reinhard Hoffmann^{1},
- Thomas Seidl^{2} and
- Martin Dugas^{3}
https://doi.org/10.1186/gb-2002-3-7-research0033
© Hoffmann et al., licensee BioMed Central Ltd 2002
Received: 7 February 2002
Accepted: 24 April 2002
Published: 14 June 2002
Abstract
Background
Oligonucleotide microarrays measure the relative transcript abundance of thousands of mRNAs in parallel. A large number of procedures for normalization and detection of differentially expressed genes have been proposed. However, the relative impact of these methods on the detection of differentially expressed genes remains to be determined.
Results
We have employed four different normalization methods and all possible combinations with three different statistical algorithms for detection of differentially expressed genes on a prototype dataset. The number of genes detected as differentially expressed differs by a factor of about three. Analysis of the lists of genes detected as differentially expressed, and of rank correlation coefficients for the probability of differential expression, shows that a high concordance between different methods can only be achieved by using the same normalization procedure.
Conclusions
Normalization has a profound influence on detection of differentially expressed genes. This influence is higher than that of the three subsequent statistical analysis procedures examined. Algorithms incorporating more array-derived information than gene-expression values alone are urgently needed.
Background
cDNA or oligonucleotide microarrays have made possible the measurement of thousands of mRNA levels in parallel, enabling researchers for the first time to generate comprehensive cellular gene-expression profiles. Among several competing techniques, photolithographically synthesized high-density oligonucleotides are widely used. Current chip layouts allow for the parallel measurement of > 12,000 gene-expression levels on a single array. In this approach, every gene is represented by a set of oligonucleotides perfectly matching the target sequence (PM oligo) and by a corresponding set with a 1 base-pair (bp) mismatch in a central position (MM oligo). The latter serves as an internal control for hybridization specificity. Relative transcript abundance is reported as the so-called 'average difference' value, that is the average of all PM-MM differences across the gene-specific set of probes [1,2]. An alternative approach fits a linear model onto the differences between PM and MM hybridization intensities and takes a model-based expression value as a measure of transcript abundance [3,4].
The technique is standardized in such a way that generation of gene-expression data is straightforward and quite easy to do. Analysis of processed fluorescence-intensity data, in contrast, is not. Analysis of a typical microarray experiment involves the following steps: pre-scaling of the fluorescence intensity across the different arrays belonging to one experiment to correct for differences in probe labeling, probe concentration, hybridization efficiency, and potentially other factors (in the context of microarray analysis, this process is generally termed normalization); detection of differentially expressed genes; in the case of experimental setups comparing more than two conditions, a clustering step to group together genes with similar expression patterns; and higher-level analysis, for example by combining functional annotations of genes having predefined interesting expression patterns with previous knowledge about the experimental system under investigation.
Most frequently, high-density oligonucleotide data are normalized by a simple 'global scaling' procedure. This involves multiplication of every gene-expression value with a constant factor so that the mean intensities of the arrays to be compared are identical. A conceptually related approach involves fitting a linear regression model on the data and scaling the fluorescence intensities so that the resulting regression model has a slope of 1 and a y-intercept of 0 [5]. This approach suffers from two significant drawbacks: first, it relies on the implicit assumption that the total mRNA content of different cell types compared is the same. This is not always the case, especially if cell types of different size and/or cell-cycle status are compared. Control of this effect is attempted by loading identical amounts of cRNA onto the chips. However, it has been shown that the mean expression level on any array can be subject to significant variation across arrays [6]. Second, the normalization is linear and cannot account for nonlinearity in the underlying data. Previous studies [7,8] show that simple linear regression models incompletely fit the data and that two or more linear models with different slope for different ranges of fluorescence-intensity values result in a better fit.
Two conceptually related solutions to these problems have been proposed. They assume that an 'invariant set' exists, containing genes that do not change significantly between two experimental conditions. First, all fluorescence values on the arrays are ranked according to intensity. Then, items with similar ranks between two arrays are identified and considered unchanged. These items are used for nonlinear normalization. This procedure can be performed either on the feature level (taking raw fluorescence values as input) [3,4] or on the probe set level, taking average-difference values as input [8].
A similar multitude of strategies exists to detect differentially expressed genes. The easiest approach is to define as differentially expressed those genes that change by more than an arbitrarily chosen threshold. More sophisticated analyses additionally apply statistical tests such as Student's t-test for comparisons between two experimental conditions. However, this and other parametric tests rely on certain assumptions, namely that the underlying data are normally distributed with equal variances across experimental conditions [9]. These assumptions are not necessarily met, and analysis of our own (T.S. and R.H., unpublished observations) and other [10] datasets shows that they are often not fulfilled. Non-parametric tests such as the Mann-Whitney test, in contrast, do not rely on such strong assumptions, but a larger number of replicate experiments is desirable.
A particular problem is the analysis of 'multiclass experiments' containing more than two experimental conditions, such as cellular developmental stages. Many researchers carry out pairwise comparisons of all possible pairs of conditions, resulting in a list of genes that are detected as differentially expressed at least once. This leads to increased type-I error rates, with the final dataset having a type-I error rate of up to 1-(1-α)^{n}, where α is the type-I error rate of individual pairwise comparisons and n is the number of pairwise comparisons [9]. Five pairwise comparisons at the 95% confidence level thus result in a confidence level for the resulting dataset of 77%. Classical statistics offer ANOVA algorithms for such problems. Here, differential gene expression is detected by comparing variances within experimental conditions to variances across experimental conditions [9]. Both parametric (F) and nonparametric (H or Kruskal-Wallis) tests exist, with the associated problems described above.
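As a quick check of the arithmetic in the preceding paragraph, the following minimal sketch evaluates the bound 1-(1-α)^{n}. The function name is introduced here for illustration only, and the bound assumes that the pairwise comparisons are independent.

```python
def familywise_error_rate(alpha: float, n_comparisons: int) -> float:
    """Upper bound on the overall type-I error rate, 1 - (1 - alpha)^n."""
    return 1.0 - (1.0 - alpha) ** n_comparisons


# Five pairwise comparisons, each at the 95% confidence level (alpha = 0.05).
fwer = familywise_error_rate(alpha=0.05, n_comparisons=5)
print(f"type-I error rate of the combined list: {fwer:.3f}")
print(f"resulting confidence level:             {1.0 - fwer:.3f}")

# All ten possible pairs among five conditions.
fwer_all_pairs = familywise_error_rate(alpha=0.05, n_comparisons=10)
print(f"ten comparisons:                        {fwer_all_pairs:.3f}")
```

With five comparisons the resulting confidence level is about 0.77, as stated in the text; with all ten possible pairs among five conditions the bound on the error rate rises to roughly 0.40.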
Recently, an alternative procedure for detection of differentially expressed genes, called significance analysis of microarrays (SAM), has been described [11]. Here, a relative difference in gene expression is computed, incorporating means and standard deviations across experimental conditions. Next, the dataset is permuted several times, and the relative difference is computed again, on the basis of the permuted datasets. For the majority of genes, these two values are approximately equal. For some genes, however, the difference between the two scores exceeds a certain threshold parameter. These genes are called differentially expressed. A false-discovery rate [12] can be computed on the basis of how many genes are called in the permuted datasets with the given threshold.
Obviously, there are a large number of analysis options for gene-expression data. The influence of normalization and statistical analysis on the detection of differentially expressed genes has not been investigated to date. In this study, we carry out a thorough comparison of different normalization and statistical procedures to define the key components for detection of differentially expressed genes in a multiclass experiment.
Results
The aim of the present study was to evaluate different normalization and statistical analysis methods for their influence on detection of differentially expressed genes. We focused on a typical multiclass experiment. The dataset used comprises high-density oligonucleotide array-derived gene-expression data of five consecutive cellular populations of an ordered cellular differentiation pathway. The biology-oriented analysis and interpretation of the dataset has been described elsewhere [13].
Normalization
The two normalization methods based on invariant features produce a significant amount of scatter compared to the non-normalized data (Figure 1a,1b), especially the model-based expression values as compared to the traditional average difference values. However, the model-based expression values calculated after invariant feature normalization (y-axis in Figure 1b) differ from the non-normalized average difference values (x-axis in Figure 1b) in two respects: the normalization and the expression metric. The scatter in Figures 1a and 1b reflects data processing on the fluorescence level with recalculation of gene-expression metrics after normalization, in contrast to the other methods. Comparing average difference and model-based expression values derived after invariant feature normalization (Figure 1c), it becomes evident that model-based expression values tend to be higher than average difference values in the low-signal area of the plot. Thus, low-abundance genes give higher signals when model-based expression values are used. This might reflect either a greater sensitivity of the model-based approach or an overestimation of the expression level.
The invariant set normalization method results in very similar slopes for the subA and subB arrays (Figure 1d). Since pre-computed average difference values are used, the normalization does not involve recalculation of the expression-level values from fluorescence data, resulting in less scatter than in Figure 1a,1b. The global scaling method, as expected, produces two 'straight lines' of data points representing different normalization factors for the two array types (Figure 1e).
Statistical evaluation and detection of differentially expressed genes
The four normalized datasets were subjected to three popular methods for identifying differentially expressed genes, namely parametric and nonparametric ANOVA models and the permutation-based SAM procedure. This resulted in 12 datasets containing data about probability of differential gene expression.
However, the confidence level alone is seldom used for detection of differentially expressed genes. Usually, a fold-change criterion as well as an absolute difference criterion is added. Figure 3b shows what percentage of the genes shown in Figure 3a also pass additional criteria (twofold change and absolute difference of at least 100 units). These additional criteria lead to a reduction of the number of detected genes in all datasets, but to a markedly different extent. On average, 78, 82 and 88% of the genes detected using only the 99% confidence criterion are still detected when applying the additional criteria in the parametric ANOVA, nonparametric ANOVA, and SAM datasets, respectively (Figure 3b). In contrast, this holds true for only 63% of the genes normalized with the invariant probe set method. Genes detected by the combination with parametric ANOVA are most significantly affected, with only 54% fulfilling the additional criteria.
Table 1. Results of different methods of statistical analysis
| | Invariant feature (AD) | Invariant feature (MBEV) | Invariant set | Global scaling |
---|---|---|---|---|
(a) Mean | 3.55 | 2.45 | 1.59 | 3.37 |
Median | 2.09 | 1.46 | 1.30 | 2.12 |
Max | 429.01 | 667.16 | 71.63 | 848.28 |
(b) Mean FC | ||||
F | 15.05 | 9.69 | 3.06 | 10.28 |
KW | 18.48 | 12.24 | 3.35 | 12.21 |
SAM | 21.91 | 13.75 | 3.63 | 13.08 |
Median FC | ||||
F | 5.77 | 2.97 | 2.08 | 3.89 |
KW | 7.89 | 3.63 | 2.25 | 4.77 |
SAM | 9.20 | 4.11 | 2.41 | 4.92 |
(c) Mean | 428.71 | 750.20 | 324.17 | 384.47 |
Median | 85.52 | 259.83 | 122.04 | 102.37 |
Max | 25141.89 | 24616.80 | 28650.70 | 24794.22 |
(d) Mean difference | ||||
F | 2248.04 | 3346.51 | 1427.59 | 1784.93 |
KW | 2545.81 | 3881.37 | 1655.16 | 2044.61 |
SAM | 3157.21 | 4451.51 | 1827.97 | 2216.36 |
Median difference | ||||
F | 1116.52 | 1954.57 | 664.49 | 954.38 |
KW | 1328.38 | 2444.28 | 815.85 | 1138.39 |
SAM | 1865.47 | 2927.57 | 952.92 | 1248.04 |
Table 2. Percentage of genes identical among all genes detected by different combinations of normalization and statistical analysis
| | Invariant feature; AD, F | Invariant feature; AD, KW | Invariant feature; AD, SAM | Invariant feature; MBEV, F | Invariant feature; MBEV, KW | Invariant feature; MBEV, SAM | Invariant set; F | Invariant set; KW | Invariant set; SAM | Global scaling; F | Global scaling; KW | Global scaling; SAM |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Invariant feature; AD, F | | 93 | 100 | 78 | 83 | 92 | 81 | 83 | 88 | 64 | 74 | 78 |
Invariant feature; AD, KW | 74 | | 95 | 67 | 79 | 82 | 74 | 79 | 81 | 55 | 67 | 69 |
Invariant feature; AD, SAM | 56 | 66 | | 51 | 59 | 74 | 58 | 61 | 66 | 41 | 50 | 54 |
Invariant feature; MBEV, F | 69 | 75 | 82 | | 96 | 100 | 77 | 80 | 82 | 58 | 67 | 70 |
Invariant feature; MBEV, KW | 62 | 74 | 79 | 80 | | 95 | 71 | 75 | 77 | 51 | 62 | 64 |
Invariant feature; MBEV, SAM | 56 | 63 | 81 | 68 | 77 | | 63 | 66 | 70 | 44 | 53 | 57 |
Invariant set; F | 59 | 69 | 77 | 63 | 70 | 76 | | 98 | 100 | 56 | 67 | 70 |
Invariant set; KW | 56 | 67 | 74 | 60 | 68 | 74 | 90 | | 96 | 52 | 66 | 66 |
Invariant set; SAM | 55 | 64 | 74 | 58 | 65 | 73 | 85 | 89 | | 49 | 61 | 66 |
Global scaling; F | 80 | 87 | 92 | 81 | 86 | 90 | 95^{*} | 96^{*} | 98^{*} | | 100 | 100 |
Global scaling; KW | 72 | 83 | 88 | 73 | 82 | 85 | 89 | 96^{*} | 95^{*} | 78 | | 93 |
Global scaling; SAM | 70 | 78 | 88 | 71 | 77 | 84 | 85 | 89 | 94^{*} | 72 | 86 | |
Table 3. Rank correlations for probability of differential expression between all possible combinations of normalization and statistical analysis procedures
| | Invariant feature; AD, F | Invariant feature; AD, KW | Invariant feature; AD, SAM | Invariant feature; MBEV, F | Invariant feature; MBEV, KW | Invariant feature; MBEV, SAM | Invariant set; F | Invariant set; KW | Invariant set; SAM | Global scaling; F | Global scaling; KW | Global scaling; SAM |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Invariant feature; AD, F | 1.000 | 0.922 | 0.995 | 0.512 | 0.495 | 0.512 | 0.545 | 0.549 | 0.546 | 0.626 | 0.618 | 0.623 |
Invariant feature; AD, KW | 0.922 | 1.000 | 0.915 | 0.487 | 0.487 | 0.487 | 0.540 | 0.555 | 0.541 | 0.604 | 0.621 | 0.600 |
Invariant feature; AD, SAM | 0.995 | 0.915 | 1.000 | 0.517 | 0.500 | 0.517 | 0.553 | 0.555 | 0.553 | 0.625 | 0.617 | 0.623 |
Invariant feature; MBEV, F | 0.512 | 0.487 | 0.517 | 1.000 | 0.941 | 1.000 | 0.440 | 0.435 | 0.440 | 0.443 | 0.444 | 0.446 |
Invariant feature; MBEV, KW | 0.495 | 0.487 | 0.500 | 0.941 | 1.000 | 0.941 | 0.431 | 0.433 | 0.431 | 0.428 | 0.435 | 0.432 |
Invariant feature; MBEV, SAM | 0.512 | 0.487 | 0.517 | 1.000 | 0.941 | 1.000 | 0.440 | 0.436 | 0.440 | 0.443 | 0.444 | 0.447 |
Invariant set; F | 0.545 | 0.540 | 0.553 | 0.440 | 0.431 | 0.440 | 1.000 | 0.933 | 1.000 | 0.738 | 0.730 | 0.749 |
Invariant set; KW | 0.549 | 0.555 | 0.555 | 0.435 | 0.433 | 0.436 | 0.933 | 1.000 | 0.933 | 0.719 | 0.743 | 0.729 |
Invariant set; SAM | 0.546 | 0.541 | 0.553 | 0.440 | 0.431 | 0.440 | 1.000 | 0.933 | 1.000 | 0.739 | 0.731 | 0.750 |
Global scaling; F | 0.626 | 0.604 | 0.625 | 0.443 | 0.428 | 0.443 | 0.738 | 0.719 | 0.739 | 1.000 | 0.925 | 0.994 |
Global scaling; KW | 0.618 | 0.621 | 0.617 | 0.444 | 0.435 | 0.444 | 0.730 | 0.743 | 0.731 | 0.925 | 1.000 | 0.918 |
Global scaling; SAM | 0.623 | 0.600 | 0.623 | 0.446 | 0.432 | 0.447 | 0.749 | 0.729 | 0.750 | 0.994 | 0.918 | 1.000 |
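The rank correlations tabulated above compare, gene by gene, the probabilities of differential expression produced by two analysis pipelines. The text does not state which coefficient was used; the sketch below assumes Spearman's coefficient, computed as the Pearson correlation of mid-ranks. The function names and the toy p-values are illustrative and are not taken from the dataset.

```python
def mid_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1
        for k in range(i, j):
            ranks[order[k]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    return ranks


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the mid-ranks."""
    ra, rb = mid_ranks(a), mid_ranks(b)
    mean_a, mean_b = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical per-gene p-values from two different pipelines.
p_pipeline_a = [0.001, 0.02, 0.3, 0.5, 0.9]
p_pipeline_b = [0.002, 0.01, 0.6, 0.4, 0.8]
rho = spearman(p_pipeline_a, p_pipeline_b)
print(f"rank correlation: {rho:.3f}")
```

In this toy example only two of the five genes swap ranks, so the coefficient stays high (0.9). Values near 0.5, as seen between different normalizations in the table, indicate a much more extensive reordering of genes.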
Discussion
The present study examined the influence of normalization and statistical analysis on detection of differentially expressed genes in an oligonucleotide microarray experiment. The dataset used describes five different cellular stages of an ordered differentiation pathway [13,14]. We focused on statistical algorithms designed for proper analysis of such multiclass experimental designs.
A first striking observation is that the number of genes detected as differentially expressed varies by a factor of almost three, depending on which combination of normalization method and statistical analysis has been carried out. The genes detected by a confidence criterion alone show large differences in mean and median fold changes, and thus show different susceptibility to the use of additional criteria for filtering. This affects primarily the dataset normalized by the invariant probe set method. Also, genes in this dataset show smaller fold changes, and the maximum fold change achieved is 6- to 11-fold lower than in the other datasets. Most probably, this is due to the shifting of data as a first step in the normalization so that only 2% of the raw values are below 20. This assigns a higher value to each data point while preserving the difference between any two data points, effectively reducing the ratio.
Analysis of the identity of probe sets indicates that the normalization method has a very high influence on detection of differentially expressed genes. In those cases where one gene is detected by two or more different combinations of normalization and statistical analysis algorithms, these combinations usually employ the same normalization. Moreover, a high percentage of genes is detected as differentially expressed by different combinations of analysis strategies only when the same normalization has been applied. Applying the same statistical algorithms, in contrast, does not have such a profound effect. Finally, a high rank correlation for probability of differential expression can only be achieved if the same normalization procedure has been applied. Thus, normalization appears to have a higher influence on the set of differentially expressed genes than the choice of statistical algorithm.
A few points should be kept in mind when interpreting the results presented here. First, the dataset chosen consists of five independent replicate experiments, a situation rarely encountered in microarray experiments.
Second, the dataset has been derived from highly purified cell populations. This situation is also different from most other microarray experiments. It might therefore be that the effects described here are even more pronounced in different experimental settings investigating less well-defined materials.
Third, the samples have been subjected to two rounds of RNA amplification. Samples prepared according to the standard Affymetrix protocols might behave differently. However, we do not expect this to be a significant issue, as the hybridization behavior of amplified and standard probes has been shown by us and other groups [15,16] to be similar. Moreover, we have high confidence in our dataset, as many genes with known expression patterns are detected in concordance with earlier results [13,17,18].
Fourth, the present analysis follows the general practice of detecting differentially expressed genes solely on the basis of differences in the average difference value. This represents only a small proportion of the information generated by the analysis of the fluorescence images. Additional information about cross-hybridization, fractions of probe pairs contributing to the signal, and 'presence' or 'absence' of a transcript, among others, is readily available [1,2]. Unfortunately, no consensus exists on how to incorporate this additional information in a setting that cannot be handled by the manufacturer's software. Anecdotally, individual groups apply their own, often arbitrarily chosen, criteria to increase confidence in the results [19,20]. Data-analysis algorithms employing as much information as possible with incorporation of replicate experiments and the ability to analyze more than two conditions simultaneously are urgently needed.
Finally, testing more than 13,000 hypotheses on only five different conditions constitutes a significant multiple-testing problem. It is commonly accepted in multivariate statistics that the number of hypotheses should not exceed the number of parameters. Thus, when testing such a high number of hypotheses, the probability of at least one falsely rejected null hypothesis (the so-called family-wise error rate) is high. Although multiple solutions to this problem have been proposed (like SAM, controlling the false-discovery rate rather than the type-I error rate) [11,12], to date no consensus exists on how to deal with this problem in the context of gene-expression analysis.
The question naturally arises of which combination of algorithms is 'best' for analyzing gene-expression data. There is probably no general answer. One has to balance sensitivity, which attempts to detect all differentially expressed genes, against specificity, which attempts to reduce the number of false positives as much as possible. This is nicely illustrated by the set of genes mentioned above that are known to change. All of the 21 genes examined so far are detected in at least one method combination. However, two genes are detected by only one combination, and only seven of the genes known to change during B-cell differentiation are detected by all 12 combinations of methods. Thus, the more specific an algorithm is, the more likely is a loss of sensitivity. However, with the high number of differentially expressed genes typically detected in a microarray experiment, specificity might be a major issue.
As most of the genes detected by the permutation-based SAM method are also detected by the ANOVA models, this algorithm appears to be inherently more specific than the classical ones. Regarding normalization, the invariant-feature method with calculation of average difference values yields the smallest set of genes. However, this set is not a subset of the genes detected after normalization with other methods (Table 2), as is the case for SAM. Ideally, a researcher would have a set of genes with known differential expression and a set known not to be differentially expressed. This could be used to define the conditions for analysis. In the absence of such a training set, it is probably a wise decision to use the algorithms likely to result in the most specific analysis.
Materials and methods
Gene-expression dataset
The B-cell precursor gene-expression dataset described here has been published in detail previously [13]. Total femoral bone marrow cells of 5-6-week-old C57/BL6 mice (n = 4 per experiment) were divided into three equal samples. Cells were stained and five populations representing consecutive cellular differentiation stages were sorted. These stages were: pre-BI cells (c-Kit^{+} B220^{+}), large pre-BII cells (surface immunoglobulin (sIg)^{-} CD25^{+} B220^{+} large), small pre-BII cells (sIg^{-} CD25^{+} B220^{+} small), immature B (sIgM^{+} B220^{lo}) and mature B cells (sIg^{+} B220^{hi}) [14]. A total of 50,000 (pre-BI, large pre-BII) or 150,000 (small pre-BII, immature and mature B cells) cells were sorted directly into TRIzol RNA isolation reagent (Life Technologies) at 50,000 cells/500 μl TRIzol. A cell purity of ≥ 98% was routinely achieved. RNA was then subjected to two rounds of in vitro transcription-based RNA amplification as described earlier [13,16,21]. Affymetrix Mu11k subA and subB GeneChip^{®} arrays were hybridized, washed, stained and scanned according to the manufacturer's specifications. Five independent replicate experiments were performed; thus, a total of 50 chips is included in the current analysis (5 conditions × 5 replicates × 2 chip layouts). Scanned raw data images were processed with Affymetrix GeneChip v3.2 software, resulting in processed image (.cel) and numerical (.chp) files. The entire dataset can be obtained from the NCBI at [22] under accession GSE13.
Normalization
Four different normalization procedures were used. All normalizations were carried out separately for the subA and subB arrays. After normalization, all gene-expression values below 20 were set to 20 to eliminate low-level signals. SubA and subB chip types were then combined into one gene-expression matrix.
Global scaling
For global scaling, average difference values were extracted from the .chp files. All average difference values of every chip were summed up, and the mean of these sums across all chips of the same layout was calculated. The ratio of the actual average difference sum for any given chip and the mean of all average difference sums across all chips with the same layout served as a correction factor for this chip, with which all the average difference values were multiplied.
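The global scaling step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name and the toy values are hypothetical, and the correction factor is assumed to be the mean of the chip sums divided by the sum for the given chip, which is the direction that makes all chip sums equal after multiplication.

```python
from statistics import mean


def global_scaling(chips: dict[str, list[float]]) -> dict[str, list[float]]:
    """Scale each chip so that all chips of one layout share the same sum."""
    sums = {name: sum(values) for name, values in chips.items()}
    target = mean(sums.values())  # mean of the per-chip sums
    scaled = {}
    for name, values in chips.items():
        # Assumption: factor = target sum / actual sum of this chip.
        factor = target / sums[name]
        scaled[name] = [v * factor for v in values]
    return scaled


# Two toy chips of the same layout; the second is uniformly twice as bright.
chips = {
    "chip1": [100.0, 200.0, 300.0],
    "chip2": [200.0, 400.0, 600.0],
}
scaled = global_scaling(chips)
for name, values in scaled.items():
    print(name, values, "sum =", sum(values))
```

After scaling, both toy chips have the same sum and, in this idealized case, identical values. Because a single factor is applied per chip, any intensity-dependent (nonlinear) difference between chips remains uncorrected.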
Invariant feature normalization and model-based expression values
For invariant-feature normalization, the program dchip [3,4] was used. Processed image (.cel) files served as input, and normalization was carried out according to the developer's specifications. Briefly, the program first identified a baseline array with median overall fluorescence intensity. Next, for every array, invariant features (defined as all the features with similar ranks of fluorescence intensity between two arrays) were identified. Finally, a piecewise linear running median line based on the invariant features was calculated and used as the normalization curve. After normalization, both traditional average difference values and model-based expression values (MBEV) were calculated and exported to Microsoft Excel.
Invariant set normalization
For invariant-set normalization on the probe-set level, all average difference levels were extracted from the .chp files and imported into 'The Equalizer' [8]. Normalization was performed according to the developer's specifications. Briefly, all values were first shifted (by adding a constant value) so that only 2% of the data points had an intensity below 20. Next, values with similar ranks (± 15) between two arrays were identified, taking the first array of the set as baseline. A curve was fitted on this similar-rank subset. Finally, all data points were shifted so that the original similar-rank subset has a slope of 1. Normalized values were exported to Microsoft Excel.
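The steps just described can be approximated by the sketch below. It is a simplified illustration under stated assumptions and does not reproduce 'The Equalizer': all function names are hypothetical, ties in rank are broken by position, and a straight least-squares line stands in for the fitted curve.

```python
def shift_to_floor(values, floor=20.0, fraction=0.02):
    """Add a constant so that about `fraction` of the values lie below `floor`."""
    ordered = sorted(values)
    pivot = ordered[int(fraction * len(ordered))]  # value at the 2% position
    return [v + (floor - pivot) for v in values]


def ranks(values):
    """Rank of each value (0 = smallest); ties are broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def invariant_set_normalize(array, baseline, max_rank_diff=15):
    """Rescale `array` so its similar-rank subset lies on the identity line."""
    array, baseline = shift_to_floor(array), shift_to_floor(baseline)
    r_a, r_b = ranks(array), ranks(baseline)
    subset = [i for i in range(len(array)) if abs(r_a[i] - r_b[i]) <= max_rank_diff]
    xs = [array[i] for i in subset]
    ys = [baseline[i] for i in subset]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    # Assumption: a straight line replaces the curve fitted in the original tool.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return [slope * v + intercept for v in array], baseline


# Toy data: the second array is a linear distortion of the baseline.
baseline_raw = [10.0 * i for i in range(1, 201)]
array_raw = [2.0 * b + 5.0 for b in baseline_raw]
normalized, baseline_shifted = invariant_set_normalize(array_raw, baseline_raw)
print(max(abs(n - b) for n, b in zip(normalized, baseline_shifted)))
```

In this toy case every probe set keeps its rank, so the whole array serves as the similar-rank subset and the normalized values coincide with the shifted baseline. Note that the initial shift adds a constant to every value, which preserves differences between data points but reduces their ratios.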
Statistical analysis
Three different methods were used for detection of differentially expressed genes. The F-test for parametric ANOVA and the H (Kruskal-Wallis) test for nonparametric ANOVA were implemented in Microsoft Excel using standard formulas [9]. The permutation-based method SAM [11] is freely available to academic researchers as an add-in for Microsoft Excel. To enable comparisons between SAM and the two ANOVA approaches, we considered a median false-detection rate of 1% in SAM as comparable to a 99% confidence level in ANOVA. Maximum fold changes and maximum differences in gene expression were calculated from the minimum and the maximum of the population-wise means across the five cellular populations examined.
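For a single gene, the two ANOVA test statistics and the maximum fold change and difference can be computed along the following lines. This is a sketch using the standard textbook formulas, not a transcription of the authors' spreadsheet: the function names and the toy values are hypothetical, the H statistic omits the tie correction, and converting the statistics to confidence levels (via the F and chi-square distributions) is left out.

```python
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F: between-group over within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H from mid-ranks of the pooled data (no tie correction)."""
    pooled = sorted(x for g in groups for x in g)
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    n = len(pooled)
    total = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)


def max_fold_change_and_difference(groups):
    """Ratio and difference of the largest and smallest population-wise mean."""
    means = [mean(g) for g in groups]
    return max(means) / min(means), max(means) - min(means)


# Toy expression values for one gene in three populations (three replicates each).
gene = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
f_value = f_statistic(gene)
h_value = kruskal_wallis_h(gene)
fold_change, difference = max_fold_change_and_difference(gene)
print(f_value, h_value, fold_change, difference)
```

The study itself compares five populations with five replicates each; three small groups are used here only to keep the arithmetic easy to follow. The additional filters mentioned in the Results correspond to requiring `fold_change >= 2` and `difference >= 100` on the normalized values.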
References
- Lockhart DJ, Dong H, Byrne MC, Follettie MT, Gallo MV, Chee MS, Mittmann M, Wang C, Kobayashi M, Horton H, et al: Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat Biotechnol. 1996, 14: 1675-1680.
- Wodicka L, Dong H, Mittmann M, Ho MH, Lockhart DJ: Genome-wide expression monitoring in Saccharomyces cerevisiae. Nat Biotechnol. 1997, 15: 1359-1367.
- Li C, Wong WH: Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. Proc Natl Acad Sci USA. 2001, 98: 31-36. 10.1073/pnas.011404098.
- Li C, Wong WH: Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application. Genome Biol. 2001, 2: research0032.1-0032.11. 10.1186/gb-2001-2-8-research0032.
- Fambrough D, McClure K, Kazlauskas A, Lander ES: Diverse signaling pathways activated by growth factor receptors induce broadly overlapping, rather than independent, sets of genes. Cell. 1999, 97: 727-741.
- Hill AA, Brown EL, Whitley MZ, Tucker-Kellogg G, Hunter CP, Slonim DK: Evaluation of normalization procedures for oligonucleotide array data based on spiked cRNA controls. Genome Biol. 2001, 2: research0055.1-0055.13. 10.1186/gb-2001-2-12-research0055.
- Schadt EE, Li C, Su C, Wong WH: Analyzing high-density oligonucleotide gene expression array data. J Cell Biochem. 2000, 80: 192-202. 10.1002/1097-4644(20010201)80:2<192::AID-JCB50>3.0.CO;2-W.
- Stuart RO, Bush KT, Nigam SK: Changes in global gene expression patterns during development and maturation of the rat kidney. Proc Natl Acad Sci USA. 2001, 98: 5649-5654. 10.1073/pnas.091110798.
- Zar JH: Biostatistical Analysis, 4th edn. Upper Saddle River, NJ: Prentice Hall. 1999.
- Long AD, Mangalam HJ, Chan BY, Tolleri L, Hatfield GW, Baldi P: Improved statistical inference from DNA microarray data using analysis of variance and a Bayesian statistical framework. Analysis of global gene expression in Escherichia coli K12. J Biol Chem. 2001, 276: 19937-19944. 10.1074/jbc.M010192200.
- Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA. 2001, 98: 5116-5121. 10.1073/pnas.091062498.
- Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc B. 1995, 57: 289-300.
- Hoffmann R, Seidl T, Neeb M, Rolink A, Melchers F: Changes in gene expression profiles in developing B cells of murine bone marrow. Genome Res. 2002, 12: 98-111. 10.1101/gr.201501.
- Rolink A, Grawunder U, Winkler TH, Karasuyama H, Melchers F: IL-2 receptor alpha chain (CD25, Tac) expression defines a crucial stage in pre-B cell development. Int Immunol. 1994, 6: 1257-1264.
- Baugh LR, Hill AA, Brown EL, Hunter CP: Quantitative analysis of mRNA amplification by in vitro transcription. Nucleic Acids Res. 2001, 29: E29. 10.1093/nar/29.5.e29.
- Luo L, Salunga RC, Guo H, Bittner A, Joy KC, Galindo JE, Xiao H, Rogers KE, Wan JS, Jackson MR, et al: Gene expression profiles of laser-captured adjacent neuronal subtypes [Erratum Nat Med 1999 Mar;5(3):355]. Nat Med. 1999, 5: 117-122. 10.1038/4806.
- Grawunder U, Leu TM, Schatz DG, Werner A, Rolink AG, Melchers F, Winkler TH: Down-regulation of RAG1 and RAG2 gene expression in preB cells after functional immunoglobulin heavy chain rearrangement. Immunity. 1995, 3: 601-608.
- Melchers F, Rolink A: B-Lymphocyte Development and Biology. In Fundamental Immunology. Edited by WE Paul. Philadelphia/New York: Lippincott-Raven. 1999, 183-224.
- Mills JC, Syder AJ, Hong CV, Guruge JL, Raaii F, Gordon JI: A molecular profile of the mouse gastric parietal cell with and without exposure to Helicobacter pylori. Proc Natl Acad Sci USA. 2001, 98: 13687-13692. 10.1073/pnas.231332398.
- Ehrt S, Schnappinger D, Bekiranov S, Drenkow J, Shi S, Gingeras TR, Gaasterland T, Schoolnik G, Nathan C: Reprogramming of the macrophage transcriptome in response to interferon-gamma and Mycobacterium tuberculosis. Signaling roles of nitric oxide synthase-2 and phagocyte oxidase. J Exp Med. 2001, 194: 1123-1140. 10.1084/jem.194.8.1123.
- Eberwine J, Yeh H, Miyashiro K, Cao Y, Nair S, Finnell R, Zettel M, Coleman P: Analysis of gene expression in single live neurons. Proc Natl Acad Sci USA. 1992, 89: 3010-3014.
- Gene Expression Omnibus. [http://www.ncbi.nlm.nih.gov/geo/]