- Research
- Open Access
Normalization and analysis of DNA microarray data by self-consistency and local regression
Thomas B Kepler^{1}, Lynn Crosby^{2} and Kevin T Morgan^{3}
https://doi.org/10.1186/gb-2002-3-7-research0037
© Kepler et al., licensee BioMed Central Ltd 2002
- Received: 20 February 2002
- Accepted: 17 April 2002
- Published: 28 June 2002
Abstract
Background
With the advent of DNA hybridization microarrays comes the remarkable ability, in principle, to simultaneously monitor the expression levels of thousands of genes. The quantitative comparison of two or more microarrays can reveal, for example, the distinct patterns of gene expression that define different cellular phenotypes or the genes induced in the cellular response to insult or changing environmental conditions. Normalization of the measured intensities is a prerequisite of such comparisons, and indeed, of any statistical analysis, yet insufficient attention has been paid to its systematic study. The most straightforward normalization techniques in use rest on the implicit assumption of linear response between true expression level and output intensity. We find that this assumption is not generally met, and that these simple methods can be improved.
Results
We have developed a robust semi-parametric normalization technique based on the assumption that the large majority of genes will not have their relative expression levels changed from one treatment group to the next, and on the assumption that departures of the response from linearity are small and slowly varying. We use local regression to estimate the normalized expression levels as well as the expression level-dependent error variance.
Conclusions
We illustrate the use of this technique in a comparison of the expression profiles of cultured rat mesothelioma cells under control and under treatment with potassium bromate, validated using quantitative PCR on a selected set of genes. We tested the method using data simulated under various error models and find that it performs well.
Keywords
- Additional Data File
- Core Gene
- Local Regression
- KBrO3
- Naive Method
Background
Among the most fascinating open questions in biology today are those associated with the global regulation of gene expression, itself the basis for the unfolding of the developmental program, the cellular response to insult and changes in the environment, and many other biological phenomena. The answers to some of these questions have been moved a few steps closer to realization with the advent of DNA hybridization microarrays [1,2,3,4,5,6]. These tools allow the simultaneous monitoring of the expression levels of hundreds to tens of thousands of genes - sufficient numbers to measure the expression of all of the genes in many organisms, as is now being done in the eukaryote Saccharomyces cerevisiae [7,8].
If we designate the intensity of a given spot in the microarray as I and the abundance of the corresponding mRNA in the target solution as A, we have, under ideal circumstances,
I = NA + error (1)
where N is a constant, unknown normalization factor. When comparing two different sets of intensities, these factors (or at least their relative sizes) must be determined in order to make a relative comparison of the abundances A.
The simple normalization techniques commonly used at this time assume Equation 1. Under these conditions, normalization amounts to the estimation of the single multiplicative constant N for each array. This task can be implemented by whole-array methods, using the median or mean of the spot intensities or by the inclusion of control mRNA.
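As a rough illustration of the whole-array approach, the sketch below rescales each array so that its median spot intensity matches a common target. The function name `median_normalize` and the choice of the median of the per-array medians as the target are illustrative assumptions, not part of any particular package.

```python
from statistics import median

def median_normalize(arrays):
    """Rescale each array so its median intensity equals a common target.

    `arrays` is a list of arrays, each a list of positive spot intensities.
    The target (median of the per-array medians) is an arbitrary choice.
    """
    medians = [median(a) for a in arrays]
    target = median(medians)
    return [[x * target / m for x in a] for a, m in zip(arrays, medians)]

# Two arrays measuring the same abundances with different gains N.
arrays = [[100.0, 200.0, 400.0], [50.0, 100.0, 200.0]]
normalized = median_normalize(arrays)
```

When Equation 1 holds exactly, as in this toy input, the two rescaled arrays coincide.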
We have found in a variety of different hybridization systems that the response function is neither sufficiently linear, nor consistent among replicate assays; the relationship between the intensity and the abundance is more complicated than that found in Equation 1. There may, for example, be a constant term, interpretable as background:
I = N_{0} + N_{1}A + error, (2)
or the intensity may saturate at large abundance:
Both these situations render simple ratio normalizations inadequate. The problems are not obviated by the use of 'housekeeping' genes as controls. First, their quantitative stability is not a priori assured, nor has such stability been demonstrated empirically, and second, even if such genes were found, the nonlinearity of the response is not addressed by this technique. Neither can extrinsic controls (such as bacterial mRNA spiked into human targets) ensure adequate normalization, as the relative concentration of control to target mRNA cannot itself be known with sufficient accuracy. Even simultaneous two-color probes on the same microarray do not eliminate the problems of normalization because of variation in the relative activity and incorporation of the two fluorescent dyes.
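A small numerical sketch may help show why a background term defeats ratio normalization. It evaluates the noise-free response of Equation 2 for two hypothetical arrays that measure identical abundances; the values chosen for N_{0} and N_{1} are arbitrary assumptions.

```python
import math

def intensity(abundance, n0, n1):
    """Noise-free response of Equation 2: I = N0 + N1 * A."""
    return n0 + n1 * abundance

abundances = [1.0, 10.0, 100.0, 1000.0]
array_1 = [intensity(a, n0=5.0, n1=1.0) for a in abundances]
array_2 = [intensity(a, n0=20.0, n1=2.0) for a in abundances]

# Ratio normalization using the true gain ratio N1(array 2) / N1(array 1) = 2.
# If Equation 1 held, every log ratio would be exactly zero.
log_ratios = [math.log2((i2 / 2.0) / i1) for i1, i2 in zip(array_1, array_2)]
```

Even with the gain ratio known exactly, the log ratios are far from zero at low abundance and shrink toward zero only at high abundance, so no single constant can correct them.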
One possible approach to the normalization problem would be to obtain detailed quantitative understanding of each step in the process in order to develop a mechanistic model for the response function. This approach is almost certainly important for the optimization of array design, but may not be necessary for data analysis. Alternatively, one may use the vast quantity of data generated and the assumption of self-consistency to estimate the response function semi-parametrically.
The underlying idea is that the majority of genes will not have their expression levels changed appreciably from one treatment to the next (Figure 1). Clearly, there may be some treatment pairs for which this is not a reasonable assumption, but we argue that as long as the cell is alive, the basic mechanism of cell maintenance must continue; the relevant gene products must be kept at relatively stable levels. This approach can be viewed as a generalization of the method of using 'housekeeping' genes to normalize the array. But rather than choosing a particular set of genes beforehand, assuming that their expression levels are constant across treatments, we assume that there is a stable background pattern of activity, that there is a transcriptional 'core', and identify its constituent genes statistically for each experiment.
The essential contrast between our method based on self-consistency and that based on control genes determined a priori is concisely captured in the following flow diagrams.
Normalization by controls identified a priori
1. Assume that some genes will not change under the treatment under investigation.
2. Identify these core genes in advance of the experiment (housekeeping genes, extrinsic controls).
3. Normalize all genes against these genes, assuming they do not change.
4. Done.
Normalization by self-consistency
1. Assume that some genes will not change under the treatment under investigation.
2. Initially designate all genes as core genes.
3. Normalize (provisionally) all genes against the core genes under the assumption that the true abundance of the core genes does not change.
4. Determine which genes appear to remain unchanged under this normalization; make this set the new core.
5. If the new core differs from the previous core, then go to step 3.
6. Else: done.
Modeling and estimation
We concentrate here on the experimental design with two treatment groups and two or more replicate arrays per group. Generalization to more than two groups is straightforward. Comparisons made without replicate arrays are also possible, and much of the methodology discussed here can be applied in that case as well, but the lack of true replicates introduces unique non-trivial problems that will not be considered here.
The basic model
Let Y_{ ijk } = logI_{ ijk } denote the logarithm of the measured intensity of the kth spot in the jth replicate assay of the ith treatment group. Thus, k ranges from 1 to G, the number of genes per array, j ranges from 1 to r_{ i }, the number of replicate arrays within the ith treatment group, and i takes values from 1 to the number of treatment groups. The examples in this paper use two treatment groups. The logarithmic transformation converts a multiplicative normalization constant to an additive normalization constant. We also find that this transformation renders the error variances more homogeneous than they are in the untransformed data. Then the error model corresponding to Equation 1 is:
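Y_{ ijk } = ν_{ ij } + α_{ k } + δ_{ ik } + σ_{0}ε_{ ijk }, (4)
where ν_{ ij } is the additive normalization term for the jth array of the ith treatment group, α_{ k } is the mean relative (log) expression level of gene k, δ_{ ik } is the differential effect of treatment i on gene k, and σ_{0}ε_{ ijk } is the error term, whose scale σ_{0} is constant.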
Estimation by self-consistency
Estimation of the parameters in Equation 4 is carried out by an iteratively reweighted least-squares (IRLS) procedure. First, let c_{ k } indicate the assignment of the kth gene to the core set: c_{ k } = 0 if gene k is not in the core and c_{ k } = 1/G_{ C } if gene k is in the core, where G_{ C } is the number of genes in the core. The vector c is thus normalized: Σ_{ k }c_{ k } = 1. These indicators play the role of weights in the IRLS procedure. Although they do depend on other estimated parameters, in each iteration the weights are treated as constants, depending only on parameter estimates from the previous iteration.
The notion of self-consistency arises in the combined processes of identifying the core and normalizing the data: the choice of genes belonging to the core depends on the normalization, and the optimal normalization depends on which genes are identified with the core.
We start by minimizing the core sum of squares (SS_{ C }):
where a and n are the estimators for α and ν, respectively; overbars indicate averages over the dotted subscripts.
The normalized and scaled data are now given by
Note that if all of the genes are placed in the core, we have
as expected.
Now we estimate the differential treatment effects by minimizing the residual sum of squares,
Note that the matrix, d, of differential treatment effects obeys Σ_{ i }r_{ i }d_{ ik } = 0, as we would hope.
Self-consistency requires that the vector of core indicators c depend on the estimated differential treatment effects, d. We have tried several methods for implementing an appropriate dependence and find that one of the simplest schemes works very well. We fix the proportion of genes in the core, rank the genes by the square of the estimated differential treatment effect, and remove from the core for the next iteration those genes whose squared effect lies above the quantile corresponding to that proportion.
We carry out the estimation iteratively. We start with c_{ k } = 1/G for all k (all genes in the core) and estimate δ_{ ik } by Equation 10. We then update c according to Equation 11 and repeat the estimation of δ with this new c. We stop when c does not change from one iteration to the next.
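The iteration just described can be sketched as follows for the constant-offset model of Equation 4. This is a minimal sketch under stated assumptions, not the authors' FORTRAN implementation: the array offsets are taken as core means centred across arrays, the gene levels as plain averages of the normalized data, and the names `self_consistent_normalize`, `core_fraction` and `max_iter` are hypothetical.

```python
def self_consistent_normalize(y, groups, core_fraction=0.8, max_iter=50):
    """Iterate normalization and core selection until the core stops changing.

    y[j][k]  : log intensity of gene k on array j
    groups[j]: treatment-group label of array j
    Returns (offsets, a, d, core); all estimators here are assumed forms.
    """
    n_arrays, n_genes = len(y), len(y[0])
    n_core = max(1, round(core_fraction * n_genes))
    labels = sorted(set(groups))
    core = set(range(n_genes))            # start with every gene in the core
    for _ in range(max_iter):
        # Array offsets n_ij: core mean of each array, centred across arrays.
        core_means = [sum(row[k] for k in core) / len(core) for row in y]
        grand = sum(core_means) / n_arrays
        offsets = [m - grand for m in core_means]
        x = [[v - off for v in row] for row, off in zip(y, offsets)]
        # Gene levels a_k: mean of the normalized data over all arrays.
        a = [sum(row[k] for row in x) / n_arrays for k in range(n_genes)]
        # Treatment effects d_ik and the ranking score sum_i r_i * d_ik^2.
        d, score = {}, [0.0] * n_genes
        for g in labels:
            rows = [row for row, lab in zip(x, groups) if lab == g]
            d[g] = [sum(r[k] for r in rows) / len(rows) - a[k]
                    for k in range(n_genes)]
            for k in range(n_genes):
                score[k] += len(rows) * d[g][k] ** 2
        # New core: the genes with the smallest squared effects.
        ranked = sorted(range(n_genes), key=lambda k: score[k])
        new_core = set(ranked[:n_core])
        if new_core == core:
            break
        core = new_core
    return offsets, a, d, core

# Noise-free toy data: 2 groups x 2 arrays, 10 genes, genes 0 and 1 induced.
true_offsets = [0.0, 0.5, 1.0, 1.5]
groups = [0, 0, 1, 1]
y = [[off + k + (3.0 if g == 1 and k < 2 else 0.0) for k in range(10)]
     for off, g in zip(true_offsets, groups)]
offsets, a, d, core = self_consistent_normalize(y, groups)
```

On this toy input the first pass, with all genes in the core, misestimates the array offsets because the two induced genes inflate the treated arrays; the second pass drops them from the core and recovers both the offsets and the treatment effect.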
The local regression model
What we find in the analysis of experimental data, however, is that Equation 1 with N constant is not adequately realistic. A more flexible approach that covers the contingencies of Equations 1-3 and many others, is to generalize Equation 4 to
Y_{ ijk } = ν_{ ij }(α_{ k }) + α_{ k } + δ_{ ik } + σ(α_{ k })ε_{ ijk }. (12)
Local regression
Local regression is a generalization of the intuitive idea of smoothing by using a moving average. In local regression, one goes beyond computing the local average of a set of measured points by estimating, at each value of the predictor variables, all of the coefficients in a Pth-order regression in which the regression coefficients themselves are slowly varying functions of the predictor variable. Computation of a moving average is thus a zeroth order local regression. The availability of inexpensive powerful computing has sparked renewed interest in local regression techniques and its theoretical underpinnings have been extensively elucidated [9,10,11].
First, we define the local polynomial
f(u'; β(u)) = β_{0}(u) + β_{1}(u)(u - u') + ... + β_{ P }(u)(u - u')^{ P }. (13)
For fixed u, f(u'; β(u)) is a polynomial in u' with coefficients β_{ i }(u). These coefficients will be constrained to vary slowly with u, the quantitative rates of change specified by a parameter introduced below. Second, we estimate f(u) as
\hat{f}(u) = f(u; b(u)) = b_{0}(u), (14)
where b is the vector of estimators for β. In other words, we estimate the coefficients and evaluate the function at u' = u. The terms of order greater than 0 vanish, but the estimates for the remaining zeroth-order terms depend nevertheless on the estimated higher-order coefficients, as follows. Given a dataset consisting of n pairs (u_{ i },v_{ i }), i ∈ (1,...,n), we estimate the coefficients at a point u (not necessarily corresponding to any u_{ i } in the dataset), by minimizing a weighted sum-of-squares over β:
Σ_{ i=1 }^{ n } w_{ i }(u) [v_{ i } - f(u_{ i }; β(u))]^{2}. (15)
The weighting functions w are given by
w_{ i }(u) = W((u_{ i } - u)/h(u)), (16)
where W is a symmetric function having a simple maximum at the origin, strictly decreasing on [0,1] and vanishing for arguments of magnitude 1 or greater. For our application in this paper, we use the efficiently computed tricube function
W(x) = (1 - |x|^{3})^{3} for |x| < 1, and W(x) = 0 otherwise. (17)
The function h is known as the bandwidth, and controls just how slowly f varies with u. We choose the bandwidth so as to give equal span at all points u. The span is defined as the proportion of points u_{ i } contained in a ball of radius h(u). This choice of bandwidth function is used in Loess regression [11]. For all of the computations in this paper, we have used a span of 0.5.
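To make the procedure concrete, here is a minimal sketch of a first-order (P = 1) local regression with tricube weights and a nearest-neighbour bandwidth chosen to give a fixed span. The function names and the small numerical safeguards are assumptions of this sketch; a production analysis would more likely use an established Loess implementation.

```python
import math

def tricube(x):
    """Tricube kernel: (1 - |x|^3)^3 inside (-1, 1), zero elsewhere."""
    x = abs(x)
    return (1.0 - x ** 3) ** 3 if x < 1.0 else 0.0

def local_linear(u, us, vs, span=0.5):
    """Local linear (P = 1) estimate of f(u) from the pairs (us[i], vs[i])."""
    # Bandwidth h(u): distance to the m-th nearest point, so that a fixed
    # proportion `span` of the data receives weight. The tiny inflation keeps
    # the weights from all vanishing when distances are tied.
    m = max(2, math.ceil(span * len(us)))
    h = sorted(abs(ui - u) for ui in us)[m - 1] * (1.0 + 1e-9) + 1e-12
    s0 = s1 = s2 = t0 = t1 = 0.0
    for ui, vi in zip(us, vs):
        w = tricube((ui - u) / h)
        x = ui - u
        s0 += w
        s1 += w * x
        s2 += w * x * x
        t0 += w * vi
        t1 += w * x * vi
    det = s0 * s2 - s1 * s1
    if abs(det) < 1e-12:               # degenerate design: use the local mean
        return t0 / s0
    return (s2 * t0 - s1 * t1) / det   # intercept b_0(u) of the local line

us = [i / 10 for i in range(21)]
vs = [2.0 * x + 1.0 for x in us]
estimate = local_linear(0.73, us, vs)
```

Because the fit is evaluated at the centre of the local polynomial, only the intercept is returned, yet its value depends on the fitted slope through the weighted normal equations. On exactly linear data the estimate reproduces the line.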
The minimizing coefficients are linear in the data:
b_{ i }(u) = L_{ i }(u)v, (18)
where L_{ i } is the linear operator appropriate to the ith coefficient and v is the vector with components v_{ k }. Note that the L_{ i } depend on the order P of the local regression. For any given value of P, the L_{ i } can be written down explicitly, but they quickly become algebraically complicated.
The local regression estimate of f(u; β(u)) is
\hat{f}(u) = b_{0}(u) = L_{0}(u)v. (19)
Because of this linearity, the sampling distributions for these coefficients are known and we can compute their sampling variances in a straightforward manner [11].
To adapt this method to the problem of normalization, and simultaneously to implement self-consistency, we take for the weighting functions the product of a tricube and a core indicator:
w_{ k }(u) = c_{ k }W((a_{ k } - u)/h(u)), (20)
where c_{ k } is the core indicator as given in Equation 11 and the a_{ k } are given by Equation 6. In these terms, the local regression estimate n of ν is given by
with the normalized data given by
and the differential treatment effects by
Again, we have Σ_{ i }r_{ i }d_{ ik } = 0. The core indicator vector c is then iterated to fixation as described in the previous section, but with Σ_{ i }r_{ i }d_{ ik }^{2} compared against s^{2}(a_{ k }), where s^{2}(·) is the estimated local variance, discussed in the next section.
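The effect of multiplying the tricube weight by a core indicator can be illustrated with a short sketch. It fits the intensity-dependent offset of a single array by local linear regression of the residuals on the estimated abundances, giving zero weight to genes outside the core. The function `core_weighted_fit`, the boolean `in_core` flags, and the choice to compute the span over core genes only are assumptions made for illustration.

```python
import math

def tricube(x):
    """Tricube kernel: (1 - |x|^3)^3 inside (-1, 1), zero elsewhere."""
    x = abs(x)
    return (1.0 - x ** 3) ** 3 if x < 1.0 else 0.0

def core_weighted_fit(u, a, resid, in_core, span=0.5):
    """Local linear fit of resid on a, evaluated at u, using core genes only."""
    core_a = [ak for ak, c in zip(a, in_core) if c]
    m = max(2, math.ceil(span * len(core_a)))
    h = sorted(abs(ak - u) for ak in core_a)[m - 1] * (1.0 + 1e-9) + 1e-12
    s0 = s1 = s2 = t0 = t1 = 0.0
    for ak, rk, c in zip(a, resid, in_core):
        w = tricube((ak - u) / h) if c else 0.0   # tricube x core indicator
        x = ak - u
        s0 += w
        s1 += w * x
        s2 += w * x * x
        t0 += w * rk
        t1 += w * x * rk
    det = s0 * s2 - s1 * s1
    return (s2 * t0 - s1 * t1) / det if abs(det) > 1e-12 else t0 / s0

# Toy array: offset 0.3 + 0.05 * a, with two induced genes (indices 4 and 12).
a = [0.5 * k for k in range(20)]
in_core = [k not in (4, 12) for k in range(20)]
resid = [0.3 + 0.05 * ak + (0.0 if c else 3.0) for ak, c in zip(a, in_core)]
fit = core_weighted_fit(4.0, a, resid, in_core)
fit_all = core_weighted_fit(4.0, a, resid, [True] * 20)
```

With the induced genes excluded, the fitted offset at a = 4.0 matches the underlying curve; with every gene treated as core, the same fit is pulled upward by the two induced genes.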
Local variance estimation
In addition to local nonlinearities in the response curve, we also find that the data are heteroscedastic: the error variance shows a clear dependence on the estimated abundance. The logarithmic transformation removes a substantial part of this dependence, but does not flatten it out entirely. One might try an a priori accounting of the sources of error and thereby provide a parametric model for it, but the number of potential error sources is large, so we instead choose a flexible error model and estimate local variance by again using local regression. The technique involves computing the local likelihood and the effective residual degrees of freedom and is described in detail in [11]. The ratio of the local likelihood to the effective degrees of freedom provides a smooth estimate of the local variance. The estimated residuals are not strictly linear functions of Y because of the implicit dependence of the indicator vector c on the data Y and because of our use of the estimator a, rather than a strictly independent variable, as the predictor for the local regression. We expect these corrections due to nonlinearities to be small and thus neglect them in our estimates of the local variance.
At this stage, we have computed a first-order approximate solution for the estimation problem. We may now perform another iteration (in addition to the iterated solution for the core indicator c) to improve the approximation, reweighting the data by the inverse of the estimated local variance. Our experience, however, has been that the first-order corrections are sufficient and the higher-order corrections are more difficult to compute and make little difference in the final analysis. For the applications and validation tests that follow, we use just the first-order corrections.
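The idea of an abundance-dependent variance can be illustrated with a deliberately simplified estimator: a tricube-weighted average of squared residuals. This is a zeroth-order stand-in, not the local-likelihood procedure of [11], and it applies no correction for effective degrees of freedom; the function name `local_variance` is hypothetical.

```python
import math

def tricube(x):
    """Tricube kernel: (1 - |x|^3)^3 inside (-1, 1), zero elsewhere."""
    x = abs(x)
    return (1.0 - x ** 3) ** 3 if x < 1.0 else 0.0

def local_variance(u, a, resid, span=0.5):
    """Kernel-weighted mean of squared residuals near abundance u."""
    m = max(2, math.ceil(span * len(a)))
    h = sorted(abs(ak - u) for ak in a)[m - 1] * (1.0 + 1e-9) + 1e-12
    num = den = 0.0
    for ak, rk in zip(a, resid):
        w = tricube((ak - u) / h)
        num += w * rk * rk
        den += w
    return num / den

# Toy residuals: magnitude 0.1 below a = 5 and 0.5 from a = 5 upward.
a = [0.1 * k for k in range(100)]
resid = [(0.1 if ak < 5 else 0.5) * (1 if k % 2 == 0 else -1)
         for k, ak in enumerate(a)]
var_low = local_variance(1.0, a, resid)
var_high = local_variance(9.0, a, resid)
```

The two estimates differ by the expected factor of 25, which a single pooled variance would average away.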
Pairwise expression-level comparisons
We perform individual pairwise hypothesis tests for each spot in the array by computing the statistic
where s(a_{ k }) is the square-root of the local variance at the mean relative expression value a_{ k }. We test z as a standard normal under the null hypothesis of no expression difference.
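A sketch of such a test is given below. The exact form of the statistic is not reproduced here, so the code assumes the usual two-sample form, in which the difference between the group means is scaled by the local standard deviation and the replicate counts r_{1} and r_{2}; the function names are hypothetical.

```python
import math

def z_statistic(mean_1, mean_2, s, r1, r2):
    """Assumed two-sample form: difference of group means over its local SE."""
    return (mean_1 - mean_2) / (s * math.sqrt(1.0 / r1 + 1.0 / r2))

def two_sided_p(z):
    """Two-sided p-value of z under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# A gene whose normalized log expression differs by 1.0 between groups,
# with local standard deviation 0.5 and two replicate arrays per group.
z = z_statistic(1.0, 0.0, s=0.5, r1=2, r2=2)
p = two_sided_p(z)
```

Because thousands of such tests are run per array, the nominal per-gene level should be read with the number of comparisons in mind.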
Validation
Results and discussion
The gene-expression pattern observed for rat mesothelial cells was indicative of oxidative stress, mitotic arrest and possibly increased apoptosis. (All changes listed are significant at the 0.05 level.) Oxidative-stress-responsive genes for heme oxygenase-1 (HO-1), quinone reductase/NMOR/DT diaphorase (QR), growth arrest and DNA damage 45 (GADD45), heat-shock protein 70 (HSP70), among others, showed increased expression, as did transcriptional regulatory genes for c-Jun, c-Fos, Jun D, Jun B, c-Myc and inhibitory κB subunit (IκB). Proteasome components involved in protein repair (Rδ, RC10-II, C3, RC-7, HR6B ubiquitin-conjugating enzyme and ubiquitin) and genes for DNA repair proteins proliferating cell nuclear antigen (PCNA), mismatch repair protein 2 homolog (Msh2), and O-6-methylguanine DNA methyltransferase were upregulated. The lipid peroxide excision enzyme phospholipase A2 (PLA2) exhibited increased expression, as did apoptogenic genes for tumor necrosis factor α (TNF-α), inducible nitric oxide synthase 1 (iNOS1) and Fas ligand (FasL). Other components involved in apoptosis, including the anti-apoptotic B-cell lymphoma 2 (Bcl-2), and the pro-apoptotic Bcl-2-associated X protein α (bax α), Bcl-XL/Bcl-2 associated death promoter homolog (Bad) and Bcl-2 related ovarian killer protein (bok) (at 12 hours), and cell-cycle control elements known as cyclins (at 4 and 12 hours), were downregulated. Expression of several genes that inhibit the cell from entering the cell cycle increased significantly at both time points.
Confirmation by quantitative PCR
Quantitative PCR analysis confirmed nine gene changes. The tenth, PLA2, could not be confirmed because of a lack of signal in both treatment groups, which most likely reflects a problem in the PCR for that gene [12].
Morphologic analysis revealed complete mitotic arrest by 4 hours post-exposure, with increased numbers of condensed cells with pyknotic nuclei, believed to be apoptotic. Strong HO-1-specific staining was observed in treated cells, whereas control cells showed weak nonspecific staining, or no staining at all.
Statistical characteristics of the data
Table 1. Assessment of algorithm performance on data simulated according to the homoscedastic error model
Proportion affected (%) | f | q | Power (Naive) | Power (NoSeCoLoR) | Rate of false positives (Naive) | Rate of false positives (NoSeCoLoR) | RMS bias ×10^{-2} (5th percentile) | RMS bias ×10^{-2} (95th percentile) |
---|---|---|---|---|---|---|---|---|
10 | 1.5 | 0 | 0.318 | 0.315 | 1.024 | 1.035 | 0.937 | 1.710 |
10 | 1.5 | 1 | 0.127 | 0.300 | 0.929 | 0.933 | 16.559 | 17.872 |
10 | 2.5 | 0 | 0.989 | 0.974 | 1.004 | 1.181 | 1.524 | 3.292 |
10 | 2.5 | 1 | 0.689 | 0.971 | 0.955 | 0.968 | 15.776 | 17.163 |
20 | 1.5 | 0 | 0.327 | 0.314 | 0.975 | 1.002 | 1.079 | 2.226 |
20 | 1.5 | 1 | 0.129 | 0.295 | 0.883 | 0.973 | 16.380 | 17.742 |
20 | 2.5 | 0 | 0.985 | 0.939 | 1.000 | 1.662 | 3.359 | 5.763 |
20 | 2.5 | 1 | 0.684 | 0.941 | 0.889 | 1.298 | 15.279 | 16.823 |
Table 2. Assessment of algorithm performance on data simulated according to the heteroscedastic error model (Equation 26)
Proportion affected (%) | f | q | Power (Naive) | Power (NoSeCoLoR) | Rate of false positives (Naive) | Rate of false positives (NoSeCoLoR) | RMS bias ×10^{-2} (5th percentile) | RMS bias ×10^{-2} (95th percentile) |
---|---|---|---|---|---|---|---|---|
10 | 1.5 | 0 | 0.312 | 0.346 | 1.577 | 0.890 | 0.933 | 1.669 |
10 | 1.5 | 1 | 0.130 | 0.342 | 0.775 | 0.784 | 16.536 | 17.763 |
10 | 2.5 | 0 | 0.982 | 0.939 | 1.482 | 0.970 | 1.474 | 3.447 |
10 | 2.5 | 1 | 0.683 | 0.939 | 0.749 | 0.855 | 15.740 | 17.271 |
20 | 1.5 | 0 | 0.313 | 0.345 | 1.600 | 0.878 | 0.930 | 2.091 |
20 | 1.5 | 1 | 0.128 | 0.324 | 0.784 | 0.803 | 16.320 | 17.722 |
20 | 2.5 | 0 | 0.983 | 0.905 | 1.560 | 1.367 | 3.113 | 5.967 |
20 | 2.5 | 1 | 0.685 | 0.909 | 0.751 | 1.078 | 15.299 | 16.821 |
Table 3. Assessment of algorithm performance on data simulated according to a model with homoscedastic multiplicative error plus additive (background) error
Proportion affected (%) | f | q | Power (Naive) | Power (NoSeCoLoR) | Rate of false positives (Naive) | Rate of false positives (NoSeCoLoR) | RMS bias ×10^{-2} (5th percentile) | RMS bias ×10^{-2} (95th percentile) |
---|---|---|---|---|---|---|---|---|
10 | 1.5 | 0 | 0.266 | 0.380 | 1.607 | 1.089 | 1.840 | 6.824 |
10 | 1.5 | 1 | 0.127 | 0.317 | 7.791 | 0.888 | 7.227 | 34.413 |
10 | 2.5 | 0 | 0.628 | 0.636 | 1.687 | 1.117 | 1.859 | 8.019 |
10 | 2.5 | 1 | 0.292 | 0.630 | 9.987 | 0.970 | 9.842 | 37.617 |
20 | 1.5 | 0 | 0.275 | 0.384 | 1.468 | 1.031 | 2.006 | 6.927 |
20 | 1.5 | 1 | 0.126 | 0.296 | 8.857 | 0.895 | 9.741 | 34.407 |
20 | 2.5 | 0 | 0.635 | 0.646 | 1.361 | 1.384 | 2.228 | 7.120 |
20 | 2.5 | 1 | 0.282 | 0.608 | 8.887 | 1.063 | 10.778 | 34.203 |
In addition to the experiments reported here, we have examined data from several other microarray platforms and find that in terms of the heteroscedasticity and apparent bias, they are qualitatively similar (not shown).
Simulation studies
To determine the reliability of our methods, we generated simulated data under a number of models based on the statistical characteristics of the data obtained in our hybridization experiments. All of the simulated data were produced using FORTRAN programs calling IMSL subroutines for sorting, cubic spline interpolation and random number generation.
Homoscedastic error model
In the first set of tests, the data were generated by simulations of the model
What we find (Table 1) is that the power of the test for the naive analysis is diminished by the presence of bias. For the local-regression analysis (NoSeCoLoR), the power is unaffected by the presence of bias. Furthermore, when the proportion of affected genes among all genes is small (10%), the power of the two methods is about the same. When the proportion is 20%, the naive method has slightly better power when bias is absent.
Heteroscedastic error model
In this case (Table 2), we find as before that bias diminishes the power of the naive procedure, but not that of NoSeCoLoR. In addition, the rate of false positives is now notably high for the naive method. NoSeCoLoR yields consistently smaller false-positive rates, although when large proportions of genes are affected and have large effect size, the rate of false positives with NoSeCoLoR is also larger than nominal.
Compound error model
The model given by Equation 12 is intended to be flexible and to be a reasonable approximation to a variety of models. One particularly common source of nonlinearity is additive error (on the untransformed data), or background with nonzero mean (Equation 2). We have therefore simulated data according to a model given by
I_{ ijk } = exp {α_{ k } + ν_{ ij } + δ_{ ik } + ε_{ ijk }} + exp {ζ_{ ij } + η_{ ijk }} (26)
It is in this simulation that the naive method fails most dramatically. For all datasets, the naive method gives false-positive rates significantly greater than nominal, some as much as ten-fold higher than nominal. NoSeCoLoR has much better error rates, although as seen before, performance starts to suffer when larger numbers of spots are affected. The power of comparisons using NoSeCoLoR is again much more resistant to changes in the effective bias level (c in Table 3) than is the naive method.
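Data of the kind described by Equation 26 can be generated along the following lines. This is a sketch only: the Gaussian form of the two error terms, the parameter values and the function name `simulate_compound` are assumptions, and it is unrelated to the FORTRAN/IMSL programs used for the reported simulations.

```python
import math
import random

def simulate_compound(alpha, nu, delta, zeta, sd_mult, sd_add, rng):
    """Intensities for one array under Equation 26.

    alpha : list of gene-level log abundances alpha_k
    nu    : normalization term of this array
    delta : list of treatment effects, one per gene
    zeta  : log of the typical background level of this array
    sd_mult, sd_add : standard deviations of the multiplicative and
                      additive error terms (Gaussian by assumption)
    """
    return [
        math.exp(a + nu + d + rng.gauss(0.0, sd_mult))
        + math.exp(zeta + rng.gauss(0.0, sd_add))
        for a, d in zip(alpha, delta)
    ]

rng = random.Random(0)
alpha = [0.5 * k for k in range(20)]
delta = [0.0] * 20

# With both error terms switched off, the additive background alone bends
# the log-scale response away from the line log I = alpha.
noise_free = simulate_compound(alpha, 0.0, delta, 2.0, 0.0, 0.0, rng)
log_bias = [math.log(i) - a for i, a in zip(noise_free, alpha)]
```

The resulting bias on the log scale is large at low abundance and negligible at high abundance, which is the kind of slowly varying nonlinearity that a constant normalization factor cannot absorb.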
Conclusions
We have presented a method for normalizing microarray data that relies on the statistical consistency of relative expression levels among a core set of genes that is not identified in advance, but inferred from the data itself. The normalization and variance estimation are both performed using local regression. We are then able to perform standard comparison tests. This technique reveals biologically plausible expression-level differences between control mesotheliomas and mesotheliomas treated with a potent inducer of oxidative stress. The expression changes identified by our normalization methodology were confirmed by quantitative PCR in all cases but one, in which there was no detectable PCR amplification at all.
Our simulation studies show that our normalization technique performs well. The worst case occurs when the response curve is perfectly linear, the variance constant and a large proportion of genes experiences sizable expression-level changes. Under these conditions, our method has a false-positive rate somewhat greater than nominal and self-consistent normalization without local regression performs slightly better than that with local regression. On the other hand, our method is insensitive to bias and heteroscedasticity, both of which have a significant deleterious effect on the naive method. Furthermore, bias and heteroscedasticity are both measurably present in all data that we have examined from microarrays from a number of different manufacturers and from several different laboratories. In these cases, local regression performs better than self-consistency alone. When the data are generated by an additive-plus-multiplicative error model, the naive method completely breaks down, whereas our method continues to perform well.
We have applied these methods to the analysis of microarray data in toxicogenomic studies [12,14], where the results made good biological sense and, where relevant, were confirmed by subsequent experimentation. All data-analytic techniques benefit from extensive use and assessment using several platforms and diverse biological systems. To facilitate this process for the methods described here, and to provide them to the interested research community, we have made the software used to implement them available for non-commercial use [13].
DNA hybridization microarrays promise unprecedented insight into many areas of cell biology, and statistical methods will be essential for making sense of the vast quantities of information contained in their data. Efficient and reliable normalization procedures are an indispensable component of any statistical method; further development and analysis of error models for microarray data will be a worthwhile investment.
Materials and methods
Clontech microarrays
This is a brief description of the experimental methods; complete details can be found in [12]. Immortalized rat peritoneal mesothelial cells (Fred-Pe) developed in-house were grown in mesothelial cell culture media as previously described [12] for several months before the experiment, with weekly subculturing. Cells plated at 1 × 10^{7} cells/150 mm dish in 30 ml media were grown for 24 h and treated with the pre-determined ED_{50} concentration of 6 mM KBrO_{3} for 4 or 12 h. Cells were detached using a cell lifter and centrifuged at 175g for 3 min. The supernatant (medium) was removed by aspiration and cells were re-suspended in 1 ml sterile PBS and frozen at -80°C until RNA extraction. The Atlas Pure Total RNA protocol for poly(A)^{+} mRNA extraction was used. Samples were hybridized in manufacturer-supplied hybridization solution (Clontech ExpressHyb) for 30 min at 68°C. After hybridization, the membranes were washed, removed, wrapped in plastic wrap, and placed against a rare-earth screen for 24 h, followed by phosphoimager detection and AtlasImage analysis before application of the software tools described in this paper.
Quantitative PCR
Confirmation by Taqman (Perkin-Elmer) quantitative PCR was performed for ten selected genes as described in [12]. The genes selected for confirmation were those for cyclin D1, GADD45, GPX, HO-1, HSP70, Mdr-1, QR, prostaglandin H synthase (PGHS), p21WAF1/CIP1 and PLA2. Two control and two treated samples from the 4-h time point, and two control and one treated from the 12-h time point, were analyzed. Each plate contained duplicate wells of each gene, and 16 no-template control (NTC) wells divided evenly among four quadrants.
Analysis
Software for the implementation of the statistical estimation and testing procedures described above was written in FORTRAN and run on desktop PCs [13]. Additional statistical computations were performed using S-plus 4.5 (MathSoft).
Additional data files
The additional data files, available with this article or from [13], consist of several files for implementing the methods described here: NoSeCoLoR.exe is the executable file, compiled for Windows, for the program itself; NoSe-CoLoR-The-Manual.pdf is the user's guide and contains information on input formatting and the interpretation of output files; README.txt contains instructions for installation and start-up; and there are several sample input files and associated output files.
Declarations
Acknowledgements
This work was supported by grant number MCB 9357637 from the National Science Foundation (T.B.K.) and by a research grant from Glaxo-Wellcome, Inc. (T.B.K.).
Authors’ Affiliations
References
1. Fodor SP, Rava RP, Huang XC, Pease AC, Holmes CP, Adams CL: Multiplexed biochemical assays with biological chips. Nature. 1993, 364: 555-556. 10.1038/364555a0.
2. Schena M, Shalon D, Davis RW, Brown PO: Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science. 1995, 270: 467-470.
3. DeRisi J, Penland L, Brown PO, Bittner ML, Meltzer PP, Ray M, Chen Y, Su YA, Trent JM: Use of a cDNA microarray to analyze gene expression patterns in human cancer. Nat Genet. 1996, 14: 457-460.
4. Lockhart DJ, Dong H, Byrne MC, Follettie MT, Gallo MV, Chee MS, Mittmann M, Wang C, Kobayashi M, Horton H, Brown EL: Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat Biotechnol. 1996, 14: 1675-1680.
5. DeRisi JL, Iyer VR, Brown PO: Exploring the metabolic and genetic control of gene expression on a genomic scale. Science. 1997, 278: 680-686. 10.1126/science.278.5338.680.
6. Iyer VR, Eisen MB, Ross DT, Schuler G, Moore T, Lee JCF, Trent JM, Staudt LM, Hudson J, Boguski MS, et al: The transcriptional program in the response of human fibroblasts to serum. Science. 1999, 283: 83-87. 10.1126/science.283.5398.83.
7. Wodicka L, Dong H, Mittmann M, Ho MH, Lockhart DJ: Genome-wide expression monitoring in Saccharomyces cerevisiae. Nat Biotechnol. 1997, 15: 1359-1367.
8. Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B: Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell. 1998, 9: 3273-3297.
9. Cleveland WS, Devlin SJ: Locally weighted regression: An approach to regression analysis by local fitting. J Am Stat Assoc. 1988, 83: 596-610.
10. Loader CR: Local likelihood density estimation. Annls Statistics. 1996, 24: 1602-1618. 10.1214/aos/1032298287.
11. Loader CR: Local Regression and Likelihood. New York: Springer-Verlag; 1999.
12. Crosby LM, Hyder KS, DeAngelo AB, Kepler TB, Gaskill B, Benavides GR, Yoon L, Morgan KT: Morphologic analysis correlates with gene expression changes in cultured F344 rat mesothelial cells. Toxicol Appl Pharmacol. 2000, 169: 205-221. 10.1006/taap.2000.9049.
13. NoSeCoLoR: normalization by self-consistency and local regression (software and documentation). [ftp://ftp.santafe.edu/pub/kepler/]
14. Morgan KT, Ni H, Brown HR, Yoon L, Qualls CW, Crosby LM, Reynolds R, Gaskill B, Anderson SP, Kepler TB, et al: Application of cDNA microarray technology to in vitro toxicology and the selection of genes for a real time RT-PCR-based screen for oxidative stress in Hep-G2 cells. Toxicol Pathol. 2002.