Bayesian analysis of gene expression levels: statistical quantification of relative mRNA level across multiple strains or treatments
© Townsend and Hartl, licensee BioMed Central Ltd 2002
Received: 23 May 2002
Accepted: 3 October 2002
Published: 20 November 2002
Methods of microarray analysis that suit experimentalists using the technology are vital. Many methodologies discard the quantitative results inherent in cDNA microarray comparisons or cannot be flexibly applied to multifactorial experimental design. Here we present a flexible, quantitative Bayesian framework. This framework can be used to analyze normalized microarray data acquired by any replicated experimental design in which any number of treatments, genotypes, or developmental states are studied using a continuous chain of comparisons.
We apply this method to Saccharomyces cerevisiae microarray datasets on the transcriptional response to ethanol shock, to SNF2 and SWI1 deletion in rich and minimal media, and to wild-type and zap1 expression in media with high, medium, and low levels of zinc. The method is highly robust to missing data, and yields estimates of the magnitude of expression differences and experimental error variances on a per-gene basis. It reveals genes of interest that are differentially expressed at below the twofold level, genes with high 'fold-change' that are not statistically significantly different, and genes differentially regulated in quantitatively unanticipated ways.
Anyone with replicated normalized cDNA microarray ratio datasets can use the freely available MacOS and Windows software, which yields increased biological insight by taking advantage of replication to discern important changes in expression level both above and below a twofold threshold. Not only does the method have utility at the moment, but also, within the Bayesian framework, there will be considerable opportunity for future development.
Methods for analysis of cDNA microarray data include those that cluster hierarchically  by principles of self-organization  or by k-means . These methods yield enormous amounts of information about similarities of cell state and coordination of gene regulation, and are useful for grouping genes or transcriptional profiles by similarity. They have the limitation that although experimental replication enhances the significance of groupings observed, the groupings do not inherently quantify signal and noise. A fold-value cutoff originally was used for this purpose , and held double duty as a signifier of true signal and a boundary beyond which observed fold-measures were considered to be reflective of actual fold-change. Other approaches use likelihood-based methods [5,6] to obtain P-values for gene expression differences in replicated comparisons. These methods make the assumptions and have the power of model-based statistics, but as yet are not formulated to handle more than two genotypes, environments, or developmental states within a single, cohesive framework.
One method for analyzing experiments that involve numerous treatments is the use of analysis of variance on microarray data. Methods have been developed that can yield a profusion of information about the sources of experimental variation [7,8] or, at a biological level, about the proportion of variation in expression profile attributable to biological factors such as sex or genotype . These methods can estimate the magnitude of effects as well as significance, but also impose considerable constraints on experimental design , and they are not robust to missing or excluded data.
Volcano plots  have highlighted well the important distinction between biological and statistical significance. There are effects that may be biologically important that may not be statistically significant, and vice versa. Because many microarray experiments can have a complex and unbalanced design, owing to the technical failure of certain hybridizations and the iterative nature of the work itself, we have developed an approach for assessing statistical significance that could potentially use all the available observations in any transitively connected design. Our goal is to identify effects of biologically significant magnitude to statistically significant precision.
To that end, we introduce a Bayesian analysis of gene expression level (BAGEL) model for statistical inference of gene expression and demonstrate its utility by re-examining cDNA microarray data on the response of yeast to ethanol shock , on transcriptional regulation by SNF2 and SWI1 , and on zinc regulation .
Results and discussion
The observed MSN4 ratios in two replicate microarrays were inconsistent (0.6- and 2-fold). This, in fact, gives very weak support to the hypothesis that the nodes are expressed at nearly the same level. This is in part because, conditional on higher error variances, larger differences in expression become increasingly likely. This effect is seen more dramatically in the contour plot for HSF1, for which the highly dispersed, somewhat inconsistent ratios of 0.9-fold, 3-fold and 5-fold were observed. Consistent data, even when relatively dispersed (ratios 3-fold, 5-fold and 10-fold), as for GCR1, shows this effect but with a greater slope in the likelihood surface. Although Figure 2a depicts two-dimensional surfaces from an ordinary treatment-reference experimental design with a common variance term, it shows typical simple topologies of the likelihood surface. Increasing the amount of data in larger datasets accelerates reliable convergence upon the stationary distribution of the Markov chain, which is required for inference of the posterior distributions of the parameters of interest. Furthermore, in larger datasets, these posterior unconditional distributions of the parameters are unimodal. By inference, the multidimensional likelihood surfaces are expected to be fairly simple.
These likelihood plots clearly convey the most probable expression levels. To determine statistical significance, we examine the posterior distributions of the parameters, as determined by Markov chain Monte Carlo simulation. In fact, the credible intervals for the expression level for all three of these genes overlap (Figure 2b). There is not enough information in the replicated comparisons of each gene's ratio of expression to constrain the variance parameter to a small enough value so that expression levels could be inferred to lie within a small range.
There are many reasons why data points in microarray experiments are wisely excluded from analysis. Whole sectors occasionally fail due to premature drying during hybridization, spots are occasionally malformed, and experimental signal is frequently so low compared to background that a spot is best excluded from the data. Only 2,041 genes of 6,138 in these data passed the spot background-foreground acceptance criteria (see Materials and methods) in all three experiments. Of these genes, 65 were significantly different by a gene-by-gene criterion, demanding nonoverlapping 95% confidence intervals. A further 2,337 genes had one observation missing; their credible intervals were appropriately wider after BAGEL analysis of the data and only 22 were statistically significant. Genes with one acceptable observation were not analyzed. Use of an informative prior distribution on the variance, however, would allow such analysis.
Pairwise comparisons of gene-expression level category by mean log2 ratio and by Bayesian estimation with statistical significance
Mean log2 ratio
versus mid-log growth
swi1Δ and snf2Δ
swi1Δ and snf2Δ
Deletion of SWI1 and SNF2
Whether these genes represent all of the genes controlled by the Snf/Swi complex depends, of course, on what level of difference is deemed biologically significant, which in many cases depends upon the gene of interest. For many transcription factors, it may be that small changes in expression level cause very large changes in biological state [25,26,27], and for many metabolic genes, very large changes in expression may change pathway throughput very little . Note also that these criteria do not distinguish between cis and trans effects on expression level. Presumably, many trans effects percolate into much or all of the genome in very small ways; some of these effects may be overwhelmed by systematic error induced by the technology, but most should be detectable with sufficient replication.
More important, there are just two genes in this global dataset for which the credible intervals of expression levels in snf2 and swi1 mutants do not overlap. This answers a question posed by Sudarsanam et al. as to whether the differences in gene expression between these mutants that they observed were due to variation in microarray measurements or to real differences between the mutants . Nearly all differences they observed in the transcriptional profiles of these two mutants are potentially due to variation in microarray experiments. A scatterplot of the log2 estimates of gene-expression levels for genes significantly different from wild type, in the two deletion mutants compared to wild type, yields a linear correlation of 0.97 (Figure 6b). The large number of genes meagerly expressed compared to wild type in both deletion mutants is consistent with the roles of Snf2p and Swi1p as transcriptional activators. In addition, globally, not a single gene shows a contrasting change in expression in comparison to wild type. This affirms the conclusion  that the two genes work almost entirely in concert in the media conditions tested. The expression levels of the two genes detected by BAGEL as having significantly different expression in the two mutants, ECM33 and YOL154w, are shown in Figure 7b. The small magnitude of the difference observed in ECM33 between the two strains may indicate that this highly statistically significant difference is biologically irrelevant, but certainly further investigation of the significantly different gene-expression levels observed in YOL154w is warranted.
Note that although there is esthetic appeal to having the measurements reported on an absolute basis of transcripts per cell, little additional information is obtained by this transformation for most experimental questions, as the result is simply a rescaling of the y-axis for each gene. Also, whereas the statistical significance of BAGEL results is robust to considerable deviation from general linearity of cDNA microarray results, the quantitative result using SAGE counts as a basis depends critically upon the linearity of cDNA microarray measurements. Lastly, it is unclear how to scale meagerly expressed genes that had a count of zero in a SAGE analysis. PHO4 is such a gene; in Figure 8b, we have arbitrarily charted a frequency that would be appropriate if it were present at approximately one order of magnitude below the SAGE experiment's detection threshold of 0.1 molecules per cell. Neither PHO2 nor PHO4 shows significant evidence of differential expression in snf2Δ/swi1Δ mutants. Consideration of the raw ratio data led Sudarsanam et al.  to thus conclude that PHO5 gene expression is directly controlled by Snf2p/Swi1p. PHO3, a 'constitutive' acid phosphatase, shows the same relative abundance across snf2Δ/swi1Δ mutants as does PHO5 (data not shown). The 87% identity of the nucleotide sequence of these genes means that cross-hybridization may confound inference about their regulation. However, unlike PHO5, there is currently no evidence that PHO3 is regulated by PHO2 or PHO4. PHO5 may share a direct Snf2p/Swi1p regulatory mechanism with PHO3.
Analysis presented here supports the result that SWI1 and SNF2 work almost entirely in concert within the cell in rich medium . As Sudarsanam et al. noted, however, the proteins may not always be produced together in all conditions. Moreover, it should be noted that their experimental design has the least power to detect these specific differences. This is because all comparisons of expression level between the mutant strains are transitive comparisons, which inherit the variance associated with the intermediary wild-type expression level as well as the variance associated with the expression levels of the two mutant strains. The result is that on a log scale the credible intervals of gene-expression level are broader for the two mutants than for the wild type. This outcome is a recurrent problem with a repeated reference-sample experimental design: one learns the most information about the reference sample, which is frequently arbitrary and not necessarily of interest. An ideal addition to the experimental design in Figure 1b would be several direct comparisons of the mutant strains. Then, both transitive and direct information would contribute to the statistical power of a BAGEL analysis for any comparison. It is also clear that an increase in number of comparisons would yield more power to detect differentially expressed genes, and would indeed find more of them. Every single gene that was detected as significantly differentially expressed in only one of the two mutants compared to wild type in the rich medium was significantly different in expression level only in the swi1Δ mutant. The swi1Δ mutant comparison to wild type had one more replicate than did the comparison of the snf2Δ mutant (Figure 1b).
Lyons et al. examined wild-type and zap1 mutant strains of yeast growing in cultures containing three different concentrations of zinc (, and Figure 1c). Zap1p is a transcriptional activator that appears to regulate transcription of the zinc-uptake system genes in response to zinc . Using the nonoverlapping 95% credible interval criterion, a BAGEL analysis on this data reveals 469 genes significantly more abundantly expressed in cells grown in zinc-deficient medium compared to cells grown in zinc-supplemented medium, and reveals 261 genes significantly less abundant in the same comparison. This is a total of about 10% of the genome, and is two thirds of the number found by use of an averaged twofold criterion. A considerable number of the genes viewed as abundant by a twofold criterion, then, are not significantly different by the credible interval criterion (Table 1, Figure 6c, lower left and lower right quadrants). Moreover, 42 of the genes significantly different by the credible-interval criterion are significantly different at a ratio of below twofold (Figure 6c, densely packed between the dashed lines delimiting a twofold change). As one might generally expect, making ZAP1 nonfunctional in a zinc-deficient environment creates a similar relative effect on most significantly differentially expressed genes to that created by providing zinc to a zinc-deficient wild-type strain (Figure 6d).
A common technique to discover candidate genes of interest in microarray studies is to pick out overlapping sets of genes expressed at a ratio greater than some cutoff from replicate and/or different microarray experiments (for example, Figure 1 of , Figure 1 of ). BAGEL supplants any need to examine overlapping gene sets in replicate experiments. Moreover, it provides a statistically rigorous method for comparison of multiple different experiments. The equivalent of an overlap in two lists of highly expressed genes is that neither of the credible intervals for gene-expression level in a given gene in two experimental conditions overlaps the reference condition.
For an analysis such as this one, the statistical power could be improved if expression nodes were compared to more than two neighbors. Specifically, direct comparisons between wild type in 3 mM Zn and zap1 in 3 mM Zn would complete a circle of comparisons, increasing transitive information on all expression nodes. Cross-circle comparisons would also contribute considerable power. Generally, eliminating 'ends' of chains of comparisons should be a goal of any cDNA microarray experimental design.
Small datasets such as the ethanol dataset are very good candidates for use of a more informative prior distribution to keep variances within a reasonable range and yield better results. Analysis of larger datasets indicates that true variances for microarray data within this model are not larger than one. The impact of using less-vague priors, especially for the variance, when there are few comparisons, is under investigation by a number of researchers. A hierarchical Bayesian model  has been used to analyze ratio data and provide 95% confidence intervals for the log ratio of gene expression from reference to control. This method assumes normal distributions of the log ratios rather than ratios of normal distributions. It has a hierarchical structure that allocates error variance among microarrays and experiments. The authors suggest use of calibration data, or, alternatively, empirical evaluation of the distribution of variances across all other genes in order to construct a prior. A subset of genes of nearly equal intensity can be used to form a prior for variances . This prior was used by the authors to input reliable variances in t-tests of significance. Promising advances have been made on Bayesian methods for correcting misleading fold-change measurements made from low-intensity spots , using a gamma rather than normal model of ratio results.
All these methods have considerable potential to be incorporated as priors into a framework such as that presented here, so that the prior may be applied to multiple samples from different genotypes, environments or developmental states. Priors such as those above should result in smaller credible intervals and detection of increasingly significant differences because they curtail the exploration of unrealistically high variances that small datasets have too few observations to rule out 'on their own'. Continued work in this area, using an increased amount of non-ratio data provided from scanned microarrays, should be very fruitful . Furthermore, posterior distributions from such analyses of gene-expression level have subsequent use in Bayesian methodologies for clustering  and tumor identification .
In summary, the model-based approach we have implemented can accommodate complex and unbalanced experimental designs. Some research will continue to be carried out comparing just two samples multiple times. However, complex designs will increase in popularity as investigators explore multiple genotypes, environments and developmental states within a single research project . The utility of this approach in determining levels of gene expression may be maximized if these designs incorporate certain features.
First, compare samples of direct interest directly. When interested in the differences between two samples, compare them to each other rather than to an arbitrary reference sample [7,8]. Whenever possible, study a few expression nodes thoroughly, rather than many superficially.
Second, replicate each comparison at least once . Whether this is done directly by incorporation of dye during reverse transcription, or, preferably, by labeling incorporated amino-allyl-dUTP, reverse the dye labeling to ameliorate any dye effect thereof [9,10,18].
Third, eliminate 'ends' of comparison chains by carrying out hybridizations comparing one end to another. This allows reconciliation of transitive data around a circle of comparisons. The more circles created, the more reconciliation occurs. The smaller the circumference of the circle created, the stronger the transitive power.
Fourth, connect nodes otherwise distantly located on a chain of comparisons with extra cross-comparisons. The number of 'extra' comparisons to make depends on what size of effect is of interest. The observation of a small but significant effect on key regulatory genes may be of greater biological interest than the same observation on a metabolic enzyme. The appropriate weighing of the cost of additional comparisons against the greater precision of measurement depends critically upon the question being asked.
For time-course experiments or any other experiment with an explicit ordered x-axis these guidelines may still be followed, as long as replicate comparisons are made among nodes. Inferred estimates at each node are assessed independently of location along the x-axis, so that regressions across them are valid. Ultimately, experimental design may be subject to limitation owing to lack of resources or experimental failure. Fortunately, within a framework such as that developed here, missing spots or missing comparisons do not require any special consideration or any change in methodology. Credible intervals acquired for less well determined genes or less well determined expression nodes are correspondingly larger. This quantitative information on gene-expression levels tendered by a thorough analysis of microarray results should be carefully considered in assessments of the biological effects of genetic or environmental differences upon cellular state.
Materials and methods
For the ethanol-shock dataset , raw data from GenePix files was processed as follows. Any spot was excluded from analysis if both the Cy3 and Cy5 fluorescence signals were within two standard deviations of the distribution of intensities of the background pixels for that spot. These low-intensity spots are those most aberrant in fold-change and are those for which the magnitude is adjusted most by the model of Newton et al. . Expression values were normalized by linear scaling of the Cy5 values so that the mean Cy5 and Cy3 values of nonexcluded spots were equal. Two of three experiments thereby achieved a linear log-log intensity plot for included spots, with slope approximately 1. The third was linearized by exponentiation of the Cy3 channel to 0.8, before normalization of the means.
For other experiments, ratio and spot pass criteria were used as reported in the papers. In the dataset released by Sudarsanam et al. , one of the three reported microarray hybridizations between wild-type and snf2 nodes was excluded from analysis because it had an anomalous global mean log ratio of -0.26, whereas for all others that value was very nearly zero. For the zinc-regulation dataset , no pass criterion had been used to ensure each spot on each microarray carried considerably more signal than noise. The data appeared to be of high quality. However, in a few cases, one or two misleading data points from low-intensity spots may have led to especially high gene-by-gene error variance estimates and thus concealed otherwise significant differences.
The best normalization method and spot pass criteria are highly dependent on cDNA microarray protocol, methodology, experimental experience and analytical resources. As long as normalization method and spot pass criteria are applied uniformly within a dataset, the resulting ratios should be appropriate for analysis by the model described here.
In microarray experiments, the original idea was that, with current technology, spots on a cDNA microarray had a number of confounding pseudolinear terms - whose variation from experiment to experiment could be minimized but not eliminated - which contributed to the intensity measurements observed when a hybridization was scanned. Model parameters under this scheme differ from those used for high-density oligonucleotide microarrays . These terms included the density and size of the cDNA deposition, the correspondingly larger or smaller amount of labeled mRNA hybridized to the microarray, the hybridization conditions and the sequence of the gene . With these assumptions, the post-normalization intensity in one fluorescence channel at a reporter spot may be modeled linearly as
When comparing just two samples, ratio measurements are nearly as good as absolute data. However, when more than one genotype or environmental condition or cell developmental stage is examined, ratio measurements rapidly become cumbersome because comparing across numerous states requires a common unit of measurement. Therefore it is of interest to use these ratio measurements within a statistical model to estimate gene-expression levels in a common (if arbitrary) unit, and also to assess the significance of such a difference.
Consistent with the original interpretation of cDNA microarray data that privileged ratio over absolute quantification , the observed ratios of intensities, y ij , maybe modeled as
This can be true even if the distribution of intensities across spots and arrays is decidedly not Gaussian, because confounding factors which vary linearly or multiplicatively or even interactively across spots or arrays are commonly presumed not to be different between two labeled samples hybridized to a single spot . Note that samples may, under this formulation, have different variances as well as different means.
The ratio of standard normal distributions is a Cauchy distribution, which has the unpleasant property that it has no moments. The ratio of nonstandard normal distributions is not much better. Fortunately, the infinite tails of the normal distributions that result in this property are not generally observed in real data; in fact, a model that allows negative gene expression levels is not valid. The joint normal distribution may be truncated at a considerable distance from its peak, along an elliptical probability contour within the positive quadrant , yielding a ratio z ij distributed approximately as
Let us consider how this result may be used. An ideal statistical framework for the analysis of microarray ratio data could be used to analyze microarray data of any experimental design including any number of treatments, genotypes, or developmental states; would be highly robust to missing data; and would yield estimates of the magnitude of expression differences and measures of statistical significance across all treatments, genotypes, and developmental states.
The number of expression nodes, n, is equal to the number of permutations of strain, treatment, and developmental state that are examined. Unless informative prior information about expression levels or error variance is used, the following (minimal) requirements must be met. First, every node of interest must be present in at least one comparison. Second, every node of interest must be connected to every other node of interest by an unbroken chain of comparisons. And third, there must be as many comparisons as there are mean and variance parameters to be estimated.
If separate error variances for each sample are to be estimated as well as means, the last requirement indicates that there must be at least 2n - 1 measurements when each expression node is assumed to have an independent variance, and there must be at least n measurements when each expression node is assumed to have the same error variance. A few measurements beyond the minimum contribute greatly to the power to detect differences in gene expression and to the ease with which significance of results is ascertained within the Bayesian framework. Figure 1 shows the comparison structure of experiments examined in this paper. The three-dimensional matrix of ratio results from these comparisons, Z, may be constructed, with dimensions i denoting the sample labeled with one fluorophore, j denoting the sample labeled with another, and k denoting the replicate ordinate of that particular comparison. Then, for any continuous structure of comparisons among the nodes of interest, the likelihood density for the parameters μ l and σ l 2, 1 ≤ l ≤ n, is, by Bayes' rule,
where g( i , i 2, j , j 2) is the prior distribution of the parameters, and where the probability f(z ijk ) of empty elements in the data matrix Z is evaluated as 1.
Appropriate informative priors for the variance of microarray data are currently under investigation by a number of groups [34,35,36]. An informative prior must be clearly justified in order to prevent inappropriate conclusions of statistical significance. In this paper, a noninformative prior distribution, uniform across positive real numbers, has been used for both the expression levels and for their variance. In theory, we use a uniform prior for the variance, bounded from 0 to 100. In practice, the upper limit, beyond 20 or so, makes no difference, as such high values are very improbable and are never sampled by the chain in the datasets analyzed here. This uniform prior gives the microarray data itself the greatest impact on the inferred means and variances, and implies that credible intervals constructed are close to those that would be found by maximum likelihood if analytical integration of the full multidimensional parameter surface were feasible. A frequently used 'noninformative' prior such as -1  is in this case not desirable, because, in practice, the most likely variances observed are so small that this prior has a considerable impact on the posterior distribution.
Fortunately, we may use the constant denominator of the Bayes' rule formulation (Equation 5) to assert that
Our proposed values are constructed in two separate steps. First, two of the n gene-expression level parameters from are chosen at random. A step size is drawn at random from a triangular distribution centered at zero with range [- μ,+ μ]. The first of the two chosen parameters is incremented by the chosen step size, and the second is decremented by the same quantity, so that is maintained, where the apostrophe indicates a proposed parameter value. In the next iteration, the variance parameters, l 2 are incremented by an amount drawn at random from a triangular distribution with range to form . Because these operations have probabilities of transitions from the current state to the proposed state equal to the probabilities that the converse transitions would have, this proposal scheme satisfies Hastings  and can be implemented in the Metropolis  algorithm. Thus the conjecture is accepted for the next state of the Markov chain if
Otherwise the original state is retained for the next iteration of the Markov chain.
These steps are repeated over many generations in order to 'burn in' the chain, so that it converges from the initial parameter settings to a stationary distribution. Subsequently, states are sampled from the chain at regular intervals to build a posterior distribution for each parameter, integrated across the probable states of all other parameters. An easy-to-use stand-alone software program entitled BAGEL, which implements this Bayesian analysis of gene expression levels on MacOS or Windows platforms, is available on the web with an online manual . It accepts tab-delimited text files of ratio data as input.
Output from the BAGEL software is in the form of a tab-delimited text file with one header row. Each row thereafter displays the results for a single gene, including columns with the estimate of expression level for each sample (the median of the posterior distribution); the additions and subtractions to make 95% upper and lower bounds on that estimate; the stationary acceptance rates for the Monte Carlo steps for that gene; and a column that reads 'TRUE' when those rates are acceptable. Further columns contain the posterior probabilities for whether that gene's expression level in each expression node is greater, or lesser, than that gene's expression level in each other expression node.
Additional data files
Estimates and credible intervals for expression levels of all genes assayed in these experiments in all conditions are available with the online version of this manuscript as tab-delimited text output files, with columns of data as described in the methods section. The files are entitled 'EtOH.txt', 'SwiSnfMin.txt' and 'SwiSnfRich.txt', and 'Zinc.txt'.
We thank Colin Meiklejohn and Jose Ranz for beta-tasting our BAGELs, and for helpful comments on the manuscript. Thanks also to Rob Kulathinal, Takeo Kasuga, and two anonymous reviewers for helpful comments. We thank Mark Reimers, Erin Conlon, and the Hartl, Wakeley, and Liu labs for helpful discussion. NIH grants 60035 and HG02150 to D.L.H. are gratefully acknowledged. J.P.T. was supported during this work by a Harvard Merit Fellowship.
- Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA. 1998, 95: 14863-14868. 10.1073/pnas.95.25.14863.PubMedPubMed CentralView ArticleGoogle Scholar
- Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, Lander ES, Golub TR: Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc Natl Acad Sci USA. 1999, 96: 2907-2912. 10.1073/pnas.96.6.2907.PubMedPubMed CentralView ArticleGoogle Scholar
- Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM: Systematic determination of genetic network architecture. Nat Genet. 1999, 22: 281-285. 10.1038/10343.PubMedView ArticleGoogle Scholar
- DeRisi JL, Iyer VR, Brown PO: Exploring the metabolic and genetic control of gene expression on a genomic scale. Science. 1997, 278: 680-686. 10.1126/science.278.5338.680.PubMedView ArticleGoogle Scholar
- Chen Y, Dougherty ER, Bittner ML: Ratio-based decisions and the quantitative analysis of cDNA microarray images. J Biomed Optics. 1997, 2: 364-374. 10.1117/1.429838.View ArticleGoogle Scholar
- Ideker T, Thorsson V, Siegel A, Hood L: Testing for differentially-expressed genes by maximum-likelihood analysis of microarray data. J Comput Biol. 2000, 7: 805-817. 10.1089/10665270050514945.PubMedView ArticleGoogle Scholar
- Kerr MK, Martin M, Churchill GA: Analysis of variance for gene expression microarray data. J Comput Biol. 2000, 7: 819-837. 10.1089/10665270050514954.PubMedView ArticleGoogle Scholar
- Wolfinger RD, Gibson G, Wolfinger ED, Bennett L, Hamadeh H, Bushel P, Afshari C, Paules RS: Assessing gene significance from cDNA microarray expression data via mixed models. J Comput Biol. 2001, 8: 625-637. 10.1089/106652701753307520.PubMedView ArticleGoogle Scholar
- Jin W, Riley RM, Wolfinger RD, White KP, Passador-Gurgel G, Gibson G: The contributions of sex, genotype and age to transcriptional variance in Drosophila melanogaster. Nat Genet. 2001, 29: 389-395. 10.1038/ng766.PubMedView ArticleGoogle Scholar
- Kerr MK, Churchill GA: Statistical design and the analysis of gene expression microarray data. Genet Res. 2001, 77: 123-128. 10.1017/S0016672301005055.PubMedGoogle Scholar
- Alexandre H, Ansanay-Galeote V, Dequin S, Blondin B: Global gene expression during short-term ethanol stress in Saccharomyces cerevisiae. FEBS Lett. 2001, 498: 98-103. 10.1016/S0014-5793(01)02503-0.PubMedView ArticleGoogle Scholar
- Sudarsanam P, Iyer VR, Brown PO, Winston F: Whole-genome expression analysis of snf/swi mutants of Saccharomyces cerevisiae. Proc Natl Acad Sci USA. 2000, 97: 3364-3369. 10.1073/pnas.050407197.PubMedPubMed CentralView ArticleGoogle Scholar
- Lyons TJ, Gasch AP, Gaither LA, Botstein D, Brown P, Eide D: Genome-wide characterization of the Zap1p zinc-responsive regulon in yeast. Proc Natl Acad Sci USA. 2000, 97: 7957-7962. 10.1073/pnas.97.14.7957.PubMedPubMed CentralView ArticleGoogle Scholar
- Gasch AP, Spellman PT, Kao CM, Carmel-Harel O, Eisen MB, Storz G, Botstein D, Brown PO: Genomic expression programs in the response of yeast cells to environmental changes. Mol Biol Cell. 2000, 11: 4241-4257.PubMedPubMed CentralView ArticleGoogle Scholar
- Shamji A, Kuruvilla FG, Schreiber S: Partitioning the transcriptional program induced by rapamycin among the effectors of the Tor proteins. Curr Biol. 2000, 10: 1574-1580. 10.1016/S0960-9822(00)00866-6.PubMedView ArticleGoogle Scholar
- Furlong EE, Andersen EC, Null B, White KP, Scott MP: Patterns of gene expression during Drosophila mesoderm development. Science. 2001, 293: 1629-1633. 10.1126/science.1062660.PubMedView ArticleGoogle Scholar
- Ramdas L, Coombes KR, Baggerly K, Abruzzo L, Highsmith WE, Krogmann T, Hamilton S, Zhang W: Sources of nonlinearity in cDNA microarray expression measurements. Genome Biol. 2001, 2: research0047.1-0047.7. 10.1186/gb-2001-2-11-research0047.View ArticleGoogle Scholar
- Yang YH, Speed T: Design issues for cDNA microarray experiments. Nat Rev Genet. 2002, 3: 579-588.PubMedGoogle Scholar
- Treger JM, Schmitt AP, Simon JR, McEntee K: Transcriptional factor mutations reveal regulatory complexities of heat shock and newly identified stress genes in Saccharomyces cerevisiae. J Biol Chem. 1998, 273: 26875-26879. 10.1074/jbc.273.41.26875.PubMedView ArticleGoogle Scholar
- Mansure JJC, Panek AD, Crowe LM, Crowe JH: Trehalose inhibits ethanol effects on intact yeast cells and liposomes. Biochim Biophys Acta. 1994, 1191: 309-316. 10.1016/0005-2736(94)90181-3.PubMedView ArticleGoogle Scholar
- Mansure J, Souza R, Panek A: Trehalose metabolism in Saccharomyces cerevisiae. Biotechnol Lett. 1997, 19: 1201-1203. 10.1023/A:1018433820097.View ArticleGoogle Scholar
- Thevelein JM: Regulation of trehalose mobilization in fungi. Microbiol Rev. 1984, 48: 42-59.PubMedPubMed CentralGoogle Scholar
- Türkel S, Liao X-B, Farabaugh PJ: GCR1-dependent transcriptional activation of yeast retrotransposon Ty2-917. Yeast. 1997, 13: 917-930. 10.1002/(SICI)1097-0061(199708)13:10<917::AID-YEA148>3.3.CO;2-7.PubMedView ArticleGoogle Scholar
- Herrero P, Flores L, de la Cera T, Moreno F: Functional characterization of transcriptional regulatory elements in the upstream region of the yeast GLK1 gene. Biochem J. 1999, 343: 319-325. 10.1042/0264-6021:3430319.PubMedPubMed CentralView ArticleGoogle Scholar
- Gibson G: Epistasis and pleiotropy as natural properties of transcriptional regulation. Theor Popul Biol. 1996, 49: 58-89. 10.1006/tpbi.1996.0003.PubMedView ArticleGoogle Scholar
- Gibson G, van Helden S: Is function of the Drosophila homeotic gene Ultrabithorax canalized?. Genetics. 1997, 147: 1155-1168.PubMedPubMed CentralGoogle Scholar
- Doebley J, Lukens L: Transcriptional regulators and the evolution of plant form. Plant Cell. 1998, 10: 1075-1082. 10.1105/tpc.10.7.1075.PubMedPubMed CentralView ArticleGoogle Scholar
- Hartl DL, Dykhuisen DE, Dean AM: The limits of adaptation: the evolution of selective neutrality. Genetics. 1985, 111: 655-674.PubMedPubMed CentralGoogle Scholar
- Velculescu VE, Zhang L, Vogelstein B, Kinzler K: Serial analysis of gene expression. Science. 1995, 270: 484-487.PubMedView ArticleGoogle Scholar
- Velculescu VE, Zhang L, Zhou W, Vogelstein J, Basrai M, Bassett DE, Hieter P, Vogelstein B, Kinzler KW: Characterization of the yeast transcriptome. Cell. 1997, 88: 243-251.PubMedView ArticleGoogle Scholar
- Stollberg J, Urschitz J, Urban Z, Boyd CD: A quantitative evaluation of SAGE. Genome Res. 2000, 10: 1241-1248. 10.1101/gr.10.8.1241.PubMedPubMed CentralView ArticleGoogle Scholar
- Zhao H, Eide DJ: Zap1p, a metalloregulatory protein involved in zinc-responsive transcriptional regulation in Saccharomyces cerevisiae. Mol Cell Biol. 1997, 17: 5044-5052.PubMedPubMed CentralView ArticleGoogle Scholar
- Bernstein BE, Tong JK, Schreiber SL: Genomewide studies of histone deacetylase function in yeast. Proc Natl Acad Sci USA. 2000, 97: 13708-13713. 10.1073/pnas.250477697.PubMedPubMed CentralView ArticleGoogle Scholar
- Tseng GC, Oh M-K, Rohlin L, Liao JC, Wong WH: Issues in cDNA microarray analysis: quality filtering, channel normalization, models of variations and assessment of gene effects. Nucleic Acids Res. 2001, 29: 2549-2557. 10.1093/nar/29.12.2549.PubMedPubMed CentralView ArticleGoogle Scholar
- Baldi P, Long A: A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics. 2001, 17: 509-519. 10.1093/bioinformatics/17.6.509.PubMedView ArticleGoogle Scholar
- Newton MA, Kendziorski CM, Richmond CS, Blattner FR, Tsui KW: On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data. J Comput Biol. 2001, 8: 37-52. 10.1089/106652701300099074.PubMedView ArticleGoogle Scholar
- Barash Y, Friedman N: Context-specific Bayesian clustering for gene expression data. J Comput Biol. 2002, 9: 169-191. 10.1089/10665270252935403.PubMedView ArticleGoogle Scholar
- Ibrahim J, Chen M, Gray R: Bayesian models for gene expression with DNA microarray data. J Am Stat Assoc. 2002, 97: 88-89. 10.1198/016214502753479257.View ArticleGoogle Scholar
- Lee M-LT, Kuo FC, Whitmore GA, Sklar J: Importance of replication in microarray gene expression studies: statistical methods and evidence from repetitive cDNA hybridizations. Proc Natl Acad Sci USA. 2000, 97: 9834-9839. 10.1073/pnas.97.18.9834.PubMedPubMed CentralView ArticleGoogle Scholar
- Theilhaber J, Bushnell S, Jackson A, Fuchs R: Bayesian estimation of fold-changes in the analysis of gene expression: the PFOLD algorithm. J Comput Biol. 2001, 8: 585-614. 10.1089/106652701753307502.PubMedView ArticleGoogle Scholar
- Eisen MB, Brown PO: DNA arrays for analysis of gene expression. Methods Enzymol. 1999, 303: 179-205.PubMedView ArticleGoogle Scholar
- Yang YH, Dudoit S, Luu P, Lin DM, Peng V, Ngai J, Speed T: Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res. 2002, 30: e15-10.1093/nar/30.4.e15.PubMedPubMed CentralView ArticleGoogle Scholar
- Quackenbush J: Compuatational analysis of microarray data. Nat Rev Genet. 2001, 2: 418-427. 10.1038/35076576.PubMedView ArticleGoogle Scholar
- Fieller EC: The distribution of the index in a normal bivariate population. Biometrika. 1932, 24: 428-440.View ArticleGoogle Scholar
- Gelman A, Carlin JB, Stern HS, Rubin DB: Bayesian Data Analysis. 1995, Boca Raton: Chapman & Hall/CRCGoogle Scholar
- Hastings WK: Monte Carlo sampling methods using Markov chains and their applications. Biometrika. 1970, 57: 97-109.View ArticleGoogle Scholar
- Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E: Equation of state calculations by fast computing machines. J Chem Phys. 1953, 21: 1087-1092.View ArticleGoogle Scholar
- Hartl Lab Publications - Software. [http://www.oeb.harvard.edu/hartl/lab/publications/software.html]
- Gelman A, Roberts GO, Gilks WR: Efficient Metropolis jumping rules. In Bayesian Statistics 5. Edited by: Bernardo JM, Berger JO, Dawid AP, Smith AFM. 1996, Oxford: Oxford University Press, 599-607.Google Scholar