A tool for comparing different statistical methods on identifying differentially expressed genes
© BioMed Central Ltd 2004
Received: 7 December 2004
Published: 8 December 2004
Many statistical methods have been developed for two-group comparison microarray experiments. In practice, whether a substantial number of genes are selected often depends on which method was used. Practical guidance on applying these methods is therefore required. We developed a bootstrap-based procedure and a criterion for viewing and quantifying differences between method-dependent selections. We applied this procedure to three datasets covering a range of possible sample sizes in order to compare three well-known methods: t-test, LPE and SAM.
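As a rough illustration of what a method-dependent selection looks like, the sketch below counts how many genes two methods select in common and which genes all methods agree on. It is not the procedure developed in this article. The function name `selection_overlap` and the gene identifiers are hypothetical and do not come from the datasets analyzed here.

```python
from itertools import combinations


def selection_overlap(selections):
    """Return pairwise overlap counts and the genes selected by every method.

    `selections` maps a method name to the set of gene identifiers it selected.
    """
    pairwise = {
        (a, b): len(selections[a] & selections[b])
        for a, b in combinations(sorted(selections), 2)
    }
    common = set.intersection(*selections.values())
    return pairwise, common


# Hypothetical gene identifiers, for illustration only.
selections = {
    "t-test": {"g1", "g2", "g3", "g5"},
    "LPE": {"g2", "g3", "g4"},
    "SAM": {"g2", "g3", "g5", "g6"},
}
pairwise, common = selection_overlap(selections)
for pair, count in pairwise.items():
    print(pair, count)
print("selected by all methods:", sorted(common))
```

Genes that appear in one selection but not in the others are the ones for which the choice of method matters.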
Our visualization method and the associated variability conformation rate (VCR) criterion show that the standard t-test is appropriate for large sample sizes, which allow accurate variance estimates. LPE borrows strength from neighboring genes to estimate variances and is therefore more appropriate for small sample sizes, provided that gene variances are similar at similar gene intensity levels. SAM combines two advantages: it considers gene-specific variance, like the t-test, and it adjusts for multiple testing through a permutation-based false discovery rate. However, for small sample sizes and when numerous genes are expressed, the distribution based on permuted datasets may not approximate the null distribution well, resulting in an inaccurate false discovery rate. Moreover, genes with low variances may be filtered out because of the fudge factor.
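The effect of the fudge factor on low-variance genes can be seen in a minimal sketch. It assumes the usual form of the SAM statistic: the difference in group means divided by the pooled standard error plus a constant s0. The function names, the toy intensities and the value s0 = 0.1 are assumptions chosen for illustration. SAM itself chooses its fudge factor from the data, and the permutation-based false discovery rate is not shown.

```python
import math
from statistics import mean


def pooled_se(x, y):
    """Pooled standard error of the difference in means (equal-variance form)."""
    n1, n2 = len(x), len(y)
    mx, my = mean(x), mean(y)
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    return math.sqrt((1 / n1 + 1 / n2) * ss / (n1 + n2 - 2))


def t_stat(x, y):
    """Two-sample t statistic with pooled variance."""
    return (mean(x) - mean(y)) / pooled_se(x, y)


def sam_like_stat(x, y, s0):
    """SAM-style statistic: s0 is added to the denominator as a fudge factor."""
    return (mean(x) - mean(y)) / (pooled_se(x, y) + s0)


# Toy intensities (hypothetical): a small shift with very low variance,
# and a large shift with high variance.
low_var = ([5.00, 5.01, 4.99], [5.10, 5.11, 5.09])
high_var = ([4.0, 5.0, 6.0], [7.0, 8.0, 9.0])

s0 = 0.1  # illustrative value only
t_low, t_high = t_stat(*low_var), t_stat(*high_var)
d_low, d_high = sam_like_stat(*low_var, s0), sam_like_stat(*high_var, s0)
print(f"t:        low-variance {t_low:.2f}, high-variance {t_high:.2f}")
print(f"SAM-like: low-variance {d_low:.2f}, high-variance {d_high:.2f}")
```

With these numbers the low-variance gene has the larger absolute t statistic but the smaller absolute SAM-like statistic, which is how such a gene can drop out of a SAM selection.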
We proposed using VCR to assess the statistical methods available for analyzing microarray data, and developed a bootstrap method, on which our criterion is based, to estimate the two-dimensional distribution of treated versus control gene intensity levels under the null hypothesis of no difference between the treatment and control groups. Biological evaluation of the genes selected by each method confirmed that this criterion is appropriate for helping to identify the most suitable method.
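One generic way to simulate treated versus control intensity pairs under the null hypothesis is sketched below. This is an assumption-laden illustration and not the bootstrap procedure developed in this article: it simply pools each gene's replicates, resamples pseudo-control and pseudo-treated groups with replacement, and records the pair of group means. The function name, parameters and toy data are hypothetical.

```python
import random
from statistics import mean


def null_bootstrap_pairs(control, treated, n_boot=200, seed=0):
    """Simulate (control mean, treated mean) pairs under 'no group difference'.

    `control` and `treated` are lists of per-gene replicate lists.
    Assumption: pooling a gene's replicates and resampling both groups from
    that pool mimics the null hypothesis for that gene.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_boot):
        for c_row, t_row in zip(control, treated):
            pooled = list(c_row) + list(t_row)
            pseudo_c = rng.choices(pooled, k=len(c_row))  # with replacement
            pseudo_t = rng.choices(pooled, k=len(t_row))
            pairs.append((mean(pseudo_c), mean(pseudo_t)))
    return pairs


# Toy log-intensities for two genes, three replicates per group (hypothetical).
control = [[7.1, 7.3, 7.0], [10.2, 10.0, 10.4]]
treated = [[7.2, 7.4, 6.9], [11.5, 11.8, 11.6]]

pairs = null_bootstrap_pairs(control, treated, n_boot=100, seed=1)
print(len(pairs), "simulated (control, treated) pairs")
```

Plotting the simulated pairs as a scatter of treated against control means gives a two-dimensional picture of the spread expected under the null hypothesis, against which the observed gene means can be compared.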
Additional data files
The following additional data files are provided with this article: Additional data file 1, depicting a table showing the overlap among different methods for the yeast data; Additional data file 2, showing additional Figure 1; Additional data file 3, showing additional Figure 2; Additional data file 4, showing additional Figure 3; Additional data file 5, showing additional Figure 4.