Erratum to: Multiclass classification of microarray data with repeated measurements: application to cancer
- Ka Yee Yeung and
- Roger E Bumgarner
Published: 3 January 2006
The original article was published in Genome Biology 2003 4:R83
Summary of prediction accuracy results
| Data | Parameters | EWUSC | USC | SC | Published results |
|---|---|---|---|---|---|
| NCI 60 data* | ρ0 | NA | 0.6 | 1.0 | NA |
| | Δ | NA | 0.6 | 0.9 | NA |
| | # relevant genes | NA | 2,116 (2,315) | 3,998 | 200 |
| | Prediction accuracy | NA | 72% | 72% | ~40–60% [4] |
| Multiple tumor data (estimated optimal parameters) † | ρ0 | 0.8 | 0.8 | 1.0 | NA |
| | Δ | 4.8 (5.6) | 4 (5.6) | 8.8 | NA |
| | # relevant genes | 241 (680) | 356 (735) | 3,902 | All genes |
| | Prediction accuracy | 93% | 82% (85%) | 63% (78%) | 78% [5] |
| Multiple tumor data (global optimal parameters) ‡ | ρ0 | 0.9 | 0.9 | 1.0 | NA |
| | Δ | 0 | 0 | 0.4 | NA |
| | # relevant genes | 1,622 (1,626) | 1,634 | 7,129 | All genes |
| | Prediction accuracy | 74% (78%) | 74% | 59% (74%) | 78% [5] |
| Breast cancer data | ρ0 | 0.6 (0.7) | 0.6 | 1.0 | NA |
| | Δ | 0.80 | 0.55 (1.15) | 0.5 (1.1) | NA |
| | # relevant genes | 189 (271) | 1,114 (82) | 3,193 (187) | 70 |
| | Prediction accuracy | 84% (89%) | 84% (79%) | 84% | 89% [6] |
A corrected figure showing the comparison of prediction accuracy of USC and SC on the NCI 60 data. The percentage of prediction accuracy is plotted against the number of relevant genes using the USC algorithm at ρ0 = 0.6 and the SC algorithm (USC at ρ0 = 1.0). The horizontal axis is shown on a log scale. Because no independent test set is available for these data, we repeatedly divided the samples in each class at random into roughly three parts, reserving one third of the samples as a test set. Thus the training set consists of 43 samples and the test set of 18 samples. The graph represents typical results over these multiple random runs.
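The per-class random split described in the caption above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `stratified_split`, the seed handling, and the rule of rounding each class's test share to the nearest integer are all assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=1 / 3, seed=0):
    """Hold out roughly `test_fraction` of the samples of each class.

    Returns (train_idx, test_idx) as sorted lists of sample indices.
    The rounding rule for each class's test share is an assumption.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)
```

Calling the function with different `seed` values gives the multiple random runs over which typical results would be reported.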
A corrected figure showing the prediction accuracy on the multiple tumor data using the EWUSC algorithm over the range of Δ from 0 to 20. The percentage of classification errors is plotted against Δ on (a) the full training set (96 samples) and (c) the test set (27 samples). In (b) the average percentage of errors is plotted against Δ on the cross-validation data over five random runs of fourfold cross-validation. In (d), the number of relevant genes is plotted against Δ. Different colors are used to specify different correlation thresholds (ρ0 = 0.6, 0.7, 0.8, 0.9 or 1). Optimal parameters are inferred from the cross-validation data in (b).
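The cross-validation procedure behind panel (b), five random runs of fourfold cross-validation followed by a choice of parameters from the averaged errors, might be organized roughly as below. This is a hedged sketch only: `error_fn` is a hypothetical callback that trains a classifier at one (ρ0, Δ) setting and returns the number of misclassified held-out samples, the folds are plain (unstratified) random folds, and ties between parameter settings are broken arbitrarily by dictionary order. None of these details are taken from the original software.

```python
import random
from statistics import mean


def kfold_indices(n_samples, k, rng):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def repeated_cv_error(n_samples, error_fn, k=4, runs=5, seed=0):
    """Average percentage of held-out errors over `runs` random k-fold splits.

    error_fn(train_idx, test_idx) is a hypothetical callback returning the
    number of misclassified samples in test_idx.
    """
    rng = random.Random(seed)
    per_run = []
    for _ in range(runs):
        folds = kfold_indices(n_samples, k, rng)
        wrong = 0
        for f in range(k):
            test_idx = folds[f]
            train_idx = [i for g in range(k) if g != f for i in folds[g]]
            wrong += error_fn(train_idx, test_idx)
        # Every sample is held out exactly once per run.
        per_run.append(100.0 * wrong / n_samples)
    return mean(per_run)


def pick_parameters(avg_error_by_params):
    """Return the (rho0, delta) key with the lowest average CV error."""
    return min(avg_error_by_params, key=avg_error_by_params.get)
```

In use, `repeated_cv_error` would be evaluated once per (ρ0, Δ) pair on the training set, the results collected in a dictionary keyed by that pair, and the dictionary passed to `pick_parameters`.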
A corrected figure showing the comparison of prediction accuracy of EWUSC (ρ0 = 0.8), USC (ρ0 = 0.8), SVM and SC algorithms on the multiple tumor data. The horizontal axis shows the total number of distinct genes selected over all binary SVM classifiers on a log scale. Some results are not available on the full range of the total number of genes. For example, the maximum numbers of selected genes for EWUSC and USC are roughly 1,000. The reported prediction accuracy is 78% [5] using all 16,000 available genes on the full data. The EWUSC algorithm achieves 85% prediction accuracy with only 77 genes. With 241 genes, EWUSC produces 93% prediction accuracy.
A corrected figure showing the comparison of prediction accuracy of EWUSC, USC and SC on the breast cancer data. The percentage of prediction accuracy is plotted against the number of relevant genes using the EWUSC algorithm at ρ0 = 0.6, the USC algorithm at ρ0 = 0.6 and the SC algorithm (USC at ρ0 = 1.0). Note that the horizontal axis is shown on a log scale.
The major conclusions and observations in the original manuscript [1] remain valid with the revised implementation. Our EWUSC and USC algorithms represent improvements over the SC algorithm. In general, fewer genes are required to produce comparable prediction accuracy. On the multiple tumor data, our EWUSC and USC algorithms produce higher prediction accuracy using fewer relevant genes compared to published results. The revised software implementation is available on our web site [3]. Note: the revised version (1.0) of the software was placed on the web site on May 9, 2005.
References
- Yeung KY, Bumgarner RE: Multiclass classification of microarray data with repeated measurements: application to cancer. Genome Biol. 2003, 4: R83. 10.1186/gb-2003-4-12-r83.
- Tibshirani R, Hastie T, Narasimhan B, Chu G: Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Natl Acad Sci USA. 2002, 99: 6567-6572. 10.1073/pnas.082099299.
- Supplementary Web Site: Multiclass classification of microarray data with repeated measurements: application to cancer. [http://www.expression.washington.edu/publications/kayee/shrunken_centroid]
- Dudoit S, Fridlyand J, Speed TP: Comparison of discrimination methods for the classification of tumors using gene expression data. J Am Stat Assoc. 2002, 97: 77-87. 10.1198/016214502753479248.
- Ramaswamy S, Tamayo P, Rifkin R, Mukherjee S, Yeang CH, Angelo M, Ladd C, Reich M, Latulippe E, Mesirov JP, et al: Multiclass cancer diagnosis using tumor gene expression signatures. Proc Natl Acad Sci USA. 2001, 98: 15149-15154. 10.1073/pnas.211566398.
- van't Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, Mao M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT, et al: Gene expression profiling predicts clinical outcome of breast cancer. Nature. 2002, 415: 530-536. 10.1038/415530a.