Erratum to: Multiclass classification of microarray data with repeated measurements: application to cancer

Yeung, Ka Yee; Bumgarner, Roger E

doi:10.1186/gb-2005-6-13-405

Erratum
Published: 03 January 2006

Ka Yee Yeung¹ & Roger E Bumgarner¹

Genome Biology volume 6, Article number: 405 (2006)

The Original Article was published on 24 November 2003

After the publication of this work [1], we discovered programming errors in our software implementation of the proposed error-weighted, uncorrelated shrunken centroid (EWUSC) algorithm and the uncorrelated shrunken centroid (USC) algorithm. We have corrected these errors, and the updated results are summarized in the revised Table 6.

Table 6 Summary of prediction accuracy results

On the NCI 60 data, both Figure 1 in [1] and the revised Figure 1 show that USC generally achieves higher prediction accuracy than the shrunken centroid algorithm (SC) [2] using the same number of relevant genes. With the revised software implementation, USC requires fewer genes to achieve 72% accuracy (2,116 instead of the 2,315 reported in [1]). The number of genes SC requires to achieve the same prediction accuracy is unchanged (3,998).
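To make the notion of "relevant genes" under SC [2] concrete, the following is a minimal NumPy sketch of the nearest shrunken centroid idea, in which a shrinkage threshold Δ (`delta`) soft-thresholds each gene's standardized class-versus-overall difference. It is not the authors' software: the function names `fit_shrunken_centroids` and `predict` are hypothetical, the standardization factor m_k = sqrt(1/n_k − 1/n) and the use of the median gene standard deviation as s₀ are assumptions, and a gene is counted as relevant here if its shrunken difference is nonzero for at least one class.

```python
import numpy as np


def fit_shrunken_centroids(X, y, delta):
    """Hypothetical sketch of a nearest-shrunken-centroid fit (after [2]).

    X: (n_samples, n_genes) expression matrix; y: (n_samples,) class labels.
    delta: shrinkage threshold applied to the standardized differences.
    """
    classes = np.unique(y)
    n = X.shape[0]
    overall = X.mean(axis=0)
    centroids = np.array([X[y == k].mean(axis=0) for k in classes])
    counts = np.array([np.sum(y == k) for k in classes])

    # Pooled within-class standard deviation s_i of each gene.
    resid = X - centroids[np.searchsorted(classes, y)]
    s = np.sqrt((resid ** 2).sum(axis=0) / (n - len(classes)))
    s0 = np.median(s)  # assumed fudge factor guarding against tiny variances
    m = np.sqrt(1.0 / counts - 1.0 / n)  # assumed choice of m_k

    # Standardized differences d_ik, soft-thresholded by delta.
    scale = m[:, None] * (s + s0)[None, :]
    d = (centroids - overall) / scale
    d_shrunk = np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)

    return {
        "classes": classes,
        "centroids": overall + scale * d_shrunk,  # shrunken class centroids
        "sd": s + s0,
        "log_prior": np.log(counts / n),
        # A gene is "relevant" if it survives shrinkage for at least one class.
        "relevant": np.any(d_shrunk != 0, axis=0),
    }


def predict(model, X):
    """Assign each sample to the class with the smallest discriminant score."""
    diff = (X[:, None, :] - model["centroids"][None, :, :]) / model["sd"]
    scores = (diff ** 2).sum(axis=2) - 2 * model["log_prior"]
    return model["classes"][np.argmin(scores, axis=1)]
```

In this sketch, raising `delta` zeroes out more standardized differences, so fewer genes remain relevant; `delta = 0` keeps essentially every gene. Genes that are not relevant share the same centroid across all classes and therefore do not affect which class wins.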

Figure 2 shows the results of applying EWUSC to the training set, the four-fold cross-validation data, and the test set of the multiple tumor data over a range of shrinkage thresholds (Δ) and correlation thresholds (ρ₀). The revised Figure 2 shows the same general trend as Figure 2 in [1]: the percentage of errors is reduced when ρ₀ < 1 over most values of Δ on the training set, the cross-validation data and the test set, and Figure 2d shows that the number of relevant genes is drastically reduced when genes with correlation above 0.9 are removed. The optimal shrinkage thresholds (Δ) determined from the cross-validation results have changed with the revised implementation: they are reduced to 4.8 for EWUSC and 4 for USC (see revised Table 6). The numbers of relevant genes selected by EWUSC and USC are reduced, and the resulting prediction accuracies for both USC and SC are also reduced in the revised results. When the global optimal parameters (Δ = 0) are used, EWUSC in the revised implementation selects slightly fewer genes (1,622 instead of 1,626) at the expense of slightly lower prediction accuracy (74% instead of 78%).

Figure 4 compares the prediction accuracy on the test set of the multiple tumor data using the EWUSC and USC algorithms at the estimated optimal correlation threshold (ρ₀ = 0.8), the SC algorithm, and the Support Vector Machine (SVM). The general observations reported in [1] still hold with the revised Figure 4. First, USC produces higher prediction accuracy than SC using the same number of relevant genes. Second, EWUSC generally produces higher prediction accuracy than USC using the same number of relevant genes. In fact, when the number of genes is small, EWUSC performs better than previously reported in [1].
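The correlation threshold ρ₀ can be pictured as a redundancy filter applied to the genes that survive shrinkage. The sketch below is a simplified, hypothetical illustration rather than the USC procedure from [1]: the function name `decorrelate_genes` is invented, genes are assumed to be visited in order of a supplied discriminative score, and a gene is assumed to be discarded when its Pearson correlation with an already kept gene exceeds ρ₀ (`rho0`). In this sketch, ρ₀ = 1 behaves as a no-filter baseline.

```python
import numpy as np


def decorrelate_genes(X, scores, rho0):
    """Hypothetical greedy correlation filter in the spirit of USC.

    X: (n_samples, n_genes) expression of genes that survived shrinkage.
    scores: (n_genes,) assumed ranking score, larger = more discriminative.
    rho0: a gene is dropped if its Pearson correlation with an already
          kept gene exceeds rho0 (assumed rule; [1] may differ).
    Returns the sorted column indices of the kept genes.
    """
    if rho0 >= 1.0:
        return np.arange(X.shape[1])  # rho0 = 1 disables the filter
    corr = np.corrcoef(X, rowvar=False)  # gene-by-gene correlation matrix
    kept = []
    for g in np.argsort(-scores):  # most discriminative genes first
        if all(corr[g, h] <= rho0 for h in kept):
            kept.append(g)
    return np.sort(np.array(kept, dtype=int))
```

For example, if one gene is a near copy of a higher-ranked gene, it is dropped for any `rho0` below its correlation with that gene, while an unrelated gene is kept; lowering `rho0` therefore shrinks the gene set, which is the qualitative effect described for Figure 2d.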

Figure 5 compares the prediction accuracy of EWUSC, USC, and SC on the breast cancer data. With the revised implementation, the optimal correlation threshold (ρ₀) changes from 0.7 in [1] to 0.6 (see revised Table 6). The observation reported in [1] that EWUSC produces higher prediction accuracy on the test set than USC and SC when the number of relevant genes is small still holds. With the revised implementation, USC and SC select substantially more relevant genes (see revised Table 6).
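The optimal values of Δ and ρ₀ reported above are chosen from cross-validation results. A generic k-fold grid search over both parameters might look like the following sketch. The function `select_parameters` and its `fit_predict(X_train, y_train, X_test, delta, rho0)` callback are hypothetical stand-ins for any classifier that takes these two parameters; the folds are simple random (not stratified), and breaking ties toward the larger Δ (fewer genes) is an assumption of this sketch, not necessarily the rule used in [1].

```python
import itertools

import numpy as np


def select_parameters(X, y, fit_predict, deltas, rhos, n_folds=4, seed=0):
    """Pick (delta, rho0) minimizing k-fold cross-validation error.

    fit_predict: hypothetical callback returning predicted labels for X_test.
    Ties in error count are broken toward the larger delta (an assumption);
    remaining ties keep the first pair encountered in the grid.
    Returns ((delta, rho0), cross-validation error rate).
    """
    rng = np.random.default_rng(seed)
    # Simple random (non-stratified) folds of roughly equal size.
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    best_key, best = None, None
    for delta, rho0 in itertools.product(deltas, rhos):
        errors = 0
        for test_idx in folds:
            train = np.ones(len(y), dtype=bool)
            train[test_idx] = False
            pred = fit_predict(X[train], y[train], X[test_idx], delta, rho0)
            errors += int(np.sum(pred != y[test_idx]))
        key = (errors, -delta)  # fewer errors first, then larger delta
        if best_key is None or key < best_key:
            best_key, best = key, (delta, rho0)
    return best, best_key[0] / len(y)
```

Because every candidate pair is scored on the same folds, differences in cross-validation error reflect the parameters rather than the data split; with `n_folds=4` this mirrors the four-fold setting mentioned for the multiple tumor data.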

The major conclusions and observations in the original manuscript [1] remain valid with the revised implementation. Our EWUSC and USC algorithms represent improvements over the SC algorithm: in general, they require fewer genes than SC to produce comparable prediction accuracy. On the multiple tumor data, our EWUSC and USC algorithms produce higher prediction accuracy using fewer relevant genes than previously published results. The revised software implementation (version 1.0) is available on our web site [3]; it was posted on May 9, 2005.

References

1. Yeung KY, Bumgarner RE: Multiclass classification of microarray data with repeated measurements: application to cancer. Genome Biol. 2003, 4: R83. 10.1186/gb-2003-4-12-r83.
2. Tibshirani R, Hastie T, Narasimhan B, Chu G: Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Natl Acad Sci USA. 2002, 99: 6567-6572. 10.1073/pnas.082099299.
3. Supplementary Web Site: Multiclass classification of microarray data with repeated measurements: application to cancer. [http://www.expression.washington.edu/publications/kayee/shrunken_centroid]
4. Dudoit S, Fridlyand J, Speed TP: Comparison of discrimination methods for the classification of tumors using gene expression data. J Am Stat Assoc. 2002, 97: 77-87. 10.1198/016214502753479248.
5. Ramaswamy S, Tamayo P, Rifkin R, Mukherjee S, Yeang CH, Angelo M, Ladd C, Reich M, Latulippe E, Mesirov JP, et al: Multiclass cancer diagnosis using tumor gene expression signatures. Proc Natl Acad Sci USA. 2001, 98: 15149-15154. 10.1073/pnas.211566398.
6. van't Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, Mao M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT, et al: Gene expression profiling predicts clinical outcome of breast cancer. Nature. 2002, 415: 530-536. 10.1038/415530a.

Author information

Authors and Affiliations

Department of Microbiology, University of Washington, Box 358070, Seattle, WA, 98195, USA
Ka Yee Yeung & Roger E Bumgarner

Corresponding author

Correspondence to Ka Yee Yeung.

Additional information

The online version of the original article can be found at 10.1186/gb-2003-4-12-r83

About this article

Cite this article

Yeung, K.Y., Bumgarner, R.E. Erratum to: Multiclass classification of microarray data with repeated measurements: application to cancer. Genome Biol 6, 405 (2006). https://doi.org/10.1186/gb-2005-6-13-405

