Table 6 Summary of prediction accuracy results

From: Multiclass classification of microarray data with repeated measurements: application to cancer

| Data | Parameters | EWUSC | USC | SC | Published results |
|---|---|---|---|---|---|
| NCI 60 data* | ρ0 | NA | 0.6 | 1.0 | NA |
| | Δ | NA | 1.0 | 1.0 | NA |
| | Number of relevant genes | NA | 2315 | 3998 | 200 |
| | Prediction accuracy | NA | 72% | 72% | ~40-60% [23] |
| Multiple tumor data (estimated optimal parameters) | ρ0 | 0.8 | 0.8 | 1.0 | NA |
| | Δ | 5.6 | 5.6 | 8.8 | NA |
| | Number of relevant genes | 680 | 735 | 3902 | All genes |
| | Prediction accuracy | 93% | 85% | 78% | 78% [10] |
| Multiple tumor data (global optimal parameters) | ρ0 | 0.9 | 0.9 | 1.0 | NA |
| | Δ | 0 | 0 | 0.4 | NA |
| | Number of relevant genes | 1626 | 1634 | 7129 | All genes |
| | Prediction accuracy | 78% | 74% | 74% | 78% [10] |
| Breast cancer data | ρ0 | 0.7 | 0.6 | 1.0 | NA |
| | Δ | 0.80 | 1.15 | 1.1 | NA |
| | Number of relevant genes | 271 | 82 | 187 | 70 |
| | Prediction accuracy | 89% | 79% | 84% | 89% [14] |
The optimal parameters (ρ0 and Δ), number of relevant genes chosen, and prediction accuracy for the NCI 60 data, multiple tumor data and breast cancer data are summarized here. Both EWUSC (error-weighted, uncorrelated shrunken centroid) and USC (uncorrelated shrunken centroid) were motivated by SC (shrunken centroid) [17]. Both EWUSC and USC take advantage of interdependence between genes by removing highly correlated relevant genes; EWUSC additionally makes use of error estimates or variability over repeated measurements. SC [17] is equivalent to USC at ρ0 = 1. The optimal parameters (Δ, ρ0) for EWUSC are estimated from the cross-validation results of EWUSC, while those for USC are estimated independently from the cross-validation results of USC. Entries with the minimum number of selected genes or highest prediction accuracy across all methods are highlighted in boldface type.

*Since no repeated measurements or error estimates are available, EWUSC is not applicable to the NCI 60 data. In addition, because no separate test set is available for the NCI 60 data, typical results of random partitions of the original 61 samples into training and test sets are shown.

For the rows labeled "estimated optimal parameters", the prediction accuracy and number of relevant genes are produced using optimal parameters (Δ, ρ0) estimated by visual observation of 'bends' in the random cross-validation curves. For the rows labeled "global optimal parameters", they are produced using the (Δ, ρ0) pair that yields the minimum average number of cross-validation errors over all Δ and all ρ0.
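
As an illustration of how the global optimal parameters described in the note could be chosen, the following minimal sketch picks the (Δ, ρ0) pair with the smallest average number of cross-validation errors from a grid. The function name `global_optimum`, the grid values, and the error counts are hypothetical and are not taken from the paper; breaking ties by taking the first minimum in grid order is also an assumption.

```python
import numpy as np

def global_optimum(avg_cv_errors, deltas, rho0s):
    """Return (delta, rho0, error) for the smallest average CV error.

    avg_cv_errors[i][j] is assumed to hold the average number of
    cross-validation errors obtained at deltas[i] and rho0s[j].
    """
    errors = np.asarray(avg_cv_errors, dtype=float)
    # argmin works on the flattened grid; on ties it returns the first
    # minimum in row-major order (an assumed tie-breaking rule).
    i, j = np.unravel_index(np.argmin(errors), errors.shape)
    return deltas[i], rho0s[j], float(errors[i, j])

# Hypothetical grid and error counts, for illustration only.
deltas = [0.0, 0.4, 0.8]
rho0s = [0.8, 0.9, 1.0]
avg_cv_errors = [
    [12.0, 9.5, 11.0],   # delta = 0.0
    [13.0, 10.0, 10.5],  # delta = 0.4
    [15.0, 12.5, 13.0],  # delta = 0.8
]

best_delta, best_rho0, best_error = global_optimum(avg_cv_errors, deltas, rho0s)
print(best_delta, best_rho0, best_error)
```

By contrast, the "estimated optimal parameters" rows rely on visual inspection of the cross-validation curves, which a simple grid minimum like this does not capture.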