Table 6 Summary of prediction accuracy results

From: Multiclass classification of microarray data with repeated measurements: application to cancer

| Data | Parameters | EWUSC | USC | SC | Published results |
|---|---|---|---|---|---|
| NCI 60 data* | ρ0 | NA | 0.6 | 1.0 | NA |
| | Δ | NA | 1.0 | 1.0 | NA |
| | Number of relevant genes | NA | 2315 | 3998 | 200 |
| | Prediction accuracy | NA | 72% | 72% | ~40-60% [23] |
| Multiple tumor data (estimated optimal parameters) | ρ0 | 0.8 | 0.8 | 1.0 | NA |
| | Δ | 5.6 | 5.6 | 8.8 | NA |
| | Number of relevant genes | 680 | 735 | 3902 | All genes |
| | Prediction accuracy | 93% | 85% | 78% | 78% [10] |
| Multiple tumor data (global optimal parameters) | ρ0 | 0.9 | 0.9 | 1.0 | NA |
| | Δ | 0 | 0 | 0.4 | NA |
| | Number of relevant genes | 1626 | 1634 | 7129 | All genes |
| | Prediction accuracy | 78% | 74% | 74% | 78% [10] |
| Breast cancer data | ρ0 | 0.7 | 0.6 | 1.0 | NA |
| | Δ | 0.80 | 1.15 | 1.1 | NA |
| | Number of relevant genes | 271 | 82 | 187 | 70 |
| | Prediction accuracy | 89% | 79% | 84% | 89% [14] |

The optimal parameters (ρ0 and Δ), number of relevant genes chosen, and prediction accuracy for the NCI 60 data, multiple tumor data and breast cancer data are summarized here. Both EWUSC (error-weighted, uncorrelated shrunken centroid) and USC (uncorrelated shrunken centroid) were motivated by SC (shrunken centroid) [17]. Both EWUSC and USC take advantage of the interdependence between genes by removing highly correlated relevant genes; EWUSC additionally makes use of error estimates or variability over repeated measurements. SC [17] is equivalent to USC at ρ0 = 1. The optimal parameters (Δ, ρ0) for EWUSC are estimated from the cross-validation results of EWUSC, while those for USC are estimated independently from the cross-validation results of USC. Entries with the minimum number of selected genes or the highest prediction accuracy across all methods are highlighted in boldface type.

*Since no repeated measurements or error estimates are available, EWUSC is not applicable to the NCI 60 data. In addition, because no separate test set is available for the NCI 60 data, typical results of random partitions of the original 61 samples into training and test sets are shown.

For the rows labelled "estimated optimal parameters", the prediction accuracy and number of relevant genes are produced using optimal parameters (Δ, ρ0) estimated by visual observation of 'bends' in the random cross-validation curves.

For the rows labelled "global optimal parameters", the prediction accuracy and number of relevant genes are produced using the global optimal parameters, that is, the (Δ, ρ0) pair that produces the minimum average number of cross-validation errors over all Δ and all ρ0.
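
The footnote describes the global optimal parameters as the (Δ, ρ0) pair with the minimum average number of cross-validation errors. The following is a minimal sketch of that selection step only, not the authors' implementation. The function name `global_optimal_parameters`, the grid layout (rows indexed by Δ, columns by ρ0), the tie-breaking rule (first minimum encountered) and all example numbers are assumptions made for illustration and are not taken from the paper.

```python
def global_optimal_parameters(avg_cv_errors, deltas, rho0s):
    """Return (delta, rho0, error) for the grid cell with the fewest
    average cross-validation errors.

    avg_cv_errors[i][j] is assumed to hold the average number of
    cross-validation errors observed at deltas[i] and rho0s[j].
    Ties are broken by taking the first minimum found (an assumption).
    """
    if len(avg_cv_errors) != len(deltas):
        raise ValueError("one row of errors is required per delta value")

    best = None
    for i, delta in enumerate(deltas):
        row = avg_cv_errors[i]
        if len(row) != len(rho0s):
            raise ValueError("one column of errors is required per rho0 value")
        for j, rho0 in enumerate(rho0s):
            # Strict '<' keeps the earliest cell when several cells tie.
            if best is None or row[j] < best[2]:
                best = (delta, rho0, row[j])
    return best


# Hypothetical grid: the error counts below are invented for illustration.
example_deltas = [0.0, 0.4, 0.8]
example_rho0s = [0.8, 0.9, 1.0]
example_errors = [
    [12, 9, 11],   # delta = 0.0
    [13, 10, 11],  # delta = 0.4
    [15, 14, 13],  # delta = 0.8
]

best_delta, best_rho0, best_error = global_optimal_parameters(
    example_errors, example_deltas, example_rho0s
)
print(best_delta, best_rho0, best_error)
```

The "estimated optimal parameters" rows are chosen differently, by visually inspecting bends in the cross-validation curves, so a simple minimum search like this one does not reproduce them.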