
Table 5 Comparison of classification accuracy results from EWUSC, USC and SC on synthetic datasets at optimal parameters

From: Multiclass classification of microarray data with repeated measurements: application to cancer

 

(a) Different noise levels with four repeated measurements

| α | Number of measurements | λ | EWUSC | USC | SC | Metric |
|---|---|---|---|---|---|---|
| 0.1 | 4 | Low | 100% | 100% | 100% | Average % CV prediction accuracy |
|  |  |  | 100% | 100% | 100% | % prediction accuracy |
|  |  |  | 10 | 24 | 72 | Number of genes |
|  |  |  | (18, 0.8) | (17, 0.7) | (17.5, 1) | (Δ, ρ) |
| 0.1 | 4 | High | 100% | 100% | 100% | Average % CV prediction accuracy |
|  |  |  | 100% | 100% | 100% | % prediction accuracy |
|  |  |  | 8 | 16 | 22 | Number of genes |
|  |  |  | (12.5, 0.9) | (12.5, 0.9) | (12.5, 1) | (Δ, ρ) |
| 1 | 4 | Low | 100% | 100% | 100% | Average % CV prediction accuracy |
|  |  |  | 100% | 100% | 100% | % prediction accuracy |
|  |  |  | 144 | 119 | 124 | Number of genes |
|  |  |  | (2.8, 0.5) | (3.1, 0.6) | (3.1, 1) | (Δ, ρ) |
| 1 | 4 | High | 100% | 100% | 100% | Average % CV prediction accuracy |
|  |  |  | 100% | 100% | 100% | % prediction accuracy |
|  |  |  | 89 | 120 | 122 | Number of genes |
|  |  |  | (1.9, 0.5) | (2.6, 0.6) | (2.6, 1) | (Δ, ρ) |
| 2 | 4 | Low | 96.8% | 99.0% | 98.8% | Average % CV prediction accuracy |
|  |  |  | 97.5% | 100.0% | 100.0% | % prediction accuracy |
|  |  |  | 270 | 326 | 326 | Number of genes |
|  |  |  | (1.1, 0.5) | (1, 0.4) | (1.2, 1) | (Δ, ρ) |
| 2 | 4 | High | 93.3% | 98.8% | 99.0% | Average % CV prediction accuracy |
|  |  |  | 92.5% | 97.5% | 97.5% | % prediction accuracy |
|  |  |  | 186 | 159 | 159 | Number of genes |
|  |  |  | (1, 0.7) | (1.5, 0.5) | (1.5, 1) | (Δ, ρ) |

(b) Different numbers of repeated measurements at high biological noise levels

| α | Number of measurements | λ | EWUSC | USC | SC | Metric |
|---|---|---|---|---|---|---|
| 2 | 1 | Low | NA | 99.5% | 99.5% | Average % CV prediction accuracy |
|  |  |  | NA | 100.0% | 100.0% | % prediction accuracy |
|  |  |  | NA | 285 | 304 | Number of genes |
|  |  |  | NA | (1.2, 0.5) | (1.2, 1) | (Δ, ρ) |
| 2 | 1 | High | NA | 96.5% | 95.5% | Average % CV prediction accuracy |
|  |  |  | NA | 92.5% | 92.5% | % prediction accuracy |
|  |  |  | NA | 258 | 282 | Number of genes |
|  |  |  | NA | (1.2, 0.5) | (1.2, 1) | (Δ, ρ) |
| 2 | 8 | Low | 99.8% | 100.0% | 100.0% | Average % CV prediction accuracy |
|  |  |  | 100.0% | 100.0% | 100.0% | % prediction accuracy |
|  |  |  | 246 | 220 | 221 | Number of genes |
|  |  |  | (1.3, 0.5) | (1.4, 0.5) | (1.4, 1) | (Δ, ρ) |
| 2 | 8 | High | 98.3% | 99.0% | 99.0% | Average % CV prediction accuracy |
|  |  |  | 97.5% | 100.0% | 100.0% | % prediction accuracy |
|  |  |  | 171 | 242 | 245 | Number of genes |
|  |  |  | (1, 0.4) | (1.3, 0.5) | (1.3, 1) | (Δ, ρ) |
| 2 | 20 | Low | 99.8% | 100.0% | 100.0% | Average % CV prediction accuracy |
|  |  |  | 100.0% | 100.0% | 100.0% | % prediction accuracy |
|  |  |  | 226 | 296 | 325 | Number of genes |
|  |  |  | (1.3, 0.5) | (1.2, 0.6) | (1.2, 1) | (Δ, ρ) |
| 2 | 20 | High | 99.8% | 100.0% | 100.0% | Average % CV prediction accuracy |
|  |  |  | 100.0% | 100.0% | 100.0% | % prediction accuracy |
|  |  |  | 221 | 252 | 252 | Number of genes |
|  |  |  | (0.9, 0.6) | (1.3, 0.5) | (1.3, 1) | (Δ, ρ) |

  1. Synthetic datasets were generated at different levels of biological noise (α) and technical noise (λ). The average percentage of cross-validation (% CV) accuracy, the percentage of prediction accuracy on the test set, and the number of relevant genes at the optimal parameters (Δ, ρ0) are shown. For each synthetic dataset, the algorithm with the maximum average cross-validation accuracy, the maximum prediction accuracy, or the minimum number of relevant genes is shown in bold. (a) Typical classification accuracy results using synthetic datasets with four repeated measurements at different biological noise levels (α = 0.1, 1 or 2) and different technical noise levels (λ = 1, 5 or 10). When the biological noise level is low (α = 0.1), EWUSC consistently achieves the same prediction accuracy as USC and SC using fewer relevant genes at the various technical noise levels. At a medium biological noise level (α = 1), however, EWUSC typically outperforms USC and SC at the high technical noise level but not at the low technical noise level. When the biological noise level is high (α = 2), EWUSC is often not the method of choice. (b) Typical classification accuracy results using synthetic datasets at a high biological noise level (α = 2) with 1, 8, or 20 repeated measurements at different technical noise levels. When there are no repeated measurements (the number of repeated measurements = 1), there are no variability estimates over repeated measurements and hence EWUSC reduces to USC. The results with four repeated measurements at α = 2 are shown in (a). Our results over multiple synthetic datasets showed that, at high biological noise (α = 2), EWUSC outperforms USC only with a large number of repeated measurements (20). We also showed that USC typically outperforms SC by choosing a smaller number of relevant genes in most scenarios (over different biological and technical noise levels, and different numbers of repeated measurements).
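
The selection rule described in the note (maximum average CV accuracy, maximum prediction accuracy, minimum number of relevant genes) can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the names `METHODS`, `ROWS`, `best_methods` and `summarize` are hypothetical, the three row groups are copied from the table, and reporting every tied method is an assumption, since the note does not say how ties are marked.

```python
# Column order used throughout: (EWUSC, USC, SC).
METHODS = ("EWUSC", "USC", "SC")

# Three row groups copied from Table 5; keys are (alpha, measurements, lambda).
ROWS = {
    (0.1, 4, "Low"): {
        "cv_accuracy": (100.0, 100.0, 100.0),
        "test_accuracy": (100.0, 100.0, 100.0),
        "n_genes": (10, 24, 72),
    },
    (2, 4, "High"): {
        "cv_accuracy": (93.3, 98.8, 99.0),
        "test_accuracy": (92.5, 97.5, 97.5),
        "n_genes": (186, 159, 159),
    },
    (2, 20, "High"): {
        "cv_accuracy": (99.8, 100.0, 100.0),
        "test_accuracy": (100.0, 100.0, 100.0),
        "n_genes": (221, 252, 252),
    },
}


def best_methods(values, prefer_max=True):
    """Return every method that attains the best value (ties are all kept)."""
    target = max(values) if prefer_max else min(values)
    return [m for m, v in zip(METHODS, values) if v == target]


def summarize(row):
    """Apply the three criteria from the table note to one row group."""
    return {
        "cv_accuracy": best_methods(row["cv_accuracy"]),
        "test_accuracy": best_methods(row["test_accuracy"]),
        "n_genes": best_methods(row["n_genes"], prefer_max=False),
    }


if __name__ == "__main__":
    for key, row in ROWS.items():
        print(key, summarize(row))
```

For the α = 0.1, low-λ group, all three methods tie on both accuracy measures and EWUSC alone has the fewest genes, which matches the note's remark that EWUSC reaches the same accuracy with fewer relevant genes at low biological noise.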