Table 5 Comparison of classification accuracy results from EWUSC, USC and SC on synthetic datasets at optimal parameters

From: Multiclass classification of microarray data with repeated measurements: application to cancer

α    Number of measurements  λ     EWUSC        USC          SC           Reported quantity

(a) Different noise levels with four repeated measurements

0.1  4                       Low   100%         100%         100%         Average % CV prediction accuracy
                                   100%         100%         100%         % prediction accuracy
                                   10           24           72           Number of genes
                                   (18, 0.8)    (17, 0.7)    (17.5, 1)    (Δ, ρ)
0.1  4                       High  100%         100%         100%         Average % CV prediction accuracy
                                   100%         100%         100%         % prediction accuracy
                                   8            16           22           Number of genes
                                   (12.5, 0.9)  (12.5, 0.9)  (12.5, 1)    (Δ, ρ)
1    4                       Low   100%         100%         100%         Average % CV prediction accuracy
                                   100%         100%         100%         % prediction accuracy
                                   144          119          124          Number of genes
                                   (2.8, 0.5)   (3.1, 0.6)   (3.1, 1)     (Δ, ρ)
1    4                       High  100%         100%         100%         Average % CV prediction accuracy
                                   100%         100%         100%         % prediction accuracy
                                   89           120          122          Number of genes
                                   (1.9, 0.5)   (2.6, 0.6)   (2.6, 1)     (Δ, ρ)
2    4                       Low   96.8%        99.0%        98.8%        Average % CV prediction accuracy
                                   97.5%        100.0%       100.0%       % prediction accuracy
                                   270          326          326          Number of genes
                                   (1.1, 0.5)   (1, 0.4)     (1.2, 1)     (Δ, ρ)
2    4                       High  93.3%        98.8%        99.0%        Average % CV prediction accuracy
                                   92.5%        97.5%        97.5%        % prediction accuracy
                                   186          159          159          Number of genes
                                   (1, 0.7)     (1.5, 0.5)   (1.5, 1)     (Δ, ρ)

(b) Different numbers of repeated measurements at high biological noise levels

2    1                       Low   NA           99.5%        99.5%        Average % CV prediction accuracy
                                   NA           100.0%       100.0%       % prediction accuracy
                                   NA           285          304          Number of genes
                                   NA           (1.2, 0.5)   (1.2, 1)     (Δ, ρ)
2    1                       High  NA           96.5%        95.5%        Average % CV prediction accuracy
                                   NA           92.5%        92.5%        % prediction accuracy
                                   NA           258          282          Number of genes
                                   NA           (1.2, 0.5)   (1.2, 1)     (Δ, ρ)
2    8                       Low   99.8%        100.0%       100.0%       Average % CV prediction accuracy
                                   100.0%       100.0%       100.0%       % prediction accuracy
                                   246          220          221          Number of genes
                                   (1.3, 0.5)   (1.4, 0.5)   (1.4, 1)     (Δ, ρ)
2    8                       High  98.3%        99.0%        99.0%        Average % CV prediction accuracy
                                   97.5%        100.0%       100.0%       % prediction accuracy
                                   171          242          245          Number of genes
                                   (1, 0.4)     (1.3, 0.5)   (1.3, 1)     (Δ, ρ)
2    20                      Low   99.8%        100.0%       100.0%       Average % CV prediction accuracy
                                   100.0%       100.0%       100.0%       % prediction accuracy
                                   226          296          325          Number of genes
                                   (1.3, 0.5)   (1.2, 0.6)   (1.2, 1)     (Δ, ρ)
2    20                      High  99.8%        100.0%       100.0%       Average % CV prediction accuracy
                                   100.0%       100.0%       100.0%       % prediction accuracy
                                   221          252          252          Number of genes
                                   (0.9, 0.6)   (1.3, 0.5)   (1.3, 1)     (Δ, ρ)
Synthetic datasets were generated at different levels of biological noise (α) and technical noise (λ). The average percentage of cross validation (% CV) accuracy, the percentage of prediction accuracy on the test set, and the number of relevant genes at the optimal parameters (Δ, ρ0) are shown. For each synthetic dataset, the algorithm with the maximum average cross validation accuracy, the maximum prediction accuracy, or the minimum number of relevant genes is shown in bold.

(a) Typical classification accuracy results using synthetic datasets with four repeated measurements at different biological noise levels (α = 0.1, 1 or 2) and different technical noise levels (λ = 1, 5 or 10). When the biological noise level is low (α = 0.1), EWUSC consistently achieves the same prediction accuracy using fewer relevant genes at various technical noise levels. At medium biological noise level (α = 1), however, EWUSC typically outperforms USC and SC at high technical noise level but not at low technical noise level. When the biological noise level is high (α = 2), EWUSC is often not the method of choice.

(b) Typical classification accuracy results using synthetic datasets at high biological noise level (α = 2) with 1, 8, or 20 repeated measurements at different technical noise levels. When there is only one measurement (no repeats), no variability estimates over repeated measurements are available, and hence EWUSC reduces to USC. The results with four repeated measurements at α = 2 are shown in (a). Our results over multiple synthetic datasets showed that, at high biological noise (α = 2), EWUSC outperforms USC only with a large number of repeated measurements (20). We also showed that USC typically outperforms SC by choosing a smaller number of relevant genes in most scenarios (over different biological and technical noise levels, and different numbers of repeated measurements).
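
The highlighting rule described in the footnote (maximum accuracy, minimum number of relevant genes) can be illustrated with a minimal Python sketch. The numbers are copied from the first dataset in panel (a) (α = 0.1, four measurements, low technical noise); the dictionary layout, the function name `best_algorithms`, and the choice to report every algorithm that ties for the best value are assumptions made for illustration, not part of the original analysis.

```python
# Values copied from Table 5, panel (a): alpha = 0.1, four measurements, low technical noise.
# The structure and key names are hypothetical; None would stand for an "NA" entry.
results = {
    "EWUSC": {"cv_accuracy": 100.0, "test_accuracy": 100.0, "n_genes": 10},
    "USC":   {"cv_accuracy": 100.0, "test_accuracy": 100.0, "n_genes": 24},
    "SC":    {"cv_accuracy": 100.0, "test_accuracy": 100.0, "n_genes": 72},
}


def best_algorithms(results, metric, higher_is_better=True):
    """Return all algorithms tied for the best value of `metric`, skipping NA (None)."""
    available = {
        name: row[metric] for name, row in results.items() if row[metric] is not None
    }
    pick = max if higher_is_better else min
    best = pick(available.values())
    # Assumption: ties are all reported rather than broken arbitrarily.
    return sorted(name for name, value in available.items() if value == best)


highlighted = {
    "cv_accuracy": best_algorithms(results, "cv_accuracy"),
    "test_accuracy": best_algorithms(results, "test_accuracy"),
    "n_genes": best_algorithms(results, "n_genes", higher_is_better=False),
}
print(highlighted)
```

For this dataset all three algorithms tie on both accuracy measures, so the number of relevant genes is the only criterion that separates them, which matches the footnote's remark that at low biological noise EWUSC reaches the same accuracy with fewer genes.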