Figure 3 | Genome Biology

From: From co-expression to co-regulation: how many microarray experiments do we need?

Effect of the number of microarray experiments on the compendium data subset with 215 genes. We compared how well co-regulated genes are recovered when different numbers of microarray experiments are used on the 215-gene subset of the compendium data. To produce typical datasets with E experiments (where E = 5, 10, 20, 50, 100), we randomly sampled (with replacement) 100 different subsets of E experiments from the compendium data with 215 genes and 273 experiments. The ability to identify co-regulated genes from clustering results is summarized by the median z-score over the 100 randomly sampled datasets. A high median z-score indicates that clustering places a higher proportion of co-regulated genes together than random partitions do. (a) We compared the median z-scores obtained with hierarchical complete-link clustering for different numbers of experiments (E) over a range of cluster numbers (from 5 to 100). The transcription factor database SCPD is used as the reference for co-regulated genes. The median z-scores generally increase with E across the whole range of cluster numbers, indicating that higher proportions of co-regulated genes are identified in datasets with more experiments. (b) Using SCPD as our evaluation criterion, we compared the median z-scores for different numbers of experiments (E) and different clustering algorithms (hierarchical average-link and complete-link using correlation, and the model-based clustering algorithms MCLUST and IMM on standardized data) on the 215-gene compendium subset at 25 clusters. We chose 25 clusters because IMM estimated this to be the optimal number of clusters for this dataset; similar results were observed at other numbers of clusters. (c) Using ChIP data as our evaluation criterion, we compared the median z-scores for different numbers of experiments (E) and different clustering algorithms on the 215-gene compendium subset at 25 clusters.
With either SCPD or ChIP as the evaluation criterion, the median z-scores typically increase as E increases, and MCLUST tends to produce relatively high median z-scores.
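The evaluation described in this caption (sample E experiments, cluster, score against random partitions, take the median over 100 samples) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the z-score compares the number of co-clustered gene pairs that share at least one annotated transcription factor (for example from SCPD or ChIP data) against random partitions with the same cluster sizes. It also reads "sampled (with replacement)" as drawing each subset of E distinct experiments independently, so the same subset may recur. All function names (`coregulated_pairs`, `z_score`, `sample_experiment_subsets`, `median_z_score`) and the caller-supplied `cluster_fn` are hypothetical, and no clustering algorithm (complete-link, average-link, MCLUST, IMM) is implemented here.

```python
import random
import statistics
from itertools import combinations


def coregulated_pairs(labels, tf_sets):
    """Count gene pairs in the same cluster that share >= 1 annotated TF (assumed criterion)."""
    return sum(
        1
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j] and tf_sets[i] & tf_sets[j]
    )


def z_score(labels, tf_sets, n_random=200, rng=None):
    """z-score of the observed count against random partitions with the same cluster sizes."""
    if rng is None:
        rng = random.Random(0)
    observed = coregulated_pairs(labels, tf_sets)
    null = []
    for _ in range(n_random):
        shuffled = list(labels)
        rng.shuffle(shuffled)  # permuting labels preserves every cluster's size
        null.append(coregulated_pairs(shuffled, tf_sets))
    sd = statistics.stdev(null)
    return 0.0 if sd == 0 else (observed - statistics.mean(null)) / sd


def sample_experiment_subsets(n_experiments, e, n_subsets=100, rng=None):
    """Independently draw n_subsets subsets, each of e distinct experiment (column) indices."""
    if rng is None:
        rng = random.Random(0)
    return [sorted(rng.sample(range(n_experiments), e)) for _ in range(n_subsets)]


def median_z_score(expr, e, cluster_fn, tf_sets, n_subsets=100, n_random=200, seed=0):
    """Median z-score over random E-experiment subsets of a genes x experiments matrix.

    cluster_fn is a hypothetical caller-supplied function mapping a genes x E
    matrix to one cluster label per gene (e.g. complete-link at 25 clusters).
    """
    rng = random.Random(seed)
    n_experiments = len(expr[0])
    scores = []
    for cols in sample_experiment_subsets(n_experiments, e, n_subsets, rng):
        sub = [[row[c] for c in cols] for row in expr]  # all genes, sampled columns only
        labels = cluster_fn(sub)
        scores.append(z_score(labels, tf_sets, n_random, rng))
    return statistics.median(scores)
```

For a quick check, a toy call such as `median_z_score(expr, 5, lambda sub: truth, tf_sets, n_subsets=10)`, where `truth` is a known partition and `tf_sets` assigns one TF per true cluster, should give a clearly positive median z-score. Permuting labels, rather than drawing arbitrary random partitions, is one simple way to keep the null model's cluster sizes equal to those of the clustering being scored.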

Back to article page