Application of independent component analysis to microarrays

Table 2 The seven most significant linear ICA clusters from the yeast cell cycle data (Dataset 2)

Cluster	Number of annotated ORFs	GO/KEGG functional category (ORFs in category)	ORFs shared by cluster and category	log₁₀(p-value)
1	215	Protein biosynthesis (175)	93	-60.1
	217	Structural constituent of ribosome (118)	83	-67.6
	157	Cytosolic ribosome (94)	83	-73.1
	229	Ribosome (96)	83	-82.5
5	208	Cell cycle (220)	61	-19.5
	202	DNA-directed DNA polymerase (13)	7	-4.7
	115	Replication fork (30)	16	-9.6
	229	Cell cycle (58)	18	-6.6
11	2,072	Sulfur amino acid metabolism (12)	11	-11.1
	211	Structural constituent of cytoskeleton (25)	11	-5.9
	125	Spindle (32)	18	-10.62
9	209	Ribosome biogenesis (38)	15	-7.1
	207	RNA binding (75)	7	-3.4
	111	Nucleus (334)	54	-7.3
7	198	Glutamine family amino acid biosynthesis (11)	8	-6.8
	99	Mitochondrion (353)	22	-3.8
3	209	Protein folding (26)	11	-5.7
	212	Heat shock protein (14)	9	-6.7
11	199	DNA unwinding (10)	6	-4.5
	192	ATP-dependent DNA helicase (7)	6	-6.0
	85	Pre-replicative complex (8)	6	-5.7
	216	Cell cycle (58)	13	-3.6

The cluster IDs are shown, where cluster C_{i,1} in Equation 3 is denoted by 2i-1 and cluster C_{i,2} is denoted by 2i. For each cluster, the number of genes that have at least one annotation in GO or KEGG is listed, along with the functional category that has the smallest p-value in each annotation system. Four annotation systems are used: biological process (GO), molecular function (GO), cellular component (GO) and KEGG. Numbers in parentheses give the number of genes within the functional category that are present in the microarray data. Functional categories with p-values higher than 10^-3 are discarded, and those with p-values higher than 10^-7 are not considered to be significant. The number of genes shared by the cluster and the functional category is shown together with the log₁₀ of the corresponding p-value.
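The numbering and threshold conventions described in this caption can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (`cluster_id`, `component_of`, `classify`) and the label strings are hypothetical and do not come from the paper. The only things taken from the caption are the mapping of C_{i,1} to 2i-1 and C_{i,2} to 2i, and the two cut-offs at 10^-3 and 10^-7.

```python
from typing import Tuple

# Cut-offs from the caption, expressed as log10(p-value).
DISCARD_LOG10 = -3.0      # p > 10^-3: category is discarded
SIGNIFICANT_LOG10 = -7.0  # p > 10^-7: reported but not considered significant


def cluster_id(i: int, side: int) -> int:
    """Map cluster C_{i,1} to 2i-1 and cluster C_{i,2} to 2i."""
    if i < 1 or side not in (1, 2):
        raise ValueError("expected i >= 1 and side in {1, 2}")
    return 2 * i - 1 if side == 1 else 2 * i


def component_of(cluster: int) -> Tuple[int, int]:
    """Invert the mapping: recover (i, side) from a table cluster ID."""
    if cluster < 1:
        raise ValueError("cluster IDs start at 1")
    i = (cluster + 1) // 2
    side = 1 if cluster % 2 == 1 else 2
    return i, side


def classify(log10_p: float) -> str:
    """Apply the caption's two thresholds to a log10 p-value (hypothetical labels)."""
    if log10_p > DISCARD_LOG10:
        return "discarded"
    if log10_p > SIGNIFICANT_LOG10:
        return "reported, not significant"
    return "significant"


if __name__ == "__main__":
    print(component_of(5))   # cluster 5 corresponds to C_{3,1}
    print(classify(-19.5))   # e.g. the "Cell cycle (220)" row
    print(classify(-3.4))    # e.g. the "RNA binding (75)" row
```

Under this reading, odd cluster IDs in the table always refer to C_{i,1} and even IDs to C_{i,2}, and any row whose log₁₀ p-value lies between -7 and -3 is listed but not treated as significant.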

ISSN: 1474-760X