DGEclust: differential expression analysis of clustered count data

Table 1 Biological homogeneity index scores for the CAGE dataset

Software	Clustering	Number of DE genes	Number of clusters	BHI (BP)	BHI (CC)	BHI (MF)	BHI (all)
DGEclust	Hierarchical	2,177	1	0.07	0.08	0.08	0.21
	Hierarchical*		17	0.05	0.09	0.07	0.20
	k-means		32	0.05	0.07	0.08	0.20
DESeq2	Hierarchical	7,109	1	0.06	0.08	0.08	0.20
	k-means		59	0.06	0.08	0.07	0.21
edgeR	Hierarchical	5,705	1	0.06	0.08	0.08	0.20
	k-means		53	0.06	0.07	0.08	0.21

We computed BHI scores for each GO domain (biological process, molecular function and cellular component), as well as an overall score. k-means and hierarchical clustering were applied to the regularised log-transformed counts of all genes that were called DE between at least one pair of brain regions by each of the three methods examined, i.e. DGEclust, DESeq2 and edgeR. For k-means, we used an optimal number of clusters equal to \(\sqrt{N_{\textit{DE}}/2}\), where \(N_{\textit{DE}}\) is the number of DE genes. For hierarchical clustering, we used average linkage and a Euclidean distance metric with a cutoff distance of 0.5 to obtain an optimal clustering. For DGEclust, we also applied hierarchical clustering using an internally computed similarity matrix; this is indicated with an asterisk (*). The highest score in each GO domain is indicated in bold. BHI, biological homogeneity index; BP, biological process; CC, cellular component; DE, differentially expressed; GO, gene ontology; MF, molecular function.
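
As a quick check of the k-means cluster-count rule described in the note above, the following minimal sketch evaluates \(\sqrt{N_{\textit{DE}}/2}\) for the three DE gene counts listed in Table 1. Rounding down to the nearest integer is an assumption, since the note does not state a rounding rule, although it does reproduce the k-means cluster counts in the table. The function name `kmeans_cluster_count` is hypothetical and is not part of DGEclust, DESeq2 or edgeR.

```python
import math


def kmeans_cluster_count(n_de: int) -> int:
    """Rule-of-thumb number of k-means clusters, sqrt(N_DE / 2).

    Assumption: the result is rounded down to an integer; the source
    does not say how the value was rounded.
    """
    if n_de < 0:
        raise ValueError("number of DE genes must be non-negative")
    return math.floor(math.sqrt(n_de / 2))


# Numbers of DE genes reported in Table 1 for each method.
de_gene_counts = {"DGEclust": 2177, "DESeq2": 7109, "edgeR": 5705}

for method, n_de in de_gene_counts.items():
    print(f"{method}: N_DE = {n_de}, k = {kmeans_cluster_count(n_de)}")
```

Under this rounding assumption, the sketch gives 32, 59 and 53 clusters for DGEclust, DESeq2 and edgeR, respectively, matching the k-means rows of Table 1.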

ISSN: 1474-760X