DNA motifs in human promoters predict tissue-specific expression. (A) Area under the receiver operating characteristic (ROC) curve (AUC) for 79 models, each trained and tested on the promoters of genes highly expressed in one of 79 different tissues. The AUC is an overall summary of diagnostic accuracy: it equals 0.5 when the ROC curve corresponds to random chance and 1.0 for perfect accuracy. Reliable models (median AUC ≥0.6) are displayed in red, while unreliable models (median AUC <0.6) are displayed in gray. Models were evaluated by five-fold cross-validation. (B) Motifs with the greatest predictive power for the liver model. The weights w of the motifs (see Materials and methods) are given in red. Motif weights have been scaled to [-1, 1], where 1 is the scaled weight of the motif with the highest positive predictive power and -1 is the scaled weight of the motif with the most negative weight, that is, the strongest negative predictive power (signs are preserved; see Materials and methods). The names of the features are listed near the baseline of the graph. For comparison, we include the weights w of the same motifs in the lung, caudate nucleus, and thymus models (in different shades of gray). Similarities among the gene sets used to train the models, which reflect functional relatedness among tissues, explain similarities in the predictive power of the motifs. Specifically, 15% of genes that are highly expressed in liver are also highly expressed in lung, whereas fewer than 5% are also highly expressed in caudate nucleus and thymus.
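The two quantities plotted in this figure can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on two assumptions: that the median AUC is taken across the five cross-validation folds, and that positive and negative weights are scaled separately so that the extremes land on 1 and -1 (the actual procedure is described in Materials and methods). The function names `auc`, `is_reliable`, and `scale_weights` are hypothetical.

```python
from statistics import median


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def is_reliable(fold_aucs, threshold=0.6):
    """Assumption: a model is 'reliable' when the median AUC over its
    cross-validation folds reaches the threshold (0.6 in the caption)."""
    return median(fold_aucs) >= threshold


def scale_weights(weights):
    """Assumed scaling: divide positive weights by the largest positive
    weight and negative weights by the magnitude of the most negative one,
    so signs are preserved and the extremes map to 1 and -1."""
    max_pos = max((w for w in weights if w > 0), default=None)
    min_neg = min((w for w in weights if w < 0), default=None)
    scaled = []
    for w in weights:
        if w > 0:
            scaled.append(w / max_pos)
        elif w < 0:
            scaled.append(w / abs(min_neg))
        else:
            scaled.append(0.0)
    return scaled
```

Under these assumptions, a perfectly separating classifier gives `auc(...) == 1.0`, a classifier that scores every promoter identically gives 0.5, and the weights `[2.0, 1.0, -4.0, -1.0, 0.0]` scale to `[1.0, 0.5, -1.0, -0.25, 0.0]`.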