Alternative definitions of the regulatory region and their effect on the prediction of gene regulation. (a) Receiver operating characteristic (ROC) curves showing how CLB2 cluster genes rank relative to all other genes using the forkhead probability matrix and two different definitions of the regulatory region. A ROC curve plots the fraction of true positives that meet a threshold value (here, a given GOMER score) against the fraction of false positives that meet the same threshold. The thick line is the ROC curve for a regulatory region defined as the sequence between 650 base pairs (bp) 5' to the open reading frame (ORF) and 150 bp 3' to the start of the ORF; the thin line is the ROC curve for a regulatory region defined as the sequence between 1,000 bp and 500 bp 5' to the ORF. The latter definition has no predictive value, as reflected in the nearly diagonal ROC curve (area under the ROC curve (ROC AUC) of approximately 0.5). (b) Schematics of a conventional uniform weight function and a Gaussian weight function. (c,d) ROC AUC values obtained with (c) the uniform weight function and (d) the Gaussian weight function for several hundred combinations of parameter values. The contoured areas are shaded according to ROC AUC value, as indicated on the scale. To facilitate comparison, the regulatory regions defined by the uniform weight function are plotted in terms of the center of the region, analogous to the center of the Gaussian distribution. Center values are expressed as distance from the ORF start; negative values are 5' to the ORF start. For the Gaussian function, weights below 1/1,000th of the maximum value are rounded down to 0.
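
The two weight functions and the ROC AUC statistic described in this caption can be illustrated with a minimal Python sketch. It is not the GOMER implementation: the function names (`uniform_weight`, `gaussian_weight`, `roc_auc`) and the parameterization by `center`, `width`, and `sd` are assumptions made for illustration, the Gaussian is assumed to be scaled so that its maximum is 1, and the way weights are combined with binding scores to give a GOMER score is not covered. The AUC is computed through its standard rank interpretation: the probability that a randomly chosen positive gene scores higher than a randomly chosen negative gene, with ties counted as one half.

```python
import math


def uniform_weight(pos, center, width):
    """Hypothetical uniform weight: 1 inside a window of `width` bp
    centered at `center` (bp relative to the ORF start), 0 outside."""
    half = width / 2.0
    return 1.0 if center - half <= pos <= center + half else 0.0


def gaussian_weight(pos, center, sd, cutoff=1e-3):
    """Hypothetical Gaussian weight with a peak value of 1 at `center`.
    Weights below `cutoff` times the peak are rounded down to 0,
    mirroring the 1/1,000th rule stated in the caption."""
    w = math.exp(-((pos - center) ** 2) / (2.0 * sd ** 2))
    return w if w >= cutoff else 0.0


def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive scores higher;
    tied pairs count as one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # The thick-line region in panel (a) spans -650 to +150 bp,
    # i.e. center -250 and width 800 in this parameterization.
    print(uniform_weight(-650, center=-250, width=800))  # 1.0
    print(uniform_weight(-651, center=-250, width=800))  # 0.0
    # Made-up scores, for illustration only.
    print(roc_auc([0.9, 0.4], [0.5, 0.1]))  # 0.75
```

With the peak scaled to 1, the 1/1,000th cutoff truncates the Gaussian at roughly 3.7 standard deviations from its center, so each Gaussian weight function also has a finite extent. A ranking that carries no information gives an AUC near 0.5, which corresponds to the nearly diagonal curve in panel (a).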