Receiver operating characteristic (ROC) curves for two non-coding RNA (ncRNA) prediction algorithms, ClosingBp (Bradley RK, Uzilov AV, Skinner M, Bendaña YR, Barquist L and Holmes I, submitted) and EVOFOLD (implemented using XRATE), using GSIMULATOR and SIMGENOME models to estimate the false positive rate (FPR). These curves illustrate the general principle that the more realistic the simulation model, the higher the estimated FPR. This trend is independent of the gene-prediction algorithm used. The upper panels show results for GSIMULATOR: more complex indel length distributions (N) and, in particular, context-dependence (K) both increase the FPR. The lower panels show results for SIMGENOME and its component models, where the FPR is increased by including gaps (which amplify fluctuations in information content because they are typically treated as 'missing information') and genomic features (some of which evolve more slowly than neutral sequence). The asymptotic sensitivity is less than 1.0 because our benchmark used a sliding-window approach, predicting at most one ncRNA in each window. Our set of real ncRNAs was taken from multi-genome Drosophila alignments produced by the PECAN program; in each case, to ensure a fair comparison, we took a window of the PECAN alignment surrounding the annotated ncRNA, with the size of this window matching that of the sliding window used on the simulated null data. Some of the positive ncRNAs in these PECAN-aligned windows score so poorly under the gene prediction model (for example, because of inaccuracies in the PECAN alignment of that window) that the predicted ncRNA is consistently placed in the wrong location within the window. These real ncRNAs are therefore never detected, no matter how low the scoring threshold, which sets an upper limit on the achievable sensitivity.
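The sensitivity ceiling described above can be illustrated with a minimal sketch. It is not the benchmark code used for these curves: the scores, the `localized` flag (whether the single prediction in a window overlaps the annotated ncRNA) and the function name `roc_points` are all hypothetical, chosen only to show why a mislocalized prediction can never become a true positive.

```python
def roc_points(positive_scores, localized, null_scores):
    """Return (FPR, sensitivity) pairs, one per threshold, highest threshold first.

    positive_scores: best score in each window around a real ncRNA (hypothetical).
    localized: True if that window's single prediction overlaps the annotated ncRNA.
    null_scores: best score in each window of simulated null data (hypothetical).
    """
    thresholds = sorted(set(positive_scores) | set(null_scores), reverse=True)
    points = []
    for t in thresholds:
        # A real ncRNA counts as detected only if its window scores above the
        # threshold AND the prediction was placed at the right location.
        tp = sum(1 for s, ok in zip(positive_scores, localized) if ok and s >= t)
        # Every null window scoring above the threshold is a false positive.
        fp = sum(1 for s in null_scores if s >= t)
        points.append((fp / len(null_scores), tp / len(positive_scores)))
    return points


# Illustrative values only: four real ncRNA windows, one of them mislocalized.
positive_scores = [9.0, 7.5, 6.0, 2.0]
localized = [True, True, True, False]
null_scores = [8.0, 5.0, 3.0, 1.0]

points = roc_points(positive_scores, localized, null_scores)
print(max(sens for _, sens in points))  # 0.75: the mislocalized ncRNA is never found
```

In this toy example the curve reaches an FPR of 1.0 while sensitivity stays at 0.75, mirroring the plateau below 1.0 seen in the panels.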