Fig. 5 | Genome Biology

Fig. 5

From: seqQscorer: automated quality control of next-generation sequencing data using machine learning

Unbiased and generalizable models. a Performance of species-specific models (optimal specialized models) in cross-species predictions demonstrates model generalization to other species. ROC curves show the classification performance for different species-assay combinations using all features. The solid lines represent cases in which data from the same species were used to define the training and testing sets. The dashed lines show the performance in cases in which the species defining the training data differs from the species defining the testing data; e.g., “HS → MM” means the training set contains only data from human (Homo sapiens, HS), while the testing data are only from mouse (Mus musculus, MM). Legends also show the corresponding auROC values. b Comparison of the predictive performance of the optimal generic model, trained on all data and features, with that of different specialized models demonstrates lack of bias. For each feature set, the performance of the generic model is detailed across the data subsets (green bars) and compared to specialized models trained for each combination of feature set and data subset (blue bars). Error bars show standard deviations derived from 10-fold cross-validation within the grid search. Feature sets: RAW (raw data), MAP (genome mapping), LOC (genomic localization), TSS (transcription start sites profile), ALL (all features). ROC, receiver operating characteristic; auROC, area under the ROC curve.
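
For readers unfamiliar with the auROC values shown in the legends, the following is a minimal sketch of how an auROC can be computed from class labels and classifier scores. It uses the standard equivalence between the area under the ROC curve and the fraction of positive/negative pairs that are ranked correctly. The function name `auroc` and the toy labels and scores are illustrative assumptions; they are not the paper's code or data.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: sequence of 0/1 class labels (1 = positive class)
    scores: sequence of classifier scores, higher = more likely positive
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, made-up values purely for illustration (not data from the paper).
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auroc = auroc(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly
```

In a cross-species setting such as "HS → MM", the scores would come from a model fitted on one species and applied to samples from the other; the calculation itself is unchanged. An auROC of 0.5 corresponds to random ranking and 1.0 to perfect separation.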
