Fig. 4 | Genome Biology

Fig. 4

From: 2passtools: two-pass alignment using machine-learning-filtered splice junctions increases the accuracy of intron detection in long-read RNA sequencing

Machine-learned sequence information improves identification of genuine splice junctions. a Outline of the LR model training process. Sequences from splice junctions were extracted from the reference genome and used as training data (i.e., explanatory variables). Training labels (i.e., the response variable) were generated by the first decision tree model. Independent models were trained for 5′ donor and 3′ acceptor sites, and cross-validation was used to generate out-of-bag predictions for all sites. b Flowchart visualization of the second decision tree model. Nodes (decisions) and leaves (outcomes) are colored based on the relative ratio of real and spurious splice junctions. c Confusion matrix showing the ratios of correct and incorrect predictions of the second decision tree model on splice junctions extracted from simulated Arabidopsis read alignments.
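
The training scheme outlined in panel a can be illustrated with a minimal sketch. It is not the 2passtools implementation. It assumes that LR stands for logistic regression, that junction sequences are one-hot encoded, and that every site is scored by a model trained on the other cross-validation folds. The toy 6-mers, the hard-coded labels (a stand-in for the output of the first decision tree model), and all function names are hypothetical, and only a donor-like model is shown; an acceptor model would be trained the same way on acceptor sequences.

```python
import math
import random

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a flat 0/1 vector (4 values per base)."""
    return [1.0 if base == b else 0.0 for base in seq for b in BASES]

def predict_proba(weights, bias, x):
    """Logistic-regression probability that x is a genuine junction."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, epochs=200, lr=0.5):
    """Fit logistic regression by plain batch gradient descent."""
    n_features = len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, label in zip(X, y):
            err = predict_proba(weights, bias, x) - label
            grad_b += err
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
        bias -= lr * grad_b / len(X)
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
    return weights, bias

def out_of_fold_predictions(X, y, n_folds=5, seed=0):
    """Score every sample with a model that never saw it during training."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    folds = [order[i::n_folds] for i in range(n_folds)]
    preds = [None] * len(X)
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in order if i not in held]
        w, b = train_logreg([X[i] for i in train_idx],
                            [y[i] for i in train_idx])
        for i in held_out:
            preds[i] = predict_proba(w, b, X[i])
    return preds

# Hypothetical toy data: "genuine" donor-like 6-mers carry GT in the middle,
# "spurious" ones carry a different dinucleotide. In the figure, labels come
# from the first decision tree model; here they are simply hard-coded.
flanks = ["AA", "CC", "GG", "TT", "AC", "TG"]
pairs = [(left, right) for left in flanks for right in flanks]
other = ["AA", "CC", "AC", "CA"]
positives = [left + "GT" + right for left, right in pairs]
negatives = [left + other[k % 4] + right for k, (left, right) in enumerate(pairs)]

sequences = positives + negatives
labels = [1.0] * len(positives) + [0.0] * len(negatives)

oof = out_of_fold_predictions([one_hot(s) for s in sequences], labels)
```

Each entry of `oof` is a held-out probability for the corresponding sequence, so scores of this kind can be passed to a later model without having been fitted on the site they describe.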
