Fig. 1 | Genome Biology

Fig. 1

From: Easy-Prime: a machine learning–based prime editor design tool

Overview of Easy-Prime design and machine learning model evaluation. a (1) The Cas9 activity feature is the score predicted by DeepSpCas9 (purple box). (2) Oligo features (yellow box) are the GC content and sequence length of the PBS and RTT. (3) Target mutation features (cyan box) are whether the target mutation disrupts the PAM sequence, whether the ngRNA spacer sequence matches the edited protospacer sequence, and the numbers of mismatches, deletions, and insertions. (4) Position features (pink box) are the distance between the ngRNA and the sgRNA (ngRNA_pos), the distance between the target mutation and the sgRNA (Target_pos), and the number of nucleotides downstream of the desired edit (target_end_flank). (5) RNA folding features are the maximal pairing probability between each of the first 10 bp of the RTT and the scaffold sequence, computed with RNAplfold [29]. b The machine learning workflow for data preprocessing, feature extraction, and model training and evaluation. c, d Correlation scatter plots of the true PE efficiency (x-axis) against the predicted efficiency (y-axis). c Train-test-split evaluation for the PE2 model and nested cross-validation evaluation for the PE3 model. d Evaluation of the PE3 model on an independent, third-party PE dataset. "R" is the Spearman correlation coefficient; "r" is the Pearson correlation coefficient.
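As an illustration of two quantities named in the caption, the sketch below computes the oligo features of panel a (GC content and length of the PBS and RTT) and the two correlation coefficients reported in panels c and d. It is a minimal sketch and not Easy-Prime's own code: the function names, the dictionary keys, and the example sequences are hypothetical, and only the Python standard library is assumed.

```python
import math


def gc_content(seq: str) -> float:
    """Fraction of G/C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def oligo_features(pbs: str, rtt: str) -> dict:
    """GC content and length of the PBS and RTT (key names are hypothetical)."""
    return {
        "PBS_length": len(pbs),
        "PBS_GC": gc_content(pbs),
        "RTT_length": len(rtt),
        "RTT_GC": gc_content(rtt),
    }


def pearson(x, y) -> float:
    """Pearson correlation coefficient ("r" in the caption)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_ranks(values) -> list:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y) -> float:
    """Spearman correlation ("R" in the caption): Pearson on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Made-up sequences and efficiencies, for illustration only.
    print(oligo_features(pbs="GCATGCATGCATG", rtt="AAAAGGCCTTAA"))
    true_eff = [1.0, 2.0, 3.0, 4.0]
    pred_eff = [1.0, 4.0, 9.0, 16.0]
    print(pearson(true_eff, pred_eff), spearman(true_eff, pred_eff))
```

Spearman's coefficient depends only on the rank order of the predictions, so it rewards a model that orders candidate designs correctly even when the predicted values are not linearly related to the measured efficiencies. Pearson's coefficient measures the linear relationship itself.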
