Fig. 5 | Genome Biology

Fig. 5

From: CRISPRO: identification of functional protein coding sequences based on genome editing dense mutagenesis

CRISPR score prediction performance on independent datasets. a Feature importance in the CRISPRO prediction GBDT model, measured by information gain when a feature is used to split the combined training data (Munoz et al. and Doench et al. datasets). Positional nucleotide features are 0-indexed (i.e., nucleotide 0 is at position 1 of the spacer sequence and dinucleotide 0 corresponds to positions 1 and 2 of the spacer, where position 20 is PAM proximal). Inset shows pairwise Spearman correlation coefficients for all numerical and binary features in the CRISPRO training set. b Per-gene Spearman correlation between predicted and observed CRISPR functional scores, for the Doench score and the CRISPRO prediction GBDT model, in independent datasets not seen during training. c, d Scatter plots for ZBTB7A and MYB of scaled observed guide RNA scores, CRISPRO prediction scores, and Doench scores plotted against position in the protein, with LOESS regression shown by blue lines. Protein-level and mRNA-level annotations are aligned underneath
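
The 0-indexing convention for positional features in panel a can be illustrated with a minimal Python sketch. The function name `positional_features`, the feature names (`nuc_0_G`, `dinuc_0_GA`), and the one-hot encoding are hypothetical choices made for illustration. The caption states only the indexing rule, so this is not necessarily how CRISPRO encodes its features.

```python
from itertools import product

BASES = "ACGT"


def positional_features(spacer):
    """Hypothetical one-hot encoding of a 20-nt spacer, using 0-based indices.

    Index 0 is spacer position 1; index 19 is position 20 (PAM proximal).
    """
    spacer = spacer.upper()
    if len(spacer) != 20 or set(spacer) - set(BASES):
        raise ValueError("expected a 20-nt spacer over A/C/G/T")

    features = {}
    # Single nucleotides: feature index i covers spacer position i + 1.
    for i in range(20):
        for base in BASES:
            features[f"nuc_{i}_{base}"] = int(spacer[i] == base)
    # Dinucleotides: feature index i covers spacer positions i + 1 and i + 2.
    for i in range(19):
        for pair in product(BASES, repeat=2):
            name = "".join(pair)
            features[f"dinuc_{i}_{name}"] = int(spacer[i:i + 2] == name)
    return features


# Made-up example spacer, not taken from any dataset in the paper.
feats = positional_features("GACGTTACCGGATTACAGCT")
print(feats["nuc_0_G"], feats["dinuc_0_GA"], feats["nuc_19_T"])
```

Under this assumed encoding, `nuc_0_*` describes the PAM-distal end of the spacer and `nuc_19_*` the PAM-proximal nucleotide, matching the convention given in the caption.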
