The combination of different cytidine deaminases with distinct Cas proteins extends the scope of base editing to different sequence contexts or backgrounds; however, it also results in variable targeting preferences, which hampers direct comparison of BEs. To address this problem, we selected five BEs, namely BE3 [1], eBE-S3 [11], BE4max [15], hA3A-eBE-Y130F [6], and dCpf1-eBE [5], and compared their base editing efficiency and product purity at the same genomic target sites. The five selected BEs have editing windows of similar width (~5 bp), which makes them suitable for comparison (Fig. 1b).
At three previously reported target sites [5] that can be edited by all five BEs, BE4max and hA3A-eBE-Y130F induced higher C-to-T editing frequencies than the other examined BEs in 293FT cells (Additional file 1: Figure S1a, b), while hA3A-eBE-Y130F also exhibited slightly higher indel frequencies (Additional file 1: Figure S1a, c). The relatively high indel frequencies induced by hA3A-eBE-Y130F are likely caused by the high cytidine deamination activity of its hA3A moiety [16, 17]. Although it showed the lowest editing frequencies among the five tested BEs, dCpf1-eBE induced fewer indels and non-C-to-T conversions than the other BEs (Additional file 1: Figure S1a, c) and therefore yielded purer editing products (Additional file 1: Figure S1d). Presumably, the catalytically dead Cpf1 moiety in dCpf1-eBE accounts for both its low editing frequency and its high product purity (Additional files 3, 4, 5, and 6).
We next sought to compare the performance of these BEs in creating human pathogenic C-to-T SNVs. Among reported pathogenic C-to-T SNVs [18], we selected three sites at which all five BEs have overlapping editing windows (Additional file 1: Figure S2a). Importantly, at each of the three selected sites, the target cytosine is the only cytosine in the editing window; we refer to such sites as preferentially editable SNVs (Fig. 1c). Theoretically, the C-to-T conversions at these three target sites could be used to mimic human genetic disorders (Additional file 1: Figure S2b). At these sites, BE4max and hA3A-eBE-Y130F again induced higher editing frequencies than the other examined BEs in 293FT cells (Fig. 1d, top), consistent with the results obtained at non-pathogenic target sites (Additional file 1: Figure S1). Notably, only hA3A-eBE-Y130F yielded efficient base editing at the BMRP2 locus (Fig. 1d, top), whereas the other BEs induced no obvious editing. The indel frequencies induced by dCpf1-eBE were lower than those induced by the other BEs (Fig. 1d, bottom). Meanwhile, the C-to-T fraction induced by dCpf1-eBE was significantly higher than those induced by the other BEs (Additional file 1: Figure S2c), showing that dCpf1-eBE yielded purer editing products.
Another important application of BEs is to correct pathogenic mutations, which theoretically could be used in preclinical or clinical studies [19]. To test the efficiency and precision of these BEs in correcting pathogenic mutations, we took advantage of ABEmax [15] to first create T-to-C mutations and then corrected them with the aforementioned five BEs (Additional file 1: Figure S3). Three reported pathogenic T-to-C/A-to-G SNV sites that can be preferentially corrected by all five BEs were selected for the correction study (Additional file 1: Figure S3a, b). These pathogenic T-to-C/A-to-G mutations were generated individually by ABEmax in 293FT cells (Additional file 1: Figure S3c), and single-colony-derived cell lines carrying the corresponding T-to-C mutations were confirmed by Sanger sequencing (Additional file 1: Figure S3d). The T-to-C/A-to-G mutations that mimic pathogenic SNVs were then corrected by the five tested BEs. As shown in Fig. 1e (top), BE4max and hA3A-eBE-Y130F induced higher correction efficiencies than the other examined BEs. Notably, only hA3A-eBE-Y130F yielded efficient base editing at the CLN6 locus (Fig. 1e, top), whereas the other BEs induced editing similar to the background level. As expected, dCpf1-eBE yielded purer editing products than the other BEs, although its C-to-T correction efficiency was low (Fig. 1e, bottom and Additional file 1: Figure S4).
We further compared three representative BEs at additional sites for their editing efficiencies and product purities: hA3A-eBE-Y130F with the highest editing efficiency, dCpf1-eBE with the purest editing product, and eBE-S3 with intermediate editing efficiency and product purity (Fig. 1d, e and Additional file 1: Figure S1). Of note, these three selected BEs all express three extra copies of free UGI to enhance editing performance. The comparison covered eight genomic target sites (Additional file 1: Figure S5, including the three sites examined with all five BEs in Additional file 1: Figure S1) as well as eight target sites where C-to-T conversions create pathogenic SNVs (Additional file 1: Figure S6, including the three sites examined with all five BEs in Additional file 1: Figure S2). As expected, hA3A-eBE-Y130F induced the highest editing frequency and dCpf1-eBE yielded the purest C-to-T editing product (Additional file 1: Figures S5, S6). We also compared these three representative BEs at the same sites in another human cell line, U2OS, and obtained similar results (Additional file 1: Figures S7, S8).
As BEs can be used to introduce base substitutions that mimic or revert pathogenic SNVs (Fig. 1), we set out to computationally profile all human pathogenic C-to-T or T-to-C SNVs to determine which types of BEs might be more suitable for creating or correcting these mutations. Twenty BEs with different PAM sequences and editing windows, including the five aforementioned ones, were used for this in silico analysis. The PAM sequences and editing windows of these 20 BEs are listed in Fig. 2a.
For all pathogenic SNVs reported in the NCBI ClinVar database (Fig. 2b), we searched their flanking regions for nearby PAM sequences that would place the pathogenic SNV within the editing windows of the examined BEs. Based on the existence of such PAM sequences, we predicted whether a given SNV could potentially be edited by a specific BE (Fig. 2c). With the 20 analyzed BEs, 94.34% of 17,077 pathogenic C-to-T SNVs could be generated by at least one BE to model the relevant genetic disorders, and 94.28% of 5,031 pathogenic T-to-C SNVs could be corrected by at least one BE to examine potential therapeutic effects. The potentially editable SNVs are summarized in Fig. 2d. The in silico profiling of base editable pathogenic SNVs thus suggests broad applications of BEs for human disease study and potential treatment.
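The PAM-based prediction described above can be illustrated with a minimal Python sketch. It is not the pipeline used for this analysis: the function names (`find_spacers`, `is_preferentially_editable`), the default NGG PAM, the 20-nt spacer, the 3' PAM orientation, and the editing window at positions 4 to 8 (counted from the PAM-distal end) are all illustrative assumptions and do not reproduce the 20 BE definitions in Fig. 2a. The sketch scans only the given strand; the opposite strand could be handled by passing the reverse complement.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes for PAM matching.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]",
         "R": "[AG]", "Y": "[CT]", "V": "[ACG]", "W": "[AT]", "S": "[CG]",
         "M": "[AC]", "K": "[GT]", "H": "[ACT]", "B": "[CGT]", "D": "[AGT]"}


def find_spacers(seq, snv_pos, pam="NGG", spacer_len=20, window=(4, 8)):
    """Return (spacer, pam_seq, position_in_spacer) for every spacer placement
    on the given strand that puts seq[snv_pos] inside the editing window.

    Assumptions (illustrative only): the PAM lies 3' of the spacer, and window
    positions are 1-based, counted from the PAM-distal end of the spacer.
    """
    pam_re = re.compile("".join(IUPAC[base] for base in pam))
    hits = []
    for pos_in_spacer in range(window[0], window[1] + 1):
        start = snv_pos - (pos_in_spacer - 1)      # 0-based spacer start
        pam_start = start + spacer_len
        pam_end = pam_start + len(pam)
        if start < 0 or pam_end > len(seq):
            continue                               # placement runs off the sequence
        pam_seq = seq[pam_start:pam_end]
        if pam_re.fullmatch(pam_seq):
            hits.append((seq[start:pam_start], pam_seq, pos_in_spacer))
    return hits


def is_preferentially_editable(spacer, window=(4, 8)):
    """True if the editing window of this spacer contains exactly one cytosine."""
    return spacer[window[0] - 1:window[1]].count("C") == 1


# Toy flanking sequence: the target C sits at index 5, followed by a TGG PAM.
flank = "AAAAAC" + "A" * 14 + "TGG"
hits = find_spacers(flank, snv_pos=5)
```

In this toy example, the single placement found puts the target cytosine at position 6 of the spacer, and `is_preferentially_editable(hits[0][0])` returns `True` because no other cytosine lies in the window.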
To provide convenient access to the information on these base editable pathogenic point mutations, we constructed the BEable-GPS database (http://www.picb.ac.cn/rnomics/BEable-GPS) for annotation. A “search” function is available to query pathogenic SNVs according to gene symbols, genomic locations, or disease phenotypes, and their accessibility to different BEs (Additional file 1: Figure S9a). With selected BEs, all targetable pathogenic SNVs in the queried locations or disease phenotypes can be retrieved in the output list (Additional file 1: Figure S9b). Clicking the “Link” button next to a selected SNV displays its name (NCBI ClinVar ID), related dbSNP number, chromosome position, gene symbol, and related phenotype ID (Fig. 2e), together with the designed gRNA spacer sequences, with the corresponding PAMs highlighted, for all applicable BEs (Fig. 2f).
An online “analysis” function is also available to design specific gRNAs for editable cytosines/guanines from any input sequence (Additional file 1: Figure S10a). Of note, users can also define a specific PAM sequence, editing window, and spacer length to find specific base editable sites for further analysis (Additional file 1: Figure S10a, bottom). All cytosines or guanines that are targetable by the analyzed BEs will be listed together with specific gRNA spacer sequences (Additional file 1: Figure S10b). This online “analysis” function thus expands the application of the BEable-GPS database from pathogenic SNV sites to almost all editable cytosines and guanines. For both search and analysis functions, users can select the union or the intersection of these 20 analyzed BEs for survey and comparison (Additional file 1: Figures S9a, S10a).
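The union and intersection options can be pictured with a short Python sketch. This is not the BEable-GPS implementation: the editor names and their PAM, spacer length, and editing window settings below are hypothetical placeholders, a 3' PAM is assumed, and only cytosines on the given strand are scanned (guanines correspond to cytosines on the reverse complement).

```python
import re

# IUPAC codes needed for the hypothetical PAMs below.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]",
         "R": "[AG]", "Y": "[CT]", "V": "[ACG]"}


def targetable_cytosines(seq, pam, spacer_len, window):
    """0-based indices of cytosines that fall in the editing window of at
    least one spacer followed by a matching 3' PAM (given strand only)."""
    pam_re = re.compile("".join(IUPAC[base] for base in pam))
    found = set()
    for start in range(len(seq) - spacer_len - len(pam) + 1):
        pam_start = start + spacer_len
        if not pam_re.fullmatch(seq[pam_start:pam_start + len(pam)]):
            continue
        # Window positions are 1-based from the PAM-distal end of the spacer.
        for pos in range(window[0], window[1] + 1):
            idx = start + pos - 1
            if seq[idx] == "C":
                found.add(idx)
    return found


# Hypothetical editor definitions; a user-defined PAM, window, or spacer
# length would simply be another entry in this dictionary.
EDITORS = {
    "editor_NGG": {"pam": "NGG", "spacer_len": 20, "window": (4, 8)},
    "editor_NGA": {"pam": "NGA", "spacer_len": 20, "window": (4, 8)},
}


def survey(seq, editors=EDITORS):
    """Per-editor targetable cytosines plus their union and intersection."""
    per_editor = {name: targetable_cytosines(seq, **cfg)
                  for name, cfg in editors.items()}
    sets = list(per_editor.values())
    union = set().union(*sets)
    intersection = set.intersection(*sets) if sets else set()
    return per_editor, union, intersection


per_editor, union, intersection = survey("AAACAC" + "A" * 14 + "TGGA")
```

For this toy input, the union reports every cytosine reachable by either hypothetical editor, whereas the intersection keeps only the cytosine that both can reach, which is the distinction a user would rely on when comparing editors at the same site.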
BEable-GPS and its embedded toolsets should be of interest to researchers designing experiments to model or correct disease-related mutations. Of note, engineered BEs are continuously being developed to reduce off-target mutations and achieve more precise base editing [20]. We will keep updating this database by including more BEs, to provide additional choices for the study of pathogenic mutations, and by incorporating off-target prediction, to flag potential off-target risks.