Framework for fair evaluation of different CRISPR-Cas systems
We sought to establish an evaluation framework that allows an unbiased comparison of five Cas endonucleases (SpCas9, SaCas9, NmCas9, AsCpf1, and LbCpf1). Every protein was fused to two nuclear localization signals (NLSs) and the same V5 epitope tag. Additionally, we expressed each enzyme and its cognate sgRNA from the same plasmid backbone. Either the CAG or the EF1α promoter drove expression of the Cas nuclease, while the same U6 promoter drove expression of the sgRNA. After cloning, we verified the activity of each construct using target sites known to be edited robustly by the respective nucleases (Additional file 1: Figure S1 and Tables S1 and S2).
Ideally, the various Cas enzymes should target identical genomic loci so that the results are comparable. Because each endonuclease requires a different PAM for efficient cleavage and the PAMs for SaCas9 and NmCas9 are incompatible, we initially selected 61 matched target sites that are flanked by TTTN (the PAM for AsCpf1 and LbCpf1) and by either NGGRRT (a combined PAM for SpCas9 and SaCas9) or NGGNGATT (a combined PAM for SpCas9 and NmCas9) (Additional file 1: Table S3). The sites ranged in length from 17 to 23 nucleotides (nt). Additionally, since each Cas nuclease may be differentially affected by chromatin accessibility, we targeted genes with varying expression levels in HEK293T cells, because gene transcription is largely controlled by the underlying chromatin architecture [21]. Based on our RNA-seq data, expression of the chosen genes spanned a more than 4000-fold range (Additional file 1: Figure S2a). Consistent with this, we also observed higher levels of H3K27ac at the promoters of more actively transcribed genes (Additional file 1: Figure S2b).
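Matched sites of this kind can be enumerated by scanning for a spacer of a given length that is preceded by TTTN and followed by one of the combined Cas9 PAMs. The Python sketch below is illustrative only: the function names, the single-strand scan, and the toy sequence are assumptions, not the pipeline used in this study.

```python
import re

# IUPAC codes needed for the PAMs named in the text (assumption: only these occur).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "R": "[AG]"}


def iupac_to_regex(motif: str) -> str:
    """Translate an IUPAC motif such as 'NGGRRT' into a regex fragment."""
    return "".join(IUPAC[base] for base in motif.upper())


def matched_sites(seq: str, spacer_len: int, cas9_pam: str = "NGGRRT"):
    """Return (start, spacer) pairs for spacers of length `spacer_len` that are
    preceded by TTTN (Cpf1 PAM, 5' side) and followed by `cas9_pam` (3' side).

    Only the strand given in `seq` is scanned; a real search would also scan the
    reverse complement. The lookahead lets overlapping candidates be reported."""
    pattern = re.compile(
        "(?=" + iupac_to_regex("TTTN")
        + "([ACGT]{%d})" % spacer_len
        + iupac_to_regex(cas9_pam) + ")"
    )
    return [(m.start(1), m.group(1)) for m in pattern.finditer(seq.upper())]


# Hypothetical toy sequence: TTTA + 21-nt spacer + AGGAGT (fits NGGRRT).
toy = "CC" + "TTTA" + "ACGTACGTACGTACGTACGTA" + "AGGAGT" + "CC"
print(matched_sites(toy, 21))              # → [(6, 'ACGTACGTACGTACGTACGTA')]
print(matched_sites(toy, 21, "NGGNGATT"))  # → [] (AGGAGTCC does not fit NGGNGATT)
```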
We next asked whether our evaluation might be influenced by the choice of promoter used to express the Cas enzymes. We measured the expression of each endonuclease from either the CAG or the EF1α promoter by quantitative real-time PCR (qRT-PCR) and found that transcript levels were approximately 1.5-fold higher with the EF1α promoter (Additional file 1: Figure S2c). However, the cleavage efficiencies of Cas nucleases expressed from the CAG promoter were highly correlated with those of the same enzymes expressed from the EF1α promoter, regardless of whether indel formation was measured by T7 endonuclease I (T7E1) mismatch cleavage assays or by Illumina deep sequencing (Pearson R² = 0.75 and 0.96, respectively) (Additional file 1: Figure S2d). The data obtained with the CAG promoter were also not significantly different from those obtained with the EF1α promoter (P > 0.5, Wilcoxon rank sum test; Additional file 1: Figure S2e). This may be because both promoters were strong enough to produce sufficient amounts of Cas protein, so that enzyme concentration in the cells was no longer a limiting factor. Hence, we pooled the data obtained with the two promoters for a combined analysis.
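The promoter comparison rests on the squared Pearson correlation between paired indel frequencies (the same enzyme at the same site under CAG and under EF1α). A minimal pure-Python sketch of that statistic follows; the function name and the input values are hypothetical placeholders, not our measurements.

```python
from math import sqrt


def pearson_r2(x, y):
    """Squared Pearson correlation between paired per-site indel frequencies."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / sqrt(sxx * syy)
    return r * r


# Hypothetical indel frequencies for the same enzyme-site pairs under each promoter.
cag = [0.10, 0.35, 0.52, 0.08]
ef1a = [0.12, 0.33, 0.55, 0.09]
print(round(pearson_r2(cag, ef1a), 3))
```

A high R² only says the two promoters rank enzyme-site pairs similarly; the separate rank sum test addresses whether the overall levels differ.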
Performance of CRISPR-Cas in NHEJ-mediated genome editing
We first examined the editing activities of the CRISPR-Cas systems without any repair template. We used both T7E1 cleavage assays (Additional file 1: Figure S3) and Illumina deep sequencing (Additional file 1: Figure S4) to assess activity at the 61 selected genomic loci in HEK293T cells. Overall, SpCas9 exhibited the highest cleavage efficiencies for spacer lengths between 17 and 20 nt inclusive (Fig. 1a, b and Additional file 1: Figure S5a, b). In particular, it was the only nuclease that was consistently active with short 17-nt spacers, which we further confirmed in other cell lines (Additional file 1: Figure S6). In contrast, SaCas9 and LbCpf1 produced the highest levels of genome modification for spacer lengths between 21 and 23 nt inclusive (Fig. 1a, b and Additional file 1: Figure S5a, b). Similar results were obtained regardless of whether the matched target sites were located in introns (Additional file 1: Figures S3a–e and S4a–e) or in protein-coding regions (Additional file 1: Figures S3f–h and S4f–h). We also noted that the PAM-proximal seed region of the DNA target is more critical for proper recruitment of the CRISPR-Cas system than the PAM-distal region, but the PAMs for Cpf1 and Cas9 lie on opposite sides of each protospacer, so at a perfectly matched site the two types of nuclease rely on different seed sequences. Hence, we selected new target sites where the Cpf1 and Cas9 nucleases had overlapping seed regions (PAM-proximal 7 nt; Fig. 1c, d and Additional file 1: Figure S5c, d and Table S4). We still observed similar trends at these new sites.
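The following sketch illustrates why perfectly matched sites do not share a seed: with TTTN on the 5′ side and NGG on the 3′ side of the same spacer, the PAM-proximal 7 nt of Cpf1 and of Cas9 sit at opposite ends. It assumes the 7-nt seed definition used above and a single-strand layout; the helper names are hypothetical.

```python
def seed_interval(spacer_len: int, pam_side: str, seed_len: int = 7):
    """Half-open (start, end) of the PAM-proximal seed, indexed 5'->3' along
    the spacer as written on the PAM-bearing strand.

    pam_side="5prime": PAM precedes the protospacer (Cpf1, TTTN).
    pam_side="3prime": PAM follows the protospacer (Cas9, NGG)."""
    if pam_side == "5prime":
        return (0, seed_len)
    if pam_side == "3prime":
        return (spacer_len - seed_len, spacer_len)
    raise ValueError("pam_side must be '5prime' or '3prime'")


def overlap(a, b):
    """Number of positions shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


# At a perfectly matched site (TTTN-spacer-NGG on one strand), the Cpf1 and
# Cas9 seeds sit at opposite ends of the same spacer.
for length in range(17, 24):
    cpf1 = seed_interval(length, "5prime")
    cas9 = seed_interval(length, "3prime")
    print(length, cpf1, cas9, overlap(cpf1, cas9))  # overlap is 0 for 17-23 nt
```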
Notably, NmCas9 performed poorly at most of the target sites irrespective of spacer length, with editing frequencies considerably lower than those of the other nucleases (Fig. 1b and Additional file 1: Figure S5b). We also observed that with our chimeric sgRNAs, NmCas9 did not show a preference for longer spacers, consistent with a recent study on the use of NmCas9 for mammalian genome editing [22]. Nevertheless, since naturally occurring crRNA spacers in N. meningitidis were found to be 24 nt long [23], we selected nine new 24- or 25-nt target sites that are flanked by the PAMs for Cpf1 and NmCas9 (Additional file 1: Figure S7a, b and Table S5). These sites were chosen in highly expressed genes to ensure chromatin accessibility. When we quantified editing efficiencies at these new genomic loci in HEK293T cells by T7E1 cleavage assays (Additional file 1: Figure S7c, d) and Illumina deep sequencing (Fig. 1e, f), we again found that the editing activity of NmCas9 was lower than that of both AsCpf1 and LbCpf1 at all nine matched target sites. We further verified the poorer performance of NmCas9 in other cell lines (Additional file 1: Figure S8). Collectively, our results suggest that NmCas9 might not be an ideal Cas nuclease for many genome editing applications, such as multiplex gene targeting.
Next, we asked whether the editing efficiency of each Cas endonuclease may be affected by local chromatin context. To increase the statistical power of our analysis, we selected 18 additional target sites with 21-nt spacers followed by NGGRRT at their 3′ ends; 21 nt lies within the optimal spacer lengths for both SpCas9 and SaCas9 (Additional file 1: Table S6). Six of these sites are in lowly expressed genes, while the remaining 12 are in highly expressed genes (Additional file 1: Figure S9). We assayed the activity of each enzyme by T7E1 assays and by deep sequencing of the targeted loci (Additional file 1: Figure S10). When we considered all the selected sites together, we found that the editing efficiencies of SpCas9, AsCpf1, and LbCpf1 were significantly affected by the expression level of the targeted genes (P < 0.05, Wilcoxon rank sum test; Fig. 2a and Additional file 1: Figure S11a), consistent with previous studies showing that chromatin structure may influence the efficacy of CRISPR-mediated genome editing [24,25,26,27]. The same results were obtained when we restricted our analysis to sgRNAs of optimal length for each enzyme (Fig. 2b and Additional file 1: Figure S11b). Interestingly, however, the efficacy of SaCas9 and NmCas9 in human cells appeared to be unaffected by gene expression levels, especially when we considered only sgRNAs of optimal length (Fig. 2b and Additional file 1: Figure S11b), possibly because these smaller enzymes may be able to access nucleosome-bound DNA or heterochromatin more easily.
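A Wilcoxon rank sum test compares two groups (here, sites in lowly versus highly expressed genes) using only the ranks of the pooled values. The sketch below uses the large-sample normal approximation and is meant to show the mechanics rather than reproduce our analysis; with only six sites in one group, an exact test (for example, SciPy's `scipy.stats.mannwhitneyu`) may be preferable. The function name and data are hypothetical.

```python
from math import erfc, sqrt


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test using the normal approximation.
    Ties receive average ranks; no tie or continuity correction is applied
    to the variance. Returns (W, z, p), where W is the rank sum of x."""
    values = list(x) + list(y)
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a block of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(ranks[:n1])  # the first n1 pooled values come from x
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    p = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w, z, p


# Hypothetical indel frequencies at sites in lowly vs highly expressed genes.
low = [0.05, 0.08, 0.02, 0.10, 0.04, 0.07]
high = [0.20, 0.15, 0.31, 0.12, 0.25, 0.09, 0.18, 0.22, 0.40, 0.11, 0.27, 0.16]
print(rank_sum_test(low, high))
```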
Although AsCpf1 generally performed well in our NHEJ-mediated genome editing experiments, it was outperformed by at least one other enzyme at most target sites, regardless of whether these were located in lowly or highly expressed genes. In a four-way comparison of the Cas nucleases using spacers that were either perfectly matched or had matched seed regions, AsCpf1 was the best-performing enzyme at only a minority of the sites, even for optimal spacer lengths (Additional file 1: Figure S12). In pairwise comparisons of AsCpf1 with either SpCas9 or LbCpf1, restricted to sgRNAs of optimal length for both enzymes under consideration, AsCpf1 also exhibited significantly lower cleavage efficiencies than the other two nucleases (P < 0.05, Wilcoxon rank sum test; Fig. 2c and Additional file 1: Figure S11c). However, despite its overall weaker editing activity, AsCpf1 showed the lowest tolerance to single mismatches between the sgRNA and the target DNA (Fig. 2d, e and Additional file 1: Figure S11d, e). Hence, our results suggest a trade-off between the cleavage efficiency and the specificity of naturally occurring Cas endonucleases.
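The four-way comparison and the mismatch-tolerance measure can both be expressed as simple summaries of per-site editing frequencies. In this sketch the function names, the tie-handling rule, and the data are illustrative assumptions; mismatch tolerance is taken here as the ratio of editing with a single-mismatch sgRNA to editing with the matched sgRNA.

```python
from collections import Counter


def best_enzyme_counts(efficiencies):
    """Count how often each enzyme is the top performer across sites.
    `efficiencies` maps site -> {enzyme: indel_fraction}; ties credit every
    enzyme that shares the maximum."""
    counts = Counter()
    for per_enzyme in efficiencies.values():
        top = max(per_enzyme.values())
        for enzyme, value in per_enzyme.items():
            if value == top:
                counts[enzyme] += 1
    return counts


def mismatch_tolerance(perfect, mismatched):
    """Activity retained with a single-mismatch sgRNA, relative to the
    perfectly matched sgRNA (lower = less tolerant = more specific)."""
    return mismatched / perfect if perfect > 0 else float("nan")


# Hypothetical data for three sites.
data = {
    "site1": {"SpCas9": 0.40, "AsCpf1": 0.20, "LbCpf1": 0.35},
    "site2": {"SpCas9": 0.10, "AsCpf1": 0.30, "LbCpf1": 0.25},
    "site3": {"SpCas9": 0.22, "AsCpf1": 0.15, "LbCpf1": 0.22},
}
print(best_enzyme_counts(data))
print(mismatch_tolerance(0.40, 0.10))
```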
Performance of CRISPR-Cas in HDR-mediated genome editing with single-stranded oligodeoxynucleotide donor
We next sought to determine how well the various CRISPR-Cas systems perform in HDR-mediated precise genome editing. We again targeted the two genomic loci containing matched seeds for the Cas9 and Cpf1 nucleases, but here we co-transfected a donor single-stranded oligodeoxynucleotide (ssODN) with our CRISPR plasmids to introduce an XbaI restriction site between the cleavage sites of Cas9 and Cpf1 (Fig. 3a, b and Additional file 1: Figure S13a, b). Every ssODN contained the restriction site flanked by 47 nt of homology on each side, and all donor templates were complementary to the target strand. As expected, restriction fragment length polymorphism (RFLP) analysis revealed that only SpCas9 could consistently insert the XbaI site when the spacer was just 17 nt long. However, for 20- or 23-nt spacers, both AsCpf1 and LbCpf1 gave significantly more digested products than SpCas9 (P < 0.05, Student’s t-test; Additional file 1: Figure S13a, b). SaCas9 and NmCas9 yielded almost no detectable digestion products regardless of spacer length, possibly because they cleaved less efficiently than the other Cas nucleases at the two targeted loci (Fig. 1c, d and Additional file 1: Figure S5c, d). We further confirmed these results by deep sequencing to ensure that the restriction site was correctly inserted (Fig. 3a, b). When we reduced the homology arm length of the donor template from 47 to 27 nt, the editing efficiency of each enzyme was unaffected, and the Cpf1 nucleases continued to exhibit significantly higher HDR frequencies than SpCas9 (P < 0.05, Student’s t-test; Fig. 3c, d and Additional file 1: Figure S13c, d). Comparable results were obtained when we varied the amount of donor template between 100 and 300 ng (Additional file 1: Figure S14). Additionally, the HDR efficiencies of all Cas nucleases increased with time after transfection (Additional file 1: Figure S15).
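The donor layout described above (a recognition site flanked by equal homology arms) can be sketched as follows. We assume that the 6-nt XbaI site (TCTAGA) is inserted between two reference bases rather than replacing any; the reference sequence and function names are hypothetical, and the RFLP check simply asks whether the site is present.

```python
XBAI = "TCTAGA"  # XbaI recognition site (palindromic)


def symmetric_donor(ref: str, insert_at: int, arm: int, site: str = XBAI) -> str:
    """Donor on the same strand as `ref`: `arm` nt of homology, then `site`,
    then `arm` nt of homology. Assumes the site is inserted at `insert_at`
    (0-based position between bases) without deleting reference bases."""
    if insert_at - arm < 0 or insert_at + arm > len(ref):
        raise ValueError("reference too short for the requested arm length")
    return ref[insert_at - arm:insert_at] + site + ref[insert_at:insert_at + arm]


def has_site(seq: str, site: str = XBAI) -> bool:
    """RFLP-style check: is the recognition site present? Because TCTAGA is its
    own reverse complement, scanning one strand is enough for XbaI."""
    return site in seq.upper()


# Hypothetical 120-nt reference with no XbaI site; insertion point at 60.
ref = "A" * 60 + "C" * 60
for arm in (47, 27):
    donor = symmetric_donor(ref, 60, arm)
    print(arm, len(donor), has_site(donor), has_site(ref))
```

With 47-nt and 27-nt arms the donors come out at 100 and 60 nt, respectively.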
Moreover, although our sequencing data revealed a small amount of incorrect XbaI integration, its frequency was on average 6.4-fold and 12.4-fold lower than that of correct integration at the CACNA1D and PPP1R12C loci, respectively (Additional file 1: Figure S16).
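The fold differences above compare the frequencies of correct and incorrect site integration among sequencing reads. The sketch below classifies reads by exact string matching only; real amplicon analyses align reads to the reference (for example with CRISPResso2), so the classification rule, the names, and the toy reads here are simplifying assumptions.

```python
from collections import Counter


def classify_read(read, expected, reference, site="TCTAGA"):
    """Simplified outcome call by exact string comparison (no alignment).
    Assumes the unedited reference does not already contain the site."""
    if read == expected:
        return "correct_hdr"
    if site in read:
        return "incorrect_integration"  # site present, but flanks are not as designed
    if read == reference:
        return "unmodified"
    return "indel_or_other"


def correct_to_incorrect_ratio(reads, expected, reference, site="TCTAGA"):
    """Fold difference between correct and incorrect integration."""
    counts = Counter(classify_read(r, expected, reference, site) for r in reads)
    if counts["incorrect_integration"] == 0:
        return float("inf")
    return counts["correct_hdr"] / counts["incorrect_integration"]


# Hypothetical toy amplicons.
reference = "AAAACCCC"
expected = "AAAATCTAGACCCC"
reads = [expected] * 4 + ["AAATCTAGACCCC"] + [reference] * 3 + ["AAAACCC"]
print(correct_to_incorrect_ratio(reads, expected, reference))  # → 4.0
```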
Subsequently, we selected six perfectly matched target sites (A3, A11, A12, B4, B8, and B18) that could be cleaved robustly by at least SpCas9, AsCpf1, and LbCpf1 (Additional file 1: Figures S3 and S4) and performed additional HDR-mediated genome editing experiments with ssODNs as donor templates. For A12 and B4, the ssODNs contained 47-nt homology arms flanking either an XbaI or a HindIII recognition sequence, while for the remaining target sites, the ssODNs contained 27-nt homology arms. For the B8 target site, we also tested additional donor templates with even shorter homology arms (27, 25, 23, 21, 19, and 17 nt). All donor templates had the sequence of the non-target strand. Overall, the Cpf1 nucleases exhibited significantly higher HDR efficiencies than SpCas9 at all six target sites in both RFLP assays and deep sequencing experiments (P < 0.05, Student’s t-test; Fig. 4a–c and Additional file 1: Figures S17 and S18). The frequency of erroneous restriction site integration was much lower than that of correct integration (Additional file 1: Figure S19). Since the six additional sites are located in genes with varying expression levels, the higher HDR efficiency of Cpf1 appears to be independent of the underlying chromatin architecture. Importantly, the editing efficiency of each Cas endonuclease at the B8 locus was not compromised even when we reduced the homology arm length to 17 nt. This result is consistent with a previous study showing that zinc finger nucleases could perform precise gene editing with templates containing only around 30–40 total bases of homology [28].
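The HDR comparisons use a two-sample Student’s t-test. A minimal sketch of the pooled-variance t statistic follows; the number of replicates (three per group, giving df = 4), the two-sided threshold, and the values are assumptions for illustration, and a p-value would normally come from the t distribution (for example via SciPy).

```python
from math import sqrt
from statistics import mean, variance


def student_t(x, y):
    """Two-sample Student's t statistic with pooled variance (equal variances
    assumed). Returns (t, degrees_of_freedom)."""
    n1, n2 = len(x), len(y)
    pooled = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Two-sided critical value of Student's t for alpha = 0.05 and df = 4
# (i.e., triplicates in each group; the replicate number is an assumption).
T_CRIT_DF4 = 2.776

cpf1_hdr = [3.0, 4.0, 5.0]   # hypothetical HDR percentages
spcas9_hdr = [0.0, 1.0, 2.0]
t, df = student_t(cpf1_hdr, spcas9_hdr)
print(round(t, 3), df, abs(t) > T_CRIT_DF4)  # → 3.674 4 True
```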
We wondered whether the results of our HDR-mediated editing experiments might be explained by differences in cleavage efficiency. After co-transfecting ssODNs with our CRISPR plasmids, we performed T7E1 assays and RFLP analysis on the same genomic DNA samples. Overall, SpCas9 generated indels as efficiently as AsCpf1 and LbCpf1 in the T7E1 assays, yet it produced weaker cleavage bands than the Cpf1 nucleases after restriction digestion with XbaI or HindIII (Figs. 3e, f and 4d). We also sequenced the targeted genomic loci and examined the sequencing reads. Strikingly, SpCas9 produced random indels at least as efficiently as AsCpf1 and LbCpf1 at all tested loci (Additional file 1: Figure S20), but clearly fewer reads had the desired restriction site correctly incorporated (Additional file 1: Figure S21). Hence, when ssODNs with the non-target strand sequence were used, the lower efficiency of precise genome editing by SpCas9 compared with the Cpf1 nucleases was not simply due to a poorer ability to cut the target sites.
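One way to make this argument explicit is to report, alongside the HDR and indel rates, the fraction of edited alleles that were repaired from the donor. This metric is an illustrative proxy rather than one reported in the figures, and the function name and counts are hypothetical.

```python
def editing_outcomes(n_hdr: int, n_indel: int, n_total: int) -> dict:
    """Summarize one amplicon-sequencing sample.

    precise_fraction = HDR reads / (HDR + indel reads): among alleles that show
    evidence of cutting and repair, the share repaired from the donor. Breaks
    that were re-ligated perfectly are invisible here, so this is only a proxy."""
    modified = n_hdr + n_indel
    return {
        "hdr_rate": n_hdr / n_total,
        "indel_rate": n_indel / n_total,
        "precise_fraction": n_hdr / modified if modified else float("nan"),
    }


# Hypothetical counts: equal indel rates, different HDR rates.
print(editing_outcomes(n_hdr=5, n_indel=30, n_total=100))
print(editing_outcomes(n_hdr=15, n_indel=30, n_total=100))
```

Two enzymes with the same indel rate but different HDR rates differ in `precise_fraction`, which is the pattern described above for SpCas9 versus the Cpf1 nucleases.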
Optimization of ssODN donor templates
The design of the ssODN donor template can influence HDR efficiency [29,30,31,32]. So far, all our experiments had relied on symmetric ssODNs with the non-target strand sequence. We therefore first sought to explore the extent to which the editing activity of each CRISPR-Cas system may be influenced by the orientation of the donor template. To this end, we targeted the CACNA1D and PPP1R12C loci as well as the A3, A11, B8, and B18 loci using ssODNs that were complementary to either the target or the non-target strand. All the ssODNs contained 27-nt homology arms; for the B8 locus, we also tested ssODNs with 17-nt arms. Surprisingly, we did not detect a consistent strand bias for any Cas nuclease by deep sequencing (Additional file 1: Figure S22) or by RFLP analysis (Additional file 1: Figure S23). Instead, at five of the six targeted sites, the editing activities of all the enzymes tended to change in the same direction when we altered the orientation of the donor template, suggesting that each genomic locus may have an inherent ssODN strand preference. For example, at the PPP1R12C locus, the HDR frequencies of all the enzymes increased when we switched from the original ssODN with the non-target strand sequence (NT) to a new donor with the target strand sequence (T), although this increase was much larger for SpCas9 (Additional file 1: Figures S22b and S23b). Conversely, at the A11 locus, the HDR frequencies of SpCas9, AsCpf1, and LbCpf1 all decreased when we used T ssODNs in place of the original NT ssODNs, although this reduction was more pronounced for the Cpf1 nucleases (Additional file 1: Figures S22d and S23d). Furthermore, the changes in HDR frequency were not simply due to differences in cleavage rates, as every nuclease yielded similar amounts of indels in the presence of either the NT or the T ssODNs (Additional file 1: Figure S24).
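The NT and T donors for a given site are reverse complements of each other. A short sketch, with hypothetical function names and a hypothetical donor sequence:

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def donor_orientations(nt_donor: str) -> dict:
    """Both orientations of one donor. 'NT' has the non-target strand sequence
    (so it anneals to the target strand); 'T' has the target strand sequence,
    i.e., the reverse complement, and anneals to the non-target strand."""
    return {"NT": nt_donor, "T": reverse_complement(nt_donor)}


# Hypothetical short donor with a central XbaI site.
pair = donor_orientations("GGAATCTAGACCTT")
print(pair["T"])  # → AAGGTCTAGATTCC
```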
An earlier study showed that the strand bias of SpCas9 at an AAVS1 genomic locus became more obvious with longer donor templates [30]. Hence, to better detect any such bias, we next used ssODNs with 37-nt homology arms to edit the CACNA1D and PPP1R12C loci. In agreement with previous work [29,30,31], deep sequencing (Fig. 5) and RFLP analysis (Additional file 1: Figure S25) showed that SpCas9 exhibited significantly higher HDR efficiencies at both genomic loci when donor DNA complementary to the non-target strand (T ssODN) was used (P < 0.05, Student’s t-test). In contrast, for the Cpf1 nucleases, we now observed that the NT ssODNs were consistently more effective than the T ssODNs at introducing precise edits at both loci. Hence, Cas9 and Cpf1 prefer donor templates of opposite orientations.
Subsequently, we sought to determine whether the structure of the ssODN could further affect the editing efficiency of the Cas enzymes. A previous study demonstrated that homology-directed editing by SpCas9 could be enhanced with asymmetric donor templates [31]. To create such asymmetric donors, we extended either the PAM-proximal or the PAM-distal homology arm of each ssODN from 37 to 77 nt (Fig. 5a). Again, we tested donor DNA complementary to either the target or the non-target strand of the CACNA1D or PPP1R12C locus. Consistent with the published report [31], we found that for SpCas9, extending the PAM-distal homology arm of the T ssODN could improve HDR efficiency, while extending the PAM-proximal arm was either neutral or detrimental to the performance of the enzyme (Fig. 5b, c and Additional file 1: Figure S25). In contrast, for the Cpf1 nucleases, extending the PAM-proximal homology arm of the NT ssODN increased HDR frequency, while extending the PAM-distal arm decreased it. Overall, LbCpf1 still exhibited a higher HDR efficiency than SpCas9 at the CACNA1D locus when all types of donor DNA were considered, but at the PPP1R12C locus, the HDR rate of SpCas9 with its optimal ssODN template was significantly higher than that of LbCpf1 with its optimal donor template (P < 0.05, Student’s t-test). Taken together, our results indicate that both SpCas9 and LbCpf1 may be used for ssODN-mediated editing, but the strand preferences of both the genomic locus and the enzyme, as well as the structure of the donor template, need to be carefully considered.
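Which end of a donor counts as PAM-proximal depends on the enzyme and on the donor's orientation. The sketch below builds asymmetric donors from a reference written as the enzyme's non-target (PAM-bearing) strand; the parameter names, the assumption that the site is inserted without deletion, and the reference sequence are hypothetical, and it does not model a single donor shared between Cas9 and Cpf1.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def asymmetric_donor(nt_ref, insert_at, proximal, distal, site="TCTAGA",
                     orientation="NT", pam_3prime=True):
    """Asymmetric ssODN with `proximal`/`distal` nt of homology on the PAM-
    proximal/PAM-distal side of the insertion point.

    nt_ref is written 5'->3' as the enzyme's non-target (PAM-bearing) strand.
    pam_3prime=True: PAM lies 3' of the insertion (Cas9-like, NGG);
    pam_3prime=False: PAM lies 5' of it (Cpf1-like, TTTN).
    orientation "NT" returns the non-target strand sequence, "T" its reverse
    complement."""
    left, right = (distal, proximal) if pam_3prime else (proximal, distal)
    if insert_at - left < 0 or insert_at + right > len(nt_ref):
        raise ValueError("reference too short for the requested arms")
    donor = nt_ref[insert_at - left:insert_at] + site + nt_ref[insert_at:insert_at + right]
    if orientation == "NT":
        return donor
    if orientation == "T":
        return reverse_complement(donor)
    raise ValueError("orientation must be 'NT' or 'T'")


ref = "A" * 100 + "C" * 100  # hypothetical reference, insertion point at 100
d = asymmetric_donor(ref, 100, proximal=77, distal=37)
print(len(d), d[:37] == "A" * 37, d[43:] == "C" * 77)  # → 120 True True
```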
Enhancement of error-prone repair with long single-stranded DNA
Our deep sequencing data also allowed us to examine the cleavage efficiencies of the Cas enzymes in the presence of various types of donor DNA. Overall, symmetric ssODNs with homology arms of 17 to 47 nt (corresponding to single-stranded DNA of 40 to 100 nt) did not significantly affect the frequency of indel formation (Additional file 1: Figures S20, S24, S26). However, the rate of such error-prone repair outcomes tended to increase when we used the longer asymmetric ssODNs, whose total length was 120 nt. This increase was observed at both the CACNA1D (Fig. 6a, b) and the PPP1R12C (Fig. 6c, d) loci for all the Cas nucleases and appeared to be more pronounced for ssODNs with a longer PAM-proximal homology arm (Fig. 6b, d). Moreover, the higher indel frequencies were unlikely to account for the increased HDR rates achieved with optimized ssODN donor templates (Fig. 5 and Additional file 1: Figure S25), because suboptimal asymmetric donors that decreased HDR rates could also boost indel formation. We further noted that a previous study found that even non-homologous 127-mer single-stranded DNA could stimulate gene disruption by SpCas9 [33]. Collectively, our results suggest a strategy for enhancing the efficiency of gene knockout: introducing a long ssODN donor that contains a frameshift or a nonsense mutation flanked by asymmetric homology arms, so that the target gene can be inactivated not only through a stimulated error-prone repair pathway but also through HDR using an optimized single-stranded DNA donor.
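The ssODN lengths quoted above follow from simple addition, assuming that the 6-nt restriction site (XbaI, TCTAGA; HindIII, AAGCTT) is inserted between the two homology arms without deleting reference bases:

$$
\begin{aligned}
L_{\text{ssODN}} &= L_{\text{arm},1} + L_{\text{site}} + L_{\text{arm},2}, \qquad L_{\text{site}} = 6\ \text{nt}\\
17 + 6 + 17 &= 40\ \text{nt} \quad (\text{symmetric, 17-nt arms})\\
47 + 6 + 47 &= 100\ \text{nt} \quad (\text{symmetric, 47-nt arms})\\
37 + 6 + 77 &= 120\ \text{nt} \quad (\text{asymmetric, 37- and 77-nt arms})
\end{aligned}
$$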
Performance of CRISPR-Cas in HDR-mediated genome editing with plasmid donor
Finally, we asked how well SpCas9 would perform against AsCpf1 and LbCpf1 in HDR-mediated genome editing with a linearized plasmid donor, which is commonly used to integrate a protein tag into an endogenous locus. Here, we aimed to fuse enhanced green fluorescent protein (eGFP) to the C-terminus of CLTA and GLUL, which were selected because the SpCas9 and Cpf1 nucleases could theoretically cleave both genes at overlapping sites close to the translation stop codon (Fig. 7a, b). FACS analysis revealed that Cpf1 did not give a higher rate of eGFP integration than SpCas9 once differences in cleavage efficiency (Fig. 7c) were taken into account. For CLTA, the relative HDR efficiencies of the three Cas endonucleases paralleled their relative cleavage efficiencies in T7E1 assays. For GLUL, SpCas9 exhibited a significantly higher knock-in rate than both AsCpf1 and LbCpf1. Relative to AsCpf1, this can be explained by the significantly poorer cleavage of AsCpf1 at this target site (P < 0.05, Student’s t-test); in addition, the blunt cut created by SpCas9 lies nearer to the stop codon overall than the staggered cut created by Cpf1, which may also contribute, since CRISPR-facilitated gene tagging is known to be more efficient closer to the break site. Similar results were obtained when we varied the amount of donor plasmid between 300 and 900 ng (Additional file 1: Figure S27). We further confirmed by PCR the correct integration of eGFP into the CLTA and GLUL genomic loci regardless of the Cas nuclease used (Additional file 1: Figure S28). Collectively, our results indicate that SpCas9 performs favorably compared with the Cpf1 enzymes in precision genome engineering when linearized plasmids are used as donor templates.
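The cut-site argument can be made concrete with the commonly reported cleavage geometries: SpCas9 cuts bluntly 3 bp 5′ of the NGG PAM, whereas Cpf1 cuts after protospacer position 18 on the PAM-bearing strand and after position 23 on the complementary strand. The sketch assumes those canonical positions, places both PAMs on one strand, and uses hypothetical function names and coordinates; actual distances depend on the CLTA and GLUL site layouts shown in Fig. 7a, b.

```python
def spcas9_cut(pam_start: int) -> list:
    """SpCas9: blunt cut 3 bp 5' of the NGG PAM. Positions are 0-based
    'between-base' coordinates on the PAM-bearing strand (a cut at i falls
    between bases i-1 and i)."""
    return [pam_start - 3]


def cpf1_cuts(pam_start: int, pam_len: int = 4) -> list:
    """Cpf1: staggered cut, commonly reported after protospacer base 18 on the
    PAM-bearing strand and after base 23 on the complementary strand."""
    protospacer_start = pam_start + pam_len
    return [protospacer_start + 18, protospacer_start + 23]


def distance_to(cuts, position: int) -> int:
    """Smallest distance (bp) from any strand's cut to a position of interest,
    e.g. the planned insertion point just upstream of the stop codon."""
    return min(abs(c - position) for c in cuts)


# Hypothetical coordinates on one strand: Cpf1 TTTN at 10, SpCas9 NGG at 50,
# insertion point (immediately 5' of the stop codon) at 46.
stop = 46
print(distance_to(spcas9_cut(50), stop), distance_to(cpf1_cuts(10), stop))  # → 1 9
```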