NmeCas9 is an intrinsically high-fidelity genome-editing platform

Background The development of CRISPR genome editing has transformed biomedical research. Most applications reported thus far rely upon the Cas9 protein from Streptococcus pyogenes SF370 (SpyCas9). With many RNA guides, wildtype SpyCas9 can induce significant levels of unintended mutations at near-cognate sites, necessitating substantial efforts toward the development of strategies to minimize off-target activity. Although the genome-editing potential of thousands of other Cas9 orthologs remains largely untapped, it is not known how many will require similarly extensive engineering to achieve single-site accuracy within large genomes. In addition to its off-targeting propensity, SpyCas9 is encoded by a relatively large open reading frame, limiting its utility in applications that require size-restricted delivery strategies such as adeno-associated virus vectors. In contrast, some genome-editing-validated Cas9 orthologs are considerably smaller and therefore better suited for viral delivery. Results Here we show that wildtype NmeCas9, when programmed with guide sequences of the natural length of 24 nucleotides, exhibits a nearly complete absence of unintended editing in human cells, even when targeting sites that are prone to off-target activity with wildtype SpyCas9. We also validate at least six variant protospacer adjacent motifs (PAMs), in addition to the preferred consensus PAM (5′-N4GATT-3′), for NmeCas9 genome editing in human cells. Conclusions Our results show that NmeCas9 is a naturally high-fidelity genome-editing enzyme and suggest that additional Cas9 orthologs may prove to exhibit similarly high accuracy, even without extensive engineering. Electronic supplementary material The online version of this article (10.1186/s13059-018-1591-1) contains supplementary material, which is available to authorized users.


Background
Over the past decade, clustered, regularly interspaced, short palindromic repeats (CRISPRs) have been revealed as genomic sources of small RNAs (CRISPR RNAs (crRNAs)) that specify genetic interference in many bacteria and most archaea [1][2][3]. CRISPR sequences include "spacers," which often match sequences of previously encountered invasive nucleic acids such as phage genomes and plasmids. In conjunction with CRISPR-associated (Cas) proteins, crRNAs recognize target nucleic acids (DNA, RNA, or both, depending on the system) by base pairing, leading to their destruction. The primary natural function of CRISPR-Cas systems is to provide adaptive immunity against phages [4,5] and other mobile genetic elements [6]. CRISPR-Cas systems are divided into two main classes: class 1, with large, multi-subunit effector complexes, and class 2, with single-protein-subunit effectors [7]. Both CRISPR-Cas classes include multiple types based primarily on the identity of a signature effector protein. Within class 2, the type II systems are the most abundant and the best characterized. The interference function of type II CRISPR-Cas systems requires the Cas9 protein, the crRNA, and a separate non-coding RNA known as the trans-activating crRNA (tracrRNA) [8][9][10]. Successful interference also requires that the DNA target (the "protospacer") be highly complementary to the spacer portion of the crRNA and that the PAM consensus be present at neighboring base pairs [11,12].
Following the discovery that type II interference occurs via double-strand breaks (DSBs) in the DNA target [9], the Cas9 protein was shown to be the only Cas protein required for type II interference, to be manually reprogrammable via engineered CRISPR spacers, and to be functionally portable between species that diverged billions of years ago [10]. Biochemical analyses with purified Cas9 confirmed its role as a crRNA-guided, programmable nuclease that induces R-loop formation between the crRNA and one dsDNA strand, and that cleaves the crRNA-complementary and noncomplementary strands with its HNH and RuvC domains, respectively [13,14]. In vitro cleavage reactions also showed that the tracrRNA is essential for DNA cleavage activity and that the naturally separate crRNA and tracrRNA could retain function when fused into a single-guide RNA (sgRNA) [14]. Several independent reports then showed that the established DSB-inducing activity of Cas9 could be elicited not only in vitro but also in living cells, both bacterial [15] and eukaryotic [16][17][18][19][20]. As with earlier DSB-inducing systems [21], cellular repair of Cas9-generated DSBs by either non-homologous end joining (NHEJ) or homology-directed repair (HDR) enabled live-cell targeted mutagenesis, and the CRISPR-Cas9 system has now been widely adopted as a facile genome-editing platform in a wide range of organisms [22][23][24]. In addition to genome editing, catalytically inactivated Cas9 ("dead" Cas9, dCas9) retains its sgRNA-guided DNA binding function, enabling fused or tethered functionalities to be delivered to precise genomic loci [25,26]. Similar RNA-guided tools for genome manipulations have since been developed from type V CRISPR-Cas systems that use the Cas12a (formerly Cpf1) enzyme [27].
Type II CRISPR-Cas systems are currently grouped into three subtypes (II-A, II-B, and II-C) [7,28]. The vast majority of Cas9 characterization has been done on a single type II-A ortholog, SpyCas9, in part due to its consistently high genome-editing activity. SpyCas9's sgRNAs typically contain a 20-nt guide sequence (the spacer-derived sequence that base pairs to the DNA target [8,14]). The PAM requirement for SpyCas9 is 5′-NGG-3′ (or, less favorably, 5′-NAG-3′), after the 3′ end of the protospacer's crRNA-noncomplementary strand [8,14]. Based on these and other parameters, many sgRNAs directed against potentially targetable sites in a large eukaryotic genome also have near-cognate sites available to it that lead to unintended, "off-target" editing. Indeed, off-target activity by SpyCas9 has been well-documented with many sgRNA-target combinations [29,30], prompting the development of numerous approaches to limit editing activity at unwanted sites [31][32][33][34][35][36]. Although these strategies have been shown to minimize off-targeting to various degrees, they do not always abolish it and they can also reduce on-target activity, at least with some sgRNAs. Furthermore, each of these approaches has required extensive testing, validation, and optimization and in some cases [33,37,38] depended heavily upon prior high-resolution structural characterization [39][40][41][42].
Thousands of other Cas9 orthologs have been documented [7,28,43,44], providing tremendous untapped potential for additional genome-editing capabilities beyond those offered by SpyCas9. Many Cas9 orthologs will provide distinct PAM specificities, increasing the number of targetable sites in any given genome. Many pair-wise Cas9 combinations also have orthogonal guides that load into one ortholog but not the other, facilitating multiplexed applications [44][45][46]. Finally, some Cas9 orthologs (especially those from subtype II-C) are hundreds of amino acids smaller than the 1368 amino acid SpyCas9 [7,43,44] and are therefore more amenable to combined Cas9/sgRNA delivery via a single-size-restricted vector such as adeno-associated virus (AAV) [47,48]. Finally, there may be Cas9 orthologs that exhibit additional advantages such as greater efficiency, natural hyper-accuracy, distinct activities, reduced immunogenicity, or novel means of control over editing. Deeper exploration of the Cas9 population could therefore enable expanded or improved genome engineering capabilities.
We have used N. meningitidis (strain 8013) as a model system for the interference functions and mechanisms of type II-C CRISPR-Cas systems [49][50][51][52]. In addition, we and others previously reported that the type II-C Cas9 ortholog from N. meningitidis (NmeCas9) can be applied as a genome engineering platform [46,53,54]. At 1082 amino acids, NmeCas9 is 286 residues smaller than Spy-Cas9, making it nearly as compact as SauCas9 (1053 amino acids) and well within range of all-in-one AAV delivery. Its spacer-derived guide sequences are longer (24 nt) than those of most other Cas9 orthologs [51], and like SpyCas9, it cleaves both DNA strands between the third and fourth nucleotides of the protospacer (counting from the PAM-proximal end). NmeCas9 also has a longer PAM consensus (5′-N 4 GATT-3′, after the 3′ end of the protospacer's crRNA-noncomplementary strand) [44,46,[51][52][53][54], leading to a lower density of targetable sites compared to SpyCas9. Considerable variation from this consensus is permitted during bacterial interference [46,52], and a smaller number of variant PAMs can also support targeting in mammalian cells [53,54]. Unlike SpyCas9, NmeCas9 has been found to cleave the DNA strand of RNA-DNA hybrid duplexes in a PAM-independent fashion [52,55] and can also catalyze PAM-independent, spacer-directed cleavage of RNA [56]. Recently, natural Cas9 inhibitors (encoded by bacterial mobile elements) have been identified and validated in N. meningitidis and other bacteria with type II-C systems, providing for genetically encodable off-switches for NmeCas9 genome editing [57,58]. These "anti-CRISPR" (Acr) proteins [59] enable temporal, spatial, or conditional control over the NmeCas9 system. Natural inhibitors of type II-A systems have also been discovered in Listeria monocytogenes [60] and Streptococcus thermophilus [61], some of which are effective at inhibiting SpyCas9.
The longer PAM consensus, longer guide sequence, or enzymological properties of NmeCas9 could result in a reduced propensity for off-targeting, and targeted deep sequencing at bioinformatically predicted near-cognate sites is consistent with this possibility [54]. A high degree of genome-wide specificity has also been noted for the dNmeCas9 platform [62]. However, the true, unbiased accuracy of NmeCas9 is not known, since empirical assessments of genome-wide off-target editing activity (independent of bioinformatics prediction) have not been reported for this ortholog. Here we define and confirm many of the parameters of NmeCas9 editing activity in mammalian cells including PAM sequence preferences, guide length limitations, and off-target profiles. Most notably, we use two empirical approaches (GUIDE-seq [63] and SITE-Seq™ [64]) to define Nme-Cas9 off-target profiles and find that NmeCas9 is a high-fidelity genome-editing platform in mammalian cells, with far lower levels of off-targeting than SpyCas9. These results further validate NmeCas9 as a genome engineering platform and suggest that continued exploration of Cas9 orthologs could identify additional RNA-guided nucleases that exhibit favorable properties, even without the extensive engineering efforts that have been applied to SpyCas9 [31,34,35].

Co-expressed sgRNA increases NmeCas9 accumulation in mammalian cells
Previously, we demonstrated that NmeCas9 (derived from N. meningitidis strain 8013 [51]) can efficiently edit chromosomal loci in human stem cells using either dual RNAs (crRNA + tracrRNA) or a sgRNA [53]. To further define the efficacy and requirements of NmeCas9 in mammalian cells, we first constructed an all-in-one plasmid (pEJS15) that delivers both NmeCas9 protein and a sgRNA in a single transfection vector, similar to our previous all-in-one dual-RNA plasmid (pSimple-Cas9-Tracr-crRNA; Addgene #47868) [53]. The pEJS15 plasmid expresses NmeCas9 fused to a C-terminal single-HA epitope tag and nuclear localization signal (NLS) sequences at both N-and C-termini under the control of the elongation factor-1α (EF1α) promoter. The sgRNA cassette (driven by the U6 promoter) includes two BsmBI restriction sites that are used to clone a spacer of interest from short, synthetic oligonucleotide duplexes. First, we cloned three different bacterial spacers (spacers 9, 24, and 25) from the endogenous N. meningitidis CRISPR locus (strain 8013) [51,52] to express sgRNAs that target protospacer (ps) 9, ps24 or ps25, respectively (Additional file 1: Figure S1A). None of these protospacers have cognate targets in the human genome. We also cloned a spacer sequence to target an endogenous genomic NmeCas9 target site (NTS) from chromosome 10 that we called NTS3 ( Table 1). Two of the resulting all-in-one plasmids (spa-cer9/sgRNA and NTS3/sgRNA), as well as a plasmid lacking the sgRNA cassette, were transiently transfected into HEK293T cells for 48 h, and NmeCas9 expression was assessed by anti-HA western blot (Fig. 1a). As a positive control, we also included a sample transfected with a SpyCas9-expressing plasmid (triple-HA epitopetagged, and driven by the cytomegalovirus (CMV) promoter) [65] (Addgene #69220). Full-length NmeCas9 was efficiently expressed in the presence of both sgRNAs (lanes 3 and 4). However, the abundance of the protein was much lower in the absence of sgRNA (lane 2). A different type II-C Cas9 (Corynebacterium diphtheria Cas9, CdiCas9) was shown to be dramatically stabilized by its cognate sgRNA when subjected to proteolysis in vitro [55]; if similar resistance to proteolysis occurs with NmeCas9 upon sgRNA binding, it could explain some or all of the sgRNA-dependent increase in cellular accumulation.

Efficient editing in mammalian cells by NmeCas9
To establish an efficient test system for NmeCas9 activity in mammalian cells, we used a co-transfected fluorescent reporter carrying two truncated, partially overlapping GFP fragments that are separated by a cloning site [66] into which we can insert target protospacers for NmeCas9 (Additional file 1: Figure S1B). Cleavage promotes a single-strand-annealing-based repair pathway that generates an intact GFP open reading frame (ORF), leading to fluorescence [66] that can be scored after 48 h by flow cytometry. We generated reporters carrying three validated bacterial protospacers (ps9, ps24, and ps25, as described above) [51,52] for transient cotransfection into HEK293T cells along with the corresponding NmeCas9/sgRNA constructs. Figure 1b shows that all three natural protospacers of NmeCas9 can be edited in human cells and the efficiency of GFP induction was comparable to that observed with SpyCas9 (Fig. 1b).
Next, we reprogrammed NmeCas9 by replacing the bacterially derived spacers with a series of spacers designed to target 13 human chromosomal sites (Fig. 1c, d and e) with an N 4 GATT PAM (Table 1). These sgRNAs induced insertion/deletion (indel) mutations at all sites tested, except NTS10 (Fig. 1c, lanes [23][24][25], as determined by T7 Endonuclease 1 (T7E1) digestion (Fig. 1c). The editing efficiencies ranged from 5% for NTS1B site to 47% in the case of NTS33 (Fig. 1d), though T7E1 tends to underestimate the  true frequencies of indel formation [67]. Targeted deep sequencing of PCR amplicons, which is a more quantitative readout of editing efficiency, confirmed editing with indel efficiencies ranging from~15 to 85% (Fig. 1e). These results show that NmeCas9 can induce, with variable efficiency, edits at many genomic target sites in human cells. Furthermore, we demonstrated NmeCas9 genome editing in multiple cell lines and via distinct delivery modes. Nucleofection of NmeCas9 ribonucleoprotein (RNP) (loaded with an in vitro-transcribed sgRNA) led to indel formation at three sites in K562 chronic myelogenous leukemia cells and in hTERT-immortalized human foreskin fibroblasts (gift from Dr. Job Dekker) (Fig. 1f). In addition, mouse embryonic stem cells (mESCs) and HEK293T cells were transduced with a lentivirus construct expressing NmeCas9. In these cells, transient transfection of plasmids expressing a sgRNA led to genome editing (Fig. 1f). Collectively, our results show that Nme-Cas9 can be used for genome editing in a range of human or mouse cell lines via plasmid transfection, RNP delivery, or lentiviral transduction.

PAM specificity of NmeCas9 in human cells
During native CRISPR interference in bacterial cells, considerable variation in the N 4 GATT PAM consensus is tolerated: although the G1 residue (N 4 GATT) is strictly required, virtually all other single mutations at A2 (N 4 GATT), T3 (N 4 GATT), and T4 (N 4 GATT) retain at least partial function in licensing bacterial interference [46,52]. In contrast, fewer NmeCas9 PAM variants have been validated for genome editing in mammalian cells [53,54]. To gain more insight into NmeCas9 PAM flexibility and specificity in mammalian cells, and in the context of an otherwise identical target site and an invariant sgRNA, we employed the split-GFP readout of cleavage activity (Additional file 1: Figure S1B). We introduced single-nucleotide mutations at every position of the PAM sequence of ps9, as well as all double-mutant combinations of the four most permissive single mutants, and then measured the ability of NmeCas9 to induce GFP fluorescence in transfected HEK293T cells. The results are shown in Fig. 2a. As expected, mutation of the G1 residue to any other base reduced editing to background levels, as defined by the control reporter that lacks a protospacer [(no ps), see Fig. 3a]. As for mutations at the A2, T3, and T4 positions, four single mutants (N 4 GCTT, N 4 GTTT, N 4 GACT, and N 4 GATA) and two double mutants (N 4 GTCT and N 4 GACA) were edited with efficiencies approaching that observed with the N 4 GATT PAM. Two other single mutants (N 4 GAGT and N 4 GATG) and three double mutants (N 4 GCCT, N 4 GCTA, and N 4 GTTA) gave intermediate or low efficiencies, and the remaining mutants tested were at or near background levels. We note that some of the minimally functional or non-functional PAMs (e.g., N 4 GAAT and N 4 GATC) in this mammalian assay fit the functional consensus sequences defined previously in E. coli [46]. We then used T7E1 analysis to validate genome editing at eight native chromosomal sites associated with the most active PAM variants (N 4 GCTT, N 4 GTTT, N 4 GACT, N 4 GATA, N 4 GTCT, and N 4 GACA). Our results with this set of targets indicate that all of these PAM variants tested except N 4 GACA support chromosomal editing ( Fig. 2b and c). Targeted deep sequencing confirmed editing with indel efficiencies ranging from8 to 60% (except for target site NTS21, which amplified poorly with Illumina-compatible primers) (Fig. 2d).

Apo NmeCas9 is not genotoxic to mammalian cells
NmeCas9 and several other type II-C Cas9 orthologs have been shown to possess an RNA-dependent ssDNA cleavage (DNase H) activity in vitro [52,55]. R-loops (regions where an RNA strand invades a DNA duplex to form a DNA:RNA hybrid, with the other DNA strand displaced) occur naturally during transcription and other cellular processes [68]. Since DNase H activity is independent of the tracrRNA or the PAM sequence, it is theoretically possible that it could degrade naturally occurring R-loops in living cells. Global degradation of R-loops in cells could result in an increase in DNA damage detectable by increased γH2AX staining [69]. To test whether the DNase H activity of NmeCas9 could Table 1 NmeCas9 or SpyCas9 guide and target sequences used in this study. NTS, NmeCas9 target site; STS, SpyCas9 target site. The sgRNA spacer sequences (5′➔3′) are shown with their canonical lengths, and with a 5′-terminal G residue; non-canonical lengths are described in the text and figures. Target site sequences are also 5′➔3′ and correspond to the DNA strand that is non-complementary to the sgRNA, with PAM sequences underlined. The names of sites that showed at least 3% editing with NmeCas9 are indicated in bold (Continued)

NmeCas9 Site
Gene or locus Spacer Sequence Target site, with PAM  lead to an increase in γH2AX, we transduced mouse embryonic stem cells E14 (mESCs) with lentiviral plasmids expressing NmeCas9 (Addgene #120076) and dNmeCas9 (Addgene #120077) (which lacks DNase H activity [52]) to create stable cell lines expressing NmeCas9 or dNmeCas9, respectively. mESCs are ideal for this purpose as R-loops have been extensively studied in these cells and have been shown to be important for differentiation [70]. We performed γH2AX staining of these two cell lines and compared them to wildtype E14 cells. As a positive control for γH2AX induction, we exposed wildtype E14 cells to UV, a known stimulator of the global DNA damage response.
Immunofluorescence microscopy of cells expressing Nme-Cas9 or dNmeCas9 exhibited no increase in γH2AX foci compared to wildtype E14, suggesting that sustained NmeCas9 expression is not genotoxic (Additional file 1: Figure S2A). In contrast, cells exposed to UV light showed a significant increase in γH2AX levels. Flow cytometric measurements of γH2AX immunostaining confirmed these results (Additional file 1: Figure S2B). We next tested whether continuous NmeCas9 expression in human HEK293T cells has genotoxic effects. We performed γH2AX staining as above and found no difference between the wildtype cells and those expressing NmeCas9 (Additional file 1: Figure S2C). These data indicate that NmeCas9 expression does not lead to a global DNA damage response in mESCs or human cells.

Comparative analysis of NmeCas9 and SpyCas9
SpyCas9 is by far the best-characterized Cas9 ortholog and is therefore the most informative benchmark when  defining the efficiency and accuracy of other Cas9s. To facilitate comparative experiments between NmeCas9 and SpyCas9, we developed a matched Cas9 + sgRNA expression system for the two orthologs. This serves to minimize the expression differences between the two Cas9s in our comparative experiments, beyond those differences dictated by the sequence variations between the orthologs themselves. To this end, we employed the separate pCSDest2-SpyCas9-NLS-3XHA-NLS (Addgene #69220) and pLKO.1-puro-U6sgRNA-BfuA1 (Addgene #52628) plasmids reported previously for the expression of SpyCas9 (driven by the CMV promoter) and its sgRNA (driven by the U6 promoter), respectively [58,65]. We then replaced the bacterially derived SpyCas9 sequence (i.e., not including the terminal fusions) with that of Nme-Cas9 in the CMV-driven expression plasmid. This yielded an NmeCas9 expression vector (pEJS424) that is identical to that of the SpyCas9 expression vector in every way (backbone, promoters, UTRs, poly(A) signals, terminal fusions, etc.) except for the Cas9 sequence itself. Similarly, we replaced the SpyCas9 sgRNA cassette in pLKO.1-puro-U6sgRNA-BfuA1 with that of the NmeCas9 sgRNA [46,53], yielding the NmeCas9 sgRNA expression plasmid pEJS333. This matched system facilitates direct comparisons of the two enzymes' accumulation and activity during editing experiments. To assess relative expression levels of the identically tagged Cas9 orthologs, the two plasmids were transiently transfected into HEK293T cells for 48 h, and the expression of the two proteins was monitored by anti-HA western blot (Fig. 3a). Consistent with our previous data ( Fig. 1a), analyses of samples from identically transfected cells show that NmeCas9 accumulation is stronger when co-expressed with its cognate sgRNA (Fig. 3a, compare lane 6 to 4 and 5), whereas SpyCas9 is not affected by the presence of its sgRNA (lanes 1-3).
For an initial comparison of the cleavage efficiencies of the two Cas9s, we chose three previously validated SpyCas9 guides targeting the AAVS1 "safe harbor" locus [20,71] and used the CRISPRseek package [72] to design three NmeCas9 guides targeting the same locus within a region of~700 base pairs (Additional file 1: Figure S3A). The matched Cas9/sgRNA expression systems described above were used for transient transfection of HEK293T cells. T7E1 analysis showed that the editing efficiencies were comparable, with the highest efficiency being observed when targeting the NTS59 site with NmeCas9 ( Fig. 3b and Additional file 1: Figure S3B).
To provide a direct comparison of editing efficiency between the SpyCas9 and NmeCas9 systems, we took advantage of the non-overlapping PAMs of SpyCas9 and NmeCas9 (NGG and N 4 GATT, respectively). Because the optimal SpyCas9 and NmeCas9 PAMs are non-overlapping, it is simple to identify chromosomal target sites that are compatible with both orthologs, i.e., that are dual target sites (DTSs) with a composite PAM sequence of NGGNGATT that is preferred by both nucleases. In this sequence context, both Cas9s will cleave the exact same internucleotide bond (NN/NNNNGGNGATT; cleaved junction in bold, and PAM region underlined), and both Cas9s will have to contend with the exact same sequence and chromatin structural context. Furthermore, if the target site contains a G residue at position − 24 of the sgRNA-noncomplementary strand (relative to the PAM) and another at position − 20, then the U6 promoter can be used to express perfectly matched sgRNAs for both Cas9 orthologs. Four DTSs with these characteristics were used in this comparison (Additional file 1: Figure S4A). We had previously used NmeCas9 to target a site (NTS7) that happened also to match the SpyCas9 PAM consensus, so we included it in our comparative analysis as a fifth site, even though it has a predicted rG-dT wobble pair at position − 24 for the NmeCas9 sgRNA (Additional file 1: Figure S4A).
We compared the editing activities of both Cas9 orthologs programmed to target the five chromosomal sites depicted in Additional file 1: Figure S4A, initially via T7E1 digestion. SpyCas9 was more efficient than NmeCas9 at generating edits at the DTS1 and DTS8 sites (Fig. 3c, lanes 1-2 and 13-14). In contrast, Nme-Cas9 was more efficient than SpyCas9 at the DTS3 and NTS7 sites (Fig. 3c, lanes 5-6 and 17-18). Editing at DTS7 was approximately equal with both orthologs (Fig. 3c, lanes 9-10). Data from three biological replicates of all five target sites are plotted in Fig. 3d. The remainder of our comparative studies focused on DTS3, DTS7, and DTS8, as they provided examples of target sites with NmeCas9 editing efficiencies that are greater than, equal to, or lower than those of SpyCas9, respectively. The editing efficiency of these three sites was confirmed by targeted deep sequencing (see below). At all three of these sites, the addition of an extra 5′-terminal G residue had little to no effect on editing by either SpyCas9 or NmeCas9 (Additional file 1: Figure S4B). Truncation of the three NmeCas9 guides down to 20 nucleotides (all perfectly matched) again had differential effects on editing efficiency from one site to the next, with no reduction in DTS7 editing, partial reduction in DTS3 editing, and complete loss of DTS8 editing (Additional file 1: Figure S4B). These results establish the guide/target context for deeper comparative analyses of SpyCas9 and NmeCas9 indel spectra and accuracy at shared chromosomal sites.

Indel spectrum at NmeCas9-and SpyCas9-edited sites
Our targeted deep sequencing data at the three dual target sites DTS3, DTS7 and DTS8 (Fig. 4d, Additional file 1: Figure S4A and Additional file 2: Table S5) enabled us to analyze the spectrum of insertions and deletions generated by NmeCas9, in comparison with those of SpyCas9 when editing the exact same sites (Additional file 1: Figures S5B-S8). Although small deletions predominated at all three sites with both Cas9 orthologs, the frequency of insertions was lower for NmeCas9 than it was with SpyCas9 (Additional file 1: Figures S5B-S8). For both SpyCas9 and NmeCas9, the vast majority of insertions were only a single nucleotide (Additional file 1: Figure S7). The sizes of the deletions varied from one target site to the other for both Cas9 orthologs. Our data suggest that at Cas9 edits, deletions predominated over insertions and the indel size varies considerably from site to site (Additional file 1: Figures S5B, S9 and S10).
Assessing the genome-wide precision of NmeCas9 editing All Cas9 orthologs described to date have some propensity to edit off-target sites lacking perfect complementarity to the programmed guide RNA, and considerable effort has been devoted to developing strategies (mostly with SpyCas9) to increase editing specificity (reviewed in [31,34,35]). In comparison with SpyCas9, orthologs such as NmeCas9 that employ longer guide sequences and that require longer PAMs have the potential for Bioinformatic and empirical comparison of NmeCas9 and SpyCas9 off-target sites within the human genome. a Genome-wide computational (CRISPRseek) predictions of off-target sites for NmeCas9 (with N 4 GN 3 PAMs) and SpyCas9 (with NGG, NGA, and NAG PAMs) with DTS3, DTS7, and DTS8 sgRNAs. Predicted off-target sites were binned based on the number of mismatches (up to six) with the guide sequences. b GUIDE-Seq analysis of off-target sites in HEK293T cells with sgRNAs targeting DTS3, DTS7, and DTS8, using either SpyCas9 or NmeCas9, and with up to 6 mismatches to the sgRNAs. The numbers of detected off-target sites are indicated at the top of each bar. c Numbers of independent GUIDE-Seq reads for the on-and off-target sites for all six Cas9/sgRNA combinations from (b) (SpyCas9, orange; NmeCas9, blue), binned by the number of mismatches with the corresponding guide. d Targeted deep sequencing analysis of editing efficiencies at on-and off-target sites from (a) or (b) with SpyCas9 (left, orange) or NmeCas9 (right, blue). Data for off-target sites are in grey. For SpyCas9, all off-target sites were chosen from (b) based on the highest GUIDE-Seq read counts for each guide (Additional file 10: Table S3). For NmeCas9, in addition to those candidate off-target sites obtained from GUIDE-Seq (c), we also assayed one or two potential off-target sites (designated with the "-CS" suffix) predicted by CRISPRseek as the closest near-cognate matches with permissive PAMs. Data are mean values ± s.e.m. from three biological replicates performed on different days greater on-target specificity, possibly due in part to the lower density of near-cognate sequences. As an initial step in exploring this possibility, we used CRISPRseek [72] to perform a global analysis of potential NmeCas9 and Spy-Cas9 off-target sites with six or fewer mismatches in the human genome, using sgRNAs specific for DTS3, DTS7, and DTS8 (Fig. 4a) as representative queries. When allowing for permissive and semi-permissive PAMs (NGG, NGA, and NAG for SpyCas9; N 4 GHTT, N 4 GACT, N 4 GAYA, and N 4 GTCT for NmeCas9), potential off-target sites for NmeCas9 were predicted with two to three orders of magnitude lower frequency than for SpyCas9 (Table 2). These results hold true even when we relaxed the PAM requirement to the "slipped" N 5 GHTT, N 5 GACT, N 5 GAYA, and N 5 GTCT PAMs with variant spacing (analyzed because purified, recombinant NmeCas9 has been observed to catalyze DNA cleavage in vitro at such sites [52]). Furthermore, Nme-Cas9 off-target sites with fewer than five mismatches were rare (two sites with four mismatches) for DTS7, and non-existent for DTS3 and DTS8 (Table 2). Even when we relaxed the NmeCas9 PAM requirement to N 4 GN 3 , which includes some PAMs that enable only background levels of targeting (e.g., N 4 GATC (Fig. 2a)), the vast majority of predicted off-target sites (> 96%) for these three guides had five or more mismatches, and none had fewer than four mismatches among the 24 nucleotides of the spacer (Fig. 4a). In contrast, the SpyCas9 guides targeting DTS3, DTS7, and DTS8 had 49, 54, and 62 predicted off-target sites with three or fewer mismatches, respectively, among the 20 nucleotides of the spacer ( Table 2). As speculated previously [53,54], these bioinformatic predictions suggest the possibility that the NmeCas9 genome-editing system may induce very few undesired mutations, or perhaps none, even when targeting sites that induce substantial off-targeting with SpyCas9.
Although bioinformatic predictions of off-targeting can be useful, it is well established that off-target profiles must be defined experimentally in a prediction-independent fashion due to our limited understanding of target specificity determinants, and the corresponding inability of algorithms to predict all possible sites successfully [31,34,35]. The need for empirical off-target profiling is especially acute with Cas9 orthologs that are far less thoroughly characterized than SpyCas9. A previous report used PCR amplification and high-throughput sequencing to detect the frequencies of mutations at 15-20 predicted NmeCas9 off-target sites for each of three guides in human cells and found only background levels of indels in all cases, suggesting a very high degree of precision for NmeCas9 [54]. However, this report restricted its analysis to candidate sites with N 4 GNTT PAMs and three or fewer mismatches (or two mismatches combined with a 1 nucleotide bulge) in the PAM-proximal 19 nucleotides, leaving open the possibility that legitimate off-target sites that did not fit these specific criteria remained unexamined. Accordingly, empirical and minimally biased off-target profiles have never been generated for any NmeCas9/sgRNA combination, and the true off-target propensity of NmeCas9 therefore remains unknown. At the time we began this work, multiple methods for prediction-independent detection of off-target sites had been reported including GUIDE-seq, BLESS, Digenome-Seq, HTGTS, and IDLV capture, each with their own advantages and disadvantages (reviewed in [31,34,35]); additional methods (SITE-Seq [64], CIRCLE-seq [73], and BLISS [74]) have been reported more recently. Initially, we chose to apply GUIDE-seq [63], which takes advantage of oligonucleotide incorporation into double-strand break sites, for defining the off-target profiles of both SpyCas9 and NmeCas9 when each is programmed to edit the DTS3, DTS7, and DTS8 sites (Fig. 3c-d) in the human genome. Table 2 Number of predicted near-cognate sites in the human genome for the three dual target sites (DTS3, DTS7 and DTS8) analyzed in this study. These potential off-target sites differ from the on-target site by six or fewer mismatches, as listed on the left, and include the functional or semi-functional PAMs shown at the top  After confirming that the co-transfected doublestranded oligodeoxynucleotide (dsODN) was incorporated efficiently at the DTS3, DTS7, and DTS8 sites during both NmeCas9 and SpyCas9 editing (Additional file 1: Figure S4C), we then prepared GUIDE-seq libraries for each of the six editing conditions, as well as for the negative control conditions (i.e., in the absence of any sgRNA) for both Cas9 orthologs. The GUIDE-seq libraries were then subjected to high-throughput sequencing, mapped, and analyzed as described [75] (Fig. 4b-c). On-target editing with these guides was readily detected by this method, with the number of independent reads ranging from a low of 167 (NmeCas9, DTS8) to a high of 1834 (NmeCas9, DTS3) ( Fig. 4c and Additional file 3: Table S2).
For our initial analyses, we scored candidate sites as true off-targets if they yielded two or more independent reads and had six or fewer mismatches with the guide, with no constraints placed on the PAM match at that site. For SpyCas9, two of the sgRNAs (targeting DTS3 and DTS7) induced substantial numbers of off-target editing events (271 and 54 off-target sites, respectively (Fig. 4b)) under these criteria. The majority of these Spy-Cas9 off-target sites (88% and 77% for DTS3 and DTS7, respectively) were associated with a canonical NGG PAM. Reads were very abundant at many of these loci and at five off-target sites (all with the DTS3 sgRNA) even exceeded the number of on-target reads (Fig. 4c). SpyCas9 was much more precise with the DTS8 sgRNA: we detected a single off-target site with five mismatches and an NGG PAM, and it was associated with only three independent reads, far lower than the 415 reads that we detected at the on-target site ( Fig. 4c and Additional file 3: Table S2). Overall, the range of editing accuracies that we measured empirically for Spy-Cas9-very high (e.g., DTS8), intermediate (e.g., DTS7), and poor (e.g., DTS3)-are consistent with the observations of other reports using distinct guides (reviewed in [31,34,35]).
In striking contrast, GUIDE-seq analyses with Nme-Cas9, programmed with sgRNAs targeting the exact same three sites, yielded off-target profiles that were exceptionally specific in all cases (Fig. 4b-c). For DTS3 and DTS8, we found no reads at any site with six or fewer guide mismatches; for DTS7, we found one off-target site with four mismatches (three of which were at the PAM-distal end; see Additional file 3: Table S2), and even at this site, there were only 12 independent reads,~100× fewer than the 1222 reads detected at DTS7 itself. This off-target site was also associated with a PAM (N 4 GGCT) that would be expected to be poorly functional, though it could also be considered a "slipped" PAM with a more optimal consensus but variant spacing (N 5 GCTT). To explore the off-targeting potential of NmeCas9 further, we decreased the stringency of our mapping to allow detection of off-target sites with up to 10 mismatches. Even in these conditions, only four (DTS7), 15 (DTS8), and 16 (DTS3) candidate sites were identified, most of which had only four or fewer reads (Fig. 4c) and were associated with poorly functional PAMs (Additional file 3: Table S2). We consider it likely that most if not all of these low-probability candidate off-target sites represent background noise caused by spurious priming and other sources of experimental error.
As an additional test of off-targeting potential, we repeated the DTS7 GUIDE-seq experiments with both SpyCas9 and NmeCas9, but this time using a different transfection reagent (Lipofectamine3000 rather than Polyfect). These repeat experiments revealed that > 96% (29 out of 30) of off-target sites with up to five mismatches were detected under both transfection conditions for SpyCas9 (Additional file 4: Table S1). However, the NmeCas9 GUIDE-seq data showed no overlap between the potential sites identified under the two conditions, again suggesting that the few off-target reads that we did observe are unlikely to represent legitimate off-target editing sites.
To confirm the validity of the off-target sites defined by GUIDE-seq, we designed primers flanking candidate off-target sites identified by GUIDE-seq, PCR-amplified those loci following standard genome editing (i.e., in the absence of co-transfected GUIDE-seq dsODN) (3 biological replicates), and then subjected the PCR products to high-throughput sequencing to detect the frequencies of Cas9-induced indels. For this analysis, we chose the top candidate off-target sites (as defined by GUIDE-seq read count) for each of the six cases (DTS3, DTS7, and DTS8, each edited by either SpyCas9 or NmeCas9). In addition, due to the low numbers of off-target sites and the low off-target read counts observed during the NmeCas9 GUIDE-seq experiments, we analyzed the top two predicted off-target sites for the three NmeCas9 sgRNAs, as identified by CRISPRseek ( Fig. 4a and Table 2) [72]. On-target indel formation was detected in all cases, with editing efficiencies ranging from 7% (DTS8, with both SpyCas9 and Nme-Cas9) to 39% (DTS3 with NmeCas9) (Fig. 4d). At the off-target sites, our targeted deep-sequencing analyses largely confirmed our GUIDE-seq results: SpyCas9 readily induced indels at most of the tested off-target sites when paired with the DTS3 and DTS7 sgRNAs, and in some cases, the off-target editing efficiencies approached those observed at the on-target sites (Fig. 4d). Although some SpyCas9 off-targeting could also be detected with the DTS8 sgRNA, the frequencies were much lower (< 0.1% in all cases). Off-target edits induced by NmeCas9 were far less frequent in all cases, even with the DTS3 sgRNA that was so efficient at on-target mutagenesis: many off-target sites exhibited editing efficiencies that were indistinguishable from background sequencing error rates (Fig. 4d). These results, in combination with the GUIDE-seq analyses described above, reveal wildtype NmeCas9 to be an exceptionally precise genome-editing enzyme.
To explore NmeCas9 editing accuracy more deeply, we used 16 NmeCas9 target sites among the 24 sites across the genome we tested earlier, 10 with canonical N 4 GATT PAMs and six with variant functional PAMs (Additional file 5: Table S9). We then performed GUIDE-seq analyses of NmeCas9 editing at these sites. GUIDE-seq analysis readily revealed editing at each of these sites, with on-target read counts ranging from~100 to~5000 reads (Fig. 5a) confirming the on-target editing shown earlier by T7E1 assay and deep sequencing analyses (Fig. 1c-e and Fig. 2b-d). Most notably, off-target reads were undetectable by GUIDE-seq with 14 out of the 16 sgRNAs (Fig. 5b).
The two guides with off-target activity (NTS1C and NTS25) had only two and one off-target sites, respectively  Figure S11). Off-target editing was confirmed by high-throughput sequencing and analysis of indels (Fig. 5d). Compared with the on-target site (perfectly matched at all positions other than the 5′-terminal guide nucleotide, and with an optimal N 4 GATT PAM), the efficiently targeted NTS1C-OT1 had two wobble pairs and one mismatch (all in the nine PAM-distal nucleotides), as well as a canonical N 4 GATT PAM (Fig. 5c and Additional file 3: Table S2). The weakly edited NTS1C-OT2 site had only a single mismatch (at the 11th nucleotide, counting in the PAMdistal direction), but was associated with a non-canonical N 4 GGTT (or a "slipped" N 5 GTTT) PAM ( Fig. 5c and Additional file 3: Table S2). NTS25 with an N 4 GATA PAM was the other guide with a single off-target site (NTS25-OT1), where NmeCas9 edited up to~1000× less efficiently than at the on-target site (Fig. 5d). This minimal amount of off-target editing arose despite the association of NTS25-OT1 with an optimal N 4 GATT PAM, unlike the variant N 4 GATA PAM that flanks the on-target site. Overall, our GUIDE-seq and sequencing-based analyses demonstrate that NmeCas9 genome editing is exceptionally accurate: we detected and confirmed cellular off-target editing with only two of the 19 guides tested, and even in those two cases, only one or two off-target sites could be found for each. Furthermore, of the three bona fide off-target sites that we identified, only one generated indels at substantial frequency (11.6%); indel frequencies were very modest (0.3% or lower) at the other two off-target sites.
We next sought to corroborate and expand on our GUIDE-seq results with a second prediction-independent method. We applied the SITE-Seq assay, a biochemical method that does not rely on cellular events such as DNA repair, thus potentially enabling a more thorough profiling of genome-wide specificity [64]. SITE-Seq libraries were prepared for the three dual target sites with both Cas9 orthologs as well as for 12 of the NmeCas9-only target sites. SITE-Seq was performed on HEK293T genomic DNA (gDNA) treated with a range of RNP concentrations (4-256 nM) previously shown to discriminate high-and low-probability cellular off-targets [64]. Finally, the resulting libraries were sequenced, aligned, and then analyzed as previously described [64].
Negative controls without RNP recovered zero sites across any concentrations, whereas SpyCas9 assembled with sgRNAs targeting DTS3, DTS7, or DTS8 recovered hundreds (at 4 nM RNP) to thousands (at 256 nM RNP) of biochemical off-target sites (Fig. 5e). In contrast, Nme-Cas9 assembled with sgRNAs targeting the same three sites recovered only their on-target sites at 4 nM RNP and at most 29 off-target sites at 256 nM RNP (Fig. 5e). Moreover, the 12 additional NmeCas9 target sites showed similarly high specificity: eight samples recovered only the on-target sites at 4 nM RNP and six of those recovered no more than nine off-targets at 256 nM RNP (Additional file 1: Figure S5A). Across NmeCas9 RNPs, off-target sequence mismatches appeared enriched in the 5′ end of the sgRNA target sequence (Additional file 6: Table S4). Finally, three of the NmeCas9 RNPs (NTS30, NTS4C, and NTS59) required elevated concentrations to retrieve their on-targets, potentially due to poor sgRNA transcription and/or RNP assembly. These RNPs were therefore excluded from further analysis.
We next performed cell-based validation experiments to investigate whether any of the biochemical off-targets were edited in cells. Since NmeCas9 recovered only~100 biochemical off-targets across all RNPs and concentrations, we could examine each site for editing in cells. SpyCas9 generated > 10,000 biochemical off-targets across all DTS samples, preventing comprehensive cellular profiling. Therefore, for each RNP, we randomly selected 95 high cleavage sensitivity SITE-Seq sites (i.e., recovered at all concentrations tested in SITE-Seq) for examination, as we predicted those were more likely to accumulate edits in cells [64] (Additional file 2: Table S5). Notably, only a subset of the sites validated from GUIDE-seq were contained within this list of sites (1/8 and 5/8 overlapping sites for DTS3 and DTS7, respectively). SITE-Seq and GUIDE-seq validations were performed on the same gDNA samples to facilitate comparisons between data sets. Across all NmeCas9 RNPs, only three cellular off-targets were observed. These three all belonged to the NTS1C RNP, and two of them had also been detected previously with GUIDE-seq. All high cleavage sensitivity SITE-Seq sites (i.e., all on-targets and the single prominent NTS1C off-target, NTS1C-OT1) showed editing in cells. Conversely, SITE-Seq sites with low cleavage sensitivity, defined as being recovered at only 64 nM and/or 256 nM RNP, were rarely found as edited (2/93 sites). Importantly, this suggests that we identified all or the clear majority of NmeCas9 cellular off-targets, albeit at our limit of detection. Across all SpyCas9 RNPs, 14 cellular off-targets were observed (8/70 sites for DTS3, 6/83 sites for DTS7, and 0/79 sites for DTS8) [Additional file 2: Table S5; not all 95 amplicons were included in the final analysis as some were filtered due to low read coverage or high variant calls in the untreated sample (see materials and methods for more details)]. Since our data set was only a subset of the total number of high cleavage sensitivity SITE-Seq sites and excluded many of the GUIDE-seq validated sites, we expect that sequencing all SITE-Seq sites may uncover additional cellular off-targets. Taken together, these data corroborate our GUIDE-seq results, suggesting that NmeCas9 can serve as a highly specific genome editing platform.

Functionality of truncated sgRNAs with NmeCas9
SpyCas9 can accommodate limited variation in the length of the guide region (normally 20 nucleotides) of its sgRNAs [76][77][78][79], and sgRNAs with modestly lengthened (22-nt) or shortened (17-18-nt) guide regions can even enhance editing specificity by reducing editing at off-target sites by a greater degree than they affect editing at the on-target site [76,77]. To test the length dependence of the NmeCas9 guide sequence (normally 24 nucleotides [51]) during mammalian editing, we constructed a series of sgRNAs containing 18,19,20,21,22,23, and 24 nucleotides of complementarity to ps9 cloned into the split-GFP reporter plasmid (Additional file 1: Figure S12A). All designed guides started with two guanine nucleotides (resulting in 1-2 positions of target non-complementarity at the very 5′ end of the guide) to facilitate transcription, analogous to the SpyCas9 "GGN 20 " sgRNAs [76]. We then measured the abilities of these sgRNAs to direct NmeCas9 cleavage of the reporter in human cells. sgRNAs that have 20-23 nucleotides of target complementarity showed activities comparable to the sgRNA with the natural 24 nucleotides of complementarity, whereas sgRNAs containing 18 or 19 nucleotides of complementarity show lower activity (Fig. 6a).
We next used a native chromosomal target site (NTS33 in VEGFA, as in Fig. 1c-e) to test the editing efficiency of NmeCas9 spacers of varying lengths (Additional file 1: Figure S12B). sgRNA constructs included one or two 5′-terminal guanine residues to enable transcription by the U6 promoter, sometimes resulting in 1-2 nucleotides of target non-complementarity at the 5′ end of the guide sequence. sgRNAs with 20, 21, or 22 nucleotides of target complementarity (GGN 18 , GGN 19 , and GGN 20 , respectively) performed comparably to the natural guide length (24 nucleotides of complementarity, GN 23 ) at this site (Fig. 6b-c), and within this range, the addition of 1-2 unpaired G residues at the 5′ end had no adverse effect. These results are consistent with the results obtained with the GFP reporter (Fig. 6a). sgRNAs with guide lengths of 19 nucleotides or shorter, along with a single mismatch in the first or second position (GGN 17 , GGN 16 , and GGN 15 ), did not direct detectable editing, nor did a sgRNA with perfectly matched guide sequences of 17 or 14 nucleotides (GN 16 and GN 13 , respectively) (Fig. 6b-c). However, a 19-nt guide with no mismatches (GN 18 ) successfully directed editing. These results indicate that 19-26-nt guides can be tolerated by NmeCas9, but that activity can be compromised by guide truncations from the natural length of 24 nucleotides down to 17-18 nucleotides and smaller, and that single mismatches (even at or near the 5′-terminus of the guide) can be discriminated against with a 19-nt guide.
The target sites tested in Fig. 6a-c are both associated with a canonical N 4 GATT PAM. To examine length dependence at a site with a variant PAM, we varied guide sequence length at the N 4 GCTT-associated NTS32 site (also in VEGFA). In this experiment, each of the guides had two 5′-terminal G residues, accompanied by 1-2 terminal mismatches with the target sequence (Additional file 1: Figure S12C). At the NTS32 site, sgRNAs with 21-24 nucleotides of complementarity (GGN 24 , GGN 23 , GGN 22 , and GGN 21 ) supported editing, but shorter guides (GGN 20 , GGN 19 , and GGN 18 ) did not (Fig. 6d-e). We conclude that sgRNAs with 20 nucleotides of complementarity can direct editing at some sites (Fig. 6b-c) but not all (Fig. 6d-e). It is possible that this minor variation in length dependence can be affected by the presence of mismatched 5′-terminal G residues in the sgRNA, the adherence of the target to the canonical N 4 GATT PAM consensus, or both, but the consistency of any such relationship will require functional tests at much larger numbers of sites. Nonetheless, NmeCas9 guide truncations of 1-3 nucleotides appear to be functional in most cases, in agreement with the results of others [54].

Truncated sgRNAs reduce off-target cleavage by NmeCas9
Although NmeCas9 exhibits very little propensity to edit off-target sites, for therapeutic applications, it may be desirable to suppress even the small amount of off-targeting that occurs (Fig. 5). Several strategies have been developed to suppress off-targeting by SpyCas9 [31,34,35], some of which could be readily applied to other orthologs. For example, truncated sgRNAs (tru-sgRNAs) sometimes suppress off-target SpyCas9 editing more than they suppress on-target editing [77]. Because 5′-terminal truncations are compatible with NmeCas9 function (Fig. 6), we tested whether NmeCas9 tru-sgRNAs can have similar suppressive effects on off-target editing without sacrificing on-target editing efficiency.
First, we tested whether guide truncation can lead to NmeCas9 editing at novel off-target sites (i.e., at off-target sites not edited by full-length guides), as reported previously for SpyCas9 [77]. Our earlier tests of NmeCas9 on-target editing with tru-sgRNAs used guides targeting the NTS33 (Fig. 6b-c) and NTS32 (Fig. 6d-e) sites. We again used GUIDE-seq with a subset of the validated NTS32 and NTS33 tru-sgRNAs to determine whether NmeCas9 guide truncation leads to off-target editing at new sites, and found none (Additional file 1: Figure S13). Although we cannot rule out the possibility that other NmeCas9 guides could be identified that yield novel off-target events upon truncation, our results suggest that de novo off-targeting by NmeCas9 tru-sgRNAs is unlikely to be a pervasive problem.
The most efficiently edited off-target site from our previous analyses was NTS1C-OT1, providing us with our most stringent test of off-target suppression. When targeted by the NTS1C sgRNA, NTS1C-OT1 has one rG-dT wobble pair at position − 16 (i.e., at the 16th base pair from the PAM-proximal end of the R-loop), one rC-dC mismatch at position − 19, and one rU-dG wobble pair at position − 23 (Fig. 5c). We generated a series of NTS1C-targeting sgRNAs with a single 5′-terminal G (for U6 promoter transcription) and spacer complementarities ranging from 24 to 15 nucleotides (GN 24 to GN 15 , Additional file 1: Figure S14A, top panel). Conversely, we designed a similar series of sgRNAs with perfect complementarity to NTS1C-OT1 (Additional file 1: Figure S14B, top panel). Consistent with our earlier results with other target sites (Fig. 6), T7E1 analyses revealed that both sets of guides enabled editing of the perfectly-matched on-target site with truncations down to 19 nucleotides (GN 18 ), but that shorter guides were inactive. On-target editing efficiencies at both sites were comparable across the seven active guide lengths (GN 24 through GN 18 ), with the exception of slightly lower efficiencies with the GN 19 guides (Additional file 1: Figure S14A & B, middle and bottom panels).
We then used targeted deep sequencing to test whether off-target editing is reduced with the truncated sgRNAs. With both sets of sgRNAs (perfectly complementary to either NTS1C or NTS1C-OT1), we found that off-targeting at the corresponding near-cognate site persisted with the four longest guides (GN 24 , GN 23 , GN 22 , GN 21 ; Fig. 7). However, off-targeting was abolished with the GN 20 guide, without any significant reduction in on-target editing efficiencies (Fig. 7). Off-targeting was also absent with the GN 19 guide, though on-target editing efficiency was compromised. These results, albeit from a limited data set, indicate that truncated sgRNAs (especially those with 20 or 19 base pairs of guide/target complementarity, 4-5 base pairs fewer than the natural length) can suppress even the limited degree of off-targeting that occurs with NmeCas9.
Unexpectedly, even though off-targeting at NTS1C-OT1 was abolished with the GN 20 and GN 19 truncated NTS1C sgRNAs, truncating by an additional nucleotide (to generate the GN 18 sgRNA) once again yielded NTS1C-OT1 edits (Fig. 7a). This could be explained by the extra G residue at the 5′-terminus of each sgRNA in the truncation series (Additional file 1: Figure S14). With the NTS1C GN 19 sgRNA, both the 5′-terminal G residue and the adjacent C residue are mismatched with the NTS1C-OT1 site. In contrast, with the GN 18 sgRNA, the 5′-terminal G is complementary to the off-target site. In other words, with the NTS1C GN 19 and GN 18 sgRNAs, the NTS1C-OT1 off-target interactions (which are identical in the PAM-proximal 17 nucleotides) include two additional nucleotides of non-complementarity or one additional nucleotide of complementarity, respectively. Thus, the more extensively truncated GN 18 sgRNA has greater complementarity with the NTS1C-OT1 site than the GN 19 sgRNA, explaining the re-emergence of off-target editing with the former. This observation highlights the fact that the inclusion of a 5′-terminal G residue that is mismatched with the on-target site, but that is complementary to a C residue at an off-target site, can limit the effectiveness of a truncated guide at suppressing off-target editing, necessitating care in truncated sgRNA design when the sgRNA is generated by cellular transcription. This issue is not a concern with sgRNAs that are generated by other means (e.g., chemical synthesis) that do not require a 5′-terminal G. Overall, our results demonstrate that NmeCas9 genome editing is exceptionally precise, and even when rare off-target editing events occur, tru-sgRNAs can provide a simple and effective way to suppress them.

Discussion
The ability to use type II and type V CRISPR-Cas systems as RNA-programmable DNA-cleaving systems [13,14,27] is revolutionizing many aspects of the life sciences and holds similar promise for biotechnological, agricultural, and clinical applications. Most applications reported thus far have used a single Cas9 ortholog (Spy-Cas9). Thousands of additional Cas9 orthologs have also been identified [28], but only a few have been characterized, validated for genome engineering applications, or both. The development of additional orthologs promises to increase the number of targetable sites (through new PAM specificities), extend multiplexing possibilities (for pairwise combinations of Cas9 orthologs with orthogonal guides), and improve deliverability (for the more compact Cas9 orthologs). In addition, some Cas9s may show mechanistic distinctions (such as staggered vs. blunt dsDNA breaks) [80], greater protein stability in vivo, improved control mechanisms (e.g., via multiple anti-CRISPRs that act at various stages of the DNA cleavage pathway) [57,58,60,61,[81][82][83], and other enhancements. Finally, some may exhibit a greater natural propensity to distinguish between on-vs. off-target sites during genome editing applications, obviating the need Fig. 7 Guide truncation can suppress off-target editing by NmeCas9. a Editing efficiencies at the NTS1C (on-target, red) and NTS1C-OT1 (off-target, orange) genomic sites, after editing by NmeCas9 and NTS1C sgRNAs of varying lengths, as measured by PCR and high-throughput sequencing. Data are mean values ± s.e.m. from three biological replicates performed on different days. b As in (a), but using sgRNAs perfectly complementary to the NTS1C-OT1 genomic site. Two-tailed paired Student's T test showed significant difference in on-and off-target editing efficiency as a function of guide truncation. On-target editing (black asterisk) or off-target editing (red asterisk) is compared to the baseline condition GN 24 , respectively (p < 0.05 is annotated with one asterisk; p < 0.01 is annotated with two asterisks) for extensive engineering (as was necessary with SpyCas9) to attain the accuracy needed for many applications, especially therapeutic development.
Here we have further defined the properties of NmeCas9 during editing in human cells, including validation and extension of previous analyses of guide length and PAM requirements [46,53,54]. Intriguingly, the tolerance to deviations from the N 4 G(A/C)TT natural PAM consensus [51] observed in vitro and in bacterial cells [46,52] is considerably reduced in the mammalian context, i.e., fewer PAM variations are permitted during mammalian editing. The basis for this context-dependent difference is not clear but may be due in part to the ability to access targets within eukaryotic chromatin, or to the larger numbers of PAMs that must be scanned (due to the much larger genome sizes in mammalian cells), since lower SpyCas9/sgRNA concentrations (relative to potential DNA substrates) have been shown to improve accuracy [30,84,85]. We have also found that steady-state NmeCas9 levels in human cells are markedly increased in the presence of its cognate sgRNA, suggesting that sgRNA-loaded NmeCas9 is more stable than apo NmeCas9. An increased proteolytic sensitivity of apo Cas9 relative to the sgRNA-bound form has been noted previously for a different type II-C ortholog (CdiCas9 [55]).
A previous report indicated that NmeCas9 has high intrinsic accuracy, based on analyses of candidate off-target sites that were predicted bioinformatically [54]. However, the true genome-wide accuracy of NmeCas9 was not assessed empirically, as is necessary given well-established imperfections in bioinformatic predictions of off-targeting [31,34,35]. We have used GUIDE-seq [63] and SITE-Seq [64] to define the genome-wide accuracy of wildtype NmeCas9, including side-by-side comparisons with wildtype SpyCas9 during editing of identical on-target sites. We find that NmeCas9 is a consistently high-accuracy genome editor, with off-target editing undetectable above background with 17 out of 19 analyzed sgRNAs, and only one or three verified off-target edits with the remaining two guides. We observed this exquisite specificity by NmeCas9 even with sgRNAs that target sites (DTS3 and DTS7 (see Fig. 4)) that are highly prone to off-target editing when targeted with SpyCas9. Of the four off-target sites that we validated, three accumulated < 1% indels. Even with the one sgRNA that yielded a significant frequency of off-target editing (NTS1C, which induced indels at NTS1C-OT1 with approximately half the efficiency of on-target editing), the off-targeting with wildtype NmeCas9 could be easily suppressed with truncated sgRNAs. Our ability to detect NTS25-OT1 editing with GUIDE-seq, despite its very low (0.06%) editing efficiency based on high-throughput sequencing, indicates that our GUIDE-seq experiments can identify even very low-efficiency off-target editing sites.
Similar considerations apply to our SITE-Seq analyses. We observed high accuracy even when NmeCas9 is delivered by plasmid transfection, a delivery method that is associated with higher off-target editing than more transient delivery modes such as RNP delivery [86,87]. Low off-target activity even with sustained effector expression could be particularly useful for AAV delivery, as recently validated for NmeCas9 [88].
The two type II-C Cas9 orthologs (NmeCas9 and CjeCas9) that have been validated for mammalian genome editing and assessed for genome-wide specificity [47,54] (this work) and both have proven to be naturally hyper-accurate. Both use longer guide sequences than the 20-nucleotide guides employed by SpyCas9, and both also have longer and more restrictive PAM requirements. For both type II-C orthologs, it is not yet known whether the longer PAMs, longer guides, or both account for the limited off-target editing. Type II-C Cas9 orthologs generally cleave dsDNA more slowly than Spy-Cas9 [49,55], and it has been noted that lowering k cat can, in some circumstances, enhance specificity [89]. In keeping with the generally lower enzymatic activity, a recent report found NmeCas9 to exhibit consistently lower cellular editing activity than the two type II-A Cas9s tested (SpyCas9 and SauCas9), though most tests in that report were done with suboptimal guide lengths for NmeCas9 [90]. Whatever the mechanistic basis for the high intrinsic accuracy, it is noteworthy that it is a property of the native proteins, without a requirement for extensive engineering. This adds to the motivation to identify more Cas9 orthologs with human genome-editing activity, as it suggests that it may be unnecessary in many cases (perhaps especially among type II-C enzymes) to invest heavily in structural and mechanistic analyses and engineering efforts to attain sufficient accuracy for many applications and with many desired guides, as was done with (for example) SpyCas9 [32,33,37,38,65]. Although Cas9 orthologs with more restrictive PAM requirements (such as NmeCas9, CjeCas9, and GeoCas9) by definition will afford lower densities of potential target sites than SpyCas9 (which also usually affords the highest on-target editing efficiencies among established Cas9 orthologs), the combined targeting possibilities for multiple such Cas9s will increase the targeting options available within a desired sequence window, with little propensity for off-targeting. The continued exploration of natural Cas9 variation, especially for those orthologs with other advantages such as small size and anti-CRISPR off-switch control, therefore has great potential to advance the CRISPR genome editing revolution.

Conclusions
NmeCas9 is an intrinsically high-accuracy genome-editing enzyme in mammalian cells, and the limited off-target editing that occurs can (at least in some cases) be suppressed by guide truncation. Continued exploration of Cas9 orthologs could therefore yield additional enzymes that do not require extensive characterization and engineering to prevent off-target editing.

Plasmids
Two plasmids for the expression of NmeCas9 were used in this study. The first construct (used in Figs. 1, 2, 6 and 7) was derived from the plasmid pSimpleII where NmeCas9 was cloned under the control of the elongation factor-1α promoter, as described previously [53]. The Cas9 gene in this construct expresses a protein with two NLSs and an HA tag. To make an all-in-one expression plasmid, a fragment containing a BsmBI-crRNA cassette linked to the tracrRNA by six nucleotides, under the control of U6 RNA polymerase III promoter, was synthesized as a gene block (Integrated DNA Technologies) and inserted into pSimpleII, generating the pSimpleII-Cas9-sgRNA-BsmBI plasmid that includes all elements needed for editing (Addgene #115694). To insert specific spacer sequence into the crRNA cassette, synthetic oligonucleotides were annealed to generate a duplex with overhangs compatible with those generated by BsmBI digestion of the pSimpleII-Cas9-sgRNA-BsmBI plasmid. The insert was then ligated into the BsmBI-digested plasmid. For Figs. 3 and 4, NmeCas9 (Addgene #87448) and SpyCas9 (Addgene #69220) constructs were expressed from the pCS2-Dest Gateway plasmid under the control of the CMV IE94 promoter [91]. All sgRNAs used with pCS2-Dest-Cas9 were driven by the U6 promoter in pLKO.1-puro (Addgene #52628 and #86195) [62]. The M427 GFP reporter plasmid [66] was used as described [65].

Western blotting
Forty-eight hours after transfection, cells were harvested and lysed with 50 μl of RIPA buffer. Protein concentration was determined with the BCA kit (Thermo Scientific), and 12 μg of proteins were used for electrophoresis and blotting. The blots were probed with anti-HA (Sigma, H3663) and anti-GAPDH (Abcam, ab9485) as primary antibodies, and then with horseradish peroxidase-conjugated anti-mouse IgG (Thermoscientific, 62-6520) or anti-rabbit IgG (Biorad, 1706515) secondary antibodies, respectively. Blots were visualized using the Clarity Western ECL substrate (Biorad, 170-5060).

Flow cytometry
The GFP reporter was used as described previously [65]. Briefly, cells were harvested 48 h after transfection and used for FACS analysis (BD Accuri 6C). To minimize the effects of differences in the efficiency of transfection among samples, cells were initially gated for mCherry-expression, and the percentage of GFP-expressing cells were quantified within mCherry positive cells. All experiments were performed in triplicate with data reported as mean values with error bars indicating the standard error of the mean (s.e.m.).

Genome editing
Seventy-two hours after transfection, genomic DNA was extracted via the DNeasy Blood and Tissue kit (Qiagen), according to the manufacturer's protocol. Fifty-nanogram DNA was used for PCR amplification using primers specific for each genomic site (Additional file 5: Table  S9) with High Fidelity 2X PCR Master Mix (New England Biolabs). For T7E1 analysis, 10 μl of PCR product was hybridized and treated with 0.5 μl T7 Endonuclease I (10 U/μl, New England Biolabs) in 1X NEB Buffer 2 for 1 h. Samples were run on a 2.5% agarose gel, stained with SYBR-safe (ThermoFisher Scientific), and quantified using the ImageMaster-TotalLab program. Indel percentages are calculated as previously described [93,94]. Experiments for T7E1 analysis are performed in triplicate with data reported as mean ± s.e.m. For indel analysis by TIDE, 20 ng of PCR product is purified and then sequenced by Sanger sequencing. The trace files were subjected to analysis using the TIDE web tool (https://tide.deskgen.com).

Expression and purification of NmeCas9
NmeCas9 was cloned into the pMCSG7 vector containing a T7 promoter followed by a 6XHis tag and a tobacco etch virus (TEV) protease cleavage site. Two NLSs on the C-terminus of NmeCas9 and another NLS on the N-terminus were also incorporated (Addgene #120078). This construct was transformed into the Rosetta 2 DE3 strain of E. coli. Expression of NmeCas9 was performed as previously described for SpyCas9 [14]. Briefly, a bacterial culture was grown at 37°C until an OD600 of 0.6 was reached. At this point, the temperature was lowered to 18°C followed by addition of 1 mM Isopropyl β-D-1-thiogalactopyranoside (IPTG) to induce protein expression. Cells were grown overnight and then harvested for purification. Purification of NmeCas9 was performed in three steps: Nickel affinity chromatography, cation exchange chromatography, and size exclusion chromatography. The detailed protocols for these can be found in [14].

RNP delivery of NmeCas9
RNP delivery of NmeCas9 was performed using the Neon transfection system (ThermoFisher). Approximately 40 picomoles of NmeCas9 and 50 picomoles of sgRNA (transcribed from PCR-assembled templates in vitro with T7 RNA polymerase) were mixed in buffer R and incubated at room temperature for 30 min. This preassembled complex was then mixed with 50,000-150,000 cells, and electroporated using 10 μL Neon tips. After electroporation, cells were plated in prewarmed 24-well plates containing the appropriate culture media without antibiotics. The number of cells used and pulse parameters of electroporation were different for different cell types tested. The number of cells used was 50,000, 100,000, and 150,000 for PLB985 cells, HEK293T cells, and K562/HFF cells respectively. Electroporation parameters (voltage, width, number of pulses) were 1150 v, 20 ms, 2 pulses for HEK293T cells; 1000 v, 50 ms, 1 pulse for K562 cells; 1350 v, 35 ms, 1 pulse for PLB985 cells; and 1700 v, 20 ms, 1pulse for HFF cells.
γH2AX immunofluorescence staining and flow cytometry For immunofluorescence, mouse embryonic stem cells (mESCs) were crosslinked with 4% paraformaldehyde and stained with anti-γH2AX (LP BIO, AR-0149-200) as primary antibody and Alexa Fluor® 488 goat anti-rabbit IgG (Invitrogen, A11034) as secondary antibody. DNA was stained with DAPI. For a positive control, E14 cells were irradiated with 254 nm UV light (3 mJ/cm 2 ). Images were taken by a Nikon Eclipse E400 and representative examples were chosen.
For flow cytometry, cells were fixed with 70% ethanol, primary and secondary antibody were as described above for immunofluorescence, and DNA was stained with propidium iodide. Cells were analyzed by BD FACSCalibur. The box plot was presented with the bottom line of the box representing the first quartile, the band inside box indicating the median, the top line being the third quartile, the bottom end of whisker denoting data of first quartile minus 1.5 times of interquartile range (no less than 0), and the top end of the whisker indicating data of third quartile plus 1.5 times of interquartile. Outliers are not shown. All experiments were performed in duplicate. For genotoxicity assay in HEK293T cells, we treated positive control cells with 5 mJ/cm 2 UV irradiation. We used the same protocol described above for antibody staining. Cells were analyzed by MACS-Quant flow cytometer. Two independent experiments were performed on different days.
CRISPRseek analysis of potential off-target sites Global off-target analyses for DTS3, DTS7, and DTS8 with NmeCas9 sgRNAs were performed using the Bioconductor package CRISPRseek 1.9.1 [72] with parameter settings tailored for NmeCas9. Specifically, all parameters are set as default except the following: gRNA.size = 24, PAM = "NNNNGATT," PAM.size = 8, RNA.PAM.pattern = "NNNNGNNN$" (or "NNNNNGNNN$" for slipped PAM prediction), weights = c(0, 0, 0, 0, 0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079, 0.445, 0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583), max.mismatch = 6, allowed.mismatch.PAM = 7, topN = 10,000, min.score = 0. This setting means that all seven permissive PAM sequences (N 4 GATT, N 4 GCTT, N 4 GTTT, N 4 GACA, N 4 GACT, N 4 GATA, N 4 GTCT) were allowed and all off-targets with up to 6 mismatches were collected [the sgRNA length was changed from 20 to 24; four additional zeros were added to the beginning of the weights series to be consistent with the gRNA length of 24; and topN (the number of off-target sites displayed) and min.score (the minimum score of an off-target to be included in the output) were modified to enable identification of all off-target sites with up to 6 mismatches]. Predicted off-target sites for DTS3, DTS7, and DTS8 with SpyCas9 sgRNAs were obtained using CRISPRseek 1.9.1 default settings for SpyCas9 (with NGG, NAG, and NGA PAMs allowed). Batch scripts for high-performance computing running the IBM LSF scheduling software are included in the supplemental section. Off-target sites were binned according to the number of mismatches relative to the on-target sequence. The numbers of off-targets for each sgRNA were counted and plotted.

SITE-Seq
We performed the SITE-Seq assay as described previously [64]. In 50-mL conical tubes, high molecular weight genomic DNA (gDNA) was extracted from HEK293T cells using the Blood and Cell Culture DNA Maxi Kit (Qiagen) according to the manufacturer's protocol. sgRNAs for both NmeCas9 and SpyCas9 RNP assembly were transcribed from PCR-assembled DNA templates containing T7 promoters. Oligo sequences used in DNA template assembly can be found in Additional file 8: Table S8. PCR reactions were performed using Q5 Hot Start High-Fidelity 2X Master Mix (New England Biolabs) with the following thermal cycling conditions: 98°C for 2 min, 30 cycles of 20 s at 98°C, 20 s at 52°C, 15 s at 72°C, and a final extension at 72°C for 2 min. sgRNAs were in vitro transcribed using the HiScribe T7 High Yield RNA Synthesis Kit (New England Biolabs) according to the manufacturer's protocol. Transcription reactions were digested with 2 units RNase-free DNase I (New England Biolabs) at 37°C for 10 min; the reaction was stopped by adding EDTA to a final concentration of 35 mM and incubating at 75°C for 10 min. All guides were purified with RNA-Clean beads (Beckman Coulter) and quantified with the Quant-IT Ribogreen RNA Assay kit (ThermoFisher) according to the manufacturers' protocols. Individual RNPs were prepared by incubating each sgRNA at 95°C for 2 min, then allowed to slowly come to room temperature over 5 min. Each sgRNA was then combined with its respective Cas9 in a 3:1 sgRNA:Cas9 molar ratio and incubated at 37°C for 10 min in cleavage reaction buffer (20 mM HEPES, pH 7.4, 150 mM KCl, 10 mM MgCl 2 , 5% glycerol). In 96-well format, 10 μg of gDNA was treated with 0.2 pmol, 0.8 pmol, 3.2 pmol, and 12.8 pmol of each RNP in 50 μL total volume in cleavage reaction buffer, in triplicate. Negative control reactions were assembled in parallel and did not include any RNP. gDNA was treated with RNPs for 4 h at 37°C. Library preparation and sequencing were done according to protocols described previously [64] using the Illumina NextSeq platform, and~3 million reads were obtained for each sample. Any SITE-Seq sites without off-target motifs located within 1 nt of the cut-site were considered false-positives and discarded.

Targeted deep sequencing analysis
To measure indel frequencies, targeted deep sequencing analyses were done as previously described [65]. Briefly, we used two-step PCR amplification to produce DNA fragments for each on-target and off-target site. In the first step, we used locus-specific primers bearing universal overhangs with complementary ends to the TruSeq adaptor sequences (Additional file 9: Table S7). DNA was amplified with Phusion High Fidelity DNA Polymerase (New England Biolabs) using annealing temperatures of 60°C, 64°C or 68°C, depending on the primer pair. In the second step, the purified PCR products were amplified with a universal forward primer and an indexed reverse primer to reconstitute the TruSeq adaptors (Additional file 9: Table S7). Input DNA was PCR-amplified with Phusion High Fidelity DNA Polymerase (98°C, 15 s; 61°C, 25 s; 72°C, 18 s; 9 cycles), and equal amounts of the products from each treatment group were mixed and run on a 2.5% agarose gel. Full-size products (~250 bp in length) were gel-extracted. The purified library was deep sequenced using a paired-end 150 bp MiSeq run.
MiSeq data analysis was performed using a suite of Unix-based software tools. First, the quality of paired-end sequencing reads (R1 and R2 fastq files) was assessed using FastQC (http://www.bioinformatics.babraham.ac.uk/ projects/fastqc/). Raw paired-end reads were combined using paired end read merger (PEAR) [95] to generate single merged high-quality full-length reads. Reads were then filtered by quality [using Filter FASTQ [96]] to remove those with a mean PHRED quality score under 30 and a minimum per base score under 24. Each group of reads was then aligned to a corresponding reference sequence using BWA (version 0.7.5) and SAMtools (version 0. 1.19).
To determine indel frequency, size, and distribution, all edited reads from each experimental replicate were combined and aligned, as described above. Indel types and frequencies were then cataloged in a text output format at each base using bam-readcount (https://github.com/ genome/bam-readcount). For each treatment group, the average background indel frequencies (based on indel type, position, and frequency) of the triplicate negative control group were subtracted to obtain the nucleasedependent indel frequencies. Indels at each base were marked, summarized, and plotted using GraphPad Prism. Deep sequencing data and the results of statistical tests are reported in Additional file 10: Table S3.
SITE-Seq cell-based validation was performed as previously described with minor modifications [64]. In brief, SITE-Seq sites were amplified from~1000 to 4000 template copies per replicate and sequencing data from Cas9-treated samples were combined to minimize any variability due to uneven coverage across replicates. Cas9 cleavage sites were registered from the SITE-Seq data, and mutant reads were defined as any non-reference variant calls within 20 bp of the cut site. Sites with low sequencing coverage (< 1000 reads in the combined, Cas9-treated samples or < 200 reads in the reference samples) or > 2% variant calls in the reference samples were discarded. Sites were tallied as cellular off-targets if they accumulated > 0.5% mutant reads in the combined, Cas9-treated samples. This threshold corresponded to sites that showed unambiguous editing when DNA repair patterns were visually inspected.

Additional files
Additional file 1: Figure S1. (A) Sequences of sgRNAs associated with their respective DNA targets used in Fig. 1b. (B) Schematic representation of the split-GFP reporter system used in Figs. 1b, 2a, and 6a. Figure S2. NmeCas9 does not lead to an increase in γH2AX levels in mouse ESC or human HEK293T cells. Figure S3. SpyCas9 and NmeCas9 exhibit similar editing activity when targeting the AAVS1 locus. Figure S4. NmeCas9 and SpyCas9 editing efficiencies at dual target sites (DTSs) that can be cleaved by both enzymes. Figure S5. SITE-Seq read counts and heat map of the frequencies of indels by length at DTS3, DTS7, and DTS8 for SpyCas9 and NmeCas9. Figure S6. Positions of insertions and deletions at the DTS3, DTS7, and DTS8 sites. Figure S7. Sizes of insertions and deletions at the DTS3, DTS7, and DTS8 sites. Figure S8. Fractions of all mutations that are insertions or deletions at the DTS3, DTS7, and DTS8 sites for SpyCas9 and NmeCas9. Figure S9. As in Figure S6. Figure S10. As in Figure S7. Figure S11. GUIDE-seq read counts at NmeCas9 onand off-target sites. Figure S12. Guide truncation series for Protospacer 9, NTS33 and NTS32. Figure S13. GUIDE-seq analysis to determine whether NmeCas9 sgRNA truncation leads to de novo off-target events. Figure S14. NmeCas9 guide length requirements at the NTS1C and NTS1C-OT1 editing sites. (PDF 5232 kb) analyses using NmeCas9 protein supplied by AM. XDG, PL, AM, LJZ, and SAW carried out additional bioinformatic and statistical analyses. All authors analyzed and interpreted data. NA, XDG, PC, and EJS wrote the manuscript, and all authors edited the manuscript. All authors read and approved the final manuscript.
Ethics approval and consent to participate Not applicable.

Consent for publication
Not applicable.
Competing interests E.J.S. is a co-founder and scientific advisor of Intellia Therapeutics. P.D.D, A.H.S, A.M.L, K.M, C.K.F, and P.C are current or former employees of Caribou Biosciences, Inc., a company that develops and commercializes genome engineering technologies, and such individuals may own shares or stock options in Caribou Biosciences. SITE-Seq is a trademark of Caribou Biosciences, Inc. The other authors declare that they have no competing interests.