NmeCas9 is an intrinsically high-fidelity genome editing platform [preprint]

The development of CRISPR-Cas9 RNA-guided genome editing has transformed biomedical research. Most applications reported thus far rely upon the Cas9 protein from Streptococcus pyogenes SF370 (SpyCas9). With many RNA guides, SpyCas9 can induce significant levels of unintended mutations at near-cognate sites, necessitating substantial efforts toward the development of strategies to minimize off-target activity. Although the genome-editing potential of thousands of other Cas9 orthologs remains largely untapped, it is not known how many will require similarly extensive engineering efforts to achieve single-site accuracy within large (e.g. mammalian) genomes. In addition to its off-targeting propensity, SpyCas9 is encoded by a relatively large (∼4.2 kb) open reading frame (ORF), limiting its utility in applications that require size-restricted delivery strategies such as adeno-associated virus (AAV) vectors. In contrast, some genome-editing-validated Cas9 orthologs [e.g. Staphylococcus aureus Cas9 (SauCas9), Campylobacter jejuni Cas9 (CjeCas9), and Neisseria meningitidis Cas9 (NmeCas9)] are considerably smaller and therefore better suited for viral delivery. Here we show that wild-type NmeCas9, when programmed with guide sequences of natural length (24 nucleotides), exhibits a nearly complete absence of unintended targeting in human cells, even when targeting sites that are highly prone to off-target activity when employing SpyCas9. We also validate at least six variant protospacer adjacent motifs (PAMs), in addition to the preferred consensus PAM (5’-N4GATT-3’), for NmeCas9 genome editing in human cells. Our results show that NmeCas9 is a naturally high-fidelity genome editing enzyme, and suggest that additional Cas9 orthologs may prove to exhibit similarly high accuracy, even without extensive engineering efforts.

given genome. Many pair-wise Cas9 combinations also have orthogonal guides that load into one ortholog but not the other, facilitating multiplexed applications [44-46]. In addition, some Cas9 orthologs (especially those from subtype II-C) are hundreds of amino acids smaller than the 1,368-amino-acid SpyCas9 [7,43,44], and are therefore more amenable to combined Cas9/sgRNA delivery via a single size-restricted vector such as adeno-associated virus (AAV) [47,48]. Finally, there may be Cas9 orthologs that exhibit additional advantages such as greater efficiency, natural hyper-accuracy, distinct activities, reduced immunogenicity, or novel means of control over editing. Deeper exploration of the Cas9 population could therefore enable expanded or improved genome engineering capabilities.

We have used N. meningitidis (strain 8013) as a model system for the interference functions and mechanisms of Type II-C CRISPR-Cas systems [49-52]. In addition, we and others previously reported that the Type II-C Cas9 ortholog from N. meningitidis (NmeCas9) can be applied as a genome engineering platform [46,53,54]. At 1,082 amino acids, NmeCas9 is 286 residues smaller than SpyCas9, making it nearly as compact as SauCas9 (1,053 amino acids) and well within range of all-in-one AAV delivery. Its spacer-derived guide sequences are longer (24 nts) than those of most other Cas9 orthologs [51], and like SpyCas9, it cleaves both DNA strands between the third and fourth nucleotides of the protospacer (counting from the PAM-proximal end). NmeCas9 also has a longer PAM consensus (5'-N4GATT-3', after the 3' end of the protospacer's crRNA-noncomplementary strand) [44,46,51-54], leading to a lower density of targetable sites compared to SpyCas9.
Considerable variation from this consensus is permitted during bacterial interference [46,52], and a smaller number of variant PAMs can also support targeting in mammalian cells [53,54]. Unlike SpyCas9, NmeCas9 has been found to cleave the DNA strand of RNA-DNA hybrid duplexes in a PAM-independent fashion [52,55], and can also catalyze PAM-independent, spacer-directed cleavage of RNA [56]. Recently, natural Cas9 inhibitors (encoded by bacterial mobile elements) have been identified and validated in N. meningitidis and other bacteria with type II-C systems, providing for genetically encodable off-switches for NmeCas9 genome editing [57,58]. These "anti-CRISPR" (Acr) proteins [59] enable temporal, spatial, or conditional control over the NmeCas9 system. Natural inhibitors of Type II-A systems have also been discovered in Listeria monocytogenes [60] and Streptococcus thermophilus [61], some of which are effective at inhibiting SpyCas9.

The longer PAM consensus, longer guide sequence, or enzymological properties of NmeCas9 could result in a reduced propensity for off-targeting, and targeted deep sequencing at bioinformatically predicted near-cognate sites is consistent with this possibility [54]. A high degree of genome-wide specificity has also been noted for the dNmeCas9 platform [62]. However, the true, unbiased accuracy of NmeCas9 is not known, since empirical assessments of genome-wide off-target editing activity (independent of bioinformatics prediction) have not been reported for this ortholog. Here we define and confirm many of the parameters of NmeCas9 editing activity in mammalian cells including PAM sequence preferences, guide length limitations, and off-target profiles.
Most notably, we use two empirical approaches (GUIDE-seq [63] and SITE-Seq [64]) to define NmeCas9 off-target profiles and find that wild-type NmeCas9 is a high-fidelity genome editing platform in mammalian cells, with far lower levels of off-targeting than wild-type SpyCas9. These results further validate NmeCas9 as a genome engineering platform, and suggest that continued exploration of Cas9 orthologs could identify additional RNA-guided nucleases that exhibit favorable properties, even without the extensive engineering efforts that have been applied to SpyCas9 [31,34,35].
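The NmeCas9 targeting rules summarized in this introduction (a 24-nt guide, a 5'-N4GATT-3' consensus PAM immediately 3' of the protospacer, and cleavage between the third and fourth protospacer nucleotides counted from the PAM) can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function name `find_nme_sites` is hypothetical, only the strand supplied is scanned, and variant PAMs are ignored. The example input is the Rosa26 target string that appears in the Table 1 fragment of this manuscript.

```python
import re

GUIDE_LEN = 24   # natural NmeCas9 guide length stated in the text
CUT_OFFSET = 3   # cleavage between the 3rd and 4th nt from the PAM-proximal end

# Lookahead so that overlapping sites are all reported.
# Group 1: 24-nt protospacer; group 2: 8-nt consensus PAM (N4GATT).
_SITE = re.compile("(?=([ACGT]{" + str(GUIDE_LEN) + "})([ACGT]{4}GATT))")


def find_nme_sites(seq):
    """Return (start, protospacer, pam, cut_index) tuples for the given strand.

    cut_index is the 0-based index of the first base 3' of the cut, so the
    cut lies between seq[cut_index - 1] and seq[cut_index].
    """
    seq = seq.upper()
    sites = []
    for m in _SITE.finditer(seq):
        start = m.start()
        cut_index = start + GUIDE_LEN - CUT_OFFSET
        sites.append((start, m.group(1), m.group(2), cut_index))
    return sites


if __name__ == "__main__":
    rosa26 = "AGTTGCAGATCACGAGGGAAGAGGGGGAAGGGATTCTC"
    for site in find_nme_sites(rosa26):
        print(site)
```

A genome-scale search would also need to scan the reverse complement and handle ambiguous bases; those steps are omitted here for brevity.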

Co-expressed sgRNA increases NmeCas9 accumulation in mammalian cells
Previously we demonstrated that NmeCas9 (derived from N. meningitidis strain 8013 [51]) can efficiently edit chromosomal loci in human stem cells using either dual RNAs (crRNA + tracrRNA) or a sgRNA [53]. To further define the efficacy and requirements of NmeCas9 in mammalian cells, we first constructed an all-in-one plasmid (pEJS15) that delivers both NmeCas9 protein and a sgRNA in a single transfection vector, similar to our previous all-in-one dual-RNA plasmid (pSimple-Cas9-Tracr-crRNA; Addgene #47868) [53]. The pEJS15 plasmid expresses NmeCas9 fused to a C-terminal single-HA epitope tag and nuclear localization signal (NLS) sequences at both N- and C-termini under the control of the elongation factor-1α (EF1α) promoter. The sgRNA cassette (driven by the U6 promoter) includes two BsmBI restriction sites that are used to clone a spacer of interest from short, synthetic oligonucleotide duplexes. First, we cloned three different bacterial spacers (spacers 9, 24 and 25) from the endogenous N. meningitidis CRISPR locus (strain 8013) [51,52] to express sgRNAs that target protospacer (ps) 9, ps24 or ps25, respectively (Supplemental Fig. 1A). None of these protospacers have cognate targets in the human genome. We also cloned a spacer sequence to target an endogenous genomic NmeCas9 target site (NTS) from chromosome 10 that we called NTS3 (Table 1). Two of the resulting all-in-one plasmids [...]

[Table 1 fragment: Rosa26; spacer GGCAGAUCACGAGGGAAGAGGGGG; target site AGTTGCAGATCACGAGGGAAGAGGGGGAAGGGATTCTC]

[...] blot (Fig. 1A). As a positive control we also included a sample transfected with a SpyCas9-expressing plasmid (triple-HA epitope-tagged, and driven by the cytomegalovirus (CMV) promoter) [...] Figure 1B shows that all three natural protospacers of NmeCas9 can be edited in human cells and the efficiency of GFP induction was comparable to that observed with SpyCas9 (Fig. 1B).
Next, we reprogrammed NmeCas9 by replacing the bacterially-derived spacers with a series of spacers designed to target eleven human chromosomal sites with an N4GATT PAM (Table 1). These sgRNAs induced insertion/deletion (indel) mutations at all sites tested, except NTS10 (Fig. 1C, lanes 23-25), as determined by T7 Endonuclease 1 (T7E1) digestion (Fig. 1C). The editing efficiencies ranged from 5% for the NTS1B site to 47% in the case of NTS33 (Fig. 1D) [...] (Fig. 1E). In addition, mouse embryonic stem cells (mESCs) and HEK293T cells were transduced with a lentivirus construct expressing NmeCas9. In these cells, transient transfection of plasmids expressing a sgRNA led to genome editing (Fig. 1E) [...] Fig. 1B). All designed guides started with two guanine nucleotides (resulting in 1-2 positions of target non-complementarity at the very 5' end of the guide) to facilitate transcription and to test the effects of extra 5'-terminal G residues, analogous to the SpyCas9 "GGN20" sgRNAs [68]. We then measured the abilities of these sgRNAs to direct NmeCas9 cleavage of the reporter in human cells. sgRNAs that have 20-23 nucleotides of target complementarity showed activities comparable to the sgRNA with the natural 24 nucleotides of complementarity, whereas sgRNAs containing 18 or 19 nucleotides of complementarity showed lower activity (Fig. 2A).

We next used a native chromosomal target site (NTS33 in VEGFA, as in Figs. 1C and 1D) to test the editing efficiency of NmeCas9 spacers of varying lengths (Supplemental Fig. 1C). sgRNA constructs included one or two 5'-terminal guanine residues to enable transcription by the U6 promoter, sometimes resulting in 1-2 nucleotides of target non-complementarity at the 5' end of the guide sequence. sgRNAs with 20, 21, or 22 nucleotides of target complementarity (GGN18, GGN19, and GGN20, respectively) performed comparably to the natural guide length (24 nucleotides of complementarity, GN23) at this site (Fig. 2B-C), and within this range, the addition of 1-2 unpaired G residues at the 5' end had no adverse effect. These results are consistent with the results obtained with the GFP reporter (Fig. 2A). sgRNAs with guide lengths of 19 nucleotides or shorter, along with a single mismatch in the first or second position (GGN17, GGN16, and GGN15), did not direct detectable editing, nor did sgRNAs with perfectly matched guide sequences of 17 or 14 nucleotides (GN16 and GN13, respectively) (Fig. 2B-C). However, a 19-nt guide with no mismatches (GN18) successfully directed editing, albeit with slightly reduced efficiency. These results indicate that 19-26-nt guides can be tolerated by NmeCas9, but that activity can be compromised by guide truncations from the natural length of 24 nucleotides down to 17-18 nucleotides and smaller, and that single mismatches (even at or near the 5'-terminus of the guide) can be discriminated against with a 19-nt guide.

bioRxiv preprint first posted online Aug. 4, 2017; doi: http://dx.doi.org/10.1101/172650. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license.

The target sites tested in Figs. [...]; also see below] has also been reported. To examine length dependence at a site with a variant PAM, we varied guide sequence length at the N4GCTT-associated NTS32 site (also in VEGFA). In this experiment, each of the guides had two 5'-terminal G residues, accompanied by 1-2 terminal mismatches with the target sequence (Supplemental Fig. 1D). At the NTS32 site, sgRNAs with 21-24 nucleotides of complementarity (GGN24, GGN23, GGN22, and GGN21) supported editing, but shorter guides (GGN20, GGN19, and GGN18) did not (Fig. 2D-E).
We conclude that sgRNAs with 20 nucleotides of complementarity can direct editing at some sites ( [...] this purpose as R-loops have been extensively studied in these cells and have been shown to be important for differentiation [74]. We performed γH2AX staining of these two cell lines and compared them to wildtype E14 cells. As a positive control for γH2AX induction, we exposed wildtype E14 cells to UV, a known stimulator of the global DNA damage response. Immunofluorescence microscopy revealed no increase in γH2AX foci in cells expressing NmeCas9 or dNmeCas9 compared to wildtype E14, suggesting that sustained NmeCas9 expression is not genotoxic (Supplemental Fig. 2A). In contrast, cells exposed to UV light showed a significant increase in γH2AX levels. Flow cytometric measurements of γH2AX immunostaining confirmed these results (Supplemental Fig. 2B). These data suggest that NmeCas9 expression does not lead to a global DNA damage response in mESCs.

Comparative analysis of NmeCas9 and SpyCas9
SpyCas9 is by far the best-characterized Cas9 ortholog, and is therefore the most informative benchmark when defining the efficiency and accuracy of other Cas9s. To facilitate comparative experiments between NmeCas9 and SpyCas9, we developed a matched Cas9 + sgRNA expression system for the two orthologs. This serves to minimize the expression differences between the two Cas9s in our comparative experiments, beyond those differences dictated by the sequence variations between the orthologs themselves. To this end, we employed the separate pCSDest2-SpyCas9-NLS-3XHA-NLS (Addgene #69220) and pLKO.1-puro-U6sgRNA-BfuA1 (Addgene #52628) plasmids reported previously for the expression of SpyCas9 (driven by the CMV promoter) and its sgRNA (driven by the U6 promoter), [...]

To provide a direct comparison of editing efficiency between the SpyCas9 and NmeCas9 systems, we took advantage of the non-overlapping PAMs of SpyCas9 and NmeCas9 (NGG and N4GATT, respectively). Because the optimal SpyCas9 and NmeCas9 PAMs are non-overlapping, it is simple to identify chromosomal target sites that are compatible with both orthologs, i.e. that are dual target sites (DTSs) with a composite PAM sequence of NGGNGATT that is preferred by both nucleases.
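The composite-PAM idea can be sketched in a few lines of Python. The sketch assumes a 24-nt NmeCas9 protospacer and a 20-nt SpyCas9 protospacer that both end immediately 5' of the NGGNGATT motif, so that the SpyCas9 protospacer is simply the PAM-proximal 20 nt of the NmeCas9 protospacer. The function name `find_dual_target_sites` and the `starts_with_g` flag are hypothetical; the flag reflects the earlier observation that U6-transcribed guides carry a 5'-terminal G. Only the supplied strand is scanned.

```python
import re

NME_LEN = 24  # NmeCas9 protospacer length
SPY_LEN = 20  # SpyCas9 protospacer length (assumed standard 20-nt guide)

# Group 1: 24-nt NmeCas9 protospacer; group 2: composite PAM NGGNGATT.
# NGG serves SpyCas9, and the same 8 nt read as N4GATT for NmeCas9.
_DTS = re.compile("(?=([ACGT]{" + str(NME_LEN) + "})([ACGT]GG[ACGT]GATT))")


def find_dual_target_sites(seq):
    """Return one dict per candidate dual target site on the given strand."""
    seq = seq.upper()
    hits = []
    for m in _DTS.finditer(seq):
        nme_protospacer = m.group(1)
        spy_protospacer = nme_protospacer[-SPY_LEN:]
        hits.append({
            "start": m.start(),
            "nme_protospacer": nme_protospacer,
            "spy_protospacer": spy_protospacer,
            "composite_pam": m.group(2),
            # True when both guides would natively begin with G.
            "starts_with_g": nme_protospacer[0] == "G"
            and spy_protospacer[0] == "G",
        })
    return hits


if __name__ == "__main__":
    # Hypothetical sequence built for illustration; not a site from the paper.
    demo = "TT" + "GACT" + "G" + "ACGTACGTACGTACGTACA" + "AGGTGATT" + "CC"
    for hit in find_dual_target_sites(demo):
        print(hit)
```

Because both protospacers end at the same base, a filter like this returns sites where the two nucleases face the same sequence context, which is the property the comparison relies on.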
Within a composite NGGNGATT site, both Cas9s will cleave the exact same internucleotide bond (NN/NNNNGGNGATT; the slash marks the cleaved junction, and the final eight nucleotides are the PAM region), and both Cas9s will have to contend with the exact same sequence and chromatin structural context. Furthermore, if the target site contains a G residue at position -24 of the sgRNA-noncomplementary strand (relative to the PAM) and another at position -20, then the U6 promoter can be used to express perfectly-matched sgRNAs for both Cas9 orthologs. Four DTSs with these characteristics were used in this comparison (Supplemental Fig. 4A). We had previously used NmeCas9 to target a site (NTS7) that happened also to match the SpyCas9 PAM consensus, so we included it in our comparative analysis as a fifth site, even though it has a predicted rG-dT wobble pair at position -24 for the NmeCas9 sgRNA (Supplemental Fig. 4A).

We set out next to compare the editing activities of both Cas9 orthologs programmed to target the five chromosomal sites depicted in Supplemental Fig. 4A, initially via T7E1 digestion. SpyCas9 was more efficient than NmeCas9 at generating lesions at the DTS1 and DTS8 sites (Fig. 4C, lanes 1-2 and 13-14). In contrast, NmeCas9 was more efficient than SpyCas9 at the DTS3 and NTS7 sites ( [...] NmeCas9 editing efficiencies that are greater than, equal to, or lower than those of SpyCas9, respectively. At all three of these sites, the addition of an extra 5'-terminal G residue had little to no effect on editing by either SpyCas9 or NmeCas9 (Supplemental Fig. 4B). Truncation of the three NmeCas9 guides down to 20 nucleotides (all perfectly matched) again had differential effects on editing efficiency from one site to the next, with no reduction in DTS7 editing, partial reduction in DTS3 editing, and complete loss of DTS8 editing (Supplemental Fig. 4B).

Assessing the genome-wide precision of NmeCas9 editing
All Cas9 orthologs described to date have some propensity to edit off-target sites lacking perfect complementarity to the programmed guide RNA, and considerable effort has been devoted to developing strategies (mostly with SpyCas9) to increase editing specificity (reviewed in [31,34,35]). In comparison with SpyCas9, orthologs such as NmeCas9 that employ longer guide sequences and that require longer PAMs have the potential for greater on-target specificity, possibly due in part to the lower density of near-cognate sequences. As an initial step in exploring this possibility, we used CRISPRseek [76] to perform a global analysis of potential NmeCas9 and SpyCas9 off-target sites with six or fewer mismatches in the human genome, using sgRNAs specific for DTS3, DTS7 and DTS8 (Fig. 5A) as representative queries. When allowing for permissive and semi-permissive PAMs (NGG, NGA, and NAG for SpyCas9; N4GHTT, N4GACT, N4GAYA, and N4GTCT for NmeCas9), potential off-target sites for NmeCas9 were predicted with two to three orders of magnitude lower frequency than for SpyCas9 (Table 2). Furthermore, NmeCas9 off-target sites with fewer than five mismatches were rare (two sites with four mismatches) for DTS7, and non-existent for DTS3 and DTS8 (Table 2). Even when we relaxed the NmeCas9 PAM requirement to N4GN3, which includes some PAMs that enable only background levels of targeting (e.g. N4GATC (Fig. 3A)), the vast majority of predicted off-target sites (>96%) for these three guides had five or more mismatches, and none had fewer than four mismatches (Fig. 5A). In contrast, the SpyCas9 guides targeting DTS3, DTS7, and DTS8 had 49, 54, and 62 predicted off-target sites with three or fewer mismatches, respectively (Table 2). As speculated previously [53,54], these bioinformatic predictions suggest the intriguing possibility that the NmeCas9 genome editing system may induce very few undesired mutations, or perhaps none, even when targeting sites that induce substantial off-targeting with SpyCas9.
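The logic behind this kind of prediction (count guide mismatches at every window that is followed by a permissive PAM) can be sketched as follows. This is not CRISPRseek and does not reproduce its scoring; it is a minimal Python illustration that assumes ungapped alignment on a single strand, with PAMs written out as 8-nt IUPAC strings (N4GHTT becomes NNNNGHTT, and so on). The names `candidate_off_targets` and `NME_PAMS` and the example sequences are hypothetical.

```python
import re

# IUPAC codes needed for the PAMs listed in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "H": "[ACT]", "Y": "[CT]"}

# Permissive and semi-permissive NmeCas9 PAMs named in the text.
NME_PAMS = ["NNNNGHTT", "NNNNGACT", "NNNNGAYA", "NNNNGTCT"]


def pam_regex(pam):
    """Compile an IUPAC PAM string into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in pam))


def count_mismatches(a, b):
    """Hamming distance between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def candidate_off_targets(genome, protospacer, pams, max_mismatches):
    """Yield (start, site, pam, mismatches) for windows passing both filters."""
    genome = genome.upper()
    plen, pam_len = len(protospacer), len(pams[0])
    compiled = [pam_regex(p) for p in pams]
    for i in range(len(genome) - plen - pam_len + 1):
        site = genome[i:i + plen]
        pam = genome[i + plen:i + plen + pam_len]
        if not any(rx.fullmatch(pam) for rx in compiled):
            continue
        mm = count_mismatches(site, protospacer)
        if mm <= max_mismatches:
            yield (i, site, pam, mm)


if __name__ == "__main__":
    guide = "GATCCGTAAGCTTGCAATGGCCTA"              # hypothetical 24-nt protospacer
    near = "TTTCCGTAAGCTTGCAATGGCCTA" + "ACGTGCTT"  # 2 mismatches, N4GCTT PAM
    print(list(candidate_off_targets(near, guide, NME_PAMS, 6)))
```

Requiring both a long match and a long PAM is what shrinks the candidate list: each additional fixed PAM position removes a large fraction of otherwise near-cognate windows.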

Although bioinformatic predictions of off-targeting can be useful, it is well established that off-target profiles must be defined experimentally in a prediction-independent fashion due to our limited understanding of target specificity determinants, and the corresponding inability of algorithms to predict all possible sites successfully [31,34,35]. The need for empirical off-target profiling is especially acute with Cas9 orthologs that are far less thoroughly characterized than SpyCas9. A previous report used PCR amplification and high-throughput sequencing to detect the frequencies of lesions at 15-20 predicted NmeCas9 off-target sites for each of three guides in human cells, and found only background levels of indels in all cases, suggesting a very high degree of precision for NmeCas9 [54]. However, this report restricted its analysis to candidate sites with N4GNTT PAMs and three or fewer mismatches (or two mismatches combined with a 1-nt bulge) in the PAM-proximal 19 nucleotides, leaving open the possibility that legitimate off-target sites that did not fit these specific criteria remained unexamined. Accordingly, empirical and minimally-biased off-target profiles have never been generated for any NmeCas9/sgRNA combination, and the true off-target propensity of NmeCas9 therefore remains unknown. At the time we began this work, multiple methods for prediction-independent detection of off-target sites had been [...] (Supplemental Fig. 4C), we then prepared GUIDE-seq libraries for each of the six editing conditions, as well as for the negative control conditions (i.e., in the absence of any sgRNA) for both Cas9 orthologs. The GUIDE-seq libraries were then subjected to high-throughput sequencing, mapped, and analyzed as described [79] (Fig. 5B-C).
On-target editing with these guides was readily detected by this method, with the number of independent reads ranging from a low of 167 (NmeCas9, DTS8) to a high of 1,834 (NmeCas9, DTS3) (Fig. 5C and Supplemental Table 2).

For our initial analyses, we scored candidate sites as true off-targets if they yielded two or more independent reads and had six or fewer mismatches with the guide, with no constraints placed on the PAM match at that site. For SpyCas9, two of the sgRNAs (targeting DTS3 and DTS7) induced substantial numbers of off-target editing events (271 and 54 off-target sites, respectively (Fig. 5B)) under these criteria. The majority of these SpyCas9 off-target sites (88% and 77% for DTS3 and DTS7, respectively) were associated with a canonical NGG PAM. Reads were very abundant at many of these loci, and at five off-target sites (all with the DTS3 sgRNA) even exceeded the number of on-target reads (Fig. 5C). SpyCas9 was much more precise with the DTS8 sgRNA: we detected a single off-target site with five mismatches and an NGG PAM, and it was associated with only three independent reads, far lower than the 415 reads that we detected at the on-target site (Fig. 5C and Supplemental Table 2). Overall, the range of editing accuracies that we measured empirically for SpyCas9 - very high (e.g. [...]

In striking contrast, GUIDE-seq analyses with NmeCas9, programmed with sgRNAs targeting the exact same three sites, yielded off-target profiles that were exceptionally specific in all cases (Fig. 5B-C). For DTS3 and DTS8 we found no reads at any site with six or fewer guide mismatches; for DTS7 we found one off-target site with four mismatches (three of which were at the PAM-distal end; see Supplemental Table 2), and even at this site there were only 12 independent reads, ~100x fewer than the 1,222 reads detected at DTS7 itself.
This off-target site was also associated with a PAM (N4GGCT) that would be expected to be poorly functional, though it could also be considered a "slipped" PAM with a more optimal consensus but variant spacing (N5GCTT). Purified, recombinant NmeCas9 has been observed to catalyze DNA cleavage in vitro at a site with a similarly slipped PAM [52]. To explore the off-targeting potential of NmeCas9 further, we decreased the stringency of our mapping to allow detection of off-target sites with up to 10 mismatches. Even in these conditions, only four (DTS7), 15 (DTS8), and 16 (DTS3) candidate sites were identified, most of which had only four or fewer reads (Fig. 5C) and were associated with poorly functional PAMs (Supplemental Table 2). We consider it likely that most if not all of these low-probability candidate off-target sites represent background noise caused by spurious priming and other sources of experimental error.

As an additional test of off-targeting potential, we repeated the DTS7 GUIDE-seq experiments with both SpyCas9 and NmeCas9, but this time using a different transfection reagent (Lipofectamine3000 rather than Polyfect). These repeat experiments revealed that >96% (29 out of 30) of off-target sites with up to five mismatches were detected under both transfection conditions for SpyCas9 (Supplemental Table 1). However, the NmeCas9 GUIDE-seq data showed no overlap between the potential sites identified under the two conditions, again suggesting that the few off-target reads that we did observe are unlikely to represent legitimate off-target editing sites.
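The scoring rule used in these GUIDE-seq analyses (at least two independent reads and at most six guide mismatches, with no PAM constraint) and the reproducibility comparison across transfection conditions can be written down compactly. The Python sketch below is illustrative only: the `Candidate` record and the function names are hypothetical, and `shared_fraction` assumes that "detected under both conditions" is measured against the union of sites seen in either run, which the text does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    site_id: str      # hypothetical label for a mapped locus
    reads: int        # independent GUIDE-seq reads at the locus
    mismatches: int   # mismatches between the locus and the guide


def call_off_targets(candidates, min_reads=2, max_mismatches=6):
    """Keep candidates that satisfy both thresholds stated in the text."""
    return [c for c in candidates
            if c.reads >= min_reads and c.mismatches <= max_mismatches]


def fold_below_on_target(on_target_reads, off_target_reads):
    """Ratio of on-target to off-target read counts."""
    return on_target_reads / off_target_reads


def shared_fraction(sites_a, sites_b):
    """Fraction of all observed sites that were seen in both runs."""
    union = set(sites_a) | set(sites_b)
    if not union:
        return 0.0
    return len(set(sites_a) & set(sites_b)) / len(union)


if __name__ == "__main__":
    # The first entry uses the DTS7 NmeCas9 values quoted in the text
    # (12 reads, 4 mismatches); the other two are made up for illustration.
    demo = [Candidate("DTS7-OT", 12, 4),
            Candidate("single-read", 1, 3),
            Candidate("too-divergent", 5, 8)]
    print([c.site_id for c in call_off_targets(demo)])
    print(fold_below_on_target(1222, 12))
```

With the read counts quoted above, 1,222 on-target reads against 12 off-target reads gives a ratio of roughly 100, in line with the ~100x figure, and 29 shared sites out of 30 corresponds to just under 97%.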
To confirm the validity of the off-target sites defined by GUIDE-seq, we designed primers flanking candidate off-target sites identified by GUIDE-seq, PCR-amplified those loci following standard genome editing (i.e., in the absence of co-transfected GUIDE-seq dsODN) (3 biological replicates), and then subjected the PCR products to high-throughput sequencing to detect the frequencies of Cas9-induced indels. For this analysis we chose the top candidate off-target sites (as defined by GUIDE-seq read count) for each of the six cases (DTS3, DTS7 and DTS8, each edited by either SpyCas9 or NmeCas9). In addition, due to the low numbers of off-target sites and the low off-target read counts observed during the NmeCas9 GUIDE-seq experiments, we analyzed the top two predicted off-target sites for the three NmeCas9 sgRNAs, as identified by CRISPRseek (Fig. 5A and Table 2) [76]. On-target indel formation was detected in all cases, with editing efficiencies ranging from 7% (DTS8, with both SpyCas9 and NmeCas9) to 39% (DTS3 with NmeCas9) (Fig. 5D). At the off-target sites, our targeted deep-sequencing analyses largely confirmed our GUIDE-seq results: SpyCas9 readily induced indels at most of the tested off-target sites when paired with the DTS3 and DTS7 sgRNAs, and in some cases the off-target editing efficiencies approached those observed at the on-target sites (Fig. 5D). Although some SpyCas9 off-targeting could also be detected with the DTS8 sgRNA, the frequencies were much lower (<0.1% in all cases).
Off-target lesions induced by NmeCas9 were far less frequent in all cases, even with the DTS3 sgRNA that was so efficient at on-target mutagenesis: many off-target sites exhibited lesion efficiencies that were indistinguishable from background, and off-target lesion frequencies never rose above ~0.02% (Fig. 5D). These results, in combination with the GUIDE-seq analyses described above, reveal wild-type NmeCas9 to be an exceptionally precise genome editing enzyme.

To explore NmeCas9 editing accuracy more deeply, we chose 16 additional NmeCas9 target sites across the genome, 10 with canonical N4GATT PAMs and six with variant functional PAMs (Supplemental Table 9). We then performed GUIDE-seq analyses of NmeCas9 editing at these sites. GUIDE-seq analysis readily revealed editing at each of these sites, with on-target read counts ranging from ~100 to ~5,000 reads (Fig. 6A). More notably, off-target reads were undetectable by GUIDE-seq with 14 out of the 16 sgRNAs (Fig. 6B). Targeted deep sequencing of PCR amplicons, which is a more quantitative readout of editing efficiency than either GUIDE-seq or T7E1 analysis, confirmed on-target editing in all cases, with indel efficiencies ranging from ~5-85% (Fig. 6C).

The two guides with off-target activity (NTS1C and NTS25) had only two and one off-target sites, respectively (Fig. 6B and Supplemental Fig. 5). Off-target editing was confirmed by high-throughput sequencing and analysis of indels (Fig. 6D). Compared with the on-target site (perfectly matched at all positions other than the 5'-terminal guide nucleotide, and with an optimal N4GATT PAM), the efficiently targeted NTS1C-OT1 had two wobble pairs and one mismatch (all in the nine PAM-distal nucleotides), as well as a canonical N4GATT PAM (Fig. 6E and Supplemental Table 2).
The weakly edited NTS1C-OT2 site had only a single mismatch (at the 11th nucleotide, counting in the PAM-distal direction), but was associated with a non-canonical N4GGTT (or a "slipped" N5GTTT) PAM (Fig. 6E and Supplemental Table 2). NTS25, which has an N4GATA PAM, was the other guide with off-target activity; it had a single off-target site (NTS25-OT1), where NmeCas9 cleaved and edited up to ~1,000x less efficiently than at the on-target site (Fig. 6D). This minimal amount of off-target editing arose despite the association of NTS25-OT1 with an optimal N4GATT PAM, unlike the variant N4GATA PAM that flanks the on-target site. Overall, our GUIDE-seq and sequencing-based analyses demonstrate that NmeCas9 genome editing is exceptionally accurate: we detected and confirmed cellular off-target editing with only two of the 19 guides tested, and even in those two cases, only one or two off-target sites could be found for each. Furthermore, of the three bona fide off-target sites that we identified, only one generated indels at substantial frequency (11.6%); indel frequencies were very modest (0.3% or lower) at the other two off-target sites.

We next sought to corroborate and expand on our GUIDE-seq results with a second prediction-independent method. We applied the SITE-Seq [...] Negative controls without RNP recovered zero sites at any concentration, whereas SpyCas9 assembled with sgRNAs targeting DTS3, DTS7, or DTS8 recovered hundreds (at 4 nM RNP) to thousands (at 256 nM RNP) of biochemical off-target sites (Fig. 6F).
In contrast, NmeCas9 assembled with sgRNAs targeting the same three sites recovered only their on-target sites at 4 nM RNP and at most 29 off-target sites at 256 nM RNP (Fig. 6F). Moreover, the 12 additional NmeCas9 target sites showed similarly high specificity: eight samples recovered only the on-target sites at 4 nM RNP and six of those recovered no more than nine off-targets at 256 nM RNP (Supplemental Fig. 6A). Across NmeCas9 RNPs, off-target sequence mismatches appeared enriched in the 5' end of the sgRNA target sequence (Supplemental Table 4). Finally, three of the NmeCas9 RNPs (NTS30, NTS4C, and NTS59) required elevated concentrations to retrieve their on-targets, potentially due to poor sgRNA transcription and/or RNP assembly. These RNPs were therefore excluded from further analysis.

We next performed cell-based validation experiments to investigate whether any of the biochemical off-targets were edited in cells. Since NmeCas9 recovered only ~100 biochemical off-targets across all RNPs and concentrations, we could examine each site for editing in cells. SpyCas9 generated >10,000 biochemical off-targets across all DTS samples, preventing comprehensive cellular profiling. Therefore, for each RNP we selected 96 of the high cleavage sensitivity SITE-Seq sites (i.e., recovered at all concentrations tested in SITE-Seq) for examination, as we predicted those were more likely to accumulate edits in cells [63] (Supplemental Table 5). Sites were randomly selected within this cohort and only included a subset of the GUIDE-seq validation test set sites (1/8 and 5/8 overlapping sites for DTS3 and DTS7, respectively).
Additionally, SITE-Seq and GUIDE-seq validations were performed on the same gDNA samples to facilitate comparisons between data sets.

Across all NmeCas9 RNPs, only three cellular off-targets were observed. These three all belonged to the NTS1C RNP, and two of them had also been detected with GUIDE-seq. Of note, all high cleavage sensitivity SITE-Seq sites (i.e., all on-targets and the single prominent NTS1C off-target, NTS1C-OT1) showed editing in cells. Conversely, SITE-Seq sites with low cleavage sensitivity, defined as being recovered at only 64 nM and/or 256 nM RNP, were rarely found to be edited (2/93 sites). Importantly, this suggests that we identified all or the clear majority of NmeCas9 cellular off-targets, albeit at our limit of detection. Across all SpyCas9 RNPs, 14 cellular off-targets were observed (8/70 sites for DTS3, 6/83 sites for DTS7, and 0/79 sites for DTS8) (Supplemental Table 5). Since our data set was only a subset of the total number of high cleavage sensitivity SITE-Seq sites, and excluded many of the GUIDE-seq validated sites, we expect that sequencing all SITE-Seq sites would uncover additional cellular off-targets. Taken together, these data corroborate our GUIDE-seq results, suggesting that NmeCas9 can serve as a highly specific genome editing platform.

Indel spectrum at NmeCas9-edited sites

Our targeted deep sequencing data at the three dual target sites (Fig. 5D, Supplemental Fig. 4A and Supplemental Table 5) [...]

Although NmeCas9 exhibits very little propensity to edit off-target sites, for therapeutic applications it may be desirable to suppress even the small amount of off-targeting that occurs (Fig. 6).
[...] truncations are compatible with NmeCas9 function (Fig. 2), we tested whether NmeCas9 tru-sgRNAs can have similar suppressive effects on off-target editing without sacrificing on-target editing efficiency.

First, we tested whether guide truncation can lead to NmeCas9 editing at novel off-target sites (i.e. at off-target sites not edited by full-length guides), as reported previously for SpyCas9 [69]. Our earlier tests of NmeCas9 on-target editing with tru-sgRNAs used guides targeting the NTS33 (Fig. 2B-C) and NTS32 (Fig. 2D-E) sites. GUIDE-seq did not detect any NmeCas9 off-target sites during editing with full-length NTS32 and NTS33 sgRNAs (Fig. 6). We again used GUIDE-seq with a subset of the validated NTS32 and NTS33 tru-sgRNAs to determine whether NmeCas9 guide truncation leads to off-target editing at new sites, and found none (Supplemental Fig. 12). Although we cannot rule out the possibility that other NmeCas9 guides could be identified that yield novel off-target events upon truncation, our results suggest that de novo off-targeting by NmeCas9 tru-sgRNAs is unlikely to be a pervasive problem.

The most efficiently edited off-target site from our previous analyses was NTS1C-OT1, providing us with our most stringent test of off-target suppression. When targeted by the NTS1C sgRNA, NTS1C-OT1 has one rG-dT wobble pair at position -16 (i.e., at the 16th base pair from the PAM-proximal end of the R-loop), one rC-dC mismatch at position -19, and one rU-dG wobble pair at position -23 (Fig. 6E). We generated a series of NTS1C-targeting sgRNAs with a single 5'-terminal G (for U6 promoter transcription) and spacer complementarities ranging from 24 to 15 nucleotides (GN24 to GN15, Supplemental Fig. 13A, top panel). Conversely, we designed a similar series of sgRNAs with perfect complementarity to NTS1C-OT1 (Supplemental Fig. 13B, top panel).
Consistent with our earlier results with other target sites (Fig. 2), T7E1 analyses revealed that both sets of guides enabled editing of the perfectly-matched on-target site with truncations down to 19 nucleotides (GN18), but that shorter guides were inactive. On-target editing efficiencies at both sites were comparable across the seven active guide lengths (GN24 through GN18), with the exception of slightly lower efficiencies with the GN19 guides (Supplemental Fig. 13A & B, middle and bottom panels).

We then used targeted deep sequencing to test whether off-target editing is reduced with the truncated sgRNAs. With both sets of sgRNAs (perfectly complementary to either NTS1C or NTS1C-OT1), we found that off-targeting at the corresponding near-cognate site persisted with the four longest guides (GN24, GN23, GN22, GN21; Fig. 7). However, off-targeting was abolished with the GN20 guide, without any significant reduction in on-target editing efficiencies (Fig. 7). Off-targeting was also absent with the GN19 guide, though on-target editing efficiency was compromised. These results, albeit from a limited data set, indicate that truncated sgRNAs (especially those with 20 or 19 base pairs of guide/target complementarity, 4-5 base pairs fewer than the natural length) can suppress even the limited degree of off-targeting that occurs with NmeCas9.

Unexpectedly, even though off-targeting at NTS1C-OT1 was abolished with the GN20 and GN19 truncated NTS1C sgRNAs, truncating by an additional nucleotide (to generate the GN18 sgRNA) once again yielded NTS1C-OT1 edits (Fig. 7A).
This could be explained by the extra G residue at the 5'-terminus of each sgRNA in the truncation series (Supplemental Fig. 13). With the NTS1C GN19 sgRNA, both the 5'-terminal G residue and the adjacent C residue are mismatched with the NTS1C-OT1 site. In contrast, with the GN18 sgRNA, the 5'-terminal G is complementary to the off-target site. In other words, with the NTS1C GN19 and GN18 sgRNAs, the NTS1C-OT1 off-target interactions (which are identical in the PAM-proximal 17 nucleotides) include two additional nucleotides of non-complementarity or one additional nucleotide of complementarity, respectively. Thus, the more extensively truncated GN18 sgRNA has greater complementarity with the NTS1C-OT1 site than the GN19 sgRNA, explaining the re-emergence of off-target editing with the former. This observation highlights the fact that the inclusion of a 5'-terminal G residue that is mismatched with the on-target site, but that is complementary to a C residue at an off-target site, can limit the effectiveness of a truncated guide at suppressing off-target editing, necessitating care in truncated sgRNA design when the sgRNA is generated by cellular transcription. This issue is not a concern with sgRNAs that are generated by other means (e.g. chemical synthesis) that do not require a 5'-terminal G. Overall, our results demonstrate that NmeCas9 genome editing is exceptionally precise, and even when rare off-target editing events occur, tru-sgRNAs can provide a simple and effective way to suppress them.

[...] to define the genome-wide accuracy of wild-type NmeCas9, including side-by-side comparisons with wild-type SpyCas9 during editing of identical on-target sites. We find that NmeCas9 is a consistently high-accuracy genome editor, with off-target editing undetectable above background with 17 out of 19 analyzed sgRNAs, and only one or three verified off-target edits with the remaining two guides. We
observed this exquisite specificity by NmeCas9 even with sgRNAs that target sites (DTS3 and DTS7; see Fig. 5) that are highly prone to off-target editing when targeted with SpyCas9. Of the four off-target sites that we validated, three accumulated <1% indels. Even with the one sgRNA that yielded a significant frequency of off-target editing (NTS1C, which induced indels at NTS1C-OT1 with approximately half the efficiency of on-target editing), the off-targeting with wild-type NmeCas9 could be easily suppressed with truncated sgRNAs. Our ability to detect NTS25-OT1 editing with GUIDE-seq, despite its very low (0.06%) editing efficiency based on high-throughput sequencing, indicates that our GUIDE-seq experiments can identify even very low-efficiency off-target editing sites. Similar considerations apply to our SITE-Seq analyses. We observed high accuracy even when NmeCas9 is delivered by plasmid transfection, a delivery method that is associated with higher off-target editing than more transient delivery modes such as RNP delivery [86,87].

The two Type II-C Cas9 orthologs (NmeCas9 and CjeCas9) that have been validated for mammalian genome editing and assessed for genome-wide specificity [47, 54] (this work) have both proven to be naturally hyper-accurate. Both use longer guide sequences than the 20-nucleotide guides employed by SpyCas9, and both also have longer and more restrictive PAM requirements. For both Type II-C orthologs, it is not yet known whether the longer PAMs, longer guides, or both account for the limited off-target editing.
Type II-C Cas9 orthologs generally cleave dsDNA more slowly than SpyCas9 [49,55], and it has been noted that lowering kcat can, in some circumstances, enhance specificity [88]. Whatever the mechanistic basis for the high intrinsic accuracy, it is noteworthy that it is a property of the native proteins, without a requirement for extensive engineering. This adds to the motivation to identify more Cas9 orthologs with human genome editing activity, as it suggests that it may be unnecessary in many cases (perhaps especially among Type II-C enzymes) to invest heavily in structural and mechanistic analyses and engineering efforts to attain sufficient accuracy for many applications and with many desired guides, as was done with (for example) SpyCas9 [32, 33, 37, 38, 65]. Although Cas9 orthologs with more restrictive PAM requirements (such as NmeCas9, CjeCas9, and GeoCas9) by definition will afford lower densities of potential target sites than SpyCas9 (which also usually affords the highest on-target editing efficiencies among established Cas9 orthologs), the combined targeting possibilities for multiple such Cas9s will increase the targeting options available within a desired sequence window, with little propensity for off-targeting. The continued exploration of natural Cas9 variation, especially for those orthologs with other advantages such as small size and anti-CRISPR off-switch control, therefore has great potential to advance the CRISPR genome editing revolution.

[...] was determined with the BCA kit (Thermo Scientific) and 12 μg of protein was used for electrophoresis and blotting.
The blots were probed with anti-HA (Sigma, H3663) and anti-GAPDH (Abcam, ab9485) as primary antibodies, and then with horseradish peroxidase-conjugated anti-mouse IgG (Thermo Scientific, 62-6520) or anti-rabbit IgG (Bio-Rad, 1706515) secondary antibodies, respectively. Blots were visualized using the Clarity Western ECL substrate (Bio-Rad, 170-5060).

Flow cytometry
The GFP reporter was used as described previously [...] according to the manufacturer's protocol. 50 ng DNA was used for PCR-amplification using primers specific for each genomic site (Supplemental Table 9) with High Fidelity 2X PCR Master Mix (New England Biolabs). For T7E1 analysis, 10 μl of PCR product was hybridized and treated with 0.5 μl T7 Endonuclease I (10 U/μl, New England Biolabs) in 1X NEB Buffer 2 for 1 hour. Samples were run on a 2.5% agarose gel, stained with SYBR-safe (ThermoFisher Scientific), and quantified using the ImageMaster-TotalLab program. Indel percentages were calculated as previously described [92,93]. Experiments for T7E1 analysis were performed in triplicate with data reported as mean ± s.e.m. For indel analysis by TIDE, 20 ng of PCR product was purified and then sequenced by Sanger sequencing. The trace files were subjected to analysis using the TIDE web tool (https://tide.deskgen.com).
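As an illustration of the T7E1 quantification step above, the following minimal sketch estimates an indel percentage from gel band intensities and summarizes triplicates as mean ± s.e.m. It assumes the commonly used heteroduplex-corrected estimate, indel % = 100 × (1 − sqrt(1 − fraction cleaved)); whether this is exactly the calculation used in refs. [92,93] is an assumption, and the function names are hypothetical.

```python
import math


def t7e1_indel_percent(parental: float, cleaved_a: float, cleaved_b: float) -> float:
    """Estimate % indels from T7E1 band intensities (hypothetical helper).

    parental  : intensity of the uncleaved PCR product band
    cleaved_a : intensity of the first cleavage product band
    cleaved_b : intensity of the second cleavage product band
    """
    total = parental + cleaved_a + cleaved_b
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    fraction_cleaved = (cleaved_a + cleaved_b) / total
    # Assumed correction: a duplex is cleaved if either strand carries an indel,
    # so the per-strand indel fraction is 1 - sqrt(1 - fraction cleaved).
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cleaved))


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for replicate measurements."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates to compute s.e.m.")
    mean = sum(values) / n
    sample_variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(sample_variance / n)


if __name__ == "__main__":
    # Made-up band intensities for three replicate lanes.
    replicates = [
        t7e1_indel_percent(60.0, 25.0, 15.0),
        t7e1_indel_percent(62.0, 22.0, 16.0),
        t7e1_indel_percent(58.0, 27.0, 15.0),
    ]
    mean, sem = mean_and_sem(replicates)
    print(f"indels: {mean:.1f} +/- {sem:.1f} %")
```

The band intensities in the usage section are invented for illustration and do not correspond to any gel in this study.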

Expression and purification of NmeCas9
NmeCas9 was cloned into the pMCSG7 vector containing a T7 promoter followed by a 6xHis tag and a tobacco etch virus (TEV) protease cleavage site. Two NLSs on the C-terminus of NmeCas9 and another NLS on the N-terminus were also incorporated. This construct was transformed into the Rosetta 2 DE3 strain of E. coli. Expression of NmeCas9 was performed as previously described for SpyCas9 [14]. Briefly, a bacterial culture was grown at 37°C until an OD600 of 0.6 was reached. At this point the temperature was lowered to 18°C followed by addition of 1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) to induce protein expression. Cells were grown overnight, and then harvested for purification. Purification of NmeCas9 was performed in three steps: nickel affinity chromatography, cation exchange chromatography, and size exclusion chromatography. The detailed protocols for these can be found in [14].

[...] manufacturer's protocol. 48 h after transfection, genomic DNA was extracted with a DNeasy Blood and Tissue kit (Qiagen) according to the manufacturer's protocol. Library preparation, sequencing, and read analyses were done according to protocols described previously [63,65]. Only sites that harbored a sequence with up to six or ten mismatches with the target site (for SpyCas9 or NmeCas9, respectively) were considered potential off-target sites. Data were analyzed using the Bioconductor package GUIDEseq [...] Table 2.
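The mismatch cutoff applied to GUIDE-seq candidate sites can be sketched as follows. This is a simplified illustration, not the GUIDEseq package's own alignment logic: it assumes ungapped, equal-length comparison of the guide-matching sequence with a candidate site, and the function names and toy sequences are hypothetical. Only the cutoffs (six mismatches for SpyCas9, ten for NmeCas9) come from the text.

```python
# Maximum number of mismatches allowed for a candidate site, per nuclease,
# as stated in the Methods text.
MAX_MISMATCHES = {"SpyCas9": 6, "NmeCas9": 10}


def count_mismatches(target: str, candidate: str) -> int:
    """Count position-wise differences between two equal-length sequences."""
    if len(target) != len(candidate):
        raise ValueError("sequences must have the same length (no gaps/bulges)")
    return sum(1 for a, b in zip(target.upper(), candidate.upper()) if a != b)


def is_potential_off_target(nuclease: str, target: str, candidate: str) -> bool:
    """Apply the per-nuclease mismatch cutoff to a candidate site."""
    return count_mismatches(target, candidate) <= MAX_MISMATCHES[nuclease]


if __name__ == "__main__":
    # Toy 24-nt sequences (not real NTS sites) differing at three positions.
    target = "ACGTACGTACGTACGTACGTACGT"
    candidate = "ACGTACGAACGTACCTACGTTCGT"
    print(count_mismatches(target, candidate))
    print(is_potential_off_target("NmeCas9", target, candidate))
```

A real analysis would also need to account for PAM matching and alignment details handled by the package itself, which this sketch deliberately omits.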

SITE-Seq
We performed the SITE-Seq assay as described previously [...] Individual RNPs were prepared by incubating each sgRNA at 95°C for 2 minutes, then allowing it to slowly come to room temperature over 5 minutes. Each sgRNA was then combined with its respective Cas9 in a 3:1 sgRNA:Cas9 molar ratio and incubated at 37°C for 10 minutes in cleavage reaction buffer (20 mM HEPES, pH 7.4, 150 mM KCl, 10 mM MgCl2, 5% glycerol). In 96-well format, 10 µg of gDNA was treated with 0.2 pmol, 0.8 pmol, 3.2 pmol, and 12.8 pmol of each RNP in 50 µL total volume in cleavage reaction buffer, in triplicate. Negative control reactions were assembled in parallel and did not include any RNP. gDNA was treated with RNPs for 4 hours at 37°C. Library preparation and sequencing were done according to protocols described previously [63] using the Illumina NextSeq platform, and ~3 million reads were obtained for each sample. Any SITE-Seq sites without off-target motifs located within 1 nt of the cut-site were considered false-positives and discarded.
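The RNP amounts above map onto the concentrations quoted in the Results (4 nM to 256 nM) by simple arithmetic, since 1 pmol/µL equals 1,000 nM. The sketch below works through that conversion and also encodes the cleavage sensitivity classes as they are defined in the text (high: recovered at all tested concentrations; low: recovered only at 64 nM and/or 256 nM). The function names and the "intermediate" label for any other recovery pattern are assumptions added for illustration.

```python
REACTION_VOLUME_UL = 50.0                  # total reaction volume from the Methods
RNP_AMOUNTS_PMOL = (0.2, 0.8, 3.2, 12.8)   # RNP amounts from the Methods


def rnp_concentration_nm(amount_pmol: float, volume_ul: float = REACTION_VOLUME_UL) -> float:
    """Convert an RNP amount (pmol) in a given volume (µL) to nM.

    1 pmol/µL = 1 µM = 1000 nM. Rounded to suppress floating-point noise.
    """
    return round(amount_pmol / volume_ul * 1000.0, 6)


def cleavage_sensitivity(recovered_at_nm, tested_nm=(4, 16, 64, 256)) -> str:
    """Classify a SITE-Seq site by the RNP concentrations at which it was recovered."""
    recovered = set(recovered_at_nm)
    if not recovered:
        return "not recovered"
    if recovered == set(tested_nm):
        return "high"          # recovered at all concentrations tested
    if recovered <= {64, 256}:
        return "low"           # recovered only at 64 nM and/or 256 nM
    return "intermediate"      # label assumed here; not defined in the text


if __name__ == "__main__":
    for amount in RNP_AMOUNTS_PMOL:
        print(f"{amount} pmol in {REACTION_VOLUME_UL} µL = {rnp_concentration_nm(amount)} nM")
    print(cleavage_sensitivity([4, 16, 64, 256]))
    print(cleavage_sensitivity([256]))
```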

Targeted deep sequencing analysis
To measure indel frequencies, targeted deep sequencing analyses were done as previously described [65]. Briefly, we used two-step PCR amplification to produce DNA fragments for each on-target and off-target site. In the first step, we used locus-specific primers bearing universal overhangs with complementary ends to the TruSeq adaptor sequences (Supplemental Table 7). DNA was amplified with Phusion High Fidelity DNA Polymerase (New England Biolabs) using annealing temperatures of 60°C, 64°C or 68°C, depending on the primer pair. In the second step, the purified PCR products were amplified with a universal forward primer and an indexed reverse primer to reconstitute the TruSeq adaptors (Supplemental Table 7). Input DNA was PCR-amplified with Phusion High Fidelity DNA Polymerase (98°C, 15s; 61°C, 25s; 72°C, 18s; 9 cycles) and equal amounts of the products from each treatment group were mixed and run on a 2.5% [...] combined and aligned, as described above. Indel types and frequencies were then cataloged in a text output format at each base using bam-readcount (https://github.com/genome/bam-readcount). For each treatment group, the average background indel frequencies (based on indel type, position and frequency) of the triplicate negative control group were subtracted to obtain the nuclease-dependent indel frequencies. Indels at each base were marked, summarized and plotted using GraphPad Prism. Deep sequencing data and the results of statistical tests are reported in Supplemental Table 3.

SITE-Seq cell-based validation was performed as previously described with minor modifications [63].
In brief, SITE-Seq sites were amplified from ~1,000-4,000 template copies per replicate and sequencing data from Cas9-treated samples were combined to minimize any variability due to uneven coverage across replicates. Cas9 cleavage sites were registered from the SITE-Seq data, and mutant reads were defined as any non-reference variant calls within 20 bp of the cut site. Sites with low sequencing coverage (<1,000 reads in the combined, Cas9-treated samples or <200 reads in the reference samples) or >2% variant calls in the reference samples were discarded. Sites were tallied as cellular off-targets if they accumulated >0.5% mutant reads in the combined, Cas9-treated samples. This threshold corresponded to sites that showed unambiguous editing when DNA repair patterns were visually inspected.

Availability of data and material. The deep sequencing data from this study have been submitted to the NCBI Sequence Read Archive (SRA; http://www.ncbi.nlm.nih.gov/sra) under accession number XXXXXX. Plasmids will be made available via Addgene.

[...] Table 3). For NmeCas9, in addition to those candidate off-target sites obtained from GUIDE-Seq (C), we also assayed one or two potential off-target sites (designated with the "-CS" suffix) predicted by CRISPRseek as the [...]
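The coverage filters and the 0.5% calling threshold described for the SITE-Seq cell-based validation can be expressed as a small decision rule. The sketch below is a minimal illustration under stated assumptions: per-site read counts are taken as already tallied, and the `SiteCounts` record and `classify_site` function are hypothetical names, not part of any published pipeline. Only the numeric thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Per-site read tallies (hypothetical record layout)."""
    treated_reads: int             # reads in the combined Cas9-treated samples
    treated_mutant_reads: int      # non-reference calls within 20 bp of the cut site
    reference_reads: int           # reads in the reference samples
    reference_variant_reads: int   # variant calls in the reference samples


MIN_TREATED_READS = 1000
MIN_REFERENCE_READS = 200
MAX_REFERENCE_VARIANT_FRACTION = 0.02   # >2% variant calls in reference: discard
MIN_MUTANT_FRACTION = 0.005             # >0.5% mutant reads: tallied as edited


def classify_site(site: SiteCounts) -> str:
    """Return 'discarded', 'edited', or 'not edited' for one site."""
    if site.treated_reads < MIN_TREATED_READS or site.reference_reads < MIN_REFERENCE_READS:
        return "discarded"   # insufficient sequencing coverage
    if site.reference_variant_reads / site.reference_reads > MAX_REFERENCE_VARIANT_FRACTION:
        return "discarded"   # too much background variation in the reference
    if site.treated_mutant_reads / site.treated_reads > MIN_MUTANT_FRACTION:
        return "edited"
    return "not edited"


if __name__ == "__main__":
    # Invented counts for illustration only.
    print(classify_site(SiteCounts(5000, 50, 1000, 5)))
    print(classify_site(SiteCounts(5000, 10, 1000, 5)))
```

Sites classified as edited here would still be distinguished as on-target or off-target by their identity, which this sketch does not model.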