Oncogene Concatenated Enriched Amplicon Nanopore Sequencing for rapid, accurate, and affordable somatic mutation detection

We develop the Oncogene Concatenated Enriched Amplicon Nanopore Sequencing (OCEANS) method, in which variants with low variant allele frequency (VAFs) are amplified and subsequently concatenated for Nanopore Sequencing. OCEANS allows accurate detection of somatic mutations with VAF limits of detection between 0.05 and 1%. We construct 4 distinct multi-gene OCEANS panels targeting recurrent mutations in acute myeloid leukemia, melanoma, non-small- cell lung cancer, and hepatocellular carcinoma and validate them on clinical samples. By demonstrating detection of low VAF single nucleotide variant mutations using Nanopore Sequencing, OCEANS is poised to enable same-day clinical sequencing panels. Supplementary Information The online version contains supplementary material available at (10.1186/s13059-021-02449-1).

Section S1. Effect of DNA Length on NS Throughput NS of 160 bp vs 10 kb fragmented human genomic DNA. Fig. S1 shows gel images of human genomic DNA fragmented to mean length of 160 bp or 10 kb. NS read statistics of the two libraries are shown in Fig. S and Fig. S . The mean read length of 160 bp fragmented library was 265 bp, due to the attachment of barcodes and sequencing adapter prior to NS. The 10kb DNA library had a mean read length of 3.8 kb. The Q-score of 10 kb library was higher than that of the 265 bp library. Section S2: SAL Design and Performance     Concatamer (0% VAF) Amplicon (0% VAF) FIG. S . NS amplicon t races with and without SAL. T op panels s how amplicon t races f or 0% V AF s ample. Bottom panels s how amplicon traces for 5% VAF sample. VRF of NS reads at each location that corresponds to the highest frequency single-base changes at that position. VRF in Fig. 2e was calculated by subtracting VRF of 0% VAF sample from 5% VAF sample.

Section S3: Bioinformatic Analysis
Summary of bioinformatic workflow. NS on MinION was run for 30 min to 1 hour depending on the number of barcoded samples. Typically, 350k-500k reads were obtained in the first one hour of NS. Base-called reads were demultiplexed using EPI2ME software (Oxford Nanopore Technologies). Depending on the total number of reads obtained per sample, reads were subsampled randomly to approximately 10,000 reads per sample for analysis. The NS reads were then deconcatenated using a custom python script.
The python script used minimap2 (github.com/lh3/minimap2) to map individual wildtype amplicon reference sequences to the concatenated reads and extract the mapped monomer sequences from the concatenated read. This step also removed any off-target amplicon sequences from the concatenated reads. Fig. S shows a sample concatenated read.
The mapping efficiency for finding monomer sequences within concatenated reads was calculated as mapped nucleotide fraction (MNF), the fraction of concatemer reads that mapped to an amplicon of interest (Fig. S ). Manual analysis of concatenated reads showed that minimap2 missed some monomers (Fig. S ). Tables S1 and S2 shows the number of reads analyzed for each sample and the MNF values. MNF was higher for FFPE samples, which could be due to the fragmented nature of DNA that resulted in reduced amplification of off-target sequences. Sub-sampling different sets of approximately 10,000 reads resulted in same variant calls with similar % VRF values (Table S3).
The mapped on-target sequences obtained after deconcatenation were then analyzed in two ways for variant calling. First, the sequences were aligned to amplicon reference sequences using minimap2 aligner to generate a BAM alignment file. IGVtools was used on the BAM file to extract the number of A, C, G, and basecalled nucleotides, insertions and deletions at each position using the basecount command. The VRF at each nucleotide position was calculated as the highest frequency single-base change at that position. Positions with VRF ¿20% were flagged as potential variants.
Second, the sequences were aligned to human reference genome (GRCh38) using minimap2 aligner to generate a BAM alignment file. The BAM file was down-sampled to <150x coverage for each amplicon using samtools view command. The downsampled BAM file was then used with Clair to call variants. Variant calls with score > 180 were flagged as potential variants. The bioinformatic workflow for variant calling using Clair variant caller is summarized in Fig. S . Potential variants with both VRF> 20% and CLAIR score > 180 were definitively called as variants.

Section S4: Analytical Validation Experiments with Synthetic DNA
Analytical validation experiments using synthetic DNA spike-in samples. Multi-gene OCEANS panels were calibrated by a spiking synthetic mutation-bearing DNA oligonucleotide (gBlock) into the human NA18562 gDNA. The 10% VAF positive control sample had its VAF confirmed by NGS, and was diluted with NA18562 gDNA to prepare positive control samples with VAF ranging between 1% and 0.05%. For 1% and 0.5% VAF samples, 50 ng input DNA was used for OCEANS panels, corresponding to 15,000 haploid copies. For 0.2%, 0.1%, and 0.05% VAF samples, 100 ng input DNA was used for OCEANS panels.
OCEANS reads were deconcatenated, aligned to human reference genome (GRCh38), and then both the Clair score and the Variant Read Fraction (VRF) were calculated for each mutation. A summary of the variant calls made by Clair for melanoma OCEANS panel's calibration runs is shown in Fig. S . Chromosome positions covered by the OCEANS panels are shown in Table S to Table S . Spike-in mutations tested and their computed median enrichment fold (EF) values for AML and melanoma panel are shown in Table S  Table S  Enrichment fold (EF) calculation. Enrichment fold was calculated as described previously in ref.
[1]. Briefly, EF was calculated from the variant read fraction (VRF) observed from NS after OCEANS and the variant allele fraction (VAF) in the original sample based on the formula: For the AML and melanoma panels, EF was calculated for each plex by spiking synthetic DNA bearing a mutation into NA18562 gDNA shown in Fig. 4a and Fig. 4d. Median EF was calculated from EF values in the linear range from 0.05% to 1% VAF for each plex. Given a EF value for a particular mutation, the original VAF of the mutation in the sample can, in principle, be calculated from the VRF based on the formula V AF = However, due to the relatively high variation in observed VRF due to nanopore sequencing's intrinsic error rate, the effective dynamic range of quantitation is small, and VAF estimations are accurate only when the observed VRF is between 10% and 90%. For example, given an EF value of 1000, 99% VRF and 95% VRF correspond to VAFs of 10% and 2%. At 85% vs. 80% VRF, the VAFs correspond to 0.56% and 0.40%, which is a relatively smaller difference.

Section S5. Mutations Detected in Melanoma Clinical Samples and NGS Comparison
Comparison of NS and Illumina NGS data on clinical samples. Fig. S1 shows comparison of OCEANS and Illumina NGS data separately for melanoma fresh frozen (FF) and FFPE clinical tissue samples. Mutations called by the OCEANS panel for melanoma, NSCLC and HCC clinical samples are listed in the excel file. Fig. S shows the location of TERT promoter mutation in a homopolymer region. FF and FFPE samples where insufficient amounts of DNA (< 5ng) were extracted were excluded from these lists, and OCEANS panels were not run on them. Table S shows clinical sample information for melanoma FF samples; FFPE sample clinical information were not available. Table S and Table S show clinical sample information for NSCLC and HCC FF and FFPE samples.
There are significantly higher number of mutations calls in FFPE samples compared to FF samples, such as the MAP2K1 c.371C>T mutation observed in 15 of the melanoma FFPE samples. The high concordance with digital PCR (Supplementary Section S6) generally suggests that these called variants are not due to systematic errors in the OCEANS method. A more likely explanation is that these mutations may have arisen from cytosine deamination DNA damage from FFPE treatment. This is supported by the fact that FFPE mutation calls had a significant over-representation of C>T and G>A mutations that derive from cytosine deamination.

Section S6: ddPCR Comparison Experiments
We performed a total of 24 ddPCR comparison experiments on 6 FFPE DNA samples, testing 4 mutations each (BRAF p.V600E, KRAS p.G13D, KRAS p.E62K, and MAP2K1 p.P124L). We observe that 21 of the 24 matched experiments gave concordant results, with 10 of these concordant results being concordant positives. 2 of the 11 concordant negative values were deemed OCEANS negative because they only satisfied one of the two criteria (based on VRF and Clair), suggesting that the orthogonal bioinformatic validation improved variant call accuracy. Individual