Fig. 1 | Genome Biology

Fig. 1

From: Identifying and correcting repeat-calling errors in nanopore sequencing of telomeres

Strand-specific nanopore basecalling errors are pervasive at telomeres. a, b IGV screenshots illustrating the three types of basecalling errors found on the forward and reverse strands of telomeres in nanopore sequencing data. (TTAGGG)n on the forward strand was basecalled as (TTAAAA)n, while (CCCTAA)n on the reverse strand was basecalled as (CTTCTT)n and (CCCTGG)n. PacBio HiFi data generated from the same cell line (CHM13) are shown as a control. The reference genome indicated in the plot is the CHM13 draft genome assembly (v1.0). c Co-occurrence heatmap showing how often repeats corresponding to natural telomeres or to basecalling errors co-occur in PacBio HiFi and nanopore long reads found at chromosomal ends (within 10 kb of the annotated end of the reference genome). The diagonal of the co-occurrence matrix gives the counts of long reads in which only a single type of repeat was observed. d Basecalling errors at telomeres are observed across different nanopore datasets and sequencing platforms. e Basecalling errors at telomeres are observed for different nanopore basecallers and basecalling models. The Guppy5 and Bonito basecallers, and different basecalling models for each basecaller, were used to basecall telomeric reads in the CHM13 PromethION dataset (reads that mapped to the flanking 10 kb regions of the CHM13 reference genome). f Basecalling error repeats have nanopore current profiles similar to those of telomeric repeats. Current profiles for telomeric and basecalling error repeats were plotted based on known mean current profiles for each k-mer (“Methods”). g Summary of organisms assessed and the types of repeat errors observed. Note that S. pombe and D. melanogaster could not be readily assessed for the presence of error repeats by visualization in IGV, as their telomeric sequences are more complex.
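
The co-occurrence counting described for panel c can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the label strings, and the `MIN_COPIES` threshold for calling a repeat "present" in a read are all assumptions made for illustration. Only the repeat units themselves, and the rule that the diagonal counts reads with a single repeat type, come from the caption.

```python
from collections import Counter
from itertools import combinations

# Repeat units named in the caption; the labels are illustrative only.
REPEATS = {
    "TTAGGG": "telomere (forward strand)",
    "CCCTAA": "telomere (reverse strand)",
    "TTAAAA": "basecalling error (forward strand)",
    "CTTCTT": "basecalling error (reverse strand)",
    "CCCTGG": "basecalling error (reverse strand)",
}

# Assumed threshold: number of tandem copies needed to call a repeat present.
MIN_COPIES = 4


def repeats_in_read(seq, min_copies=MIN_COPIES):
    """Return the set of repeat units found as >= min_copies tandem copies."""
    seq = seq.upper()
    return {unit for unit in REPEATS if unit * min_copies in seq}


def cooccurrence(reads, min_copies=MIN_COPIES):
    """Count reads per pair of repeat types.

    Off-diagonal keys (a, b) with a < b count reads containing both types.
    Diagonal keys (a, a) count reads in which a is the only type observed.
    """
    counts = Counter()
    for seq in reads:
        found = sorted(repeats_in_read(seq, min_copies))
        if len(found) == 1:
            counts[(found[0], found[0])] += 1
        for a, b in combinations(found, 2):
            counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    # Toy reads, not real data.
    toy_reads = [
        "ACGT" + "TTAGGG" * 5,
        "TTAGGG" * 4 + "TTAAAA" * 6,
        "ACGTACGT",
    ]
    for pair, n in sorted(cooccurrence(toy_reads).items()):
        print(pair, n)
```

Exact substring matching is the simplest choice here; it will miss repeat tracts interrupted by single-base errors, so a real analysis would likely need a more tolerant matcher.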
