Schematic example of our model's approach to detecting NAHR events from paired-end read data. The bottom half shows our framework for repeats and potential NAHR events. Each pair of homologous LCRs represents a potential NAHR event, annotated with a label such as E6. We then group homologous LCRs into equivalence classes, folding the reference genome by homology. We focus on E2 and the blue LCRs for this example. We collect all paired-end reads homologous to the blue LCRs. In the middle, we analyze the schematic blue data for two cases: no NAHR event at E2 versus an NAHR deletion at E2. For this schematic, assume that an NAHR deletion did occur at E2. According to the mechanism of NAHR, a deletion at E2 results in a hybrid LCR. Each analysis has two major components: the repeat read depth and the alignment of reads. For the repeat read depth, we compare the observed number of reads across all blue LCRs against the expected number. In the no-event case, there are three blue LCRs, so we expect 3× blue reads, but we observe only 2× blue reads. In the NAHR deletion case, a novel hybrid LCR is formed by hybridizing the dark blue and aqua blue LCRs; thus, we expect 2× blue reads, as observed. For alignments, we focus on the hybrid reads that span the NAHR breakpoint in the hybrid blue LCR. Although the hybrid blue LCR is novel with respect to the reference genome, the hybrid reads can still be mapped concordantly to the blue LCRs in the reference with very few errors, a phenomenon termed phantom concordance. Because they map concordantly, they are ignored by most existing structural variation detection algorithms. But when we align them against the hybrid LCR, many of the read errors are resolved. LCR, low-copy repeat; NAHR, non-allelic homologous recombination.
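The repeat read depth comparison in this schematic can be illustrated with a minimal sketch. It assumes a simple Poisson model of read counts, which is an illustrative choice and not necessarily the model used in this work. The names `poisson_log_likelihood` and `score_hypotheses`, the per-copy depth of 100 reads, and the observed count of 200 reads are all hypothetical values chosen to mirror the 3× versus 2× example.

```python
import math


def poisson_log_likelihood(observed: int, expected: float) -> float:
    """Log-probability of `observed` reads under a Poisson with mean `expected`."""
    return observed * math.log(expected) - expected - math.lgamma(observed + 1)


def score_hypotheses(observed_reads: int,
                     reads_per_copy: float,
                     copies_by_hypothesis: dict) -> dict:
    """Score each hypothesis by how well its LCR copy count explains the reads.

    The expected read count is (reads per LCR copy) x (number of LCR copies
    in the class under that hypothesis).
    """
    return {
        name: poisson_log_likelihood(observed_reads, reads_per_copy * copies)
        for name, copies in copies_by_hypothesis.items()
    }


# Hypothetical numbers mirroring the schematic: three blue LCRs with no event,
# two after an NAHR deletion merges two of them into one hybrid LCR.
reads_per_copy = 100.0   # assumed depth contributed by one LCR copy
observed_blue_reads = 200  # assumed observation, i.e. "2x blue reads"

scores = score_hypotheses(
    observed_blue_reads,
    reads_per_copy,
    {"no_event": 3, "nahr_deletion": 2},
)
best = max(scores, key=scores.get)
print(scores)
print("best-supported hypothesis:", best)
```

Under these assumed numbers the deletion hypothesis scores higher, because its expected count matches the observed 2× depth. Read depth alone is only one of the two components described in the caption; the alignment of hybrid reads supplies the second line of evidence.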