Schematic example of an NAHR duplication with paired-end read data. (a) A schematic reference genome and paired-end read data. The dark green and light green regions are homologous LCRs that form a potential NAHR event locus. Green nucleotides are the VPs that distinguish the two LCRs. We suppose that the individual experienced an NAHR duplication at this locus and that the two paired-end reads shown were generated from the breakpoint region; that is, they are hybrid reads. We consider two possible outcomes: no event and an NAHR duplication. (b) If no NAHR event occurred at this locus, then this locus of the individual's genome is the same as in the reference. Notice that the paired-end reads align concordantly to these LCRs, albeit with a small number of errors at the VPs; we call this phantom concordance. Suppose the probability of a read error is 2%. Then the likelihood that a paired-end read came from the mediating LCR is 0.98 × 0.02 = 0.0196 for each read (one matching and one mismatching VP base). (c) The hybrid LCR formed by the NAHR duplication event is shown with aligned paired-end reads. The hybrid LCR is novel to the individual; it does not exist in the original reference genome. Notice that the VPs switch from dark green to light green after the breakpoint in the hybrid LCR. For simplicity in this schematic, we calculate the probability of a paired-end read's alignment according to the agreement between its mates' bases at the VPs, although in the algorithm a full alignment probability is calculated. The likelihood that a paired-end read came from the hybrid LCR is 0.98² = 0.9604 for each read. LCR, low-copy repeat; NAHR, non-allelic homologous recombination; VP, variational position.
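The arithmetic in panels (b) and (c) can be reproduced with a minimal sketch. It uses the simplified model of this schematic only, in which each VP base contributes independently and the read-error probability is 2%; it is not the full alignment probability used in the algorithm. The function name `vp_likelihood` and its parameters are hypothetical, introduced here for illustration.

```python
# Simplified schematic model (an assumption for illustration): each VP base
# is scored independently, with a match contributing (1 - error_rate) and a
# mismatch contributing error_rate.

def vp_likelihood(n_match, n_mismatch, error_rate=0.02):
    """Likelihood of one paired-end read given counts of matching and
    mismatching VP bases. Hypothetical helper, not the authors' code."""
    return (1 - error_rate) ** n_match * error_rate ** n_mismatch

# Panel (b): no event; each read has one matching and one mismatching VP.
no_event = vp_likelihood(n_match=1, n_mismatch=1)

# Panel (c): NAHR duplication; both VPs of each read match the hybrid LCR.
hybrid = vp_likelihood(n_match=2, n_mismatch=0)

# Per-read likelihood ratio in favour of the hybrid LCR.
ratio = hybrid / no_event

print(f"no event: {no_event:.4f}")
print(f"hybrid:   {hybrid:.4f}")
print(f"ratio:    {ratio:.1f}")
```

Under these assumptions each hybrid read favours the duplication by a factor of about 49, which illustrates why even a few hybrid reads can outweigh phantom concordance.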