SVA retrotransposon insertion-associated deletion represents a novel mutational mechanism underlying large genomic copy number changes with non-recurrent breakpoints

Background: Genomic disorders are caused by copy number changes that may exhibit recurrent breakpoints processed by nonallelic homologous recombination. However, region-specific disease-associated copy number changes have also been observed which exhibit non-recurrent breakpoints. The mechanisms underlying these non-recurrent copy number changes have not yet been fully elucidated.

Results: We analyze large NF1 deletions with non-recurrent breakpoints as a model to investigate the full spectrum of causative mechanisms, and observe that they are mediated by various DNA double strand break repair mechanisms, as well as aberrant replication. Further, two of the 17 NF1 deletions with non-recurrent breakpoints, identified in unrelated patients, occur in association with the concomitant insertion of SINE/variable number of tandem repeats/Alu (SVA) retrotransposons at the deletion breakpoints. The respective breakpoints are refractory to analysis by standard breakpoint-spanning PCRs and are only identified by means of optimized PCR protocols designed to amplify across GC-rich sequences. The SVA elements are integrated within SUZ12P intron 8 in both patients, and were mediated by target-primed reverse transcription of SVA mRNA intermediates derived from retrotranspositionally active source elements. Both SVA insertions occurred during early postzygotic development and are uniquely associated with large deletions of 1 Mb and 867 kb, respectively, at the insertion sites.

Conclusions: Since active SVA elements are abundant in the human genome and the retrotranspositional activity of many SVA source elements is high, SVA insertion-associated large genomic deletions encompassing many hundreds of kilobases could constitute a novel and as yet under-appreciated mechanism underlying large-scale copy number changes in the human genome.


Cruciform STR (yellow): short tandem repeats of 2-6-bp that are repeated several times. A subtype of STR may also represent Z-DNA.
Z-DNA (red): five or more tandem repeats, each comprising an alternating pyrimidine-purine dinucleotide motif (Wang et al., 1981; Cer et al., 2011).
Polypurine tract (yellow): poly-A or poly-G tracts.
Polypyrimidine tract (yellow): poly-C or poly-T tracts.
IR (green): inverted repeat of ≥6-bp separated by ≤100-bp.
Cruciform (green): a subtype of inverted repeat of ≥6-bp separated by 0-4-bp.
DR (pink): direct repeat of 10-300-bp with a spacer sequence of 1-10-bp. Slipped motifs represent direct repeats that are not separated by any spacer sequences.
MR (grey): mirror repeat which encompasses at least 10-bp and is separated by 9-100-bp. A subset of mirror repeats, termed "triplex motif", is characterized by a spacer sequence of 0-8-bp between the repeats.
A-phased repeat (turquoise): three or more tracts of A(3-7), T(3-7), AAATTT, AAATTTT and AAAATTT in any combination located on the plus or minus strand. The centres of the tracts are separated by 11-12-bp (Cer et al., 2011; Cer et al., 2013).
QC (olive): G-quadruplex-forming repeat encompassing four repeat units, each containing the same number of 'Gs' (on the plus or minus strand). Their number can vary from 3 to 7 and the repeat units can be separated by 1-7-bp (Cer et al., 2011).
a: Thirty NF1 deletion breakpoint-flanking sequence fragments were analysed, each encompassing 300-bp. In parentheses are the proportions of the 30 deletion breakpoint-spanning sequences that exhibited non-B DNA-forming sequence motifs.
b: The control sequences comprised 200 fragments of 300-bp each. The control sequences do not flank any known atypical NF1 deletion breakpoints. The corresponding sequences are located within 17q11.2 telomeric to SUZ12P (genomic position: 29,118,000-29,148,000; hg19) and between RAB11FIP4 and COPRS (30,020,000-30,050,000; hg19).
In total, the control dataset encompassed 60-kb of genomic DNA. In parentheses are indicated the proportions of the 200 sequence fragments that exhibited specific non-B DNA-forming sequence motifs.
c: The two-tailed Fisher's Exact test was applied to calculate the statistical significance of the differences in the number of non-B DNA motifs observed in the breakpoint-flanking sequences of deletion breakpoints and in the control dataset.
d: Some sequences from the investigated datasets fulfill the criteria for more than one non-B DNA motif subtype, e.g. 'TTAATTAATTAA' represents a short tandem repeat (2-6-bp sequence repeated several times) as well as a cruciform repeat which is a subtype of an inverted repeat of ≥6-bp separated by 0-4-bp. Therefore, the number of sequences exhibiting non-B DNA subtypes exceeds the number of sequences with at least one non-B DNA-forming motif.
DR: direct repeat(s); IR: inverted repeat(s).
a: In total, we investigated 30 breakpoint-flanking sequences of 300-bp each. These 300-bp regions comprise 150-bp centromeric and 150-bp telomeric to each NF1 deletion breakpoint.
b: The control sequences were located within 17q11.2 telomeric to SUZ12P (genomic position: 29,118,000-29,148,000; hg19) and between RAB11FIP4 and COPRS (30,020,000-30,050,000; hg19). In total, the control dataset comprised 60-kb of genomic DNA including 200 fragments of 300-bp each.
c: The two-tailed Fisher's Exact test was applied to calculate the statistical significance of the differences in the number of direct and inverted repeats observed in the breakpoint-flanking sequences of the deletion breakpoints and in the control dataset.
The sequence alignments were performed using default settings for the algorithm parameters: Expect threshold: 10, word size: 28 and match/mismatch scores: 1, -2. 
The following parameters were however not run under default settings but instead adjusted to the requirements of our analysis: the parameter 'number of maximum target sequences' was increased from 100 (default) to 20,000 so that all sequence alignments were displayed. The parameter 'maximum matches in a query' was changed from 0 (default) to 100 in order to identify all possible matches to the query sequence. The parameter 'gap costs' was changed from the default setting 'linear' to 'existence: 5 and extension: 2'. These settings imply that the cost to open a gap scores -5 whereas the cost to extend the gap scores -2. By adopting these settings, we reduced the number of gaps that would extend the length of the alignment at the expense of the sequence identity. b: Full-length retrotransposon.  190,379-29,190,683 29,192,185-29,192 207,345-29,207,639 29,208,996-29,209,287 88% 1,358-bp direct a: The sequence alignments were performed using default settings for the algorithm parameters: Expect threshold: 10, word size: 28 and match/mismatch scores: 1, -2. The following parameters were however not run under default settings but instead adjusted to the requirements of the analysis; the parameter 'number of maximum target sequences' was increased from 100 (default) to 20,000 so that all sequence alignments were displayed. The parameter 'maximum matches in a query' was changed from 0 (default) to 100 in order to identify all possible matches to the query sequence. The parameter 'gap costs' was changed from the default setting 'linear' to 'existence: 5 and extension: 2'. These settings imply that the cost to open a gap scores -5 whereas the cost to extend the gap scores -2. By adopting these settings, we reduced the number of gaps that would extend the length of the alignment at the expense of the sequence identity. 
b: The control dataset comprises two genomic regions: one is located telomeric to SUZ12P (genomic position: 29,118,000-29,210,000; 92-kb), the other between RAB11FIP4 and COPRS (genomic position: 30,020,000-30,048,000; 28-kb). The total 120-kb of genomic DNA were subdivided into 30 regions of 4-kb each and hypothetical breakpoints were assigned locations between nucleotides at positions 2,000 and 2,001 of each of these 4-kb fragments. Only seven of these 30 control regions (K1-K30) contained direct or inverted repeats as indicated in the first column. c: Full-length retrotransposon. The number of direct and inverted repeats >150-bp exhibiting ≥87% sequence homology was determined by BLASTN self-alignments of 2-kb regions flanking the deletion breakpoint regions on both sides. The number of such repeats was also determined in a control dataset of sequences derived from two genomic regions: one is located telomeric to SUZ12P (genomic position: 29,118,000-29,210,000; 92-kb), the other between RAB11FIP4 and COPRS (genomic position: 30,020,000-30,048,000; 28-kb). In total, these two regions comprise 120-kb of genomic DNA which were subdivided into 30 fragments of 4-kb each. Hypothetical breakpoints were assigned locations between nucleotides at positions 2,000 and 2,001 of each of these 4-kb fragments. b: The two-tailed Fisher's Exact test was applied to assess the statistical significance of the differences in the number of repeats observed.
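The two-tailed Fisher's Exact test used throughout these footnotes can be illustrated with a minimal sketch. The function name `fisher_exact_two_tailed` is hypothetical, and the implementation (summing all 2x2 tables with fixed margins that are no more probable than the observed one) is a common textbook definition, not necessarily the software the authors used. As an example it takes the counts reported for direct repeats >150-bp in the comparison of the 11 SUZ12P breakpoints with 30 control fragments (2/11 versus 6/30); for that table the sketch yields a P-value close to 1, consistent with the non-significant value reported (0.99).

```python
from math import comb

def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two datasets; columns are 'repeat present' / 'repeat absent'.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(k):
        # Hypergeometric probability of k positives in row 1, margins fixed
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Sum every table that is no more probable than the observed table
    total = sum(
        p for p in (prob(k) for k in range(k_min, k_max + 1))
        if p <= p_obs * (1 + 1e-9)
    )
    return min(total, 1.0)

# Counts taken from the text: 2 of 11 breakpoint regions and 6 of 30
# control regions contained a direct repeat >150-bp.
p_value = fisher_exact_two_tailed(2, 9, 6, 24)
print(f"P = {p_value:.2f}")
```

The small tolerance factor guards against floating-point ties between equally probable tables; it is an implementation detail of this sketch.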
The NF1 deletion breakpoint dataset included 30 sequence fragments of 300-bp flanking the breakpoints of the 15 atypical NF1 deletions. The breakpoints were located between nucleotides 150 and 151 of each of these 300-bp fragments. In our analysis, we considered those retrotransposons that overlapped the breakpoints or which were located immediately adjacent to the breakpoints.
b: The control dataset included 200 sequence fragments of 300-bp that are located in 17q11.2 within genomic regions not harbouring known atypical NF1 deletion breakpoints. The control sequence dataset comprised two 30-kb regions, one located telomeric to SUZ12P (genomic position: 29,118,000-29,148,000; hg19) and the other located between RAB11FIP4 and COPRS (30,020,000-30,050,000; hg19). The 60-kb of genomic DNA sequences were subdivided into two hundred 300-bp regions and hypothetical breakpoints were assigned locations between nucleotides 150 and 151 of each of these 300-bp fragments.
c: The two-tailed Fisher's Exact test was applied to assess the statistical significance of the differences between the number of elements observed in the two datasets.
The Expand Long Template PCR system (Roche) was used to perform the PCRs with the addition of 10% DMSO. The genomic DNA used as PCR template was diluted to 40 ng/µl with water and added to the PCR to a final amount of 400 ng. The initial denaturation of the genomic DNA was performed for 10 minutes. Different elongation times were tested in order to amplify not only 5' truncated (and hence shorter) SVA element insertions but also full-length SVA elements putatively inserted into SUZ12P intron 8 or within the intergenic region between RAB11FIP4 and COPRS.
were not included in the analysis because their telomeric deletion breakpoints were not located within the region between NF1-REPa and NF1-REPc but instead were located 1.53-Mb and 3.69-Mb telomeric to NF1-REPc, respectively. 
The 615,383-bp region harbouring all 15 telomeric breakpoints was subdivided into two regions: region 1 (32,559-bp) and region 2 (582,824-bp).
b: The expected number of breakpoints within region 1 was determined as follows: A total of 15 breakpoints were observed to be located within 615,383-bp. The proportion of the entire 615,383-bp corresponding to region 1 is 0.05 and the proportion corresponding to region 2 is 0.95. Under the assumption that breakpoints are distributed uniformly along the sequence (i.e. in proportion to region length), the expected number of breakpoints in region 1 is N=1 (0.05 x 15) and in region 2: N=14 (0.95 x 15).
c: The chi-squared test (one degree of freedom) was used to calculate the significance of the difference between the observed versus the expected number of breakpoints.
were not included in the analysis because their centromeric deletion breakpoints were not located within the region between NF1-REPa and NF1-REPc but instead were located 785-kb and 1.2-Mb centromeric to NF1-REPa, respectively. The 318,008-bp region harbouring all 15 centromeric breakpoints was subdivided into two regions: region 1 (39,082-bp) and region 2 (278,926-bp).
b: The expected number of breakpoints within region 1 was determined as follows: A total of 15 breakpoints were observed to be located within 318,008-bp. The proportion of the entire 318,008-bp corresponding to region 1 is 0.12 and the proportion corresponding to region 2 is 0.88. Under the assumption that breakpoints are distributed uniformly along the sequence (i.e. in proportion to region length), the expected number of breakpoints in region 1 is N=2 (0.12 x 15) and in region 2, the expected number of breakpoints is N=13 (0.88 x 15).
c: The chi-squared test (one degree of freedom) was used to calculate the significance of the difference between the observed versus the expected numbers of breakpoints.
Eleven NF1 deletion breakpoint-flanking sequence fragments were analysed, each encompassing 300-bp. 
In parentheses are the proportions of the 11 deletion breakpoint-spanning sequences that exhibited non-B DNA-forming sequence motifs.
b: The control sequences comprised 200 fragments of 300-bp each. The control sequences do not flank any known atypical NF1 deletion breakpoints. The corresponding sequences are located within 17q11.2 telomeric to SUZ12P (genomic position: 29,118,000-29,148,000; hg19) and between RAB11FIP4 and COPRS (30,020,000-30,050,000; hg19). In total, the control dataset encompassed 60-kb of genomic DNA. In parentheses are indicated the proportions of the 200 sequence fragments that exhibited specific non-B DNA-forming sequence motifs.
c: The two-tailed Fisher's Exact test was applied to calculate the statistical significance of the differences in the number of non-B DNA motifs observed in the breakpoint-flanking sequences of deletion breakpoints and in the control dataset.
d: Some sequences from the investigated datasets fulfill the criteria for more than one non-B DNA motif subtype, e.g. 'TTAATTAATTAA' represents a short tandem repeat (2-6-bp sequence repeated several times) as well as a cruciform repeat which is a subtype of an inverted repeat of ≥6-bp separated by 0-4-bp. Therefore, the number of sequences exhibiting non-B DNA subtypes exceeds the number of sequences with at least one non-B DNA-forming motif.
Table S20: Numbers of direct and inverted repeats identified within 150-bp flanking the breakpoints of the 11 atypical NF1 deletions with centromeric breakpoints located in SUZ12P as compared with the numbers of such repeats identified within a control dataset of sequences not harbouring NF1 deletion breakpoints. MEME suite (http://meme.nbcr.net/meme/) was used to analyse repeats ≥ 6-bp up to 150-bp. The number of base-pairs between the repeats was not restricted to a specific length. DR: direct repeat(s); IR: inverted repeat(s).
a: In total, we investigated 11 centromeric breakpoint-flanking sequences of 300-bp each. 
These 300-bp regions comprise 150-bp centromeric and 150-bp telomeric to each NF1 deletion breakpoint. b: The control sequences were located within 17q11.2 telomeric to SUZ12P (genomic position: 29,118,000-29,148,000; hg19) and between RAB11FIP4 and COPRS (30,020,000-30,050,000; hg19). In total, the control dataset comprised 60-kb of genomic DNA including 200 fragments of 300-bp each. c: The two-tailed Fisher's Exact test was applied to calculate the statistical significance of the differences in the number of direct and inverted repeats observed in the breakpoint-flanking sequences of the deletion breakpoints and in the control dataset.
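The expected breakpoint counts and the chi-squared test (one degree of freedom) described in the footnotes on the distribution of telomeric and centromeric breakpoints can be sketched as follows. This is a minimal illustration that assumes expected counts are proportional to region length; the function names are hypothetical. The region lengths are those stated in the text. The observed counts per region are not given in this section, so the final call uses clearly labelled made-up values purely to show the mechanics.

```python
from math import erfc, sqrt

def expected_counts(region_lengths, n_breakpoints):
    """Expected breakpoints per region if breakpoints fall uniformly along the sequence."""
    total = sum(region_lengths)
    return [n_breakpoints * length / total for length in region_lengths]

def chi_squared_1df(observed, expected):
    """Chi-squared statistic and P-value for two categories (one degree of freedom)."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For one degree of freedom the upper-tail probability equals erfc(sqrt(chi2 / 2))
    p = erfc(sqrt(chi2 / 2))
    return chi2, p

# Region lengths (bp) from the text; 15 breakpoints in each analysis
telomeric = expected_counts([32_559, 582_824], 15)    # about 0.79 and 14.21
centromeric = expected_counts([39_082, 278_926], 15)  # about 1.84 and 13.16

print([round(x) for x in telomeric])    # rounds to the N=1 and N=14 quoted in the text
print([round(x) for x in centromeric])  # rounds to the N=2 and N=13 quoted in the text

# HYPOTHETICAL observed counts, for illustration only (not taken from the source)
chi2, p = chi_squared_1df(observed=[5, 10], expected=[1, 14])
print(f"chi2 = {chi2:.2f}, P = {p:.2g}")
```

Using the unrounded expected values instead of the rounded ones would change the statistic slightly; the text reports rounded values, so the illustration follows that convention.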

P-value b
Direct repeat (>150-bp) 2/11 (18%) 6/30 (20%) 0.99
a: The number of direct repeats >150-bp exhibiting ≥87% sequence homology was determined by BLASTN self-alignments of 2-kb regions flanking the deletion breakpoint regions on both sides. The number of such repeats was also determined in a control dataset of sequences derived from two genomic regions: one is located telomeric to SUZ12P (genomic position: 29,118,000-29,210,000; 92-kb), the other between RAB11FIP4 and COPRS (genomic position: 30,020,000-30,048,000; 28-kb). In total, these two regions comprise 120-kb of genomic DNA which were subdivided into 30 fragments of 4-kb each. Hypothetical breakpoints were assigned locations between nucleotides at positions 2,000 and 2,001 of each of these 4-kb fragments.
b: The two-tailed Fisher's Exact test was applied to assess the statistical significance of the differences in the number of repeats observed.
Table S22: SVA insertion-associated deletions in the human genome as compared with the chimpanzee genome according to (Lee et al., 2012).

Genomic position of the inserted SVA element (hg19) Deletion size (bp)
Chr 1      Netherlands) as well as the custom-designed MLPA probes (marked in bold) used to narrow down the breakpoint regions of the atypical NF1 deletions investigated in this study. The custom-designed MLPA probes were established in our previous study (Vogt et al. 2012) in order to distinguish type-2 deletions with breakpoints located in SUZ12 and SUZ12P from deletions that do not harbour proximal and distal deletion breakpoints within these paralogs.

Probe designation
Probe position on chromosome 17 (
Figure S1: The insertion of 9-bp (green) at the deletion breakpoints of patient 70969 may have occurred in association with the occurrence of the large NF1 deletion mediated by replication-associated template switching. (A) Alignment of the deletion breakpoint-flanking sequences of patient 70969 against the reference sequence of the human genome (hg19). Sequences located at the proximal (centromeric) deletion breakpoint are indicated in black, whilst sequences at the distal (telomeric) breakpoint are given in blue. The vertical red line highlights the position of the proximal deletion breakpoint. The 9-bp insertion (green) represents a duplication of 9-bp from the distal breakpoint region (underlined). (B) In the proximal breakpoint-flanking region, DNA synthesis at the leading strand is interrupted but appears to have resumed, after an interstrand template switch, at sequences located within the distal breakpoint-flanking region (blue) (step 1). Subsequently, the 9-bp indicated in green are newly synthesized and included in the nascent DNA strand at the replication fork located in the distal breakpoint region (step 2). This is then followed by another template switch occurring onto the leading strand (step 3) upon which replication is continued (step 4). The nucleotides exhibiting microhomology at sites of template switching are marked in yellow.
Figure S2: The insertion of 10-bp at the deletion breakpoint of patient 619 is likely to have occurred concomitantly with the large NF1 deletion mediated by template switching during replication. (A) Alignment of the deletion junction sequences against the reference sequence of the human genome (hg19). Sequences at the proximal (centromeric) deletion breakpoint are indicated in black, whilst sequences at the distal (telomeric) breakpoint are given in blue. The vertical red line represents the proximal deletion breakpoint. 
The insertion of 10-bp (green) appears to represent a duplication of the underlined sequences. (B) Within the proximal breakpoint-flanking region, DNA synthesis at the leading strand is interrupted but continues after an interstrand template switch into a replication fork located within the distal breakpoint-flanking region (blue) (step 1). The 7-bp sequence indicated in green is newly synthesized and included within the nascent DNA strand (step 2). Subsequently, a further template switch occurs which involves sequences located in 3' direction, also within the lagging strand template, causing the insertion of the trinucleotide 'AAT' (step 4). Finally, another template switch occurs (step 5) and replication is continued within the distal breakpoint-flanking region. The nucleotides exhibiting microhomology at sites of template switching are marked in yellow.
Figure S3: The insertion of 11-bp at the deletion breakpoint junction of patient R84329 is likely to have occurred concomitantly with the large NF1 deletion mediated by replication-associated template switching. (A) Alignment of the deletion breakpoint-flanking sequences against the reference sequence of the human genome (hg19). Sequences at the proximal (centromeric) deletion breakpoint are indicated in black, whilst sequences at the distal (telomeric) breakpoint are given in blue. The vertical red line highlights the position of the distal deletion breakpoint. The insertion of 11-bp (green) appears to represent a duplication of pre-existing sequences (underlined). (B) In the distal breakpoint-flanking region, DNA synthesis at the lagging strand stops and an interstrand template switch occurs into a replication fork located within the proximal breakpoint-flanking region (black) (step 1). The hexanucleotide 'GGAGAC' (green) is included within the nascent DNA strand (step 2). Single nucleotide changes due to DNA polymerase errors are highlighted in grey. 
Subsequently, another template switch occurs (step 3) causing the insertion of the five additional nucleotides indicated in green (step 4). Finally, a further template switch occurs (step 5) followed by continued replication within the proximal breakpoint-flanking region (step 6). The nucleotide exhibiting microhomology at the site of template switching is marked in yellow.
Figure S4: Alignments of the deletion junction sequences of the 15 atypical NF1 deletions against the reference sequence of the human genome (hg19). The reference sequences at the proximal (centromeric) and distal (telomeric) breakpoint flanking regions are given in black and blue, respectively. Microhomology at the breakpoints is indicated in red and is characterized by one or more perfectly matching nucleotides at the breakpoints. Microinsertions of nucleotides at the deletion breakpoints are indicated in green and sequence mismatches between the junction sequence and the reference sequence are marked in grey. SNPs are highlighted in turquoise.
Sequence composition of NF1-REPa harbouring LRRC37B-P (red) and SMURF2-P (green), both pseudogenes. The numbers of the exons located within these pseudogenes are indicated. Additionally, NF1-REPa contains sequences that are highly homologous to chromosome 19p13.12 (marked in grey). The black arrows indicate the genomic orientation of the three LRRC37B pseudogene fragments (designated P1-P3). P1-P3 represent partial duplications of the LRRC37B gene located within NF1-REPc (not shown). (B) Inverted repeats of 5.7-kb were identified within LRRC37B-P1 and P2 (indicated by grey arrows). The centromeric deletion breakpoint in patient 619 (genomic position: 28,946,218; hg19) is located within the centromeric 5.7-kb inverted repeat. The centromeric deletion breakpoint in patient 659 (genomic position: 28,948,946; hg19) is located 48-bp telomeric to the 5.7-kb repeat. The relative positions of both breakpoints are indicated by lilac triangles. 
The 5.7-kb inverted repeats exhibit 99% sequence identity and are separated by 3,710-bp. We surmise that these repeats contributed to the occurrence of the large NF1 deletions by forming a hairpin structure, thereby inducing a DNA double strand break. The sequence composition of the 5.7-kb repeat is shown in (C).   Figure S8: PCRs performed in order to amplify across the SVA element inserted at the breakpoints of the atypical NF1 deletion identified in patient DA-77. The centromeric and the telomeric breakpoint regions within 17q11.2 and the inserted SVA element are indicated. Breakpoint-spanning PCRs were performed with primers GSP_1for and GSP_1rev (indicated by arrows). PCR with primers AD91for and SVA_4rev was also performed to characterize the insertion site of the SVA element. Sequence analysis of these PCR products revealed the structure of the inserted SVA element at the deletion breakpoints in patient DA-77 and indicated that the centromeric deletion breakpoint was located within intron 8 of SUZ12P whereas the telomeric breakpoint was located in an intergenic region between the RAB11FIP4 and COPRS genes.   Figure S11: Identification of the centromeric deletion breakpoint in patient ASB4-55. (A) The breakpoint of the deletion is located within SUZ12P intron 8. The genomic region that is not deleted is indicated by a grey bracket. The lengths of the repetitive elements and unique sequences located within breakpoint-flanking regions are indicated in base-pairs. (B) Relative extent of the non-deleted regions as determined by PCR using DNA isolated from somatic hybrid cells containing only the chromosome 17 with the deletion and not the normal chromosome 17 from the patient. (C) Inverse PCR after restriction of genomic DNA with PciI (red triangles) and HincII (green triangles) indicated that the deletion breakpoint lies immediately adjacent to a polyT (40) tract that is not included in the reference sequence of the human genome (hg19). 
(D) Semi-specific PCR confirmed the presence of the polyT tract at the deletion breakpoint. Breakpoint-spanning PCR as indicated in Figure S13 revealed that the breakpoint of the deletion and the insertion of the SVA element occurred at genomic position 29,103,071. The telomeric breakpoint region is located between the RAB11FIP4 and COPRS genes. The lengths of the repetitive elements and unique sequences located close to the deletion breakpoint are indicated in base-pairs. (B and C) Array CGH analysis suggested that the breakpoint should be located between nucleotide positions 29,968,972 and 29,971,033. The breakpoint region was further narrowed down by PCR using DNA isolated from somatic cell hybrids containing only the chromosome 17 harbouring the deletion from the patient as well as semi-specific PCR. These experiments indicated that the genomic region telomeric to position 29,970,020 was not deleted. Breakpoint-spanning PCR with primers indicated in Figure S13 revealed that the breakpoint was located within an MER41B element. In the reference sequence of the human genome (hg19), the corresponding MER41B element encompasses 300-bp. In patient ASB4-55, however, the MER41B element is truncated, spanning only 167-bp and is located immediately adjacent to the VNTR region of the inserted SVA element. Genomic position 29,969,839 demarcates the breakpoint of the deletion and the insertion of the SVA element. harbouring the SVA insertion-associated NF1 deletion in this patient. Indicated are the centromeric and telomeric regions flanking the deletion breakpoints within 17q11.2 and the inserted SVA element. The lengths of the repetitive and unique sequences located within the breakpoint-flanking regions are indicated in base-pairs. (B) PCR performed in order to amplify across the SVA element inserted at the breakpoints of the atypical NF1 deletion identified in patient ASB4-55. Breakpoint-spanning PCR was performed with primers as117for and as146Brev. 
Sequence analysis of the corresponding PCR product indicated the structure of the SVA element as well as its insertion sites.
Figure S16: Alignment of the reference sequence of SUZ12P intron 8 (hg19) against the corresponding region in patient DA-77. The polyT tract of the SVA element inserted into SUZ12P intron 8 in this patient is marked in red. The LINE 1 endonuclease (L1 EN), most likely involved in the insertion of this SVA element, is known to exhibit substrate specificity and cleaves at specific L1 EN consensus cleavage sites such as 5'-TTTT/A-3' and 5'-CTTT/A-3' (Morrish et al., 2002). The L1 EN cleavage site 5'-TTTT/A-3' in the reference sequence hg19 is highlighted in yellow and the position of cleavage is indicated by an arrow. The SNP rs8071236 (T/C) is located within this sequence motif as highlighted in blue. The chromosome 17 sequence of patient DA-77 harbouring the SVA insertion exhibited the C-allele of this SNP. Hence, the corresponding L1 EN cleavage site was 5'-CTTT/A-3'.
Figure S17: Alignment of the reference sequence of SUZ12P intron 8 (hg19) against the corresponding region in patient ASB4-55. The polyT tract of the SVA element inserted into SUZ12P intron 8 in this patient is marked in red. The LINE 1 endonuclease (L1 EN), most likely involved in the insertion of this SVA element, is known to exhibit substrate specificity and cleaves at specific L1 EN consensus cleavage sites such as 5'-CTTT/A-3' (Morrish et al., 2002). The L1 EN cleavage site 5'-CTTT/A-3' in the reference sequence hg19 is highlighted in yellow and the position of cleavage is indicated by an arrow.
In each case, at least 200 interphase nuclei from cultured blood were investigated. Dual-colour FISH was performed with BAC RP11-142O6 which spans the proximal part of the NF1 gene and the alpha-satellite enumeration probe SE17/D17Z1 (Kreatech, Amsterdam, Netherlands) which was used as a control. The NF1 probe is visible in green whereas the control probe is visible in red. 
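The L1 EN consensus cleavage sites named in the legends of Figures S16 and S17 (5'-TTTT/A-3' and 5'-CTTT/A-3') lend themselves to a simple sequence scan. The following is a minimal sketch, not the authors' procedure: the function name `find_l1_en_sites` is hypothetical, only the strand supplied is searched, and the example sequence is invented for illustration.

```python
import re

# Matches the two consensus sites quoted in the text: TTTT/A and CTTT/A.
# The lookahead allows overlapping occurrences to be reported.
L1_EN_SITE = re.compile(r"(?=([TC]TTT)A)")

def find_l1_en_sites(seq):
    """Return (cut_index, motif) pairs for the given strand.

    cut_index is the 0-based index of the A that follows the cleavage
    position, i.e. cleavage occurs between seq[cut_index - 1] and seq[cut_index].
    """
    seq = seq.upper()
    return [(m.start() + 4, m.group(1) + "/A") for m in L1_EN_SITE.finditer(seq)]

# Invented example sequence (not from hg19)
example = "GGCTTTAGGACGTTTTAC"
for cut_index, motif in find_l1_en_sites(example):
    print(cut_index, motif)
```

A T/C change at the first position of the motif, as described for rs8071236, converts one consensus site into the other, so both alleles are reported by the same pattern.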
Figure S19: Segregation of the 1-Mb-spanning atypical NF1 deletion in the family of patient DA-77 (III/2). The SVA insertion-associated NF1 deletion was originally identified in patient III/2 and had occurred in the grandmother (I/2). It must have resulted from a postzygotic rearrangement since the grandmother (I/2) exhibited somatic mosaicism with normal cells as determined by FISH. The grandmother then passed on the SVA insertion-associated atypical NF1 deletion to her offspring. The SVA insertion-associated deletion was verified in her grandchildren by PCR and sequence analysis of the corresponding PCR products (patients III/1, III/2, III/3 and III/4).
To investigate whether the insertion of the SVA element in SUZ12P intron 8 is a frequent insertion/deletion polymorphism, 50 African and 50 white European DNA samples were analysed by PCR#1. However, PCR#1 was not positive in any of these DNA samples and hence it is unlikely that the SVA insertion represents a frequent polymorphism. By contrast, PCR#1 was positive in patient DA-77 and her family members III/1, III/3, III/4 and I/2 who harboured the large NF1 deletion and the SVA insertion at the deletion breakpoints. (C) To analyse whether the SVA insertion had preceded the occurrence of the large NF1 deletion in the grandmother of patient DA-77, we investigated the potential presence of cells with the SVA insertion but lacking the large NF1 deletion by PCR. The grandmother (I/2) exhibited somatic mosaicism in blood with 75% of cells harbouring the deletion whilst 25% of cells were normal as determined by FISH. However, PCR#2 and PCR#4 performed using blood-derived DNA from the grandmother were negative for the anticipated PCR products of 1.8-kb (PCR#2) and 3.3-kb (PCR#4) under the scenario of the SVA insertion being present whilst the large NF1 deletion was absent. 
We therefore concluded that the grandmother did not possess a chromosome 17 which harbours the inserted SVA element without the large NF1 deletion. This conclusion was further confirmed by PCRs #3 and #5 performed using genomic DNA from the grandmother. If the SVA element had been inserted into a chromosome 17 lacking the large NF1 deletion, a PCR product of 4.3-kb would have been anticipated for PCR#5 and a 2.3-kb product for PCR#3. However, only shorter PCR products of 533-bp (PCR#3) and 2.6-kb (PCR#5) were obtained, which were derived from a normal chromosome 17 lacking the SVA element insertion. It might also have been possible that a full-length SVA element had inserted into chromosome 17. Therefore, PCRs #2-5 were also performed with elongation times ranging from 3-5 minutes in order to amplify PCR products up to 6.6-kb. However, under these conditions, only short PCR products of 533-bp (PCR#3) and 2.6-kb (PCR#5) were obtained, derived from a normal chromosome 17 lacking the SVA insertion. Sequence analysis of PCR#3 spanning 533-bp indicated the presence of two heterozygous SNPs (rs8066389 and rs8071236) thereby confirming the presence of two normal chromosomes 17 neither of which possessed an SVA insertion at the corresponding position in SUZ12P intron 8. We conclude that the large NF1 deletion occurred concurrently with the SVA insertion.
In order to determine whether the insertion of the SVA element had occurred prior to the large NF1 deletion in the grandmother of patient DA-77, we performed PCRs #6-8 using blood-derived DNA from the grandmother. PCR#7 was negative for the PCR product of 416-bp anticipated under the scenario of the SVA insertion being present whilst the large NF1 deletion was absent. We concluded that the grandmother did not possess a chromosome 17 harbouring the inserted SVA element in the absence of the large NF1 deletion. This conclusion was further confirmed by PCR#6. 
If the SVA element had been inserted into a chromosome 17 lacking the large deletion, a PCR product of 2.3-kb would have been expected for PCR#6. However, only a shorter PCR product of 588-bp was obtained which was derived from a normal chromosome 17 lacking the SVA insertion. It might also have been possible that a full-length SVA element had inserted into chromosome 17. Therefore, PCR#6 was also performed with an elongation time of 4 minutes in order to amplify a PCR product of 4-kb corresponding to the size of a full-length SVA element. However, under these conditions, only the short PCR product of 588-bp was obtained which was derived from a normal chromosome 17 without the insertion. Sequence analysis of the 4,684-bp PCR product of PCR#8 obtained with primers AD75for/AD21rev indicated the presence of one heterozygous SNP (rs9913053) thereby confirming the presence of two normal chromosomes 17 neither of which possessed an SVA insertion at the corresponding position within the intergenic region between RAB11FIP4 and COPRS. We conclude that the large NF1 deletion must have occurred concurrently with the SVA insertion in the grandmother of patient DA-77.

To investigate whether the SVA insertion within SUZ12P intron 8 might represent a frequent insertion/deletion polymorphism, 50 African and 50 white European DNA samples were analysed by PCR#9. However, PCR#9 was not positive in any of these DNA samples and hence it is unlikely that the insertion represents a frequent polymorphism at this location in the human genome. By contrast, PCR#9 yielded an 800-bp PCR product in patient ASB4-55 harbouring the large NF1 deletion and the SVA insertion at the breakpoints. (C) In order to determine whether the insertion of the SVA element occurred prior to the large NF1 deletion, we investigated genomic DNA from patient ASB4-55 by PCR. The patient exhibited somatic mosaicism for the deletion which was present in 93% of her blood cells as determined by FISH.
PCRs #10 and #11 performed using blood-derived DNA from the patient as template were negative for the PCR products of 1.5-kb (PCR#10) and 1.3-kb (PCR#11) anticipated under the scenario of the SVA insertion being present whilst the large NF1 deletion was absent. We therefore concluded that patient ASB4-55 did not possess a chromosome 17 lacking the large NF1 deletion but harbouring the inserted SVA element within SUZ12P intron 8. This conclusion was further confirmed by PCR#12 performed using genomic DNA from the patient. If patient ASB4-55 were to possess a chromosome 17 with the SVA insertion but lacking the large NF1 deletion, a PCR product of 2.1-kb would have been anticipated for PCR#12. However, PCR#12 yielded only a shorter PCR product of 854-bp which was derived from a normal chromosome 17 lacking the SVA insertion. It might also have been possible that a full-length SVA element had inserted into chromosome 17. Therefore, PCRs #10-12 were performed with elongation times of 3 minutes in order to amplify PCR products up to 4-kb corresponding to the size of a full-length SVA element. However, under these conditions, only shorter PCR products were obtained which were derived from a normal chromosome 17 lacking the SVA insertion. Sequence analysis of the PCR#12 product spanning 854-bp was indicative of one heterozygous SNP (rs58883430), thereby confirming the presence of two normal chromosomes 17 neither of which possessed an SVA insertion at the corresponding position in SUZ12P intron 8. We conclude that the large NF1 deletion must have occurred concurrently with the SVA element insertion.

Figure S23: PCR analysis performed to investigate whether the SVA element identified at the deletion breakpoint of patient ASB4-55 had inserted into chromosome 17 within the intergenic region between RAB11FIP4 and COPRS prior to the occurrence of the NF1 deletion.
(A) Structure of a hypothetical normal chromosome 17 lacking the large NF1 deletion but possessing the 1.3-kb SVA element insertion within the intergenic region between RAB11FIP4 and COPRS. (B) In order to ascertain whether the insertion of the SVA element occurred prior to the large NF1 deletion, we investigated genomic DNA from patient ASB4-55. However, PCR#13 performed using blood-derived DNA of the patient as template was negative for the PCR product of 457-bp anticipated under the scenario of the SVA insertion being present whilst the large NF1 deletion was absent. We concluded that patient ASB4-55 did not possess a normal chromosome 17 (lacking the NF1 deletion) that nevertheless harboured the inserted SVA element within the intergenic region between RAB11FIP4 and COPRS. This conclusion was further confirmed by PCR#14 performed using genomic DNA from the patient. If the SVA element had been inserted into a chromosome 17 lacking the large deletion, a PCR product of 1.7-kb would have been expected for PCR#14. However, only a shorter PCR product of 449-bp was obtained which was derived from a normal chromosome 17 without the SVA element insertion. It might also have been possible that a full-length SVA element had inserted into chromosome 17. Therefore, PCR#14 was also performed with an elongation time of 3 minutes in order to amplify PCR products up to 4-kb corresponding to the size of a full-length SVA element. However, under these conditions, only a short PCR product of 449-bp, derived from a normal chromosome 17 lacking the insertion, was obtained. We conclude that the large NF1 deletion must have occurred concurrently with the SVA element insertion.

Figure S24: Design and analysis of the 8 x 15K custom array (Agilent Technologies, Santa Clara, CA, USA), which contained 15,744 oligonucleotide probes including 4,891 control probes as well as 10,853 test probes assigned to five different groups.
The probes in groups I-III were selected from the Agilent eArray library, which provides validated catalogue probes, avoiding all common repeats and other redundant sequences. Since the coverage of the paralogous NF1-REPs and the SUZ12 sequences by these catalogue probes is low, we designed additional customized probes (129 probes in group IV and 137 probes in group V) located within these segmental duplications. For this purpose, we used the genomic tiling approach of the eArray tool (Agilent eArray library, https://earray.chem.agilent.com/earray/). The 129 customized probes in group IV were located within regions of absolute sequence identity between the paralogs whereas the 137 probes in group V were designed so as to contain several paralogous sequence variants in order to potentiate paralog-specific hybridization. Restriction enzymatic digestion of 0.5 µg genomic DNA from the patient as well as from a sex-matched control was performed with a mixture of AluI and RsaI at 37°C for two hours. Sample labelling of the restriction-digested genomic DNA samples with Cy5-dUTP or Cy3-dUTP, respectively, was performed using the Genomic DNA Enzymatic Labeling Kit (Agilent Technologies). Sample hybridization and washing of the microarrays were carried out by means of the Oligo aCGH-on-chip Hybridization and Wash Buffer Kits (Agilent Technologies). Fluorescent intensities were detected with Scan Control A.8.4.1 Software on the Agilent DNA Microarray Scanner and extracted from the images using Feature Extraction 10.7.3.1 Software (Agilent Technologies) and the design file 033151_D_F_20110323.xml. The software tools Feature Extraction 10.7.3.1 and Genomic Workbench Lite 6.0.130.24 (CGH module) were used for quality control, annotation, statistical data analysis and visualization. The quality of the individual microarrays used in the experiments was validated against the quality metrics (QCmetrics) of this software (Feature Extraction 10.7.3.1).
The microarray data were normalized to compensate for varying global signal intensities and to adjust them for downstream analyses. The identification of aberrant regions was performed with the analysis software Genomic Workbench Lite using the aberration algorithm ADM-2 in combination with a centralization algorithm. The analysis of the microarrays was performed by IMGM Laboratories (Martinsried, Germany).

Figure S25: Principle of the inverse PCR technique exemplified in the context of the characterization of the centromeric breakpoint of the NF1 deletion in patient ASB4-55. In step 1, 10 µg genomic DNA derived from blood of the patient was restriction-digested with PciI (New England Biolabs, Ipswich, USA). Among the resulting DNA fragments is the target fragment harbouring the deletion junction with the unknown inserted sequence (green) immediately flanked by non-deleted sequences (blue). The restriction fragments were purified with the QIAquick Nucleotide Removal Kit (Qiagen, Hilden, Germany) and eluted in 50 µl water. In step 2, self-ligation of the DNA fragments was set up in a final volume of 1 ml including 50 units T4 DNA ligase (Promega, Mannheim, Germany) and incubated overnight at 16°C. Subsequently, the self-ligation reaction mixture was purified and concentrated with the QIAquick Nucleotide Removal Kit and used as a template for PCR with inversely oriented primers (as_inv2rev and as_inv1for) located in regions flanking the telomeric deletion breakpoint (step 3). These primers were located within unique, non-repetitive sequences in the vicinity of non-deleted regions. The resulting PCR products were cloned using the StrataClone PCR Cloning Kit (Agilent Technologies, Santa Clara, USA) and sequenced from both ends using M13 primers. All enzymes and primers used to analyse the deletion breakpoint-flanking regions in patients DA-77 and ASB4-55 by inverse PCR are listed in Tables S28 and S29.
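
The three steps of the inverse PCR described for Figure S25 can be illustrated with a toy string model. This is only a sketch and not the protocol or the sequences used in the study: the sequences, the primer strings and the function names are hypothetical, only the top strand is modelled, and the recognition sequence ACATGT (cut after the first A) is assumed for PciI.

```python
# Toy model of inverse PCR. All sequences, primers and function names are
# hypothetical; only the top strand is modelled.
SITE = "ACATGT"   # assumed PciI recognition sequence, cut as A^CATGT
CUT_OFFSET = 1    # top-strand cut position within the site


def digest(seq, site=SITE, cut_offset=CUT_OFFSET):
    """Step 1: cut a linear sequence at every occurrence of the site."""
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + cut_offset])
        start = pos + cut_offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


def inverse_pcr(fragment, rev_site, fwd_primer):
    """Steps 2 and 3: self-ligate the fragment and amplify outwards.

    rev_site   -- top-strand sequence bound by the reverse primer, which
                  extends towards the start of the fragment
    fwd_primer -- top-strand primer downstream of rev_site, which extends
                  towards the end of the fragment (into the unknown part)
    Returns the top strand of the amplicon, or None if the primers are
    missing or not oriented away from each other.
    """
    r = fragment.find(rev_site)
    f = fragment.find(fwd_primer)
    if r == -1 or f == -1 or r + len(rev_site) > f:
        return None
    # On the circle, extension from fwd_primer runs through the fragment end,
    # across the ligation junction, and on to the reverse primer site.
    return fragment[f:] + fragment[:r + len(rev_site)]


# Hypothetical locus: known (non-deleted) sequence followed by unknown insert.
known = "TTGACCGATAGCTAGGCTTACGATCC"
unknown = "GGGCCCGGGCCCTCTCTC"
genome = "GGA" + SITE + known + unknown + SITE + "AAT"
rev_site, fwd_primer = known[:10], known[-10:]

fragments = digest(genome)
target = next(fr for fr in fragments if rev_site in fr and fwd_primer in fr)
amplicon = inverse_pcr(target, rev_site, fwd_primer)
print(amplicon)
```

In this model the amplicon begins in the known sequence, runs through the unknown sequence and the re-formed restriction site, and ends in the known sequence again, which is why sequencing the product reveals the inserted sequence next to the breakpoint.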

[Figure S25 panel labels: 1.) Restriction of genomic DNA; 2.) Self-ligation; 3.) PCR amplification with primers as_inv2rev and as_inv1for]

Figure S26: Semi-specific PCR performed to narrow down the breakpoint regions and to identify the unknown sequences inserted at the deletion breakpoints. The principle of the assay is explained using the identification of the centromeric deletion breakpoint in patient DA-77 as an example. (A) Schematic of the centromeric deletion breakpoint region in patient DA-77. (B) The first PCR was performed with the region-specific forward primer (AD88for) located within the non-deleted reference sequence (blue) in combination with a non-specific return PCR primer (egalAAL) which is expected to bind within the inserted, unknown sequence (green). In this first step, many different PCR fragments were amplified which were distinguishable by their lengths as schematically indicated. An aliquot of 4 µl (100 ng) of the resulting heterogeneous PCR products was used as a template for a nested PCR (step 2) using the region-specific primer AD89for together with the non-specific return primer egalAAL. In step 3 of the assay, a further nested PCR was performed using 4 µl of the second PCR as a template and the primers AD90for and egalAAL. Subsequently, the PCR products resulting from step 3 were subjected to direct sequence analysis or were cloned and sequenced from both ends using vector-based M13 primers. All PCRs were performed using the Expand Long Template PCR system (Roche, Mannheim, Germany). All primers used to analyse the deletion breakpoint-flanking regions in patients DA-77 and ASB4-55 by semi-specific PCR are listed in Tables S30 and S31.

Figure S27: Principle of the GenomeWalker assay used to identify unknown sequences inserted at the deletion breakpoints of patients DA-77 and ASB4-55.
In the first step, 2.5 µg genomic DNA was digested with a blunt-end restriction enzyme (New England Biolabs, Ipswich, USA) for two hours at the enzyme-specific temperature. Among the restricted DNA fragments was the target fragment that encompasses the deletion breakpoint region (blue) and the adjacent unknown sequence (green). After inactivation of the enzyme, the fragmented DNA was purified by means of the Nucleotide Removal Kit (Qiagen, Hilden, Germany) and dissolved in a final volume of 30 µl TE-buffer. An aliquot (4 µl) of the purified DNA fragments was then added to the ligation reaction which also included 1.9 µl oligonucleotide adaptors (indicated in red), 1.6 µl T4 DNA ligase buffer and 0.5 µl T4 DNA ligase. The ligation was performed at 16°C overnight. The next day, the reaction was inactivated at 70°C for 5 minutes, 72 µl TE-buffer were added and this library of adaptor-ligated restriction fragments was then used as a template for two subsequent PCR steps. The first-step PCR was performed with a region-specific primer (GSP_1for) located within the non-deleted sequence close to the deletion breakpoint (blue) and the adaptor primer (AP1) which hybridized to the adaptor. Subsequently, an aliquot of this first PCR was used as template for the second PCR using a nested region-specific primer (GSP_2for) and the adaptor primer AP2. Both PCRs were performed with the Advantage® 2 PCR Kit (Clontech, Saint-Germain-en-Laye, France), according to the manufacturer's instructions. The PCR products were then gel-purified (S.N.A.P.™ UV-Free Gel Purification Kit, Invitrogen, CA, USA) and cloned (StrataClone PCR Cloning Kit, Agilent Technologies, Santa Clara, CA, USA) prior to sequence analysis. All enzymes and primers used for these assays are listed in Tables S32 and S33.
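
The volumes quoted for the GenomeWalker library can be checked with a short calculation. The sketch below uses only the figures given in the legend; the variable names are illustrative, and the DNA amounts rest on the assumption that no DNA is lost during purification, so they should be read as upper bounds rather than measured values.

```python
# Volumes in microlitres, as quoted in the Figure S27 legend.
ligation_components_ul = {
    "purified restriction fragments": 4.0,
    "oligonucleotide adaptors": 1.9,
    "T4 DNA ligase buffer": 1.6,
    "T4 DNA ligase": 0.5,
}
TE_ADDED_UL = 72.0          # TE-buffer added after heat inactivation
DIGESTED_DNA_UG = 2.5       # genomic DNA entering the digest
ELUTION_VOLUME_UL = 30.0    # volume of TE-buffer after purification

# Total ligation volume and the dilution introduced by adding TE-buffer.
ligation_total_ul = round(sum(ligation_components_ul.values()), 1)
library_total_ul = ligation_total_ul + TE_ADDED_UL
dilution_factor = library_total_ul / ligation_total_ul

# Assumption: complete recovery of DNA during purification (upper bound).
dna_in_ligation_ug = DIGESTED_DNA_UG * 4.0 / ELUTION_VOLUME_UL
library_conc_ng_per_ul = dna_in_ligation_ug * 1000.0 / library_total_ul

print(f"ligation volume: {ligation_total_ul} µl")
print(f"library volume:  {library_total_ul} µl (dilution {dilution_factor:g}x)")
print(f"DNA in ligation: at most {dna_in_ligation_ug:.2f} µg")
print(f"library DNA:     at most {library_conc_ng_per_ul:.1f} ng/µl")
```

Under these assumptions the ligation has a volume of 8 µl and the addition of 72 µl TE-buffer corresponds to a tenfold dilution before the first-step PCR.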