Analysis | Data type | Benefits of using personalized genome assembly as reference | Supporting evidence |
---|---|---|---|
Personalized genome (PG) assembly | | • Individualized assembly that includes individual-specific haplotypes and better represents clinically important genes; no need to handle ALT loci in NGS secondary analysis | • Fig. 3B, Fig. S5A-G, Fig. S9A/B, Fig. S12A-E; HLA genes, GSTT1, KIR2DL5A |
Read mapping | Illumina short reads | • More properly paired reads (HCC1395BL 0.5%, HCC1395 0.46%), fewer improperly paired reads (HCC1395BL 41.4%, HCC1395 38.2%) • Fewer mismatches (HCC1395BL 18.2%, HCC1395 16.6%) • Fewer soft-clipped reads (HCC1395BL 11.7%, HCC1395 11.6%), fewer hard-clipped reads (HCC1395BL 32.0%, HCC1395 28.7%) • Fewer split reads (HCC1395BL 31.9%, HCC1395 28.8%) • Better read placement, with smaller standard deviations of library insert size (HCC1395BL 2.76, HCC1395 2.83) • More uniform read placement, with smaller standard deviations of read coverage (HCC1395BL 4.31, HCC1395 4.92) | • Fig. 4A/B • Fig. 4C • Fig. S4A/B • Fig. 4D • Fig. S4C • Fig. 4E |
Read mapping | PacBio long reads | • More reads mapped (HCC1395BL 1.65%, HCC1395 2.98%) • Fewer mismatches (HCC1395BL 1%, HCC1395 1%) • Fewer non-primary/supplementary alignments (HCC1395BL 6.73%/14.5%, HCC1395 1.8%/10.39%) • More uniform read placement, with smaller standard deviations of read coverage (HCC1395BL 12.08, HCC1395 11.48) | • Fig. S4D • Fig. S4D • Fig. S4D • Fig. 4F |
Somatic SNV detection | Illumina short reads | • Total somatic SNV counts increased by, on average, 1689 SNPs and 415 InDels • Novel SNVs discovered (1017), of which 177 overlap 71 genes, e.g., GTF2H2 and PTPN13; 8 of 10 selected SNVs were confirmed by Sanger sequencing • More accurate context sequences for somatic SNVs, some including germline SNVs (3995) • Avoids GRCh38-only, non-personalized SNVs (901) | • Fig. 5D, Tables S6/S7/S8, Fig. S6 • Fig. 5C, Table S6, Fig. S5A-G • Fig. 5C |
Somatic SV detection | Illumina short reads | • Somatic SV counts increased by 82/GRIDSS2, 189/Manta, 54/Delly, and 86/novoBreak • Novel SVs discovered (59), including 17 gene-overlapping SVs, e.g., CCDC91 • More accurate SV resolution, e.g., the SV involving the MED12L gene • Avoids GRCh38-only, non-personalized SVs (29) | • Fig. 6A, Tables S9/S10, Fig. 7A, Fig. S13A/B • Fig. 6D/E, Table S11, Fig. S9A/B, Fig. 7D • Fig. S9B |
Somatic SV detection | PacBio long reads; assembled contigs | • Somatic SV counts increased by 194 (supported by two or more calling methods) • Novel SVs discovered (279), including 91 gene-overlapping SVs, e.g., CDH23, ST14, GNG7 • More accurate SV resolution, e.g., the SV involving the MED12L gene • Avoids GRCh38-only, non-personalized SVs (213) | • Fig. 8A, Fig. S10, Fig. S13A/B • Fig. 8C/D, Table S14, Fig. S12A-E • Fig. S9B |
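
The read-mapping rows above compare alignment categories (properly paired, soft-clipped, hard-clipped, supplementary) between the two references. As an illustration only, the following Python sketch shows one way such categories could be tallied from SAM records using the standard FLAG bits and CIGAR operations, and how a relative change against a baseline could be computed. The function names `tally_alignments` and `relative_change`, the toy records, and the interpretation of the table's percentages as relative changes are assumptions made for this example; the table itself does not state how its numbers were derived.

```python
import re
from collections import Counter

# SAM FLAG bits as defined in the SAM specification.
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def tally_alignments(sam_lines):
    """Count alignment categories from SAM text lines (hypothetical helper)."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag, cigar = int(fields[1]), fields[5]
        if flag & UNMAPPED:
            counts["unmapped"] += 1
            continue
        counts["mapped"] += 1
        if flag & SECONDARY:
            counts["secondary"] += 1
        if flag & SUPPLEMENTARY:
            counts["supplementary"] += 1
        if flag & PAIRED:
            # Simplification: any paired, mapped record without the
            # proper-pair bit is counted as improperly paired.
            key = "properly_paired" if flag & PROPER_PAIR else "improperly_paired"
            counts[key] += 1
        ops = {op for _, op in CIGAR_OP.findall(cigar)}
        if "S" in ops:
            counts["soft_clipped"] += 1
        if "H" in ops:
            counts["hard_clipped"] += 1
    return counts


def relative_change(baseline, personalized):
    """Percent change of `personalized` relative to `baseline` (assumed metric)."""
    return 100.0 * (personalized - baseline) / baseline


# Toy records, invented for illustration; not data from the study.
toy_grch38 = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*",
    "r2\t97\tchr1\t500\t60\t40M10S\tchr2\t900\t0\t*\t*",
    "r2\t2145\tchr3\t700\t60\t40H10M\tchr2\t900\t0\t*\t*",
    "r3\t77\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
toy_pg = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*",
    "r2\t99\tchr1\t500\t60\t50M\t=\t700\t250\t*\t*",
    "r3\t97\tchr1\t900\t60\t45M5S\tchr2\t100\t0\t*\t*",
]

base = tally_alignments(toy_grch38)
pg = tally_alignments(toy_pg)
for key in ("properly_paired", "improperly_paired"):
    print(key, relative_change(base[key], pg[key]))
```

In practice a library such as pysam would read BAM files directly; plain-text parsing is used here only to keep the sketch self-contained.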