
Table 2 Collective benefits of using the personalized assembly as reference for read mapping and somatic SNV/SV detection as compared to GRCh38

From: Personalized genome assembly for accurate cancer somatic mutation discovery using tumor-normal paired reference samples

 

Benefits of using personalized genome assembly as reference

Proofs

Personalized genome (PG) assembly

• Individualized assembly that includes the individual-specific haplotypes and better represents clinically important genes; no need to handle ALT loci in NGS secondary analysis

• Fig. 3B, Fig. S5A-G, Fig. S9A/B, Fig. S12A-E; HLA genes, GSTT1, KIR2DL5A

Read mapping

Illumina short reads

• More properly paired reads (HCC1395BL 0.5%, HCC1395 0.46%), fewer improperly paired reads (HCC1395BL 41.4%, HCC1395 38.2%)

• Fewer mismatches (HCC1395BL 18.2%, HCC1395 16.6%)

• Fewer soft-clipped reads (HCC1395BL 11.7%, HCC1395 11.6%), fewer hard-clipped reads (HCC1395BL 32.0%, HCC1395 28.7%)

• Fewer split reads (HCC1395BL 31.9%, HCC1395 28.8%)

• Better read placement, with smaller standard deviations of library insert size (HCC1395BL 2.76, HCC1395 2.83)

• More uniform read placement, with smaller standard deviations of read coverage (HCC1395BL 4.31, HCC1395 4.92)

• Fig. 4A/B

• Fig. 4C

• Fig. S4A/B

• Fig. 4D

• Fig. S4C

• Fig. 4E

PacBio long reads

• More reads mapped (HCC1395BL 1.65%, HCC1395 2.98%)

• Fewer mismatches (HCC1395BL 1%, HCC1395 1%)

• Fewer non-primary/supplementary alignments (HCC1395BL 6.73%/14.5%, HCC1395 1.8%/10.39%)

• More uniform read placement, with smaller standard deviations of read coverage (HCC1395BL 12.08, HCC1395 11.48)

• Fig. S4D

• Fig. S4D

• Fig. S4D

• Fig. 4F
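The read-mapping categories listed above (properly paired, improperly paired, soft-/hard-clipped, secondary, supplementary) can all be derived from the FLAG and CIGAR fields of SAM records. The following is a minimal sketch of such a tally, not the authors' pipeline. It assumes plain-text SAM input; the function name `mapping_summary` is hypothetical, and "improperly paired" is taken here to mean a mapped primary alignment of a paired read without the proper-pair bit, which may differ from the definition used in the paper.

```python
import re

# SAM FLAG bits, as defined in the SAM format specification.
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def mapping_summary(sam_lines):
    """Tally alignment categories from SAM text lines (header lines are skipped)."""
    counts = {
        "records": 0,
        "mapped": 0,
        "properly_paired": 0,
        "improperly_paired": 0,  # assumption: paired, mapped, primary, not proper
        "secondary": 0,
        "supplementary": 0,
        "soft_clipped": 0,
        "hard_clipped": 0,
    }
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag, cigar = int(fields[1]), fields[5]
        counts["records"] += 1
        if flag & UNMAPPED:
            continue
        counts["mapped"] += 1
        if flag & SECONDARY:
            counts["secondary"] += 1
        if flag & SUPPLEMENTARY:
            counts["supplementary"] += 1
        # Pairing status is counted on primary alignments only.
        is_primary = not flag & (SECONDARY | SUPPLEMENTARY)
        if is_primary and flag & PAIRED:
            if flag & PROPER_PAIR:
                counts["properly_paired"] += 1
            else:
                counts["improperly_paired"] += 1
        ops = {op for _, op in CIGAR_OP.findall(cigar)}
        if "S" in ops:
            counts["soft_clipped"] += 1
        if "H" in ops:
            counts["hard_clipped"] += 1
    return counts
```

Running the same tally on alignments against GRCh38 and against the personalized assembly would give the two sets of counts from which differences like those in the table can be computed.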

Somatic SNV detection

Illumina short reads

• Total somatic SNV counts increased, on average, by 1689 SNPs and 415 InDels

• Novel SNVs discovered (1017), of which 177 overlap 71 genes (e.g., GTF2H2 and PTPN13); 8 of 10 selected SNVs were confirmed by Sanger sequencing

• More accurate context sequences for somatic SNVs, some of which include germline SNVs (3995)

• Avoid GRCh38-only, non-personalized SNVs (901)

• Fig. 5A/B, Table S5

• Fig. 5D, Tables S6/S7/S8, Fig. S6

• Fig. 5C, Table S6, Fig. S5A-G

• Fig. 5C
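The categories above (novel calls seen only with the personalized assembly, and GRCh38-only calls) amount to set differences between two call sets. The sketch below illustrates that partition under a strong assumption: both call sets have already been expressed in one common coordinate system, a non-trivial step that is not shown here. The `Variant` type and the function `compare_call_sets` are hypothetical names used for illustration, not the authors' method.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """A small variant keyed by position and alleles (common coordinates assumed)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def compare_call_sets(pg_calls: Iterable[Variant], grch38_calls: Iterable[Variant]):
    """Partition calls into shared, personalized-only, and GRCh38-only sets."""
    pg, ref = set(pg_calls), set(grch38_calls)
    return {
        "shared": pg & ref,
        "pg_only": pg - ref,       # candidates for "novel" calls
        "grch38_only": ref - pg,   # calls made only against GRCh38
    }
```

Exact matching on position and alleles is the simplest possible criterion; a real comparison would likely also need allele normalization before matching.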

Somatic SV detection

Illumina short reads

• Somatic SV counts increased by 82/GRIDSS2, 189/Manta, 54/Delly, and 86/novoBreak

• Novel SVs discovered (59), including 17 gene-overlapping SVs, e.g., CCDC91

• More accurate SV resolution, e.g., the SV involving the MED12L gene

• Avoid GRCh38-only, non-personalized SVs (29)

• Fig. 6A, Tables S9/S10, Fig. 7A, Fig. S13A/B

• Fig. 6D/E, Table S11, Fig. S9A/B, Fig. 7D

• Fig. S9B

• Figs. 6C and 7C

PacBio long reads; assembled contigs

• Somatic SV counts increased by 194 (supported by two or more calling methods)

• Novel SVs discovered (279), including 91 gene-overlapping SVs, e.g., CDH23, ST14, GNG7

• More accurate SV resolution, e.g., the SV involving the MED12L gene

• Avoid GRCh38-only, non-personalized SVs (213)

• Fig. 8A, Fig. S10, Fig. S13A/B

• Fig. 8C/D, Table S14, Fig. S12A-E

• Fig. S9B

• Fig. 8B, Table S13
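Requiring support from two or more calling methods, as in the long-read SV counts above, can be approximated by clustering calls from different callers whose breakpoints agree within a tolerance. The sketch below is one simple way to do this, not the procedure used in the paper. The `SV` type, the function `consensus_svs`, and the 50 bp default tolerance are all assumptions made for illustration.

```python
from typing import NamedTuple


class SV(NamedTuple):
    """A structural variant call; field names are illustrative."""
    chrom: str
    start: int
    end: int
    svtype: str


def consensus_svs(calls_by_caller, tolerance=50, min_callers=2):
    """Return clusters of SV calls supported by at least `min_callers` callers.

    Two calls join the same cluster when chromosome and type match and both
    breakpoints lie within `tolerance` bp of the cluster's first call.
    """
    clusters = []  # each cluster: {"sv": representative SV, "callers": set of names}
    for caller, calls in calls_by_caller.items():
        for sv in calls:
            for cluster in clusters:
                rep = cluster["sv"]
                if (rep.chrom == sv.chrom and rep.svtype == sv.svtype
                        and abs(rep.start - sv.start) <= tolerance
                        and abs(rep.end - sv.end) <= tolerance):
                    cluster["callers"].add(caller)
                    break
            else:
                # No existing cluster matched, so this call starts a new one.
                clusters.append({"sv": sv, "callers": {caller}})
    return [c for c in clusters if len(c["callers"]) >= min_callers]
```

The greedy comparison against each cluster's first call keeps the sketch short; it is quadratic in the number of calls and sensitive to input order, so it is meant only to show the idea.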