Table 1 Summary of quality assessments of assemblies from different scaffolding strategies. The PacBio contig assembly of HCC1395BL was selected for further scaffolding. Generally, two-step scaffolding, using 10X Genomics linked reads with ARCS followed by Hi-C reads with SALSA (PacBio_canu + ARCS + SALSA), produced a better scaffolded assembly than one-step scaffolding (either PacBio_canu + ARCS or PacBio_canu + SALSA). The final scaffold assembly (HCC1395BL_v1.0) has the highest Top50 and scaffold N50 values, the largest scaffold, and the greatest numbers of mapped RefSeq transcripts (NMs and NRs). Its complete BUSCO count (3,889) is within three of the highest value among the four assemblies (3,892).

From: Personalized genome assembly for accurate cancer somatic mutation discovery using tumor-normal paired reference samples

 

| Assembly | # Scaffolds | # bp from scaffolds | Top50 | N50 | L50 | Largest scaffold length (bp) | # Scaffolds on GRCh38 | # novel scaffolds | # bp from novel scaffolds | # Complete BUSCO (4,104 BUSCO) | # NMs 95+% mapped (50,052 NMs) | # NRs 95+% mapped (15,544 NRs) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PB_canu (contigs) | 2,828 | 2,904,842,414 | 1,356,447,278 (46.69%) | 13,480,407 | 57 | 62,208,403 | 2,526 | 302 | 16,403,702 | 3,890 | 49,287 | 15,115 |
| PB_canu + 10X_arcs (scaffolds) | 2,032 | 2,904,931,213 | 2,067,627,443 (71.17%) | 35,058,531 | 26 | 121,623,092 | 1,764 | 268 | 14,867,391 | 3,800 | 49,570 | 15,207 |
| PB_canu + HiC_salsa (scaffolds) | 1,891 | 2,905,381,691 | 2,377,981,926 (81.84%) | 46,871,224 | 19 | 180,772,639 | 1,617 | 274 | 15,303,825 | 3,892 | 49,495 | 15,177 |
| PB_canu + 10X_arcs + HiC_salsa (scaffolds, HCC1395BL_v1.0) | 1,645 | 2,905,196,510 | 2,691,295,119 (92.62%) | 69,970,292 | 14 | 181,209,810 | 1,406 | 244 | 14,104,388 | 3,889 | 49,613 | 15,227 |
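
For readers unfamiliar with the contiguity columns, the following is a minimal sketch of how N50, L50, and a Top50-style value are conventionally computed from a list of scaffold lengths. It is not the authors' pipeline. The function names `n50_l50` and `top_k_bp` and the toy lengths are hypothetical, and reading Top50 as the number of bases in the 50 longest scaffolds (with their share of the assembly in parentheses) is an assumption drawn from the column layout rather than a definition given in the table.

```python
from typing import Sequence, Tuple


def n50_l50(lengths: Sequence[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of all bases.
    L50 is the number of sequences in that set.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise AssertionError("unreachable: cumulative sum always reaches half")


def top_k_bp(lengths: Sequence[int], k: int = 50) -> Tuple[int, float]:
    """Return bases in the k longest sequences and their fraction of the total.

    Assumption: this mirrors what the Top50 column reports when k = 50.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    top = sum(ordered[:k])
    return top, top / sum(ordered)


# Toy scaffold lengths (hypothetical, not taken from the table).
toy = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(toy)
top3, share = top_k_bp(toy, k=3)
print(n50, l50, top3, round(share, 4))
```

Under these definitions, a higher N50 together with a lower L50 means that fewer, longer scaffolds account for half of the assembly, which is the trend the table shows from the contig assembly to HCC1395BL_v1.0.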