Table 1 Sources of data for parental haplotype inference and benchmarking

From: Determination of complete chromosomal haplotypes by bulk DNA sequencing

| Sample | Data type | Data source | Read count | Mean depth | Contacts (>1 Mb) | Application |
|---|---|---|---|---|---|---|
| RPE-1 | Bulk WGS | [24] | 228,708,769^a | 13× | | Variant calling |
| RPE-1 | Linked reads | New | 941,518,426^b | 60×^c | | Variant calling & local phasing |
| RPE-1 | CCS long reads | New | 4,607,047^d | 11× | | Local phasing |
| RPE-1 | Hi-C | [44] | 281,285,484^e | | 48,124,211 | Long-range phasing |
| RPE-1 | Single cell with monosomies | New | | | | hi-conf variants and reference haplotypes |
| NA12878 | Linked reads v.1 | 10X Genomics^f | 422,179,395^g | 35×^c | | Local phasing |
| NA12878 | Linked reads v.2 | 10X Genomics^h | 423,854,243^i | 35×^c | | Local phasing |
| NA12878 | Hi-C | [35] | 486,848,169^j | | 91,428,507 | Long-range phasing |
| NA12878 | Phased VCF | GIAB^k | | | | hi-conf variants and reference haplotypes |
| NA12878 | Phased VCF | Diploid assembly^l | | | | hi-conf variants and reference haplotypes |

^a SRR1778442: median insert 243; 208,151,992 fragments aligned in pair; 2 × 101 bp reads; duplication rate 0.024.
^b Mean molecular length 24.8 kb; median insert 551; 913,660,083 aligned in pair; 2 × 150 bp reads; duplication rate 0.255.
^c Excluding the GEMcode sequence and duplicated fragments.
^d Mean read length 7.1 kb; 4,606,654 aligned.
^e SRS1045722: median insert 364; 279,027,892 aligned in pair; 2 × 150 bp reads; duplication rate 0.067.
^f https://support.10xgenomics.com/genome-exome/datasets/2.1.0/NA12878_WGS_210
^g Mean molecular length 68.7 kb; median insert 349; 407,015,530 aligned in pair; 2 × 150 bp reads; duplication rate 0.062.
^h https://support.10xgenomics.com/genome-exome/datasets/2.2.1/NA12878_WGS_v2
^i Mean molecular length 85.6 kb; median insert 370; 418,283,435 aligned in pair; 2 × 150 bp reads; duplication rate 0.079.
^j SRR1658572: median insert 377; 484,211,662 aligned in pair; 2 × 101 bp reads; duplication rate 0.028.
^k https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/NA12878_HG001/latest/GRCh38/
^l http://ftp.dfci.harvard.edu/pub/hli/hifiasm/NA12878-r253/. Phased variants were determined using dipcall (https://github.com/lh3/dipcall) on the sequences of parental chromosomes generated by diploid de novo assembly of the NA12878 genome using PacBio High-Fidelity long reads together with short reads of the parental genomes using hifiasm [40].
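
The mean depths in the table can be roughly cross-checked against the alignment statistics in the footnotes. The sketch below is only a sanity check, not the authors' method: the table does not say how depth was computed, and the 3.1 Gb genome size, the formula (non-duplicate aligned bases divided by genome size), and the function name `approx_mean_depth` are all assumptions made here for illustration.

```python
# Assumed haploid human genome size in bp; this value is NOT given in the table.
GENOME_SIZE_BP = 3.1e9


def approx_mean_depth(aligned_fragments, bases_per_fragment,
                      duplication_rate=0.0, genome_size_bp=GENOME_SIZE_BP):
    """Rough mean depth: non-duplicate aligned bases divided by genome size.

    This formula is an assumption for illustration, not the paper's definition.
    """
    usable_bases = aligned_fragments * bases_per_fragment * (1.0 - duplication_rate)
    return usable_bases / genome_size_bp


# RPE-1 bulk WGS: 208,151,992 fragments aligned in pair,
# 2 x 101 bp reads, duplication rate 0.024 (from the footnote).
bulk_wgs = approx_mean_depth(208_151_992, 2 * 101, 0.024)

# RPE-1 CCS long reads: 4,606,654 aligned reads, mean read length 7.1 kb.
# No duplication rate is reported for this dataset, so none is applied.
ccs = approx_mean_depth(4_606_654, 7_100)

print(f"bulk WGS: {bulk_wgs:.1f}x, CCS: {ccs:.1f}x")
```

Under these assumptions both estimates land close to the tabulated 13× and 11×. The same formula would overestimate the linked-read depths, because those values also exclude the GEMcode sequence, whose length is not given in the table.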