
Table 1 Sources of data for parental haplotype inference and benchmarking

From: Determination of complete chromosomal haplotypes by bulk DNA sequencing

| Sample  | Data type                    | Data source         | Read count     | Mean depth | Contacts (>1 Mb) | Application                                   |
|---------|------------------------------|---------------------|----------------|------------|------------------|-----------------------------------------------|
| RPE-1   | Bulk WGS                     | [24]                | 228,708,769^a  | 13×        |                  | Variant calling                               |
| RPE-1   | Linked reads                 | New                 | 941,518,426^b  | 60×^c      |                  | Variant calling and local phasing             |
| RPE-1   | CCS long reads               | New                 | 4,607,047^d    | 11×        |                  | Local phasing                                 |
| RPE-1   | Hi-C                         | [44]                | 281,285,484^e  |            | 48,124,211       | Long-range phasing                            |
| RPE-1   | Single cell with monosomies  | New                 |                |            |                  | High-confidence variants and reference haplotypes |
| NA12878 | Linked reads v.1             | 10X Genomics^f      | 422,179,395^g  | 35×^c      |                  | Local phasing                                 |
| NA12878 | Linked reads v.2             | 10X Genomics^h      | 423,854,243^i  | 35×^c      |                  | Local phasing                                 |
| NA12878 | Hi-C                         | [35]                | 486,848,169^j  |            | 91,428,507       | Long-range phasing                            |
| NA12878 | Phased VCF                   | GIAB^k              |                |            |                  | High-confidence variants and reference haplotypes |
| NA12878 | Phased VCF                   | Diploid assembly^l  |                |            |                  | High-confidence variants and reference haplotypes |
^a SRR1778442: median insert size 243 bp; 208,151,992 fragments aligned in pair; 2 × 101 bp reads; duplication rate 0.024.
^b Mean molecule length 24.8 kb; median insert size 551 bp; 913,660,083 aligned in pair; 2 × 150 bp reads; duplication rate 0.255.
^c Excluding the GEMcode sequence and duplicate fragments.
^d Mean read length 7.1 kb; 4,606,654 aligned.
^e SRS1045722: median insert size 364 bp; 279,027,892 aligned in pair; 2 × 150 bp reads; duplication rate 0.067.
^f https://support.10xgenomics.com/genome-exome/datasets/2.1.0/NA12878_WGS_210
^g Mean molecule length 68.7 kb; median insert size 349 bp; 407,015,530 aligned in pair; 2 × 150 bp reads; duplication rate 0.062.
^h https://support.10xgenomics.com/genome-exome/datasets/2.2.1/NA12878_WGS_v2
^i Mean molecule length 85.6 kb; median insert size 370 bp; 418,283,435 aligned in pair; 2 × 150 bp reads; duplication rate 0.079.
^j SRR1658572: median insert size 377 bp; 484,211,662 aligned in pair; 2 × 101 bp reads; duplication rate 0.028.
^k https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/NA12878_HG001/latest/GRCh38/
^l http://ftp.dfci.harvard.edu/pub/hli/hifiasm/NA12878-r253/ Phased variants were called with dipcall (https://github.com/lh3/dipcall) from the parental chromosome sequences produced by hifiasm [40] diploid de novo assembly of the NA12878 genome, which used PacBio High-Fidelity (HiFi) long reads together with short reads from the parental genomes.
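The mean depths in the table can be roughly cross-checked against the counts in the footnotes. The sketch below is a back-of-the-envelope estimate, not the authors' procedure, and rests on several assumptions: a genome size of 3.1 Gb, usable bases computed as aligned units × bases per unit × (1 − duplication rate), 23 bp of each linked-read read 1 treated as barcode plus spacer (in the spirit of footnote c), and a duplication rate of zero for the CCS reads, since none is given. The `Library` class and `estimated_depth` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

# Assumed genome size used as the denominator (~3.1 Gb); not taken from the table.
GENOME_SIZE_BP = 3.1e9


@dataclass
class Library:
    name: str
    aligned_units: int        # aligned read pairs (or aligned reads for CCS)
    bases_per_unit: float     # sequenced bases per pair/read after any trimming
    duplication_rate: float   # fraction of units flagged as duplicates


def estimated_depth(lib: Library, genome_size: float = GENOME_SIZE_BP) -> float:
    """Rough mean depth: deduplicated aligned bases divided by genome size."""
    usable_bases = lib.aligned_units * lib.bases_per_unit * (1.0 - lib.duplication_rate)
    return usable_bases / genome_size


libraries = [
    # Footnote a: 2 x 101 bp pairs, duplication rate 0.024
    Library("RPE-1 bulk WGS", 208_151_992, 2 * 101, 0.024),
    # Footnote b: 2 x 150 bp pairs; assumption: 23 bp of read 1 is barcode + spacer
    Library("RPE-1 linked reads", 913_660_083, 2 * 150 - 23, 0.255),
    # Footnote d: mean CCS read length 7.1 kb; duplication rate assumed to be 0
    Library("RPE-1 CCS long reads", 4_606_654, 7_100, 0.0),
]

for lib in libraries:
    print(f"{lib.name}: ~{estimated_depth(lib):.1f}x")
```

Under these assumptions the script prints roughly 13.2x, 60.8x and 10.6x, close to the 13×, 60× and 11× listed for the RPE-1 libraries. Small differences are expected because the exact genome size and filtering used in the study are not stated here.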