Table 2 Methods and computational tools for haplotype reconstruction

From: Computational methods for chromosome-scale haplotype reconstruction

| Approach | Tools | Data | Advantages | Disadvantages |
| --- | --- | --- | --- | --- |
| Reference-based phasing | | | | |
| Molecular haplotyping | WhatsHap [44], HapCut2 [45] and ProbHap [46] | Long reads such as PacBio, Hi-C of individual | Can phase de novo and rare variants | Limitations in complex regions such as centromeres, HLA, etc. |
| Single-cell phasing | CHISEL [47], Satas et al. [48], RCK [49] | Single-cell short reads | High precision at single-cell resolution, detection of rare alleles | Engineering tricks required to scale to more than a million cells |
| Polyploid phasing | HapTree [50], Hap10 [51], WhatsHap-polyphase [52], H-PoP [53] | Local phasing | Can phase de novo and rare variants | Limitations in repetitive regions and not optimized for ploidy > 5 |
| De novo assembly | | | | |
| Diploid assembly | Falcon Unzip [23], Falcon phase [54] | Long reads and Hi-C of individual | Local phased contigs | No chromosome-scale assembly and computationally expensive |
| | DipAsm [55], Porubsky et al. [56] | Long reads and Hi-C of individual | Chromosome-scale diploid assembly | Collapsed assembly not suitable for repetitive regions |
| | Hifiasm, HiCanu [57], SDip [58] | HiFi reads of individual | High consensus accuracy and continuity | No chromosome-scale assembly |
| | pstools | HiFi and Hi-C reads | High-quality chromosome-scale haplotype assembly | Only designed for haplotyping diploids |
| | TrioCanu [59], Hifiasm+trio, WHdenovo [60] | Long reads of trios | Local phased contigs | Require family information |
| Polyploid assembly | SDA [61], SDip [58] | Long reads of individual | Local phased contigs | Need to be optimized for whole genomes |
| | POLYTE [62] | Illumina short reads | Local phased contigs | Does not scale well to whole genomes |
| Strain-resolved metagenome assembly | | | | |
| De novo (re-)assembly | IDBA-UD [63], DESMAN [64] | Metagenome short reads | No prior knowledge required | Low sensitivity: rare haplotypes can remain undetected |
| | OPERA-MS [65] | Metagenome using short and long reads | High continuity | Computationally expensive |
| SNV-based assembly | ConStrains [66], StrainFinder [67], Gretel [68] | Metagenome short reads | Computational efficiency | Assembly accuracy depends on variant calling |
| Read binning | MetaMaps [69] | Metagenome long reads | Computational efficiency | Accuracy depends on database |
| Contig binning | ProxiMeta [70], bin3C [71] | Metagenome short reads and Hi-C | Reference-free, ability to link plasmids to host chromosome | Multiple technologies necessary (Hi-C + shotgun sequencing) |