Table 2 Methods and computational tools for haplotype reconstruction

From: Computational methods for chromosome-scale haplotype reconstruction

| Approach | Tools | Data | Advantages | Disadvantages |
| --- | --- | --- | --- | --- |
| **Reference-based phasing** | | | | |
| Molecular haplotyping | WhatsHap [44], HapCut2 [45] and ProbHap [46] | Long reads such as PacBio, Hi-C of individual | Can phase de novo and rare variants | Limitations in complex regions such as centromeres, HLA, etc. |
| Single-cell phasing | CHISEL [47], Satas et al. [48], RCK [49] | Single-cell short reads | High precision at the single-cell level, detection of rare alleles | Engineering tricks required to scale beyond a million cells |
| Polyploid phasing | HapTree [50], Hap10 [51], WhatsHap-polyphase [52], H-PoP [53] | Local phasing | Can phase de novo and rare variants | Limitations in repetitive regions and not optimized for ploidy > 5 |
| **De novo assembly** | | | | |
| Diploid assembly | Falcon Unzip [23], Falcon phase [54] | Long reads and Hi-C of individual | Local phased contigs | No chromosome-scale assembly and computationally expensive |
| | DipAsm [55], Porubsky et al. [56] | Long reads and Hi-C of individual | Chromosome-scale diploid assembly | Collapsed assembly not suitable for repetitive regions |
| | Hifiasm, HiCanu [57], SDip [58] | HiFi reads of individual | High consensus accuracy and continuity | No chromosome-scale assembly |
| | pstools | HiFi and Hi-C reads | High-quality chromosome-scale haplotype assembly | Only designed for haplotyping diploids |
| | TrioCanu [59], Hifiasm+trio, WHdenovo [60] | Long reads of trios | Local phased contigs | Require family information |
| Polyploid assembly | SDA [61], SDip [58] | Long reads of individual | Local phased contigs | Need to be optimized for whole genomes |
| | POLYTE [62] | Illumina short reads | Local phased contigs | Does not scale well to whole genomes |
| **Strain-resolved metagenome assembly** | | | | |
| De novo (re-)assembly | IDBA-UD [63], DESMAN [64] | Metagenome short reads | No prior knowledge required | Low sensitivity: rare haplotypes can remain undetected |
| | OPERA-MS [65] | Metagenome using short and long reads | High continuity | Computationally expensive |
| SNV-based assembly | ConStrains [66], StrainFinder [67], Gretel [68] | Metagenome short reads | Computational efficiency | Assembly accuracy depends on variant calling |
| Read binning | MetaMaps [69] | Metagenome long reads | Computational efficiency | Accuracy depends on database |
| Contig binning | ProxiMeta [70], bin3C [71] | Metagenome short reads and Hi-C | Reference-free, ability to link plasmids to host chromosome | Multiple technologies necessary (Hi-C + shotgun sequencing) |