Fig. 2 | Genome Biology

Fig. 2

From: nPhase: an accurate and contiguous phasing method for polyploids

nPhase algorithm. Here we represent how a triploid’s reads could align to a reference sequence. Each read is one of three colors, one for each haplotype. The clustering, consensus, and cluster identity maintenance steps are repeated iteratively until all remaining clusters are forbidden to merge. Clustering: each vertical line represents a SNP; different colors signify different haplotypic origins. Only two reads are clustered at a time; here we show three clusters, so this is the result of the third step of nPhase’s iterative clustering. Consensus: a consensus sequence is generated by allowing every read in the cluster to vote for a specific base at each position. Votes are weighted by the pre-calculated context coverage number to reduce the influence of sequencing errors. The consensus sequences that represent clusters are treated just like aligned long reads and continue to be clustered. Cluster identity maintenance: when all remaining clusters are very different from each other, they are not allowed to merge; this prevents the algorithm from always outputting only one cluster per region. The remaining clusters and their consensus sequences should correspond to the haplotypes present in the original dataset.
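
The consensus step described in the caption can be illustrated with a minimal Python sketch. This is not nPhase's implementation: the function name `weighted_consensus`, the dictionary-based read representation, the example weights, and the alphabetical tie-break are all assumptions made for illustration. Each read votes for a base at every SNP position it covers, and each vote counts in proportion to a per-position weight standing in for the context coverage number.

```python
from collections import defaultdict


def weighted_consensus(reads, weights):
    """Build a consensus for one cluster by weighted voting (illustrative only).

    reads:   list of dicts mapping SNP position -> observed base
    weights: list of dicts (same order as reads) mapping SNP position -> vote
             weight; a hypothetical stand-in for the context coverage number
    """
    # tallies[position][base] accumulates the total weight behind each base
    tallies = defaultdict(lambda: defaultdict(float))
    for read, read_weights in zip(reads, weights):
        for pos, base in read.items():
            # Assumption: a missing weight defaults to 1.0
            tallies[pos][base] += read_weights.get(pos, 1.0)

    consensus = {}
    for pos, votes in tallies.items():
        # Assumption: ties are broken alphabetically to keep the result deterministic
        consensus[pos] = max(sorted(votes), key=votes.get)
    return consensus


# Three hypothetical reads in one cluster, covering SNPs at positions 100, 250, 400
reads = [
    {100: "A", 250: "C"},
    {100: "A", 250: "T"},
    {100: "G", 250: "T", 400: "G"},
]
weights = [
    {100: 5, 250: 1},
    {100: 4, 250: 6},
    {100: 2, 250: 3, 400: 2},
]
print(weighted_consensus(reads, weights))
```

In this example the poorly supported `C` at position 250 (weight 1) is outvoted by `T` (total weight 9), which is the behavior the weighting is meant to encourage. The resulting dictionary has the same shape as a read, so it could be fed back into further clustering rounds, mirroring the caption's statement that consensus sequences are treated like aligned long reads.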
