Multiple genome alignment in the telomere-to-telomere assembly era

Kille, Bryce; Balaji, Advait; Sedlazeck, Fritz J.; Nute, Michael; Treangen, Todd J.

doi:10.1186/s13059-022-02735-6

Table 1 The many challenges of MGA and potential solutions. As MGA incorporates both local alignment and global MSA, many challenges are shared. While some ideas from MSA have improved the capabilities of MGA, the challenges are by no means solved. Improving runtime, performance across divergent genomes, and accuracy in the presence of repeats remain important challenges for MGA.

Challenge	Solution
Evolutionary distance between genomes increases alignment difficulty [68–70].	Adding closely related species to the input dataset bridges the evolutionary gap between genomes in the input and can increase alignment sizes [71].
Computational costs are currently prohibitive for many large-scale MGAs.	Progressive methods eliminate the need for O(n²) pairwise alignments by only computing a constant number of pairwise alignments at each of the O(n) nodes.
Low complexity repeats often cause spurious alignments and/or dramatically increase computational cost.	Repeat-masking greatly simplifies the problem of alignment. Indeed, the Cactus GitHub repository states that “genomes that aren’t properly masked can still take tens of times longer to align that those that are masked” [72].
Progressive alignments for whole genomes don’t account for incomplete lineage sorting or horizontal gene transfer, rendering them an incomplete model for both eukaryotes and bacteria.	Divide-and-conquer approaches, similar to those recently used for MSA, could potentially be used to allow sections of genomes to be treated as arising from different phylogenies.
Sequencing error and micro-rearrangements can mask the existence of longer stretches of homologous regions.	Modifications to the genome graphs, similar to those for de Bruijn graph cleaning, can result in longer, more inclusive LCBs.
The number of pairwise anchors will grow significantly with the number of genomes being compared and the number of anchors present in all genomes will decrease.	Anchoring methods that aim to identify multiple local alignments may bridge the gap between all-pairs anchors and anchors present in all genomes.
While MSA allows for alignment of regions within an LCB, there are limited methods for extending the borders of LCBs.	Multiple alignment extension algorithms, such as the one used in procrastAligner [73], can be employed.
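
The computational-cost row above can be made concrete with a small counting sketch. It is an illustration only, not the procedure of any particular aligner: it assumes a strictly binary guide tree and exactly one pairwise alignment per internal node, and the names `all_pairs_count`, `progressive_count`, `leaves`, and `guide_tree` are hypothetical.

```python
from itertools import combinations


def all_pairs_count(genomes):
    """Count the pairwise alignments needed when every genome is aligned to every other."""
    return sum(1 for _ in combinations(genomes, 2))


def progressive_count(tree):
    """Count pairwise alignments along a binary guide tree.

    `tree` is a nested 2-tuple whose leaves are genome names (strings).
    Assumption: each internal node costs exactly one pairwise alignment
    (its two children are aligned to each other).
    """
    if isinstance(tree, str):  # a leaf needs no alignment
        return 0
    left, right = tree
    return 1 + progressive_count(left) + progressive_count(right)


def leaves(tree):
    """Return the genome names at the leaves of the guide tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


# Hypothetical guide tree over eight genomes.
guide_tree = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))
genomes = leaves(guide_tree)

print(len(genomes), all_pairs_count(genomes), progressive_count(guide_tree))
# prints: 8 28 7
```

For n genomes the all-pairs count is n(n-1)/2, while a binary guide tree has n-1 internal nodes, so under these assumptions the progressive count grows linearly with n.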

ISSN: 1474-760X

Genome Biology
