Fig. 3 | Genome Biology

Fig. 3

From: Matchtigs: minimum plain text representation of k-mer sets

An example of the matchtigs algorithm. A The input genomic sequences. B We first build an arc-centric compacted de Bruijn graph (for simplicity, the reverse complements of the nodes and arcs are not shown). C In the graph we compute the bi-imbalances of the nodes (the difference between outdegree and indegree). D From each node with negative bi-imbalance we compute the min-cost paths to all reachable nodes with positive bi-imbalance. The cost of each arc is the number of characters required to join two strings from the negative to the positive node while repeating the k-mers between the nodes. Specifically, the cost of an arc is \(|s| - (k-1)\), where |s| is the length of its label. E Using a min-cost perfect matching instance built from the min-cost paths, we decide which bi-imbalances should be fixed by repeating k-mers. The blue, tightly dashed edges are joining edges stemming from the min-cost paths. The red edges in longer dashes indicate that a node should stay unmatched, i.e. that fixing its bi-imbalance requires breaking arcs. The solution edges are highlighted in bold. There is one node in the matching problem for each binode in the original graph. The nodes \(x'\) are not reverse complements of nodes x, but stem from a reduction that makes a copy of each node. For more details, refer to the “Solving the min-cost integer flow formulation with min-cost matching” section. F For each joining edge in the solution we insert a joining arc into the DBG (in blue, small dashes), always directed such that the overall bi-imbalance decreases. The remaining imbalance is removed by inserting arbitrary breaking arcs (in red, longer dashes). G We compute a biEulerian circuit in the balanced graph. H We break the biEulerian circuit at all breaking arcs. I We output the strings spelled by the broken walks.
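The arc cost from panel D and the imbalance from panel C can be illustrated with a minimal Python sketch. It is a simplification, not the matchtigs implementation: it treats the graph as an ordinary directed graph, so it computes plain imbalances rather than bi-imbalances and ignores reverse complements. The function names `arc_cost` and `imbalances` and the toy arcs are hypothetical and do not come from the figure.

```python
from collections import defaultdict

def arc_cost(label: str, k: int) -> int:
    """Cost of an arc as defined in the caption: |s| - (k - 1)."""
    return len(label) - (k - 1)

def imbalances(arcs):
    """Map each node to outdegree - indegree (reverse complements ignored)."""
    balance = defaultdict(int)
    for tail, head, _label in arcs:
        balance[tail] += 1  # one more outgoing arc at the tail
        balance[head] -= 1  # one more incoming arc at the head
    return dict(balance)

k = 3
# Hypothetical toy arcs: (tail node, head node, arc label).
# Nodes are (k-1)-mers; each label starts with its tail and ends with its head.
arcs = [
    ("AC", "CG", "ACG"),
    ("CG", "GT", "CGGT"),
    ("CG", "GA", "CGA"),
]

costs = {label: arc_cost(label, k) for _, _, label in arcs}
balance = imbalances(arcs)
print(costs)
print(balance)
```

In this toy graph the arc labelled `CGGT` costs 4 - 2 = 2 characters, and the nodes `GT` and `GA` have negative imbalance, so in panel D they would be the starting points of the min-cost path searches.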
