Fig. 1 | Genome Biology

Fig. 1

From: DIVE: a reference-free statistical approach to diversity-generating and mobile genetic element discovery

DIVE algorithm and simulations. a The termini of transposable elements (TEs), shown in red (5′ end) and green (3′ end), will be flanked by a diverse set of neighboring sequences as a result of different insertions in a genome. In turn, a highly variable set of target sequences will be observed for the anchors overlapping the termini (red and green k-mers). Similarly, the sequences surrounding cargo gene hotspots in integrative and conjugative elements (ICEs) will be followed or preceded by a set of diverse sequences when the cargo genes vary across the analyzed sample. Lastly, CRISPR repeats will also have a diverse set of neighboring sequences due to the diversity of spacer sequences. In this case, the diversity will be observed both upstream and downstream of the anchor, as shown in the cartoon. b DIVE processes reads sequentially using a sliding window that moves along the sequencing reads, recording for each anchor the upstream and downstream k-mers (targets). For each anchor, a target dictionary is constructed in which DIVE keeps track of the observed target sequences, clustering them as they are observed. c Example of the cluster formation process based on Jaccard similarity (JS) for a given anchor (downstream case). The first target observed for the green anchor is a pink sequence. An entry is created for the green anchor in the anchor dictionary, and a target dictionary containing just the pink sequence is initialized for this anchor. Then a yellow and a blue sequence are observed which, given their dissimilarity to the previously observed sequences, result in new entries in the target dictionary of the anchor. Finally, another pink sequence is observed and, given its similarity to the first pink sequence, the two are clustered together. With probability 50%, the newly observed pink sequence becomes the key in the target dictionary, and the count for the pink cluster is increased.
d Sensitivity curves for DIVE and MGEfinder in our simulations of ancestral and active MGEs (Additional file 1). In the ancestral element simulations we evaluate performance at different copy numbers, whereas in the active element simulations we consider various levels of element activity. The sensitivity of DIVE in detecting MGE termini is higher than that of MGEfinder in both cases for most coverages, copy numbers, and activity levels.
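
The sliding-window and clustering steps described in panels b and c can be illustrated with a minimal Python sketch for the downstream case. This is not the DIVE implementation. The function names (`kmer_set`, `jaccard`, `add_target`, `scan_reads`) and the parameters `k`, `gap`, `q`, and `min_js` are hypothetical. Computing JS over sets of 3-mers, the 0.5 threshold, and the shortcut for exact matches are assumptions made here for illustration only.

```python
import random
from collections import defaultdict


def kmer_set(seq, q=3):
    """Set of overlapping q-mers in seq (assumed basis for the JS computation)."""
    return {seq[i:i + q] for i in range(len(seq) - q + 1)}


def jaccard(a, b, q=3):
    """Jaccard similarity between the q-mer sets of two sequences."""
    sa, sb = kmer_set(a, q), kmer_set(b, q)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def add_target(targets, target, min_js=0.5, rng=random):
    """Add one observed target to an anchor's target dictionary.

    targets maps a cluster key (a representative sequence) to its count.
    """
    if target in targets:  # assumption: exact matches just bump the count
        targets[target] += 1
        return
    for key in list(targets):
        if jaccard(key, target) >= min_js:
            count = targets.pop(key) + 1
            # With probability 50% the new sequence becomes the cluster key.
            new_key = target if rng.random() < 0.5 else key
            targets[new_key] = count
            return
    targets[target] = 1  # dissimilar to every key: start a new cluster


def scan_reads(reads, k=8, gap=0, min_js=0.5, seed=0):
    """Slide a window along each read, recording downstream targets per anchor."""
    rng = random.Random(seed)
    anchors = defaultdict(dict)  # anchor k-mer -> target dictionary
    for read in reads:
        for i in range(len(read) - (2 * k + gap) + 1):
            anchor = read[i:i + k]
            target = read[i + k + gap:i + 2 * k + gap]
            add_target(anchors[anchor], target, min_js, rng)
    return anchors


# Toy reads: one anchor followed by three dissimilar targets and one near-duplicate.
anchor = "ACGTACGT"
toy_reads = [anchor + t for t in ("AAAAAAAA", "CCCCCCCC", "GGGGGGGG", "AAAAAAAT")]
clusters = scan_reads(toy_reads)[anchor]
print(len(clusters), sorted(clusters.values()))
```

In this toy input the anchor ends up with three clusters, one of which has absorbed the near-duplicate target. An anchor with many distinct clusters corresponds to the kind of target diversity that panel a associates with TE termini, ICE cargo hotspots, and CRISPR repeats.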
