Fig. 1 | Genome Biology

Fig. 1

From: Sensitive inference of alignment-safe intervals from biodiverse protein sequence clusters using EMERALD

Schematic representation of EMERALD’s safety window calculation for a DIAMOND DeepClust cluster containing 4 member sequences. EMERALD performs a pairwise global alignment between the cluster representative and each of the 4 cluster member sequences, using affine gap costs and BLOSUM62 as the substitution matrix. For the first sequence pair, the right-hand side illustrates the suboptimal alignment graph and the corresponding alignment configurations between the two sequences, listed as \(\Delta\)-suboptimal alignments (an alignment is \(\Delta\)-suboptimal if its score is at most \(\Delta\) below the optimal score). The illustrated graph is a minimum-size graph that includes all \(\Delta\)-suboptimal alignments (here, we choose \(\Delta = 8\)). Source-to-sink paths in the graph correspond to \(\Delta\)-suboptimal alignments; nodes and edges on the unique optimal alignment path are shown in black, while those on other \(\Delta\)-suboptimal paths are shown in gray. The optimal alignment path is color coded in black, and the two top \(\Delta\)-suboptimal alignment paths are illustrated in orange and blue. For \(\alpha = 0.75\) and \(\Delta = 8\), we obtain three safety windows, shown as green intervals. These three safety windows correspond to subpaths contained in at least \(\alpha = 0.75\) (i.e., \(75\%\)) of all source-to-sink paths (i.e., of all \(\Delta\)-suboptimal alignments). Note that the middle safety window is not captured (i.e., contained) by the (unique) optimal alignment, in black, and is only revealed by the subgraph of all \(\Delta\)-suboptimal alignments. Finally, we project the safety windows onto the cluster member (and the cluster representative sequence), as explained in the “Methods” section. This procedure is repeated for all pairwise comparisons between the representative sequence and the 4 members, thereby obtaining \((\alpha ,\Delta )\)-safety windows for each cluster member (bottom left).
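
The \(\alpha\)-safety criterion in the caption can be illustrated with a minimal sketch. It is not EMERALD's implementation. It assumes the \(\Delta\)-suboptimal alignment graph is already given as a directed acyclic graph with one source and one sink; the toy graph and the names `count_paths`, `path_fraction`, and `is_safe` are hypothetical. A subpath from node \(u\) to node \(v\) lies on (number of source-to-\(u\) paths) \(\times\) (number of \(v\)-to-sink paths) source-to-sink paths, so its share of all paths can be compared with \(\alpha\) directly.

```python
from collections import defaultdict
from fractions import Fraction


def count_paths(edges, source, sink):
    """Return (fwd, bwd) for a DAG given as a list of (u, v) edges.

    fwd[v] = number of source-to-v paths, bwd[v] = number of v-to-sink paths.
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))

    # Topological order (Kahn's algorithm).
    order = [n for n in nodes if indeg[n] == 0]
    i = 0
    while i < len(order):
        u = order[i]
        i += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                order.append(v)

    fwd = {n: 0 for n in nodes}
    fwd[source] = 1
    for u in order:
        for v in succ[u]:
            fwd[v] += fwd[u]

    bwd = {n: 0 for n in nodes}
    bwd[sink] = 1
    for u in reversed(order):
        for v in succ[u]:
            bwd[u] += bwd[v]
    return fwd, bwd


def path_fraction(subpath, fwd, bwd, sink):
    """Fraction of source-to-sink paths containing `subpath` (a list of
    consecutive nodes, assumed to be a valid path in the graph)."""
    return Fraction(fwd[subpath[0]] * bwd[subpath[-1]], fwd[sink])


def is_safe(subpath, fwd, bwd, sink, alpha):
    """True if `subpath` lies on at least a fraction `alpha` of all paths."""
    return path_fraction(subpath, fwd, bwd, sink) >= alpha


# Toy graph (hypothetical) with four source-to-sink paths:
# s-a-b-t, s-a-b-c-t, s-a-c-t, s-c-t
EDGES = [("s", "a"), ("a", "b"), ("b", "t"), ("b", "c"),
         ("c", "t"), ("a", "c"), ("s", "c")]
FWD, BWD = count_paths(EDGES, "s", "t")

if __name__ == "__main__":
    alpha = Fraction(3, 4)
    for sub in (["s", "a"], ["c", "t"], ["s", "a", "b"]):
        print(sub, path_fraction(sub, FWD, BWD, "t"),
              is_safe(sub, FWD, BWD, "t", alpha))
```

In this toy graph the edges `s`→`a` and `c`→`t` each lie on 3 of the 4 paths and so meet \(\alpha = 0.75\), while the subpath `s`→`a`→`b` lies on only 2 of 4 and does not. Exact fractions are used so that the comparison with \(\alpha\) is not affected by floating-point rounding.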
