Skip to main content

Table 2 Glossary. Here, positive (P) or negative (N) describes the SV detection (or SV calling), and true (T) or false (F) describes if the calling was correct. Thus, SVs are true positive (TP) if they are called or false negatives (FN) if they are not called but present in the sample. Conversely, SVs that are not in the sample are true negatives (TN) if they are not called or false positives (FP) if they are called

From: Structural variant calling: the long and the short of it

WordDefinition
AccuracyProportion of correctly identified events (T) to the overall events: (TP + TN)/(TP + TN + FP + FN).
BreakpointsPositions on the genome denoting the start and end of SVs relative to the reference genome.
ContigsContiguous sequence stretches assembled from reads.
De Bruijn graphDirected graph consisting of nodes with exactly n incoming and n outgoing edges. In genome assemblies, a de Bruijn graph is built where the nodes are k-mers (sequences of length k) and the edges correspond to the overlap on k − 1 bases between nodes.
String graph-based assemblySimilar method to De Bruijn graph-based assembly, but in this case, the overlaps between all read pairs (instead of k-mers) are computed to construct a string graph based on the overlaps.
Insert sizeThe distance between the two paired-end reads.
OverhangPortion of a mapped read that cannot be aligned and thus could indicate a structural variation.
PhasingThe identification of two or more heterozygous variations are co-occurring on the same or different DNA molecule.
Precision (or positive predictive value)Proportion of predictions (FP + TP) that are correct (TP).
Recall (or sensitivity or true-positive rate)Proportion of the total positives (FN + TP) that were correctly identified (TP).
ScaffoldConnected contiguous sequence stretches, with unresolved sequence stretches in between.
Split readsReads containing parts that map in different loci on the reference genome. They are found by splitting the read in sub-segments, align individually each sub-segment, and then grouping sub-fragments from one read.
Tandem sequenceA specific type of repetitive region that was repeated directly adjacent to each other.