Table 2 Glossary. Here, positive (P) or negative (N) describes whether an SV was detected (called), and true (T) or false (F) describes whether the call was correct. Thus, SVs that are present in the sample are true positives (TP) if they are called or false negatives (FN) if they are not called. Conversely, SVs that are not in the sample are true negatives (TN) if they are not called or false positives (FP) if they are called.

From: Structural variant calling: the long and the short of it

Word

Definition

Accuracy

Proportion of correctly identified events (T) to the overall events: (TP + TN)/(TP + TN + FP + FN).

Breakpoints

Positions on the genome denoting the start and end of SVs relative to the reference genome.

Contigs

Contiguous sequence stretches assembled from reads.

De Bruijn graph

Directed graph consisting of nodes with exactly n incoming and n outgoing edges. In genome assemblies, a de Bruijn graph is built where the nodes are k-mers (sequences of length k) and the edges correspond to overlaps of k − 1 bases between nodes.

String graph-based assembly

Method similar to de Bruijn graph-based assembly, but here the overlaps between all pairs of reads (instead of k-mers) are computed and used to construct a string graph.

Insert size

The distance between the two reads of a paired-end read pair.

Overhang

Portion of a mapped read that cannot be aligned and thus could indicate a structural variation.

Phasing

The identification of whether two or more heterozygous variants co-occur on the same DNA molecule or on different DNA molecules.

Precision (or positive predictive value)

Proportion of predictions (FP + TP) that are correct (TP): TP/(TP + FP).

Recall (or sensitivity or true-positive rate)

Proportion of the total positives (FN + TP) that were correctly identified (TP): TP/(TP + FN).

Scaffold

Connected contiguous sequence stretches, with unresolved sequence stretches in between.

Split reads

Reads containing parts that map to different loci on the reference genome. They are found by splitting the read into sub-segments, aligning each sub-segment individually, and then grouping the sub-segments that originate from one read.

Tandem sequence

A specific type of repetitive region in which the repeated units lie directly adjacent to each other.
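
As a small illustration of the accuracy, precision, and recall definitions in the glossary, the following Python sketch computes all three from the four counts. The function name `sv_calling_metrics` and the example counts are hypothetical and chosen only for illustration; they do not come from the article. Note that in practice the number of true negatives is often not well defined for SV calling, so accuracy may not always be computable.

```python
def sv_calling_metrics(tp: int, fp: int, fn: int, tn: int = 0) -> dict:
    """Compute accuracy, precision and recall from TP, FP, FN and TN counts.

    Hypothetical helper for illustration; returns NaN for a metric
    whose denominator is zero.
    """
    total = tp + tn + fp + fn   # all events
    predicted = tp + fp         # all calls made (predictions)
    present = tp + fn           # all SVs actually present in the sample
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / total if total else nan,
        "precision": tp / predicted if predicted else nan,
        "recall": tp / present if present else nan,
    }


# Made-up counts: 80 SVs called correctly, 20 spurious calls,
# 20 missed SVs, and 880 correctly uncalled candidate sites.
metrics = sv_calling_metrics(tp=80, fp=20, fn=20, tn=880)
print(metrics)
```

With these made-up counts, precision and recall are both 80/100, while accuracy is 960/1000, which shows how a large number of true negatives can inflate accuracy relative to the other two measures.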