Table 1 Glossary of terms

From: When the levee breaks: a practical guide to sketching algorithms for processing the flood of genomic data

| Term | Definition |
| --- | --- |
| Bit-pattern observable | The run of 0s in a binary string |
| Bit vector | An array data structure that holds bits |
| Canonical k-mer | The smallest hash value between a k-mer and its reverse complement |
| Hash function | A function that takes input data of arbitrary size and maps it to a bit string that is of fixed size and typically smaller than the input |
| Jaccard similarity | A similarity measure defined as the size of the intersection of two sets divided by the size of their union |
| K-mer decomposition | The process of extracting all sub-sequences of length k from a sequence |
| Minimizer | The smallest hash value in a set |
| Multiset | A set that allows for multiple instances of each of its elements (i.e. element frequency) |
| Register | A quickly accessible bit vector used to hold information |
| Sketch | A compact data structure that approximates a data set |
| Stochastic averaging | A process used to reduce the variance of an estimator |
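
Several of the terms above can be illustrated with a few lines of Python. The block below is a minimal sketch, not the implementation used by any particular sketching tool. The function names are hypothetical, the choice of BLAKE2b as the hash function is an assumption made only so that the hash values are reproducible, and the bit-pattern observable is interpreted here as the run of leading 0s in a fixed-width hash value, which is one common reading of the glossary definition.

```python
import hashlib
from typing import List, Set

# Translation table for complementing DNA bases (assumes an A/C/G/T alphabet).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hash64(text: str) -> int:
    """Hash function: map input of any size to a fixed-size 64-bit value.

    BLAKE2b is an illustrative choice, not one prescribed by the glossary.
    """
    digest = hashlib.blake2b(text.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_decomposition(seq: str, k: int) -> List[str]:
    """K-mer decomposition: all overlapping sub-sequences of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def canonical_hash(kmer: str) -> int:
    """Canonical k-mer: smaller hash of a k-mer and its reverse complement."""
    return min(hash64(kmer), hash64(reverse_complement(kmer)))


def minimizer(hashes: Set[int]) -> int:
    """Minimizer: the smallest hash value in a set."""
    return min(hashes)


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard similarity: |A intersect B| / |A union B|.

    Two empty sets are treated as identical (1.0); this is a convention
    assumed here, not something the glossary specifies.
    """
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def leading_zero_run(value: int, bits: int = 64) -> int:
    """Bit-pattern observable (assumed reading): run of leading 0s
    when `value` is written as a `bits`-wide binary string."""
    return bits - value.bit_length()


if __name__ == "__main__":
    # Two toy sequences that share most of their 4-mers.
    set_a = {canonical_hash(x) for x in kmer_decomposition("ACGTACGTGA", 4)}
    set_b = {canonical_hash(x) for x in kmer_decomposition("ACGTACGTCC", 4)}
    print("Jaccard similarity:", jaccard(set_a, set_b))
    print("Minimizer of A:", minimizer(set_a))
    print("Leading 0s of minimizer:", leading_zero_run(minimizer(set_a)))
```

Because the canonical form takes the minimum over both strands, a k-mer and its reverse complement always map to the same value, so the resulting sets do not depend on which strand was sequenced.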