Skip to main content
Fig. 3 | Genome Biology

Fig. 3

From: Vcfanno: fast, flexible annotation of genetic variants

Fig. 3

Parallel sweeping algorithm. As in Fig. 2, we sweep across the chromosome from lower to higher positions (and left to right in the figure). The green query intervals are to be annotated with the two annotation files depicted with blue and orange intervals. The parallelization occurs in chunks of query intervals delineated by the black vertical lines. One process reads query intervals into memory until a maximum gap size to the next interval is reached (e.g., chunks 2, 4) or the number of intervals exceeds the chunk size threshold (e.g., chunks 1, 3). While a new set of query intervals accumulates, the first chunk, bounded to the right by the first vertical black line above, is sent for sweeping and a placeholder is put into a FIFO (first-in, first-out) queue, so that the output remains sorted even though other chunks may finish first. The annotation files are queried with regions based on the bounds of intervals in the query chunk. The queries then return streams of intervals and, finally, those streams are sent to the chrom-sweep algorithm in a new process. When it finishes, its placeholder can be pulled from the FIFO queue and the results are yielded for output

Back to article page