Fig. 2 | Genome Biology

Fig. 2

From: GBC: a parallel toolkit based on highly addressable byte-encoding blocks for extremely large-scale genotypes of species


The workflow of building a GTB file in the GBC framework. a Slice the input file(s) into several chunks according to the specified number of parallel threads, then divide each chunk into several blocks for compression. b Encode the genotypes of each variant as byte-encoded genotypes (BEG). c Combine multiple BEGs of each biallelic variant into a maximized byte-encoded genotype (MBEG). d Sort variants by approximate minimum discrepancy ordering (AMDO) to improve the compression ratio. The sample graph is produced from the first 4000 biallelic variants of assoc.hg19.vcf.gz (downloaded from https://doi.org/10.5281/zenodo.7737556). Genotype 0|0 is filled with white, 0|1 or 1|0 with gray, and 1|1 with black. e Compress the position, genotype, and allele data separately with an advanced compressor. The compressed data (entity data) is then concatenated into a long array and written to disk, while the abstract information is kept in memory. f Store the compressed data in Genotype Block (GTB) format
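Steps b to d can be illustrated with a minimal Python sketch. It is not the GBC implementation: the code table, the choice of three genotypes per byte, and every function name below are assumptions made for illustration, and the sort by alternate-allele count is only a simple stand-in for AMDO, whose actual procedure is described in the article.

```python
from typing import List, Sequence

# Hypothetical code table for phased biallelic genotypes
# (an assumption for illustration, not GBC's published table).
CODES = {".|.": 0, "0|0": 1, "0|1": 2, "1|0": 3, "1|1": 4}
ALT_ALLELES = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2}  # alt alleles carried by each code
BASE = len(CODES)  # 5 possible states per genotype
GROUP = 3          # 5**3 = 125 values, so three genotypes fit in one byte


def encode_beg(genotypes: Sequence[str]) -> bytes:
    """Step b (sketch): one byte per genotype."""
    return bytes(CODES[g] for g in genotypes)


def pack_mbeg(beg: bytes) -> bytes:
    """Step c (sketch): pack GROUP genotype codes into one byte as base-5 digits."""
    out = bytearray()
    for i in range(0, len(beg), GROUP):
        group = beg[i:i + GROUP]
        value = 0
        for code in group:
            value = value * BASE + code
        # A short final group is padded on the right with the missing code (0).
        value *= BASE ** (GROUP - len(group))
        out.append(value)
    return bytes(out)


def unpack_mbeg(packed: bytes, n: int) -> bytes:
    """Inverse of pack_mbeg; n is the original number of genotypes."""
    out = bytearray()
    for value in packed:
        out.extend((value // BASE ** k) % BASE for k in reversed(range(GROUP)))
    return bytes(out[:n])


def order_by_alt_count(variants: List[bytes]) -> List[int]:
    """Step d stand-in: indices of variants sorted by alternate-allele count.

    This is NOT AMDO; it only shows why grouping similar variants together
    can give a downstream compressor longer runs of similar bytes.
    """
    def alt_count(beg: bytes) -> int:
        return sum(ALT_ALLELES[c] for c in beg)

    return sorted(range(len(variants)), key=lambda i: alt_count(variants[i]))


if __name__ == "__main__":
    beg = encode_beg(["0|0", "0|1", "1|1", "1|0"])
    packed = pack_mbeg(beg)
    assert unpack_mbeg(packed, len(beg)) == beg
    print(len(beg), "bytes as BEG ->", len(packed), "bytes after packing")
```

In this sketch, four one-byte genotype codes shrink to two packed bytes before any general-purpose compressor is applied, which is the kind of saving step c aims for.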
