Fig. 7 | Genome Biology

Fig. 7

From: Kssd: sequence dimensionality reduction by k-mer substring space sampling enables real-time large-scale datasets analysis

Clustering and matching PRJEB31736 runs by reference to 1KG samples. a MDS plot of 1730 PRJEB31736 runs using Kssd reference-subtracted sketches. b Genotype PCA using the combined VCF file of HG01855 (PRJEB31736) and 2504 1KG samples. c Matching HG01855 (PRJEB31736) to 2504 1KG samples using QTLtools mbv. Each black circle represents a 1KG sample, with x and y indicating the percentages of consistent genotypes at heterozygous and homozygous sites, respectively. d Matching HG01855 (PRJEB31736) to 2504 1KG samples using Kssd. Each black circle represents a 1KG sample, with x and y indicating its Jaccard and containment coefficients relative to HG01855, respectively. e Execution time of the above analyses. MDS plots of 160 test runs using f Mash sketches, g Kssd sketches without reference subtraction, and h Kssd sketches with reference subtraction.
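
For readers unfamiliar with the two coefficients plotted in panel d, the following is a minimal sketch of how Jaccard and containment coefficients are commonly defined on k-mer sets. It is not Kssd's implementation: the function names (`kmers`, `jaccard`, `containment`), the toy sequences, and the choice of the query set as the containment denominator are all assumptions made for illustration, and Kssd computes its coefficients on sampled sketches rather than on full k-mer sets.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq (hypothetical helper)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard coefficient: |A ∩ B| / |A ∪ B| (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(query, ref):
    """Containment of query in ref: |Q ∩ R| / |Q| (assumed definition;
    the denominator convention may differ in a given tool)."""
    return len(query & ref) / len(query) if query else 0.0


# Toy example with k = 3 (sequences are illustrative, not real data).
query = kmers("ACGTAC", 3)   # {ACG, CGT, GTA, TAC}
ref = kmers("ACGTTT", 3)     # {ACG, CGT, GTT, TTT}

j = jaccard(query, ref)      # 2 shared k-mers out of 6 distinct
c = containment(query, ref)  # 2 shared k-mers out of 4 query k-mers
print(f"Jaccard = {j:.3f}, containment = {c:.3f}")
```

Under these definitions, a sample matching the query would sit toward the upper right of a Jaccard versus containment scatter plot, while unrelated samples cluster at lower values on both axes.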
