Data quality and the influence of library preparation protocol on whole genome sequencing data. (A) Hierarchical clustering of WGS samples based on pairwise k-mer distances. Colors indicate the library preparation protocol. (B) Percentage of aligned reads per sample. Black and grey bars distinguish samples from different individuals; red and blue circles indicate the library preparation protocol. (C) Percentage of duplicate reads. (D) Percentage of properly paired reads. (E) Percentage of paired reads whose mates map to a different chromosome. (F) Distribution of average GC content per read, colored by library preparation protocol. (G) Distribution of estimated insert sizes. (H) Distribution of the number of bases soft-clipped from reads during alignment. Diff, different; WGS, whole genome sequencing.
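The read-level metrics in panels B to H can be derived from standard SAM alignment fields. The sketch below is a minimal illustration that assumes plain-text SAM input and uses only primary alignments. It takes the denominators for the aligned, duplicate and properly-paired percentages to be all primary reads, and it uses positive TLEN values as a proxy for insert size. These choices, and the function name `qc_summary`, are illustrative assumptions and do not describe the pipeline used to produce the figure.

```python
import re
from statistics import mean

# SAM FLAG bits, as defined in the SAM format specification
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
SECONDARY = 0x100
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def qc_summary(sam_lines):
    """Hypothetical helper: per-sample QC metrics from SAM text lines."""
    total = aligned = dup = proper = paired = diff_chr = 0
    gc_fractions, insert_sizes, soft_clipped = [], [], []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        f = line.rstrip("\n").split("\t")
        flag = int(f[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once (primary alignments only)
        total += 1
        seq = f[9]
        if seq != "*":
            # Panel F: fraction of G/C bases in the read
            gc_fractions.append(sum(b in "GCgc" for b in seq) / len(seq))
        if flag & DUPLICATE:
            dup += 1  # panel C
        if flag & UNMAPPED:
            continue
        aligned += 1  # panel B
        # Panel H: total soft-clipped bases (CIGAR 'S' operations)
        soft_clipped.append(
            sum(int(n) for n, op in CIGAR_OP.findall(f[5]) if op == "S"))
        if flag & PAIRED:
            paired += 1
            if flag & PROPER_PAIR:
                proper += 1  # panel D
            # Panel E: mate mapped, and RNEXT names a different reference
            if not flag & MATE_UNMAPPED and f[6] not in ("=", "*", f[2]):
                diff_chr += 1
            tlen = int(f[8])
            if tlen > 0:
                insert_sizes.append(tlen)  # panel G (one value per pair)
    return {
        "pct_aligned": 100 * aligned / total if total else 0.0,
        "pct_duplicate": 100 * dup / total if total else 0.0,
        "pct_proper": 100 * proper / total if total else 0.0,
        "pct_diff_chr": 100 * diff_chr / paired if paired else 0.0,
        "mean_gc": mean(gc_fractions) if gc_fractions else 0.0,
        "insert_sizes": insert_sizes,
        "soft_clipped": soft_clipped,
    }
```

Per-sample summaries of this kind could then be compared across library preparation protocols, as in panels B to H.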