Fig. 6 | Genome Biology

From: seqQscorer: automated quality control of next-generation sequencing data using machine learning

Identifying outliers in independent diagnostic studies. Gene expression data (RNA-seq) of 90 human disease samples were retrieved from 6 independent diagnostic datasets in the GEO database. For each dataset, samples were plotted on the first two components of a principal component analysis (PCA) of the normalized gene expression profiles, and the clustering of samples by group (control vs. disease) was evaluated with the Dunn index. The bar plot shows, for each dataset, the change in Dunn index upon automatic removal of outlier samples (after minus before). In each group, the 2 samples with the highest predicted probability of being low quality were arbitrarily designated as outliers. A positive difference in Dunn index denotes improved clustering after removal of the outlier samples.
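The evaluation described in the caption can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the helpers `pca_2d`, `dunn_index` and `dunn_change_after_outlier_removal` are hypothetical names; the Dunn index is taken in its common form (smallest distance between points of different groups divided by the largest within-group diameter, computed on the 2D PCA coordinates); the low-quality probabilities are assumed to be supplied per sample (e.g. from a seqQscorer model); and PCA is assumed to be recomputed on the samples that remain after outlier removal.

```python
import numpy as np


def pca_2d(expr):
    """Project samples (rows) onto the first two principal components.

    `expr` is a samples x genes matrix of already-normalized expression values.
    """
    centered = expr - expr.mean(axis=0)
    # SVD of the centered matrix: the columns of U scaled by S are the PC scores.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :2] * s[:2]


def dunn_index(points, labels):
    """Dunn index: smallest between-group distance / largest group diameter."""
    labels = np.asarray(labels)
    # Full pairwise Euclidean distance matrix between all samples.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    groups = np.unique(labels)
    # Diameter of a group = largest distance between two of its members.
    max_diameter = max(dist[np.ix_(labels == g, labels == g)].max() for g in groups)
    # Separation of two groups = smallest distance between their members.
    min_separation = min(
        dist[np.ix_(labels == a, labels == b)].min()
        for i, a in enumerate(groups)
        for b in groups[i + 1:]
    )
    return min_separation / max_diameter


def dunn_change_after_outlier_removal(expr, labels, p_low_quality, n_per_group=2):
    """Return Dunn(after) - Dunn(before); positive means better group separation."""
    labels = np.asarray(labels)
    p_low_quality = np.asarray(p_low_quality)
    before = dunn_index(pca_2d(expr), labels)

    keep = np.ones(len(labels), dtype=bool)
    for g in np.unique(labels):
        idx = np.flatnonzero(labels == g)
        # Flag the n samples of this group with the highest low-quality probability.
        worst = idx[np.argsort(p_low_quality[idx])[::-1][:n_per_group]]
        keep[worst] = False

    # Assumption: PCA is recomputed on the remaining samples before re-scoring.
    after = dunn_index(pca_2d(expr[keep]), labels[keep])
    return after - before
```

A call such as `dunn_change_after_outlier_removal(expr, ["control"] * 6 + ["disease"] * 6, probs)` returns one bar of the plot for one dataset. The sketch assumes at least two groups and at least two samples left in each group after removal, otherwise the diameter is zero and the ratio is undefined.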