Fig. 1 | Genome Biology

Fig. 1

From: HycDemux: a hybrid unsupervised approach for accurate barcoded sample demultiplexing in nanopore sequencing

An example of clustering 30 sequences with our hybrid clustering algorithm (A, B, and C), followed by demultiplexing based on the clustering results (D). A We perform initial clustering on the 30 sequences, which yields 6 clusters. A cluster is considered a good cluster if it contains more sequences than the GoodIndex; here, clusters \(C_1\), \(C_2\), and \(C_3\) are good clusters. B We attempt to merge the good clusters by selecting k signals from each cluster. If all values in the corresponding \(k \times k\) dynamic time warping (DTW) distance matrix are smaller than a given threshold, we merge the two clusters; here, \(C_1\) and \(C_2\) are merged into a single cluster. C After cluster merging is complete, we attempt to add the sequences that are not in any good cluster to the existing good clusters. We select a representative sequence from each good cluster and calculate the DTW distance between it and each sequence that is not in a good cluster. If the distance is less than a given threshold, we add the sequence to the corresponding good cluster. D We demultiplex a cluster by selecting k signals within it and converting all known barcode sequences into standard nanopore signals using the official 6-mer table from Oxford Nanopore Technologies. We then calculate the DTW distance matrix between these k signals and the standard nanopore signals, and for each column find the row label of the minimum value, which gives a one-dimensional array of row labels. Finally, we take the mode of this array as the final demultiplexing result.
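The merge test in panel B and the voting step in panel D can be illustrated with a minimal Python sketch. It is not the HycDemux implementation. It assumes a plain DTW with an absolute-difference cost, takes the first k signals of a cluster as the selected ones, lays out the distance matrix with barcodes as rows and cluster signals as columns, and expects the barcode reference signals to be supplied already converted (the 6-mer table lookup is omitted). The names `dtw_distance`, `should_merge`, and `demultiplex` are hypothetical.

```python
from collections import Counter


def dtw_distance(a, b):
    """Textbook DTW between two numeric sequences (absolute-difference cost)."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)  # row 0 of the cumulative cost table
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            # Extend the cheapest of the three admissible warping moves.
            curr[j] = abs(x - y) + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[-1]


def should_merge(cluster_a, cluster_b, k, threshold):
    """Panel B: merge only if every entry of the k x k DTW matrix is below threshold.

    Assumption: the first k signals of each cluster stand in for the selected ones.
    """
    return all(
        dtw_distance(s, t) < threshold
        for s in cluster_a[:k]
        for t in cluster_b[:k]
    )


def demultiplex(cluster, references, k):
    """Panel D: label a cluster by majority vote over k of its signals.

    `references` maps a barcode label to its reference signal. Each selected
    signal (a column) votes for the barcode (row) with the smallest DTW distance.
    """
    votes = []
    for signal in cluster[:k]:
        best = min(references, key=lambda label: dtw_distance(references[label], signal))
        votes.append(best)
    # Mode of the row labels; ties go to the label seen first.
    return Counter(votes).most_common(1)[0][0]


# Toy data, invented for illustration only.
references = {"BC01": [1, 2, 3, 4], "BC02": [4, 3, 2, 1]}
cluster = [[1, 1, 2, 3, 4], [1, 2, 3, 3, 4], [4, 3, 2, 2, 1]]
label = demultiplex(cluster, references, k=3)
```

With this toy input, two of the three selected signals are closest to `BC01`, so the cluster is assigned to that barcode even though one signal votes for `BC02`. Taking the mode makes the assignment robust to a minority of noisy signals.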
