Fig. 2 | Genome Biology

Fig. 2

From: Towards accurate and reliable resolution of structural variants for clinical diagnosis

An insight into the reference samples and efforts for SV detection by the SEQC-II consortium. Great strides have been made in advancing SV detection by the SEQC-II consortium (Fig. 2). First, the SEQC-II consortium established high-quality SV call sets based on multi-platform sequencing of tumor-normal reference samples and partially verified them using orthogonal methods, including PCR-based validation, cytogenetic array, BioNano optical mapping, and fusion genes detected from RNA-seq [35]. Meanwhile, SEQC-II systematically evaluated the reproducibility of somatic SV detection across platforms and benchmarked the performance of various software tools. Leveraging these high-quality SV call sets, they developed a deep learning-based calling algorithm for SV detection using a convolutional neural network (CNN). The proposed deep learning models achieved high robustness across multiple sequencing technologies for fresh and FFPE DNA input, varying tumor/normal purities, and different coverages, and outperformed conventional detection approaches both in general and in challenging situations such as low coverage, low variant allele frequency, DNA damage, and complex genomic regions. Furthermore, a CNV inference method was developed based on the generated single-cell RNA-seq data. Second, the SEQC-II consortium comprehensively investigated the performance and confounding factors (i.e., long- or short-read sequencing, capture panels, and bioinformatics pipelines) of gene fusion detection [28]. Long-read sequencing achieved higher precision and discovered more novel fusion genes, whereas short-read sequencing achieved greater sensitivity for detecting known fusion genes, correlated with the endogenous expression of the targeted genes. Third, the SEQC-II consortium ranked the sources of divergence in SV detection using multiple Illumina-based short-read sequencing runs of the Chinese quartet reference samples. Interestingly, mapping methods were the most significant source of calling variability, followed by sequencing centers and replicates. Surprisingly, SVs supported by only one site or technical replicate often represented true positives as defined by long-read PacBio sequencing, consistent with an overall higher false-negative rate for SV calling [36].
