
Table 2 Benchmark datasets generated from SEQC-II consortium efforts and potential application in SV detection

From: Towards accurate and reliable resolution of structural variants for clinical diagnosis

Working group | Reference samples | Benchmark data | Potential benefit for SV detection | Link

Somatic mutation [30, 32, 66, 73,74,75,76]

Tumor-normal sample: HCC1395BL as normal and HCC1395 as tumor

Fresh DNA:

WGS - HiSeq, NovaSeq, 10X Genomics, and PacBio

WES - HiSeq and Ion Torrent

AmpliSeq - MiSeq

Microarray - AffyChip CytoScan HD

• Somatic SV benchmark establishment

• Low allelic frequency (LOF) somatic SV detection in liquid biopsy or FFPE samples

• Deep learning-based somatic SV detection

• Reproducibility and repeatability assessment of somatic SV detection across multiple samples and study designs

All raw data (FASTQ files): NCBI’s SRA database (SRP162370)

VCF and source code:

ftp://ftp-trace.ncbi.nlm.nih.gov/seqc/ftp/release/Somatic_Mutation_WG/

BAM files: Seven Bridges’ Cancer Genomics Cloud (CGC) platform; a license is needed.

FFPE/mixed DNA:

WGS/WES: HiSeq

Fresh cells:

scCNV: 10X Genomics

scCNV data: SRA repository under accession code no. PRJNA504037.

Source code: https://github.com/oxwang/fda_scRNA-seq and https://codeocean.com/capsule/0497386 or https://doi.org/10.24433/CO.1559060.v1.

Oncopanel [28, 29, 54, 56, 77]

Sample A: ten cancer cell line mixture

Sample B: a normal male cell line (Agilent OneSeq Human Reference DNA, PN 5190–8848)

Spike-in samples: 5% AcroMetrix spike-ins + Sample B

8 pan-cancer gene panels:

WES: HiSeq, NovaSeq, Ion Torrent, Nanopore, Stranded RNAseq

WGS: 10X Genomics

Microarrays: SNP array and aCGH

• Reproducibility and repeatability assessment of actionable somatic SV detection

• Benefit of gene fusion detection by integrating DNAseq and RNAseq

FASTQ or BAM: BioProject PRJNA677997 - https://www.ncbi.nlm.nih.gov/bioproject/PRJNA677997.

VCF/BED: https://figshare.com/projects/SEQC2_Onco-panel_Sequencing_Working_Group_-_PanCancer_panel_Study/94520

Germline mutation [36, 38, 39]

Chinese Quartet samples (B-lymphocyte cell line and blood samples)

WGS: HiSeq, NovaSeq, Illumina X10, PacBio

Microarrays: SNP array

• Factors influencing the reproducibility of germline SV detection

• Germline SV detection concordance between B-lymphocyte cell line and blood samples

• Deep learning-based somatic SV detection

• Cross-check best practices for germline SV detection against NIST efforts

Raw data: BioProject PRJNA723125 (HapMap samples) https://www.ncbi.nlm.nih.gov/bioproject/PRJNA723125/ and

NODE OEP001896 (Chinese Quartet Samples) https://www.biosino.org/node/project/detail/OEP001896

Source code: https://github.com/justwalking2017/SEQC_WG3_Script

HapMap samples (HG001)

HapMap Ashkenazi Trio

WGS: HiSeq, BGISEQ, MGISEQ, NovaSeq

WES: Ion Proton and Ion S5

Raw data: BioProject PRJNA646948 (https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA646948), within accessions SRR12898279–SRR12898354

Source code: https://www.github.com/jfoox/abrfngs2

Bacterial genomes (ATCC MSA-3001)

MiSeq, Ion PGM, Ion S5, MinION, Flongle, and GenapSys
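
The table gives the raw data for BioProject PRJNA646948 as a range of SRA run accessions (SRR12898279 to SRR12898354) rather than a list. A minimal sketch of how such a range could be expanded into individual accession strings, for example to feed a download tool, is shown below. The function name `expand_sra_range` is hypothetical and not part of any SEQC-II or NCBI tooling. The sketch also assumes the range is contiguous, which the table does not confirm, so each accession should be checked against the BioProject before use.

```python
def expand_sra_range(first: str, last: str) -> list[str]:
    """Expand an inclusive range of run accessions into individual IDs.

    Hypothetical helper: assumes both accessions share one alphabetic
    prefix (e.g. "SRR") followed by a purely numeric suffix.
    """
    prefix = first.rstrip("0123456789")
    if last.rstrip("0123456789") != prefix:
        raise ValueError("accessions must share the same prefix")

    start = int(first[len(prefix):])
    end = int(last[len(prefix):])
    if start > end:
        raise ValueError("first accession must not exceed last accession")

    # Preserve the digit width of the first accession when formatting.
    width = len(first) - len(prefix)
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


# Range as quoted in the table for BioProject PRJNA646948.
runs = expand_sra_range("SRR12898279", "SRR12898354")
print(len(runs), runs[0], runs[-1])
```

The resulting list only enumerates candidate identifiers; it makes no claim that every run in the range exists or belongs to this study.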