Fig. 3 | Genome Biology

Fig. 3

From: Systematic analysis of dark and camouflaged genes reveals disease-relevant genes hiding in plain sight

Dark coding regions occur throughout the genome and are largely resolved with long-read sequencing technologies. We identified 2855 dark coding (CDS) regions in 748 protein-coding genes that were dark by either depth or mapping quality (Additional file 2: Table S1; Additional file 3: Table S2). Of these 748 protein-coding genes, 117 (15.6%) were 100% dark in CDS regions, 402 (53.7%) were at least 25% dark in CDS regions, and 592 (79.1%) were at least 5% dark in CDS regions (Additional file 2: Table S1). a We mapped all protein-coding gene bodies with a dark coding exon to the genome to visualize their genomic locations; they are generally spread throughout the genome, although there are several tight clusters of dark CDS regions on chromosomes 1, 9, 10, and Y. b We assessed how well increasing read lengths would resolve dark regions using samples sequenced with Illumina whole-genome sequencing at 250-nucleotide read lengths, as well as with the long-read technologies 10x Genomics, Oxford Nanopore Technologies (ONT), and Pacific Biosciences (PacBio). Samples sequenced with 250-nucleotide Illumina reads reduced the area under the curve (AUC) by 12.1% in CDS regions. Compared with the standard Illumina 100-nucleotide read lengths, 10x Genomics, PacBio, and ONT reduced the AUC for CDS regions by approximately 49.5%, 64.4%, and 90.4%, respectively. The AUC for each technology is scaled relative to Illumina sequencing with 100-nucleotide read lengths (i.e., AUC for Illumina 100-nucleotide read lengths = 1). In contrast to the overall results, PacBio outperformed 10x Genomics when looking only at CDS regions (see text). Most analyses focused on genes where at least 5% of the CDS nucleotides are dark, indicated by the dashed line. c, d We also calculated the raw number of dark nucleotides for each technology in GRCh38, genome wide, in full gene bodies, and in CDS regions
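
The scaling described for panel b can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the trapezoid-rule integration, and the toy curves are assumptions made for illustration, and only the percentage reductions are taken from the caption. It assumes every reduction is expressed relative to the Illumina 100-nucleotide reference, whose scaled AUC is 1.

```python
def trapezoid_auc(xs, ys):
    """Area under a piecewise-linear curve using the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))


def scaled_auc(auc, reference_auc):
    """Scale an AUC so that the reference technology has AUC = 1."""
    return auc / reference_auc


def percent_reduction(scaled):
    """Percent reduction in AUC relative to the reference (scaled AUC = 1)."""
    return (1.0 - scaled) * 100.0


# Hypothetical toy curves (NOT data from the paper), used only to show the arithmetic.
thresholds = [0, 25, 50, 75, 100]
reference_curve = [100, 60, 40, 20, 10]   # stands in for the reference technology
other_curve = [50, 30, 20, 10, 5]         # stands in for a longer-read technology

reference_area = trapezoid_auc(thresholds, reference_curve)
other_area = trapezoid_auc(thresholds, other_curve)
toy_scaled = scaled_auc(other_area, reference_area)
toy_reduction = percent_reduction(toy_scaled)

# Reductions reported in the caption for CDS regions, in percent.
reported_reduction = {
    "Illumina 250": 12.1,
    "10x Genomics": 49.5,
    "PacBio": 64.4,
    "ONT": 90.4,
}

# Scaled AUC implied by each reported reduction (reference = 1).
implied_scaled_auc = {
    name: round(1.0 - pct / 100.0, 3)
    for name, pct in reported_reduction.items()
}

print(toy_scaled, toy_reduction)
print(implied_scaled_auc)
```

Under this reading, a 90.4% reduction for ONT corresponds to a scaled AUC of about 0.096, and the smaller scaled AUC for PacBio than for 10x Genomics reflects the caption's statement that PacBio outperformed 10x Genomics in CDS regions.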
