Fig. 6 | Genome Biology

Fig. 6

From: Systematic analysis of dark and camouflaged genes reveals disease-relevant genes hiding in plain sight

Camouflaged genes are consistently dark in gnomAD, but dark-by-depth genes may be sample or dataset specific. Many dark genes are specifically camouflaged (Additional file 13: Table S12; Additional file 14: Table S13), but many are dark by depth; we found that camouflaged regions in the ADSP are consistently dark in the gnomAD consortium data (http://gnomad.broadinstitute.org/) [36]. Dark-by-depth regions may be more variable between samples and datasets, however, suggesting these regions may be sensitive to specific aspects of whole-genome sequencing (e.g., library preparation) or downstream analyses. a SMN1 and SMN2 are camouflaged by each other (only SMN1 shown). Both genes contribute to spinal muscular atrophy and have been implicated in ALS. b HSPA1A and HSPA1B are also camouflaged by each other (only HSPA1A shown). The heat-shock protein family has been implicated in ALS. c NEB (9.5% dark CDS) is a special case that is camouflaged by itself. NEB is associated with 24 diseases in the HGMD, including nemaline myopathy, a hereditary neuromuscular disorder. NEB is a large gene; thus, 9.5% dark CDS translates to 2424 protein-coding bases. d CR1 is a top Alzheimer’s disease gene that plays a critical role in the complement cascade as a receptor for the C3b and C4b complement components, and potentially helps clear amyloid-beta (Aβ) [37,38,39]. CR1 is also camouflaged by itself, where the repeated region includes the extracellular C3b and C4b binding domain. The number of repeats and density of certain isoforms have been associated with Alzheimer’s disease [21, 40,41,42,43]. e HLA-DRB5 is dark by depth in the ADSP and gnomAD data. HLA-DRB5 has been implicated in several diseases, including Alzheimer’s disease. f RPGR is likewise dark in ADSP and gnomAD and is associated with several eye diseases, including retinitis pigmentosa and cone-rod dystrophy. 
g ARX is dark by depth, but this varies by sample or cohort, as approximately 70% of gnomAD samples are not strictly dark by depth. ARX is associated with diseases including early infantile epileptic encephalopathy 1 (EIEE1) and Partington syndrome. h Similarly, TBX1 is not strictly dark by depth in approximately 70% of gnomAD samples. The Y axes for panels a–f indicate median coverage in gnomAD (blue = exomes; green = genomes), whereas the Y axes in g, h represent the proportion of gnomAD samples that have > 5x coverage. Dark and camouflaged regions, as well as the percentage of each gene’s CDS region that is dark, are indicated by red lines. Dark regions in exome data are either similar to or more pronounced than what we observed in whole-genome data, highlighting that dark and camouflaged regions are generally magnified in whole-exome data. Of interest, we also discovered that APOE, the top genetic risk for Alzheimer’s disease [44,45,46], is approximately 6% dark CDS (by depth) for certain ADSP samples with whole-genome sequencing, and the same region is dark in gnomAD whole-exome data (Additional file 1: Figure S11).
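The two quantities used in this caption, the percentage of a gene's CDS that is dark and the proportion of samples with > 5x coverage, can be illustrated with a minimal Python sketch. It is not the pipeline used in the article; the function names and the example depth values are hypothetical and chosen only for illustration. The only figures taken from the caption are NEB's 9.5% dark CDS and 2424 dark protein-coding bases, which together imply a total CDS length of roughly 25,500 bases.

```python
def percent_dark_cds(cds_length, dark_bases):
    """Percentage of protein-coding bases that fall in dark regions."""
    if cds_length <= 0:
        raise ValueError("cds_length must be positive")
    return 100.0 * dark_bases / cds_length


def implied_cds_length(dark_bases, percent_dark):
    """Total CDS length implied by a dark-base count and a dark percentage."""
    if percent_dark <= 0:
        raise ValueError("percent_dark must be positive")
    return dark_bases / (percent_dark / 100.0)


def proportion_above(depths, threshold=5):
    """Share of samples whose read depth at one position exceeds `threshold`.

    `depths` holds one depth value per sample (hypothetical input format).
    """
    if not depths:
        raise ValueError("depths must not be empty")
    return sum(1 for d in depths if d > threshold) / len(depths)


# NEB: 9.5% dark CDS and 2424 dark bases, as stated in the caption.
neb_cds_length = implied_cds_length(2424, 9.5)

# Hypothetical per-sample depths at a single position (not real data).
example_depths = [0, 3, 6, 10, 20]
share_covered = proportion_above(example_depths, threshold=5)
```

Here `proportion_above` corresponds to the Y axis described for panels g and h, computed at one position; a per-gene plot would repeat it across every position in the region.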