Fig. 2 | Genome Biology

From: Genetic–epigenetic interactions in cis: a major focus in the post-GWAS era

Integrative “post-GWAS” mapping of allele-specific marks for identifying disease-associated regulatory sequence variants. Genome-wide association studies (GWAS) typically implicate a haplotype block spanning tens to hundreds of kilobases. Resolution is limited because all single nucleotide polymorphisms (SNPs) in strong linkage disequilibrium (LD) with the index SNP show a similar disease association. Combining post-GWAS modalities that use maps of allele-specific marks can help to localize the causal genes and the underlying regulatory sequences. a The S100A*-ILF2 region exemplifies this approach. The map shows the index SNPs for expression quantitative trait loci (eQTLs), methylation quantitative trait loci (mQTLs), haplotype-dependent allele-specific DNA methylation (hap-ASM), and allele-specific transcription factor binding (ASTF). The suggestive (sub-threshold) GWAS signal for multiple myeloma susceptibility (rs7536700, p = 4 × 10⁻⁶) tags a 95-kb haplotype block, defined from 1000 Genomes data [186] with an algorithm that emphasizes D′ values [187, 188]. The GWAS SNP overlaps no known regulatory element or transcription factor (TF) binding site. Numerous cis-eQTL SNPs in this haplotype block correlate with the expression of several genes within 1 Mb (eQTL-tagged genes indicated in red), so the causal regulatory SNP(s) cannot be identified from eQTL data alone. However, several SNPs in the block are mQTLs, all correlating with methylation at the same CpG site, cg08477332. Fine mapping using targeted bis-seq [49] confirmed a discrete hap-ASM differentially methylated region (DMR; orange) spanning ~1 kb. The hap-ASM index SNP rs9330298 is in strong LD with rs7536700 (D′ = 1), is the closest SNP to the DMR, and is an eQTL correlating with S100A13 expression.
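The caption relies on D′ both to define the 95-kb block and to report LD between rs9330298 and rs7536700 (D′ = 1). As a minimal sketch, Lewontin's D′ for two biallelic SNPs can be computed from phased haplotypes as shown below. The 0/1 allele coding and the toy haplotypes are hypothetical, and this is not the block-definition algorithm of [187, 188].

```python
from collections import Counter


def d_prime(haplotypes: list[tuple[int, int]]) -> float:
    """Return |D'| for two biallelic SNPs from phased haplotypes.

    Each haplotype is (allele at SNP 1, allele at SNP 2), coded 0/1
    (a hypothetical coding chosen for illustration).
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    p_11 = counts[(1, 1)] / n                  # frequency of the 1-1 haplotype
    p1 = sum(h[0] for h in haplotypes) / n     # allele-1 frequency at SNP 1
    q1 = sum(h[1] for h in haplotypes) / n     # allele-1 frequency at SNP 2
    d = p_11 - p1 * q1                         # raw LD coefficient D
    # D_max depends on the sign of D (Lewontin's normalization)
    if d >= 0:
        d_max = min(p1 * (1 - q1), (1 - p1) * q1)
    else:
        d_max = min(p1 * q1, (1 - p1) * (1 - q1))
    if d_max == 0:
        raise ValueError("D' is undefined when either SNP is monomorphic")
    return abs(d) / d_max


# Toy data: only three of the four possible haplotypes are observed,
# which is exactly the situation that gives D' = 1.
print(d_prime([(1, 1), (1, 1), (0, 1), (0, 0)]))  # prints 1.0
```

D′ = 1 indicates no evidence of historical recombination between the two SNPs, even when allele frequencies differ (as in the toy data). This is why a high D′ alone cannot single out the causal variant within a block.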
In addition, this DMR coincides with a CTCF peak that shows allele-specific binding in chromatin immunoprecipitation-sequencing (ChIP-seq) data. This nominates disruption of CTCF binding by rs9330298 as a candidate mechanism underlying susceptibility to multiple myeloma, either through direct effects in B cells or through effects on immune surveillance by T cells. The eQTL and ASTF data are from the Genotype-Tissue Expression project (GTEx) and alleleDB, respectively [47, 180]. RNA-seq data for the GM12878 lymphoblastoid cell line (LCL) were downloaded from ENCODE. The mQTL and hap-ASM data are from [49], and the CTCF ChIP-seq data (GM12878 LCL) are from ENCODE. The dashed line represents a genomic region lacking defined LD structure. b Map showing three-dimensional chromatin interactions in the S100A* gene cluster. The hap-ASM region coincides with a CTCF-mediated chromatin anchor site, as suggested by chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) data (K562 cell line) [122]. This evidence suggests that disruption of the CTCF-binding site by the candidate regulatory SNP (rSNP) rs9330298 might abrogate the formation of one or more chromatin loops. c Bis-seq (closed circles, methylated CpGs; open circles, unmethylated CpGs) confirms that the hap-ASM DMR overlaps a CTCF-binding site (amplicon 2). The lower position weight matrix (PWM) score for allele B of rs9330298 predicts allele-specific disruption of CTCF binding, consistent with the allele-specific binding seen in the ChIP-seq data. Disruption of this CTCF-mediated chromatin anchor site could account for the eQTLs in this region: the S100A* cluster genes would no longer be insulated from the active enhancers of neighboring genes, such as ILF2 and CHTOP, which are expressed at higher levels in blood.
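Panel c compares PWM scores for the two alleles of rs9330298. A minimal sketch of that comparison is shown below, assuming log2-odds scoring against a uniform background with a pseudocount. The 4-position count matrix, the sequences, and the SNP position are all hypothetical; this is not the actual CTCF motif or the flanking sequence of rs9330298.

```python
import math

BASES = "ACGT"

# Hypothetical 4-position count matrix, for illustration only.
# It is NOT the real CTCF motif, which is considerably longer.
TOY_COUNTS = [
    {"A": 0, "C": 18, "G": 1, "T": 1},
    {"A": 1, "C": 1, "G": 17, "T": 1},
    {"A": 0, "C": 19, "G": 0, "T": 1},
    {"A": 2, "C": 2, "G": 14, "T": 2},
]


def pwm_log_odds(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2-odds scores vs. a uniform background."""
    pwm = []
    for col in counts:
        total = sum(col[b] for b in BASES) + len(BASES) * pseudocount
        pwm.append({b: math.log2(((col[b] + pseudocount) / total) / background)
                    for b in BASES})
    return pwm


def score(pwm, seq):
    """Sum per-position log-odds for a sequence of motif length (forward strand only)."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must equal motif length")
    return sum(col[base] for col, base in zip(pwm, seq.upper()))


pwm = pwm_log_odds(TOY_COUNTS)
allele_a, allele_b = "CGCG", "CGCA"  # hypothetical: SNP at the last motif position
delta = score(pwm, allele_a) - score(pwm, allele_b)
print(f"score(A) - score(B) = {delta:.2f}")  # positive: allele B scores lower
```

A fuller analysis would use a curated CTCF matrix, score both strands, and slide the motif across the sequence window containing the SNP. Because the two alleles differ at one position, the score difference reduces to the log-odds difference at that single motif column.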
