Systematic analysis of chromatin interactions at disease associated loci links novel candidate genes to inflammatory bowel disease
© The Author(s). 2016
Received: 8 June 2016
Accepted: 7 November 2016
Published: 30 November 2016
Genome-wide association studies (GWAS) have revealed many susceptibility loci for complex genetic diseases. For most loci, the causal genes have not been identified. Currently, candidate genes are predominantly identified based on their location close to or within identified loci. We have recently shown that 92 of the 163 inflammatory bowel disease (IBD) loci co-localize with non-coding DNA regulatory elements (DREs). Mutations in DREs can contribute to IBD pathogenesis through dysregulation of gene expression. Consequently, genes that are regulated by these 92 DREs should be considered candidate genes. This study uses circular chromosome conformation capture-sequencing (4C-seq) to systematically analyze chromatin interactions at IBD susceptibility loci that localize to regulatory DNA.
Using 4C-seq, we identify genomic regions that physically interact with the 92 DREs found at IBD susceptibility loci. Since the activity of regulatory elements is cell type-specific, 4C-seq was performed in monocytes, lymphocytes, and intestinal epithelial cells. Altogether, we identified 902 novel IBD candidate genes. These include genes specific for IBD subtypes and many noteworthy genes, including ATG9A and IL10RA. We show that expression of many novel candidate genes is genotype-dependent and that these genes are upregulated during intestinal inflammation in IBD. Furthermore, we identify HNF4α as a potential key upstream regulator of IBD candidate genes.
We reveal many novel and relevant IBD candidate genes, pathways, and regulators. Our approach complements classical candidate gene identification, links novel genes to IBD and can be applied to any existing GWAS data.
Keywords: Inflammatory bowel disease; Genetics; Epigenetics; Genome-wide association studies (GWAS); Enhancer elements; Chromatin interactions; DNA regulation; Candidate genes
Inflammatory bowel disease (IBD) is an inflammatory disorder of the gastrointestinal tract with an intermittent, chronic, or progressive character. Studies on the pathogenesis of IBD have elucidated the involvement of a broad range of processes that mainly regulate the interaction between the intestinal mucosa, the immune system, and the microbiota. A role for genetics in the pathogenesis of IBD has been established through twin-based, family-based, and population-based studies. This was followed by a substantial effort to identify the genetic elements involved in IBD pathogenesis, and multiple genome-wide association studies (GWASs) have been performed over the past years [2–5]. In these studies, common genetic variants (single nucleotide polymorphisms (SNPs)) are assayed across the whole genome in search of variants that are significantly over-represented or under-represented in patients compared to healthy controls. Although GWASs have revealed many IBD-associated loci, for most loci the causal genes underlying the associations have not been identified. Furthermore, the majority of IBD-associated SNPs are located in non-coding DNA and therefore cannot be causal in the sense of directly causing amino acid changes at the protein level [2–4, 6–9]. These SNPs are therefore generally thought to be markers for disease-causing variants in nearby genes. This model underlies classical approaches for candidate gene identification, which mainly select genes that share functional relationships and are localized in the vicinity of the identified loci [10, 11]. This has led to the identification of crucial genes and pathways involved in IBD pathogenesis. However, over the past decade it has been established that, besides genes, the human genome contains many other functional elements in its non-protein-coding regions, and these regions can play a role in the pathogenesis of complex diseases.
Among these, many types of DNA regulatory elements (DRE), especially enhancer elements, are involved in establishing spatiotemporal gene expression patterns in a cell type-specific manner. These elements are crucial in the regulation of developmental processes and in maintaining cell type-specific functionality. It is therefore now widely appreciated that part of the GWAS associations is due to sequence variation in DRE, but this information has largely been ignored in candidate gene identification [9, 14–18].
We have recently shown that 92 of 163 IBD GWAS susceptibility loci localize to DRE (identified through the presence of H3K27ac in relevant cell types). DRE are involved in transcription regulation and in establishing cell type-specific expression patterns. The genes that are regulated by the IBD-associated elements are likely to play a role in IBD and can therefore be considered IBD candidate genes. This information has not been used in previous candidate gene approaches, because identifying these genes comes with several hurdles. Since regulatory elements can regulate genes via chromatin–chromatin interactions that span up to 1 Mb [20, 21], these genes cannot be identified based on their linear distance from the regulatory regions. Classical methods for candidate gene identification that take regulatory mechanisms into account have mainly been restricted to computational approaches [14, 16, 22, 23]. So far, a limited number of studies have shown the value of GWAS interpretation based on the physical interactions between regulatory elements and the genes they regulate, as revealed by the three-dimensional (3D) conformation of chromatin in the nucleus. These studies analyzed either single interactions (3C) or many-vs-many interactions (Hi-C) and were performed in colorectal cancer, autoimmune diseases, and multiple other diseases [24–27]. In contrast to these approaches, we make use of circular chromosome conformation capture-sequencing (4C-seq), which captures all interactions of a single viewpoint (one-vs-all), thereby increasing the number of analyzed interactions compared to 3C and increasing the resolution compared to Hi-C. Our study provides the first systematic analysis of chromatin interactions between disease-associated DRE and candidate genes in IBD. We have identified 902 novel IBD candidate genes, including many noteworthy genes such as IL10RA, SMAD5, and ATG9A.
Genes interacting with DRE at IBD associated loci
4C-seq identifies different sets of candidate genes in different cell types
Noteworthy novel candidate genes
ATG9A: ATG9A encodes autophagy-related protein 9A. Autophagy plays an important role in host defense by eliminating pathogens. The ATG-family member ATG16L1 has previously been associated with Crohn’s disease.
BATF: Basic leucine zipper transcription factor ATF-like (BATF) belongs to the activator protein 1 (AP-1) family, which is involved in transcription regulation in all immune cells. Batf-deficient mice do not develop Th17 cells and do not produce IL17. Furthermore, BATF regulates cell type-specific gene expression in Th2 cells, germinal center B cells, and T follicular helper cells.
CD46/CD55: CD46 (also known as MCP) and CD55 (also known as DAF) are regulatory proteins expressed on surface membranes. These proteins protect the host from autologous complement-mediated injury upon activation of the complement cascade. Daf-deficient mice show increased epithelial damage upon induction of colitis, delayed healing, and elevated expression of proinflammatory cytokines.
IL10RA: The IL10 receptor consists of two subunits, IL10RA and IL10RB. Sequence variants in the genes encoding these two subunits are known to cause severe, very early-onset IBD in a monogenic fashion. While GWASs have associated IL10RB with the complex form of IBD, a link with IL10RA had so far been missing.
SMAD5: SMAD5 is a downstream effector in BMP signaling. SMAD5 expression was found to be downregulated in intestinal cells of IBD patients. Furthermore, conditional depletion of Smad5 in mice increases susceptibility to colitis induced by dextran sulfate sodium (DSS).
As expected from their common hematopoietic origin, the two immune cell types share more candidate genes with each other than with DLD-1 cells (Fig. 2b, Additional file 2: Figure S5). With a median enhancer-to-gene distance of 261, 370, and 354 kbp in DLD-1 cells, lymphocytes, and monocytes, respectively, a large proportion of the genes we report are found outside the GWAS susceptibility loci (Fig. 2c). Notably, some of the interactions between IBD loci and candidate genes span more than 5 Mb. For example, rs925255 shows a significant (p = 6.068 × 10⁻⁹) physical interaction with TANK (TRAF family member-associated NF-κB activator), a gene that is located 30 Mb from this locus (Additional file 1: Table S2).
Validation and reproducibility of 4C-seq data
To validate the reproducibility of our data, we prepared a 4C template from lymphocytes from a different donor and performed 4C-seq for the 92 regions on this material. Additional file 2: Figure S4A shows that 91% of the candidate genes identified in the replicate dataset were also identified in the dataset used throughout this study. This demonstrates the reproducibility of the 4C technique not only in technical but also in biological duplicates, in line with previous studies showing that results from biological duplicates in 3C-based methods are highly reproducible. We further assessed reproducibility by intersecting the 4C datasets with Hi-C datasets that were created in CD34+ leukocytes and a lymphoblastoid cell line. This confirmed high concordance: 99% (CD34+) and 87% (lymphoblastoid) of the genes found by Hi-C were also found in our 4C data (Additional file 2: Figure S4B).
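The reproducibility figures above are directional: each reports the fraction of genes from one dataset (the replicate, or a Hi-C dataset) that are recovered in the main 4C dataset, not a symmetric overlap. A minimal Python sketch of that calculation, with a hypothetical helper name and toy gene sets rather than the study's data:

```python
def recovered_fraction(query_genes, reference_genes):
    """Fraction of genes in `query_genes` that also occur in `reference_genes`.

    The measure is directional: swapping the arguments generally gives a
    different number, because the denominator is the size of the query set.
    """
    query = set(query_genes)
    if not query:
        return 0.0
    return len(query & set(reference_genes)) / len(query)


# Toy example: 3 of the 4 "replicate" genes are also in the "main" set.
replicate = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
main = {"GENE_A", "GENE_B", "GENE_C", "GENE_X", "GENE_Y"}
print(recovered_fraction(replicate, main))  # 0.75
print(recovered_fraction(main, replicate))  # 0.6
```

Because the denominator is the query set, a dataset that reports few genes (such as a lower-resolution Hi-C set) can be almost fully recovered in a larger 4C set without the reverse being true.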
Identified candidate genes are actively expressed
We reasoned that genes that are truly regulated by active enhancers in vivo would, on average, be more highly expressed than other genes within the region of the 4C signal. Quantitative examination of expression levels and of histone modifications that mark active enhancers and promoters confirmed that the genes detected by our method are indeed more actively transcribed than all other genes, including genes that were not detected by 4C but lie in the same genomic region (Additional file 2: Figures S6 and S7). These results support that the 4C-seq approach used here detects functional interactions. Furthermore, we assessed “possible” insulator elements (i.e. insulators occupied by CTCF protein) between the 92 DRE and the candidate genes. Interestingly, the majority of interactions bypass several CTCF sites, and numerous interactions skip over more than 50 CTCF-bound sites (Additional file 2: Figure S8). In addition, genes that do not interact with the 4C viewpoint do not seem to have more CTCF sites between the viewpoint and their promoter than the interacting genes do (Additional file 2: Figure S8). This is in line with observations from Hi-C datasets, in which 82% of long-range interactions bypass at least one CTCF site.
Insulator regions have previously been shown to prevent enhancer-gene interactions. We therefore investigated whether CTCF binding could serve as an alternative to the 4C method, by using it to predict the borders of the regions in which our candidate genes were found. We conclude that CTCF binding information cannot be used as an alternative to the 4C-based candidate gene approach presented here.
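Counting the CTCF sites that an interaction bypasses amounts to counting CTCF site coordinates that lie strictly between the viewpoint and the gene's TSS. A minimal sketch, assuming CTCF sites are represented by single sorted coordinates on one chromosome (for example, peak summits); the function names and coordinates are hypothetical:

```python
from bisect import bisect_left, bisect_right
from statistics import median


def ctcf_sites_between(viewpoint, tss, ctcf_sites_sorted):
    """Number of CTCF site coordinates strictly between the viewpoint and the TSS."""
    lo, hi = sorted((viewpoint, tss))
    # Sites > lo start at bisect_right(lo); sites < hi end before bisect_left(hi).
    return max(0, bisect_left(ctcf_sites_sorted, hi) - bisect_right(ctcf_sites_sorted, lo))


def median_bypassed(viewpoint, tss_list, ctcf_sites_sorted):
    """Median number of CTCF sites between the viewpoint and a group of TSSs."""
    return median(ctcf_sites_between(viewpoint, t, ctcf_sites_sorted) for t in tss_list)


ctcf = [100, 200, 300, 400]
print(ctcf_sites_between(150, 350, ctcf))  # 2 (sites at 200 and 300)
print(ctcf_sites_between(350, 150, ctcf))  # 2 (direction does not matter)
```

Calling `median_bypassed` once with the TSSs of interacting genes and once with those of non-interacting genes in the same region gives one simple way to make the comparison described above.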
4C-seq candidate genes have SNP-dependent expression profiles
We hypothesize that the candidate genes we identify contribute to IBD pathogenesis via impaired transcription regulation caused by variants in DRE. To test this hypothesis, we used eQTL analyses to study whether 4C-seq candidate genes show different expression profiles in different genetic backgrounds (i.e. in individuals that carry the associated SNP versus individuals that do not). We performed two different analyses in separate databases. First, we used the GTEx database to test whether our approach is able to detect the eQTLs that are present in the intestine (colon-sigmoid, colon-transverse, terminal ileum) and whole blood. We performed an eQTL look-up of the 92 IBD-associated SNPs in these tissues and found 50 genes with a SNP-dependent expression profile. Interestingly, all 50 genes were identified by our 4C-seq approach (Additional file 3: Table S4). Second, we used another eQTL database (STAGE) to determine which candidate genes have expression levels that depend on the genotype of the interacting SNP in white blood cells. This revealed 10 candidate genes that have an eQTL in the STAGE database. Next, we analyzed all non-interacting genes within 2 Mb of the 4C viewpoint (Additional file 3: Table S4). In contrast to the interacting genes, none of the non-interacting genes showed genotype-dependent expression in the same database. Altogether, these findings support the ability of our method to identify candidate genes whose expression depends on IBD-associated genomic variants.
4C-seq gene set is enriched in genes involved in inflammation in IBD patients
Chromatin interactions reveal IL10RA and ATG9A as novel IBD targets
Furthermore, we identified ATG9A (autophagy-related gene 9A) as a novel candidate gene, as its transcriptional start site physically interacts with an enhancer element in the proximity of rs2382817 in DLD-1 cells and monocytes (p = 7.891 × 10⁻¹³ in monocytes, p = 9.787 × 10⁻¹² in DLD-1 cells, Additional file 2: Figure S9). ATG9A is known to be involved in the generation of autophagosomes. Furthermore, ATG9A has been shown to dampen the innate immune response to microbial dsDNA: ATG9A knockout mice show enhanced expression of IFN-β, IL6, and CXCL10 upon exposure to microbial dsDNA. This gene is also of interest to IBD because the association of other autophagy genes with IBD is well established [6, 43, 44]. For example, patients who are homozygous for the ATG16L1 risk allele show Paneth cell granule abnormalities. Based on the role ATG9A plays in responding to microbial dsDNA and the role ATG16L1 plays in Paneth cell degranulation, it is possible that ATG9A contributes to IBD pathogenesis in monocytes and intestinal epithelial cells via distinct mechanisms.
Pathway analysis shows cell type-specific results
Hepatocyte nuclear factor 4α (HNF4α) is a potential key regulator of the IBD candidate genes
Our study confirms that many genes that are likely dysregulated in IBD are regulated by HNF4α. Furthermore, HNF4α itself was one of our candidate genes, identified through a distal interaction with rs6017342 in intestinal epithelial cells (Additional file 1: Table S2). Upon exposure of intestinal organoids to bacterial lysate, we found that the epithelial response is characterized by a marked upregulation of both the NF-κB pathway and HNF4α (Fig. 6b). The kinetics of HNF4α expression during epithelial responses and the enrichment of HNF4α-regulated genes among the IBD candidate genes suggest that HNF4α is a potential key regulator in IBD.
This study shows that using chromatin interactions for GWAS interpretation reveals many novel and relevant candidate genes for IBD. Specifically, we have intersected data on chromatin interactions, mRNA expression, and H3K27ac occupancy (marking active enhancer elements) to identify IBD candidate genes. By applying 4C-seq to cell types involved in IBD, we revealed 902 novel candidate genes, including multiple noteworthy genes such as SMAD5, IL10RA, and ATG9A. Notably, many novel genes were located outside the associated loci.
Multiple methods can be used to identify significant interactions in 4C-seq datasets, and none of them offers an ideal solution for all interaction ranges (long, short, inter-chromosomal), resolutions, and dynamic ranges of signal [51, 52]. In this study, we selected a method that, in our opinion, provides a good balance between specificity and sensitivity for interactions spanning up to several megabases. To reduce the number of false-positive findings, we chose a stringent cutoff (p ≤ 10⁻⁸).
The identification of functional DRE–gene interactions is further supported by the overlap of the candidate gene sets identified in the different cell types. Intestinal epithelial cells are developmentally and functionally very distinct from monocytes and lymphocytes, which share a hematopoietic origin and are therefore more alike. These differences in developmental background are reflected in the sets of candidate genes identified in the different cell types: lymphocytes and monocytes shared a large part of their candidate genes, whereas intestinal epithelial cells showed a more distinct set (for example, monocytes share 42% and 8% of their candidate genes with lymphocytes and DLD-1 cells, respectively; Fig. 2a and Additional file 2: Figure S5). Although this approach gives a general overview of the contribution of lymphocytes to IBD pathogenesis, it does not allow discrimination between mechanisms in lymphocyte subsets. Analyzing a pool of cell types also decreases the sensitivity for detecting candidate genes that are specific to a subset of cells. In future approaches, 4C datasets for specific lymphocyte subtypes could therefore provide more insight into the contribution of each of these cell types to IBD pathogenesis. Furthermore, since UC is limited to the colon whereas CD can occur throughout the intestine, creating 4C datasets from epithelium derived from different parts of the intestine (i.e. duodenum, jejunum, ileum, and colon) might help to discriminate between UC- and CD-specific pathogenic processes.
We examined the presence of eQTLs among the IBD-associated SNPs and the 4C-seq candidate genes. These analyses confirm that our approach is capable of detecting every candidate gene that was found to have SNP-dependent expression levels in tissues relevant for IBD. As expected from the two eQTL databases that were used, not all 4C-seq candidate genes were found to have a SNP-dependent expression pattern. This is (at least in part) due to the highly context-specific nature of SNP-dependent differential expression of many eQTLs. While eQTLs are usually identified in one specific cell state, many SNP-dependent expression patterns are only present under specific conditions (e.g. developmental stages, presence of activating stimuli), resulting in a high false-negative rate of eQTL detection. For example, many 4C-seq candidate genes might be differentially expressed between genotypes in the presence of pro-inflammatory stimuli. Our findings both confirm that our assay can detect genes with a SNP-dependent expression profile and underline the need for chromatin-based techniques to identify the genes that are missed by eQTL analyses.
By using GSEA, we show that the 4C-seq candidate genes are highly enriched among genes that are upregulated in inflamed intestinal biopsies from IBD patients. Since the GSEA compares inflamed versus non-inflamed intestinal tissue within patients, we cannot determine the baseline difference in expression between patients and healthy controls. Although upregulation of a gene upon inflammation does not show a causal relation between the (dys)regulation of that gene and the IBD phenotype, it does support the involvement of the novel 4C-seq candidate genes in IBD.
We have shown that pathway-enrichment and upstream regulator-enrichment algorithms can be used to interpret and prioritize this large candidate gene dataset. Interpretation of the 4C-seq data could be further optimized by using the data quantitatively (i.e. correlating peak strength instead of applying a cutoff value for peak calling). However, as with all approaches for candidate gene identification, further validation is needed to identify the causal genes for IBD. A first step toward such confirmation would be to show that candidate gene expression becomes dysregulated when enhancer function is altered in vivo.
We profiled chromatin interactions in primary peripheral immune cells from healthy individuals and in an intestinal epithelium-derived cell line, to map the genes that physically interact with the IBD susceptibility loci under normal conditions. As the effects of common variants in regulatory regions are relatively mild, it is improbable that a single common variant present in an IBD patient will ablate or create a whole regulatory region and its 3D interactions. We therefore do not expect that identifying candidate genes in cells derived from patients would reveal a substantial number of additional interactions. Rather, these variants are expected to cause dysregulation of the candidate genes and thereby contribute to the disease, possibly under very specific conditions, e.g. during certain stages of development or in the presence of specific stimuli [16, 53].
Our study provides a proof of principle for the use of chromatin–chromatin interactions for the identification of candidate genes. The approach presented here complements, but does not replace, previously reported approaches for candidate gene identification. Candidate gene prioritization models for GWASs currently use multiple types of information, for example protein–protein interactions, expression patterns, and gene ontology. We propose that these algorithms should take chromatin interactions into account to optimize gene prioritization.
We used 4C-seq to study chromatin interactions at IBD-associated GWAS loci in cell types that are involved in the pathogenesis of IBD, and identified 902 novel candidate genes, including multiple noteworthy genes such as SMAD5, IL10RA, and ATG9A.
We conclude that 4C-seq and other 3C-derived methods can be applied to candidate gene identification in diseases with a complex genetic background and complement the classical candidate gene identification approaches.
DLD-1 cells were cultured in RPMI-1640 with 10% FCS and standard supplements. Cells were harvested for 4C template preparation by trypsinization at 60–80% confluence.
Monocyte and peripheral blood lymphocyte (PBL) isolation
Peripheral blood was collected from two healthy donors (one for monocyte isolation, one for PBL isolation) in sodium-heparin tubes. Peripheral blood mononuclear cells (PBMCs) were isolated by Ficoll-Paque gradient centrifugation. PBMCs were incubated with magnetic CD14+ microbeads (Miltenyi, order no. 130-050-201) according to the manufacturer’s instructions. Cells were then magnetically separated with the AutoMACS™ Separator; the negative fraction consisted of PBLs and the positive fraction of monocytes.
Circular chromosome conformation capture sequencing (4C-seq)
For each cell type, one 4C template was prepared. 4C chromatin preparation, primer design, and library preparation were described previously. A total of 10 × 10⁶ cells per cell type (monocytes, PBLs, and DLD-1) were used for chromatin preparation. Primer sequences are listed in Additional file 6: Table S1. The library preparation protocol was adapted to make it compatible with the large number of viewpoints. Details can be found in Additional file 2: Supplementary data, Methods.
Libraries were sequenced using the HiSeq2500 platform (Illumina), producing single end reads of 50 bp.
The raw sequencing reads were de-multiplexed based on viewpoint-specific primer sequences (the datasets are accessible through GEO Series accession number GSE89441). Reads were then trimmed to 16 bases and mapped to an in silico generated library of fragends (fragment ends) neighboring all DpnII sites in the human genome (NCBI37/hg19), using custom Perl scripts. No mismatches were allowed during mapping, and only reads mapping to a single possible fragend were used for further analysis. To create the 4C signal tracks in the UCSC browser, we generated BED files containing, for each mappable fragend, its coordinates and its covered/non-covered (1 or 0) status. Tracks were visualized in the UCSC browser with the following settings: windowing function: mean; smoothing window: 12 pixels.
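The mapping step above can be illustrated with a small Python sketch that builds a library of 16-mer fragend sequences next to DpnII (GATC) sites, keeps only those that identify a single fragend, maps reads by exact match, and reports binary coverage as BED-like lines. It is a simplified illustration, not the study's Perl scripts: reads are assumed to be already de-multiplexed and stripped of the viewpoint primer so that they begin at the captured fragend, the definition of a fragend sequence below is an assumption, and all function names are hypothetical.

```python
from collections import defaultdict

DPNII_SITE = "GATC"  # DpnII recognition sequence (palindromic)
READ_LEN = 16        # reads trimmed to 16 bases
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_fragend_library(chrom, sequence):
    """Map 16-mers flanking each GATC site to a fragend, keeping unique ones only.

    Assumption: the fragend to the right of a site is read as the 16 bp starting
    at GATC; the fragend to the left as the reverse complement of the 16 bp
    ending at GATC (so both start with GATC).
    """
    hits = defaultdict(list)
    pos = sequence.find(DPNII_SITE)
    while pos != -1:
        end = pos + len(DPNII_SITE)
        if pos + READ_LEN <= len(sequence):
            hits[sequence[pos:pos + READ_LEN]].append((chrom, pos, "+"))
        if end - READ_LEN >= 0:
            hits[revcomp(sequence[end - READ_LEN:end])].append((chrom, end, "-"))
        pos = sequence.find(DPNII_SITE, pos + 1)
    # Drop 16-mers that occur at more than one fragend (non-unique mapping).
    return {kmer: locs[0] for kmer, locs in hits.items() if len(locs) == 1}


def fragend_coverage(reads, library):
    """Binary covered (1) / not covered (0) status for every mappable fragend.

    Only exact matches of the first 16 bases count (no mismatches allowed).
    """
    covered = {library[r[:READ_LEN]] for r in reads if r[:READ_LEN] in library}
    return {fragend: int(fragend in covered) for fragend in library.values()}


def to_bed_lines(coverage):
    """One BED-like line per fragend: chrom, start, end, covered status."""
    return [f"{c}\t{p}\t{p + 1}\t{v}" for (c, p, _strand), v in sorted(coverage.items())]
```

Keeping only 16-mers that occur at a single fragend mirrors the rule that reads mapping to more than one possible fragend are discarded; the binary status is what later feeds the windowed significance test.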
Identification of the interacting genes
First, we calculated the number of covered fragends within a running window of k fragends throughout the whole chromosome on which the viewpoint is located. This binary approach (i.e. a fragend is either covered or not covered in the dataset) was chosen to overcome biases caused by polymerase chain reaction (PCR) efficiency; however, it decreases the dynamic range of 4C-seq and may overestimate the strength of distal interactions relative to proximal interactions. k was set separately for every viewpoint so that a window contains on average 20 covered fragends in the area around the viewpoint (±100 kbp); e.g. when 100 out of 150 fragends around the viewpoint were covered, the window size was set to 30 fragends. Next, we compared the number of covered fragends in each running window to the random distribution. Windows with a significantly higher number of covered fragends than expected from the random distribution (p < 10⁻⁸, based on the binomial cumulative distribution function; R pbinom) were considered a significant 4C signal. The following criteria were defined for the identification of candidate genes: (1) the transcriptional start site (TSS) co-localizes with a significant 4C-seq signal (p < 10⁻⁸) within 5 kbp; (2) the susceptibility variant, or another variant in linkage disequilibrium (LD) with it, co-localizes with the H3K27ac signal (which marks active regulatory elements) in the cell type from which the 4C signal was obtained (68 loci in monocytes, 73 in lymphocytes, and 52 in intestinal epithelial cells); and (3) the gene is expressed (log2(RPKM) > −0.5) in the assayed cell type (Additional file 1: Table S2). Datasets used for expression analysis are listed in Additional file 7: Table S3. Quality measures for the 4C library preparation and sequencing can be found in Additional file 2: Supplementary data, Figures S1–S3.
The use of single 4C templates per cell type was validated in a biological duplicate of the lymphocyte 4C template derived from a different donor (Additional file 2: Figure S4A), and the reproducibility in other chromatin interaction datasets was established by intersecting our findings with two Hi-C datasets (Additional file 2: Figure S4B and Additional file 7: Table S3).
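The windowing and gene-selection steps above can be sketched in Python. This is a minimal illustration, not the custom scripts used in the study: it assumes that the "random distribution" is a binomial whose success probability equals the fraction of covered fragends on the viewpoint chromosome, and it computes the upper tail P(X ≥ c) directly (in R this corresponds to `pbinom(c - 1, k, p, lower.tail = FALSE)`). All function names are hypothetical.

```python
from math import comb


def window_size(covered_near_vp, total_near_vp, target=20):
    """Window length k (in fragends) giving ~`target` covered fragends near the viewpoint.

    E.g. 100 of 150 fragends covered within +/-100 kbp -> k = 20 * 150 / 100 = 30.
    """
    return max(1, round(target * total_near_vp / covered_near_vp))


def binom_upper_tail(c, k, p):
    """P(X >= c) for X ~ Binomial(k, p), summed term by term."""
    return sum(comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(c, k + 1))


def significant_windows(covered, k, p_background, alpha=1e-8):
    """Start indices of running windows of k fragends whose number of covered
    fragends (binary 0/1 values) is improbably high under the background."""
    hits = []
    count = sum(covered[:k])
    for start in range(len(covered) - k + 1):
        if start > 0:  # slide the window one fragend to the right
            count += covered[start + k - 1] - covered[start - 1]
        if binom_upper_tail(count, k, p_background) < alpha:
            hits.append(start)
    return hits


def is_candidate(tss, signal_intervals, variant_in_h3k27ac, log2_rpkm,
                 max_dist=5_000, min_log2_rpkm=-0.5):
    """The three candidate-gene criteria, for one gene in one cell type.

    signal_intervals: (start, end) genomic coordinates of significant 4C windows.
    """
    near_signal = any(s - max_dist <= tss <= e + max_dist for s, e in signal_intervals)
    return near_signal and variant_in_h3k27ac and log2_rpkm > min_log2_rpkm
```

Summing the tail term by term, rather than computing one minus the cumulative probability, avoids losing p-values near the 10⁻⁸ cutoff to floating-point cancellation.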
TSS occupancy by H3K27ac and H3K4me3
The publicly available H3K27ac and H3K4me3 occupancy datasets were accessed from the UCSC/ENCODE browser (http://genome.ucsc.edu/ENCODE/). Datasets are listed in Additional file 7: Table S3. Occupancy within ±2 kbp of each TSS was calculated using custom Perl scripts and CisGenome functions.
A manual look-up was performed for expression quantitative trait loci (eQTL) in the Genotype-Tissue Expression (GTEx) database (access dates: eQTL genes, 05-2016; p values, 09-2016). eQTL genes for each of the 92 IBD-associated SNPs were looked up in four different tissues: colon-transverse, colon-sigmoid, small intestine-terminal ileum, and whole blood. Next, for each gene for which an IBD-associated SNP turned out to be an eQTL, its presence among the 4C-seq identified genes was evaluated (Additional file 3: Table S4). All transcripts in the GTEx database that were not included in the gene annotation used for the 4C-seq analysis (UCSC genes 2009) were removed from the analysis.
eQTLs were analyzed using the Stockholm Atherosclerosis Gene Expression (STAGE) dataset (Additional file 2: Supplementary data, Methods). Loci identified in IBD GWASs were matched with imputed and genotyped SNPs, which were selected for eQTL discovery. We compared the number of eQTLs present in “SNP-candidate gene” pairs and “SNP-control gene” pairs. Control genes are genes within the same locus that do not interact with the IBD-associated locus. An empirical false discovery rate was estimated for each eQTL gene by shuffling patient IDs on the genotype data 1000 times, as described previously.
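As an illustration of a TSS-centred occupancy count, the sketch below counts ChIP-seq read positions within ±2 kbp of each TSS using binary search on sorted coordinates. It is a hypothetical stand-in for the Perl/CisGenome pipeline used in the study, assuming single-chromosome, strand-agnostic coordinates and raw counts; the function name is made up for this example.

```python
from bisect import bisect_left, bisect_right


def tss_occupancy(tss_positions, read_positions_sorted, flank=2_000):
    """Count ChIP-seq read positions within +/- `flank` bp of each TSS.

    read_positions_sorted: sorted read (or fragment-centre) coordinates on the
    same chromosome as the TSSs.
    """
    return {
        tss: bisect_right(read_positions_sorted, tss + flank)
        - bisect_left(read_positions_sorted, tss - flank)
        for tss in tss_positions
    }


reads = [500, 1500, 2500, 9000]
print(tss_occupancy([2000, 10000, 20000], reads))  # {2000: 3, 10000: 1, 20000: 0}
```

Comparing occupancy between datasets would in practice also require normalizing for sequencing depth, for example as reads per million.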
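A permutation-based empirical FDR of the kind described above can be sketched as follows. The source does not specify the statistic or the FDR formula, so these are assumptions: the association statistic is the absolute Pearson correlation between allele dosage and expression, the FDR at a threshold is the mean number of permuted statistics reaching it divided by the number of observed statistics reaching it, and all names are hypothetical. Shuffling the patient order of the genotype table as a whole keeps the LD between SNPs intact while breaking the link between genotype and expression.

```python
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def empirical_fdr(genotypes, expression, pairs, n_perm=1000, seed=0):
    """Permutation-based FDR for SNP-gene association statistics.

    genotypes:  dict SNP -> allele dosages (0/1/2), one value per patient
    expression: dict gene -> expression values, in the same patient order
    pairs:      (snp, gene) pairs to test
    """
    def stats(geno):
        return {pr: abs(pearson_r(geno[pr[0]], expression[pr[1]])) for pr in pairs}

    observed = stats(genotypes)
    rng = random.Random(seed)
    n_patients = len(next(iter(genotypes.values())))
    null = []
    for _ in range(n_perm):
        order = list(range(n_patients))
        rng.shuffle(order)  # shuffle patient IDs on the genotype data only
        permuted = {snp: [g[i] for i in order] for snp, g in genotypes.items()}
        null.extend(stats(permuted).values())

    fdr = {}
    for pr, s in observed.items():
        expected_false = sum(v >= s for v in null) / n_perm
        called = sum(v >= s for v in observed.values())  # always >= 1 (itself)
        fdr[pr] = min(1.0, expected_false / called)
    return fdr
```

The same structure would apply to comparing "SNP-candidate gene" and "SNP-control gene" pairs: both groups are simply entries in `pairs`.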
Gene set enrichment analysis (GSEA)
GSEA was performed using gene expression datasets from intestinal biopsies obtained from ulcerative colitis patients (datasets available at GSE11223). The “normal uninflamed sigmoid colon” and “UC inflamed sigmoid colon” samples were used, and fold changes in expression were calculated using the GEO2R tool with default settings. Significance of the enrichment was calculated based on 1000 permutations.
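The enrichment statistic behind GSEA is a weighted running sum over a ranked gene list. The sketch below computes it for a gene list ranked by fold change and estimates a nominal p-value from random gene sets of equal size (gene-set permutation, as used when only a preranked list is available). Whether the original analysis permuted gene sets or phenotype labels is not stated, so this choice, the weight exponent of 1, the +1 pseudocount, and the function names are assumptions.

```python
import random


def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running-sum enrichment score of `gene_set` in a ranked list.

    ranked_genes: genes sorted by decreasing score (e.g. log fold change)
    scores:       the corresponding scores, in the same order
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_hits = sum(in_set)
    if n_hits == 0 or n_hits == len(ranked_genes):
        raise ValueError("gene set must contain some, but not all, ranked genes")
    hit_norm = sum(abs(s) ** weight for s, hit in zip(scores, in_set) if hit)
    miss_step = 1.0 / (len(ranked_genes) - n_hits)
    running, es = 0.0, 0.0
    for s, hit in zip(scores, in_set):
        running += abs(s) ** weight / hit_norm if hit else -miss_step
        if abs(running) > abs(es):
            es = running  # keep the maximum deviation from zero
    return es


def gsea_preranked(ranked_genes, scores, gene_set, n_perm=1000, seed=0):
    """Observed ES and a nominal p-value from random gene sets of equal size."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, scores, gene_set)
    size = sum(g in gene_set for g in ranked_genes)
    null = [enrichment_score(ranked_genes, scores, set(rng.sample(ranked_genes, size)))
            for _ in range(n_perm)]
    same_sign = [e for e in null if (e >= 0) == (observed >= 0)]
    extreme = sum(abs(e) >= abs(observed) for e in same_sign)
    return observed, (extreme + 1) / (len(same_sign) + 1)
```

A positive score means the gene set is concentrated among genes upregulated in inflamed tissue; comparing only against null scores of the same sign follows the usual GSEA convention.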
Signaling pathway analysis
The IL10 signaling pathway components were retrieved from Ingenuity Pathway Analysis (IPA®, QIAGEN Redwood City). Genes upregulated upon IL10 signaling (target genes) and genes involved in the bilirubin cascade were removed before further analysis. The interactions between the members of the IL10 signaling pathway were visualized using the GeneMANIA tool (http://www.genemania.org/).
The general pathway analysis was performed with the Ingenuity Pathway Analysis software (IPA®, QIAGEN Redwood City), based on the candidate genes from the three cell types, separately.
Upstream regulators that are enriched among the regulators of the candidate genes in our datasets were identified with the Ingenuity Pathway Analysis software (IPA®, QIAGEN Redwood City), based on the candidate genes from the three cell types separately. Ingenuity’s Upstream Regulator Analysis algorithm predicts upstream regulators of a gene set based on literature compiled in the Ingenuity Knowledge Base.
Tracks used for rs630923 and rs2382817
All tracks were accessed from the UCSC/ENCODE browser (http://genome.ucsc.edu/ENCODE/). Datasets are listed in Additional file 7: Table S3. Haploblock structures were visualized with Haploview; pairwise LD statistics of variants up to 500 kbp apart were used in the analyses (Fig. 4, Additional file 2: Supplementary data, Figure S9).
Colon biopsies were obtained by colonoscopy. The biopsies were macroscopically and pathologically normal. Crypt isolation and culture of human intestinal cells from biopsies have been described previously [59, 60]. In summary, human organoids were cultured in expansion medium (EM) containing RSPO1, noggin, EGF, A83-01, nicotinamide, SB202190, and WNT3A. The medium was changed every 2–3 days and organoids were passaged 1:4 every 9 days.
Five to seven days after passaging, the organoids were exposed to 10 μL sterilized E. coli lysate (control organoids were not stimulated). After 6 h of exposure, the organoids were harvested and RNA was extracted using TRIzol LS (Ambion™). Complementary DNA was synthesized by reverse transcription (iScript, Bio-Rad). Messenger RNA (mRNA) abundance was determined by real-time PCR using primer pairs that target HNF4α and NFKB1 (Additional file 6: Table S1) with the SYBR Green method (Bio-Rad). ACTIN mRNA abundance was used to normalize the data.
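Normalizing qPCR data to a reference gene such as ACTIN is commonly done with the comparative Ct (2^−ΔΔCt) method. The source does not state which calculation was used, so the sketch below is an assumption, with a hypothetical function name and made-up Ct values; it also assumes roughly equal, near-100% amplification efficiencies for the target and reference primers.

```python
def fold_change_ddct(ct_target_treated, ct_ref_treated, ct_target_control, ct_ref_control):
    """Relative expression (2^-ddCt) of a target gene, normalized to a reference gene.

    Assumes ~100% amplification efficiency for both primer pairs.
    """
    d_ct_treated = ct_target_treated - ct_ref_treated   # e.g. HNF4A vs ACTIN, lysate-exposed
    d_ct_control = ct_target_control - ct_ref_control   # same, unstimulated organoids
    dd_ct = d_ct_treated - d_ct_control
    return 2 ** (-dd_ct)


# Hypothetical Ct values: relative to ACTIN, the target crosses the threshold
# 2 cycles earlier after stimulation, i.e. a 4-fold increase.
print(fold_change_ddct(22.0, 18.0, 24.0, 18.0))  # 4.0
```

If primer efficiencies differ markedly, an efficiency-corrected model would be more appropriate than the plain 2^−ΔΔCt formula.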
Abbreviations
- 4C-seq: circular chromosome conformation capture-sequencing
- ATG9A: autophagy related 9A
- CD55: complement decay-accelerating factor
- DAF: decay-accelerating factor
- DLD-1 cells: D.L. Dexter-1 cells
- DRE: DNA regulatory element
- E. coli: Escherichia coli
- eQTL: expression quantitative trait loci
- FCS: fetal calf serum
- GWAS: genome-wide association study
- H3K27ac: acetylation of histone H3 at lysine 27
- H3K4me3: trimethylation of histone H3 at lysine 4
- HNF4α: hepatocyte nuclear factor 4 alpha
- IKBKE: inhibitor of nuclear factor kappa-B kinase subunit epsilon
- IL10RA: interleukin 10 receptor subunit alpha
- IL10RB: interleukin 10 receptor subunit beta
- kbp: kilobase pairs
- LPMC: lamina propria mononuclear cells
- MAP3K7: mitogen-activated protein kinase kinase kinase 7
- Mbp: megabase pairs
- MCP: membrane cofactor protein
- NF-κB: nuclear factor kappa B
- PBL: peripheral blood lymphocytes
- PBMC: peripheral blood mononuclear cells
- PCR: polymerase chain reaction
- PIAS1: protein inhibitor of activated STAT 1
- RPKM: reads per kilobase of exon per million reads mapped
- RPMI medium: Roswell Park Memorial Institute medium
- SMAD: named after the homologous genes Mothers Against Decapentaplegic (MAD) and Small Body Size protein (SMA) in Drosophila and C. elegans, respectively
- SNP: single nucleotide polymorphism
- STAT: signal transducer and activator of transcription
- TANK: TRAF family member-associated NFKB activator
- TGFB1: transforming growth factor beta-1
- Th17 cells: T-helper 17 cells
- Th2 cells: T-helper 2 cells
- TNF: tumor necrosis factor
- TSS: transcriptional start site
- UCSC: University of California, Santa Cruz
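RPKM, listed above, normalizes a gene’s read count by both its exonic length and the sequencing depth. A minimal sketch of the standard definition follows. The function name and the example counts are hypothetical.

```python
def rpkm(gene_reads: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of exon per million mapped reads.

    RPKM = gene_reads / (exon_length_bp / 1e3) / (total_mapped_reads / 1e6)
         = gene_reads * 1e9 / (exon_length_bp * total_mapped_reads)
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and library size must be positive")
    return gene_reads * 1e9 / (exon_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads over 2,000 bp of exons in a 10-million-read library.
print(rpkm(500, 2_000, 10_000_000))  # 25.0
```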
Acknowledgements
We thank Professor J. Cho and Dr. S. Middendorp for critically reading the manuscript and S. Cardoso for helping with the isolation of the immune cells.
Funding
Claartje A. Meddens is supported by the Alexandre Suerman Stipendium (UMC Utrecht). Magdalena Harakalova is supported by NIH R01 grant LM010098. Folkert W. Asselbergs is supported by a Dekker scholarship (Junior Staff Member 2014 T001) from the Netherlands Heart Foundation and by the UCL Hospitals NIHR Biomedical Research Centre. Michal Mokry is supported by the OZF/2012 WKZ fund and by the Broad Medical Research Program at CCFA (ID: 368408).
Availability of data and material
The 4C-seq data discussed in this publication have been deposited in NCBI’s Gene Expression Omnibus and are accessible through GEO Series accession number GSE89441. All other publicly available datasets are listed in Additional file 4: Table S3.
Authors’ contributions
CAM: Study concept and design; acquisition of data; analysis and interpretation of data; drafting of the manuscript. MH: Drafting of the manuscript; acquisition of data; critical revision of the manuscript for important intellectual content. NAMD: Acquisition of data. HH: Acquisition of data. HFA: Data analysis. EPJGC: Critical revision of the manuscript for important intellectual content. JLMB: Data analysis. FWA: Material support; study supervision; critical revision of the manuscript for important intellectual content. EESN: Critical revision of the manuscript for important intellectual content; study supervision. MM: Study concept and design; analysis and interpretation of data; drafting of the manuscript; study supervision. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
Biopsies were obtained during ileo-colonoscopies performed as part of standard diagnostic procedures. Approval for the use of human material in this study was obtained from the Ethics Committee (Medisch Ethische Toetsings Commissie, METC) of the University Medical Center Utrecht (www.umcutrecht.nl/METC), protocol number 10/402.
Blood was obtained from the Mini Donor Dienst (MDD) at the UMCU. The MDD is approved by the Ethics Committee of the UMCU, protocol number 07/125.
All patients have given informed consent to participate in this study. All experimental procedures in this study comply with the Declaration of Helsinki.
Open Access
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Kaser A, Zeissig S, Blumberg RS. Inflammatory bowel disease. Annu Rev Immunol. 2010;28:573–621.
- Franke A, McGovern DP, Barrett JC, Wang K, Radford-Smith GL, Ahmad T, et al. Genome-wide meta-analysis increases to 71 the number of confirmed Crohn’s disease susceptibility loci. Nat Genet. 2010;42:1118–25.
- Jostins L, Ripke S, Weersma RK, Duerr RH, McGovern DP, Hui KY, et al. Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease. Nature. 2012;491:119–24.
- Anderson CA, Boucher G, Lees CW, Franke A, D’Amato M, Taylor KD, et al. Meta-analysis identifies 29 additional ulcerative colitis risk loci, increasing the number of confirmed associations to 47. Nat Genet. 2011;43:246–52.
- Liu JZ, van Sommeren S, Huang H, Ng SC, Alberts R, Takahashi A, et al. Association analyses identify 38 susceptibility loci for inflammatory bowel disease and highlight shared genetic risk across populations. Nat Genet. 2015;47:979–89.
- Rioux JD, Xavier RJ, Taylor KD, Silverberg MS, Goyette P, Huett A, et al. Genome-wide association study identifies new susceptibility loci for Crohn disease and implicates autophagy in disease pathogenesis. Nat Genet. 2007;39:596–604.
- Libioulle C, Louis E, Hansoul S, Sandor C, Farnir F, Franchimont D, et al. Novel Crohn disease locus identified by genome-wide association maps to a gene desert on 5p13.1 and modulates expression of PTGER4. PLoS Genet. 2007;3:e58.
- Duerr RH, Taylor KD, Brant SR, Rioux JD, Silverberg MS, Daly MJ, et al. A genome-wide association study identifies IL23R as an inflammatory bowel disease gene. Science. 2006;314:1461–3.
- Mokry M, Middendorp S, Wiegerinck CL, Witte M, Teunissen H, Meddens CA, et al. Many inflammatory bowel disease risk loci include regions that regulate gene expression in immune cells and the intestinal epithelium. Gastroenterology. 2014;146:1040–7.
- Raychaudhuri S, Plenge RM, Rossin EJ, Ng AC, International Schizophrenia Consortium, Purcell SM, et al. Identifying relationships among genomic disease regions: predicting genes at pathogenic SNP associations and rare deletions. PLoS Genet. 2009;5:e1000534.
- Wang K, Li M, Bucan M. Pathway-based approaches for analysis of genomewide association studies. Am J Hum Genet. 2007;81:1278–83.
- Rivas MA, Beaudoin M, Gardet A, Stevens C, Sharma Y, Zhang CK, et al. Deep resequencing of GWAS loci identifies independent rare variants associated with inflammatory bowel disease. Nat Genet. 2011;43:1066–73.
- Shlyueva D, Stampfel G, Stark A. Transcriptional enhancers: from properties to genome-wide predictions. Nat Rev Genet. 2014;15:272–86.
- Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, Wang H, et al. Systematic localization of common disease-associated variation in regulatory DNA. Science. 2012;337(6099):1190–5.
- Kleinjan DJ, Coutinho P. Cis-ruption mechanisms: disruption of cis-regulatory control as a cause of human genetic disease. Brief Funct Genomic Proteomic. 2009;8:317–32.
- Schaub MA, Boyle AP, Kundaje A, Frazer KA. Linking disease associations with regulatory information in the human genome. Genome Res. 2012;22(9):1748–59.
- McVicker G, van de Geijn B, Degner JF, Cain CE, Banovich NE, Raj A, et al. Identification of genetic variants that affect histone modifications in human cells. Science. 2013;342:747–9.
- Kilpinen H, Waszak SM, Gschwind AR, Raghav SK, Witwicki RM, Orioli A, et al. Coordinated effects of sequence variation on DNA binding, chromatin structure, and transcription. Science. 2013;342:744–7.
- Heintzman ND, Hon GC, Hawkins RD, Kheradpour P, Stark A, Harp LF, et al. Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature. 2009;459:108–12.
- de Laat W, Klous P, Kooren J, Noordermeer D, Palstra RJ, Simonis M, et al. Three-dimensional organization of gene expression in erythroid cells. Curr Top Dev Biol. 2008;82:117–39.
- Hughes JR, Roberts N, McGowan S, Hay D, Giannoulatou E, Lynch M, et al. Analysis of hundreds of cis-regulatory landscapes at high resolution in a single, high-throughput experiment. Nat Genet. 2014;46:205–12.
- Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E, et al. The accessible chromatin landscape of the human genome. Nature. 2012;489:75–82.
- Nica AC, Dermitzakis ET. Expression quantitative trait loci: present and future. Philos Trans R Soc Lond B Biol Sci. 2013;368:20120362.
- Wright JB, Brown SJ, Cole MD. Upregulation of c-MYC in cis through a large chromatin loop linked to a cancer risk-associated single-nucleotide polymorphism in colorectal cancer cells. Mol Cell Biol. 2010;30(6):1411–20.
- Mifsud B, Tavares-Cadete F, Young AN, Sugar R, Schoenfelder S, Ferreira L, et al. Mapping long-range promoter contacts in human cells with high-resolution capture Hi-C. Nat Genet. 2015;47:598–606.
- Jäger R, Migliorini G, Henrion M, Kandaswamy R, Speedy HE, Heindl A, et al. Capture Hi-C identifies the chromatin interactome of colorectal cancer risk loci. Nat Commun. 2015;6:6178.
- Martin P, McGovern A, Orozco G, Duffus K, Yarwood A, Schoenfelder S, et al. Capture Hi-C reveals novel candidate genes and complex long-range interactions with related autoimmune risk loci. Nat Commun. 2015;6:10069.
- Sladek FM, Zhong W, Lai E, Darnell JE. Liver-enriched transcription factor HNF-4 is a novel member of the steroid hormone receptor superfamily. Genes Dev. 1990;4:2353–65.
- UK IBD Genetics Consortium, Barrett JC, Lee JC, Lees CW, Prescott NJ, Anderson CA, et al. Genome-wide association study of ulcerative colitis identifies three new susceptibility loci, including the HNF4A region. Nat Genet. 2009;41:1330–4.
- Dekkers JF, Wiegerinck CL, de Jonge HR, Bronsveld I, Janssens HM, de Winter-de Groot KM, et al. A functional CFTR assay using primary cystic fibrosis intestinal organoids. Nat Med. 2013;19:939–45.
- Vernot B, Stergachis AB, Maurano MT, Vierstra J, Neph S, Thurman RE, et al. Personal and population genomics of human regulatory variation. Genome Res. 2012;22:1689–97.
- Chahar S, Gandhi V, Yu S, Desai K, Cowper-Sal lari R, Kim Y, et al. Chromatin profiling reveals regulatory network shifts and a protective role for hepatocyte nuclear factor 4α during colitis. Mol Cell Biol. 2014;34:3291–304.
- Saitoh T, Akira S. Regulation of innate immune responses by autophagy-related proteins. J Cell Biol. 2010;189:925–35.
- Murphy TL, Tussiwand R, Murphy KM. Specificity through cooperation: BATF-IRF interactions control immune-regulatory networks. Nat Rev Immunol. 2013;13:499–509.
- Darsigny M, Babeu JP, Dupuis AA, Furth EE, Seidman EG, Levy E, et al. Loss of hepatocyte-nuclear-factor-4alpha affects colonic ion transport and causes chronic inflammation resembling inflammatory bowel disease in mice. PLoS One. 2009;4:e7609.
- de Wit E, Vos ES, Holwerda SJ, Valdes-Quezada C, Verstegen MJ, Teunissen H, et al. CTCF binding polarity determines chromatin looping. Mol Cell. 2015;60:676–84.
- Raviram R, Rocha PP, Muller CL, Miraldi ER, Badri S, Fu Y, et al. 4C-ker: a method to reproducibly identify genome-wide interactions captured by 4C-Seq experiments. PLoS Comput Biol. 2016;12:e1004780.
- Fairfax BP, Humburg P, Makino S, Naranbhai V, Wong D, Lau E, et al. Innate immune activity conditions the effect of regulatory variants upon monocyte gene expression. Science. 2014;343:1246949.
- van de Werken HJG, de Vree PJ, Splinter JE, Holwerda SJ, Klous P, de Wit E, et al. 4C technology: protocols and data analysis. Methods Enzymol. 2012;513:89–112.
- Ji H, Jiang H, Ma W, Johnson DS, Myers RM, Wong WH. An integrated software system for analyzing ChIP-chip and ChIP-seq data. Nat Biotechnol. 2008;26:1293–300.
- GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat Genet. 2013;45:580–5.
- Hägg S, Skogsberg J, Lundstrom J, Noori P, Nilsson R, Zhong H, et al. Multi-organ expression profiling uncovers a gene module in coronary artery disease involving transendothelial migration of leukocytes and LIM domain binding 2: the Stockholm Atherosclerosis Gene Expression (STAGE) study. PLoS Genet. 2009;5:e1000754.
- Foroughi Asl H, Talukdar HA, Kindt AS, Jain RK, Ermel R, Ruusalepp A, et al. Expression quantitative trait Loci acting across multiple tissues are enriched in inherited risk for coronary artery disease. Circ Cardiovasc Genet. 2015;8:305–15.
- Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Elbert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102:15545–50.
- Noble CL, Abbas AR, Cornelius J, Lees CW, Ho GT, Toy K, et al. Regional variation in gene expression in the healthy colon is dysregulated in ulcerative colitis. Gut. 2008;57:1398–405.
- NCBI. GEO2R. http://www.ncbi.nlm.nih.gov/geo/geo2r/.
- Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005;21:263–5.
- Sato T, Stange DE, Ferrante M, Vries RG, Van Es JH, Van den Brink S, et al. Long-term expansion of epithelial organoids from human colon, adenoma, adenocarcinoma, and Barrett’s epithelium. Gastroenterology. 2011;141:1762–72.
- Lin F, Spencer D, Hatala DA, Levine AD, Medof ME. Decay-accelerating factor deficiency increases susceptibility to dextran sulfate sodium-induced colitis: role for complement in inflammatory bowel disease. J Immunol. 2004;172:3836–41.
- Glocker EO, Kotlarz D, Boztug K, Gertz EM, Schaffer AA, Noyan F, et al. Inflammatory bowel disease and mutations affecting the interleukin-10 receptor. N Engl J Med. 2009;361:2033–45.
- Liu B, Tahk S, Yee KM, Fan G, Shuai K. The ligase PIAS1 restricts natural regulatory T cell differentiation by epigenetic repression. Science. 2010;330:521–5.
- Allaire JM, Darsigny M, Marcoux SS, Roy SA, Schmouth JF, Umans L, et al. Loss of Smad5 leads to the disassembly of the apical junctional complex and increased susceptibility to experimental colitis. Am J Physiol Gastrointest Liver Physiol. 2011;300:G586–97.
- Portillo JC, Greene A, Schwartz I, Subauste MC, Subauste CS. Blockade of CD40-TRAF2,3 or CD40-TRAF6 is sufficient to inhibit pro-inflammatory responses in non-haematopoietic cells. Immunology. 2015;144(1):21–33.
- Sanyal A, Lajoie BR, Jain G, Dekker J. The long-range interaction landscape of gene promoters. Nature. 2012;489:109–13.
- Bell AC, West AG, Felsenfeld G. The protein CTCF is required for the enhancer blocking activity of vertebrate insulators. Cell. 1999;98:387–96.
- Saitoh T, Fujita N, Hayashi T, Takahara K, Satoh T, Lee H, et al. Atg9a controls dsDNA-driven dynamic translocation of STING and the innate immune response. Proc Natl Acad Sci U S A. 2009;106:20842–6.
- Hampe J, Franke A, Rosenstiel P, Till A, Teuber M, Huse K, et al. A genome-wide association scan of nonsynonymous SNPs identifies a susceptibility variant for Crohn disease in ATG16L1. Nat Genet. 2007;39:207–11.
- Parkes M, Barrett JC, Prescott NJ, Tremelling M, Anderson CA, Fisher SA, et al. Sequence variants in the autophagy gene IRGM and multiple other replicating loci contribute to Crohn’s disease susceptibility. Nat Genet. 2007;39:830–2.
- Cadwell K, Liu JY, Brown SL, Miyoshi H, Loh J, Lennerz JK, et al. A key role for autophagy and the autophagy gene Atg16l1 in mouse and human intestinal Paneth cells. Nature. 2008;456:259–63.
- Shuai K, Liu B. Regulation of JAK-STAT signalling in the immune system. Nat Rev Immunol. 2003;3:900–11.