Quantitative trait loci mapped for TCF21 binding, chromatin accessibility and chromosomal looping in coronary artery smooth muscle cells reveal molecular mechanisms of coronary disease loci

Background To investigate the epigenetic and transcriptional mechanisms of coronary artery disease (CAD) risk, as well as the functional regulation of chromatin structure and function, we have created a catalog of genetic variants associated with three stages of transcriptional cis-regulation in primary human coronary artery vascular smooth muscle cells (HCASMC). Results To this end, we have used a pooling approach with HCASMC lines to map regulatory variation that mediates binding of the CAD associated transcription factor TCF21 with ChIPseq studies (bQTLs), variation that regulates chromatin accessibility with ATACseq studies (caQTLs), and chromosomal looping with HiC methods (clQTLs). We show significant overlap of the QTLs, and their relationship to smooth muscle specific genes and the binding of smooth muscle transcription factors. Further, we use multiple analyses to show that these QTLs are highly associated with CAD GWAS loci and correlated to lead SNPs in these loci where they show allelic effects. We have verified with genome editing that identified functional variants can regulate both chromatin accessibility and chromosomal looping, providing new insights into functional mechanisms regulating chromatin state and chromosomal structure. Finally, we directly link the disease associated TGFβ1-SMAD3 pathway to the CAD associated FN1 gene through a response QTL that modulates both chromatin accessibility and chromosomal looping. Conclusions Together, these studies represent the most thorough mapping of multiple QTL types in a highly disease relevant primary cultured cell type, and provide novel insights into their functional overlap and mechanisms that underlie these genomic features and their relationship to disease risk.


Background
While there has been considerable success identifying loci in the human genome that are associated with a broad range of human diseases, including coronary artery disease (CAD) [1][2][3], most identified regions represent non-exonic regulatory sequence and are thus difficult to associate to a particular gene or functional annotation of the genome. Toward this end, numerous studies have explored the genetics of gene expression and mapped expression quantitative trait locus variants to inform on which SNPs and which genes are causally related to disease in the associated loci [4][5][6][7]. A number of these expression quantitative trait variants (eQTLs) have been found to be the causal variants in disease loci, and in general promoted causal gene discovery [8].
The regulatory variants that modulate other genomic features have also been characterized. QTLs for histone modification and chromatin accessibility, and more recently chromosomal looping have also been mapped [9][10][11][12]. Identification of genomic QTLs has been accelerated by use of a pooling approach that significantly reduces the number of individuals that have to be studied, and promoted the mapping of allelic variation that regulates transcription factor binding [13] and chromatin accessibility in a large number of ethnically diverse individuals [9]. Such efforts have provided important information regarding the regulatory structure of the non-coding sequence in the genome and identified how such variation may regulate the risk for complex human disease. However, in general these studies conducted with large high throughput genomic datasets have not been validated with targeted follow-up studies, and the ramifications of their allele-specific effects not investigated. Also, these studies have primarily been conducted in lymphoblastoid cell lines and restricted to a single type of QTL, and thus have not been examined in the context of overlap of QTL functions and their relevance for a broad range of human diseases.
To better understand the functional basis of regulatory features of the human genome, and to accelerate understanding of the transcriptional mechanisms by which these features contribute to CAD, we have mapped quantitative trait variation in disease relevant primary cultured human coronary artery vascular smooth muscle cells (HCASMC). We have identified QTLs for binding of the CAD-associated transcription factor TCF21 (bQTLs), chromatin accessibility (caQTLs), and chromosomal looping (clQTLs), investigated their relationship to one another, and to eQTLs mapped in the same cell type, as well as their overlap with CAD associated genetic variation.

bQTL, caQTL and clQTL calling in pooled HCASMC lines
We obtained primary HCASMC lines from commercial vendors. Genotypes were called with whole genome sequencing or genotyping with Illumina chips, phased and imputed against 1000 Genome phase 3 data before merging [14]. We pooled 65 lines for TCF21 ChIPseq and Hi-C, and 71 lines for ATACseq experiments at passage 4-6 ( Fig. 1a, Methods). After sequencing, we obtained more than 400 M reads each for pooled TCF21 ChIPseq and ATACseq libraries and 800 M reads for the pooled Hi-C library. Standard pipelines were employed for each of these genomic approaches. We called 22381 high-confidence binding peaks from TCF21 ChIPseq with fold enrichment >10, P-value <10 -20 cutoffs and 18601 open chromatin regions from ATACseq with fold enrichment >4, P-value <10 -10 . Most of these TCF21 binding peaks (19705, >88%) and open chromatin regions (17129, >92%) overlapped with the data previously identified in an individual HCASMC line [15]. For the Hi-C data, in addition to standard data processing, we also assigned sequencing reads to the SNPs genotyped in our pooled HCAMSC lines, generating a total of more than 377 million (M) valid interacted reads pairs. Of these, approximately 21.6 M allele-specific interactions with at least one SNP inside the loop boundaries were identified (Additional file1: Suppl Figs. 1a, 1b). Using these interactions, we were be able to find 7916 loops by FitHiC and 3443 loops by HiCCUPS with a stringent P-value cutoff (10 -10 ). Here we show an example genome region at 7p15.2, comparing our pooled ATACseq track and Hi-C loops for heterologous cell types with our previously published H3K27ac HiChIP loops in an individual HCASMC line [16], and ENCODE generated ChIA-PET loops (Fig. 1b). Our pooled Hi-C loops showed similar patterns but with a higher resolution and more chromosomal contacts compared to the previously reported data. More importantly, our data showed high resolution (5kb) of allele-specific interactions, revealing the differential chromosomal architectures associated with individual SNPs (Fig. 1c).
To call the TCF21 binding (bQTL), chromatin accessibility (caQTL) and chromosomal looping (clQTL) quantitative trait loci, we employed a published regression-based approach which uses post-assay allele frequencies together with genotypes of each sample to infer the proportion of each sample in the pool [9,13]. Comparing the pre-assay vs. post-assay allele frequencies allows the identification of outlier SNPs where these frequencies are significantly different from one another, which indicates a cis-acting QTL variant. Although the pre-post frequency distributions showed a slight reference allele frequency bias after regression, consistent with previously reported studies [13,16], the majority of variants showed a balanced pattern (Figs. 1d, 1e and 1f). With fold enrichment-optimized [9] at P value cutoff 10 -4 , we obtained 5315 significant bQTLs, 8346 caQTLs and 7084 clQTLs with the regression analysis (Additional file 2: Suppl Table. 1).

QTLs are highly associated with GWAS CAD loci
An initial question we wanted to address was the relationship of mapped QTLs with coronary artery disease (CAD) associated variation. To examine this question, we used GWAS catalog SNPs [17] supplemented with CARDIoGRAMplusC4D variant data from a 1000 Genomesbased meta-analysis [18]. First, we extracted lead SNPs in CAD-associated categories "Atherosclerosis", "Coronary artery calcification", "Coronary artery", "Coronary heart", "Kawasaki disease" and "Myocardial infarction" from the GWAS catalog along with the lead SNPs in CARDIoGRAMplusC4D. We also randomly selected 27 unrelated diseases to serve as background. We then investigated the enrichment of these lead SNPs in a ±1 kb window around the QTLs. This analysis revealed that the lead SNPs in CAD-associated categories have strong colocalization (P<10 -10 ) with the bQTLs, caQTLs and clQTLs. The highest enriched category for bQTL variants was "Myocardial infarction" and for caQTLs "CARDIoGRAMplusC4D" and "Coronary artery calcification". clQTL variants were enriched for "Myocardial infarction," "CardiogramplusC4D," and the highly significant term "Coronary artery," which was the second most significant category (P<10 -100 ) ranked by P-value (Figs. 2a, 2b and 2c). This finding suggested that these QTLs in HCASMC may play an important role in the genetic mechanism of HCASMC mediated disease progression. All of the QTLs showed enrichment for blood pressure and hypertension phenotypes which are highly consistent with known SMC functions, and association of TCF21 with blood pressure have been identified by population studies with multiple racial ethnic groups [19][20][21]. The highly significant enrichment of breast cancer variants in the bQTL category is consistent with TCF21 being a known tumor suppressor and that is dysregulated in breast cancer, but the enrichment of breast cancer variants among the caQTLs and clQTLs is surprising and suggests a similar genomic and genetic architecture between HCASMC and breast cancer risk genes [22].
To further validate these results, we also evaluated the association of QTLs with disease risk variants by linkage disequilibrium (LD) comparison. We selected all of the GWAS lead SNP-QTL pairs which were separated by less than 100kb, calculated the R 2 for each pair, and then ranked the categories by the ratio of pairs with R 2 >0.8 compared to the total number of pairs. Each of the three types of QTLs showed CAD related categories as among the three most significant (Figs. 2d, 2e and 2f). Interestingly, these analyses also showed some QTL-specificity: "Myocardial infarction" is enriched in bQTL, "Atherosclerosis" only in clQTL, while "Kawasaki disease" is enriched only for caQTL and clQTL variants (Figs. 2d, 2e and 2f). These results were consistent with the distance-dependent enrichment results (Figs. 2a, 2b and 2c).
We next focused on the CAD-associated categories only, comparing the R 2 distributions of the LD pairs to the significant QTLs (P<10 -4 ) with those of non-significant SNPs (P>0.5).
While the bQTLs showed a trend in the correct direction, they did not show a significant correlation (Fig. 2g). However, the caQTLs (Fig. 2h) and clQTLs (Fig. 2i) showed a statistically greater correlation than non-significant SNPs. The fold change of R 2 >0.8 pairs of significant to non-significant QTLs indicated enrichments for the majority of the cardiovascular diseases (Additional file 1: Suppl Figs. 2a, 2b, and 2c). These data demonstrate that the QTLs we identified are significantly enriched in CAD-associated loci.

QTLs colocalize with SMC transcription factor binding and epigenetic modifications
We have previously shown that open chromatin regions in HCASMC are enriched for CADassociated loci and multiple SMC-specific transcription factor (TF) binding sites, such as TCF21 [23]. We extended these studies by investigating TF motif enrichment analysis at the identified QTLs. We created a window of analysis that extended 50 bp upstream and downstream of each significant (P<10 -4 ) QTL and scanned these sequences by matching to the HOMER known motifs or JASPAR core 2018 vertebrate database [24]. Non-significant QTLs (P>0.5) with the same window size served as background. The most enriched motifs around bQTLs were TCF21 and its bHLH Class I dimerization partner TCF12 (Fig. 3a). The AP-1 complex subunit (ATF3 and JUNB), Hippo pathway transcription factor TEAD1 and the chromatin regulator CTCF motifs were also enriched in these regions of the genome, suggesting that they could be regulated together with TCF21 binding at these loci (Fig. 3a). For caQTLs, we found that the AP-1 complex is the dominant TF motif nearby, along with those for SMC functional TFs such as TEAD1/3, SMAD3, TCF21 and ARNT (Fig. 3b). These enrichments and colocalizations extend our previous studies [15,23]. A similar analysis was also performed with clQTLs, which showed ZEB1 and ZEB2 binding site enrichment and also enrichment for a number of TF motifs that haven't been reported to be associated with CAD or SMC function (Additional file 1: Suppl Figs. 3a, 3b).
To investigate the allele-specific TF binding at these QTLs, we searched the motifs that were differentially enriched among the high-chromatin accessibility or TCF21 bound alleles compared to the low-chromatin accessibility or TCF21 less bound alleles for the same QTLs in a ±20 bp window. The results show that TCF21/TCF12, AP-1 complex motifs and a bHLH E-box assigned to MYOD are only enriched in open bQTL alleles with no significant TF motifs found in closed alleles (Fig. 3c). The AP-1 complex motif was associated preferentially with open caQTL alleles, presumably through pioneer functions that promote chromatin accessibility ( Fig.   3d) [15]. Interestingly, we found accessible ZEB1 and ZEB2 binding sites at closed caQTL alleles, suggesting that they may have the opposite function of AP-1 in SMC (Fig. 3e) [25].
We generated heatmaps to investigate co-localization of the QTLs with each other, and with HCASMC ATACseq and ChIPseq data. TCF21 bQTLs were shown to colocalize with previous ChIPseq data [26] (Fig. 3f) (Figs. 3f, 3g). However, chromosomal loops were enriched at their anchors for chromatin accessibility as well as H3K27ac and H3K3me1 chromatin marks (Fig. 3h), with those loops having clQTLs in their anchors showing higher chromatin accessibility and H3K27ac levels than those loops without clQTLs (Fig. 3i). Taken together, these data show that variation regulating TCF21 binding colocalizes with caQTLs in regions of open chromatin that have enhancer histone marks, and are enriched for clQTLs at looping anchors, suggesting that epigenetic effects mediate a significant proportion of genomic molecular phenotypes.

Co-localizing QTLs identify genes that mediate key SMC functional roles
Our next goal was to characterize loci with bQTL, caQTL and clQTL colocalization. We first overlapped these QTLs to examine exact matching and included our published HCASMC eQTL data in this analysis [14]. The number of eQTLs (187299) was comparable to GTEx QTL numbers and QTL-gene pairs (Additional file 1: Suppl Figs. 4a, 4b). We found 446 direct overlaps between bQTLs and caQTLs, 135 between caQTLs and clQTLs, 47 between bQTLs and clQTLs, and 26 overlaps for all three (Fig. 4a). Each QTL group had numerous exact matches with eQTLs while there were few overlaps for all four QTL groups. Interestingly, the overlaps between bQTLs and caQTLs were found to be directional. TF-bound alleles of bQTLs had more matches with open alleles of caQTLs than those with closed alleles while a comparison of less-bound alleles of bQTLs with caQTLs showed opposite results (Additional file 1: Suppl   Fig. 4c). We attribute these findings to the colocalization of TCF21 binding and open chromatin regions as we previously described (Figs. 3g, 3h) [15].
Further, we used LD R 2 >0.8 instead of exact matching to determine overlap. As expected, the overlaps among bQTLs, caQTLs and clQTLs, and their overlaps with eQTLs all increased ( Fig. 4b). To further characterize the associations between eQTLs, bQTLs, caQTLs and clQTLs, we calculated the R 2 of all the LD pairs within 100 kb maximum distance, and found that the R 2 distributions of eQTL to significant (P<10 -4 ) bQTLs, caQTLs or clQTLs was statistically higher than those to non-significant (P>0.5) QTLs (Fig. 4c, Additional file 1: Suppl Figs. 4d, 4e and 4f). These data suggest that these QTLs may share a group of the same target genes with eQTLs and could potentially provide a mechanism for their transcriptional regulation.
We next assigned the QTLs to their single nearest genes within 50 kb maximum distance, as the potential target genes. Gene Ontology analysis of the identified genes showed a strong association of these genes with SMC functions, including "extracellular matrix organization", "blood vessel morphogenesis", "focal adhesion", "angiogenesis", "cell differentiation", "cell division", "coronary artery disease", and "myocardial infarction" (Figs. 4d, 4e and 4f). The targets of these three QTL groups share 98 genes, which have similar gene ontology results as the individual analyses (Fig. 4g, Additional file 1: Suppl Fig. 4g). These data provide a group of novel candidate loci that may contribute to CAD and were examined further in the following studies.

QTLs located in CAD GWAS causal loci show allele-specific TCF21 binding, chromatin accessibility and chromosomal looping
Given the association of the three types of QTLs with GWAS CAD loci and the QTL target genes with SMC functions, we intersected these CAD SNPs with the QTL colocalized loci and then searched their nearest genes. We found 86 QTLs that associate with 151 CAD SNPs. These SNPs are located in 62 loci across the genome (Additional file 1: Suppl Fig. 5a left). After evaluating the corresponding genes according to published studies, we selected 36 candidate causal genes that are highly likely to be associated with CAD risk. We then separated them into three groups by their distances to CAD SNPs and QTLs (Additional file 1: Suppl Fig. 5a right).
Overall, most of these gene were expressed in both GTEx coronary artery tissues (Additional file 1: Suppl Fig. 5b) and HCASMC (Additional file 1: Suppl Fig. 5c). We further investigated the genes in the first group to derive our validation candidates.
These genes, ARNTL, CCBE1, CDH13, EMP1 and FN1 have been reported as CADassociated, or related to vascular development or function [2,18,28]. In addition, they are involved in epithelial-mesenchymal transition (EMT) processes, which are prominent in vascular disease [29][30][31][32]. We validated the QTLs in CDH13 and FN1 loci with multiple approaches. For CDH13, bQTL rs7198036, and the combination caQTL and clQTL rs12444113 were noted to be located ~10kb downstream of the transcription start site (TSS) (Fig. 5a). We performed ATAC-qPCR and ChIP-qPCR in HCASMC heterozygous lines for these QTLs using Taqman genotyping primers, and confirmed that the G allele had higher chromatin accessibility than the C allele at rs12444113 (Fig. 5b) and allele A had a higher TCF21 occupancy than allele G at rs7198036 (Fig. 5c). To verify the clQTL allele specificity, we employed dCas9-KRAB inhibition "CRISPRi" and investigated CDH13 expression. The two gRNAs targeting rs12444113 efficiently reduced CDH13 transcription (Fig. 5d) and chromatin accessibility at the CDH13 TSS region (Fig. 5e). Since the QTLs are located more than 10kb from the TSS, it is more likely to be a long-range regulatory effect rather than a local cis-effect. To further confirm this, we designed an allele-specific 3C-PCR assay to detect the chromosomal interactions with different alleles using HCASMC heterozygous lines. The data confirms that rs12444113 has much greater contact with the CDH13 TSS than the transcription end site (TES), while they both show allele specificity (Fig. 5f). The measured imbalance for these alleles was consistent with the pre-post allele frequencies in the QTL regression data.
With a similar approach, we validated the caQTL and clQTL rs2692224 variant located ~40kb downstream of the FN1 TSS (Fig. 6a). Combined ATAC-qPCR (Fig. 6b) that TGFβ1 and its primary nuclear signaling molecule SMAD3 are both putative CADassociated genes [23], we sought to investigate a causal mechanism by which this disease related signaling pathway could be linked to FN1 disease association, through the function of QTLs identified in these studies. As described above, inhibiting the region surrounding rs2692224 with CRISPRi leads to the repression of FN1 expression. We used TGFβ1 to stimulate multiple HCASMC lines which have different genotypes for rs2692224, including three lines with open allele homozygous C|C, three lines with closed allele homozygous A|A, and two heterozygous A|C lines. We first evaluated FN1 mRNA levels and found that the increase in expression with TGFβ1 stimulation in the cell lines with the C allele was significantly higher than those with the A allele (Fig. 7a). Similarly, the chromatin accessibility responses to TGFβ at this locus were greater with the rs2692224 C alleles (Fig. 7b). In addition, the 3C-PCR results showed that chromosomal interactions are more active in the cell lines containing the C allele compared with the lines containing the A allele (Fig. 7c). Overall, the C|C homozygous HCASMC are most sensitive to TGFβ1, A|C heterozygous are less sensitive while A|A homozygous are most insensitive to FN1 expression, chromatin accessibility and chromosomal interaction. These functional studies provide significant evidence that variant rs2692224 serves to link the transcriptional response of the CAD associated gene FN1 to stimulation by the disease associated TGFβ1 pathway, through mechanisms that regulate chromatin accessibility and chromosomal interactions.  [9,13]. The individual ChIP-PCR, ATAC-PCR and 3C-PCR assays that we performed represent the most extensive validation of this approach. QTLs of the CAD-associated transcription factor TCF21 were of particular interest because of the enrichment of its binding in other CAD loci [26] and contribution to the risk at these loci where it modulates the epigenome and thus expression of causal genes [15,27]. We have previously shown that CAD-associated variants are enriched in HCASMC regions of open chromatin [23], and that these cells contribute a significant portion of CAD risk [14]. There was at most 40% overlap of the HCASMC caQTLs with those reported previously for LCLs [9], suggesting that regulatory variation is quite different between these two cell types and underscoring the importance of these smooth muscle cell data for study of vascular disease genetics. We have shown previously that chromosomal looping at CAD loci is regulated by associated allelic variation [16] and mapping of HCASMC clQTLs significantly expands this work with identification of specific variants that are candidate regulators of this genomic feature. To the best of our knowledge this is the first report of mapping clQTLs in primary cultured human cells.

Discussion
The finding of colocalization of different classes of QTLs in HCASMC is highly informative. It has been established that TF binding can promote open chromatin and create caQTLs [10,43], and that individual QTLs can influence multiple molecular phenotypes within the same region [42]. In our data there was significant colocalization of bQTLs with caQTLs, and both of these with eQTLs. While the degree of direct overlap of bQTLs and caQTLs was modest (Fig. 4a), there was significant colocalization when variants with LD (R 2 >0.8) were considered (Fig. 4b), and our analyses showed enriched colocalization across the genome for bQTLs and caQTLs (Fig. 4c). Interestingly, both of these classes of variants were also noted to be enriched at JUN binding sites, consistent with a pioneer function for the AP-1 complex (Fig.   3h) [15]. Further, enrichment of these QTLs with the smooth muscle cell MYOCD-SRF complex, SRF binding CArG boxes, indicated a role for these regulatory variants in SMC processes (Fig.   3i). Gene Ontology analysis of genes located nearby to QTLs showed high level enrichment for terms related to SMC processes such as matrix organization, cell-cell interactions -adherence junctions, and vascular development and angiogenesis. By comparison, clQTLs had less colocalization and a different set of enrichment categories, suggesting an independent mechanism from bQTLs and caQTLs.
TF motifs nearby to the bQTLs and clQTLs indicate interactions between the binding and function of these two types of regulatory activities. As expected, there is primarily enrichment for TCF21 and its Class I bHLH dimerization partner TCF12 nearby to the TCF21 bQTLs.
Enrichment for AP-1 motifs was identified for both the bQTLs and caQTLs, consistent with its known role in HCASMC [15] and its relationship to lineage determining factors in multiple other cell types [44][45][46]. TEAD motifs were represented nearby to the bQTL and caQTL regions, and these factors are known to play a significant role in SMC lineage determination and disease risk [47]. Further, caQTLs were associated with SMAD3, a known potent regulator of blood vessel development [48][49][50]and SMC function [51,52]as well as CAD association [23,27].
Interestingly, analysis of the closed caQTLs showed low level enrichment for motifs that mediate binding of ZEB factors, which are known to promote closure of DNA [25].
Perhaps most importantly, mapping these data provide highly useful information for developing credible sets of variants in disease loci implicated by GWAS. For those QTLs that overlap, especially for overlapping bQTLs and caQTLs, there is an increased likelihood that these variants are the active regulatory variant in the region, and thus more likely to contribute to the risk association in the associated locus. However, these QTLs suffer from the same ambiguity regarding functional identity as all variation mapped at a genome-wide level; it is often difficult to discern the functional versus the correlated variants in a haplotype block. In the CDH13 and FN1 loci where we have validated the identified QTLs, these QTLs are not in LD with the reported lead SNPs and thus to contribute to disease association would have to be identifying another allele not reported in the GWAS data. The average number of alleles for each GWAS locus is 2-3 and nearby independent alleles are often not reported, so this is a distinct possibility. To link the FN1 locus QTLs with CAD, we showed that stimulation of HCASMC by TGFβ, a known causal CAD signaling pathway [23,27,53], produces a change in the looping pattern as predicted by the clQTL. These molecular QTLs will provide support to fine mapping efforts aimed at identifying disease causal variation, limiting credible sets of genes identified by other methods, and better understanding of the molecular functions of regulatory variation in the human genome.

Conclusions
In experiments described here we have used a pooling approach with primary cultured HCASMC to map regulatory variation that mediates binding of the CAD associated transcription factor TCF21 (bQTLs) with ChIPseq studies, mapped variation that regulates chromatin accessibility (caQTLs) with ATACseq studies, and chromosomal looping (clQTLs) with HiC methods. We showed that these QTLs are highly associated with CAD GWAS loci and correlated to lead SNPs in these loci, co-localize with smooth muscle cell transcription factor binding and epigenetic modification, show allele-specific function in CAD GWAS loci, and that these functional mapped variants can serve as response QTLs that link expression of causal CAD genes to epigenetic stimulation. Together, these studies represent the most thorough mapping of multiple QTL types in a highly disease relevant primary cultured cell type, and provide novel insights into their functional overlap and mechanisms that underlie these genomic features and their relationship to disease risk.

Hi-C
The Hi-C protocol was modified from HiChIP with excluding the antibody purification [54]. hours. Re-ligated nuclei were lysed by 0.5% SDS and sheared for 2 min by sonication before extensive centrifugation. Supernatant was incubated at 65°C over-night with Protease K and the DNA was purified. For libraries construction, set aside DNA by 150 ng to the biotin capture step.
Biotin labeled DNA was pre-bound to Streptavidin C-1 beads (Thermo, 65-001) in biotin binding buffer. After Tween buffer and TD buffer washing, Tn5 transposition was performed on beads at 55°C for 10 min. Libraries were generated by PCR amplification with Nextera adapters added after washing the beads. Samples were size selected by PAGE purification (300-700 bp) for effective paired end sequencing and adapter dimer removal. All libraries were sequenced on the Illumina HiSeq 4000 instrument to total 800 million 75 bp paired end reads. Whole-genome sequencing data were processed with the GATK best practices pipeline [14] . cutadapt trimmed reads were aligned to the hg19 reference genome with bwa. Duplicate reads in alignment result were marked with picard markduplicate. We performed indel realignment and base recalibration with GATK. The GATK HaplotypeCaller was used to generate gVCF files, which were fed into GenotypeGVCFs for joint genotype calling. We recalibrated variants using the GATK VariantRecalibrator module. We further phased our call set with Beagle. We first used the Beagle conform-gt module to correct any reference genotypes if they are different from hg19. We then phased and imputed against 1000 Genomes phase 3 version 5a. Variants with imputation allelic r 2 less than 0.8 and Hardy-Weinberg Equilibrium Pvalue less than 1×10 -6 were filtered out.

Pooled sequencing data processing
Quality control of pooled ChIPseq data was performed using fastqc, and then low-quality bases and adaptor contamination were trimmed by cutadapt. Filtered reads were mapped to hg19 genome using bwa mem algorithm. Duplicate reads were marked by picard markduplicate module and removed with unmapped reads by samtools. macs2 callpeak was used for peaks calling and input as control. Similar approaches were used for pooled ATACseq with the following modifications. We used bowtie2 to align reads to hg19 genome. bedtools was used for reads format converting and awk was used for Tn5 shifting. macs2 callpeak with --broad was used for peak calling with lambda background.
Pooled HiC paired-end reads were aligned to the HCASMC phasing data masked hg19 genome using the HiC-Pro pipeline. Default settings were used to remove duplicate reads, assign reads to MboI restriction fragments, filter for valid interactions, and generate binned interaction matrices. Aligned reads were assigned to a specific allele on the basis of phasing data. HiC-Pro filtered reads were then processed using the hicpro2juicebox and hicpro2fithic functions. FitHiC and HiCCUPS ware used to identify high-confidence loops using default parameters. The HiC matrix file was further converted to h5 format by HiCExplorer hicConvertFormat module.
Genome bias of the matrix was corrected by hicCorrectMatrix and hicFindTADs was employed to find transcription activation domain (TAD).

Mapping and analyzing QTLs
Processed alignments of pooled ChIPseq, ATACseq and HiC reads were further realigned by the hornet (https://github.com/TheFraserLab/Hornet) pipeline. HCASMC phasing data were combined with 1000 Genome data of CEU population, filtered by MAF>2.5%. Mapped reads that overlap the combined SNPs were identified. For each read that overlaps a SNP, its genotype was swapped with that of the other allele and the read was re-mapped. Re-mapped reads that fail to map to exactly the same location in the genome were discarded.
Pre-and post-allele frequencies, and the resulting p-values, were calculated using published pipeline cisVar (https://github.com/TheFraserLab/cisVar). This regression-based approach uses post-allele frequencies together with genotypes of each sample to infer the proportion of each sample in the pool [9,13]. These proportions will be weighted by any genome-wide differences, since these will be naturally incorporated into the post-frequencies used as input to the regression. In this way, pre-allele frequencies already account for some types of trans-acting variation, increasing the power for mapping cis-acting differences.

GWAS overlap
HCAMSC eQTL data came from a genome-wide association of gene expression with imputed common variation identified in 52 HCASMC lines [14] . CARDIoGRAMplusC4D variants data was from 1000 Genomes-based GWAS meta-analysis [18]. Direct overlap of QTLs and GWAS Catalog SNPs was performed with bed2GwasCatalogBinomialMod1Ggplot script from gwasanalytics package. The calculation criteria of this script were described previously [55].

Motif analysis
We used HOMER findMotifsGenome.pl script to search for known motifs and to generate de novo motifs among the significant QTLs compared to the non-significant QTLs in ±50bp windows. These results were further validated by MEME. We also searched the motifs that were differentially enriched among the high-chromatin accessibility or TCF21 binding (open) alleles compared to the low-chromatin accessibility or TCF21 binding (closed) alleles in the shared QTLs. The open allele plus 20 bp on each side (40 bp total) was used as input to HOMER findMotifs.pl script, with the closed alleles of the same QTLs used as the background comparison set, or on the opposite direction. Therefore, all significant enrichments are due to QTL variants within the motifs themselves (since any motifs flanking the QTLs would be present in both comparison sets). PWMScan was used for position weight matrix scan. We obtained AHR:ARNT (MA0006.1) motif from JASPAR. The JUN and SRF peaks came from the published data [23,56].

Cis-regulatory functional enrichment and network analysis
We utilized the Genomic Regions Enrichment of Annotations Tool (GREAT 3.0) to search the QTL nearest genes, with the parameter "Single nearest gene", which was limited to 50kb. Gene ontology analysis from GREAT output was done with PANTHER database. Pathways, biological processes, cellular component, phenotype and GAD disease enrichment were carried out using default settings.

Data visualization
ChIPseq and ATACseq bigWig tracks were converted from filtered alignments using bedtools and UCSC utilities. HiC matrix generated by hicpro2juicebox was visualized by Juciebox. The HiCCUPS and FitHiC output were converted to bigInteract format by custom awk command lines. ChIA-PET data came from ENCODE database (ENCSR000CAD and ENCSR361AYD).
Genome browser sessions were generated by WashU Epigenome Browser or pyGenomeTracks.

CRISPRi validation
We used dCas9 fused to KRAB to knockdown the enhancer region where candidate SNPs were located. The gRNAs targeting SNPs within the maximum 100 bp distance were designed using

RNA isolation
RNA for all samples was extracted using the RNeasy mini kit (Qiagen 74106). HCASMC RNA (500 ng) were reverse transcribed using the High capacity RNA-to-cDNA Synthesis kit (Applied Biosystems 4387406).

Quantitative and genotyping PCR
The purified cDNA or dsDNA samples were assayed by quantitative PCR with ABI ViiA 7 and Power SYBR Green Master Mix (ABI 4368706) using custom designed primers (Additional file 3: Suppl Table. 2). ChIP samples were normalized by input, ATAC transposed samples were normalized by genomic DNA which was extracted using Quick-DNA Microprep Kit (Zymo D3020), and 3C libraries were normalized by post-ligation whole genome DNA without biotin purification. Heterozygous genotypes at the candidate loci were determined using TaqMan SNP genotyping qPCR assays (Thermo Fisher Scientific C___2110683_10, C___8714882_10, C___2042333_10, C__27515710_10, C__29302709_10, C__31103265_10, C___3202579_10).
Assays were repeated at least three times. Data shown were average values ± SD of representative experiments.