Functional genomics atlas of synovial fibroblasts defining rheumatoid arthritis heritability

Background Genome-wide association studies have reported more than 100 risk loci for rheumatoid arthritis (RA). These loci are shown to be enriched in immune cell-specific enhancers, but the analysis so far has excluded stromal cells, such as synovial fibroblasts (FLS), despite their crucial involvement in the pathogenesis of RA. Here we integrate DNA architecture, 3D chromatin interactions, DNA accessibility, and gene expression in FLS, B cells, and T cells with genetic fine mapping of RA loci. Results We identify putative causal variants, enhancers, genes, and cell types for 30–60% of RA loci and demonstrate that FLS account for up to 24% of RA heritability. TNF stimulation of FLS alters the organization of topologically associating domains, chromatin state, and the expression of putative causal genes such as TNFAIP3 and IFNAR1. Several putative causal genes constitute RA-relevant functional networks in FLS with roles in cellular proliferation and activation. Finally, we demonstrate that risk variants can have joint-specific effects on target gene expression in RA FLS, which may contribute to the development of the characteristic pattern of joint involvement in RA. Conclusion Overall, our research provides the first direct evidence for a causal role of FLS in the genetic susceptibility for RA accounting for up to a quarter of RA heritability. Supplementary Information The online version contains supplementary material available at 10.1186/s13059-021-02460-6.

inflammatory and destructive autoimmune arthritis with a complex genetic basis. RA affects 0.5-1% of the world population and leads to disability, high morbidity burden, and premature mortality [1]. GWAS have identified over 100 loci for RA susceptibility [2]. Genetic risk variants at the majority of these loci do not map to the exons of protein coding genes. Potential gene regulatory functions of these noncoding genetic risk variants have been investigated in immune cells based on genome-wide mapping of epigenetic modifications [3], chromatin interactions [4], correlation with variation in gene expression (eQTLs) [5], or linear proximity to coding genes in DNA sequence [2]. These studies have demonstrated an enrichment of RA genetic risk variants in immune cell enhancers [3], but omitted the analysis of synovial fibroblasts or fibroblast-like synoviocytes (FLS), the resident stromal cells of the joints, even though they are responsible for the production of many immune-related cytokines and chemokines [6,7].
In addition to immune cells, FLS play a decisive role in the pathogenesis of RA and are essential for the maintenance of normal joint functions. FLS from different joints have different epigenomes, transcriptomes, and functions, which may contribute to the characteristic pattern of joint involvement in different types of arthritis [8,9]. FLS substantially contribute to joint inflammation and destruction in RA [10]. RA FLS have an activated phenotype characterized by resistance to apoptosis, increased proliferation, secretion of matrix-degrading enzymes, and production of cytokines and chemokines that promote immune cell differentiation and survival. However, the cause of the activation of FLS in RA is unknown and it is unclear whether this activation leads to or is a consequence of the disease. Defining the contribution of FLS to the heritability of RA will provide essential insights into this question.
For the first time, we have comprehensively mapped RA genetic risk variants to active regulatory DNA elements in FLS. We generated multidimensional epigenetic data in primary FLS, isolated from patients, to create a detailed outline of their chromatin landscape. We conducted genetic fine mapping of RA loci by computing sets of credible single-nucleotide polymorphisms (SNPs) driving GWAS signals. We integrated the credible SNP sets and chromatin datasets to provide evidence that RA risk variants can be functionally relevant in FLS. We used chromatin conformation data to determine enhancer-promoter interactions between risk variants in noncoding DNA regulatory regions of FLS and their target genes. Furthermore, we assessed the influence of the pro-inflammatory cytokine tumor necrosis factor (TNF) on these interactions, chromatin accessibility, and gene expression in FLS. We combined FLS data with published data of human tissues and cells [4,11,12] to identify putative causal SNPs, enhancers, genes, and cell types for RA risk loci. Finally, we functionally verified enhancerpromoter interactions by CRISPR-Cas technology and showed transcriptional effects of fine-mapped risk variants in FLS samples from RA patients.

Results
Integration of epigenetic datasets to define the chromatin landscape of FLS As a first step in our analysis, we generated diverse epigenetic and transcriptomic datasets from our primary FLS samples (Additional file 1: Table S1): chromatin immunoprecipitation sequencing (ChIP-seq) for six histone marks (H3K4me3, H3K4me1, H3K27me3, H3K36me3, H3K27ac, H3K9me3), Assay for Transposase-Accessible Chromatin sequencing (ATAC-seq), cap analysis gene expression sequencing (CAGEseq), chromatin conformation analysis (HiC, Capture HiC), and RNA sequencing (RNA-seq) (Additional file 2: Table S2, quality control metrics in additional file 3: Dataset S1, details in Online methods). For Capture HiC (CHiC), prey fragments containing previously reported lead SNPs at RA loci were used [2] (details in Online methods). We integrated these datasets and assigned 18 pre-trained chromatin states to the genome of FLS using ChromHMM [13]. We identified A/B compartments and Topologically Associating Domains (TADs) and determined significant chromatin interactions. Finally, we incorporated RNA-seq data from FLS. These analyses provided a comprehensive annotation of the epigenome and transcriptome of FLS (Fig. 1a, b).
We cross-validated the individual datasets to confirm the quality of the generated FLS data. As expected, open chromatin regions showed high enrichment of promoters (transcription start sites [TSS]) and active enhancers (Fig. 2a). CHiC interactions were enriched for promoters (TSS), sites of transcription, and enhancers ( Fig. 2b). At TAD boundaries, transcription and promoter states were enriched (Fig. 2c). Basal gene expression was highest in active TSS (Fig. 2d). Taken together, these analyses validated that we accurately captured chromatin states and chromatin interactions in FLS and that we have generated a comprehensive epigenetic and transcriptomic map of FLS genomes.

TNF induces changes in chromatin organization that correspond to altered gene expression in stimulated FLS
To explore the effect of a pro-inflammatory environment on the chromatin landscape and transcriptional regulation of FLS, we performed HiC, CHiC, ATAC-seq, and RNAseq experiments in FLS with and without stimulation with TNF (Additional file 2: Table S2).
We first computed changes in A/B compartments, which are large, cell-type-specific organizational units of the genome, associated with chromatin activity (A = open chromatin, B = closed chromatin) [14]. 94.8% of A and 95.7% of B compartments were consistent between basal and stimulated FLS. One of the genomic regions that changed from inactive to active after TNF stimulation contained RA-associated variants that interact with the TNFAIP3 gene. Small changes in A/B compartments after stimulation are expected, as A/ B compartments infer chromatin activity at DNA segments in low resolution.
To increase the resolution, we used TADcompare [15] to explore the influence of TNF on the organization of TADs in FLS. Genes within the same TAD tend to be coregulated and gene promoters and enhancers often interact within the same TAD [16]. Between our conditions, we identified an average of 4116 TAD boundaries in FLS samples. While 79% of TAD boundaries were unchanged between basal and stimulatory conditions, 21% of differential TAD boundaries exhibited a change in position or strength (Fig. 3a).
By analyzing CHiC data (details in Online methods), we observed around 800 quantitatively differentially interacting regions between basal and stimulated FLS. The intensity of the differential interactions between the regions correlated with the fold change of expression of the interacting genes (Fig. 3b). Notably, interaction strength increased after stimulation for genes with differential expression, irrespective of whether expression increased or decreased after stimulation, thereby suggesting that chromatin interactions influence activating and repressive TNF transcriptional responses in FLS (Fig. 3b).  To further explore the regulation of gene transcription after TNF stimulation, we focused on CHiC baits and prey that exhibited increased interaction strength with genes regulated after stimulation. We overlapped these regions with open chromatin peaks in stimulated cells. We then used Hypergeometric Optimization of Motif EnRichment (HOMER) to detect known transcription factor binding sites (TFBS) or DNA motifs with high similarity to known TFBS (de novo DNA motif discovery), that were overrepresented at the sites with open chromatin, increased chromatin interactions, and differential gene expression (Fig. 3c).
Enrichment analysis of known TFBS in open chromatin identified TPA response elements (TREs; TGA(G/C)TCA) as the most enriched motif in the data sets with increased as well as decreased gene expression (Additional file 4: Dataset S2). TPA response elements serve as canonical binding sites for the subunits of the Activator Protein-1 (AP-1) transcription factor. Open chromatin sites with increased CHiC interactions, but decreased gene expression in stimulated FLS were additionally enriched for BACH2 (broad-complex-tramtrack-bric-a-brac and Cap'n'collar homology 2) binding sites (Additional file 4: Dataset S2). De novo DNA motif discovery in the dataset with Fig. 3 Effect of TNF stimulation on the chromatin landscape in FLS. a Comparison of TADs boundaries between basal and TNF-stimulated FLS by TADCompare. Number of non-differential (green) and differential TAD boundaries is shown. Differential TAD boundaries are classified as boundary position changes (complex, merge, shifted, split) or strength change (differential boundary magnitude). Complex, merged, and split boundary changes represent the most disruptive changes of the 3D structure of the genome. b Pearson correlation of the loop intensity as determined by CHiC with change in the expression of nearby genes (log fold change). Negative numbers in the x-axis indicate downregulation, positive numbers upregulation of gene expression. c Graphical representation of the RNA-seq, ATAC-seq, and CHiC data integration to identify transcription factor binding sites in TNF-stimulated FLS. This figure was created using BioRender. d De novo DNA motif discovery identified two motifs (motif 2 and motif 3) with high similarity to the binding sites of homeobox (TAATTA) and forkhead box transcription factors (TAAA) in the dataset with TNF-repressed genes.
decreased levels of gene expression after TNF stimulation showed enrichment for two DNA motifs with high similarity to binding sites for several developmental transcription factors from homeobox and forkhead box protein families (Fig. 3d).
In summary, by combining CHiC, ATAC-seq, and RNA-seq analyses, we showed that the FLS genome exhibits changes in 3D structure upon TNF stimulation. We confirmed the activating and repressive actions of AP-1 in regulating the TNF response of FLS and we suggest that developmental transcription factors can serve as potential novel repressors of transcriptional response to TNF in FLS.

FLS and immune cells are drivers of RA heritability
We used the generated knowledge on regulatory DNA elements in FLS to quantify the heritability of RA that can be attributed to active regulatory DNA elements in FLS. We considered RA risk loci attaining genome-wide significance (p < 5 × 10 −8 ) in the European ancestry component of the largest published trans-ethnic RA GWAS metaanalysis [2] and computed the partitioned heritability [17] in FLS and other cell types (HLA regions excluded; details in "Methods" section). Epigenetic data for non-FLS cell types were acquired from published datasets [11]. We defined active regulatory elements of the genome as the union of H3K4me1, H3K4me3, and H3K27ac peaks, as these histone modifications are associated with transcriptional activity and enhancer/ promoter elements. With this approach, we estimated that 12-24% of the non-HLA RA heritability can be attributed to the active DNA regulatory elements in FLS samples (Fig. 4a). This analysis showed that both immune cells and FLS mediate the effects of association signals and contribute notably to the heritability of RA.
Epigenetic annotation of fine-mapped SNPs in immune cells and FLS refines the putative causal credible set SNPs for more than 30% of the RA risk loci We then aimed to further characterize the RA SNPs in active DNA regulatory regions in FLS (Additional file 5: Fig S1). We first used approximate conditional analyses implemented in genome-wide complex trait analysis (GCTA) [18] to dissect the previously identified RA risk loci [2]. Where lead SNPs at genomic loci mapped within 1 Mb of each other, the loci were merged (Additional file 6: Table S3). We identified 73 distinct signals of association with RA at locus-wide significance (p < 10 −5 ), with each signal being potentially driven by different underlying causal variants (Additional file 7: Table  S4). For each signal, we performed fine mapping to derive credible SNP sets that together account for ≥ 99% of the posterior probability of causality for the RA association. Across all 73 signals, the RA credible SNP sets included a total of 8787 variants, of which 2654 variants had posterior probability of causality > 0.01% (Additional file 8: Table S5; Additional file 5: Fig S1).
We then overlapped the 2654 RA credible SNPs with the FLS epigenome and identified 274 SNPs within 23 associated signals mapping to active DNA regulatory elements in FLS (Fig. 4b). We also calculated the total posterior probability across the credible SNP sets found within active DNA regulatory elements for 111 primary cell types and tissues, whose epigenomes were published by the Roadmap Epigenomics Mapping Consortium [11] (Fig. 4b, Additional file 5: Fig S2). As expected, several credible SNP sets exhibited high posterior probability in active DNA regulatory elements from B and T  Table S5 column R). Intriguingly, we identified several credible SNP sets that were active in FLS only, but not in B and T cells (n = 9 signals; Fig. 4b, Additional file 8: Table S5 column R). Based on our genetic fine-mapping analysis, we assigned the association signals to three categories (Additional file 8: Table S5 column K, Table 1). First, well characterized signals ("category 1," n = 19), where the credible set included ten or fewer SNPs or ≤ 3 SNPs contributing > 80% of the posterior probability of causality. Second, localized associated signals ("category 2", n = 26), where the credible set included ≤ 20 SNPs with similar low posterior probabilities (10-20%) or the lead credible SNP accounted for > 20% of the posterior probability. Third, poorly characterized signals ("category 3," n = 28), where genetic fine mapping was largely ineffective, resulting in large (> 20) credible SNP sets with equally negligible posterior probabilities (< 5%) (Additional file 8: Table S5 column K, Table 1).
By mapping the credible SNP sets to the annotated active promoters and enhancers in T cells, B cells, and FLS, we further refined nine of 19 category 1 loci to ≤ 3 credible SNPs in active enhancers in either immune cells (n = 5 signals), FLS (n = 1 signal), or both (n = 3 signals) ( Table 1, Additional file 8: Table S5 columns L, M, N). Similarly, we narrowed down the number of putative causal SNPs to ≤ 3 for 18 of the 26 category 2 signals, after mapping enhancer marks to the credible set SNPs in immune cells (n = 7 signals), FLS (n = 3 signals), or both (n = 8 signals) ( Table 1, Additional file 8: Table  S5 columns L, M, N). Thus, by integrating genetic fine mapping with functional chromatin annotation in immune cells and FLS, we identified 27 association signals (37%) that harbor ≤ 3 putative causal RA risk variants having high posterior probabilities and mapping to cell type-specific active enhancers. Examples of the functional genome organization at category signals 1-3 in FLS are shown in Additional file 5: Fig S3-S5.
Integrative analysis of genetic, expression, and epigenetic datasets links putative causal genes and cell types We then used our genetic fine mapping and epigenetic datasets to determine candidate effector genes (proximal and interacting in FLS/immune cell types) and their expression in relevant cell types.
In total, 9 of the 73 signals were assigned exclusively to FLS, with 2 further signals assigned to FLS and B cells, and 12 to all three analyzed cell types based on SNPs in cell-type-specific enhancers (Table 2, Additional file 8: Table S5 column based on O, P, Q, labelled in column R). To assign putative target genes to the association signals in FLS, we identified significant CHiC interactions between the regions containing a (See figure on previous page.) Fig. 4 Heritability and causal SNPs in FLS. a Partitioned heritability (h2) of RA attributed to active regions in each sample of FLS (n = 7) and 111 available Roadmap cell types/tissues (Epigenomics Mapping Roadmap Consortium [11]). b The sum of posterior probability overlapping active DNA regulatory elements across blood and T cell samples (Epigenomics Mapping Roadmap Consortium; green bar), FLS samples (red bar), human stem cells (HCS), and B cells (Epigenomics Mapping Roadmap Consortium; blue bar) at each of the 73 sites. Active DNA regulatory elements were defined as the union of H3K4me3, H3K4me1, and H3K27ac marks Table 1 Identified risk loci for RA with less than three credible SNPs and the cell type in which they are active  Table 1 Identified risk loci for RA with less than three credible SNPs and the cell type in which they are active  Table 1 Identified risk loci for RA with less than three credible SNPs and the cell type in which they are active    credible SNP set (CHiC baits, see details in "Methods") and a gene promoter. We defined gene promoters by downloading all transcripts from Ensembl (version 98) and assigning a 1000-base pair window directly upstream of each transcript as a promoter.
In total, we determined 220,000 promoters for 57,602 genes, including noncoding RNA. Across the 73 signals of association, gene target assignments yielded a total of 228 and 227 interacting, expressed FLS target genes in basal and TNF-stimulated conditions, respectively, with 188 gene targets shared between the conditions (Additional file 8: Table S5 columns W and X). Credible SNPs in category 1 and 2 signals found predominantly in FLS implicated genes including GRHL2, MYCBP, and RUNX1 ( Table 2, Additional file 5: Figure S3). FLS-assigned genes that were associated with category 3 association signals, which showed negligible posterior probability (< 2%) in immune enhancer SNPs, included SPRED2, RCAN1, CDK6, and RBPJ ( Table 2, Additional file 5: Figure S5). Notably, the 24 credible SNPs in the RBPJ association signal and the 41 credible SNPs in the CDK6 association signal were reduced to just six and three SNPs, respectively, mapping to FLS-specific enhancers. The RBPJ SNPs were localized in FLS-specific enhancers, with none found in T or B cells (Additional file 8: Table S5, rs11933540). This indicated that the putative causal SNPs in the RBPJ association signal might specifically affect the function of FLS in RA.
We then integrated the credible set SNPs with our previously established CHiC dataset from B cell (GM12878) and T cell (Jurkat) lines [12,19]. We found that the RA credible sets assigned to immune cell types associated with genes that are vital in T and B cellspecific activities (Table 3, Additional file 8: Table S5 columns AA to AD). Genes in category 1 and 2 signals, which associated with active immune cells enhancer regions, included CTLA4, IL2RA, and GATA3 for T cells and BLK for B cells (Table 3, Additional file 8: Table S5 columns AA to AD). Of note, the ANKRD55/IL6ST locus (rs7731626 in Additional file 8: Table S5) had a single SNP in the credible set, an eQTL with both ANKRD55 and IL6ST [20] confined to an enhancer exclusive to T cells in our analysis. Immune cell-assigned genes from category 3, where credible SNPs in immune enhancers accounted for > 30% of the posterior probability, but had negligible posterior probability (< 5%) in FLS enhancers, included STAT4, CXCR5, CD28, and MYC.
These analyses highlighted a number of SNP-enhancer-gene combinations that could be assigned to an immune cell or fibroblast-driven risk of developing RA. We were able to assign > 60% of the non-HLA RA association signals with a putative causal cell type (FLS, B cells, T cells) and putative causal gene (Additional file 8: Table S5 column R not "none"). Compared to previous gene assignment results [2], our method provides empirical evidence for an additional 104 RA-associated genes at the 73 European association signals.
TNF-induced alterations in 3D chromatin structure assign additional RA risk genes to FLS At 17 of the 73 associated signals, we observed a change in chromatin interactions in stimulated FLS, which were linked to 35 genes (Table 4). RNA-seq showed that the expression of 17 of the 35 genes was increased upon TNF stimulation in FLS (FDR < 0.05) (e.g., TRAF1, TNFAIP3, IFNAR2) ( Table 4, Additional file 5: Fig S6). Nine of the 35 genes were downregulated after TNF in FLS (FDR < 0.05) (e.g., RBPJ and RNF41) ( Table 4).  Since TADs have been shown to define probable limits of gene regulation, we also overlapped TADs with the association signals. We observed that each credible SNP set is usually found within one or two adjacent TADs. We then Genes with a log2fold change > ± 0.5 and padj < 0.05 are marked in bold. lfcSE standard error for log2 fold change, stat statistic value for the null hypothesis, padj p value adjusted for multiple testing using Benjamini-Hochberg examined the genes within these TADs in basal and stimulated FLS and found that alterations in TAD boundaries after stimulation led to different genes being overlapped by associated TADs. Genes found within stimulation-specific TADs included TNFAIP3, JAZF1, ZFP36L1, INFGR1, and LBH. Several of these genes, including TNFAIP3, JAZF1, IFNGR1, and LBH, also showed differential gene expression between basal and stimulated states ( Table 5, Additional file 8: Table S5 columns AH and AI). The correlated change in chromatin structure, interaction strength of RA implicated regions, and gene expression upon stimulation demonstrated how these loci are dynamic and active in FLS and suggests that RA-associated variants could affect the transcriptional response to TNF in FLS. The intergenic region on chromosome 6q23 between OLIG3 and TNFAIP3, which contains eight credible SNPs (rs17264332 in Additional file 8: Table S5), was dynamically linked to the TNFAIP3 gene through DNA accessibility, chromatin interactions, and gene expression. The organization of this genomic region changed from a closed, inactive (compartment B) to an open, active chromatin conformation (compartment A) upon TNF stimulation of FLS (Fig. 5a), and TAD boundary strength increased in TNF-stimulated FLS (Table 5). These substantial alterations to the chromatin organization coincided with a strong increase in the expression of the interacting TNFAIP3 gene in FLS (Fig. 5a, Table 4, Additional file 5: Fig S6).
Similarly, we demonstrated stimulation-induced changes in chromatin activity in the IFNGR2 region (rs73194058, Additional file 8: Table S5) in FLS. Our CHiC analysis showed that the credible set SNPs in this region interacted with several genes relevant to the interferon (IFN) pathway, such as IFNAR2, IL10RB, IFNA R1, and IFNGR2 (Additional file 8: Table S5 columns W to Z and Fig. 5b). TNF stimulation of FLS induced dynamic changes in chromatin interactions at this locus and increased the expression of IFNAR2, IFNAR1, and IFNGR2 (Additional file 5: Fig S1, Table 4, Fig. 5b). Additionally, chromatin accessibility in the region of IFNAR2 changed from an inactive B to an active A compartment in stimulated FLS (Fig. 5b).
The TNFAIP3/IFNGR1 region on chromosome 6 and the IFNAR1/IFNGR2 region on chromosome 21 interacted with genes encoding five subunits of the IFN I/III receptors in FLS (Fig. 5a,b), suggesting a close genetic link between FLS function and IFN response in RA.

RA functions
To predict biological processes influenced by potential transcriptional effects of risk variants active in FLS, we conducted analyses to predict protein-protein interaction, pathway enrichment, and functional annotation clustering. For these analyses, we included all target genes of the RA association signals that were assigned to FLS as a causal cell type ("All" and/or "FLS" in column R of Additional file 8: Table S5).
We found significantly enriched protein-protein interactions for the genes in the loci active in FLS by using STRING protein-protein interaction networks (PPI enrichment  Table 4). b The genomic IFNGR2 region of the credible SNP set on chromosome 21 (rs73194058 in Additional file 8: Table S5, red arrow) interacted with several nearby genes. These interactions were further enhanced by TNF stimulation (see also Table 4). Chromatin at the IFNAR2 gene locus changed from a closed (light red bar) to open (blue bar) state in stimulated FLS. RNAseq tracks show one randomly chosen RA hand, RA knee, and RA shoulder sample with (black) or without (gray) TNF stimulation p value < 1.0e−016; Fig. 6a) and identified additional functional connections between the assigned genes by literature search. For instance, ZFP36L, CDK6, and RUNX1 were all assigned to signals active in RA (Additional file 8: Table S5 column R), are functionally connected, and regulate cell proliferation. CD40 (rs4239702 in Additional file 8: Table S5), RBPJ (rs11933540 in Additional file 8: Table S5), and TRAF1 (rs10985070 in Additional file 8: Table S5) may constitute another genetically influenced interlinked functional network in FLS.
Gene Ontology (GO) molecular function analysis (Fig. 6b) and functional annotation clustering of enriched pathways with the genes associated with credible set SNPs in Fig. 6 Predicted functional networks of genes that were associated with SNPs active in FLS. a A proteinprotein interaction network was established using STRING with default settings (medium confidence). The obtained network had more interactions than expected by chance with a protein-protein-interaction enrichment p value of 1.28e−08. The thickness of the lines indicates the strength of data support. Colors, distances, and location on the map were assigned randomly. b Functional enrichment of genes interacting with SNPs active in FLS was detected using ToppFun in default settings. Significant terms for GO molecular function are shown. FDR = false discovery rate; B&H = Benjamini-Hochberg ; B&Y = Benjamini-Yekutieli FLS (Additional file 8: Table S5 column R and columns U-Z) revealed several clusters highly relevant to RA pathogenesis. These clusters included enrichment of genes involved in IFN response and viral defence (IFNAR1, IFNAR2, CD40, IFNGR, C5, IL-10RB) (Database for Annotation, Visualization and Integrated Discovery DAVID [21,22] enrichment score 1.22), as well as lipid metabolism and fatty acid synthesis (FADS1-3, ACOX2, LCLAT1, JAZF1, DAGLA) (DAVID enrichment score 1.91). In addition, "cilium morphogenesis" emerged as an enriched term (DAVID enrichment score 0.87) and several genes associated with RA risk SNPs in FLS were connected to the formation of the primary cilium (C5orf30, GSN, TMEM138, TMEM216, CNTRL, INCENP, ACTR2).
Overall, by integrating epigenetic and transcriptional data in FLS, we identified several functional relationships among RA risk variants and their target genes active in FLS. The multi-level effects of RA risk variants on key signalling pathways may contribute to the accumulated genetic risk in driving FLS activation and proliferation in RA.

RA risk SNPs in the RBPJ enhancer region confer joint-specific genetic effects in FLS
Our epigenetic and functional analyses of the RBPJ association signal identified RBPJ as a candidate causal and functional gene in FLS (rs11933540 in Additional file 8: Table  S5). Mapping of the 24 credible SNPs to FLS enhancers in the RBPJ association signal reduced the number of likely causal SNPs to six, which lie in active chromatin in FLS (Additional file 8: Table S5 rs11933540). To confirm enhancer activity of these regions, we selected three SNPs in three different regions within the RBPJ association signal (rs7441808, rs35944082, rs874040, Fig. 7a). We cloned oligonucleotides (31 bp) with the respective risk and non-risk variants in the middle into luciferase promoter vectors and transfected them into a human fibroblast cell line (HT1080). All three regions showed enhancer activity; however, luciferase activity was similar for both alleles (Fig.  7b). For rs874040, chromatin conformation analysis showed direct interactions with the RBPJ gene (Fig. 7a). To functionally establish that the rs874040-containing enhancer region can regulate the expression of RBPJ, we transduced FLS with lentiviral particles containing dCas9-VPR and two guide RNAs (g9 or g12) targeting the rs874040containing enhancer region (Additional file 5: Fig S7). FLS transduced with the activating dCas9-VPR and guide RNAs increased the expression of RBPJ compared to FLS transduced with the respective guide RNAs without dCas9-VPR (Fig. 7c). Even though the upregulation of RBPJ expression was modest (30%), which could be due to enhancer redundancy in this region and is consistent with previous data showing the limited efficiency of enhancer activation with the dCas9-VPR system [23], this experiment verified the regulation of RBPJ expression by the rs874040-containing enhancer region.
FLS homozygous for the risk allele of rs874040 exhibited lower expression of RBPJ mRNA compared to FLS with the wild-type variant. This effect was, however, present only in FLS from upper extremity joints, and not from lower extremity joints (Fig. 7d). It is known that RBPJ binds to the promoter of HES1 and represses its transcription [24]. Accordingly, the expression of HES1 was increased in FLS from patients homozygous for rs874040 in upper extremity joints (Fig. 7e). TNF stimulation significantly downregulated RBPJ mRNA expression in FLS (Table 4), and FLS from RA patients expressed less RBPJ than FLS from patients with arthralgia (Fig. 7f). These data indicate Fig. 7 RBPJ expression in FLS is affected by genotype and disease. a Fine mapping, epigenetic and chromatin conformation analyses at the RBPJ locus. Black arrows indicate rs7441808, rs35944082, and rs874040, which were selected for further analysis. b Luciferase reporter assays showing relative enhancer activity of oligonucleotides containing risk (gray) and wild-type variants (black) of rs7441808, rs35944082, and rs874040 compared to empty vectors (set to 1). One sample t test. c RBPJ expression in FLS transduced with VP64-p65-Rta dCas9 (VPR) and two different guide RNAs (g9 and g12) targeting the genomic region around chr4:26106575 (rs87040). RBPJ expression was normalized to FLS that were transduced with respective guide RNAs but not VPR-dCas9 (set to 1). -deltaCt = cycle of threshold of RBPJ expression-cycle of threshold RPLP0. One sample t test. d RBPJ expression in FLS isolated from individuals homozygous for rs874040 in the locus near the RBPJ gene (0), heterozygous (1), or homozygous for the wild-type variant (2). Upper extremity joints included joints of the hand, elbows, and shoulders; lower extremity joints included hips, knees, and joints of the feet. One-way ANOVA. e Expression of HES1 in the same FLS cohort. One-way ANOVA. f RBPJ expression in individuals with joint pain, but no histological signs of arthritis (arthralgia), OA and RA. One-way ANOVA with Bonferroni correction that genetic predisposition and a pro-inflammatory environment can affect RBPJ expression in FLS, which might lead to increased activation of the Notch signalling pathway via HES1.
To explain the joint-specific effect of rs874040, we explored the enhancer landscape and the chromatin interactions in different upper and lower extremity joints. CAGEseq data showed that the enhancer activity within the RBPJ locus is higher in knee FLS compared to shoulder and hand FLS (Fig. 8a). ATAC-seq peaks largely overlapped with CAGE-seq enhancer signals, being more abundant in knee FLS than in shoulder or hand FLS (Fig. 8b). Shoulder FLS appeared to mainly use an upstream enhancer that interacted with the RBPJ risk locus (green boxes, Fig. 8a-c). Overlap of ATAC-seq and , and shoulders (n = 2) and representative depiction of ChromHMM regulatory regions in FLS from knees (n = 3), MCP joints (n = 2), and shoulders (n = 2). Green box highlights main enhancer in shoulders. c Chromatin interactions in FLS from knees (n = 2), MCP joints (n = 2), and shoulders (n = 2) as measured by CHiC. Green box highlights interaction of the shoulder enhancer with the risk locus CAGE-seq analyses was weaker in hand FLS, but CAGE-seq data indicated that hand FLS used an enhancer within the risk locus (red box, Fig. 8a). Additionally, chromatin interactions within the locus were generally weaker in hand FLS (Fig. 8c). Knee FLS activated several enhancers (Figs. 8a,b) and exhibited strong chromatin interactions across the locus (Fig. 8c). We analyzed DNA-binding motifs in the enhancer overlapping the RA risk locus spanning chr4:26090045-26090465 (hg19) (red box in Fig. 8a) by using the JASPAR2020 database [25]. This enhancer contained TFBS for different HOX transcription factors (HOXA6, HOXA7, HOXA10, HOXB2, HOXB6, HOXB7, HOXB13,  HOXD3, HOXD9, HOXD13), similar to the DNA motifs identified in open chromatin at repressed genes after TNF stimulation in FLS (Fig. 3d), which are expressed in a joint-specific manner in FLS [26].
Together, these data suggest that joint-specific differences in chromatin interactions and enhancer usage could underlie the joint-specific effects of rs874040 on RBPJ expression in upper extremity joints. This illustrates that RA genetic risk can be different between the joints, thereby shaping a specific pattern of joint involvement in RA.

Discussion
Deciphering the role of causal genetic variants underlying GWAS loci in RA, albeit challenging, provides an unbiased strategy to understand the core disease pathways and guide drug discovery [2,27]. Here we demonstrate that a significant proportion of the 73 European ancestry non-HLA RA association signals contain disease-associated variants that are located within active regulatory DNA elements in FLS. Linking these DNA regions with target genes indicates genes and biological pathways that trigger RA susceptibility by stromal cell activation in the joint. Thus, we provide for the first time substantial evidence for an independent, causal role of FLS in RA genetic susceptibility and pathogenesis.
As a first step in our analysis, we created a comprehensive map of the epigenetic landscape of FLS with and without TNF stimulation and assessed key regulators of the transcriptional response of FLS to TNF. By increasing the resolution of our measurements from A/B compartments to TADs and chromatin interactions, we found substantial changes in 3D structure of the FLS genome upon TNF stimulation. We confirmed AP-1, which is intimately linked to the pathogenesis of RA [28], as major transcription factor regulating changes in gene expression of FLS after TNF. AP-1 binding sites were enriched at enhancer sites of genes with increasing as well as repressed gene expression in TNF-stimulated FLS. This is in line with findings that different subunits of the AP-1 family form homo-and heterodimeric transcription factor complexes with distinct activating and repressing functions [29]. Additionally, our data suggest that BACH2 may play a notable role in regulating the TNF response of FLS in RA. Like AP-1, BACH2 belongs to the basic region leucine zipper (bZIP) family, but has a slightly different DNA sequence binding site (TGCTGAGTCA) and has a bric-a-bractramtrack-broad-complex (BTB) domain, which specifically interacts with co-repressors to repress transcription [30]. BACH2 is a highly conserved repressor with a central function in terminal differentiation, maturation, and activity of B and T cells [31]. Intronic SNPs within the BACH2 gene have been associated with the risk of different immune-mediated diseases, including RA [32,33]. Furthermore, de novo motif discovery indicated a potential role for developmental transcription factors of the homeobox and forkhead box protein families in transcriptional repression of genes in TNFstimulated FLS. Since some of these transcription factors are exclusively expressed in FLS at distal joint locations (HOXA13, HOXD13) [8], this suggests that TNF responses of FLS may be different at specific joint locations. However, de novo DNA motif discovery algorithms have some pitfalls. Due to the non-random nature of the genomic sequence, the rate of false positives is high and partial overlap of enriched motifs with known transcription factor binding motifs may be coincidental. Thus, these results should be considered with caution.
With our approach, we were able to assign RA association signals to immune and/or stromal cells. Genes implicated in T cells, but not in FLS, showed a pattern of involvement in "canonical" T cell immunity, including CTLA4, CD28, IL2RA, and GATA3. Similarly, genes enriched in B cell-specific enhancers were involved in B cell biology, including IRF8, BLK, and TAB1. The stromal activation observed in RA joints was clearly reflected in the predicted function of the identified FLS-specific regulatory variants, many of which were previously associated with RA pathogenesis and are connected by functional networks. For example, several genes linked to RA credible SNPs in FLS were implicated in cell proliferation and tumor development (e.g., SPRED2 [34], GRHL2 [35], CDK6 [36], RUNX1 [37], and ZFP36L [38]). The transcription factor ZFP36L negatively regulates the expression of CDK6 [39] by binding to the 3′UTR region of the CDK6 gene, which contains the credible set SNPs at this locus. CDK6 in turn interferes with DNA binding of Runx1 [40]. Furthermore, CD40 activation in FLS increased the expression of several cytokines relevant in RA, including VEGF and RANKL [41,42]. RBPJ, a regulatory transcription factor of the Notch signalling pathway, has been shown to repress the activation of CD40 [43]. Similarly, TRAF1 can negatively regulate CD40 activity [44].
Most notably, the association signals on chromosome 6 (TNFAIP3/IFNGR1) and chromosome 2 (IFNAR/IFNGR2) could critically impact the contribution of FLS to the development of RA. Chromatin interactions in these regions connected RA risk variants with several genes encoding the subunits of type I (IFNAR1, IFNAR2), type II (IFNGR1, IFNGR2), and type III (IL-10RB) interferon receptors in FLS. Furthermore, they tightly linked the IFN response to TNF stimulation by interconnection with the TNFAIP3 gene, encoding the TNF signalling repressor A20, and by their reaction to TNF stimulation. All three types of interferons signal via the JAK-STAT signalling pathway, which, along with TNF, is one of the central therapeutic targets in RA. IFN pathways are strongly associated with the pathogenesis of RA and IFN-responsive genes are induced in FLS upon stimulation with TNF [45]. A type I interferon gene signature is detectable in up to two thirds of patients with RA [46], and it associates with an increased risk of developing RA as well as with therapeutic response to biological DMARDs like TNF inhibitors [47,48]. In FLS, TNF induces an extensive interferon gene response via secondary autocrine production of IFNβ and the activation of the IRF1-IFNβ-IFNAR-JAK-STAT1 axis [49,50]. Down syndrome (trisomy 21) leads to increased dosage of the IFN receptors encoded on chromosome 21, which results in a type I interferon gene signature with constant activation of interferon pathways in fibroblasts [51]. Notably, people with Down syndrome are at increased risk of developing erosive, inflammatory seronegative arthritis of their hands and wrists [52]. Together, this strongly suggests a causal role for stromal activation of IFN pathways in the development of RA.
We showed that RA risk allele rs874040 is associated with reduced expression of RBPJ in FLS in a location-specific manner. RBPJ, also called CBF1 or CSL, is a key transcriptional regulator of the Notch signalling pathway [53]. In the absence of Notch signalling, RBPJ represses Notch target genes (e.g., HES1). Upon activation of Notch signalling, RBPJ binds to the intracellular domain of the Notch receptor and enhances Notch-dependent gene expression. Loss of RBPJ leads to activation of dermal fibroblasts and promotes their transformation into cancer-associated fibroblasts (CAFs), which play a crucial role in tumor development and growth [54,55]. Activation of Notch signalling was shown in RA FLS and induced FLS proliferation [56]. Furthermore, Notch signalling is critical for shaping the synovial environment by guiding the development of THY1+ sublining FLS, a subset of FLS that is expanded in RA synovial tissues [57]. Constitutive lower levels of RBPJ in FLS from individuals carrying the RBPJ risk variant could favor synovial enrichment of THY1+ sublining FLS, which are considered critical for the development of RA. Joint-specific differences in the chromatin landscape in this locus exemplify how genetic risk could result in the specific patterns of joint involvement that typically occur in chronic inflammatory joint diseases. Additionally, joint-specific expression of HOX transcription factors [8], for which we suggest a role in gene repression after TNF stimulation in FLS, could contribute to jointspecific differences in the susceptibility to RA.
Further pathways that we found enriched in genes targeted by RA credible SNPs were connected to lipid metabolism and the primary cilium. FADS1 and 2 have been implicated in the production of anti-inflammatory unsaturated fatty acids in LPS-treated macrophages, contributing to the resolution phase of LPS-driven inflammatory response in macrophages [58]. Changes in the lipid metabolism have been suggested in RA FLS [59], but specific functional data does not exist so far. The primary cilium serves as a hub for several cell signalling pathways, e.g., Notch [60] and wnt signalling [61]. In FLS, it was shown that TNFR1 and TNFR2 localize to the cilium pit [62]. The cilium connected proteins C5orf30 and GSN that we found interacting with RA risk variants in FLS were previously shown to be negative regulators of arthritis in mice [63,64]. Another ciliary protein, SPAG16, was found to be a genetic risk factor for joint damage progression in RA patients, increasing the production of matrixmetalloproteinases in FLS [65]. Future studies are required to demonstrate how changes in lipid metabolism and primary cilium affect the function of FLS and influence RA pathogenesis.
With our approach, we could connect several RA association signals with potential pathogenic genes and pathways in RA FLS. However, our data does not allow any conclusion on differences in the chromatin landscape that might exist between healthy FLS and RA FLS as previously shown [66]. Therefore, we cannot exclude the possibility that there are differences in chromatin interactions and open chromatin between healthy and RA FLS that we cannot detect in our study and that would affect our results. However, the epigenetic landscape in RA FLS might already be changed before the onset of the disease, as shown for changes in DNA methylation [67] and this might be one reason why these genetic risk factors trigger the disease in some people but not in others. To address this, longitudinal studies analyzing the epigenetic landscape at different stages of disease development are needed. Another limitation of our study is the low samples size used in most experiments, which again is due to the more difficult accessibility of synovial tissues compared to, e.g., blood. The statistical analysis must therefore be interpreted with caution.

Conclusions
Overall, our research significantly advances the knowledge about putative causal SNPs, enhancers, genes, and cell types affected by genetic risk loci in RA. Our analysis can direct future studies to investigate pathways that are genetically affected in a cell-typespecific way. This will ultimately enable the connection of an individual's genetic risk with the causal pathways and cell types that drive disease, paving the way to stratified treatment decisions and precision medicine.

Patients and cell culture
Synovial tissues were obtained from OA and RA patients undergoing joint replacement surgery at the Schulthess Clinic Zurich, Switzerland. Patient's characteristics are described in Additional file 1: Table S1. RA patients fulfilled the 2010 ACR/EULAR (American College of Rheumatology/European League Against Rheumatism) criteria for the classification of RA [68]. Samples from patients with joint pain without inflammation or cartilage destruction (healthy, 3 male/3 female, mean age 39, range  were obtained from the Queen Elizabeth Hospital in Birmingham, UK. Synovial tissues were digested with dispase (37°C, 1 h) and FLS were cultured in Dulbecco's modified Eagle's medium (DMEM; Life Technologies) supplemented with 10% fetal calf serum (FCS), 50 U ml −1 penicillin/streptomycin, 2 mM L-glutamine, 10 mM HEPES, and 0.2% amphotericin B (all from Life Technologies). Purity of FLS cultures was confirmed by flow cytometry showing the presence of the fibroblast surface marker CD90 (Thy-1) and the absence of leukocytes (CD45), macrophages (CD14; CD68), T lymphocytes (CD3), B lymphocytes (CD19), and endothelial cells (CD31). Cell cultures were negative for mycoplasma contamination as assessed by MycoAlert mycoplasma detection kit (Lonza). FLS were used between passages 4-8. Information on the assays performed on each sample is given in Additional file 2: Table S2.

RNA sequencing
RNA sequencing data from unstimulated samples (Additional file 2: Table S2) was retrieved from the European Nucleotide Archive (ENA) with the primary accession code PRJEB14422. A detailed description of sample preparation and sequencing procedures is given in Frank-Bertoncelj et al. [8]. For RNA sequencing of TNFstimulated FLS, cultured FLS were treated with 10 ng/ml human recombinant TNF (Roche) for 24 h or were left untreated. Total RNA was isolated using the RNeasy Mini kit (Qiagen) including on-column DNAase I digestion. Part of the libraries (n = 12) were prepared using the NEB Next Ultra Directional RNA-seq protocol with ribosomal depletion and were sequenced using Illumina HiSeq4000 with 75 bp paired end reads. The additional libraries (n = 20) were generated using the Illumina TruSeq Stranded total RNA protocol with the TruSeq Stranded total RNA Sample Preparation Kit and were sequenced using Illumina Novaseq 6000. All Fastq-files were mapped to hg19 and sequence reads assigned to genomic features using STAR [69] and featureCounts [70], respectively. We used svaseq R [71] package (version 3.36.0) to find and remove hidden batch effects. Differential gene expression analysis was performed with DESeq2 [72] R package (version 1.28.1) according to standard protocol. The normalization was performed as part of the standardized DESeq2 workflow (applying the concept of variance stabilizing transformations (VST) [73,74].

ChromHMM chromatin state inference
The de-duplicated, uniquely mapping reads of the ChIP sequencing were binarized with the BinarizeBam script provided by the chromHMM software [13]. This script splits the genome into 200 bp bins and compares the coverage of the alignment file at each bin with the input sequence file to determine if any histone modification is present in the bin (1 = yes, 0 = no). The pre-trained 18 state chromHMM model based on the six histone marks was applied to the binarized bed files, using the MakeSegmentation script provided and the model parameters downloaded from the Roadmap Epigenomics web portal. The methods employed by Ernst et al. [13] were replicated where possible from the data processing stages to the chromatin state inference.

ATAC sequencing
Cultured RA FLS were stimulated with 10 ng/ml TNF for 24 h or were left untreated (Additional file 2: Table S2). From each patient cell line, 50'000 cells were prepared according to the protocol by Buenrostro et al [76]. ATAC-seq libraries were sequenced on Illumina HiSeq 4000 with 75 bp paired end reads. The reads were QC'd with FastQC for read quality, and the Nextera-transposase adaptors were trimmed with cutadapt [77]. The reads were aligned with Bowtie 2 to the GRCh38 human reference. PCR duplicates were identified and removed by Picard Tools prior to peak calling using MACS2 [78]. Both broad and narrow peaks were called as ATAC-seq can have properties of both.

HiC and capture HiC
Cultured human FLS from RA patients were treated with 10 ng/ml TNF for 24 h or were left untreated (Additional file 2: Table S2). Cells (1-3 × 10 7 per condition), were scratched in 10 ml DMEM, spun down, suspended in 35 ml DMEM, and fixed (2% formaldehyde in DMEM, 10 min, RT, with mixing on a rocker). The reaction was quenched with cold 0.125 M glycine. Cells were incubated at RT for 5 min, followed by 15 min incubation on ice and centrifugation (1500 rpm, 10 min, 4°C). Pellets were suspended and washed in cold PBS (1500 rpm, 10 min, 4°C). Washed pellets were snap frozen and stored at − 80°C. HiC libraries from RA FLS samples were generated as previously described [14]. They were sequenced on Illumina HiSeq 4000 with 75 bp paired end reads. The reads were processed using the HiC Pro pipeline [79], and the correlation between samples were calculated with HiCrep [80].
TADs were called with TADCompare [15]. The regions targeted by the capture HiC (CHiC) were generated based on the LD regions of the lead disease-associated SNPs for RA, juvenile idiopathic arthritis (JIA), psoriatic arthritis (PsA), and psoriasis (Ps). This resulted in a total of 242 distinct risk variants. Then, 120 bp capture baits were designed for all HindIII digestion fragments overlapping these regions as previously described in Martin et al. [4]. Significant CHiC interactions were identified through the CHiCAGO pipeline [81], where the suggested threshold of CHiCAGO score > 5 was used. Differential interactions were identified with DESeq2, where the read counts of each interaction were treated similar to the gene count of RNA-seq.

Transcription factor binding site prediction
We extracted differentially interacting regions from our CHiC data, where the strength in chromatin interaction (log-fold change of read counts between basal and stimulated) correlated with nearby genes. We overlapped these interaction regions (bait and prey fragments) with our ATAC-seq peaks. These ATAC-seq peaks were standardized and re-centered to 200 bp each. We then used the findMotifsGenome.pl software from the HOMER suite [82] to identify significantly enriched motifs in these ATAC-peaks compared to random background sequences chosen by HOMER.

Partitioned heritability
We defined active chromatin regions of the genome for each FLS sample and publicly available Roadmap samples, based on the union of H3K27ac, H3K4me1, and H3K4me3 histone peaks. We used the partitioned heritability software from the LDSC [17] package to quantify the non-HLA RA heritability attributed to these active regions in each sample, based on the summary statistics from the Okada et al. trans-ethnic metaanalysis [2].

Derivation of RA credible set SNPs
For each locus, we dissected distinct RA association signals using approximate conditioning implemented in GCTA [18], based on (i) European ancestry summary statistics from the Okada et al. trans-ethnic meta-analysis [2]; and (ii) a reference panel of European ancestry haplotypes from the 1000 Genomes Project to approximate linkage disequilibrium between SNPs. We identified index SNPs for each distinct signal, at a locus-wide significance threshold of p < 10 −5 , using the --cojo-slct option. For each locus with multiple distinct signals, we derived the conditional association summary statistics for each distinct signal by conditioning out the effects of all other index SNPs at the locus using the --cojo-cond option.
For each distinct signal, we first calculated the posterior probability, π j , that the jth variant is driving the association, given by where the summation is over all variants at the locus. In this expression, Λ j is the approximate Bayes' factor [83] for the jth variant, given by where β j and V j denote the estimated allelic effect and corresponding variance from the European ancestry component of Okada et al. [2]. In loci with multiple distinct signals of association, summary statistics were obtained from the approximate conditional analysis. In loci with a single association signal, summary statistics were obtained from unconditional analysis. The parameter ω denotes the prior variance in allelic effects, taken here to be 0.04. The 99% credible set for each signal was then constructed by (i) ranking all variants according to their Bayes' factor, Λ j ; and (ii) including ranked variants until their cumulative posterior probability of driving the association attained or exceeded 0.99.

Pathway analysis and protein-protein interaction network
The genes assigned to FLS (Additional file 8: Table S5, column R) and listed in Additional file 8: Table S5, columns U-Z were analyzed by STRINGv11 (interactions settings to medium confidence levels) [84], ToppFun on ToppGene Suite [85], and DAVID v6.8 [21,22] with default settings.

Luciferase reporter assay
Single-stranded oligonucleotides corresponding to 31 nucleotide fragments of the human genome with the variant in the middle including a BamHI and SalI restriction site were purchased (Microsynth). Double-stranded oligonucleotides were generated by mixing equal amounts of complementary oligonucleotides and incubated in a thermocycler for 5 min at 95°C and then slowly cooled to room temperature (− 1°C/min). Double-stranded oligonucleotides were cloned downstream from the luciferase gene in the pGL3-promoter vector (Promega). 8 × 10 4 HT1080 cells were transfected with 1 μg of the pGL3-promoter vector together with 0.1 ng of the pRL-SV40 vector (Promega) using 1.5 μl of Lipofectamine 2000 (Invitrogen). After 18 h, firefly and renilla luciferase activity was measured using the Dual-Glo Luciferase Assay System (Promega). Firefly luciferase activity was corrected for renilla luciferase activity and the data were normalized to cells transfected with the empty pGL3-promoter vector.