Deep molecular profiling of immune cells in the BLUEPRINT Human Variation Panel
The analyses described in this study are based on the publicly available resource provided by the BLUEPRINT Human Variation Panel [24]. The resource contains genome-wide molecular profiles of CD14+CD16− classic monocytes, CD66b+CD16+ neutrophils, and CD4+CD45RA+ naïve T cells. These leukocyte types were chosen due to their important role in mediating immune cell processes, their relative abundance in peripheral blood, allowing for examination of multiple cellular traits, as well as the availability of experimental protocols to prepare cell populations of high purity (>95%). Monocytes and neutrophils are myeloid cells that share the same bone marrow-residing granulocyte-macrophage precursor cell. Monocytes migrate to sites of infection and differentiate into macrophages and dendritic cells to induce an immune response. As part of the innate immune system, neutrophils move within minutes to sites of infection during the acute phase of inflammation. Naïve T cells are lymphoid cells that are part of the adaptive immune system, representing mature helper T cells that have not yet recognized their cognate antigen.
Across an initial cohort of 200 healthy individuals representative of the UK population, purified preparations of these primary cells were probed for gene expression using total RNA sequencing (RNA-seq) and DNA methylation using Illumina Infinium HumanMethylation450 BeadChips (“450 K arrays”). Detailed information about the experimental and analytical strategies for quantifying these cellular traits are provided in the “Methods” section. Additional file 1: Figures S1, S2, and S3 give an overview of the data quality assessment of the gene expression and DNA methylation data sets. All individuals were further profiled for DNA sequence variation using whole-genome sequencing to allow for cell type-dependent, quantitative assessment of the genetic and epigenetic determinants of transcriptional variance [24].
In this study, we exploited this resource, selecting all 125 donors for whom matched gene expression and DNA methylation data sets were available across the three immune cell types. The key analytical advance of the work presented here concerns the measurement and interpretation of differential variability. That is, the identification of loci at which gene expression and DNA methylation levels show significantly greater variation within one cell type compared to the other cell types. An overview of the study design and analytical concept is provided in Fig. 1a.
Genome-wide patterns of differential gene expression variability across immune cell types
We first assessed inter-individual expression variability of 11,980 protein-coding, autosomal genes that showed robust expression in monocytes, neutrophils, and T cells (“Methods”). We applied an improved analytical approach for the assessment of differential variability (“Methods”), taking into account the strong negative correlation between mean gene expression levels and expression variability (Additional file 1: Figure S4).
Figure 1b gives an overview of the number of identified HVGs that are cell type-specific, shared between two of the studied immune cell types, or common to all three. Neutrophils were found to have the largest number of HVGs overall (n = 1862), as well as of cell type-specific HVGs (n = 1163). In contrast, we found only a small number of cell type-specific HVGs in monocytes and T cells (n = 14 and 3, respectively). In addition, we identified 271 genes that were highly variable across all three immune cell types using a rank-based approach (“Methods”). Mature neutrophils (as profiled here) show low proliferative capacity and reduced transcriptional and translational activity [25, 26]. The latter could potentially impede comparable assessment of differential variability if the relationship between variability and mean expression levels was not taken into account. Thus, using our analytical approach, we assessed and confirmed that overall reduced gene expression levels did not technically confound the observed increased variability of gene expression levels in neutrophils (Additional file 1: Figure S4).
We then aimed to replicate the detected HVG levels in an independent sample cohort. We retrieved a gene expression data set generated using Illumina Human HT-12 v4 Expression BeadChips consisting of CD16+ neutrophils derived from 101 healthy individuals; these donors were, on average, 34 years of age (range 19–66 years) and 50% were male [27]. Of the 11,023 gene probes assessed on the array platform, 6138 could be assigned to a corresponding gene identifier in our data set. First, we ranked all 11,980 genes analyzed in our study according to gene expression variability (EV) values from high to low. Then, we assessed the position of the top 100 genes with highest and lowest EV values from the independent validation data in this ranking to confirm that the variability patterns are consistent between the two data sets. Neutrophil-specific HVGs measured using RNA-seq were also found to be hypervariable using expression arrays in the independent cohort of healthy individuals (Fig. 1c, d).
In summary, we devised and assessed a novel method for the identification of differential gene expression variability. Overall, we found strongly increased variability of gene expression in neutrophils compared to monocytes and T cells and replicated the detected neutrophil-specific HVG patterns in an external cohort.
Biological significance of differentially variable genes across immune cell types
Next, we explored the characteristics of the identified HVGs. We performed ontology enrichment analysis of gene sets using the GOseq algorithm [28]. This method takes into account the effect of selection bias in RNA-seq data that can arise due to gene length differences [28]. Additional files 2 and 3 summarize the annotation data of all identified HVGs and observed gene ontology enrichment patterns, respectively.
Genes showing expression hypervariability across all three cell types were enriched in biological processes related to chemotaxis, migration, and exocytosis (Additional file 3). For neutrophil-specific HVGs, we found gene ontology enrichment in oxidoreductase activity and cellular processes related to virus response and parasitism (Additional file 3). Notable genes among those with hypervariable expression values were CD9 (Fig. 2a), CAPN2 (Fig. 2b), and FYN (Fig. 2c). CD9 showed increased variability across all three cell types. The gene encodes the CD9 antigen, a member of the tetraspanin family. It functions as cell surface protein that forms complexes with integrins to modulate cell adhesion and migration and mediate signal transduction [29, 30]. The neutrophil-specific HVGs CAPN2 and FYN encode a calcium-activated neutral protease involved in neutrophil chemotaxis [31] and a tyrosine-protein kinase implicated in intracellular signal transduction [32], respectively.
Taken together, functional enrichment of HVG sets revealed that many of the identified HVGs are involved in mediating immune-related processes. This suggests that neutrophils exhibit specific gene loci that are highly adaptable to external cues.
Determinants of inter-individual cell type-specific gene expression variability
Following the discovery and characterization of genes that present hypervariable expression levels between individuals, we next aimed to delineate potential sources of heterogeneity that can be associated with differences between individuals. We hypothesized that these sources mainly relate to genetic variation, age, sex, and lifestyle factors.
First, we determined the subset of cell type-specific HVGs that correlated with genetic variants. We retrieved gene sets with a local (cis) genetic component designated by expression quantitative trait locus (eQTL) and variance decomposition analyses, as described in the BLUEPRINT Human Variation Panel (Additional file 1: Figure S5a). In neutrophils, we found that 638 of the 1163 cell-specific HVGs (55%) associate with cis genetic variants (Additional file 2), at least partly explaining the observed gene expression variability. These data are consistent with previous reports, highlighting the role of genetic variants in mediating transcriptional variance [33–35].
Second, we correlated cell type-specific HVGs with various quantitative traits measured in individual donors: demographic information (age, body mass index, and alcohol consumption); cellular parameters as assessed by a Sysmex hematology analyzer (e.g., cell count and size); and season (i.e., minimum/maximum temperature and daylight hours of the day on which blood was drawn). The results of this analysis are provided in Additional files 2 and 4. In neutrophils, we identified 49 HVGs that show significant association with at least one of the measured traits (Fig. 2d). For example, we found NFX1, a nuclear transcription factor that regulates HLA-DRA gene transcription [36], to associate with neutrophil granularity (Fig. 2e). An increase in neutrophil granularity can be reflective of a potential infection; this parameter is routinely monitored in a clinical setting. FYN gene levels (reported above) were negatively correlated with neutrophil percentage (Fig. 2f).
Third, we investigated whether sex was an important source of inter-individual (autosomal) gene expression variability. We found only two of the 1163 neutrophil-specific HVGs, SEPT4 and TMEM63C, to be differentially expressed between sexes (Additional file 1: Figure S6a), and high expression variability was observed for both sexes in these genes. However, in neutrophils we identified a surprisingly large number of sex-specific differentially expressed genes of small effect size, which corresponded to important immune cell functions. We present a detailed analysis of these genes in the “Sex-specific differential gene expression across immune cell types” section.
In conclusion, we found that genetic makeup is an important determinant of transcriptional variability. Donor demographic and lifestyle factors also contributed towards transcriptional variability.
Neutrophil-specific hypervariable genes not mediated by cis genetic effects
Next, we studied in detail the subset of neutrophil-specific genes that showed hypervariable expression but did not associate with local genetic variants (n = 525). Although some of these genes could be mediated by distal (trans) genetic factors not detected in the BLUEPRINT Human Variation Panel, it is conceivable that expression heterogeneity of this gene set was primarily due to external triggers or stochastic fluctuations.
We generated a correlation matrix of expression levels of the 525 HVGs and identified clusters of correlated genes that may act in concert or be co-regulated. The identified co-expression network contained 259 connected genes and consisted of three distinct gene modules (Fig. 3). We inferred biological functions corresponding to the three gene modules. All modules were highly enriched for genes with important immune-related functions.
The first and largest gene module (n = 105 genes, green in Fig. 3) showed enrichment for inclusion body, receptor signaling, and immune response activation. The second module (n = 78 genes, yellow) was enriched in biological processes related to RNA processing and chaperone binding. The third gene module (n = 33 genes, red), contained many genes with particularly high variation in their expression patterns. RSAD2, an interferon-inducible antiviral protein, showed the highest variability among many other interferon-inducible genes present in module three. These genes are essential in innate immune response to viral infections [37]. Gene ontology and pathway analyses of all genes in the network module further showed a strong enrichment for response to type I interferon and several viral disease pathways, including influenza A, herpes simplex, and hepatitis (Additional file 1: Figure S7). A detailed functional annotation of all three network modules is provided in Additional file 5.
Sex-specific differential gene expression across immune cell types
In our analysis, we only detected differences in mean gene expression levels between male and female donors with log-fold change ≥1, for 21 genes in neutrophils, two of which were also found to be HVGs in neutrophils (Additional file 1: Figure S6a). Nonetheless, when no minimum log-fold change criterion was applied, we found that sex-dependent mean expression of autosomal genes (Additional file 1: Figure S6b) was highly abundant in neutrophils (n = 3357 genes) compared to T cells (n = 895) and monocytes (n = 64).
As many autoimmune diseases have a higher incidence in females, and females show generally elevated immune responses compared to males [38], we hypothesized that genes with elevated gene expression levels in females may account for the increased incidence rates. Indeed, genes with higher mean expression levels in neutrophils derived from females (n = 682) were enriched in immune response and related pathways (Additional file 6). In contrast, genes with increased mean expression in male donors (n = 2675) were enriched in basic cellular processes, such as RNA processing and translation (Additional file 6). In addition, in male donors, genes were strongly enriched in cellular compartments, such as nuclear lumen (Additional file 6).
Genome-wide patterns of differential DNA methylation variability across immune cell types
Following the analyses of differential gene expression variability, we then applied our improved analytical approach to determine the inter-individual variability of DNA methylation levels at 440,905 CpG sites (“Methods”). Again, our method accounted for confounding effects due to the correlation between mean and variability measurements (Additional file 1: Figure S8).
Concordant with our findings for gene expression variability (Fig. 1b), we found that neutrophils had the largest number of hypervariable CpG positions (HVPs) overall (n = 1053), as well as cell-specific HVPs (n = 261). Neutrophils and monocytes shared a considerable number of HVPs (n = 380) in contrast to T cells (Fig. 1e). Finally, we identified 212 HVPs common to all three cell types. An overview of the number of HVPs is shown in Fig. 1e.
Following the discovery of HVPs, we examined whether these sites were overrepresented at particular gene elements and epigenomic features. To this end, we focused on cell type-specific HVPs, correlating their DNA methylation levels with distinct cellular characteristics and molecular pathways. In Additional file 7, we summarize the detailed annotation of all HVPs across the three profiled immune cell types. In neutrophils, we found that cell type-specific HVPs were depleted at CpG islands, which typically occur near transcription start sites (P = 6.37 × 10−19, hypergeometric test; Fig. 4a), and enriched at intergenic regions (P = 0.03; Fig. 4b).
We hypothesized that cell type-specific HVPs localize at distal gene regulatory elements such as enhancer sequences, of which many are known to be also cell type-specific [39]. To test this hypothesis, we retrieved reference chromatin state maps of primary human monocytes, neutrophils, and T cells from the data repository provided by the BLUEPRINT Consortium [40]. Chromatin states are defined as spatially coherent and biologically meaningful combinations of multiple chromatin marks [41, 42]. A total of five chromatin states were designated, which correspond to functionally distinct genomic regions, namely active promoters, enhancers, and regions related to transcriptional elongation and polycomb-repression. In addition, a “variable” chromatin state was defined here, indicating frequent changes of local chromatin structure across samples of the same cell type. Indeed, neutrophil-specific HVPs were found to be strongly enriched in the enhancer (P = 1.32 × 10−12, hypergeometric test; Fig. 4c) and variable chromatin states (P = 3.81 × 10−8; Fig. 4c).
Biological significance of immune cell type-specific hypervariable CpGs
To interpret the potential cellular and biological implications of cell type-specific hypervariable CpGs, we annotated the genes in close proximity to each CpG using the Genomic Regions Enrichment of Annotations Tool (GREAT) [43]. This tool is valuable in assigning putative functions to sets of non-coding genomic regions [43].
Overall, we found enrichment in gene ontology terms attributed to genes close to HVPs in a cell type-dependent context (Additional file 8). For example, genes located near neutrophil-specific HVPs were enriched in gene signatures related to acute Streptococcus pneumoniae infection and cysteine synthase activity; the latter molecular process is important to hold off infections [44]. Consistent with established neutrophil function, this suggests that the identified HVPs play a role in regulating the expression of neutrophil-specific genes in response to infection.
In Fig. 4d, we provide an example of a neutrophil-specific HVP at the promoter of the ITGB1BP1 gene, encoding the integrin beta 1 binding protein 1. Integrins are essential cell adhesion proteins that induce intracellular signaling pathways upon activation by matrix binding [45, 46]. They function as signal transducers allowing for rapid responses to cell surface signals [46]. Notably, the highlighted HVP mapped to a variable chromatin state at this locus, indicating that it influences local chromatin dynamics upon an internal or external trigger (Fig. 4d).
In conclusion, we show that cell type-specific HVPs clustered in enhancer and dynamic chromatin states at intergenic regions, suggesting they play a role in the regulation of cell type-specific gene expression programs in response to environmental changes. Genes in proximity to HVPs were enriched in gene sets relevant to important immunological functions.
Determinants of inter-individual cell type-specific DNA methylation variability
Subsequent to the identification and annotation of CpGs with hypervariable DNA methylation levels, we explored potential reasons for the discovered inter-individual DNA methylation heterogeneity.
In agreement with our findings for gene expression variability, we determined that a large proportion of cell type-specific HVPs correlated with cis genetic variants reported in the BLUEPRINT Human Variation Panel (Additional file 1: Figure S5b). In neutrophils, we found that 167 of the 261 cell type-specific HVPs (64%) associated with DNA methylation quantitative trait loci (Additional file 7). Our data further revealed that none of the cell type-specific HVPs were differentially methylated between male and female donors. The complete numerical results of all correlation analyses are provided in Additional file 9.
HVPs specific to monocytes showed frequent association with seasonal effects, such as temperature and daylight (n = 12/117 HVPs; Additional file 1: Figure S9). This finding is consistent with recent analyses reporting fluctuations of gene expression levels in monocytes depending on the season and circadian rhythm [47]. Many CD4+ T cell-specific HVPs particularly correlated with donor age (n = 14/46 HVPs; Additional file 1: Figure S9), in line with previous findings on age-related DNA methylation changes in T cells [48, 49]. These alterations are especially interesting in the context of immunosenescence, for which dysregulation in T-cell function is thought to play a crucial role [50, 51]. Naïve CD4+ T cells have further been reported to become progressively longer-lived with increasing age [52], which possibly also impacts their DNA methylation patterns.
Correlation of DNA methylation variability with transcriptional output
DNA methylation at active gene elements can directly control the regulation of gene expression. While methylated gene promoters usually lead to transcriptional silencing, methylated gene bodies typically lead to transcriptional activation [53]. We next aimed to probe this paradigm in the context of gene expression and DNA methylation variability.
We measured the correlation of DNA methylation variability with transcriptional output at the level of single genes. Specifically, we studied cell type-specific HVPs that map to gene promoters and bodies, correlating their DNA methylation level with the gene expression level in the same individuals. At promoters, 30.1% (range 23.5–33.3%) of HVPs showed a negative correlation with gene expression (Fig. 5a), in support of the conventional role of DNA methylation in gene repression. At gene bodies, a small subset of HVPs (5.0%; range 0.0–10.8%) showed a positive correlation with gene expression (Fig. 5b). Additional file 10 gives a full account of these genes and numeric results.
An example is provided in Fig. 5c, showing a monocyte-specific HVP at the gene promoter of MSR1. At this CpG site, DNA methylation levels were significantly correlated with gene repression (Benjamini–Hochberg (BH)-corrected P < 2.2 × 10−16, Spearman’s rank correlation). MSR1, encoding the CD204 antigen, is involved in endocytosis of modified low-density lipoproteins.
Relationship between DNA methylation variability and gene expression variability
Finally, we examined global patterns of DNA methylation variability in relation to transcriptional variability. In neutrophils, highly variable gene expression levels were observed at promoters exhibiting highly variable DNA methylation levels, and also at promoters showing very stable DNA methylation levels (Fig. 5d). For DNA methylation variability at gene bodies, this relationship was weaker and showed a linear tendency (Fig. 5e). Importantly, these global patterns were consistent across all three immune cell types (Additional file 1: Figure S10).
To characterize these promoter regions further, we counted the number of transcription factor binding motifs at these regions (“Methods”). We found an accumulation of binding motifs at promoters presenting either highly variable or very stable DNA methylation levels (Fig. 5f; Additional file 1: Figure S8). Next, we explored the properties of the 100 genes that showed both the highest expression variability and the highest DNA methylation variability at their promoters. We found that of the 100 genes in each cell type, 66 were common to all three cell types; in turn, ten of these 66 genes encode transcription factors. For example, in neutrophils this included ELF1, a transcriptional regulator of genes involved in immune response signaling pathways [54]. Neutrophil-specific HVGs were also enriched at genes with promoter sequences that contain the consensus binding motif of ELF1 (BH-corrected P = 1.2 × 10−5; MSigDB analysis).
Taken together, these results provide evidence that DNA methylation variability and gene expression variability could be mediated by the sequence-specific binding of transcription factors, such as ELF1 in neutrophils. Future studies will be required to further investigate the functional relevance of the observed correlation.