Predicting the three-dimensional folding of cis-regulatory regions in mammalian genomes using bioinformatic data and polymer models

The three-dimensional (3D) organization of chromosomes can be probed using methods like Capture-C. However, it is unclear how such population-level data relate to the organization within a single cell, and the mechanisms leading to the observed interactions are still largely obscure. We present a polymer modeling scheme based on the assumption that chromosome architecture is maintained by protein bridges, which form chromatin loops. To test the model, we perform FISH experiments and compare with Capture-C data. Starting merely from the locations of protein binding sites, our model accurately predicts the experimentally observed chromatin interactions, revealing a population of 3D conformations. Electronic supplementary material The online version of this article (doi:10.1186/s13059-016-0909-0) contains supplementary material, which is available to authorized users.


Background
The three-dimensional spatial organisation of mammalian chromosomes in vivo is a topic of fundamental importance in cell biology [1][2][3][4][5].Understanding how chromatin conformation becomes modified on a local scale in order to up-regulate transcription from genes during differentiation or development is critical not only to decipher a fundamental biological process, but also to delineate the role this process may play in human disease and potential therapies.The higher scale organisation of chromatin in the nucleus also has important roles to play in this regard [5][6][7][8][9] as the spatial structure of chromosomes is tightly linked to transcription.For instance, active genes can cluster at nuclear speckles [10,11]; conversely peripheral lamina-associated domains (LADs) comprise of regions of the DNA that are not generically transcriptionally active [12,13].The three dimensional structure of the genome is, therefore, intimately related to its function.
Thanks to the development of high-throughput experimental techniques based on chromosome conformation capture (3C) [1], such as Hi-C and Capture-C [2-4, 14, 15], it is now possible to probe experimentally which regions of the genome of a given cell type are spatially proximate in vivo.A major result obtained with these methods has been the discovery that chromosomes are organised in a series of topologically-associated domains (TADs) [2][3][4], which are separated by boundaries, but whose biological nature remains elusive.While the TAD boundaries are thought to be largely conserved across cell types, the arrangement of the chromatin within a TAD is not [16].This internal organisation depends on the activity of the genes within a domain, and is likely related to the action of cis-regulatory elements (DNA regions where the binding of a transcription factor (TF) can regulate the expression of a gene which is tens or hundreds of kilo-base-pairs (kbp) away) [17,18].
The pattern of interactions revealed by most 3C based experiments is an average over a large population of cells, yet it has become clear that there is a remarkable variability in both chromosomal conformation and chromatin interactions between different cells [19,20].Thus it is an important challenge to understand how the chromosome conformation in single cells leads to the observed population average, and to decipher the mechanism underlying such arrangements.To address this issue, here we present an in silico investigation of the local folding and resulting interaction maps of important active gene loci in mouse erythroblasts.We concentrate on the well studied α and β globin loci which have long been model systems for understanding cis-regulatory interactions [14,[21][22][23][24][25][26][27][28][29][30].These loci are known to have tissuespecific organisation, and expression of the different genes within the loci varies through development and erythropoiesis.As a comparison, we also study embryonic stem cells where these genes are not active.Our main result is that our model predicts patterns of contacts which are close to that found by high-resolution Capture-C experiments, reproduces the changes in such patterns following differentiation, and explains existing observations on the biology of the globin loci in mouse.Our predictions also compare favourably with new fluorescence in situ hybridisation (FISH) experiments that give spatial separation measurements between specific genomic locations in individual cells.This level of agreement is especially remarkable because it essentially involves no fitting.
Our model builds on the minimal assumption that the spatial organisation of eukaryotic chromosomes is maintained largely through the action of proteins or protein complexes which can form bridges by simultaneously binding to more than one site in the genome, and forming loops from the intervening chromatin [4,[31][32][33][34][35][36].We treat the chromatin fibre as a simple bead-and-spring polymer (Additional file 1: Figure S1), and coarse-grain the bridge forming protein complexes into single units.We then "paint" the polymer according to bioinformatic data characterising protein binding and chromatin state in the relevant cell type, and use molecular dynamics to simulate the motion of the region of interest (see Additional file 1: Figure S1 for a schematic diagram, and Additional file 15: Supplementary Methods for the full details of the model).The chromatin fibre and proteins diffuse as though subject to the thermal fluctuations of the nucleoplasm; the protein complexes can bind and dissociate from the chromatin and form bridges, and the fibre adopts conformations which are consistent with the entropic and energetic constraints of the system.By repeatedly running the simulation with different random thermal motions, we can generate a population of equilibrium conformations representing a population of cells.Some examples of other studies where polymer models have been applied to study chromatin are [20,[31][32][33][34][37][38][39][40].
To keep our model as simple as possible we use the locations of DNase1 hypersensitive sites (DHSs) as a proxy for binding sites of a generic type of protein bridge, which we imagine is made up from complexes of transcription factors and other DNA-binding proteins.The choice of DHSs as binding sites is justified due to their well documented tendency to correlate with open chromatin, or euchromatin, and with peaks in ChIP-seq data for many transcription factors [41], such as GATA1, Nfe2 Scl/Tal1 and Klf1, all of which are known to be important for globin regulation (see Additional file 2: Figure S2).The interactions between the many transcription factors and co-factors which might form the bridging complexes involved in cis-regulatory binding are not well characterized, and the DHS approximation avoids the need to make any assumptions.One factor that most certainly has a chromatin architectural role is the CCCTC-binding factor (CTCF) [4,35,40,[42][43][44].This protein is thought to form dimers which drive looping between some of its specific binding sites scattered along the chromosomes of eukaryotic organisms.In particular, convergent CTCF binding sites have been proposed to delimit the extent of chromatin domains, which might be extruded through a looping complex, possibly comprising cohesin [40,44,45].CTCF is therefore a bridge with an architectural role, and has indeed been dubbed a "global genome organizer" [4,35,42].Interestingly, chromatin has been found to compact on depletion of RAD21 and CTCF [37].To reflect its perceived importance, we treat CTCF proteins as separate bridges in the simulations; in this case the binding sites are placed at peaks in the ChIP-seq data for CTCF binding (see Additional file 2: Figure S2).Our model therefore includes two species of putative protein bridges, which we denote CTCF and DHS binding proteins (or bridges) respectively.Furthermore, we consider the hypothesis that some histone modifications (e.g.H3K4 monomethylation at enhancers or trimethylation at active promoters) act to recruit bridging proteins [46].We include this in the model by introducing a weaker, non-specific interaction between the bridges and H3K4me1 modified regions (which are not already labelled as CTCF or DHS bridges); since the hypersensi-tive sites at regulatory elements are often surrounded by H3K4me1 modified regions, these act as a funnel which effectively directs proteins to their high affinity binding sites [47].

Results
Chromatin folding in the mouse α globin locus First, we use our model to predict the folding of a 400 kbp region around the mouse α globin locus (chr11:31960000-32360000, mm9 build; each polymer bead represents 400 bp, or two nucleosomes, see Figure 1A and Methods).This well studied cluster contains five globin related genes: the ζ globin gene (Hba-x, expressed in embryonic erythroid cells, but silent in adult cells), two copies of the α globin gene (Hba, expressed in foetal and adult erythroblasts) and two θ globin genes (Hbq1 and Hbq2, only weakly expressed in adult tissue).Expression of the genes in the cluster is controlled by several regulatory elements: the multi-species conserved elements R1-4 and the mouse specific R(m).Some of these are contained within the introns of Nprl3, one of several widely expressed genes which surround the locus; the R2 element (known as HS-26 in mouse and equivalent to HS-40 in human) is thought to be particularly important for globin regulation [21,23,27].Figure 1A shows the binding sites for CTCF and DHS across the region considered (informed by ChIP-seq and DNase-seq data for adult erythroid cells -see Additional file 2: Figure S2); the positions of the H3K4me1 methylation marks are also indicated (from ChIP-seq data for the same cell type, see Additional file 2: Figure S2).In our simulations, proteins bind strongly to the CTCF or DHS labelled beads, and also weakly to the H3K4me1 marks.Some typical snapshots from our simulations are shown in Figure 1B and Additional file 16: Video S1 (CTCF and DHS binding proteins are shown as red and green spheres respectively), while the average contact map is shown in Figure 1C.
As anticipated, one of the main strengths of our approach is that it naturally outputs information on each member of the population of chromatin conformations (these can be thought of as representing different cells, or the same cell at different times), which we can then further interrogate.A clustering analysis (i.e., grouping the conformations by similarity; see Additional file 15: Supplementary Methods for details) of 1000 simulated conformations reveals that the locus folds into four main representative structures (Figure 2).The main distinction between these structures is whether a single bridginginduced globular domain forms (of size ∼70 kbp), or whether it breaks into two smaller microdomains, one containing around 40 kbp, and the other one around 25 kbp.The size of these globular microdomains does not exceed 100 kbp, so these are much smaller than TADs (the median size of a TAD is 1 Mbp [3]); interestingly, though, their size is comparable to that of the sub-TAD domains observed within active regions [4], and also to that of the so-called supercoiling domains recently found in mammalian cells [48].
In the most common representative structure, which accounts for 53% of the total observed conformations for the locus, there is a single globular domain containing the promoters of the globin genes, the promoters of the two neighbouring genes Mpg and Nprl3, and all five known regulatory elements.A similar representative structure, which accounts for 6% of conformations, also has a single globular domain, but the region which contains the Nprl3 promoter is in a loop outside the globule.A third representative structure accounts for 14% of the conformations: here two globular microdomains form, where the α genes interact with only the two genomically closest regulatory elements.The fourth structure, which is adopted by about 25% of the conformations, has again two microdomains, but their composition is different: now the α genes are no longer in the same microdomain as the regulatory elements.We expect that these genes should be transcriptionally inactive when the locus adopts this structure.Finally, there are a small number (∼ 1%) of conformations which do not fit into any of these four clusters.It is also interesting to note that the ζ gene and Mpg seldom interact with the elements (these genes are not widely expressed in adult erythroid cells).The arrangement within the domains can be further probed by looking at which promoters are directly interacting with the different regulatory elements in each conformation (see Additional file 3: Figure S3).We find, for example, that one or more of the α promoters interacts with one or more of the elements in 65% of conformations, and that Hba-a1 interacts with the elements in 53% of conformations whereas Hba-a2 interacts in only 41%.This is qualitatively consistent with experiments in which mRNA expression from the two α globin paralogues was measured independently (on the basis of 3 sequence divergence), which showed that the gene situated linearly closer to the enhancer elements, Hba-a1, is always expressed at a higher level [26].
Importantly, we can also compare the interactions predicted by our simulations with recent high-resolution Capture-C data [14] which mapped the chromosomal contacts within a number of cis-regulatory landscapes in mouse erythroblasts (see Additional file 15: Supplementary Methods).Specifically, Figure 3A compares Capture-C and in silico patterns of contacts with the promoters of the two α globin paralogues (which cannot be separated in the experimental data as they share the same sequence).Figure 3B shows a similar plot for the Mpg promoter.The results show that, remarkably, with the sole input of the ChIP-seq and DNase-seq data giving the locations of the protein binding sites, we can reproduce to a good accuracy the Capture-C profiles.In   1C), as is a schematic of each representative structure (right).The proportion of simulated conformations adopting a given structure gives a prediction of the frequency with which that structure will occur in a population of cells.
particular, we reproduce the contacts between the α promoters and the five known regulatory elements; we also reproduce the fact that there is some interaction between the regulatory elements and the Nprl3 promoter (see Additional file 4: Figure S4), but far fewer interactions with the Mpg promoter, despite the fact that this gene is a similar genomic distance away from the elements as the α genes.
To further assess the level to which the population of locus conformations predicted by our model gives a faithful representation of the organisation of the α globin locus in real cells, we performed FISH experiments (see Methods) to obtain distributions of the separations of probes at different positions across the locus.These measurements also allow us to parametrise the physical size of the 400 bp simulation beads by fitting the means of each dis- tribution (see Methods and Additional file 5: Figure S5); this is the only fitted parameter in our model, and the fit yields a size of 15.8 nm, which is reasonable given that 400 bp corresponds to two nucleosomes.Plotting the experimental and simulation separation distributions on the same axes (Figures 3C-D, and Additional file 5: Figures S5D-G) reveals that once more the simulations give an accurate prediction of the structure of the locus; for example the separation of the α promoters and pE at the regulatory elements R1-3 shows a narrow distribution peaked about a mean value of ∼ 200 nm, whereas the separation of the promoters and a probe p58 at roughly the same genomic distance, but telomeric to the locus, shows a much broader distribution with a mean closer to 300 nm.We can also define a quantitative score Q, taking values between 0 and 1, which indicates how well our simulations predict the experimental Capture-C interaction profiles (see Additional file 15: Supplementary Methods for details).By combining Capture-C data from a number of promoters across the locus, we can obtain a mean Q value along with a standard error (Additional file 6: Figure S6).This allows us to compare results from different model set-ups.Specifically, we examined the effect on the experiment-simulation comparison scores of changes in: (i) chromatin stiffness; (ii) number of bridges; and (iii) level of coarse-graining (see Additional file 15: Supplementary Methods and Additional file 6: Figure S6).For the first two cases we find only a modest effect on the Q-score for the simulated configurations (Additional file 6: Figure S6); if we decrease the resolution of our model by changing the coarse-graining, then this performs less well.Interestingly the representative structures found from the clustering analysis of the population of conformations found in silico are always the same.What changes in some cases is the proportion of conformations which adopt each representative structure.In the model where the chromatin was stiffer, the globular microdomain structure containing all of the regulatory elements occurred less often, whereas the structure where the Nprl3 promoter loops out was more likely; this is because holding the Nprl3 promoter in the microdomain requires bending of the chromatin fibre, which is disfavoured when this is stiff.Also, when we examined the effect of changing the number of protein complexes in the simulations, we found that, as more proteins are introduced, there is a greater likelihood that the locus adopts a structure with two globular microdomains; this is because forming more protein bridges between chromatin binding regions, while being energetically favourable, leads to the formation of more loops whose entropic cost increases non-linearly with the number of loops [49].

Chromatin folding of the β globin locus
We have also applied our chromosome-and-bridges model to the mouse β globin locus (chr7:110800000-111200000, mm9 build; Figure 4, Additional file 7: Figure S7, and Additional file 8: Figure S8).This locus contains five globin genes: the y gene, βh1 and 2, and two β globin genes β-Major and β-Minor.The expression of each gene depends on the stage of development (the y and βh1 genes are predominantly expressed in embryos, while the β genes take over in adults), and is controlled by interactions with a series of DHSs in a region known as the locus control region (LCR) [21,24].Unlike the α globin locus, the β globin genes are surrounded on either side by a condensed chromatin region, containing genes which are not expressed in erythroid cells.As with the α globin case, we use ChIP-seq and DNase-seq data to label a bead-and-spring polymer which represents the gene locus (see Figure 4A, and Additional file 7: Figure S7).A clustering analysis of a population of 500 simulated conformations reveals that the most abundant representative structure of the β globin locus (43% of the total conformations, see schematics in Figure 4C and dendrogram in Additional file 8: Figure S8), features a single globular domain, where the β Major and Minor promoters co-localised with the five regulatory elements in the LCR, and with a CTCF site on the telomeric side near the Olfr65 gene.A further 16% of conformations adopt a similar representative structure, but the promoters interact only with the LCR.We also note that when the locus adopts these structures, there is an interaction between the CTCF sites in the LCR and the one on the centromeric side of the β genes near the Olfr67 gene (these contacts are just visible on the left and bottom edges of the top two contact maps in Additional file 8A: Figure S8A) which has previously been observed in both definitive erythroblasts and erythroid progenitors, but is absent in non-erythroid tissue [22,24].This is consistent with the hypothesis that CTCF mediated loops in progenitors hold the locus in a structure poised to facilitate β globin expression upon differentiation [24] (though see below).A third representative structure, which accounts for 9% of the simulated conformations, has the β promoters interacting only with the DHS near Olfr65.The Capture-C data, along with previous work [22,24], confirms the prediction that this site (usually denoted HS-60) interacts with the β globin promoters; indeed it has been previously shown that there are interactions between all hypersensitive sites in the locus [22] and the pair of sites HS-60/-62 are normally taken to demarcate the boundary of the locus.Whether this particular DHS (HS-60) has enhancer properties remains unclear, however it binds Scl/Tal1 (a transcription factor thought to play a key role in hematopoietic differentiation [50]), is near to a CTCF binding site (HS-62), and is within a region marked by monomethylation of histone H3 Lys4, which is normally associated with enhancers.In the remaining 32% of the conformations (bottom two schematics in Figure 4D), the β globin promoters are still together, but do not interact with the hypersensitive sites (Additional file 8A: Figure S8A).We note that the microdomains which form in each type of the five representative structures have more "looped out" regions (consistent with conclusions from 3C experiments in Ref. [22]) than in the α globin locus (compare contact maps in Figures 1C and 2  are seen between the blocks of highly probable interactions in the β globin case).This indication that the β globin locus is less compact than the α globin case is borne out in measurements of the overall 3-D size of the simulated loci (see distributions of the radius of gyration of the polymer in Additional file 9: Figure S9G compared to the α globin case in Figure 7G).
As in the case of the α globin locus, our simulations predict contact patterns which are in good agreement with Capture-C data, both for the β Major and Minor gene promoters (Figure 4E) and for the Hbb-y promoter (Figure 4F).This demonstrates that our model is not gene-specific, but can be applied, in principle, genomewide, at least to active regions; the two bridges which we model, CTCF and DHS binding proteins, are indeed found in most euchromatic, open chromatin, regions.
Given its relatively low computational cost (harvesting 500 conformations for a 400 kbp chromosome region at a 400 bp resolution can be done in about a day with a multi-core machine, see Additional file 15: Supplementary Methods), we expect this modelling to be useful in predicting the overall folding of previously uncharacterised active chromosomal loci -the knowledge of the predicted population of 3-D structures can then direct further high-resolution Hi-C, Capture-C or fluorescence hybridisation experiments (as in Figures 3 and 4E-F) to characterise that region more accurately.
The model accurately reproduces differences in locus folding across cell types Importantly, because data showing protein binding, hypersensitive sites and histone modifications are available for different cell types, we can also predict changes in the three-dimensional organisation of a chromosomal region across cell types or at different times in development.We show in Figure 5 how the folding of the globin loci differs in mouse embryonic stem cells (where the globin genes are inactive) with respect to the organisation predicted for erythroblasts.The bioinformatic data used to inform our modelling for stem cells are given in Additional file 10: Figure S10.
Figure 5A shows the contact map predicted from simulations of the α globin locus.Our model predicts that in ES cells the contacts are much sparser than in erythroblasts, that the bridging-induced domain around the α globin gene is lost (Figure 5B), and that no interactions with the regulatory elements are observed; the same is true of the neighbouring Mpg promoter.Once again, the contacts observed in silico reproduce the experimental ones (Figures 5C), with some minor inaccuracies for Mpg (which likely originate from our approximation that all DHSs are the same in regards to bridge formation, but nevertheless highlight the principle that the locus can adopt a completely different shape in a different cell type).When repeating the analysis for the β globin locus we find that the loss of non-local contacts is even more dramatic (Figures 5D-E), and the agreement with the data even more remarkable (Figures 5F), with all non-local (i.e. off diagonal) interactions being absent.
To further demonstrate the wide applicability of the model, we also perform a set of simulations for a region surrounding the Slc25a37 (Mitoferrin1) gene in both mouse erythroblasts and embryonic stem cells.This gene encodes a mitochondrial protein essential for iron import into mitochondria, however much less is known about this locus than about the α or β globin, and so our results represent a true prediction of its folding.The input data used was similar to that of the globin loci, and are given in Additional file 11: Figure S11.As shown in Figure 6 the simulations predict that in the erythroid cells (where the gene is active) the locus forms a compact domain around Slc25a37 and Entpd4 ; the Slc25a37 pro-moter interacts strongly across the Slc25a37 gene, but also with two distinct regions between the nearby Synb and Gm16677 genes (Figure 6E, top panel).These are enriched for mono-methylation of Lysine 4 of Histone H3 (see Additional file 11D: Figure S11D), suggesting that sites within these regions have enhancer activity (as was also proposed in Ref. [51]).In order to test these predictions we compare with new Capture C experiments (performed as detailed in Ref. [14]).As before, our very simple model gives a remarkable agreement with the data: strong interaction with the putative enhancer regions is observed in the erythroid, but not the stem cells.Some longer distance interactions which are predicted in both cell types are not found in the experimental data; these errors are due to our approximation that bridges can form between any DNase hypersensitive sites, and the agreement would likely be improved with a different choice of input data (e.g. using TFs involved in regulation of this gene).
The typical 3-D structures of the globin loci are preserved in CTCF or other TF knock-outs Another strength of our approach is that it is easy to alter the protein binding profiles in our simulations to investigate e.g.genome modifications or protein knockouts etc., and predict the consequences of these for the 3-D organisation in vivo.For example, we can switch off interactions with the hypersensitive sites, and only include the CTCF bridges in the simulation, or simulate a CTCF knock-out by switching off interactions with the CTCF sites and any hypersensitive sites where only CTCF binds (i.e.DHSs which bind CTCF, but none of the other TFs implicated in globin regulation).
In the case of the α globin locus we find that, surprisingly, for both the CTCF and DHS knock-outs the same folded structures can still form (Figures 7A-D).For the CTCF knock-out, the relative proportions of each structure found in the clustering analysis remain largely unchanged (Figure 7E): the most common one is again the single globular domain containing the α promoters and all regulatory elements.If we assume that the level of α globin expression correlates with the fraction of conformations in which one or more of the α promoters is interacting with one or more of the regulatory elements, then this expression level also remains largely unchanged (the genes are active in 65-70% of conformations, see Figure 7F).For the DHS knock-out on the other hand, the number of conformations showing regulatory element interactions drops to less than 20%.There is also a change in the proportions of the different groups found by the clustering analysis, with the structure in which the Nprl3 promoter loops out of a single domain becoming most common.Nevertheless, it is remarkable that despite loss of binding at the regulatory elements (which presumably reduces α globin expression), the CTCF sites near the Hbq1 and Hbq2 promoters, and within the introns of the α-globin locus Nprl3 gene (green and yellow in Figure 1A) are sufficient to allow the locus to fold into the same representative structures.We can also measure the effect on the overall size of the domain by calculating the radius of gyration of the polymer; Figure 7G shows the distribution for each of the in silico knock-outs.We see that loss of protein binding generally leads to an expansion of the locus, with the DHS knock-out having more effect than the CTCF case.
A similar scenario applies to CTCF and DHS knockouts in the β globin locus (Additional file 9: Figure S9).Here, however the contact map for each of the groups identified by the clustering analysis (Additional file 9: Figures S9A-C) shows some subtle differences between the knock-outs.Again the CTCF knock-out appears to have little effect, leading to only small changes in the fraction of simulations adopting each structure or the contacts between the β promoters and the LCR.The DHS knock-out leads to a notable reduction in the promoter-LCR interactions, and a reduction in the number of conformations adopting the structure where the β promoters interact with the hypersensitive site near the Olfr65 gene.This locus also expands upon protein knock-outs, albeit to a lesser extent than the α globin case; this is probably due to the β globin locus being less compact initially.
Given the suggestion that CTCF proteins play a key role in genome organisation, it might seem surprising that the knock-out simulation shows a relatively minor change in the folding structures and promoter-enhancer interaction in both globin loci.However, CTCF is known to have a variety of different functions, for instance it acts as a barrier against the spreading of repressive heterochromatin, or as an insulator, preventing interactions with other nearby chromosome regions [42].A recent study suggested that a depletion of CTCF has only a mild effect on the domain organisation of chromosomes as found via Hi-C experiments [52], and a ChIA-PET analysis of the contacts made between CTCF-bound regions found that, only a fraction of the 40,000 CTCF binding sites are involved in these [53]: presumably, this fact is related to the recently discovered importance of CTCF binding site directionality in loop formation [4,43,45].In the specific case of the β globin locus, another recent study found that reducing the abundance of CTCF protein, or disrupting a specific CTCF binding site within the locus in erythroid progenitor cells leads to a loss of chromosome looping; however upon differentiation to mature erythroblasts, these cells are still able to express β globin, and fruitful interactions between the promoters and the LCR can still form [25] (i.e.setting up loops in progenitor cells appears not to be necessary).Together this suggests that the globin loci may be examples where CTCF-mediated chromosome loops are not crucial in determining the 3-D organisation, though of course CTCF is likely to have some other function (e.g.protecting other nearby genes from activation) and may still play an important organisational role at a larger scale [28].
In our simulations the CTCF bridges certainly do form loops, but in their absence the overall folding patterns can be maintained by the other bridges.

Discussion
In this work we have shown that a minimal polymer model informed by large bioinformatic datasets on protein binding can successfully reproduce the pattern of Capture-C contacts observed in the well studied α and β globin loci within mouse erythroblasts (a cell type where these genes are highly active), and also within the less understood Slc25a37 (Mitoferrin1) locus.Our model is built on the hypothesis that there exists architectural protein bridges, which we assume are either CTCF, or generic bridges made up by complexes of transcription factors and other DNA-binding proteins.The only inputs we require are ChIP-seq data for CTCF binding, and the map of DNase1 hypersensitive sites, which we take as a proxy for the location of the binding sites for the generic protein bridges (DHS bridges).Importantly, our approach differs from other recent polymer modelling studies which also have predictive power [20,29,36], in that it does not rely on fitting to pre-existing 5C or Hi-C data.Due to this feature, it can be applied to relatively poorly characterised loci (e.g., Mitoferrin1, see Figure 6), for which only few data exist (e.g., DNase tracks); the model can then be developed when needed as more experimental data become available.
Our model generates a population of conformations, hence we can predict, for instance, the distribution of distances between selected targets on the globin locus.These results compare very favourably with our FISH measurements, which allow us to estimate the physical size of the beads in our coarse-grained polymer (or equivalently, the DNA packing density in the chromatin fibre in the globin locus; this is the only fitting parameter in our model).The packing we obtain (15.8 nm for 400 bp) is consistent with open chromatin, which is reasonable since the region we focus on is highly active.
The fact that our model generates a population of con- , where ri is the position of the ith chromatin bead in the polymer, and r is the mean position of all N chromatin beads.
formations, rather than a single average conformation, is important because it gives an estimate of the stochasticity and fluctuations in in vivo 3-D organisation.A key result of our model is that the conformations of the loci we studied can be grouped into a handful of representative structures, which account for different fractions of the whole population.In both the α and β globin loci, the analysis suggests that there is a split in these structures between two main types: those in which there is a single globular domain which includes the active genes together with their regulatory elements, and those where the globule splits into two microdomains.The single globule structures are favoured by bridging, while the competing structure requires less bending and looping, and costs less entropy.(This is because there are more ways to place two microdomains in space than there are for a single one, and also because the entropy of forming n loops in the same place scales non-linearly with n [49]).There is a subtle balance between these contributions, which are both of the order of a few k B T , therefore both structures coexist in the population.A consequence of this is that the globin loci are naturally poised close to a transition between two different 3-D folding phenotypes; because the competition between bridging and entropy is likely to be a generic feature, we suggest that the plasticity associated with this balance between competing effects may be an underlying principle in the organisation of active regions genome-wide.This suggests that the cell could tip the balance one way or another by changing the abundance or specificity of bridges, or the properties of the fibre (e.g. by histone modification or chromatin remodelling).
In future work it will be interesting to compare these predictions with experimentally determined chromatin dynamics through cell differentiation, for example examining the α globin genes using techniques that permit imaging of the locus during erythroid differentiation in live cells.Another application of the work might be to provide some explanation of how the Hba-x gene is silenced in adult erythroblasts: in all of our predicted conformations it does not contact the known enhancer elements nor the surrounding gene promoters.It may also be informative to repeat the modelling for primitive erythroblasts, when sufficient protein binding and DNase hypersensitive data becomes available for that cell type.
As we have seen, our model can be further exploited to predict the organisational consequence of the knock-out of proteins such as CTCF (or our generic DHS bridge).Similarly, one can perform an in silico experiment which follows the consequences of modifying some genomic region within a locus.An intriguing example is the deletion of the R2 (HS-26) hypersensitive site in the α globin locus, which has been shown experimentally to result in a 50% reduction of α globin RNA levels [23] (a much milder phenotype than the severe α thalassemia which results from a deletion of the equivalent HS-40 element in humans [27]).Removing the R2 site in our simulation only leads to a ∼ 3% reduction in the number of conformations where the α promoters interact with the remaining regulatory elements.We can make our model more complex by replacing DHS binding proteins with bridges which bind to specific TF binding sites.For instance, GATA1 and Klf1 are a minimal set of TFs (see Additional file 2: Figure S2) which can interact to form bridges between the α globin promoters and the regulatory elements, and which can discriminate between the different elements (i.e.GATA1 binds to R1-4 only, whereas Klf1 binds to R2, and the α promoters only).Thus we use a model with three protein species, binding strongly to GATA1, Klf1, and CTCF sites respectively (no longer considering hypersensitive sites), and weakly to H3K4me1 modified regions (using ChIP-seq data as shown in Additional file 2: Figure S2), and repeat the in silico R2 knock-out experiment (see Additional file 12: Figure S12).Quite remarkably, in a "wild type" simulation, this more detailed model reproduces the differences in peak heights for interactions between the α promoters and elements R1-3 as shown in the Capture-C data (i.e.there is a higher probability of interaction with R2 than R1 and R3; Additional file 12: Figure S12A).For the R2 knock-out case, the three bridge model shows a ∼ 20% reduction in the number of conformations where the α promoters interact with the remaining regulatory elements (much closer to what might be expected given the experimentally observed effect on α globin RNA levels).Therefore, our approach can be generalised to ac-commodate more biological detail in a modular fashion, in the cases where this detail is known.
We anticipate that the main application of our in silico chromosome folding model will be to investigate regions of mammalian and other eukaryotic genomes which are currently poorly characterised.The approach relies only on DNase hypersensitivity and protein binding data, which are available genome-wide for many organisms and cell types.Our technique is fast and inexpensive, so that it can be used to predict the organisation of a large number of wild-type and modified genomic loci prior to, for example, a combination of detailed Capture-C, 5C or FISH experiments, directing focus to those regions whose predicted structure was deemed to be of particular interest.The ease with which genome modifications can be incorporated makes it highly applicable for investigation of the effect on 3-D chromatin structure of, for example single nucleotide polymorphisms at enhancers, which have been implicated in many diseases.
In the present work we have focussed on looping interactions within a gene locus, at a sub-TAD length scale.Polymer models, and the principal of protein bridges driving chromatin conformations, can easily be adapted to treat larger looping and organisation at the chromosome and genome scale, and this will be the subject of a future study.

Polymer model and simulation scheme
The chromatin fibre is modelled as a simple coarsegrained bead-and-spring polymer, where each bead represents 400 bp of DNA, or roughly two nucleosomes.The positions of the beads are updated via a molecular dynamics scheme (Langevin dynamics) using the LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) [54] software.Pairs of beads adjacent along the polymer back-bone interact via finitely extensible nonlinear elastic (FENE) springs, and the polymer is afforded a bending stiffness via a cosine interaction between triplets of adjacent beads.We choose parameters such that the persistence length is 4 beads, which is reasonable for euchromatin [55].The beads also interact with each other via a Weeks-Chandler-Anderson potential, meaning they cannot overlap.Protein complexes are modelled as single spheres which interact with each other also via a Weeks-Chandler-Anderson potential (i.e. they have a steric interaction only).Each chromatin bead represents a region of the chromosome locus of interest, and is labelled as binding or not for the various protein species according to the input data.Proteins interact with chromatin beads labelled as binding via a shifted, truncated Lennard-Jones interaction which has short-range repulsive and longer-range attractive parts; they interact with non-binding chromatin beads again via  S2 (below).Recombineering was carried out mixing 50 µl of cells with 150 ng to 300 ng of purified DNA in a 0.1 cm wide cuvette using a Bio-Rad gene pulser set at 1.8 kV.Immediately after electroporation, 1 ml of SOC media was added, and cells were further grown at 37 • C for 1 hour before being plated on selective agar media containing 100 µg/ml ampicillin.
In vitro cultured mouse foetal liver cells (expressing α and β globin genes) were settled on poly-l-lysine coated coverslips, fixed with 4% paraformaldehyde in 0.25 M HEPES and permeabilised with 0.2% Triton-X 100.FISH was performed using 7 kbp plasmid FISH probes, labelled with either Cy3-dCTP (GE Healthcare Life Sciences) or digoxygenin 11-dUTP (Roche Life Science).The genomic locations of the FISH probes is shown in Additional file 5: Figure S5A.Probes were hybridised in pairs (as in Additional file 5: Figures S5B,D-G).Following hybridisation and detection using sheep anti-digoxygenin FITC (Roche Life Sciences) and rabbit anti-sheep FITC (Vector Laboratories), nuclei were imaged on a Deltavision Elite (GE Healthcare Life Sciences) using x100 super-plan apochromat oil 1.4 N.A. objective (Olympus) with a z-step size of 200 nm.Images were restored by deconvolution using Huygens Pro-fessional software (Scientific Volume Imaging).Probe signal pairs were analysed using a specifically designed Fiji algorithm that measures the 3-D euclidean distance (in microns) between thresholded signal centroids.Each measurement was adjusted to account for chromatic shift by using a displacement vector calculated from 0.1 µm Tetraspeck TM microspheres (Life Technologies) collected using the same imaging parameters as in the experiments.
We can parametrise the physical size of the chromatin beads in our simulations by fitting to the mean separation of each pair of probes as measured in the experiment.Additional file 5: Figure S5B shows a scatter plot of mean values from each pair of probes, with error bars showing the standard error in the mean; we use a linear leastsquares fit weighted using the experimental error in the mean to estimate the bead diameter as 15.8 nm.Since we fit to the mean for all probe pairs, the quality of the predicted distributions can still be assessed by comparing the simulation and experiment for each individually.
of the work, and helped to draft the manuscript.JMB performed the FISH experiments, DW performed the image analysis, and CB made the FISH probes.JD and JRH contributed Capture-C data on Mitoferrin1.VJB conceived and designed the experimental part of the work and helped to draft the manuscript.DM helped to conceive the computational part of the work, and helped to draft the manuscript.All authors read and approved the final manuscript.(Note that where available, control data were used in the peak calling, meaning that peaks seen in the pile-up of reads which did not show significant enrichment above the control were not called.)Since there are DHS located at the binding sites of each of these proteins, we reduce the complexity of our model (and the need for assumptions about the interaction between TFs) by using these as a proxy for protein binding sites.(H)-(I) ChIP-seq data showing relevant histone modifications: monomethylation and trimethylation of H3K4 (associated with enhances and promoters respectively).Data from Ref. (56).All plots are aligned according to the horizontal axis.  .Points show the mean over the set of targets captured in the experiment, and error bars show the error in this mean.The flexibility of the polymer was varied (making is less flexible by increasing the persistence length from 4 bead diameters to 8 bead diameters, or making it more flexible by decreasing the persistence length to 2 bead diameters); the number of protein complexes was varied, either decreasing from 20 to 10 copies of each species, or increasing to 30 of each species.We also considered a simulation where the colouring of each bead was randomly shuffled (see Additional file 15: Supplementary Methods).Finally, we reduced the resolution of the model by increasing the amount of chromatin represented by each bead from 400 bp to 600 bp.(B) Plot showing how the contact maps differ between each set of experiments, measured by χ 2 (see Additional file 15: Supplementary Methods).(C) Plot showing the proportion of conformations found to be forming the different structures identified by the clustering analysis.Schematics of these structures are shown below the plot.(G) Plot showing the distribution of the radius of gyration of the locus across the simulated conformations.The radius of gyration is defined as given in the caption to Figure 7.Additional file 15: Supporting Methods Langevin dynamics simulations.Chromatin regions and protein complexes are represented by beads, and the position of the ith bead in the system evolves according to the Langevin equation

A
where r i is the position of bead i with mass m i , γ i is the friction due to an implied solvent, and η i is a vector representing random uncorrelated noise such that The noise is scaled by the energy of the system, given by the Boltzmann factor k B multiplied by the temperature of the system T , taken to be 310 K for a cell.The potential U i is a sum of interactions between bead i and all other beads, and we use phenomenological interaction potentials as described below.For simplicity we assume that all beads in the system have the same mass m i = m.Equation (S1) is solved in LAMMPS using a standard Velocity-Verlet algorithm.
For the chromatin fibre the ith bead in the chain is connected to the i + 1th with a with a finitely extensible non-linear elastic (FENE) spring given by the potential where r i,i+1 = |r i − r i+1 | is the separation of the beads, and the first term is the Weeks-Chandler-Andersen (WCA) potential which represents a hard steric interaction which prevents adjacent beads from overlapping; here d ij is the mean of the diameters of beads i and j.The diameter of the chromatin beads is a natural length scale with which to parametrize the system; we denote this σ, and use this to define all other length scales.The second term in Eq. (S3) gives the maximum extension of the bond, R 0 ; throughout we use R 0 = 1.6 σ, and set the bond energy K FENE = 30 k B T .The bending rigidity of the polymer is introduced via a Kratky-Porod potential for every three adjacent DNA beads where θ is the angle between the three beads as give by and K BEND is the bending energy.The persistence length in units of σ is given by l p = K BEND /k B T .Finally, steric interactions between non-adjacent DNA beads are also given by the WCA potential [Eq.(S4)].Each protein complex is represented by a single bead and the WCA potential is used to give a steric interaction between these.Chromatin beads are labelled as binding or not-binding for each protein species according to the input data (see section on ChIP-seq and DNase-seq data analysis below).For the interaction between proteins and the chromatin beads labelled as binding, we use a shifted, truncated Lennard-Jones potential where r cut is a cut off distance, and r ij and d ij are the separation and mean diameter of the two beads respectively.This leads to an attraction between a protein and a chromatin bead if their centres are within a distance r cut .Here is an energy scale, but due to the second term in Eq. (S7) this is not the same as the minimum of the potential, which for clarity we denote (and we refer this to as the interaction energy).For simplicity we set the diameter of the protein complexes equal to that of the chromatin beads, d ij = σ, and set r cut = 1.4 σ.
The polymer is initialized as a random walk, and the dynamics are first evolved in the absence of protein interactions in order to generate an equilibrium coil conformation.Interactions with the protein complexes are then switched on, and the dynamics are evolved until a new equilibrium conformation is obtained.The length scale σ, mass m and energy scale k B T give rise to a natural simulation time unit τ LJ = σ 2 m/k B T , and Eq.(S1) is integrated with a constant time step ∆t = 0.01τ LJ , for a total of at least 8 × 10 6 time steps.Each simulation is repeated at least 500 times using a different initial conformation and random noise, resulting in an ensemble of conformations.Two chromatin beads are said to be interacting if their separation is less than 2.75 bead diameters; counting the proportion of conformations in which a given pair of beads is interacting gives an approximation of the probability that those beads interact.So far the system has been described in units σ, m, and k B T .In order to map these simulation units to real ones we must recognise that there are two further important time scales in the system, namely the inertial time τ in = m/γ i (from Eq. (S1) this is the time over which a bead loses information about its velocity), and the Brownian time τ B = σ 2 /D i (the time it takes for a bead to diffuse across its own diameter σ).Here D i is the diffusion constant for bead i, given through the Einstein relation by D i = k B T /γ i ; if we make the approximation that a chromatin bead will diffuse like a sphere we can then use Stokes' Law, where γ i = 3πνd i , with ν the viscosity of the fluid, and d i the diameter of bead i. Taking realistic values for the length, mass and viscosity one finds that τ in τ LJ τ B , with the times separated by several orders of magnitude.For numerical stability we must choose the time step ∆t smaller than all of these times, and we wish to study phenomena which will occur on times of the order τ B ; this means that using real values for all parameters would lead to infeasibly long simulation run times.Instead we make an approximation by setting m = k B T = σ = 1, and γ i = 2, and map to real time scales through the Brownian time τ B ; although this means that beads in our simulation have more inertia than in reality, this does not effect our results, which are taken once the polymer has reached an equilibrium conformation.Taking the diameter of the chromatin beads to be 15.8 nm, and assuming a viscosity of 10 cP for the nucleoplasm gives τ B ≈ 87 µs, meaning that a simulation time unit is ≈ 43.5 µs.Each simulation run therefore represents approximately 7 s of real time.
ChIP-seq and DNase-seq data analysis.As an input to the model we use ChIP-seq and DNase-seq data (previously published in Refs.(14,50,(56)(57)(58) as indicated in the captions for Additional files 2, 7 and 10: Figures S2, S7 and S10) to identify protein binding sites in the chromosome region of interest.For protein binding, ChIP-seq reads are aligned to the mouse reference genome build mm9 using the Bowtie2 software [59]; duplicate reads are removed, and pile-ups are generated using the BedTools package [60].Binding sites are identified using the macs2 peak calling software [61] using a control data set where available; peaks which have a normalised p-value < 0.001, and which have a fold-change higher than a threshold are retained.DNase-seq reads are similarly aligned to the mm9 genome using Bowtie2, but peaks are identified using the PeaKDEck software (which uses a peak finding algorithm calibrated specifically for DNase-seq data [62]).As detailed in the main text, we simplify our model by assuming that DNase hypersensitive sites indicate the positions of transcription factor binding sites.For histone modifications, we also align reads using Bowtie2; since these modifications can be found across wide regions, rather than identifying peaks we instead find regions where the pile-up of reads exceeds a threshold.
In order to incorporate the data into the simulations, the locus of interest is divided into regions corresponding to each bead in our model chromatin fibre.Beads are then labelled according to any peak or histone modification which overlaps with the region; for simplicity we only label beads a binding or not (or as having a histone modification or not), and do not incorporate peak intensities into the model.
Cluster Analysis.In order to assess the similarity between the conformations generated in each set of simulations we perform a cluster analysis.First we calculate the generalised "distance" between all pairs of conformations; then a dendrogram is generated using the standard hierarchical clustering algorithm in the MATLAB software [63], with an average linkage criterion.
A standard way to measure the distance between two polymer conformations is to consider the mean squared difference between separations of pairs of beads in each; however since our polymer consists of regions which bind proteins and unstructured regions, this does not perform well (the unstructured regions dominate in the mean, and no clear clusters are found).Instead we use a distance Γ(C, C ) between conformations C and C which ignores the unstructured regions, defined as where r C ij is the separation of beads i and j in conformation C. The Kronecker δ-function is defined such that δ a,b = 1 if a = b and 0 otherwise, with s C ij = 1 if beads i and j are interacting in conformation C and 0 otherwise (an interaction is defined as having separation less than 2.75 bead diameters).Thus the only contributions to the mean are from beads which are interacting in one conformation but not in the other; further limiting the analysis to consider only the chromatin beads within the most structured region of the locus (indicated by green bars in Figures 1C and 4C) results in a series of well defined clusters (Figures 2 and 4D).
Capture-C data.In order to test the predictions of the model we compared simulation results with Capture-C data; for the α and β globin loci data were from Ref. 14, whereas data for the mitoferrin locus in Figure 6 were from new experiments performed according to the method in that reference.In these experiments a set of oligonucloetide capture "targets" is designed, a 3C library is obtained using a frequently cutting restriction enzyme (Dpn II, cutting at GATA), and SureSelect oligonucleotide capture is followed by Hi-seq paired-end sequencing.The resulting reads then undergo in silico DpnII digestion (producing a set of fragments for each read), and the fragments are aligned to the mouse mm9 reference genome as single-end reads using the Bowtie software [64].Identical sets of read fragments are assumed to be PCR artefacts, and are removed (14); read sets which contain a targeted restriction fragment and a reporter fragment are retained.Data are then smoothed by counting interactions within 800 bp windows centred on genomic positions separated by 400 bp steps, giving an interaction profile for each target (black lines with grey shading in Figures 3A,B, 4E,F and 5C,F, and Additional Files 4 and 12: Figures S4 and S12).Since the efficiency of capture of each target is unknown, the obtained profiles show relative interaction strength, and

Figure 1 :
Figure 1: Simulating the α globin locus.(A) Browser view showing genes in the vicinity of the α globin locus, alongside a schematic indicating the coarse-graining used in the simulations.A 110 kbp section of the 400 kbp chromatin fragment which was simulated is shown.As described in the text, simulation chromatin beads were designated as CTCF binding sites, DHS binding, H3K4me1 modified sites, and combinations of these.The positions of the set of five regulatory elements are indicated with blue triangles, and promoters with green squares.(B) Example simulated configurations of the locus.CTCF proteins (green) and DHS binding proteins (red) are shown; the chromosome fragment is coloured as in A. See also Additional file 16: Video S1 for a 3-D view of the configurations.Parameters for the polymer model and the bridge-chromatin affinity are given in full in Additional file 15: Supplementary Methods.(C) Contact map showing the frequency of contacts between each chromatin bead in 1000 simulated configurations.Note that the colour bar shows a logarithmic scale.The blue line to the left indicates the region which is shown in A. The green line to the left indicates the region which is used for the clustering analysis (Figure 2 and text).

Figure 2 :
Figure 2: Conformations of the α globin locus can be grouped by similarity.A clustering analysis gives a dendrogram (left) which indicates how similar or different the conformations are.Conformations fall into four main representative structures depending on the pattern of contacts they exhibit (see Additional file 15: Supplementary Methods).Contact maps for each representative structure are shown (centre; the region shown is indicated by the green line in Figure1C), as is a schematic of each representative structure (right).The proportion of simulated conformations adopting a given structure gives a prediction of the frequency with which that structure will occur in a population of cells.

Figure 3 :
Figure 3: Simulations compare favourably with experimental data.(A) Plot showing the contacts made with the promoters of the two α globin genes (locations indicated by red asterisks; the positions of the regulatory elements and other gene promoters are also indicated).Simulation results (red) are shown alongside Capture-C data (grey); in both cases the plots show the contacts to both genes combined (since each copy of the gene has the same sequence it is impossible to separate these in the experiment).Black bars indicate regions where there is no contact data (i.e. between captured regions; see Additional file 15: Supplementary Methods and Ref. [14]).Since Capture-C data only gives relative contact strength, the height of the experimental data has been scaled so as to best fit the simulation results (see Additional file 15: Supplementary Methods).(B) As in A, but now showing the contacts made with the Mpg promoter (position indicated by red asterisk).Although Mpg is roughly the same genomic distance away from the regulatory elements as the α globin genes, it interacts with them less frequently.(C) Plot showing the distribution of the 3-D separation of the α globin promoters and the probe pE located at the regulatory elements R1-3.Simulations are compared with FISH measurements (see Methods and Additional file 5: Figure S5) performed on mature erythroblasts 30 hours after differentiation, when the globin genes are maximally expressed.The inset shows the mean and standard deviation for each case.(D) As in H, but the separation of the α promoters and a downstream control probe p58 located within the Sh3pxd2b gene.

Figure 4 :
Figure 4: Cis-interactions of the β globin locus.(A) Browser view showing genes in the vicinity of the β globin locus, alongside a schematic indicating the coarse-graining used in the simulations.A 130 kbp section of the 400 kbp chromatin fragment which was simulated is shown.The positions of the known regulatory elements within the LCR are indicated with blue triangles, and promoters with green squares.(B) Example simulated configurations of the locus.CTCF proteins (green) and DHS binding proteins (red) are shown; the chromosome fragment is coloured as in A. (C) Contact map showing the frequency of contacts between each chromatin bead in 500 simulated configurations.The colour bar shows a logarithmic scale.The blue line to the left indicates the region which is shown in A; the green line indicates the region which is used in the clustering analysis.(D) As in Figure 2, clustering analysis allows conformations to be grouped by their structural features.Schematics of the representative structures are shown, with the % of conformations in which they occur; a dendrogram, and contact maps for each representative structure are shown in Additional file 8: Figure S8.(E) Plot showing the contacts made with the promoters of the two β genes (locations indicated by red asterisks; the positions of the regulatory elements and gene promoters are indicated).Simulation results (red) are shown alongside Capture-C data (grey); both cases show the contacts to both genes combined (since each copy of the gene has the same sequence it is impossible to separate these in the experiment).Black bars indicate regions where there is no contact data (see Ref.[14] and Additional file 15: Supplementary Methods).(F) Similar plot showing the contacts made with the Hbb-y gene (position indicated by red asterisk).

FFigure 5 :
Figure 5: Simulations show changes in locus organisation across cell types.(A) Contact map for 500 conformations for the α globin locus in mouse embryonic stem cells (mES).Simulations are performed as in Figure 1, but using mES ChIP-seq and DNase-seq data, as shown in Additional file 9: Figure S9.(B) Difference between the contact maps in panel A and Figure 1C.Blue regions indicate contacts which were present in erythroblasts, but not mES, and yellow indicates contacts present in mES but not erythroblasts.(C) Plots comparing simulations and Capture-C data for mouse embryonic stem cells (data from Ref. [14]).(D)-(F) Similar plots but for the β globin locus.

Figure 6 :
Figure 6: Simulations also correctly predict looping for a less studied locus.Simulations of the Slc25a37 gene (Mitoferrin1) were performed for mouse erythroblasts and embryonic stem cell, using a similar input data as for the globin loci (DNase-seq, and ChIP-seq for CTCF and the H3K4me1 histone modification).(A) Contact map from the simulations of erythroblasts showing the frequency of contacts between each chromatin bead in 500 simulated configurations.(B) Similar contact map for the same locus in mouse embryonic stem cells.(C) Difference between the contact maps in panels A and B. Blue regions indicate contacts which were present in erythroblasts, but not mES, and yellow indicates contacts present in mES but not erythroblasts.(D) Browser view showing the genes across the 400 kb simulated region.(E) Plots showing the interaction profiles for the Slc25a37 promoter in each cell type, comparing simulation results (upper panels) with new Capture-C data (lower panels).Note that the genomic coordinates are aligned with the browser view in D.

Figure 7 :
Figure 7: Simulations predict the effect of protein knock-outs in the α globin locus.Plots showing the effect of a CTCF knock-out, and a "DHS knock-out" (equivalent to knocking out all protein complexes involved in looping the α globin locus except CTCF).(A)-(C) Contact maps showing the interactions between different chromosomal locations for conformations within each group identified by clustering analysis.Maps from three sets of simulations are shown; the positions of the known regulatory elements and gene promoters are indicated above each plot.(D) Schematics showing the structure of the locus within each group.(E) Plot showing the percentage of conformations which belong to each group identified by the clustering analysis.The colour key is given in D. (F) Plot showing in what percentage of conformations the two α globin gene promoters are interacting with one or more of the known regulatory elements.(G) Plot showing the distribution of the radius of gyration of the locus across the simulated conformations.The radius of gyration is defined as R2 g = (1/N ) N i=1 (ri − r) 2, where ri is the position of the ith chromatin bead in the polymer, and r is the mean position of all N chromatin beads.

Figures
Figures 3C-D and Additional file 5: Figures S5C-G show distributions of the separation of probe pairs at different locations in the α globin locus in mouse erythroblasts, where the α genes are active.Genomic locations of the probes are given in Additional file 5: Figure S5A.Probes were constructed in the pBS plasmid by subcloning regions from mouse BACRP23-469I8 and BACRP24-278E18 (obtained from CHORI) by λ-Red mediated recombination using oligonucleotide sequences shown in Additional file 14: TableS2(below).Recombineering was carried out mixing 50 µl of cells with 150 ng to 300 ng of purified DNA in a 0.1 cm wide cuvette using a Bio-Rad gene pulser set at 1.8 kV.Immediately after electroporation, 1 ml of SOC media was added, and cells were further grown at 37 • C for 1 hour before being plated on selective agar media containing 100 µg/ml ampicillin.In vitro cultured mouse foetal liver cells (expressing α and β globin genes) were settled on poly-l-lysine coated coverslips, fixed with 4% paraformaldehyde in 0.25 M HEPES and permeabilised with 0.2% Triton-X 100.FISH was performed using 7 kbp plasmid FISH probes, labelled with either Cy3-dCTP (GE Healthcare Life Sciences) or digoxygenin 11-dUTP (Roche Life Science).The genomic locations of the FISH probes is shown in Additional file 5: FigureS5A.Probes were hybridised in pairs (as in Additional file 5: FiguresS5B,D-G).Following hybridisation and detection using sheep anti-digoxygenin FITC (Roche Life Sciences) and rabbit anti-sheep FITC (Vector Laboratories), nuclei were imaged on a Deltavision Elite (GE Healthcare Life Sciences) using x100 super-plan apochromat oil 1.4 N.A. objective (Olympus) with a z-step size of 200 nm.Images were restored by deconvolution using Huygens Pro- Figure S2: ChIP-seq and DNase-seq data are used as an input to the model.(A) Genome browser view of genes in a 400 kbp region of mouse chromosome 11 surrounding the α-globin locus which is treated in our simulations.Symbols below the browser indicate the positions of the known regulatory elements (blue triangles) and the gene promoters (green squares).(B) ChIP-seq data for CTCF binding across the same region from mouse erythroid (Ter119 + ) cells.Red lines show the pile-up of reads, and black points indicate the positions of binding sites identified by peak-calling (see Additional file 15: Supplementary Methods for details).Data from Ref. (14).(C) Similar plot showing DNase-seq data from the same cell type, identifying the positions of DNase-1 hypersensitive sites (DHS).Data from Ref. (56).(D)-(G) Plots showing ChIP-seq data, again from the same cell type, for four TFs thought to be key players in globin regulation.Data from Ref. (14) (GATA1 and NFe2), Ref. (50) (Scl/Tal1), and Ref. (57) (Klf1).

Additional file 3 :
FigureS3: Interactions between promoters and specific regulatory elements can be identified in each simulated conformation.(A) Plot showing details of which promoters are interacting with each of the five known regulatory elements in the same set of simulations as presented in Figures1-3.Each horizontal row represents a single simulated conformation, with a blue mark indicating there is an interaction with the element (an interaction is defined as any chromatin bead lying within the promoter being within 2.75 bead diameters of any chromatin bead within the regulatory element).The grouping of different types of structure according to the clustering analysis is indicated to the left.(B) Plot showing in what proportion of conformations each of the promoters is interacting with one or more of the regulatory elements.The proportion of conformations in which either one of the α globin promoters is interacting with any of the elements is also indicated; one would expect this to represent the proportion of conformation in which α globin is being transcribed.(C) Histograms showing the distribution of the number of elements with which each promoter simultaneously interacts in a given conformation.(D) Histograms showing the distributions of the 3D separation between each of the two α globin gene promoters, and the R2 regulatory element.Also shown is the distribution for the separation between the two promoters.The dashed line indicates the distance below which two chromatin beads are deemed to be interacting.(E) Histograms showing the distribution of the radius of gyration of the locus in the conformations adopting each structure identified by the clustering analysis.The colour correspond to the different structures as shown in the schematics in panel A.

GC
Additional file 5: Figure S5: Simulation results show good agreement with fluorescence in-situ hybridization measurements.(A) Genome browser view showing the locations of FISH probes across the α globin locus; the positions of the known regulatory elements are indicated with blue triangles, and the promoters are indicated with green squares.All features are shown to scale.(B) We use the mean separations of probes measured in FISH experiments to parametrise length scales in the simulations.The experiments were performed on mature erythroblasts 30 hours after differentiation, as described in Methods.Points show the experimental versus simulation mean separations with the standard error in the mean shown as error bars.The line shows a linear fit through zero; the slope gives the conversion σ = 15.95 nm, which we round to 16 nm for the rest of the plots.(C) Plot showing the mean and standard deviation (shown as error bars) of the separation of pairs of probes.(D-G) Plots showing the full distribution of separations across many cells (at least 187 signal pairs) or simulated conformations (1000).Additional file 6: Figure S6: Variation of model parameters does not lead to large changes in the resulting configurations.(A) Plot showing how the Q score, which quantifies the agreement between the simulation and experimental chromosome interactions, varies with different model parameters (see Additional file 15: Supplementary Methods) Figure S7: ChIP-seq and DNase-seq data are used as an input to a model of the β globin locus.(A) Genome browser view of genes in a 400 kbp region of mouse chromosome 7 surrounding the β globin locus which is treated in our simulations.Symbols below the browser indicate the positions of the known regulatory elements within the LCR (blue triangles) and the gene promoters (green squares).(B) ChIP-seq data for CTCF binding across the same region from mouse erythroid (Ter119 + ) cells.Red lines show the pile-up of reads, and black points indicate the positions of binding sites identified by peak-calling (see Additional file 15: Supplementary Methods).Data from Ref. (14).(C) Similar plot showing DNase-seq data from the same cell type, identifying the positions of DNase-1 hypersensitive sites (DHS).Data from Ref. (56).(D)-(G) Plots showing ChIP-seq data, again from the same cell type, for four TFs thought to be key players in globin regulation.Data from Ref. (14) (GATA1 and NFe2), Ref. (50) (Scl/Tal1), and Ref. (57) (Klf1).(H)-(I) ChIP-seq data showing relevant histone modifications: monomethylation and trimethylation of H3K4 (associated with enhances and promoters respectively).Data from Ref. (56).All plots are aligned according to the horizontal axis.
Additional file 8: FigureS8: Interactions between β globin promoters and specific regulatory elements can be identified in each simulated conformation.(A) Plot showing details of which promoters are interacting with the known regulatory elements within the LCR from the same set of simulations as presented in Figure4.Each horizontal row represents a single simulated conformation, with a blue mark indicating there is an interaction with the element (an interaction is defined as any chromatin bead lying within the promoter being within 2.75 bead diameters of any chromatin bead within the regulatory element).The grouping of different types of structure according to the clustering analysis is indicated to the left; schematics and an individual contact map for each group are shown.(B) Plot showing in what proportion of conformations each of the promoters is interacting with one or more of the regulatory elements.The proportion of conformations in which either one of the β globin promoters is interacting with any of the elements is also indicated.(C) Histograms showing the distribution of the number of elements with which each promoter simultaneously interacts in a given conformation.

Additional file 9 :
Figure S9: Simulations predict the effect of protein knock-outs on the β globin locus.Plots showing the effect of a CTCF knock-out, and a "DHS knockout" (equivalent to knocking out all protein complexes involved in looping the β globin locus except CTCF).(A)-(C) Contact maps showing the interactions between different chromosomal locations for conformations within each group identified by clustering analysis.Maps from three sets of simulations are shown.(D) Schematics showing the structure of the locus within each group.(E) Plot showing the percentage of conformations which belong to each group identified by the clustering analysis.The colour key is given in D. (F) Plot showing in what percentage of conformations the two β globin gene promoters are interacting with one or more of the regulatory elements within the LCR.
Figure S10: ChIP-seq and DNase-seq data from mES cells can also be used as an input.(A) Browser view of the α globin locus.(B-D) ChIP-seq and DNase-seq data for mouse ES cells across the same region.Red lines show the pile-up of reads, and black points indicate the positions of binding sites identified by peak-calling.Data from the ENCODE project (58).(E) Browser view of the β globin locus.(F-H) ChIP-seq and DNase-seq data for mouse ES cells across the same region.Data from the ENCODE project (58).

Table S1 :
List of all simulation parameters.Parameters are given in simulation units; see Additional file 15: Supplementary Methods for details of how these relate to physical units.
(58)tional file 11: Figure S11: Input data is available for less well studied loci.(A)Browserview of the Slc25a37 (mitoferrin) locus.(B-D)ChIP-seqandDNase-seqdataformouse erythroid (Ter119+) cells across the same region.Red lines show the pile-up of reads, and black points indicate the positions of binding sites identified by peak-calling.CTCF data is from Ref. (14); DNase and histone modification data is from Ref.(56).(E-G)ChIP-seqand DNase-seq data for mouse ES cells across the same region.Data from the ENCODE project(58).

Table S2 :
Oligonucleotide sequences for FISH probes.Sequences used to amplify the vector backbone are shown in lower case, and regions used to subclone probe sequences in upper case.