Differential binding and co-binding pattern of FOXA1 and FOXA3 and their relation to H3K4me3 in HepG2 cells revealed by ChIP-seq
- Mehdi Motallebipour†1, 7Email author,
- Adam Ameur†2, 8,
- Madhu Sudhan Reddy Bysani1,
- Kalicharan Patra1, 9,
- Ola Wallerman1,
- Jonathan Mangion3,
- Melissa A Barker4,
- Kevin J McKernan5,
- Jan Komorowski2, 6 and
- Claes Wadelius1Email author
© Motallebipour et al.; licensee BioMed Central Ltd. 2009
Received: 17 June 2009
Accepted: 17 November 2009
Published: 17 November 2009
The forkhead box/winged helix family members FOXA1, FOXA2, and FOXA3 are of high importance in development and specification of the hepatic linage and the continued expression of liver-specific genes.
Here, we present a genome-wide location analysis of FOXA1 and FOXA3 binding sites in HepG2 cells through chromatin immunoprecipitation with detection by sequencing (ChIP-seq) studies and compare these with our previous results on FOXA2. We found that these factors often bind close to each other in different combinations and consecutive immunoprecipitation of chromatin for one and then a second factor (ChIP-reChIP) shows that this occurs in the same cell and on the same DNA molecule, suggestive of molecular interactions. Using co-immunoprecipitation, we further show that FOXA2 interacts with both FOXA1 and FOXA3 in vivo, while FOXA1 and FOXA3 do not appear to interact. Additionally, we detected diverse patterns of trimethylation of lysine 4 on histone H3 (H3K4me3) at transcriptional start sites and directionality of this modification at FOXA binding sites. Using the sequence reads at polymorphic positions, we were able to predict allele specific binding for FOXA1, FOXA3, and H3K4me3. Finally, several SNPs associated with diseases and quantitative traits were located in the enriched regions.
We find that ChIP-seq can be used not only to create gene regulatory maps but also to predict molecular interactions and to inform on the mechanisms for common quantitative variation.
The forkhead box/winged helix (FOX) family of transcription factors (TFs) is conserved from yeast to mammals, and in humans consists of approximately 40 members [1–3]. A subfamily of these factors is the FOXA family with the members FOXA1 (formerly known as hepatocyte nuclear factor (HNF)3α), FOXA2 (HNF3β), and FOXA3 (HNF3γ), involved in development of the liver tissue and regulation of expression of the liver specific genes [4, 5]. More specifically, FOXA1 and FOXA2 have been established as crucial for competence of the liver in the foregut endoderm during development . This is suggested to be due to the ability of FOXAs to act as 'pioneering' factors opening the compacted chromatin . FOXAs are also able to induce nucleosome positioning in a nucleosomal array, which has been demonstrated to occur in the enhancer region of the mouse serum albumin gene . In an X-ray crystallographic study of FOXA3 bound to DNA, it was suggested that these factors bind as monomers and that the structure of FOXAs is similar to those of histones H1 and H5 . The latter is proposed to be the explanation for the ability of FOXA to position nucleosomes and act as a pioneering factor .
FOXA1, -2, and -3 share great homology in the DNA binding domain. FOXA1 shares 95% and FOXA2 90% sequence identity with FOXA3 within the forkhead domain . While FOXA1 and -2 are up to 39% identical outside of the forkhead domain, FOXA3 has much less similarity with these factors . The FOXAs regulate genes involved in metabolism [2, 5, 9], for example, those encoding transthyretin, and apolipoproteins. Moreover, FOXA2 autoregulates its own expression and that of other TFs - for example, HNF4α, HNF1, and HNF6 - and has therefore been implicated as a master regulator of gene expression in the liver [9–11]. In the study by Duncan et al. , it was suggested that FOXA1 is a weaker transcription enhancer than FOXA2. It was further proposed that as FOXA1 and FOXA2 have the same recognition sequence on DNA, they compete for the binding site and FOXA1 may therefore exhibit an inhibitory effect.
There have been some chromatin immunoprecipitation (ChIP)-chip and ChIP with detection by sequencing (ChIP-seq) studies on members of the FOXA family published, specifically on FOXA1 (in MCF-7 and LNCaP cells) [12–14] and FOXA2 (mouse liver and a limited study in human liver) [15–17]. Although these studies have revealed interesting aspects of FOXAs as TFs, none have examined the interrelationship of the three members of this family. Additionally, several of these studies have only investigated the FOXA binding sites at the promoters of known genes and thus have not been truly genome-wide, despite the evidence that, for example, FOXA2 binds at sites other than the transcriptional start sites (TSSs) [16, 18].
Modifications of the amino-terminal tails of the histones can change the accessibility of the chromatin for TFs and the transcriptional machinery and thereby regulate the expression of genes. Although combinations of these modifications are indicated as a prerequisite for activation or repression of the transcriptional activity , genome-wide studies of all the required modifications in every condition is not practical. Therefore, one modification can be chosen as representative for an active or inactive state of transcription. In this study, we have selected trimethylation of lysine 4 on histone H3 (H3K4me3), a commonly studied histone modification, as an indication of regions actively transcribed or poised to be transcribed .
The new generation of sequencers, generally known as high throughput sequencers, has made the detection of DNA resulting from ChIP for genome-wide studies easier and more cost-effective. In this study, we aimed to characterize the genome-wide binding sites of FOXA1 and FOXA3, for the first time, in the hepatocellular carcinoma cell line HepG2 through ChIP and sequencing on the SOLiD platform. Furthermore, we intended to examine their possible interactions with each other, with FOXA2, and their correlation with H3K4me3 and other TFs in vivo. We found that FOXA1 and FOXA3 have dissimilar distributions of binding sites in HepG2. Intriguingly, although there were sites of FOXA1 and FOXA3 co-binding together with FOXA2, FOXA1 and FOXA3 did not seem to interact in vivo. Furthermore, we discovered that trimethylation of lysine 4 at histone H3 reveals different patterns or 'signatures' depending on the promoter structure and transcriptional activity. Importantly, H3K4me3 was often found at a distance of about 200 bases from the sites of FOXA1-2-3 binding, frequently directed towards the nearest TSS. Finally, we demonstrate that ChIP-seq can be used for detecting allele-specific binding and candidate functional single nucleotide polymorphisms (SNPs).
Overall data analysis
For the genome-wide analysis of FOXA1 and FOXA3 binding sites and regions of H3K4me3 in HepG2 cells, ChIPs and detection by a high throughput sequencer was performed. In order to get a detailed view of the regions of H3K4me3, we decided to treat the chromatin with micrococcal nuclease (MNase). MNase recognizes the naked DNA, which is not tightly wrapped around the nucleosomes, and digests it. This, in combination with the ChIP, will lead to nucleosome-sized DNA (147 bp) that can be sequenced by high throughput sequencers, resulting in a fine mapping of the H3K4me3 pattern in the genome. After alignment of the raw reads and calculation of overlap signals, we compared the results between the different libraries prepared for sequencing and detected a good correlation (Figure S1 in Additional data file 1). Thereafter, the aligned reads were merged, ordered on genomic positions, and extended by the average fragment size (Table S1 in Additional data file 1). We also sequenced a fraction of the input material, generated in the ChIPs, to use as a negative control for detection of regions where repeats may cause false positive overlap signals. Then we identified peaks with significant ChIP-enrichment by considering both the ChIP- and input signals.
Number of regions and overlaps with putative FOXA binding and H3K4me3
As mentioned, members in the FOXA family regulate common pathways. This was supported by our Gene Ontology (GO) analysis, where some categories were recurrent for FOXA1 and FOXA3 (Figure S4A, B in Additional data file 1). Here, we consider a gene to be regulated by FOXA1 or FOXA3 when it contains a binding site within 1 kb of the TSS. Therefore, we analyzed the data for possible co-binding sites for these two factors. As presented in Table 1, more than 3,000 peaks were found in both data sets.
In a genome-wide study of FOXA1 binding sites in the human breast adenocarcinoma cell line (MCF-7), 12,904 regions have been found at a 1% false discovery rate . Of these, 2,093 (16%) overlap with the putative binding sites found in HepG2 cells in our study. In a similar way, 2,178 (27%) of our regions were reciprocally found in the MCF-7-data. This indicates that around 2,000 FOXA1 binding sites are common between the HepG2 and MCF-7 cell lines, while 6,000 binding sites are unique to HepG2.
We found 41,780 H3K4me3 regions in the HepG2 genome (Table 1). This would approximately correspond to 160,000 nucleosomes with trimethylation of lysine 4 on histone H3. This number is calculated by multiplying the 41,780 regions by 764, which is the average peak length (Table S1 in Additional data file 1), and then dividing the product by 200, the assumed average distance in base-pairs between the start of two nucleosomes in these regions. Of the H3K4me3 regions, 42% are within 1 kb and an additional 15% within 5 kb of the TSS of a known gene, and 4.2% within 1 kb of a 3'-end (Figure S2 in Additional data file 1). Furthermore, 11% of these regions are intragenic, leaving 28% of the H3K4me3 not in the vicinity of a known gene.
Distinct H3K4me3 at bidirectional and other promoter structures
Considering clusters I, II, and III with high enrichments for H3K4me3 upstream of the TSS, we suspected the existence of bidirectional transcription in these regions. Therefore, the clusters were compared with the data for CAGE tags [22–24] in HepG2. CAGE (cap-analysis of gene expression) is a measurement of the expression of the TSSs of a gene. Consistent with our expectation, over 30% of the genes in each of these three clusters were in the vicinity of CAGE tags on the other strand compared to the TSS, that is, they were part of a bidirectional promoter (Table S3 in Additional data file 1). For cluster II, with a high and broad peak upstream of the TSS, this fraction exceeded 60%. Another significant finding was that 11% of the genes in cluster I, which had the highest expression, also had a FOXA3 binding site within 1 kb of their TSS (Table S3 in Additional data file 1).
Previous studies have suggested that bidirectional promoters occur in CpG-rich sequences [22, 23]. Thus, we examined the frequency of different sequence elements at the TSSs of the genes in the seven clusters (Table S5 in Additional data file 1). Promoters for all clusters - except cluster V, which had the least number of bidirectional promoters - were highly enriched for CpG-rich sequences. As expected, cluster V contained a higher number of TATA- and CAAT-boxes.
Thus, by unsupervised clustering of enrichment signals around the TSSs, we detected different H3K4me3 signatures depending on the structure of the promoter, sequence elements present in the promoter, and the level of expression of the downstream gene. A similar type of analysis was also performed for H3K4me3 at the 3'-end of genes (Figure S5 in Additional data file 1). Some of the clusters with higher signals at the 3'-ends were associated with high expression of the gene, suggesting a reciprocal H3K4me3 signal at the beginning and the end of some genes. These clusters also have higher frequency of CAGE tags at the 3'-ends (Table S10 in Additional data file 1). For further comments, see the supplementary results in Additional data file 1.
FOXA interactions detected by co-immunoprecipitation and ChIP-reChIP
The lack of evidence for a direct FOXA1 and FOXA3 interaction could be for technical reasons with regard to the Co-IP protocol. Therefore, to detect and verify possible co-bindings and to further understand whether these are due to binding of different FOXA molecules at the same site in different cells or due to co-binding in the same cell, we employed the ChIP-reChIP method in combination with semiquantitative PCR. With this method, crosslinked protein-DNA complexes are immunoprecipitated first with the antibody for one protein in the complex, followed by immunoprecipitation with the antibody for the second protein. We immunoprecipitated the chromatin from HepG2 cells with any of the three FOXA antibodies (FOXA1, FOXA2, and FOXA3) and reimmunoprecipitated the material with another of the three antibodies. The sequence of the pairs was then reversed in independent replicates in order to verify the results from the first round. The resulting DNA was then analyzed by PCR with primers amplifying a region containing enriched peaks for both factors in the complex. As a negative control, we used primers for regions containing a binding site for only one of the factors in the pair and primers for a region with no binding site for any of the factors. Theoretically, if two of the factors co-bind in a region, that sequence should be enriched in the ChIPed DNA, while sequences with a single binding should not be enriched as they are selected against by the serial immunoprecipitation. As demonstrated in Figure 4b, we could find that each of the factors FOXA1, FOXA2, and FOXA3 bind in close vicinity of any of the other two FOXAs on the same DNA molecule in the same cell.
With these results, we demonstrate regions of pair-wise binding for FOXA1, FOXA2, and FOXA3, where these factors co-bind in close proximity and, as indicated by the Co-IP data, some of these factors may even interact at the site of binding.
Correlation of FOXA binding and H3K4me3
In the next step, we correlated these clusters with CAGE tags from HepG2 within the same distance as above. This comparison revealed a higher percentage of TSSs near the combined FOXA binding sites (Table S7 in Additional data file 1). When considering CAGE-tags within 1 kb, the difference in directionality for clusters I and II became more evident, with more CAGE tags in the plus direction for cluster I and in the minus direction for cluster II. Furthermore, by creating separate footprints of H3K4me3 around the FOXA1-2-3 regions with or without a TSS within 5 kb, we observed that both groups exhibit a double peak, each peak with its centre at a distance of approximately 200 bp from the binding site (Figure 6c). The H3K4me3 pattern around FOXA1-2-3 binding sites, as presented in our study, correlates well with the hypothesis that FOXAs position nucleosomes at their binding site , which is best supported by FOXA1-2-3 regions with a TSS within 5 kb (Figure 6c).
Correlation of FOXA binding with other transcription factors
Overlap between putative FOXA binding sites and the binding of other factors
Number of regions
Another interesting observation was that FOXA1-2 and FOXA1-3 regions were more related to HNF4α binding than FOXA2-3 (Table 2 and Figure 5a-c, e-g). In addition, FOXA1-2-3 binding is highly correlated with HNF4α binding in HepG2 cells (Table 2 and Figure 5d, h).
Allele-specific DNA-protein interactions
Monoallelic expression of genes can be due to imprinting, allelic exclusion or sex chromosome dosage compensation. SNPs in combination with the ChIP-seq could prove to be a powerful method for detection of allele-specific binding that could lead to monoallelic or preferential expression from one allele in the studied genome. With a high enough number of sequence reads at a locus with a heterozygous SNP, one can detect whether the majority of reads are from one allele or the other. If TF binding or active histone marks are predominantly found on only one of the alleles, one can suspect preferential binding to that particular allele of the gene. Previously, we have interrogated the genome of HepG2 cells for SNPs by the Infinium assay and Human-1M array (Illumina) in 1,000,000 positions (data not shown). Among these, 220,000 were heterozygous SNPs (Additional data file 5), which we screened in the ChIP-seq data for allele-specific binding. After taking multiple testing into account as described in the Materials and methods section, we found three examples for FOXA1, two for FOXA3, and six for H3K4me3 (Table S8 in Additional data file 1).
Combining ChIP-seq and SNP association data
Comparison of ChIP-seq data with genome-wide association studies for identification of functional SNPs
CELSR2, PSRC1, SORT1
Kathiresan et al.
Plasma levels of liver enzymes (ALT†)
Yuan et al.
Aulchenko et al., Sabatti et al.
Aulchenko et al.
Sabatti et al.
Aulchenko et al.
Kathiresan et al.
TOMM40-APOE, APO cluster
Aulchenko et al., Sabatti et al.
Aulchenko et al.
Plasma levels of liver enzymes (GGT¥)
Yuan et al
In this paper, we present the first true genome-wide location analysis of FOXA1 and FOXA3 binding sites in the human HepG2 cell line through ChIP-seq and their internal association. Our analysis demonstrates that among the FOXA family, FOXA1 is the more frequent binder with a majority of binding sites far from known genes, while FOXA3 binds least frequently and preferentially at sites near a known gene. Additionally, from Co-IP analyses we found that FOXA2 interacts with both other FOXAs, while FOXA1 and FOXA3 do not seem to interact. Through ChIP-reChIP experiments, we demonstrated pair-wise co-binding of the FOXA factors to the same sites of DNA in the same cell. These data were further substantiated by the differential binding pattern of these complexes and their interactions with other TFs located either at TSSs or at distant sites.
Out of the ten top-ranked motifs found for FOXA1 and FOXA3, only the few top motifs are canonical and the rest are either variations of the top-motif or other motifs. We have previously suggested that these non-canonical bindings might be due to interactions of FOXAs with other TFs and that these motifs might in fact be canonical motifs for the binding partners of FOXAs . A recent report implies that different sequences at binding sites might affect the binding and regulatory activity of the interacting TF .
We did not find any evidence of protein-protein interaction between FOXA1 and FOXA3, but we cannot yet completely exclude a direct/indirect interaction. If there are any interactions between these two factors, they might be transient and rapid in the cells. These interactions might also be very weak and therefore easily lost during the treatments and washes, and hence not detected by Co-IP. Instances where FOXA1, FOXA2, and FOXA3 are found together could be due to two molecules of FOXA2 binding at the site and each recruiting one of the other two factors. Another possibility is the involvement of other factors, such as HNF4α, in engaging the different participants of the complex. Indeed, we demonstrate here that 76% of FOXA1-2-3 bindings coincide with HNF4α-binding. Previously, we have also detected that HNF4α co-immunoprecipitates with FOXA2 in HepG2 cells .
Based on the X-ray crystallographic structure of FOXA3, it was postulated that FOXAs bind DNA as monomers . This is not in conflict with our results, as most of the binding sites for the FOXAs seem to contain only one motif for the FOXA factors. Regions with binding of two or all three factors could be through binding of one to the main motif and the second and/or third factor to other motifs. This is less likely as at the sites of interaction we mostly find a single peak for each of the factors, all in nearly the same position. We therefore hypothesize that each of the factors bind DNA as monomers at different positions and then may interact in diverse constellations through long-range interactions. Further studies are required to understand the nature of these interactions and the inter-relationship of the FOXA factors.
In this study, we have also mapped the histone modification H3K4me3 genome-wide in HepG2 cells. As in other studies, we find this modification mainly around the TSS of genes . However, by clustering of data by k-means in these regions, we were able to discover several different patterns of H3K4me3, few of them highly correlated with the expression level of the gene and the surrounding promoter structure. Using the same approach, we could also detect five clusters at the 3'-end of genes. Again, here we found patterns that could be related to the structure of the DNA at these sites. Convergent or tail-to-tail genes, for example, exhibited other patterns than tail-to-head genes or genes without any other transcription units in the vicinity. Our results clearly demonstrate the importance of cautious interpretation of histone mark signals, as the average signal, usually displayed by footprints, is composed of a number of diverse patterns from subgroups of genes with different features.
FOXA TFs are demonstrated to act as pioneering factors, that is, they are able to bind to compacted chromatin and facilitate opening of the chromatin and recruitment of other TFs. We found here that binding of FOXA2-3 correlates more often with H3K4me3 than FOXA1-2 and that FOXA3 binding was associated with high expression and high enrichment for trimethylation of K4 on histone H3. Additionally, sites with triple binding of FOXAs had one or two peaks of H3K4me3 on either side of the binding site, with directionality towards a TSS. This can be interpreted as the binding of FOXAs upstream of a TSS positioning the nucleosomes in between, regardless of the distance, and recruiting the histone methyl-transferases that methylate the first nucleosome adjacent to the FOXA binding site. This mark may then be propagated to some extent between the binding site and the TSS.
Preferential binding of TFs and histone modifications to one allele of a SNP can easily be detected through ChIP-seq if there is a sufficient number of overlapping reads at heterozygous SNPs. Specifically, the preferred binding/modification can occur in regulatory elements with SNPs that are associated with common disorders. We found several examples of SNPs that were related to a preferential binding/modification in this study. Examination of these SNPs and their possible effect on the expression of nearby genes can offer mechanistic insights into inherited variation in gene expression and pathogenesis of common disorders.
ChIP-seq is a powerful strategy that can be used for purposes other than to create gene regulatory maps. By combining data for several TFs, we predicted protein-protein interactions and sites of co-binding, which were validated in Co-IP and ChIP-reChIP experiments. By reading the sequence in ChIP-DNA at polymorphic sites, we find many instances of allele-specific DNA-protein interactions, which are important data to understand the mechanisms of allelic imbalance in gene expression. SNPs associated with diseases and common phenotypes are also frequently found in the enriched regions. This means that ChIP-seq is an additional high-throughput method to generate systematic data on the molecular basis of quantitative human variation.
Materials and methods
Chromatin immunoprecipitation - ChIP and ChIP-reChIP
The human hepatocellular carcinoma cell line HepG2 was cultured in RPMI-1640 with non-inactivated fetal bovine serum and penicillin-streptomycin at 37°C with 5% CO2. For the ChIP-assay, cells were washed and serum-free medium was added before incubation with formaldehyde in a final concentration of 0.37% for 10 minutes. Cells were then lysed and the nuclei were resuspended in RIPA-buffer (1× phosphate-buffered saline (PBS), 1% NP-40, 0.5% Na-deoxycholate, 0.1% SDS, 0.004% Na-azide) for sonication with a Bioruptor (Diagenode, Liège, Belgium) to obtain fragments with a size range between 150 and 300 bp. Sonicated chromatin was then pre-cleared and incubated with the antibody overnight. After a short incubation with Protein-G agarose (Roche, Mannheim, Germany), the chromatin-antibody-bead complex was washed four times with RIPA-buffer, once with 0.01 M Tris-HCl (pH 8), 0.25 M LiCl, 0.001 M EDTA, 1% NP-40 and 1% Na-deoxycholate, and once with TE-buffer. Then the chromatin was eluted with 1% SDS and 0.1 M NaHCO3 before RNase-treatment and reversal of the cross-linking at 65°C for 6 hours and Proteinase-K treatment overnight. The DNA was eluted with phenol-chloroform and precipitated with ethanol. Enrichment for each antibody was tested by semiquantitative PCR with primers for known binding sites.
ChIP-reChIP was performed as described in , except that after the first elution the material was diluted five times.
Antibodies used in this study were for FOXA1 (ab5089, Abcam, Cambridge, UK), FOXA2 (sc-6554, Santa Cruz Biotechnology, Santa Cruz, CA, USA), FOXA3 (sc-5361, Santa Cruz Biotechnology), and tri-methylation-histone H3 (K4) (05-745 (CL MC315100), Upstate, Temecula, CA, USA).
Pelleted HepG2 cells were resuspended in ice-cold buffer A containing 320 mM sucrose, 15 mM HEPES pH 7.9, 60 mM KCl, 2 mM EDTA, 0.5 mM EGTA, 0.5% bovine serum albumin, 0.5 mM spermidine, 0.15 mM spermine, and 0.5 mM DTT. After a short incubation on ice, cells were homogenized with a Dounce homogenizer. The homogenate was then layered on an equal volume of buffer B, which was the same buffer as A except that it did not contain bovine serum albumin and the sucrose concentration was 30%. Collected nuclei were washed once with and resuspended in 0.34 mM sucrose, 15 mM HEPES pH 7.5, 60 mM KCl, 15 mM NaCl, 0.5 mM spermidine, 0.15 mM spermine, and 0.15 mM β-mercaptoethanol. After adjustment of CaCl2 concentration to 3 mM, the suspension was aliquoted to have 30 × 106 cells per milliliter. These were incubated at 37°C for 5 minutes before addition of 300 U of micrococcal nuclease to each aliquot, after which the incubation was continued for another 5 minutes. To stop the reaction 90 mM HEPES pH 7.9, 220 mM NaCl, 10 mM EDTA, 2% Triton X-100, 0.2% Na-deoxycholate, 0.2% SDS, 0.5 mM PMSF, and 2 μg/ml aprotinin was added. The solution was centrifuged and the supernatant was used to set up the ChIP-assay as described above.
Library preparation and sequencing
All preparations of libraries for sequencing and the sequencing itself were according to the SOLiD™ System 2.0 Fragment Library Preparation: Lower Input DNA and User Guide standard protocols, with the following modifications.
For the FOXA1 library three ChIPs were performed, resulting in five immunoprecipitations, which were pooled. The majority of the fragments were in the size range of 150 to 300 bp after sonication during the ChIP. Therefore, to obtain a homogeneous fragment size and to avoid biased fragment amplification, we decided to prepare three fragment libraries. After ligation of adapters and purification, the DNA was separated by PAGE and three fragment size ranges selected: 150 to 200 bp, 200 to 250 bp, and 250 to 300 bp including the adapters. These were used for 15 cycles of in-gel PCR. For the FOXA3 library 0.5 μg of immunoprecipitation material and 4 μg of input were used as the starting material. Here, three fragment sizes were also chosen for the immunoprecipitation material, but only the 250- to 300-bp fragment size was chosen from the input. The in-gel PCR was performed for 13 cycles. Amplified fragments were sequenced with 50-bp read lengths for FOXA1 and 35-bp read lengths for FOXA3. For H3K4me3, five immunoprecipitations from independent MNase-ChIP assays were pooled and used as the starting material. For this library preparation only one fragment size was chosen, as the sizes of fragments were uniform due to micrococcal nuclease treatment. The library was sequenced with a 35-bp read length.
Cells were grown as described for ChIP. These were washed twice with cold 1× PBS, resuspended in a buffer containing 100 mM Tris-HCl (pH 8.0), 100 mM NaCl, 0.2% NP-40, and protease inhibitor cocktail for 30 minutes on ice and centrifuged in cold to collect nuclei in the pellet. The pellet containing intact nuclei was washed once with RIPA and resuspended in the same buffer, followed by homogenization in a Dounce homogenizer. Precleared nuclear lysate was incubated with antibody or IgG at a concentration of 1 μg per approximately 500 μg of total protein and 50 μl Protein G-agarose at 4°C overnight. Immune complexes were washed twice with RIPA buffer and eluted in NuPAGE® LDS sample buffer (Invitrogen, Temecula, CA, USA) containing reducing agent (Invitrogen) at 70°C for 10 minutes. Samples were separated on NuPAGE® 4-12% for western blotting and detection with ECL detection system (GE Healthcare, Chalfont St Giles, UK). In order to verify the equal loading and as a negative control for the Co-IP experiments, one blot was stripped with 1× PBS, 0.1% Tween-20, and 0.2% SDS for 30 minutes at 65°C and incubated with GAPDH antibody (data not shown).
Alignment, data management, and peak detection
ChIP-seq data were aligned using the SOLiD™ matching pipeline . Three match errors in color space were allowed for the data sets with 35-bp reads (H3K4me3, FOXA3, and input). For the 50-bp reads in the FOXA1 experiment, four errors were allowed.
For FOXA1 and FOXA3, only reads starting at unique points were considered as those fragments were randomly sonicated and therefore expected to start at different positions. H3K4me3 ChIP-DNA had been sheared with MNase and the fragments were thus expected to be of defined lengths. Therefore, we did not merge H3K4me3 reads starting at the same position. Input reads were not merged either. Reads were then extended to represent the average fragment length in each of the samples (Table S1 in Additional data file 1) and genome-wide signals were created for reads on the forward strand, reads on the reverse strand, and overlapping fragments.
In this way, we can calculate P-values for observing at least k overlapping reads at any given position. These tests are then performed for each base in the genome. Because of the high number of reads in these experiments, and since the fraction l f /l g is very small, the binomial distribution can be effectively approximated by Poisson (λ) in this analysis where λ = l f /l g . So, in practice, this part of our peak finding method will yield the same results as other approaches that are based on the Poisson distribution . Nevertheless, we chose to use the binomial distribution since it is a general and intuitive model for randomly placing millions of reads onto the genomic sequence. The Bonferroni method was used to correct for multiple hypothesis testing, a very stringent P-value correction.
Cutoffs were then obtained by selecting the lowest overlap value that gives a corrected P-value below 0.01: 15 for FOXA1, 21 for FOXA3, 15 for H3K4me3 and 16 for input. Peaks were defined by consecutive positions with overlap at least as high as the cutoff. In the next step, FOXA1, FOXA3, and H3K4me3 peaks that were also significantly enriched for input were removed. Examples of such regions are demonstrated in Figure S7 in Additional data file 1. High signals both in input and ChIP DNA are mostly found in regions close to centromeres where repeats are abundant. Such signals can arise when the reference genome has not been properly assembled. In the final step, we keep only regions with a significant number of overlapping forward reads upstream of the peak, or a significant number of reverse reads downstream. To take the forward and reverse reads into account is a commonly used strategy in ChIP-seq analysis [38, 39]. We calculated cutoffs for forward and reverse peaks in the same way as for the overlap signal with l f now being the read length instead of fragment length as before. We allowed the forward or reverse peak to be at most a fragment length distance from the peak center. The reason why we do not require both a forward and a reverse peak is that repetitive elements immediately upstream or downstream of a peak can make it impossible to get uniquely aligned reads there.
We used the Bioconductor package BCRANK  for de novo motif searches in FOXA1 and FOXA3 data. BCRANK takes a ranked list as regions and works best on lists containing many regions, where the sequences at the bottom are not always bound directly by the TF. Therefore, we created ranked lists with FOXA1 and FOXA3 regions using more relaxed cutoffs than the ones described above by not applying the cutoffs for forward and reverse peaks. This resulted in 16,578 relaxed regions for FOXA1 and 22,066 for FOXA3.
Annotations and Gene Ontology analysis
We annotated our regions to two types of transcripts. For well-characterized genes, we downloaded the UCSC Genes table from the UCSC Genome Browser , which contains predictions based on RefSeq, Genbank, CCDS, and UniProt. For less well-annotated transcripts, we downloaded the CAGE tag database from the FANTOM3 site , and extracted the 45,093 CAGE tag clusters found in HepG2 cells. In our analysis, all transcripts with a TSS within 1 kb of a peak are considered as bound by that factor. GO analysis was performed by applying the DAVID  functional annotation tools to sets of UCSC Genes. CpG island coordinates were downloaded from the UCSC Genome Browser.
Expression data were downloaded from Gene Expression Omnibus , accession number [GEO:GDS2213]. In the experiment, mRNA levels in HepG2 cells were measured in four replicates on the Affymetrix U133 Plus 2.0 array . The expression level for each probe was calculated as the mean of the four replicates, and the probes were mapped to the corresponding UCSC Genes using the knownToU133Plus2 table from the UCSC Genome Browser. In this way, we obtained expression levels for 23,898 genes.
Clustering of H3K4me3 patterns
For all 23,898 genes with expression levels in HepG2, we extracted the H3K4me3 signals in a 4-kb window surrounding the TSSs. The H3K4me3 signals were stored in numerical vectors with one value for each base in the window. k-means clustering was then applied to the 23,898 vectors. In this way, genes with similar H3K4me3 patterns around TSSs could be identified in an unsupervised manner. The same clustering strategy was performed for H3K4me3 signals at the 3'-end of the genes.
Statistical analysis of allele-specific binding
Around 220,000 SNPs, heterozygous in HepG2, were identified by genotyping 1,000,000 SNPs on an Illumina array (data not shown). In the next step, we filtered out SNPs that were covered by at least ten reads. This resulted in three heterozygous SNPs for FOXA1, two for FOXA3, and six for H3K4me3. Among these heterozygous SNPs, we aimed to detect instances where ChIP-seq data were significantly biased towards one of the alleles. For this, we used SNP calling results from the SOLiD™ SNP calling pipeline  for each of H3K4me3, FOXA1, and FOXA3. We then assumed that both alleles should be equally represented if there is no allelic preference for the binding, and used a binomial model to test for the hypothesis. SNPs with a P-value < 0.001 were considered significant. To reach the significance threshold, a SNP must be covered by at least ten reads and, in that case, all reads must contain the same allele. For SNPs with even higher coverage, both alleles can be present but one significantly more often than the other. By multiplying the number of tested heterozygous SNPs (Table S8 in Additional data file 1) with the P-value significance threshold 0.001, we obtain the number of expected false positive allele-specific binding events for each factor. A comparison with the number of observations reveals that the false discovery rate is about 0.1 in our results, indicating that about one of the 11 identified SNPs in Table S8 in Additional data file 1 is expected to be a false positive.
The raw ChIP-seq data are freely accessible at the European Read Archive  with the accession number [ERA:000074]. The coordinates for our identified positive regions for FOXA1, FOXA3, and H3K4me3 are available as Additional data files 2, 3, and 4 and can be uploaded and viewed in the UCSC Genome Browser.
Additional data files
The following additional data are available with the online version of this paper: a PDF file with supplementary results, materials and methods, Figures S1 to S8, and Tables S1 to S14 (Additional data file 1); a BED file with positions of FOXA1 in the HepG2 genome (Additional data file 2); a BED file with positions of FOXA3 in the HepG2 genome (Additional data file 3); a BED file with positions of H3K4me3 in the HepG2 genome (Additional data file 4); a text file containing data on heterozygous SNPs in HepG2 with position, ID, and alleles of each SNP (Additional data file 5).
cap-analysis of gene expression
ChIP with detection by sequencing
Forkhead box/winged helix
GA binding protein
trimethylation of lysine 4 on histone H3
hepatocyte nuclear factor
single nucleotide polymorphism
transcriptional start site
upstream stimulatory factor.
We thank people at Applied Biosystems in Foster City and Beverly, USA, and Uppsala Genome Center at Rudbeck laboratory, Uppsala University, Sweden for performing the sequencing. A special thanks to Jussi Vanhatalo at the Nordic AB for his skill and technical support in these studies. CW was supported by the Swedish Research Council for Science and Technology, the Diabetes association, the Cancer Foundation, the Markus Borgström Foundation, the Family Ernfors Fund, Novo Nordisk, and a grant for sequencing from the Knut and Alice Wallenberg Foundation, via the Center for Metagenomic Sequencing. JK was supported by the Swedish Foundation for Strategic Research, The Knut and Alice Wallenberg Foundation, Uppsala University and the Swedish University for Agricultural Sciences.
- Kaestner KH, Knochel W, Martinez DE: Unified nomenclature for the winged helix/forkhead transcription factors. Genes Dev. 2000, 14: 142-146.PubMedGoogle Scholar
- Friedman JR, Kaestner KH: The Foxa family of transcription factors in development and metabolism. Cell Mol Life Sci. 2006, 63: 2317-2328. 10.1007/s00018-006-6095-6.PubMedView ArticleGoogle Scholar
- Carlsson P, Mahlapuu M: Forkhead transcription factors: key players in development and metabolism. Dev Biol. 2002, 250: 1-23. 10.1006/dbio.2002.0780.PubMedView ArticleGoogle Scholar
- Lee CS, Friedman JR, Fulmer JT, Kaestner KH: The initiation of liver development is dependent on Foxa transcription factors. Nature. 2005, 435: 944-947. 10.1038/nature03649.PubMedView ArticleGoogle Scholar
- Schrem H, Klempnauer J, Borlak J: Liver-enriched transcription factors in liver function and development. Part I: the hepatocyte nuclear factor network and liver-specific gene expression. Pharmacol Rev. 2002, 54: 129-158. 10.1124/pr.54.1.129.PubMedView ArticleGoogle Scholar
- Cirillo LA, Lin FR, Cuesta I, Friedman D, Jarnik M, Zaret KS: Opening of compacted chromatin by early developmental transcription factors HNF3 (FoxA) and GATA-4. Mol Cell. 2002, 9: 279-289. 10.1016/S1097-2765(02)00459-8.PubMedView ArticleGoogle Scholar
- Shim EY, Woodcock C, Zaret KS: Nucleosome positioning by the winged helix transcription factor HNF3. Genes Dev. 1998, 12: 5-10. 10.1101/gad.12.1.5.PubMedPubMed CentralView ArticleGoogle Scholar
- Clark KL, Halay ED, Lai E, Burley SK: Co-crystal structure of the HNF-3/fork head DNA-recognition motif resembles histone H5. Nature. 1993, 364: 412-420. 10.1038/364412a0.PubMedView ArticleGoogle Scholar
- Duncan SA, Navas MA, Dufort D, Rossant J, Stoffel M: Regulation of a transcription factor network required for differentiation and metabolism. Science. 1998, 281: 692-695. 10.1126/science.281.5377.692.PubMedView ArticleGoogle Scholar
- Odom DT, Dowell RD, Jacobsen ES, Nekludova L, Rolfe PA, Danford TW, Gifford DK, Fraenkel E, Bell GI, Young RA: Core transcriptional regulatory circuitry in human hepatocytes. Mol Syst Biol. 2006, 2: 10.1038/msb4100059. 2006.0017Google Scholar
- Kyrmizi I, Hatzis P, Katrakili N, Tronche F, Gonzalez FJ, Talianidis I: Plasticity and expanding complexity of the hepatic transcription factor network during liver development. Genes Dev. 2006, 20: 2293-2305. 10.1101/gad.390906.PubMedPubMed CentralView ArticleGoogle Scholar
- Carroll JS, Liu XS, Brodsky AS, Li W, Meyer CA, Szary AJ, Eeckhoute J, Shao W, Hestermann EV, Geistlinger TR, Fox EA, Silver PA, Brown M: Chromosome-wide mapping of estrogen receptor binding reveals long-range regulation requiring the forkhead protein FoxA1. Cell. 2005, 122: 33-43. 10.1016/j.cell.2005.05.008.PubMedView ArticleGoogle Scholar
- Laganiere J, Deblois G, Lefebvre C, Bataille AR, Robert F, Giguere V: From the Cover: Location analysis of estrogen receptor alpha target promoters reveals that FOXA1 defines a domain of the estrogen response. Proc Natl Acad Sci USA. 2005, 102: 11651-11656. 10.1073/pnas.0505575102.PubMedPubMed CentralView ArticleGoogle Scholar
- Lupien M, Eeckhoute J, Meyer CA, Wang Q, Zhang Y, Li W, Carroll JS, Liu XS, Brown M: FoxA1 translates epigenetic signatures into enhancer-driven lineage-specific transcription. Cell. 2008, 132: 958-970. 10.1016/j.cell.2008.01.018.PubMedPubMed CentralView ArticleGoogle Scholar
- Odom DT, Dowell RD, Jacobsen ES, Gordon W, Danford TW, MacIsaac KD, Rolfe PA, Conboy CM, Gifford DK, Fraenkel E: Tissue-specific transcriptional regulation has diverged significantly between human and mouse. Nat Genet. 2007, 39: 730-732. 10.1038/ng2047.PubMedPubMed CentralView ArticleGoogle Scholar
- Wederell ED, Bilenky M, Cullum R, Thiessen N, Dagpinar M, Delaney A, Varhol R, Zhao Y, Zeng T, Bernier B, Ingham M, Hirst M, Robertson G, Marra MA, Jones S, Hoodless PA: Global analysis of in vivo Foxa2-binding sites in mouse adult liver using massively parallel sequencing. Nucleic Acids Res. 2008, 36: 4549-4564. 10.1093/nar/gkn382.PubMedPubMed CentralView ArticleGoogle Scholar
- Tuteja G, Jensen ST, White P, Kaestner KH: Cis-regulatory modules in the mammalian liver: composition depends on strength of Foxa2 consensus site. Nucleic Acids Res. 2008, 36: 4149-4157. 10.1093/nar/gkn366.PubMedPubMed CentralView ArticleGoogle Scholar
- Rada-Iglesias A, Wallerman O, Koch C, Ameur A, Enroth S, Clelland G, Wester K, Wilcox S, Dovey OM, Ellis PD, Wraight VL, James K, Andrews R, Langford C, Dhami P, Carter N, Vetrie D, Ponten F, Komorowski J, Dunham I, Wadelius C: Binding sites for metabolic disease related transcription factors inferred at base pair resolution by chromatin immunoprecipitation and genomic microarrays. Hum Mol Genet. 2005, 14: 3435-3447. 10.1093/hmg/ddi378.PubMedView ArticleGoogle Scholar
- Wang Z, Zang C, Rosenfeld JA, Schones DE, Barski A, Cuddapah S, Cui K, Roh TY, Peng W, Zhang MQ, Zhao K: Combinatorial patterns of histone acetylations and methylations in the human genome. Nat Genet. 2008, 40: 897-903. 10.1038/ng.154.PubMedPubMed CentralView ArticleGoogle Scholar
- Guenther MG, Levine SS, Boyer LA, Jaenisch R, Young RA: A chromatin landmark and transcription initiation at most promoters in human cells. Cell. 2007, 130: 77-88. 10.1016/j.cell.2007.05.042.PubMedPubMed CentralView ArticleGoogle Scholar
- Ameur A, Rada-Iglesias A, Komorowski J, Wadelius C: Identification of candidate regulatory SNPs by combination of transcription-factor-binding site prediction, SNP genotyping and haploChIP. Nucleic Acids Res. 2009, 37: e85-10.1093/nar/gkp381.PubMedPubMed CentralView ArticleGoogle Scholar
- Sandelin A, Carninci P, Lenhard B, Ponjavic J, Hayashizaki Y, Hume DA: Mammalian RNA polymerase II core promoters: insights from genome-wide studies. Nat Rev Genet. 2007, 8: 424-436. 10.1038/nrg2026.PubMedView ArticleGoogle Scholar
- Carninci P, Sandelin A, Lenhard B, Katayama S, Shimokawa K, Ponjavic J, Semple CA, Taylor MS, Engstrom PG, Frith MC, Forrest AR, Alkema WB, Tan SL, Plessy C, Kodzius R, Ravasi T, Kasukawa T, Fukuda S, Kanamori-Katayama M, Kitazume Y, Kawaji H, Kai C, Nakamura M, Konno H, Nakano K, Mottagui-Tabar S, Arner P, Chesi A, Gustincich S, Persichetti F, et al: Genome-wide analysis of mammalian promoter architecture and evolution. Nat Genet. 2006, 38: 626-635. 10.1038/ng1789.PubMedView ArticleGoogle Scholar
- Carninci P, Kasukawa T, Katayama S, Gough J, Frith MC, Maeda N, Oyama R, Ravasi T, Lenhard B, Wells C, Kodzius R, Shimokawa K, Bajic VB, Brenner SE, Batalov S, Forrest AR, Zavolan M, Davis MJ, Wilming LG, Aidinis V, Allen JE, Ambesi-Impiombato A, Apweiler R, Aturaliya RN, Bailey TL, Bansal M, Baxter L, Beisel KW, Bersano T, Bono H, et al: The transcriptional landscape of the mammalian genome. Science. 2005, 309: 1559-1563. 10.1126/science.1112014.PubMedView ArticleGoogle Scholar
- Wallerman O, Motallebipour M, Enroth S, Patra K, Bysani MS, Komorowski J, Wadelius C: Molecular interactions between HNF4a, FOXA2 and GABP identified at regulatory DNA elements through ChIP-sequencing. Nucleic Acids Res. 2009.Google Scholar
- Rada-Iglesias A, Ameur A, Kapranov P, Enroth S, Komorowski J, Gingeras TR, Wadelius C: Whole-genome maps of USF1 and USF2 binding and histone H3 acetylation reveal new aspects of promoter structure and candidate genes for common human disorders. Genome Res. 2008, 18: 380-392. 10.1101/gr.6880908.PubMedPubMed CentralView ArticleGoogle Scholar
- Yuan X, Waterworth D, Perry JR, Lim N, Song K, Chambers JC, Zhang W, Vollenweider P, Stirnadel H, Johnson T, Bergmann S, Beckmann ND, Li Y, Ferrucci L, Melzer D, Hernandez D, Singleton A, Scott J, Elliott P, Waeber G, Cardon L, Frayling TM, Kooner JS, Mooser V: Population-based genome-wide association studies reveal six loci influencing plasma levels of liver enzymes. Am J Hum Genet. 2008, 83: 520-528. 10.1016/j.ajhg.2008.09.012.PubMedPubMed CentralView ArticleGoogle Scholar
- Sabatti C, Service SK, Hartikainen AL, Pouta A, Ripatti S, Brodsky J, Jones CG, Zaitlen NA, Varilo T, Kaakinen M, Sovio U, Ruokonen A, Laitinen J, Jakkula E, Coin L, Hoggart C, Collins A, Turunen H, Gabriel S, Elliot P, McCarthy MI, Daly MJ, Jarvelin MR, Freimer NB, Peltonen L: Genome-wide association analysis of metabolic traits in a birth cohort from a founder population. Nat Genet. 2009, 41: 35-46. 10.1038/ng.271.PubMedPubMed CentralView ArticleGoogle Scholar
- Aulchenko YS, Ripatti S, Lindqvist I, Boomsma D, Heid IM, Pramstaller PP, Penninx BW, Janssens AC, Wilson JF, Spector T, Martin NG, Pedersen NL, Kyvik KO, Kaprio J, Hofman A, Freimer NB, Jarvelin MR, Gyllensten U, Campbell H, Rudan I, Johansson A, Marroni F, Hayward C, Vitart V, Jonasson I, Pattaro C, Wright A, Hastie N, Pichler I, Hicks AA, et al: Loci influencing lipid levels and coronary heart disease risk in 16 European population cohorts. Nat Genet. 2009, 41: 47-55. 10.1038/ng.269.PubMedPubMed CentralView ArticleGoogle Scholar
- Kathiresan S, Willer CJ, Peloso GM, Demissie S, Musunuru K, Schadt EE, Kaplan L, Bennett D, Li Y, Tanaka T, Voight BF, Bonnycastle LL, Jackson AU, Crawford G, Surti A, Guiducci C, Burtt NP, Parish S, Clarke R, Zelenika D, Kubalanza KA, Morken MA, Scott LJ, Stringham HM, Galan P, Swift AJ, Kuusisto J, Bergman RN, Sundvall J, Laakso M, et al: Common variants at 30 loci contribute to polygenic dyslipidemia. Nat Genet. 2009, 41: 56-65. 10.1038/ng.291.PubMedPubMed CentralView ArticleGoogle Scholar
- Prokopenko I, Langenberg C, Florez JC, Saxena R, Soranzo N, Thorleifsson G, Loos RJ, Manning AK, Jackson AU, Aulchenko Y, Potter SC, Erdos MR, Sanna S, Hottenga JJ, Wheeler E, Kaakinen M, Lyssenko V, Chen WM, Ahmadi K, Beckmann JS, Bergman RN, Bochud M, Bonnycastle LL, Buchanan TA, Cao A, Cervino A, Coin L, Collins FS, Crisponi L, de Geus EJ, et al: Variants in MTNR1B influence fasting glucose levels. Nat Genet. 2009, 41: 77-81. 10.1038/ng.290.PubMedPubMed CentralView ArticleGoogle Scholar
- Lyssenko V, Nagorny CL, Erdos MR, Wierup N, Jonsson A, Spegel P, Bugliani M, Saxena R, Fex M, Pulizzi N, Isomaa B, Tuomi T, Nilsson P, Kuusisto J, Tuomilehto J, Boehnke M, Altshuler D, Sundler F, Eriksson JG, Jackson AU, Laakso M, Marchetti P, Watanabe RM, Mulder H, Groop L: Common variant in MTNR1B associated with increased risk of type 2 diabetes and impaired early insulin secretion. Nat Genet. 2009, 41: 82-88. 10.1038/ng.288.PubMedPubMed CentralView ArticleGoogle Scholar
- Meijsing SH, Pufall MA, So AY, Bates DL, Chen L, Yamamoto KR: DNA binding site sequence directs glucocorticoid receptor structure and activity. Science. 2009, 324: 407-410. 10.1126/science.1164265.PubMedPubMed CentralView ArticleGoogle Scholar
- Barski A, Cuddapah S, Cui K, Roh TY, Schones DE, Wang Z, Wei G, Chepelev I, Zhao K: High-resolution profiling of histone methylations in the human genome. Cell. 2007, 129: 823-837. 10.1016/j.cell.2007.05.009.PubMedView ArticleGoogle Scholar
- Bernstein BE, Mikkelsen TS, Xie X, Kamal M, Huebert DJ, Cuff J, Fry B, Meissner A, Wernig M, Plath K, Jaenisch R, Wagschal A, Feil R, Schreiber SL, Lander ES: A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell. 2006, 125: 315-326. 10.1016/j.cell.2006.02.041.PubMedView ArticleGoogle Scholar
- SOLiD™ Software Development Community. [http://solidsoftwaretools.com]
- Robertson G, Hirst M, Bainbridge M, Bilenky M, Zhao Y, Zeng T, Euskirchen G, Bernier B, Varhol R, Delaney A, Thiessen N, Griffith OL, He A, Marra M, Snyder M, Jones S: Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat Methods. 2007, 4: 651-657. 10.1038/nmeth1068.PubMedView ArticleGoogle Scholar
- Jothi R, Cuddapah S, Barski A, Cui K, Zhao K: Genome-wide identification of in vivo protein-DNA binding sites from ChIP-Seq data. Nucleic Acids Res. 2008, 36: 5221-5231. 10.1093/nar/gkn488.PubMedPubMed CentralView ArticleGoogle Scholar
- Valouev A, Johnson DS, Sundquist A, Medina C, Anton E, Batzoglou S, Myers RM, Sidow A: Genome-wide analysis of transcription factor binding sites based on ChIP-Seq data. Nat Methods. 2008, 5: 829-834. 10.1038/nmeth.1246.PubMedPubMed CentralView ArticleGoogle Scholar
- UCSC Genome Browser. [http://genome.ucsc.edu]
- FANTOM3. [http://fantom3.gsc.riken.jp]
- DAVID Bioinformatics Resources 2008. [http://david.abcc.ncifcrf.gov]
- Gene Expression Omnibus. [http://www.ncbi.nlm.nih.gov/geo]
- Dannenberg LO, Edenberg HJ: Epigenetics of gene expression in human hepatoma cells: expression profiling the response to inhibition of DNA methylation and histone deacetylation. BMC Genomics. 2006, 7: 181-10.1186/1471-2164-7-181.PubMedPubMed CentralView ArticleGoogle Scholar
- European Read Archive. [ftp://ftp.era.ebi.ac.uk/]
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.