Cistrome: an integrative platform for transcriptional regulation studies
- Tao Liu†1, 2,
- Jorge A Ortiz†3, 4,
- Len Taing1, 2,
- Clifford A Meyer1,
- Bernett Lee3, 5,
- Yong Zhang6,
- Hyunjin Shin1, 2,
- Swee S Wong3, 7,
- Jian Ma6,
- Ying Lei8,
- Utz J Pape1,
- Michael Poidinger3, 5,
- Yiwen Chen1,
- Kevin Yeung3, 9,
- Myles Brown2, 10,
- Yaron Turpaz3, 11 and
- X Shirley Liu1, 2
© Liu et al.; licensee BioMed Central Ltd. 2011
Received: 4 April 2011
Accepted: 22 August 2011
Published: 22 August 2011
The increasing volume of ChIP-chip and ChIP-seq data being generated creates a challenge for standard, integrative and reproducible bioinformatics data analysis platforms. We developed a web-based application called Cistrome, based on the Galaxy open source framework. In addition to the standard Galaxy functions, Cistrome has 29 ChIP-chip- and ChIP-seq-specific tools in three major categories, from preliminary peak calling and correlation analyses to downstream genome feature association, gene expression analyses, and motif discovery. Cistrome is available at http://cistrome.org/ap/.
The term 'cistrome' refers to the set of cis-acting targets of a trans-acting factor on a genome-wide scale, also known as the in vivo genome-wide location of transcription factors or histone modifications. Cistromes were initially identified using chromatin immunoprecipitation (ChIP) combined with microarrays (ChIP-chip). However, with the recent advent of next generation sequencing (NGS) technologies, ChIP combined with NGS (ChIP-seq) has become the more popular technique due to its higher sensitivity and resolution.
Computational analyses of cistrome data have become increasingly complex and integrative. Investigators often examine the data from many different angles by combining cistrome, epigenome, genomic sequence, and transcriptome analyses. Many algorithms and tools have been published over the years to facilitate such analyses. However, these tools require investigators to have both the hardware resources and computational expertise to install, configure, and run these different algorithms effectively. Integrated platforms such as CisGenome and seqMINER have been developed to streamline data analyses; however, the maintenance of these platforms demands suitable hardware resources and computational skills. In addition, these tools lack useful features such as the integration of cistrome data with gene expression analysis, data sharing between researchers, and reusable analysis workflows.
Before interpreting the biological results from ChIP-chip or ChIP-seq data using the Cistrome platform, researchers can upload raw data from their microarray or sequencing facilities and then preprocess those data using Cistrome peak-calling tools. Alternatively, researchers can also upload intermediate results from their own analysis tools. As illustrated in Figure 1, the peak calling step generates two types of intermediate files: peak location files (in BED format), indicating the predicted transcription factor binding sites or histone modification sites, and signal profile files (in WIGGLE format) of binding or histone modification across the genome.
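The peak location files described above can be read with very little code. The following is a minimal sketch, assuming a tab- or space-delimited BED file in which only the first three columns are mandatory; the `Peak` type and the `parse_bed` function are hypothetical names introduced for illustration and are not part of Cistrome.

```python
from typing import List, NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int    # 0-based, inclusive (BED convention)
    end: int      # 0-based, exclusive (BED convention)
    name: str
    score: float


def parse_bed(text: str) -> List[Peak]:
    """Parse BED-formatted text into a list of Peak records."""
    peaks = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and genome browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # Name and score are optional columns; fall back to placeholders.
        name = fields[3] if len(fields) > 3 else "."
        score = float(fields[4]) if len(fields) > 4 else 0.0
        peaks.append(Peak(chrom, start, end, name, score))
    return peaks


example = "track name=peaks\nchr1\t1000\t1500\tpeak_1\t87.2\nchr2\t200\t450\n"
peaks = parse_bed(example)
widths = [p.end - p.start for p in peaks]
```

Because BED intervals are half-open, the width of a peak is simply `end - start`.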
Several methods can be used to import data into Cistrome. The 'Upload File' function can import a file from the user's computer or from an HTTP or FTP file server in the same manner as in Galaxy. In most cases, sequencing facilities will manage the low level base calling and read mapping processes. The least processed data formats that Cistrome accepts are the SAM/BAM or BED formats for ChIP-seq mapping results, CEL files for ChIP-chip using Affymetrix tiling arrays, or PAIR files from NimbleGen custom arrays. Researchers may have already used other algorithms to generate intermediate results, such as BED format files for regions of interest on the genome or WIGGLE format files for signal information. In such cases, users can also upload intermediate result files onto Cistrome and apply our downstream tools while being mindful of the acceptable formats (Table S1 in Additional file 1). In addition, we implemented two new data types for expression microarray data sets from Affymetrix and NimbleGen technologies. Raw expression microarray data and a text file describing the phenotype information (for example, before and after transcription factor activation) should be packaged in a zip file before being uploaded through the general upload tool.
Cistrome contains peak-calling tools for both ChIP-chip and ChIP-seq data. We deployed the MAT tool for Affymetrix promoter or tiling arrays and have supported nine different array designs from Caenorhabditis elegans to human. Affymetrix CEL files are required as input. For NimbleGen two-color arrays, MA2C was deployed. Because researchers usually have their own customized NimbleGen two-color array designs, array design (.ndf) and position (.pos) files and raw probe signal files (.pair) should all be uploaded to run MA2C on the Cistrome website. Both MAT and MA2C are able to handle control data or replicates as input data and can generate a BED file for peak locations and a WIGGLE file for normalized probe signals as the output. Cistrome provides the MACS (Model-based Analysis of ChIP-Seq) tool for ChIP-seq data obtained from various short read sequencers (for example, Genome Analyzer and HiSeq 2000 from Illumina or SOLiD from Applied Biosystems). MACS can improve the accuracy of the predicted binding sites by modeling the length of the sequenced ChIP fragments and the local bias due to chromatin openness. MACS can run with or without controls and allows the widely used SAM/BAM format and another six mapping result formats (Table S1 in Additional file 1) as input. The outputs include peak regions and peak summits (the precise binding location estimated by the algorithm) in BED format and ChIP fragment pileup along the whole genome at every 10 bp in WIGGLE format. When the diagnosis option is turned on, MACS subsamples the data to determine the number of peaks that can be recovered from a subset, thus estimating the saturation status of the current sequencing depth. We deployed MACS version 1.4rc2 on Cistrome, which supports single-end or paired-end sequencing in BAM or SAM format.
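To make the idea of a fragment pileup sampled every 10 bp concrete, the sketch below extends each read to an assumed fragment length and counts how many fragments cover each sampled position. It is an illustration only and not the MACS implementation; the function name `fragment_pileup`, the representation of a read by its 5' end and strand, and the fixed fragment length are all assumptions.

```python
from collections import Counter


def fragment_pileup(read_starts, strands, fragment_len, step=10):
    """Count extended fragments covering every `step`-th position.

    read_starts: 5' end coordinate of each read (assumed representation).
    strands: "+" or "-" for each read.
    """
    pile = Counter()
    for pos, strand in zip(read_starts, strands):
        # Extend the read in its 5'->3' direction to the fragment length.
        if strand == "+":
            frag_start, frag_end = pos, pos + fragment_len
        else:
            frag_start, frag_end = max(0, pos - fragment_len), pos
        # First multiple of `step` at or after the fragment start.
        first = -(-frag_start // step) * step
        for x in range(first, frag_end, step):
            pile[x] += 1
    return dict(sorted(pile.items()))


pile = fragment_pileup([0, 40], ["+", "-"], fragment_len=30)
```

In this toy input the two fragments overlap at positions 10 and 20, which is where the pileup is highest.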
With the rapid growth of ChIP-chip and ChIP-seq datasets in public repositories, it has become increasingly important to be able to integrate information from cross-platform and between-laboratory ChIP-chip or ChIP-seq datasets. We recently developed the powerful meta-analysis tool MM-ChIP (Model-based Meta-analysis of ChIP data) and deployed it under the peak-caller application category of Cistrome. The MM-ChIP tool includes two separate functions: MMChIP-chip performs ChIP-chip meta-analysis based on WIGGLE files from the MA2C and MAT tools, and MMChIP-seq uses NGS alignments in BED format as input to combine different ChIP-seq libraries of the same factor under the same conditions. The resulting peak locations (in BED files) and signal profiles (in WIGGLE files) can be visualized as a custom track on the UCSC genome browser and used as input for other downstream analysis tools that will be discussed later. In addition to these specific peak callers for different platforms or purposes, there is a general peak caller in Cistrome that can take any whole genome signal profile in WIGGLE format, normalize the signals, and then attempt to find the significant regions by comparing to a null distribution built from background data.
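The general approach of normalizing a signal profile and comparing it to a null distribution can be sketched as follows. This is a simplified illustration and not the algorithm used by the Cistrome general peak caller: it assumes a fixed-step signal on one chromosome, models the background by its mean and standard deviation, and reports runs of consecutive bins whose z-score passes a cutoff. The function name `call_regions` and the default cutoff are hypothetical.

```python
from statistics import mean, stdev


def call_regions(signal, background, step, z_cutoff=3.0):
    """Return (start, end) regions where the signal exceeds the background.

    signal: fixed-step signal values along one chromosome.
    background: values used to build the null distribution (assumed input).
    step: distance in bp between consecutive signal values.
    """
    mu, sigma = mean(background), stdev(background)
    if sigma == 0:
        raise ValueError("background has zero variance")
    regions, start = [], None
    for i, value in enumerate(signal):
        z = (value - mu) / sigma
        if z >= z_cutoff:
            if start is None:
                start = i * step          # open a new region
        elif start is not None:
            regions.append((start, i * step))  # close the current region
            start = None
    if start is not None:
        regions.append((start, len(signal) * step))
    return regions


regions = call_regions([1, 2, 9, 8, 1, 7], [1, 2, 1, 2, 1, 2, 1, 2], step=10)
```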
Expression microarray analysis tools
The Cistrome Expression pipeline uses R and Bioconductor packages to perform basic gene expression analyses. The data analysis starts with the processing of a set of signal intensity files for Affymetrix expression arrays (.cel) or NimbleGen arrays (.xys). Datasets may also include a phenotype (.txt) file that describes and groups the set of expression files. The next step in the pipeline calculates the expression index of this dataset using one of four possible methods: robust multichip average (RMA), justRMA, gcRMA and MAS5. The result is a normalized expression set (.eset) that can be represented as refSeq, Entrez, or ProbeSet IDs in plain text format. When mapping the ProbeSet IDs to refSeq or Entrez IDs, the custom CDF files from BRAINARRAY are used. The genes that are differentially expressed between conditions (for example, before and after a transcription factor is knocked down) are often used to explore the function of the transcription factor together with cistrome data. When a normalized expression set is used as input, Cistrome can identify differentially expressed genes using any of the following methods: limma moderated t-test, ordinary least-squares, and permutation by re-sampling. Correction for false positive (type I) errors may be performed using either the Bonferroni correction or Benjamini-Hochberg false discovery rate (FDR) methods. The output from this tool is a list of differentially expressed genes, log2-transformed fold changes and FDR-corrected P-values of differential expression. The differential expression result can be processed into gene lists, such as up-regulated or down-regulated genes, using one of the public workflows as described in Table S2 in Additional file 1. The gene lists can be further incorporated with other Cistrome tools.
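The two multiple-testing corrections mentioned above are short enough to write out. The sketch below is a plain Python illustration of the standard Bonferroni and Benjamini-Hochberg procedures; it is not the R/Bioconductor code that the Cistrome pipeline runs, and the function names are chosen here for illustration.

```python
def bonferroni(pvalues):
    """Multiply each P-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw)
fwer = bonferroni(raw)
```

For the same input, the Benjamini-Hochberg values are never larger than the Bonferroni values, which reflects the less conservative error criterion (FDR rather than family-wise error).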
Several downstream analysis modules are also available. A transcription factor tool allows the user to find the transcription factors with the highest level of expression. The selection is done based on an expression index cutoff value, and further filtering can be performed to restrict the resulting list to the Gene Ontology (GO) terms for transcription regulation activities. A correlation tool allows the user to detect all genes whose expression correlates with that of a given gene. This correlation result can also be filtered by applying the GO terms. The GO enrichment tool helps researchers explore the functions for a list of genes, such as the up-regulated genes after a transcription factor knockdown or the genes with transcription factor bound in promoter regions. Enrichment can be compared to the background of all genes or a subset of genes on the array. This tool uses Bioconductor GO and GOstats packages together with a query to the DAVID (Database for Annotation, Visualization and Integrated Discovery) web server. The visualization tool in this category allows users to visualize and compare the expression index distributions of multiple lists of genes (for example, genes with proximate transcription factor binding compared with all genes) using box plots or histograms.
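Enrichment of a GO term in a gene list relative to a background is commonly assessed with a hypergeometric test. The sketch below shows that calculation in plain Python; whether the Cistrome GO enrichment tool performs exactly this computation is an assumption, and the function name and argument names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """Probability of seeing at least k annotated genes in a list of n genes.

    k: genes in the list that carry the GO term.
    n: size of the gene list.
    K: genes in the background that carry the GO term.
    N: size of the background (all genes, or the genes on the array).
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the upper tail P(X = i) for i = k .. min(n, K).
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Toy example: all 3 annotated background genes fall in a 3-gene list.
p = hypergeom_enrichment_p(k=3, n=3, K=3, N=10)
```

Restricting the background to the genes on the array, as the text describes, changes only the values of `K` and `N`.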
Downstream analyses for a cistrome study require specific or integrative tools. The value of Cistrome is that it enables biologists to use a broad range of bioinformatics tools to easily generate report-quality figures and tables, and to simplify routine analysis using reproducible pipelines. In Cistrome, we provide tools for correlation studies, genome feature association studies and motif analysis together with public workflows to link these tools together.
Functional DNA regions in genomes are often evolutionarily conserved between different species [17–19]. Therefore, higher evolutionary conservation of ChIP-chip/seq peaks compared with flanking non-peak regions is often a good indicator of data quality and correct data preprocessing. In Cistrome, the 'Conservation Plot' tool can take one or more cistromes in BED files as input, and use UCSC PhastCons conservation scores to produce a figure showing the average conservation score profiles around the peak centers (Figure 2d). This analysis could be extended to compare the conservation differences between multiple cistromes.
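The averaging behind such a conservation profile can be illustrated in a few lines. The sketch below assumes per-base scores for a single chromosome held in a dictionary and a list of peak center coordinates; the function name `average_profile` and this in-memory representation are assumptions for illustration, not the Conservation Plot implementation.

```python
def average_profile(scores, centers, half_width):
    """Average score at each offset from -half_width to +half_width.

    scores: dict mapping position -> conservation score (one chromosome).
    centers: peak center positions on the same chromosome.
    Positions without a score are skipped rather than counted as zero.
    """
    profile = []
    for offset in range(-half_width, half_width + 1):
        values = [scores[c + offset] for c in centers if (c + offset) in scores]
        profile.append(sum(values) / len(values) if values else float("nan"))
    return profile


# Toy scores: a conserved block around position 100, nothing around 50.
toy_scores = {i: (1.0 if 95 <= i <= 105 else 0.0) for i in range(200)}
profile = average_profile(toy_scores, centers=[100, 50], half_width=10)
```

A profile that rises toward offset 0, as in this toy example, is the pattern expected when peaks are more conserved than their flanking regions.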
Another useful task is to find the genomic features or genes associated with transcription factor binding or histone modification sites. For instance, H3K4me3 is enriched in the promoter regions of active genes, and H3K36me3 is enriched in transcribed exons. Finding the target genes is critical to understanding the function of transcription factors, such as transcription repression or activation. Therefore, a set of tools from the CEAS (Cis-regulatory Element Annotation System) package, including SitePro, GCA (Gene Centered Annotation), Peak2Gene and the CEAS main program, has been deployed in the Cistrome web interface. SitePro can draw the average signal profiles around given genomic locations. When multiple locations or sets of signal files are used as input, SitePro can address questions such as how the signals of multiple factors change at the same locations between different conditions or how the same factor changes in different sets of genomic locations. The GCA tool can find the peaks that are closest to the transcription start site (TSS) of each gene and calculate the coverage of the peaks of the gene body in a spreadsheet. The Peak2Gene tool can find the nearest genes for each peak. The CEAS main program generates multi-page figures as either a PDF document or PNG images. In general, when a BED file for peaks and a WIGGLE file for signals are used as input, the resulting report includes the peak enrichment on chromosomes and various genomic features, such as gene promoters, downstream regions, UTRs, coding exons or introns, and the average signal profile around TSSs and transcription termination sites (TTSs), the meta-gene body (all genes are scaled to 3 kb), concatenated exons (coding regions), or concatenated introns.
When gene lists are provided (for example, a list of genes with the highest and lowest levels of expression for the same sample in a ChIP-chip or ChIP-seq experiment), CEAS will plot the average signal profiles for different gene groups in different colors for the TSS, TTS, gene bodies, exons, or introns (Figure 2c). This function can be coupled with gene expression tools described in the previous section to show whether the signals of the transcription factor or histone marks are related to transcription repression or activation.
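The nearest-gene assignment that Peak2Gene performs can be illustrated with a binary search over sorted TSS positions. The sketch below is a simplified stand-in rather than the CEAS code: it works on one chromosome, ignores strand, assumes a non-empty TSS list, and uses the hypothetical function name `nearest_genes`.

```python
from bisect import bisect_left


def nearest_genes(peak_centers, tss_list):
    """For each peak center, return (center, gene, signed distance to TSS).

    tss_list: non-empty list of (position, gene_name) on one chromosome.
    The distance is TSS position minus peak center.
    """
    tss_sorted = sorted(tss_list)
    positions = [pos for pos, _ in tss_sorted]
    result = []
    for center in peak_centers:
        i = bisect_left(positions, center)
        # Only the TSS just left and just right of the center can be nearest.
        candidates = tss_sorted[max(0, i - 1): i + 1]
        pos, gene = min(candidates, key=lambda t: abs(t[0] - center))
        result.append((center, gene, pos - center))
    return result


tss = [(100, "geneA"), (500, "geneB"), (900, "geneC")]
assigned = nearest_genes([120, 480, 1000, 10], tss)
```

Sorting the TSS positions once keeps each lookup logarithmic in the number of genes, which matters when tens of thousands of peaks are annotated.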
Cistrome has many other useful tools to help users better manipulate their data. A lift over tool can convert WIGGLE files from one genome assembly to another if users want to combine old analysis results with a new genome annotation. However, ab initio re-preprocessing is recommended to generate new WIGGLE files for the new genome assembly. A WIGGLE file standardization tool can convert the resolution of a WIGGLE file to 8, 32, 64 or 128 bp. Two other tools can extract the data for a given chromosome from a BED file or a WIGGLE file. Furthermore, many Galaxy functions that we considered to be very useful for ChIP-chip/seq data analyses are also enabled in Cistrome. For example, the intersect tool for two interval files and the filtering/sorting/cutting tool for tab-delimited text files are widely used in many of our precompiled public workflows to post-process intermediate results and then feed them into downstream tools (Table S2 in Additional file 1).
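As an illustration of what an interval intersection computes, the sketch below intersects two sets of half-open intervals on one chromosome with a two-pointer sweep. It is not the Galaxy intersect tool; it assumes that the intervals within each input do not overlap one another, and the function name `intersect` is chosen here for illustration.

```python
def intersect(a, b):
    """Return the overlapping pieces of two interval lists.

    a, b: lists of (start, end) half-open intervals on one chromosome,
    each list assumed to contain no self-overlapping intervals.
    """
    a, b = sorted(a), sorted(b)
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


shared = intersect([(0, 10), (20, 30)], [(5, 25)])
```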
Comparison to existing software
Cistrome was built upon the Galaxy framework to provide a user-friendly, reproducible and transparent workbench for cistrome researchers. Researchers can easily and intuitively reuse and share data, incorporate published data, and publish their results on the website. Compared with the more general Galaxy main site, the Cistrome system was specifically designed for downstream analysis of data generated by ChIP-chip or ChIP-seq technologies and includes basic analyses from peak calling to motif detection. In the future, the Cistrome analysis platform module will be linked to our local Data Collection (DC) module where publicly available ChIP-chip and ChIP-seq data are downloaded and preprocessed.
Overview comparison of functionalities of Cistrome, CisGenome and SeqMINER
Yes. Affymetrix or NimbleGen platform
Yes. Affymetrix or other platform through conversions
Yes. No support for SAM/BAM
General peak calling
Yes. Through wiggle file for signals
No direct solution
Yes. Across different ChIP-chip platforms, or across different ChIP-seq libraries
From normalization, differential expression, to gene ontology
Yes. Affymetrix or NimbleGen platform
Genome association study
Yes. Chromosome or gene feature enrichment; aggregation plot; genes or peaks centered annotation; conservation plot; k-means clustering heatmap
Yes. Closest genes around peaks
Yes. K-means clustering at peak sites; interactive heatmap; aggregation plot
Correlation between samples
Yes. Whole genome or peak centered Pearson correlation; Venn diagram
Yes. Pearson correlation at enriched regions
Yes. Find enriched known or de novo motifs; map motifs to genomic locations
Yes. Find de novo motifs; map motifs to genomic locations
Liftover both BED/WIGGLE files; low level operations on text manipulation and format conversion through Galaxy
Many useful scripts for format conversions, to calculate overlaps and so on
Genome browser visualization
Redirect to mirrored UCSC genome browser on Cistrome, or external genome browsers supported by Galaxy
Local installed genome browser on Windows operating system
Conclusions and future directions
We have deployed a comprehensive ChIP-chip and ChIP-seq analysis platform called Cistrome by integrating publicly available research tools and newly developed algorithms from our group under the Galaxy framework. Cistrome covers most ChIP-chip/seq analysis tasks, from data preprocessing, expression analysis, integrative analysis and reproducible pipelines to data publishing; this integrated approach allows biologists to analyze and visualize their own ChIP-chip/seq data for publication. We plan to extend Cistrome in two areas. First, we will support the increasing number of ChIP-seq datasets by building a Cistrome DC module. Second, we plan to continue adding research tools and improving the existing features to provide more sophisticated integrative workflows, especially for epigenomics data. We address these plans in detail in the following paragraphs.
Each ChIP-chip/seq platform has its own cistrome data analysis challenges. ChIP-chip platforms include tiling arrays from Affymetrix, NimbleGen and Agilent, and ChIP-seq platforms include NGS machines from Illumina, Applied Biosystems and Helicos. A typical human ChIP-seq experiment sequenced on one Illumina GAIIx lane generates approximately 20 GB of fastq data. With more researchers adopting ChIP-chip/seq methods and NGS technologies that are improving at rates beyond Moore's law, the production of cistrome data is increasing exponentially. Currently, databases such as the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) and the European Bioinformatics Institute (EBI) ArrayExpress host array data, and databases such as the NCBI Sequence Read Archive (SRA) and the EBI SRA host sequencing data. However, experimental biologists often cannot understand or reuse these deposited data in their raw form. Although some processed datasets have been submitted to these databases, they are difficult to compare and integrate due to diverse data generation platforms and analysis algorithms. Therefore, parallel to the Cistrome data analysis module, we are designing another major component of Cistrome: the DC module. The Cistrome DC will be a manually curated data warehouse. The data stored in the DC module include both raw and preprocessed data - peak locations and signal profiles - that are ready to be imported into the current Cistrome analysis platform. We plan to develop a user-friendly interface to let users easily search and browse the datasets. We also plan to build a bridge from the current analysis module to the Cistrome DC so that users can choose to package their analyzed data and publish them in the Cistrome DC upon paper publication.
Concurrent with an increasing interest in epigenomics research, increasing amounts of histone modification ChIP-seq, nucleosome-seq, and DNase-seq data are becoming available to the public. We plan to add another specific peak caller, Nucleosome Positioning from Sequencing (NPS), to Cistrome to target histone modification data. When ChIP-seq data are used at the nucleosome resolution (that is, where experimentalists use micrococcal nuclease to digest DNA), NPS can provide better data interpretation than the general ChIP-seq peak caller MACS. NPS can give the well-positioned nucleosomes as output and further detect the dynamic chromatin regions with moving nucleosome or DNase sites between conditions. Our newly developed algorithm, called Binding Inference from Nucleosome Occupancy Changes (BINOCh), can follow up with motif analysis in the dynamic regions to better understand the transcription factor binding changes.
Many new features and tools for cistrome analysis are included in our future plans. Basic file manipulation tools - for example, the BedTools suite - will be added to Cistrome in the future. The goal is to provide more flexible workflows for different demands. Because WIGGLE files used to store whole genome signal profiles are too large to maintain and manipulate, we plan to switch to a more space-efficient, self-indexed binary format: BigWig. We also plan to support preprocessed RNA-seq data (for example, in RPKM (reads per kilobase of exon model per million mapped reads) form) in our expression analysis module. Galaxy has included Cufflinks tools in its main code base, and we will provide functions that are similar to those of the current expression tools such as DESeq or edgeR and incorporate them into other integrative analysis tools. For example, by combining expression profiles and transcription factor motif enrichment, we could predict the correct transcription factors that collaborate with the ChIPed factor.
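The RPKM measure mentioned above follows directly from its definition: the read count for a gene is divided by the exon model length in kilobases and by the number of mapped reads in millions. A minimal sketch, with a hypothetical function name and argument names:

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads."""
    kilobases = exon_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions)


# 500 reads on a 2 kb exon model in a library of 10 million mapped reads.
value = rpkm(500, 2_000, 10_000_000)
```

Here 500 reads over 2 kb and 10 million mapped reads give an RPKM of 25.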
Because Cistrome was built on Galaxy, we will continue updating the Galaxy framework code to pick up new features, such as Galaxy Pages for reproducible and interactive supplementary material or Galaxy Visualization to show data tracks in a genome browser view. We also plan to follow in the steps of Galaxy and provide a cloud computing solution for future scalability. We welcome feedback from users regarding new features and better representations to make Cistrome a better resource for the community.
TSS: transcription start site
TTS: transcription termination site.
Cistrome was developed by the Cistrome team at both the Dana-Farber Cancer Institute and Eli Lilly and Company. We thank Lingling Shen, Wenbo Wang, Jacqueline Wentz, Josiah Altschuler and Kar Joon Chew for their contributions to the system implementation. We also thank the many collaborators who gave us suggestions and feedback. This work is supported by the Dana-Farber Cancer Institute High Tech and Campaign Technology Fund (XSL), the National Basic Research Program of China grant 973 Program No. 2010CB944904 (YZ), NIH grants HG004069-04S1 (LT), DK074967 (MB) and DK062434 (TL).
- Ren B, Robert F, Wyrick JJ, Aparicio O, Jennings EG, Simon I, Zeitlinger J, Schreiber J, Hannett N, Kanin E, Volkert TL, Wilson CJ, Bell SP, Young RA: Genome-wide location and function of DNA binding proteins. Science. 2000, 290: 2306-2309. 10.1126/science.290.5500.2306.
- Johnson DS, Mortazavi A, Myers RM, Wold B: Genome-wide mapping of in vivo protein-DNA interactions. Science. 2007, 316: 1497-1502. 10.1126/science.1141319.
- Ji H, Jiang H, Ma W, Johnson DS, Myers RM, Wong WH: An integrated software system for analyzing ChIP-chip and ChIP-seq data. Nat Biotechnol. 2008, 26: 1293-1300. 10.1038/nbt.1505.
- Ye T, Krebs AR, Choukrallah MA, Keime C, Plewniak F, Davidson I, Tora L: seqMINER: an integrated ChIP-seq data interpretation platform. Nucleic Acids Res. 2010, 39: e35.
- Goecks J, Nekrutenko A, Taylor J: Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 2010, 11: R86. 10.1186/gb-2010-11-8-r86.
- Cistrome projects on bitbucket. https://bitbucket.org/cistrome/cistrome-harvard/, https://bitbucket.org/cistrome/cistrome-applications-harvard
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R: The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009, 25: 2078-2079. 10.1093/bioinformatics/btp352.
- Johnson WE, Li W, Meyer CA, Gottardo R, Carroll JS, Brown M, Liu XS: Model-based analysis of tiling-arrays for ChIP-chip. Proc Natl Acad Sci USA. 2006, 103: 12457-12462. 10.1073/pnas.0601180103.
- Song JS, Johnson WE, Zhu X, Zhang X, Li W, Manrai AK, Liu JS, Chen R, Liu XS: Model-based analysis of two-color arrays (MA2C). Genome Biol. 2007, 8: R178. 10.1186/gb-2007-8-8-r178.
- Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W, Liu XS: Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008, 9: R137. 10.1186/gb-2008-9-9-r137.
- Chen Y, Meyer CA, Liu T, Li W, Liu JS, Liu XS: MM-ChIP enables integrative analysis of cross-platform and between-laboratory ChIP-chip or ChIP-seq data. Genome Biol. 2011, 12: R11. 10.1186/gb-2011-12-2-r11.
- Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JY, Zhang J: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004, 5: R80. 10.1186/gb-2004-5-10-r80.
- Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP: Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res. 2003, 31: e15. 10.1093/nar/gng015.
- BRAINARRAY. [http://brainarray.mbni.med.umich.edu/]
- Falcon S, Gentleman R: Using GOstats to test gene lists for GO term association. Bioinformatics. 2007, 23: 257-258. 10.1093/bioinformatics/btl567.
- Dennis G, Sherman BT, Hosack DA, Yang J, Gao W, Lane HC, Lempicki RA: DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol. 2003, 4: P3. 10.1186/gb-2003-4-5-p3.
- Liu Y, Liu XS, Wei L, Altman RB, Batzoglou S: Eukaryotic regulatory element conservation analysis and identification using comparative genomics. Genome Res. 2004, 14: 451-458. 10.1101/gr.1327604.
- Wang T, Stormo GD: Identifying the conserved network of cis-regulatory sites of a eukaryotic genome. Proc Natl Acad Sci USA. 2005, 102: 17400-17405. 10.1073/pnas.0505147102.
- Wasserman WW, Palumbo M, Thompson W, Fickett JW, Lawrence CE: Human-mouse genome comparisons to locate regulatory sites. Nat Genet. 2000, 26: 225-228. 10.1038/79965.
- Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, Weinstock GM, Wilson RK, Gibbs RA, Kent WJ, Miller W, Haussler D: Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005, 15: 1034-1050. 10.1101/gr.3715005.
- Bernstein BE, Kamal M, Lindblad-Toh K, Bekiranov S, Bailey DK, Huebert DJ, McMahon S, Karlsson EK, Kulbokas EJ, Gingeras TR, Schreiber SL, Lander ES: Genomic maps and comparative analysis of histone modifications in human and mouse. Cell. 2005, 120: 169-181. 10.1016/j.cell.2005.01.001.
- Kolasinska-Zwierz P, Down T, Latorre I, Liu T, Liu XS, Ahringer J: Differential chromatin marking of introns and expressed exons by H3K36me3. Nat Genet. 2009, 41: 376-381. 10.1038/ng.322.
- Shin H, Liu T, Manrai AK, Liu XS: CEAS: cis-regulatory element annotation system. Bioinformatics. 2009, 25: 2605-2606. 10.1093/bioinformatics/btp479.
- He HH, Meyer CA, Shin H, Bailey ST, Wei G, Wang Q, Zhang Y, Xu K, Ni M, Lupien M, Mieczkowski P, Lieb JD, Zhao K, Brown M, Liu XS: Nucleosome dynamics define transcriptional enhancers. Nat Genet. 2010, 42: 343-347. 10.1038/ng.545.
- Portales-Casamar E, Thongjuea S, Kwon AT, Arenillas D, Zhao X, Valen E, Yusuf D, Lenhard B, Wasserman WW, Sandelin A: JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles. Nucleic Acids Res. 2009, 38: D105-110.
- Matys V, Kel-Margoulis OV, Fricke E, Liebich I, Land S, Barre-Dirrie A, Reuter I, Chekmenev D, Krull M, Hornischer K, Voss N, Stegmaier P, Lewicki-Potapov B, Saxel H, Kel AE, Wingender E: TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res. 2006, 34: D108-110. 10.1093/nar/gkj143.
- Zhu C, Byers KJ, McCord RP, Shi Z, Berger MF, Newburger DE, Saulrieta K, Smith Z, Shah MV, Radhakrishnan M, Philippakis AA, Hu Y, De Masi F, Pacek M, Rolfs A, Murthy T, Labaer J, Bulyk ML: High-resolution DNA-binding specificity analysis of yeast transcription factors. Genome Res. 2009, 19: 556-566. 10.1101/gr.090233.108.
- Clontech. [http://www.clontech.com]
- Xie Z, Hu S, Blackshaw S, Zhu H, Qian J: hPDI: a database of experimental human protein-DNA interactions. Bioinformatics. 2009, 26: 287-289.
- Liu XS, Brutlag DL, Liu JS: An algorithm for finding protein-DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments. Nat Biotechnol. 2002, 20: 835-839.
- Galaxy. [http://main.g2.bx.psu.edu/]
- Stein LD: The case for cloud computing in genome informatics. Genome Biol. 2010, 11: 207. 10.1186/gb-2010-11-5-207.
- Barrett T, Troup DB, Wilhite SE, Ledoux P, Rudnev D, Evangelista C, Kim IF, Soboleva A, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Muertter RN, Edgar R: NCBI GEO: archive for high-throughput functional genomic data. Nucleic Acids Res. 2009, 37: D885-890. 10.1093/nar/gkn764.
- Parkinson H, Kapushesky M, Kolesnikov N, Rustici G, Shojatalab M, Abeygunawardena N, Berube H, Dylag M, Emam I, Farne A, Holloway E, Lukk M, Malone J, Mani R, Pilicheva E, Rayner TF, Rezwan F, Sharma A, Williams E, Bradley XZ, Adamusiak T, Brandizi M, Burdett T, Coulson R, Krestyaninova M, Kurnosov P, Maguire E, Neogi SG, Rocca-Serra P, Sansone SA, et al: ArrayExpress update--from an archive of functional genomics experiments to the atlas of gene expression. Nucleic Acids Res. 2009, 37: D868-872. 10.1093/nar/gkn889.
- Leinonen R, Sugawara H, Shumway M: The sequence read archive. Nucleic Acids Res. 2010, 39: D19-21.
- Leinonen R, Akhtar R, Birney E, Bower L, Cerdeno-Tárraga A, Cheng Y, Cleland I, Faruque N, Goodgame N, Gibson R, Hoad G, Jang M, Pakseresht N, Plaister S, Radhakrishnan R, Reddy K, Sobhany S, Ten Hoopen P, Vaughan R, Zalunin V, Cochrane G: The European Nucleotide Archive. Nucleic Acids Res. 2010, 39: D28-31.
- Zhang Y, Shin H, Song JS, Lei Y, Liu XS: Identifying positioned nucleosomes with epigenetic marks in human from ChIP-Seq. BMC Genomics. 2008, 9: 537. 10.1186/1471-2164-9-537.
- Meyer CA, He HH, Brown M, Liu XS: BINOCh: binding inference from nucleosome occupancy changes. Bioinformatics. 2011, 27: 1867-1868. 10.1093/bioinformatics/btr279.
- Quinlan AR, Hall IM: BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010, 26: 841-842. 10.1093/bioinformatics/btq033.
- Kent WJ, Zweig AS, Barber G, Hinrichs AS, Karolchik D: BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics. 2010, 26: 2204-2207. 10.1093/bioinformatics/btq351.
- Anders S, Huber W: Differential expression analysis for sequence count data. Genome Biol. 2010, 11: R106. 10.1186/gb-2010-11-10-r106.
- Robinson MD, McCarthy DJ, Smyth GK: edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010, 26: 139-140. 10.1093/bioinformatics/btp616.
- Liu T, Rechtsteiner A, Egelhofer TA, Vielle A, Latorre I, Cheung MS, Ercan S, Ikegami K, Jensen M, Kolasinska-Zwierz P, Rosenbaum H, Shin H, Taing S, Takasaki T, Iniguez AL, Desai A, Dernburg AF, Kimura H, Lieb JD, Ahringer J, Strome S, Liu XS: Broad chromosomal domains of histone modification patterns in C. elegans. Genome Res. 2011, 21: 227-236. 10.1101/gr.115519.110.
- Cistrome. [http://cistrome.org/ap/u/cistrome/p/demonstration]
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.