dCLIP: a computational approach for comparative CLIP-seq analyses
© Wang et al.; licensee BioMed Central Ltd. 2014
Received: 24 August 2013
Accepted: 7 January 2014
Published: 7 January 2014
Although comparison of RNA-protein interaction profiles across different conditions has become increasingly important to understanding the function of RNA-binding proteins (RBPs), few computational approaches have been developed for quantitative comparison of CLIP-seq datasets. Here, we present an easy-to-use command line tool, dCLIP, for quantitative CLIP-seq comparative analysis. The two-stage method implemented in dCLIP, including a modified MA normalization method and a hidden Markov model, is shown to be able to effectively identify differential binding regions of RBPs in four CLIP-seq datasets, generated by HITS-CLIP, iCLIP and PAR-CLIP protocols. dCLIP is freely available at http://qbrc.swmed.edu/software/.
Eukaryotic genomes encode large numbers of RNA-binding proteins (RBPs), each of which has unique associating properties with RNAs and impacts the structure, localization, generation and function of both coding and non-coding RNAs [1, 2]. Comparison of RNA-RBP interaction profiles across different conditions becomes increasingly important to understanding the function of RBPs and RNA regulation processes [3, 4]. The advent of the crosslinking immunoprecipitation (CLIP) coupled with high-throughput sequencing (CLIP-seq) technique enables the investigation of RNA-RBP interactions at the genome level [5–7]. There are three versions of CLIP-seq experiments, high-throughput sequencing together with UV-crosslinking and immunoprecipitation (HITS-CLIP), photoactivatable-ribonucleoside-enhanced CLIP (PAR-CLIP) and individual-nucleotide resolution CLIP (iCLIP) [5–7], of which HITS-CLIP and PAR-CLIP are most commonly used. These two methods differ mainly by the crosslinking strategy being used. HITS-CLIP treats cells with UV light to crosslink proteins with RNAs and will introduce certain types of mutations in some of the CLIPed tags at crosslinking sites. For example, the mutations are specifically deletions if the crosslinked RBP is Argonaute (AGO) . PAR-CLIP treats cells with photoreactive ribonucleotide analogs for incorporation into RNAs before UV treatment, which results in specific T → C or G → A substitutions depending on the type of nucleoside analog used . One disadvantage of HITS-CLIP and PAR-CLIP is that reverse transcription must pass over the residual amino acids on the crosslink sites of RNAs. iCLIP overcomes this problem by employing a self-circularization strategy . Also random barcodes are introduced to discriminate between PCR duplicates and unique cDNA products.
Although a few bioinformatics tools like PARalyzer, CLIPZ, wavClusteR and miRTarCLIP [10–13] have been developed to analyze a single CLIP-seq dataset, the quantitative comparison of multiple CLIP-seq datasets has only recently gained interest in the field [4, 14, 15]. Piranha  has been developed for CLIP-seq and Ribonucleoprotein immunoprecipitation followed by high-throughput sequencing (RIP-seq)  data analysis, and also provides a procedure for comparative analysis. However, the comparative analysis procedure in Piranha is relatively ad hoc, and does not utilize the spatial dependency among neighboring genomic locations, which is an important characteristic in creating differential binding profiles. A straightforward way to compare RNA-RBP interaction profiles across conditions is to analyze individual CLIP-seq data separately to identify the peaks (or binding sites) for each condition and then use coordinate overlapping or similar approaches to obtain common and differential binding sites. However, this ad hoc approach compares the results qualitatively but not quantitatively. For example, if a region is bound by an RBP under two conditions (for example, wild type versus knockout) with both significant enrichment but different binding intensities, the ad hoc approach will not be able to detect this region as a differential binding site. In addition, this ad hoc approach is over-sensitive to the cutoffs used for analyzing individual data, and has been shown to underestimate the similarity of two samples when applied to the analysis of multiple chromatin immunoprecipitation (ChIP)-seq experiments [18, 19]. Therefore, a computational approach that can compare different CLIP-seq datasets simultaneously and quantitatively is needed.
The main challenge to quantitatively comparing genome-level sequencing profiles across conditions is that next-generation sequencing data usually contains relatively low signal-to-noise ratios [20, 21]. Differences in background levels further complicate the analysis. To address these problems, several computational approaches have been developed for comparative ChIP-seq analysis, including ChIPDiff , ChIPnorm , MAnorm  and dPCA . These computational approaches have greatly facilitated the understanding of dynamic changes of protein-DNA interactions across conditions. However, these computational approaches cannot be directly applied to CLIP-seq data to identify differential RNA-protein interactions, due to some inherent differences between ChIP-seq and CLIP-seq data. First, CLIP-seq data are strand-specific, while the tools designed for ChIP-seq experiments do not consider strands of peaks. Second, CLIP-seq experiments usually induce additional characteristic mutations in high-throughput sequencing reads, but the mutation information in the raw sequencing data is simply discarded in the bioinformatics software designed for ChIP-seq data analysis. Third, CLIP-seq reads are usually short, and the reads are not shifted or extended when counting tag intensities, but shifting or extension of reads is a necessary step in ChIP-seq analysis . Fourth, CLIP-seq requires a much higher resolution (close to single nucleotide) in detection of RBP-binding sites, but ChIP-seq software usually work on a much lower level of resolution. For example, ChIPDiff is limited to 1 kb and ChIPnorm typically to a resolution of a few hundred base pairs. In addition, the method proposed by Bardet et al. is not bundled as a portable software and takes about two days to finish. Therefore, we have developed the dCLIP software for detecting differential binding regions in comparing two CLIP-seq experiments.
dCLIP is a two-stage computational approach for comparative CLIP-seq analysis. As the first stage, a modified MA-plot approach was designed specifically to normalize CLIP-seq data across datasets to obtain high resolution results. As the second stage, a hidden Markov model (HMM) was developed to detect common or different RBP-binding regions across conditions. The HMM has a great advantage in modeling the dependency among adjacent genomic locations, which leads to improved performance in identifying differential binding sites. Here, we show that dCLIP can accurately identify RBP differential binding sites through the comparative analysis of four differential CLIP-seq datasets, including HITS-CLIP, PAR-CLIP and iCLIP experiments. In addition, we compared the performance of dCLIP and Piranha . Our analysis shows that dCLIP can identify more biologically meaningful differential binding sites than Piranha.
Overview of the software
iCLIP dataset preprocessing mainly follows that of Konig et al., with minor modifications. Sequencing reads with the same random barcode represent PCR duplicates. Duplicates are removed and barcodes trimmed from the unique tags before mapping to the reference genome. A helper script, remove_barcode.pl, is provided in the dCLIP software to help users remove barcodes from Fastq sequencing files. After mapping, the first nucleotide upstream of each mapped cDNA, defined as the crosslink nucleotide, is expanded by a few nucleotides (specified by the users) in both downstream and upstream directions from its location, namely adding one to the tag counts on all bases in this short window. Therefore, the total tag count on each base is calculated as the sum of expanded cDNA counts covering that base and the mutant tag count will always be zero. Similarly, cDNA counts in both experimental conditions are summarized on the bin-level in regions of non-zero coverage.
A small number c is added to each count value to avoid logarithm of zero count. We assumed that both conditions share a large number of common binding regions with similar binding strength. Therefore, a linear regression line M = a + b × A is fitted to bins whose and values are both larger than a user-defined cutoff. Because common binding sites should have similar binding strengths, the parameters derived from the regression model should capture the true scaling relationship between the two samples. This scaling relationship is extrapolated to the whole dataset, by subtracting a fitted M value from the linear regression model from the raw M value of every bin in all clusters. The adjusted M value is used in the following data analysis.
Hidden Markov model
The HMM is a statistical Markov model in which the system being modeled is assumed to have spatial dependency between neighboring data units. RBP-RNA interactions involve a short stretch of RNA that can span up to a few bins . This ensures the strong auto-correlation of tag counts in neighboring bins, which can be modeled by HMM. Therefore, we applied HMM to identify common and differential binding regions from the adjusted M values. As these adjusted M values come from many individual CLIP clusters, the HMM model has multiple observation sequences. During the statistical inference, all observation sequences share the same transition matrix and the same emission function.
Accordingly, the transition matrix Π is a 3 × 3 matrix, whose element πr,s is the transition probability Given state , the adjusted M values are fitted by a three-component normal mixture model. Because the common peaks that are determined by similar mechanisms in both conditions are normalized towards the same binding strength, the middle normal component is assigned a mean of zero. To avoid unreasonable assignment of bins to hidden states when the adjusted M values are extremely large or small, the three normal components are all assumed to have the same variance. Also, to simplify the problem, the means of first and third normal components are assumed to have the same absolute value but different signs.
Since we assume that most sites would not show changes in their binding between conditions, the second component should dominate the mixture distribution. The first and third components can be treated as outliers if we solely focus on the second component. We then apply a median absolute deviation method  to robustly estimate the standard deviation to estimate σ, by equating .
So, using grid approximation, we obtain a pair of and that maximize the likelihood function and also maintain the constraint at the same time.
The emission probabilities are calculated from the fitted model and fixed for each bin in different states before the iterations of HMM start. To find the chain of most likely hidden states, given the observations and the model, a Viterbi dynamic-programming algorithm is employed to infer the hidden state .
Finally, adjacent bins inferred to be in the same state are concatenated into continuous regions. A BED file is then generated to be uploaded to the University of California Santa Cruz (UCSC) Genome Browser, each entry of which is one continuous region in the same state. In addition, a TXT file is generated that describes the inference results of each bin in more detail. Eight bedGraph files are generated that store the total or mutant tag counts for both conditions and both strands. These files can also be directly uploaded to the UCSC Genome Browser for visualization. Examples of output files from the dCLIP pipeline are provided in Additional file 3.
The dCLIP software was implemented in the Perl programming language. Perl (versions above 5.16) together with two Perl modules PDL and PDL::Stats are needed to run the program. The implementation is supported on all major operating platforms.
The dCLIP software inputs SAM format alignment files of the two conditions to be compared. The SAM format files can be in single-end mode or paired-end mode. The users can specify parameters such as bin size, minimal number of tags in a cluster, the number of nucleotides to expand for cDNA counts (iCLIP), the type of characteristic mutations to be profiled and the stop conditions for the HMM.
miR-155/AGO HITS-CLIP dataset
We used dCLIP to analyze the miR-155/AGO HITS-CLIP dataset from Loeb et al., where the authors were interested in revealing miR-155-dependent AGO protein-binding sites. During microRNA (miRNA) biogenesis, double miRNAs are incorporated into the RNA-induced silencing complex  after being processed by Dicer. The miRNA/miRNA* duplex is then separated within the AGO protein and only one strand (the ‘guide strand’) will be retained before binding to mRNA targets. As a result, AGO protein, as one of the key catalytic components of the RNA-induced silencing complex, serves as a scaffold for miRNA and mRNA interaction. In this study , miR-155 knockout mice were generated, and CD4+ T cells were extracted from both the wild-type mice and the miR-155-knockout mice for performing HITS-CLIP experiments. Therefore, the differential AGO protein-binding sites should provide important cues for miR-155 targeting events.
To show that dCLIP can also handle PAR-CLIP datasets, we applied the dCLIP software to a PAR-CLIP dataset where the RBP under investigation is fragile X mental retardation protein (FMRP) . The FMR1 RBP family comprises three members, FMR1, FXR1 and FXR2. FMR1 encodes for many isoforms, of which isoform 7 is predominantly expressed . The authors identified two major binding motifs of FMR1, ACTT/ACTG and AGGA/TGGA. The authors generated a recombinant FMR1 isoform 7 protein with a point mutation I304N in the KH2 domain. Through electromobility shift assays and PAR-CLIP experiments conducted with the wild-type and I304N proteins, the authors found the KH2 domain to be specific for binding to the ACTT/ACTG motif. Therefore, diminished binding to the ACTT/ACTG motif, rather than the AGGA/TGGA motif, should be the primary effect of the point mutation.
We further calculated the number of T → C mutations that occur on top of all ACTT, ACTG and TGGA motifs found within those 9,859 binding sites in both the wild-type and I304N condition (Figure 8b). The T → C mutation counts on the T bases in these motifs were divided by the total T → C counts in a 30 bp window as the background distribution. Because the AGGA motif does not have a T base, there were no T → C mutations on top of this motif and this motif was thus not included in this analysis. The normalized number of T → C mutations in the I304N condition was smaller than the number of T → C mutations in the wild-type condition for the ACTT/ACTG motif as well as the TGGA motif, consistent with these sites having weaker binding in the I304N condition. The extent by which the relative T → C mutation counts decreased in the I304N condition was much more significant for the ACTT/ACTG motif (P <2e-16 for ACTT, P = 1.9e-12 for ACTG) than the TGGA motif (P = 1.4e-5). This was expected because the I304N point mutation locates in the KH2 domain responsible for binding to the ACTT/ACTG motif. Because the ACTT/ACTG and TGGA/AGGA motifs always occur in adjacent or nearby regions on the genomic sequence, a loss of binding affinity to the ACTT/ACTG motifs by the I304N mutation should lead to a secondary, weaker effect on the binding of the protein to neighboring TGGA/AGGA motifs. Overall, the analysis of this FMR1 PAR-CLIP dataset shows that dCLIP also performs well on PAR-CLIP datasets.
miR-124/AGO HITS-CLIP dataset
We also benchmarked the performance of dCLIP against Piranha , which provides a procedure for comparative CLIP-seq analysis. In the Piranha software, read intensities are first counted and binned. It also defines a set of properties that vary along with the count data. These one or more properties could be either count or other types of data. For example, one property could be the binned count data of the second condition, which enables Piranha to identify differentially regulated RBP binding sites in this scenario. The count of the second condition is used to scale the count of the first condition and the scaled count data are used to fit a model. For fitting the statistical model, a variety of options are provided, including Poisson Model, Negative Binomial Model, Zero Truncated Poisson Model and Zero Truncated Negative Binomial Model. Finally, bins with significant P-values are identified as differential binding sites.
We compared the performance of the dCLIP and Piranha software on the miR-124/AGO HITS-CLIP dataset produced from the original publication of Piranha . In this dataset, HEK293 cells were transfected with miR-124 to identify its targets by comparison against non-transfected cells. Because miR-124 is not endogenously expressed, the AGO binding sites that are enriched in the transfected condition compared with the non-transfected condition should mostly mark miR-124 binding sites. We downloaded the raw sequencing data from SRA056343, trimmed adapters and then aligned the Fastq files to the hg19 genome using Bowtie  and Novoalign. Then dCLIP and Piranha (using the Poisson model) were used to identify the differential AGO binding sites enriched in the transfected condition. For both tools, a bin size of 5 bp was used.
TDP-43 iCLIP dataset
dCLIP is also able to analyze iCLIP datasets. The major difference of processing iCLIP datasets from HITS-CLIP and PAR-CLIP datasets is that cDNA counts, rather than total tag counts, are analyzed by the algorithm and no mutant tag counts are collected. We downloaded the TDP-43 iCLIP datasets from . The TDP-43 RBP protein is mainly localized to the nucleus, and is involved in transcription, alternative splicing and the development of many diseases . Aggregation of misfolded TDP-43 has been implicated in the neurodegenerative diseases frontotemporal lobar degeneration (FTLD) and amyotrophic lateral sclerosis . In this study, the authors conducted iCLIP experiments with human postmortem cortical tissue from three healthy individuals and three patients who had sporadic FTLD with TDP-43 inclusions. The sequencing data from both sets of participants were pooled before mapping and we used an in-house program to remove PCR duplicates and trim the barcodes. We then mapped the sequencing tags to the hg19 reference genome and used dCLIP with a bin size of four nucleotides to analyze the alignment files. We also compared the performance of Piranha (PoissonRegression) with dCLIP on this iCLIP dataset.
The two-stage procedure implemented in dCLIP includes an MA normalization step and a HMM to identify differential and common binding sites. The MA normalization is a critical step to make the CLIP-seq data comparable across conditions. The straightforward rescaling by the total number of reads across samples is not appropriate for comparative CLIP-seq analysis because the signal-to-noise ratio usually varies across different conditions. The modified MA plot normalization method in dCLIP not only addresses the issue of different signal and noise levels effectively, but also works on much smaller units than those used for microarray and ChIP-seq data analysis, allowing dCLIP to detect binding sites of higher resolution required for CLIP-seq data analysis. To reduce potential bias and conduct rigorous comparison across different conditions, we recommend adopting the same experimental and bioinformatics procedures, such as RNase digestion, high-throughput sequencing and alignment, for both conditions.
The HMM plays a key role in identifying differential and common binding sites of two CLIP-seq samples in the dCLIP software. HMM can increase signal-to-noise ratios for sequencing data analysis, because it takes into account the correlation between consecutive bins. This is particularly important for CLIP-seq data, because of small bin size and high correlations between consecutive bins. The HMM in dCLIP defined a common binding state and two differential binding states. One thing to note for the three-state HMM is that the identified differential binding sites, for example the ones with inferred state of enriched and non-enriched, may actually only have a small tag enrichment in condition one, and an even smaller tag enrichment in condition two. Therefore, the differential binding sites need to be ranked and screened as such sites may not be of real interest to biologists. The analysis of the miR-155/AGO HITS-CLIP dataset, for example, set a cutoff of average tag intensity of 30 in the wild-type condition.
One assumption of the dCLIP algorithm is that most sites will not have changes in their binding between conditions. Our simulation studies (Additional file 4) show that dCLIP is able to handle comparative CLIP-seq analysis when there are more than 50% of common binding sites between two samples. We recommend users to roughly assess whether this assumption is valid or not for their experiments based on biological knowledge or preliminary bioinformatics analysis. In addition, dCLIP software will issue a warning if the estimated proportion of common binding sites with similar binding strength is less than 50%.
The dCLIP software was benchmarked against the Piranha software. Piranha incorporates covariates which could represent transcript abundance, count data in the second condition or positional mutation information. However, the covariate is incorporated in the statistical model in the exactly same way no matter which type of data it actually represents. This design enables Piranha to be easily applied to a wide variety of CLIP-seq data analysis scenarios. However, this one-for-all method also harms the detection power of RBP binding regions of interest in each specific scenario, as different data types have their unique properties and should be treated differently. The dCLIP method is specialized in comparing two CLIP-seq experiments and was shown to perform better than Piranha in identifying differential binding sites. Therefore dCLIP should be a better choice when the users are interested in identifying differential or common RBP-binding sites.
The pairwise approach to compare CLIP-seq data in dCLIP can be extended to multiple-sample comparison. When there are n samples, a transition matrix of 2 n states need to be implemented in the HMM. Theoretically, dCLIP can be easily modified to handle as many samples as possible. However, if n exceeds 10, the computation cost will increase dramatically. In addition, the normalization method also needs to be changed to suit the multiple-sample comparison. For example, the trimmed mean of M values method  or the upper-quartile normalization method  could be modified to handle the normalization step for multiple-sample comparisons. Currently, most CLIP-seq studies do not conduct transcript abundance measurements [29, 46] and, accordingly, most current CLIP-seq analysis tools, such as PARalyzer , do not consider transcript abundance either. However, taking background transcript abundance into account will be very helpful for more accurately defining RBP binding sites in either one-sample scenarios or multiple-sample scenarios. If the background expression data is available, that information can be relatively easily incorporated into dCLIP to further refine its performance.
We present a new computational approach, dCLIP, for the comparative analysis of CLIP-seq data. dCLIP was implemented as an easy-to-use command line tool in the Perl programming language. The dCLIP software is able to handle HITS-CLIP, PAR-CLIP and iCLIP datasets, and can take single-end or paired-end sequencing files as input. The dCLIP software is strand-sensitive and is able to detect differential binding sites at almost single-base resolution. It also correctly keeps all of the characteristic mutation information for later analysis. Real data analysis shows that dCLIP can accurately identify differential binding regions of RBPs and outperforms another CLIP analysis program, Piranha . We anticipate that the dCLIP software will become a helpful tool for biologists and bioinformaticians for comparative CLIP-seq data analysis.
crosslinking immunoprecipitation coupled with high-throughput sequencing
frontotemporal lobar degeneration
high-throughput sequencing of RNA isolated by crosslinking immunoprecipitation
hidden Markov model
individual-nucleotide resolution crosslinking and immunoprecipitation
photoactivatable-ribonucleoside-enhanced crosslinking and immunoprecipitation
polymerase chain reaction
small nucleolar RNA
small nuclear RNA
University of California Santa Cruz
RNP immunoprecipitation followed by high-throughput sequencing.
This work was supported by the National Institutes of Health (1R01CA172211, 5R01CA152301, 4R33DA027592) and the Cancer Prevention Research Institute of Texas (RP101251).
- Keene JD: RNA regulons: coordination of post-transcriptional events. Nat Rev Genet. 2007, 8: 533-543. 10.1038/nrg2111.View ArticleGoogle Scholar
- Halbeisen RE, Galgano A, Scherrer T, Gerber AP: Post-transcriptional gene regulation: from genome-wide studies to principles. Cell Mol Life Sci. 2008, 65: 798-813. 10.1007/s00018-007-7447-6.View ArticleGoogle Scholar
- Leung AK, Young AG, Bhutkar A, Zheng GX, Bosson AD, Nielsen CB, Sharp PA: Genome-wide identification of Ago2 binding sites from mouse embryonic stem cells with and without mature microRNAs. Nat Struct Mol Biol. 2011, 18: 237-244. 10.1038/nsmb.1991.View ArticleGoogle Scholar
- Loeb GB, Khan AA, Canner D, Hiatt JB, Shendure J, Darnell RB, Leslie CS, Rudensky AY: Transcriptome-wide miR-155 binding map reveals widespread noncanonical microRNA targeting. Mol Cell. 2012, 48: 760-770. 10.1016/j.molcel.2012.10.002.View ArticleGoogle Scholar
- Darnell RB: HITS-CLIP: panoramic views of protein-RNA regulation in living cells. Wiley Interdiscip Rev RNA. 2010, 1: 266-286. 10.1002/wrna.31.View ArticleGoogle Scholar
- Hafner M, Landthaler M, Burger L, Khorshid M, Hausser J, Berninger P, Rothballer A, Ascano M, Jungkamp AC, Munschauer M, Ulrich A, Wardle GS, Dewell S, Zavolan M, Tuschl T: Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell. 2010, 141: 129-141. 10.1016/j.cell.2010.03.009.View ArticleGoogle Scholar
- Konig J, Zarnack K, Rot G, Curk T, Kayikci M, Zupan B, Turner DJ, Luscombe NM, Ule J: iCLIP–transcriptome-wide mapping of protein-RNA interactions with individual nucleotide resolution. J Vis Exp. 2011, 50: e2638-Google Scholar
- Zhang C, Darnell RB: Mapping in vivo protein-RNA interactions at single-nucleotide resolution from HITS-CLIP data. Nat Biotechnol. 2011, 29: 607-614. 10.1038/nbt.1873.View ArticleGoogle Scholar
- Konig J, Zarnack K, Rot G, Curk T, Kayikci M, Zupan B, Turner DJ, Luscombe NM, Ule J: iCLIP reveals the function of hnRNP particles in splicing at individual nucleotide resolution. Nat Struct Mol Biol. 2010, 17: 909-915. 10.1038/nsmb.1838.View ArticleGoogle Scholar
- Khorshid M, Rodak C, Zavolan M: CLIPZ: a database and analysis environment for experimentally determined binding sites of RNA-binding proteins. Nucleic Acids Res. 2011, 39: D245-D252. 10.1093/nar/gkq940.View ArticleGoogle Scholar
- Chou CH, Lin FM, Chou MT, Hsu SD, Chang TH, Weng SL, Shrestha S, Hsiao CC, Hung JH, Huang HD: A computational approach for identifying microRNA-target interactions using high-throughput CLIP and PAR-CLIP sequencing. BMC Genomics. 2013, 14: S2-Google Scholar
- Sievers C, Schlumpf T, Sawarkar R, Comoglio F, Paro R: Mixture models and wavelet transforms reveal high confidence RNA-protein interaction sites in MOV10 PAR-CLIP data. Nucleic Acids Res. 2012, 40: e160-10.1093/nar/gks697.View ArticleGoogle Scholar
- Corcoran DL, Georgiev S, Mukherjee N, Gottwein E, Skalsky RL, Keene JD, Ohler U: PARalyzer: definition of RNA binding sites from PAR-CLIP short-read sequence data. Genome Biol. 2011, 12: R79-10.1186/gb-2011-12-8-r79.View ArticleGoogle Scholar
- Jemal A, Siegel R, Ward E, Hao Y, Xu J, Murray T, Thun MJ: Cancer statistics, 2008. CA Cancer J Clin. 2008, 58: 71-96. 10.3322/CA.2007.0010.View ArticleGoogle Scholar
- Bhattacharjee A, Richards WG, Staunton J, Li C, Monti S, Vasa P, Ladd C, Beheshti J, Bueno R, Gillette M, Loda M, Weber G, Mark EJ, Lander ES, Wong W, Johnson BE, Golub TR, Sugarbaker DJ, Meyerson M: Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci U S A. 2001, 98: 13790-13795. 10.1073/pnas.191502998.View ArticleGoogle Scholar
- Uren PJ, Bahrami-Samani E, Burns SC, Qiao M, Karginov FV, Hodges E, Hannon GJ, Sanford JR, Penalva LO, Smith AD: Site identification in high-throughput RNA-protein interaction data. Bioinformatics. 2012, 28: 3013-3020. 10.1093/bioinformatics/bts569.View ArticleGoogle Scholar
- Zhao J, Ohsumi TK, Kung JT, Ogawa Y, Grau DJ, Sarma K, Song JJ, Kingston RE, Borowsky M, Lee JT: Genome-wide identification of polycomb-associated RNAs by RIP-seq. Mol Cell. 2010, 40: 939-953. 10.1016/j.molcel.2010.12.011.View ArticleGoogle Scholar
- Bardet AF, He Q, Zeitlinger J, Stark A: A computational pipeline for comparative ChIP-seq analyses. Nat Protoc. 2012, 7: 45-61.View ArticleGoogle Scholar
- Wang X, Zang M, Xiao G: Epigenetic change detection and pattern recognition via Bayesian hierarchical hidden Markov models. Stat Med. 2012, 32: 2292-2307.View ArticleGoogle Scholar
- Soon WW, Hariharan M, Snyder MP: High-throughput sequencing for biology and medicine. Mol Syst Biol. 2013, 9: 640-View ArticleGoogle Scholar
- Hardcastle TJ: High-throughput sequencing of cytosine methylation in plant DNA. Plant Methods. 2013, 9: 16-10.1186/1746-4811-9-16.View ArticleGoogle Scholar
- Xu H, Sung WK: Identifying differential histone modification sites from ChIP-seq data. Methods Mol Biol. 2012, 802: 293-303. 10.1007/978-1-61779-400-1_19.View ArticleGoogle Scholar
- Nair NU, Sahu AD, Bucher P, Moret BM: ChIPnorm: a statistical method for normalizing and identifying differential regions in histone modification ChIP-seq libraries. PLoS One. 2012, 7: e39573-10.1371/journal.pone.0039573.View ArticleGoogle Scholar
- Shao Z, Zhang Y, Yuan GC, Orkin SH, Waxman DJ: MAnorm: a robust model for quantitative comparison of ChIP-Seq data sets. Genome Biol. 2012, 13: R16-10.1186/gb-2012-13-3-r16.View ArticleGoogle Scholar
- Siegel R, Naishadham D, Jemal A: Cancer statistics, 2013. CA Cancer J Clin. 2013, 63: 11-30. 10.3322/caac.21166.View ArticleGoogle Scholar
- Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W, Liu XS: Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008, 9: R137-10.1186/gb-2008-9-9-r137.View ArticleGoogle Scholar
- dCLIP. [http://qbrc.swmed.edu/software/]
- Smyth GK, Speed T: Normalization of cDNA microarray data. Methods. 2003, 31: 265-273. 10.1016/S1046-2023(03)00155-5.View ArticleGoogle Scholar
- Chi SW, Zang JB, Mele A, Darnell RB: Argonaute HITS-CLIP decodes microRNA-mRNA interaction maps. Nature. 2009, 460: 479-486.Google Scholar
- Rousseeuw PJ, Croux C: Alternatives to the median absolute deviation. J Am Stat Assoc. 1993, 88: 1273-1283. 10.1080/01621459.1993.10476408.View ArticleGoogle Scholar
- Ailliot P, Thompson C, Thomson P: Mixed methods for fitting the GEV distribution. Water Resour Res. 2011, 47: W05551Google Scholar
- Gregory RI, Shiekhattar R: MicroRNA biogenesis and cancer. Cancer Res. 2005, 65: 3509-3512. 10.1158/0008-5472.CAN-05-0298.View ArticleGoogle Scholar
- Langmead B, Trapnell C, Pop M, Salzberg SL: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009, 10: R25-10.1186/gb-2009-10-3-r25.View ArticleGoogle Scholar
- Friedman RC, Farh KK, Burge CB, Bartel DP: Most mammalian mRNAs are conserved targets of microRNAs. Genome Res. 2009, 19: 92-105.View ArticleGoogle Scholar
- Brennecke J, Stark A, Russell RB, Cohen SM: Principles of microRNA-target recognition. PLoS Biol. 2005, 3: e85-10.1371/journal.pbio.0030085.View ArticleGoogle Scholar
- Djuranovic S, Nahvi A, Green R: miRNA-mediated gene silencing by translational repression followed by mRNA deadenylation and decay. Science. 2012, 336: 237-240. 10.1126/science.1215691.View ArticleGoogle Scholar
- Meijer HA, Kong YW, Lu WT, Wilczynska A, Spriggs RV, Robinson SW, Godfrey JD, Willis AE, Bushell M: Translational repression and eIF4A2 activity are critical for microRNA-mediated gene regulation. Science. 2013, 340: 82-85. 10.1126/science.1231197.View ArticleGoogle Scholar
- Bazzini AA, Lee MT, Giraldez AJ: Ribosome profiling shows that miR-430 reduces translation before causing mRNA decay in zebrafish. Science. 2012, 336: 233-237. 10.1126/science.1215704.View ArticleGoogle Scholar
- Ascano M, Mukherjee N, Bandaru P, Miller JB, Nusbaum JD, Corcoran DL, Langlois C, Munschauer M, Dewell S, Hafner M, Williams Z, Ohler U, Tuschl T: FMRP targets distinct mRNA sequence elements to regulate protein expression. Nature. 2012, 492: 382-386. 10.1038/nature11737.View ArticleGoogle Scholar
- Ashley CT, Sutcliffe JS, Kunst CB, Leiner HA, Eichler EE, Nelson DL, Warren ST: Human and murine FMR-1: alternative splicing and translational initiation downstream of the CGG-repeat. Nat Genet. 1993, 4: 244-251. 10.1038/ng0793-244.View ArticleGoogle Scholar
- Tollervey JR, Curk T, Rogelj B, Briese M, Cereda M, Kayikci M, Konig J, Hortobagyi T, Nishimura AL, Zupunski V, Patani R, Chandran S, Rot G, Zupan B, Shaw CE, Ule J: Characterizing the RNA targets and position-dependent splicing regulation by TDP-43. Nat Neurosci. 2011, 14: 452-458. 10.1038/nn.2778.View ArticleGoogle Scholar
- Buratti E, Baralle FE: Multiple roles of TDP-43 in gene expression, splicing regulation, and human disease. Front Biosci. 2008, 13: 867-878. 10.2741/2727.View ArticleGoogle Scholar
- Janssens J, Van Broeckhoven C: Pathological mechanisms underlying TDP-43 driven neurodegeneration in FTLD-ALS spectrum disorders. Hum Mol Genet. 2013, 22: R77-R87. 10.1093/hmg/ddt349.View ArticleGoogle Scholar
- Robinson MD, Oshlack A: A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 2010, 11: R25-10.1186/gb-2010-11-3-r25.View ArticleGoogle Scholar
- Bullard JH, Purdom E, Hansen KD, Dudoit S: Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics. 2010, 11: 94-10.1186/1471-2105-11-94.View ArticleGoogle Scholar
- Neumann M, Bentmann E, Dormann D, Jawaid A, DeJesus-Hernandez M, Ansorge O, Roeber S, Kretzschmar HA, Munoz DG, Kusaka H, Yokota O, Ang LC, Bilbao J, Rademakers R, Haass C, Mackenzie IR: FET proteins TAF15 and EWS are selective markers that distinguish FTLD with FUS pathology from amyotrophic lateral sclerosis with FUS mutations. Brain. 2011, 134: 2595-2609. 10.1093/brain/awr201.View ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.