Analysis of CLIP and iCLIP methods for nucleotide-resolution studies of protein-RNA interactions
© Sugimoto et al.; licensee BioMed Central Ltd. 2012
Received: 13 March 2012
Accepted: 3 August 2012
Published: 3 August 2012
UV cross-linking and immunoprecipitation (CLIP) and individual-nucleotide resolution CLIP (iCLIP) are methods to study protein-RNA interactions in untreated cells and tissues. Here, we analyzed six published and two novel data sets to confirm that both methods identify protein-RNA cross-link sites, and to identify a slight uridine preference of UV-C-induced cross-linking. Comparing Nova CLIP and iCLIP data revealed that cDNA deletions have a preference for TTT motifs, whereas iCLIP cDNA truncations are more likely to identify clusters of YCAY motifs as the primary Nova binding sites. In conclusion, we demonstrate how each method impacts the analysis of protein-RNA binding specificity.
To understand post-transcriptional regulation, it is crucial to study protein-RNA interactions in the cellular environment. Irradiation with UV-C light creates a covalent bond between proteins and RNAs that are in direct contact in vivo, without requiring pre-incubation of cells with photoreactive ribonucleoside analogs. Cross-linking and immunoprecipitation (CLIP) was therefore developed to identify RNA sites in direct contact with RNA-binding proteins (RBPs). Especially in combination with high-throughput sequencing, CLIP (or HITS-CLIP) identified RNA targets of RBPs in a transcriptome-wide manner [2–5]. These studies showed that the precise position of protein binding sites on target RNAs is extremely important, since the effect of RBPs on alternative splicing largely depends on their binding position. This was most clearly shown by genome-wide RNA maps of splicing regulation [6, 7].
To determine the precise positions of protein-RNA cross-linking, several modifications of CLIP were developed. All of these approaches exploit the effect of cross-linked nucleotides on the reverse transcription reaction. One such approach, Photoactivatable Ribonucleoside-Enhanced CLIP (PAR-CLIP), uses photoreactive nucleotides and UV-A light for the cross-linking reaction, which increases the incidence of point mutations at the cross-link sites. However, PAR-CLIP requires pre-incubation of cells with photoreactive ribonucleoside analogs, and therefore cannot be applied to untreated cells and tissues. The efficiency of nucleoside uptake, and the potential toxicity of these nucleosides, might vary between cell lines and tissues. Methods that identify cross-link sites without the need for photoreactive nucleosides are therefore required.
As originally described by Granneman and colleagues, cross-link sites induced by UV-C light are associated with point mutations and deletions in CLIP cDNAs, a finding supported by Kishore and colleagues. However, a study by Zhang and Darnell compared the frequency and distribution of deletions and point mutations in CLIP and mRNA-Seq cDNAs, and found that CLIP cDNA deletions were a more reliable signature of cross-link sites than point mutations. The cDNA deletions in HITS-CLIP data were then used to identify cross-link sites of Neuro-oncological ventral antigen 1 and 2 (Nova1 and Nova2, hereafter referred to together as Nova) and Argonaute (Ago) proteins in a genome-wide manner. Recently, individual-nucleotide resolution CLIP (iCLIP) was developed to identify cross-link sites independently of cDNA mutations.
Our first goal was to determine the proportion of truncated cDNAs in iCLIP cDNA libraries. CLIP and PAR-CLIP protocols identify only the cDNAs that have read through the cross-link site. However, the peptide or amino acid left on the RNA after proteinase K treatment can obstruct the reverse transcriptase, and primer extension studies showed that a significant proportion of cDNAs truncate at the cross-link sites. iCLIP employs a different cDNA cloning protocol from CLIP and PAR-CLIP, which enables identification of the cDNAs that truncate at the cross-link sites; the position of cDNA truncation therefore allows iCLIP to identify the cross-link sites. The ability of iCLIP to provide nucleotide-resolution information about the cross-link sites was initially demonstrated by determining the positions within uridine tracts that cross-link to heterogeneous nuclear ribonucleoproteins C1/C2 (hnRNP C), and the positions downstream of 5' splice sites that cross-link to the cytotoxic granule-associated RNA binding proteins TIA1 and TIAL1 [5, 7]. However, these studies did not evaluate the proportion of cDNAs that truncate at the cross-link sites relative to the cDNAs that read through them. If read-through cDNAs dominated the iCLIP libraries, they could impair the ability of iCLIP to identify cross-link sites with nucleotide resolution.
Our second goal was to compare the cross-link sites identified by CLIP and iCLIP. Because the sequence preference of Nova proteins is well characterized and Nova CLIP data are available, we performed iCLIP with Nova proteins to compare the two methods. Nova proteins, encoded by the Nova1 and Nova2 genes, contain three KH RNA-binding domains. The sequence specificity of Nova proteins has been extensively characterized using in vitro selection and RNA binding, X-ray crystallography, mutagenesis, and computational studies of Nova-dependent splicing enhancer or silencer elements [13–18]. These studies have shown that the KH domains recognize the YCAY motif (Y stands for pyrimidine), that the affinity of full-length Nova proteins for RNA increases with the number of proximal YCAY tetramers, and that a minimum of three to five proximal YCAY tetramers is required for functional binding [13, 17]. Analysis of cDNA deletions in Nova CLIP demonstrated that they were located at YCAY motifs, which confirmed that cDNA deletions can identify protein-RNA cross-link sites.
Our third goal was to determine the sequence biases of UV-C-induced cross-linking. This question could not be addressed by past CLIP and iCLIP studies, because all of them used UV-induced cross-linking to identify protein-RNA interactions. We therefore used a method in which covalent protein-RNA cross-linking is induced in vivo without UV-C irradiation. This was achieved by employing NOP2/Sun domain family, member 2 (NSUN2), an RNA methyltransferase that catalyzes the methylation of cytosine to 5-methylcytosine [19–21]. During the catalytic process, cysteine 321 of NSUN2 forms a covalent link with the cytosine residue in the RNA substrate. Cysteine 271 is then required to catalyze release of the methylated RNA from NSUN2. When cysteine 271 is mutated to alanine, the substrate is no longer released, and an irreversible covalent bond forms between NSUN2 and RNA. We performed iCLIP with the mutant human NSUN2 (C271A), which allowed us to evaluate the sequence biases introduced by UV-C-induced cross-linking. This demonstrated that both CLIP and iCLIP are subject to a modest uridine preference caused by UV-C cross-linking. In addition, our analyses demonstrated that CLIP cDNA deletions primarily occur at TTT motifs, and that analysis of iCLIP cDNA truncation sites is better suited for studying binding sites located within repetitive motifs.
The vast majority of iCLIP cDNAs truncate at the cross-link sites
Table 1: Deletions in CLIP, iCLIP and mRNA-seq cDNAs (columns: unique cDNAs with deletions in sequence reads; unique cDNAs with deletions within nucleotides 1 to 25)
To analyze whether the proportion of truncated cDNAs in iCLIP depends on the protein being studied, we evaluated iCLIP data from past studies of hnRNP C, TIA1, TIAL1 and TAR DNA binding protein (TDP-43; also known as TARDBP) [7, 23]. Strikingly, the proportion of cDNAs containing deletions in TIA1, TIAL1 and TDP-43 iCLIP was close to that in mRNA-Seq, indicating that over 95% of cDNAs in these iCLIP experiments truncated at cross-link sites (Table 1; Figures s2 and s3 in Additional file 1). To further consolidate this finding, we evaluated cross-linking of TIA1 and TIAL1 at positions +6 to +30 downstream of exon-intron junctions, which independent studies have shown to be important for TIA-dependent splicing regulation [24, 25]. cDNA truncations identified this region 291 and 457 times more frequently than cDNA deletions in TIA1 and TIAL1 iCLIP, respectively (Figure s4 in Additional file 1). This demonstrates that iCLIP cDNA truncations identify TIA binding sites far more effectively than cDNA deletions. Taken together, our results indicate that the vast majority of cDNAs in iCLIP experiments are truncated at protein-RNA cross-link sites.
Analysis of sequence biases at the cross-link sites identified by CLIP or iCLIP
The re-defined positions of cDNA deletions showed that YCAY motifs were enriched only at positions -4 and +1 relative to the deletion sites (Figure s5B in Additional file 1). Notably, the vast majority of these cDNA deletions were located within TTT motifs (Figure s6 in Additional file 1), and TTT enrichment was also present at Ago CLIP cDNA deletion sites (Figure s7 in Additional file 1). Furthermore, TTT enrichment was present at Nova CLIP cDNA deletion sites even when no FDR threshold was used to define significant CLIP cDNA deletion sites (Figure s6D, E in Additional file 1). The TTTCAY motif represented 80% of the Nova CLIP cDNA deletions that mapped to the nucleotide preceding the YCAY motif (+1 position; Figure 2c), and YCATTT represented 90% of the cases where cDNA deletions mapped to the nucleotide following the YCAY motif (-4 position; Figure s5D in Additional file 1). Moreover, the YCATTTCAY motif represented 56% of the cases where CLIP cDNA deletions mapped to the -4 position of YCAY (Figure s5B in Additional file 1), indicating that the -4 peak was largely a result of the TTT enrichment at CLIP cDNA deletions. We therefore evaluated only the YCAY motif starting closest to each cross-link site, which showed that CLIP cDNA deletions and iCLIP cDNA truncations both identified the nucleotide preceding the YCAY motifs (+1 site) as the primary Nova cross-link site (Figure 2b, c). Importantly, TTTCAY represented only 15% of the cases where Nova iCLIP cDNA truncations mapped to the nucleotide preceding the YCAY motif (+1 position; Figure 2b), and 22% of the cases where iCLIP cDNA truncations mapped to the nucleotide following the YCAY motif (-4 position; Figure s5C in Additional file 1). Nova and Ago proteins have no known binding preference for U tracts. The enrichment of the TTT motif is therefore most likely associated with the deletion sites in read-through cDNAs.
The analysis of cDNA truncations in iCLIP therefore provides an advantage by identifying cross-link sites lacking the TTT motif.
iCLIP cDNA truncations identify the positions of CLIP cDNA deletions
To further examine the overlap between cross-link sites identified by CLIP and iCLIP, we directly compared the positions of the re-defined cDNA deletions in CLIP (FDR < 0.001) and cDNA truncations in iCLIP (no FDR threshold). iCLIP cDNA truncation sites were significantly enriched at the CLIP deletion sites, confirming that iCLIP cDNAs represent truncations at the cross-link sites (Figure 2d; Figure s8A, B in Additional file 1). In contrast, the 3' ends of CLIP cDNAs that lack deletions did not overlap with the CLIP deletion sites, confirming that the overlap is specific to iCLIP libraries (Figure s8C, D in Additional file 1). Similarly, the 3' ends of iCLIP cDNAs containing deletions did not overlap with the CLIP cDNA deletion sites (Figure 2d; Figure s8E in Additional file 1). Instead, the 3' ends of iCLIP cDNAs containing deletions had a similar pattern to the 3' ends of CLIP cDNAs, and iCLIP cDNA deletion sites were significantly enriched at CLIP cDNA deletion sites, indicating that most iCLIP cDNAs containing deletions represent read-through sequences (Figure 2d; Figure s8 in Additional file 1). In conclusion, we find that iCLIP cDNAs lacking deletions truncate at positions overlapping with deletions in CLIP or iCLIP cDNAs, confirming that they can identify the position of cross-link sites.
UV-C-induced cross-linking preferentially occurs at uridines
The use of cross-link sites to study RNA binding specificity
To evaluate how the sequence biases at cross-link sites influence the study of the RNA binding specificity of Nova, we assessed the nucleotide composition of the two variable pyrimidine positions of YCAY motifs at Nova cross-link sites identified by CLIP cDNA deletions or iCLIP cDNA truncations (Figure 3e, f). The relative proportion of TCAT increased at the cross-link sites identified by both methods, with a corresponding decrease in CCAT (Figure 3e, f). To quantify this change, we compared the proportions of YCAY variants among motifs starting at positions 0 to +2 relative to the cross-link sites with those among motifs starting at positions -20 to +20. At CLIP cDNA deletion sites, the proportion of CCAT decreased from 21% to 0.3%, whereas at iCLIP cDNA truncation sites it decreased from 36% to 26%, with a corresponding increase in TCAT (Figure s10A-D in Additional file 1). This indicates that motif analysis at cross-link sites identified by CLIP cDNA deletions is subject to stronger sequence preferences than motif analysis at cross-link sites identified by iCLIP cDNA truncations.
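As an illustration of the window comparison described above, the Python sketch below computes the share of each YCAY variant among motifs starting within a chosen offset range of a set of cross-link sites. It assumes, for illustration only, that the sense-strand sequence is held in one uppercase string and that cross-link sites are 0-based indices into it; the function name `ycay_variant_fractions` is hypothetical and not taken from the authors' scripts.

```python
import re
from collections import Counter

# YCAY with Y = C or T (DNA alphabet); the lookahead makes overlapping motifs visible.
YCAY = re.compile(r"(?=([CT]CA[CT]))")
VARIANTS = ("TCAT", "TCAC", "CCAT", "CCAC")


def ycay_variant_fractions(seq, sites, lo, hi):
    """Share of each YCAY variant among motifs starting at offsets lo..hi (inclusive)
    relative to each cross-link site; sites are 0-based indices into seq."""
    starts = {m.start(): m.group(1) for m in YCAY.finditer(seq)}
    counts = Counter()
    for site in sites:
        for offset in range(lo, hi + 1):
            motif = starts.get(site + offset)
            if motif is not None:
                counts[motif] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {v: counts[v] / total for v in VARIANTS}


# Usage: compare the window at the cross-link site with the wider flanking window.
seq = "GGTTTCATGGCCATGG"
print(ycay_variant_fractions(seq, [3], 0, 2))
print(ycay_variant_fractions(seq, [3], -20, 20))
```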
iCLIP allows quantitative analysis of protein occupancy on its RNA-binding sites
In this manuscript, we benchmarked CLIP and iCLIP, the two most frequently used methods for transcriptome-wide study of protein-RNA interactions in untreated cells and tissues. We showed that, as in CLIP, iCLIP libraries contain cDNAs with deletions, although only in a small proportion. iCLIP can therefore identify cross-link sites by two independent approaches: cDNA deletions or cDNA truncations. Even though the proportion of iCLIP cDNAs with deletions is very low, the overlap of deletions with the cross-link sites identified by cDNA truncations can serve to validate the nucleotide resolution of iCLIP data. The low proportion of cDNAs with deletions indicates that 82% of Nova iCLIP cDNAs were truncated at cross-link sites, and this proportion is even greater in iCLIP data of other proteins. The variable proportions of truncated cDNAs in iCLIP of different RBPs might reflect the effects of the different peptides that remain bound to the RNA after proteinase K digestion. Since iCLIP captures both truncated and read-through cDNAs, it can robustly identify RNA-binding sites even when read-through cDNAs are rarely produced (as with the TIA proteins), and can therefore be applied to a larger repertoire of RBPs. Furthermore, by using the mutant NSUN2 protein, we demonstrated that iCLIP can identify cross-link sites induced either by UV-C irradiation or by other covalent cross-linking approaches.
We found that the TTT motif was the primary motif at the cross-link sites identified by Nova and Ago CLIP cDNA deletions. Since these studies did not identify recognition of uridine-rich sequences by Nova or Ago proteins, the potential functional relevance of the TTT motif remains to be established. Importantly, we found that the TTT motif is not enriched in Nova CLIP cDNAs without deletions, which constitute the large majority of CLIP cDNAs (Figure s11 in Additional file 1), indicating that the enrichment of TTT might be a bias introduced by the cDNA deletion analysis. As shown in past studies of slippage-mediated mutations by HIV reverse transcriptase, one-base deletions are most common at homonucleotide runs. Therefore, the increased incidence of cDNA deletions at homonucleotide runs, together with the UV-C cross-linking bias for uridines, might be responsible for the enrichment of the TTT motif at the cross-link sites identified by cDNA deletions in Nova and Ago CLIP. It remains to be seen whether the TTT motif is the primary site for deletions only in Nova and Ago CLIP cDNAs, or also in CLIP of other RBPs.
It is also important to be aware that cDNA mutations in CLIP and PAR-CLIP may represent genomic variation rather than cross-link-induced mutations. For instance, we found that most deletions in TDP-43 iCLIP cDNAs were consecutive dinucleotide deletions in TG repeats (Figure s12 in Additional file 1), unlike the deletions in Nova CLIP cDNAs, where consecutive dinucleotide deletions constituted only 21% of all deletions. Such dinucleotide variation is common in the human genome because TG repeats correspond to the hypervariable CA microsatellite. Thus, most deletions identified in TDP-43 iCLIP cDNAs are likely a result of genomic variation rather than cross-link-induced mutations. Methods that aim to identify cross-link sites by analyzing cDNA mutations are therefore prone to identifying genomic variation instead of cross-link sites. Analysis of cDNA truncations in iCLIP is therefore useful for identifying cross-link sites independently of genomic variation.
To evaluate the nucleotide preferences of UV-C-induced cross-linking, we compared it with the UV-independent covalent cross-linking of mutant NSUN2. We observed a consistent T enrichment at position 0 in all iCLIP studies where cross-linking was induced with UV-C. Since this nucleotide is not part of the cDNAs (it lies immediately upstream of them), the T enrichment could only result from steps up to reverse transcription, which are common to CLIP and iCLIP. In contrast, NSUN2 showed no T enrichment, but instead showed C enrichment at position +1. This indicates that UV-C-induced cross-linking has a uridine bias. As data for additional RBPs become available, other nucleotide biases might be identified. Our results also indicate that cDNAs can truncate either one nucleotide before the cross-link site, as appears most common for UV-C-induced cross-linking, or directly at the cross-link site, as is most common for NSUN2.
Since methylation by NSUN2 is a transient enzymatic reaction, we could not cross-link NSUN2 with UV-C light in order to directly compare the cross-link sites identified by the different methods. Instead, we compared cross-link sites identified by cDNA deletions in Nova CLIP and by cDNA truncations in Nova iCLIP. The sequence specificity of Nova proteins has been extensively characterized by previous analyses of evolutionary conservation and by affinity measurements [13–18]. Both our study and previous studies showed that TCAT and CCAT are highly enriched in the region surrounding the cross-link sites. However, the relative proportions of TCAT and CCAT change markedly at CLIP cDNA deletion sites, which is consistent with our finding that deletions occur primarily at the TTT motif. In contrast, the proportions of TCAT and CCAT change only slightly at iCLIP cDNA truncation sites, which likely reflects the uridine preference of UV-C cross-linking. This indicates that the motifs enriched at cross-link sites identified by CLIP are more strongly affected by the sequence preferences of cDNA deletions than those at iCLIP cDNA truncation sites.
It is clear that motifs enriched directly at cross-link sites need to be interpreted with caution because of the potential nucleotide preferences of UV cross-linking. However, we demonstrate that enrichment of the sequence motifs recognized by each RBP is not restricted to the cross-link sites. This is particularly evident from the enrichment of TCAT and CCAT in Nova iCLIP, and of TG repeats in TDP-43 iCLIP, which persists even more than 20 nucleotides away from the cross-link sites (Figures s10E-G and s11 in Additional file 1). This pattern of enrichment most likely reflects the high-affinity binding sites of RBPs, which are often composed of clusters of short motifs [23, 30]. Analysis of such clustered motifs, which are enriched not only directly at the cross-link sites but also in their vicinity, could avoid the sequence biases of deletion site analysis and of UV-C-induced cross-linking.
Past studies summarized CLIP data at multiple binding sites across the genome to show that they provide quantitative information. However, it was not clear whether the occupancy of individual binding sites within a single RNA could be compared quantitatively. We analyzed the primary Nova RNA target Meg3, which showed that iCLIP cDNA counts correlate well with the YCAY cluster score. The use of random barcodes for cDNA quantification is one reason for the more quantitative nature of iCLIP. Moreover, genome-wide analysis showed that iCLIP identifies a larger number of clustered YCAY motifs. This difference may be explained by the lack of TTT preference in iCLIP, or by the increased mappability of iCLIP cDNAs, since truncated cDNAs are less likely to fully overlap with the repetitive motif clusters. Although we showed that iCLIP truncation analysis allows the comparison of binding sites within a single transcript, care needs to be taken when comparing binding sites on different transcripts, or between exons and introns of a transcript, because these can vary dramatically in abundance. The access of an RBP to different transcripts also depends on its localization within the cell. Normalization approaches that take these variations into account have recently been reviewed. Our study indicates that UV-C cross-linking is associated with a mild uridine bias, which can be avoided by analyzing the motifs enriched in the vicinity of cross-link sites.
Our analysis showed that over 80% of Nova iCLIP cDNAs were truncated at cross-link sites. We showed that cDNA truncations in iCLIP can identify the same cross-link sites as CLIP cDNA deletions. Moreover, since only iCLIP can recover truncated cDNAs, iCLIP identifies cross-link sites more comprehensively. We observed a strong enrichment of the TTT motif at CLIP cDNA deletion sites, but only a mild T enrichment at iCLIP cDNA truncation sites. The T enrichment most likely results from the uridine preference of UV-C-induced cross-linking, because it is absent when we perform UV-independent cross-linking of a mutant RNA methyltransferase. The TTT enrichment, however, most likely results from the analysis of cDNA deletions, because it is absent when analyzing CLIP cDNAs without deletions. Finally, we demonstrated that iCLIP is better able to identify long YCAY clusters as the primary Nova binding sites.
Materials and methods
CLIP, mRNA-Seq and iCLIP data sets and experiments
Nova and Ago CLIP data sets [2, 32, 33] and the significant cDNA deletion sites were described by Zhang and Darnell. The mRNA-Seq cDNA library of HeLa cell transcripts was prepared using an Illumina TruSeq kit. Nova iCLIP was performed following the standard iCLIP protocol for brain tissue [5, 23]. We used postnatal mouse brain tissue and immunoprecipitated Nova protein using an anti-Nova antibody. hnRNP C, TIA1, TIAL1 and TDP-43 iCLIP data sets were available from past studies [5, 7, 23]. For NSUN2 iCLIP, we followed the standard iCLIP protocol with the following modifications: we transfected COS7 cells with the C271A mutant NSUN2, and did not subject the cells to UV-C irradiation. We immunoprecipitated the mutant NSUN2 using an antibody against the myc epitope tag (9E10; Sigma-Aldrich, St. Louis, MO, USA). High-throughput sequencing for the experiments conducted in this study was performed using the Illumina Genome Analyzer IIx.
Mapping and annotation of sequencing data
We used the mm9/NCBI37, hg19/GRCh37 and MGSC Merged 1.0/rheMac2 genome assemblies, with Ensembl 59 (for mouse and human) and Ensembl 63 (for rhesus macaque) gene annotation. Before mapping, we removed random barcode and adaptor sequences from iCLIP cDNA sequences, as described previously. We performed iterative mapping: cDNAs without deletions were mapped first, followed by the remaining cDNAs containing deletions. In the first round, we mapped the cDNAs to the genome with Bowtie 0.12.7, which does not allow deletions, using the following parameters: -v 2 -m 1 -a --best --strata. The nucleotide preceding each iCLIP cDNA mapped by Bowtie was used to define the cross-link site identified by the truncated cDNA. In the second round, we mapped the remaining cDNAs to the genome using Novoalign, which can map cDNAs containing deletions, using the following parameter: -e 0. The deleted nucleotide in CLIP and iCLIP cDNAs mapped by Novoalign was used to define the cross-link site identified by the read-through cDNA. If a cDNA had more than one deletion, we selected the one closest to the beginning of the read. When multiple cDNAs with the same random barcode mapped to the same starting position in the genome but contained deletions at different sites, we selected the most frequently occurring deletion; if two deletions occurred with the same frequency, we selected the one closest to the beginning of the sequence read. For cDNAs without random barcodes (CLIP and mRNA-Seq), we assigned the same random barcode to all cDNAs. The methods for random barcode evaluation, annotation of genomic segments and identification of significantly clustered cDNA truncation sites were described earlier [5, 7], except that the Ensembl 59 gene annotation was used. For analyses of CLIP, mRNA-Seq and iCLIP data, we only used cDNA libraries that contained more than 10,000 uniquely mapped reads.
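The position rules in the preceding paragraph can be sketched in Python as follows. This is a minimal illustration rather than the authors' pipeline: the `Read` record, its 0-based half-open coordinates and all helper names are hypothetical, and we assume that the "most frequent occurrence" rule counts the single deletion already selected per read (the one closest to the read start).

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical record for one mapped cDNA read (0-based, half-open coordinates)."""
    barcode: str
    chrom: str
    strand: str             # "+" or "-"
    start: int              # leftmost aligned genomic position
    end: int                # one past the rightmost aligned genomic position
    deletions: tuple = ()   # genomic positions of deleted nucleotides


def read_5prime(read):
    """Genomic position of the first sequenced nucleotide of the read."""
    return read.start if read.strand == "+" else read.end - 1


def truncation_site(read):
    """Cross-link site of a truncated iCLIP cDNA: the nucleotide preceding the read."""
    return read.start - 1 if read.strand == "+" else read.end


def deletion_site(reads):
    """Choose one deletion for reads sharing barcode and start position.

    Each read contributes its deletion closest to the read start; the most frequent
    of these wins, and ties go to the deletion closest to the read start.
    """
    five = read_5prime(reads[0])
    per_read = [min(r.deletions, key=lambda d: abs(d - five)) for r in reads if r.deletions]
    if not per_read:
        return None
    counts = Counter(per_read)
    return min(counts, key=lambda d: (-counts[d], abs(d - five)))


def group_reads(reads):
    """Group reads by random barcode and strand-aware genomic start position."""
    groups = defaultdict(list)
    for r in reads:
        groups[(r.barcode, r.chrom, r.strand, read_5prime(r))].append(r)
    return groups
```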
Calculating the number of total cDNAs and cDNAs with deletions
Since CLIP and mRNA-Seq cDNAs lacked random barcodes, we disregarded the random barcodes of iCLIP libraries when comparing the numbers of total cDNAs and of cDNAs with deletions in CLIP, mRNA-Seq and iCLIP cDNA libraries (Table 1). For total cDNA counts, we collapsed all sequence reads starting at the same genomic position into a single read. For cDNAs with deletions, we selected unique cDNAs with deletions as described above. If more than one cDNA with deletions started at the same genomic position, we collapsed them and defined the deletion site as the one closest to the beginning of the reads. This analysis and all subsequent analyses were done with custom Python and R scripts and the iCount server.
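A minimal sketch of this barcode-free collapsing, assuming each read has already been reduced to a hypothetical (chromosome, strand, 5' position, deletion position or None) tuple; the function name `collapse_by_start` is illustrative only:

```python
def collapse_by_start(reads):
    """Collapse reads by genomic start position, ignoring random barcodes.

    reads: iterable of (chrom, strand, five_prime, deletion) tuples, where deletion is a
    genomic position or None. Returns the number of collapsed cDNAs and, for starts with
    at least one deletion, the deletion closest to the read start.
    """
    starts = set()
    deletion_by_start = {}
    for chrom, strand, five, deletion in reads:
        key = (chrom, strand, five)
        starts.add(key)
        if deletion is None:
            continue
        best = deletion_by_start.get(key)
        if best is None or abs(deletion - five) < abs(best - five):
            deletion_by_start[key] = deletion
    return len(starts), deletion_by_start
```

The number of cDNAs with deletions is then simply the size of the returned dictionary.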
Calculating the proportion of read-through cDNAs in Nova iCLIP cDNA libraries
First, we estimated the proportion of read-through cDNAs in the total iCLIP cDNA library from the proportion of cDNAs containing deletions. This allowed us to estimate the proportion of cDNAs that are missed by the CLIP protocol because they truncate. We made two assumptions: first, that Nova CLIP cDNA libraries contain only read-through cDNAs, whereas Nova iCLIP cDNA libraries contain both read-through and truncated cDNAs; and second, that because the reverse transcription and sequencing protocols were identical, the rate and distribution of deletions in read-through cDNAs were the same in Nova CLIP and iCLIP cDNA libraries.
Furthermore, although both CLIP and iCLIP aim to produce libraries with an average cDNA length of 50 nucleotides, sequence lengths varied somewhat between experiments. To avoid the effect of this variation when comparing CLIP, mRNA-Seq and iCLIP cDNA libraries, we only evaluated deletions in the first 25 nucleotides from the 5' end of cDNAs.
To make this estimate, we used the following values: p(iCLIP), the proportion of cDNAs with deletions in the first 25 nucleotides in Nova iCLIP data (3,749/166,330, ≈2.3%); p(RT), the proportion of cDNAs with deletions in the first 25 nucleotides among read-through cDNAs from Nova CLIP data (421,417/3,852,778, ≈11%); and p(BG), the proportion of cDNAs with deletions in the first 25 nucleotides of mRNA-Seq cDNAs, which we used to estimate the background occurrence of deletions (18,936/4,857,809, ≈0.4%). From these values, we estimated that 82% of Nova iCLIP cDNAs were truncated, and would therefore be lost in the CLIP cDNA cloning protocol.
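How the three proportions are combined is not written out in this paragraph. Under the two assumptions stated above, one consistent reading (our assumption, not a formula quoted from the authors) is a linear mixture: read-through cDNAs carry deletions at rate p(RT), truncated cDNAs carry only background deletions at rate p(BG), so p(iCLIP) = (1 - f) p(RT) + f p(BG), where f is the truncated fraction. Solving for f gives f = 1 - (p(iCLIP) - p(BG)) / (p(RT) - p(BG)). The Python sketch below evaluates this with the counts given above; the function name is hypothetical.

```python
def truncated_fraction(p_iclip, p_rt, p_bg):
    """Fraction f of iCLIP cDNAs that are truncated, assuming the linear mixture
    p_iclip = (1 - f) * p_rt + f * p_bg."""
    read_through = (p_iclip - p_bg) / (p_rt - p_bg)
    return 1.0 - read_through


P_ICLIP = 3749 / 166330      # Nova iCLIP, deletions within the first 25 nt
P_RT = 421417 / 3852778      # Nova CLIP read-through cDNAs
P_BG = 18936 / 4857809       # mRNA-Seq background

print(f"{truncated_fraction(P_ICLIP, P_RT, P_BG):.1%}")  # prints 82.3%
```

Under this model the estimate agrees with the 82% stated in the text.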
Thus, we estimate that 85% of Nova iCLIP cDNAs, among the cDNAs that lack deletions, were truncated at cross-link sites.
Re-defining the deletion sites
For cDNAs mapped to the plus strand, we examined the sequence at positions -2 to 0 relative to each deletion site; for cDNAs mapped to the minus strand, we examined positions 0 to +2. If this sequence was TTT, we re-defined the deletion site as the middle nucleotide of the TTT motif. If a re-defined deletion site coincided with another existing deletion site, the deletion counts were summed. Nucleotide composition around deletion sites was visualized with WebLogo 3.
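The re-definition step can be sketched in Python as below. The sketch assumes a plus-strand reference held as one uppercase string and deletion counts keyed by 0-based genomic position for one strand. We read "TTT" as referring to the transcript orientation, so for minus-strand cDNAs the window is reverse-complemented (AAA on the reference); this reading and the function names are our assumptions.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def sense_window(genome, start, end, strand):
    """genome[start:end] in the transcript orientation; empty if out of range."""
    if start < 0 or end > len(genome):
        return ""
    seq = genome[start:end]
    return seq if strand == "+" else seq.translate(COMPLEMENT)[::-1]


def redefine_deletion_sites(genome, counts, strand):
    """Move deletions lying at the edge of a TTT run to its middle nucleotide.

    counts: {genomic position: deletion count} for one strand. Counts that end up
    on the same position are summed.
    """
    merged = Counter()
    for pos, n in counts.items():
        if strand == "+":
            window, middle = sense_window(genome, pos - 2, pos + 1, strand), pos - 1
        else:
            window, middle = sense_window(genome, pos, pos + 3, strand), pos + 1
        merged[middle if window == "TTT" else pos] += n
    return dict(merged)
```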
YCAY motif occurrence and enrichment around cross-link sites
The YCAY motif occurrence was calculated around cross-link sites defined either by confident CLIP deletion sites from Zhang and Darnell (FDR < 0.001) or by confident iCLIP truncation sites (FDR < 0.05). The cross-link sites were evaluated on the sense strand of transcribed regions and on both strands of intergenic regions. The closest YCAY motif was defined by recording the starting position of the YCAY motif with the smallest distance to the cross-link site. If two YCAY motifs were equally distant from the cross-link site, we selected the upstream motif (for example, if the closest YCAY motifs started at positions -5 and +5, we selected only position -5). To determine the background occurrence of YCAY motifs, we randomly re-positioned the cross-link sites within the same genomic segment (for instance, within the same 3' untranslated region or the same intron, as described before) and calculated YCAY occurrence around these re-positioned sites (in the region -50 to +50 relative to the sites). We performed this randomization 100 times and calculated the average background YCAY motif occurrence. To determine the region of two-fold enrichment in Figure 5a, we averaged the enrichment over positions -2 to +2 around each position to smooth out fluctuations.
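The closest-motif rule and the five-position smoothing described above can be sketched in Python as follows. The sketch assumes a sense-strand sequence string and 0-based site indices; the helper names, the 50-nucleotide search limit (taken from the -50 to +50 region) and the truncated windows at the profile ends are illustrative assumptions.

```python
import re

YCAY = re.compile(r"(?=[CT]CA[CT])")  # lookahead so overlapping motifs are all found


def closest_ycay_offset(seq, site, max_dist=50):
    """Start offset of the YCAY motif closest to seq[site]; ties prefer the upstream motif."""
    best = None
    for m in YCAY.finditer(seq):
        off = m.start() - site
        if abs(off) > max_dist:
            continue
        # Sorting key (distance, offset) makes -5 win over +5.
        if best is None or (abs(off), off) < (abs(best), best):
            best = off
    return best


def smooth(values, half_width=2):
    """Average each position over positions -half_width..+half_width (truncated at the ends)."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half_width): i + half_width + 1]
        out.append(sum(window) / len(window))
    return out
```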
Visualization of cDNAs and cross-link sites on the Meg3 RNA
We used the postnatal mouse brain Nova CLIP data set to visualize Nova CLIP cDNAs without deletions in Figure 4. The cDNAs without deletions were mapped with Bowtie as described above (without an FDR threshold) and converted to eland format. The cDNAs were then clustered with the Findpeaks program using the following arguments: -dist_type 0 50 -hist_size 1 -eff_size 1.8655e9. The Nova CLIP cDNA deletion sites (FDR < 0.001) and their counts were obtained and re-defined as described above, and the Nova iCLIP truncation sites (FDR < 0.05) and their cDNA counts were obtained as described above. These data sets were visualized on the Meg3 gene with the UCSC genome browser.
Calculation of the YCAY score
The YCAY score corresponds to the density of YCAY motifs in a 41-nucleotide sliding window. For each genomic position of interest, the region from 20 nucleotides upstream to 20 nucleotides downstream was evaluated, and the number of YCAY motifs completely contained in this region was used as the YCAY score for that position.
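A minimal Python sketch of this score, assuming the region is given as one uppercase sense-strand string; the function name is hypothetical, and near the ends of the string motifs outside the sequence simply cannot be counted.

```python
import re
from bisect import bisect_left, bisect_right

YCAY = re.compile(r"(?=[CT]CA[CT])")  # overlapping YCAY occurrences


def ycay_scores(seq, flank=20):
    """Number of YCAY motifs lying entirely within [pos - flank, pos + flank] for every
    position (a 41-nucleotide window for flank=20)."""
    starts = [m.start() for m in YCAY.finditer(seq)]  # already sorted
    scores = []
    for pos in range(len(seq)):
        lo, hi = pos - flank, pos + flank
        # A motif starting at s covers s..s+3, so it fits if lo <= s and s + 3 <= hi.
        n = bisect_right(starts, hi - 3) - bisect_left(starts, lo)
        scores.append(max(0, n))
    return scores
```

Sorting the motif starts once and using binary search keeps each window count logarithmic rather than rescanning the sequence per position.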
Correlation between YCAY score and CLIP or iCLIP cDNA counts
The region chr12:110796849-110809936 on the mouse genome (mm9) was evaluated to study Nova binding to the Meg3 RNA. We calculated the correlation between the YCAY score of each YCAY cluster and the highest CLIP or iCLIP cDNA count within that cluster.
The YCAY clusters were defined using an approach inspired by the Findpeaks program: 1) calculate the YCAY score for all positions in the region and determine the local maxima; 2) if the minimum score between two local maxima is 0, the clusters end at the positions where the score becomes 0; 3) if the minimum score between the local maxima is not 0, compare it with 0.9-fold of the smaller of the two local maxima; 4) if the minimum score is smaller, split the cluster at the middle of the region with the local minimum value; 5) if the minimum score is larger, join the two peaks into the same cluster, and compare its local maximum with the next local maximum, starting from step 2.
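The five steps above can be reconstructed as the Python sketch below, taking a list of per-position YCAY scores. The handling of plateaus (runs of equal scores are treated as one peak or valley) and the choice of the first minimum-valued run when several exist are our assumptions; the function names are hypothetical.

```python
def _runs(scores, start, end):
    """Run-length encode scores[start:end] as [value, run_start, run_end] lists."""
    runs = []
    for i in range(start, end):
        if runs and runs[-1][0] == scores[i]:
            runs[-1][2] = i + 1
        else:
            runs.append([scores[i], i, i + 1])
    return runs


def _split_segment(scores, start, end, ratio):
    """Apply steps 3 to 5 within one stretch of non-zero scores."""
    runs = _runs(scores, start, end)
    vals = [r[0] for r in runs]
    # Step 1: local maxima are runs higher than both neighbouring runs (0 outside).
    peaks = [k for k in range(len(runs))
             if vals[k] > (vals[k - 1] if k > 0 else 0)
             and vals[k] > (vals[k + 1] if k + 1 < len(runs) else 0)]
    clusters, c_start, c_max = [], start, vals[peaks[0]]
    for a, b in zip(peaks, peaks[1:]):
        valley = min(vals[a + 1:b])
        if valley < ratio * min(c_max, vals[b]):
            # Step 4: split in the middle of the run holding the minimum score.
            _, v_start, v_end = next(r for r in runs[a + 1:b] if r[0] == valley)
            cut = (v_start + v_end) // 2
            clusters.append((c_start, cut, c_max))
            c_start, c_max = cut, vals[b]
        else:
            # Step 5: join the peaks; the cluster keeps the larger maximum.
            c_max = max(c_max, vals[b])
    clusters.append((c_start, end, c_max))
    return clusters


def ycay_clusters(scores, ratio=0.9):
    """Return (start, end, max_score) clusters with end exclusive."""
    clusters, i, n = [], 0, len(scores)
    while i < n:
        if scores[i] == 0:
            i += 1
            continue
        j = i
        while j < n and scores[j] > 0:
            j += 1
        # Step 2: a score of 0 always terminates a cluster.
        clusters.extend(_split_segment(scores, i, j, ratio))
        i = j
    return clusters
```

The third element of each tuple is the maximum score in the cluster, which the next paragraph uses as the cluster's YCAY score.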
The maximum YCAY score in each cluster was defined as the YCAY score of the cluster. Only clusters containing at least one cross-link site were used to calculate the correlation, which was measured as Spearman's rank correlation coefficient between the YCAY score of each cluster and its cDNA count. The same analysis was repeated using cDNA counts at cross-link sites defined by either CLIP cDNA deletions or iCLIP cDNA truncations.
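Selecting clusters with at least one cross-link site and taking the highest cDNA count per cluster amounts to building two paired vectors for the correlation. A minimal sketch, assuming clusters are given as hypothetical (start, end, ycay_score) tuples in the same coordinates as a dictionary mapping cross-link positions to cDNA counts:

```python
def cluster_score_pairs(clusters, xl_counts):
    """Pair each cluster's YCAY score with its highest cross-link cDNA count.

    clusters:  (start, end, ycay_score) tuples, inclusive coordinates.
    xl_counts: {position: cDNA count} for cross-link sites from one method
               (e.g. CLIP cDNA deletions or iCLIP cDNA truncations).
    Clusters without any cross-link site are skipped.
    """
    ycay, counts = [], []
    for start, end, score in clusters:
        inside = [c for pos, c in xl_counts.items() if start <= pos <= end]
        if inside:
            ycay.append(score)
            counts.append(max(inside))
    return ycay, counts


print(cluster_score_pairs([(0, 9, 3), (10, 19, 5), (20, 29, 2)],
                          {4: 7, 6: 12, 25: 3}))  # ([3, 2], [12, 3])
```

Here the middle cluster is dropped because it contains no cross-link site, and the first cluster contributes its higher count (12).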
Two-sided P-values for the correlation between the YCAY score and the iCLIP or CLIP cDNA count in the Meg3 region described above were calculated using the asymptotic t approximation, with the R function cor.test(x, y, alternative = "two.sided", method = "spearman", exact = FALSE).
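For readers working outside R, the following standard-library Python sketch mirrors the statistic described above: Spearman's rho as the Pearson correlation of average (tie-corrected) ranks, and a two-sided P-value from t = rho·sqrt((n - 2)/(1 - rho²)) on n - 2 degrees of freedom, which is the asymptotic approximation cor.test uses when exact = FALSE. The t-distribution tail is computed with the classical closed-form series for integer degrees of freedom; that choice and the function names are ours. For very small P-values, a library routine such as scipy.stats.t.sf would be more numerically accurate.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the average ranks (inputs must not be constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def t_two_sided_p(t, df):
    """Two-sided P(|T| >= |t|) for Student's t with integer df >= 1,
    using the closed-form series in theta = atan(|t| / sqrt(df))."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c2 = math.sin(theta), math.cos(theta) ** 2
    if df % 2 == 1:
        # odd df: (2/pi) * [theta + sin * (cos + 2/3 cos^3 + ...)]
        term, total = math.cos(theta), 0.0
        for j in range(1, (df - 1) // 2 + 1):
            total += term
            term *= c2 * (2 * j) / (2 * j + 1)
        central = 2 / math.pi * (theta + s * total)
    else:
        # even df: sin * (1 + 1/2 cos^2 + 1*3/(2*4) cos^4 + ...)
        term, total = 1.0, 0.0
        for j in range(df // 2):
            total += term
            term *= c2 * (2 * j + 1) / (2 * j + 2)
        central = s * total
    return 1.0 - central


def spearman_test(x, y):
    """Spearman's rho and its two-sided P-value (asymptotic t, df = n - 2)."""
    n = len(x)
    rho = spearman_rho(x, y)
    if abs(rho) >= 1.0:
        return rho, 0.0
    t = rho * math.sqrt((n - 2) / (1 - rho ** 2))
    return rho, t_two_sided_p(t, n - 2)


print(spearman_test([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # (1.0, 0.0)
```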
The Nova and NSUN2 iCLIP data are available from ArrayExpress under accession number E-MTAB-1008 and, together with previously published iCLIP data, from iCount.
Abbreviations
CLIP: UV cross-linking and immunoprecipitation
FDR: false discovery rate
HITS-CLIP: high-throughput sequencing of RNA isolated by crosslinking immunoprecipitation
hnRNP C: heterogeneous nuclear ribonucleoproteins C1/C2
iCLIP: individual-nucleotide resolution UV cross-linking and immunoprecipitation
Nova: Neuro-oncological ventral antigen
NSUN2: NOP2/Sun domain family, member 2
PAR-CLIP: Photoactivatable Ribonucleoside-Enhanced CLIP
TDP-43: TAR DNA binding protein (also known as TARDBP)
TIA1: cytotoxic granule-associated RNA binding protein
The authors wish to thank Robert B Darnell for sharing the anti-Nova antibody, Chaolin Zhang, Kathi Zarnack, Christopher Sibley and Nicholas McGlincy for their valuable comments on the manuscript, and the genomic team at CRI for Illumina sequencing. This work was supported by the European Research Council grant 206726-CLIP, the Slovenian Research Agency (P2-0209, Z7-3665) and the Medical Research Council (grant number U105185858). YS is supported by the Nakajima Foundation and JK is supported by a Human Frontier Science Program Postdoctoral Fellowship.
- Ule J, Jensen KB, Ruggiu M, Mele A, Ule A, Darnell RB: CLIP identifies Nova-regulated RNA networks in the brain. Science. 2003, 302: 1212-1215. 10.1126/science.1090095.
- Licatalosi DD, Mele A, Fak JJ, Ule J, Kayikci M, Chi SW, Clark TA, Schweitzer AC, Blume JE, Wang X, Darnell JC, Darnell RB: HITS-CLIP yields genome-wide insights into brain alternative RNA processing. Nature. 2008, 456: 464-469. 10.1038/nature07488.
- Yeo GW, Coufal NG, Liang TY, Peng GE, Fu XD, Gage FH: An RNA code for the FOX2 splicing regulator revealed by mapping RNA-protein interactions in stem cells. Nat Struct Mol Biol. 2009, 16: 130-137. 10.1038/nsmb.1545.
- Hafner M, Landthaler M, Burger L, Khorshid M, Hausser J, Berninger P, Rothballer A, Ascano M, Jungkamp AC, Munschauer M, Ulrich A, Wardle GS, Dewell S, Zavolan M, Tuschl T: Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell. 2010, 141: 129-141. 10.1016/j.cell.2010.03.009.
- Konig J, Zarnack K, Rot G, Curk T, Kayikci M, Zupan B, Turner DJ, Luscombe NM, Ule J: iCLIP reveals the function of hnRNP particles in splicing at individual nucleotide resolution. Nat Struct Mol Biol. 2010, 17: 909-915. 10.1038/nsmb.1838.
- Ule J, Stefani G, Mele A, Ruggiu M, Wang X, Taneri B, Gaasterland T, Blencowe BJ, Darnell RB: An RNA map predicting Nova-dependent splicing regulation. Nature. 2006, 444: 580-586. 10.1038/nature05304.
- Wang Z, Kayikci M, Briese M, Zarnack K, Luscombe NM, Rot G, Zupan B, Curk T, Ule J: iCLIP predicts the dual splicing effects of TIA-RNA interactions. PLoS Biol. 2010, 8: e1000530. 10.1371/journal.pbio.1000530.
- Lozzio CB, Wigler PW: Cytotoxic effects of thiopyrimidines. J Cell Physiol. 1971, 78: 25-32. 10.1002/jcp.1040780105.
- Granneman S, Kudla G, Petfalski E, Tollervey D: Identification of protein binding sites on U3 snoRNA and pre-rRNA by UV cross-linking and high-throughput analysis of cDNAs. Proc Natl Acad Sci USA. 2009, 106: 9613-9618. 10.1073/pnas.0901997106.
- Kishore S, Jaskiewicz L, Burger L, Hausser J, Khorshid M, Zavolan M: A quantitative analysis of CLIP methods for identifying binding sites of RNA-binding proteins. Nat Methods. 2011, 8: 559-564. 10.1038/nmeth.1608.
- Zhang C, Darnell RB: Mapping in vivo protein-RNA interactions at single-nucleotide resolution from HITS-CLIP data. Nat Biotechnol. 2011, 29: 607-614. 10.1038/nbt.1873.
- Urlaub H, Hartmuth K, Luhrmann R: A two-tracked approach to analyze RNA-protein crosslinking sites in native, nonlabeled small nuclear ribonucleoprotein particles. Methods. 2002, 26: 170-181. 10.1016/S1046-2023(02)00020-8.
- Buckanovich RJ, Darnell RB: The neuronal RNA binding protein Nova-1 recognizes specific RNA targets in vitro and in vivo. Mol Cell Biol. 1997, 17: 3194-3201.
- Jensen KB, Musunuru K, Lewis HA, Burley SK, Darnell RB: The tetranucleotide UCAY directs the specific recognition of RNA by the Nova K-homology 3 domain. Proc Natl Acad Sci USA. 2000, 97: 5740-5745. 10.1073/pnas.090553997.
- Lewis HA, Musunuru K, Jensen KB, Edo C, Chen H, Darnell RB, Burley SK: Sequence-specific RNA binding by a Nova KH domain: implications for paraneoplastic disease and the fragile X syndrome. Cell. 2000, 100: 323-332. 10.1016/S0092-8674(00)80668-6.
- Musunuru K, Darnell RB: Determination and augmentation of RNA sequence specificity of the Nova K-homology domains. Nucleic Acids Res. 2004, 32: 4852-4861. 10.1093/nar/gkh799.
- Dredge BK, Stefani G, Engelhard CC, Darnell RB: Nova autoregulation reveals dual functions in neuronal splicing. EMBO J. 2005, 24: 1608-1620. 10.1038/sj.emboj.7600630.
- Teplova M, Malinina L, Darnell JC, Song J, Lu M, Abagyan R, Musunuru K, Teplov A, Burley SK, Darnell RB, Patel DJ: Protein-RNA and protein-protein recognition by dual KH1/2 domains of the neuronal splicing factor Nova-1. Structure. 2011, 19: 930-944. 10.1016/j.str.2011.05.002.
- King MY, Redman KL: RNA methyltransferases utilize two cysteine residues in the formation of 5-methylcytosine. Biochemistry. 2002, 41: 11218-11225. 10.1021/bi026055q.
- Hussain S, Benavente SB, Nascimento E, Dragoni I, Kurowski A, Gillich A, Humphreys P, Frye M: The nucleolar RNA methyltransferase Misu (NSun2) is required for mitotic spindle stability. J Cell Biol. 2009, 186: 27-40. 10.1083/jcb.200810180.
- Motorin Y, Lyko F, Helm M: 5-methylcytosine in RNA: detection, enzymatic formation and biological functions. Nucleic Acids Res. 2010, 38: 1415-1430. 10.1093/nar/gkp1117.
- Redman KL: Assembly of protein-RNA complexes using natural RNA and mutant forms of an RNA cytosine methyltransferase. Biomacromolecules. 2006, 7: 3321-3326. 10.1021/bm051012l.
- Tollervey JR, Curk T, Rogelj B, Briese M, Cereda M, Kayikci M, Konig J, Hortobagyi T, Nishimura AL, Zupunski V, Patani R, Chandran S, Rot G, Zupan B, Shaw CE, Ule J: Characterizing the RNA targets and position-dependent splicing regulation by TDP-43. Nat Neurosci. 2011, 14: 452-458. 10.1038/nn.2778.
- Forch P, Puig O, Martinez C, Seraphin B, Valcarcel J: The splicing regulator TIA-1 interacts with U1-C to promote U1 snRNP recruitment to 5' splice sites. EMBO J. 2002, 21: 6882-6892. 10.1093/emboj/cdf668.
- Aznarez I, Barash Y, Shai O, He D, Zielenski J, Tsui LC, Parkinson J, Frey BJ, Rommens JM, Blencowe BJ: A systematic analysis of intronic sequences downstream of 5' splice sites reveals a widespread role for U-rich motifs and TIA1/TIAL1 proteins in alternative splicing regulation. Genome Res. 2008, 18: 1247-1258. 10.1101/gr.073155.107.
- Zhou Y, Cheunsuchon P, Nakayama Y, Lawlor MW, Zhong Y, Rice KA, Zhang L, Zhang X, Gordon FE, Lidov HG, Bronson RT, Klibanski A: Activation of paternally expressed genes and perinatal death caused by deletion of the Gtl2 gene. Development. 2010, 137: 2643-2652. 10.1242/dev.045724.
- Zhang X, Rice K, Wang Y, Chen W, Zhong Y, Nakayama Y, Zhou Y, Klibanski A: Maternally expressed gene 3 (MEG3) noncoding ribonucleic acid: isoform structure, expression, and functions. Endocrinology. 2010, 151: 939-947. 10.1210/en.2009-0657.
- Hamburgh ME, Curr KA, Monaghan M, Rao VR, Tripathi S, Preston BD, Sarafianos S, Arnold E, Darden T, Prasad VR: Structural determinants of slippage-mediated mutations by human immunodeficiency virus type 1 reverse transcriptase. J Biol Chem. 2006, 281: 7421-7428. 10.1074/jbc.M511380200.
- Jelen N, Ule J, Zivin M, Darnell RB: Evolution of Nova-dependent splicing regulation in the brain. PLoS Genet. 2007, 3: 1838-1847.
- Buratti E, Baralle FE: Characterization and functional implications of the RNA binding properties of nuclear factor TDP-43, a novel splicing regulator of CFTR exon 9. J Biol Chem. 2001, 276: 36337-36343. 10.1074/jbc.M104236200.
- Konig J, Zarnack K, Luscombe NM, Ule J: Protein-RNA interactions: new genomic technologies and perspectives. Nat Rev Genet. 2011, 13: 77-83. 10.1038/ni.2154.
- Chi SW, Zang JB, Mele A, Darnell RB: Argonaute HITS-CLIP decodes microRNA-mRNA interaction maps. Nature. 2009, 460: 479-486.
- Zhang C, Frias MA, Mele A, Ruggiu M, Eom T, Marney CB, Wang H, Licatalosi DD, Fak JJ, Darnell RB: Integrative modeling defines the Nova splicing-regulatory network and its combinatorial controls. Science. 2010, 329: 439-443. 10.1126/science.1191150.
- Langmead B, Trapnell C, Pop M, Salzberg SL: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009, 10: R25. 10.1186/gb-2009-10-3-r25.
- Novoalign. [http://www.novocraft.com/]
- iCount. [http://icount.biolab.si]
- Crooks GE, Hon G, Chandonia JM, Brenner SE: WebLogo: a sequence logo generator. Genome Res. 2004, 14: 1188-1190. 10.1101/gr.849004.
- Fejes AP, Robertson G, Bilenky M, Varhol R, Bainbridge M, Jones SJ: FindPeaks 3.1: a tool for identifying areas of enrichment from massively parallel short-read sequencing technology. Bioinformatics. 2008, 24: 1729-1730. 10.1093/bioinformatics/btn305.