Characterizing 5-methylcytosine in the mammalian epitranscriptome
© BioMed Central Ltd 2013
Published: 29 November 2013
The post-transcriptional modification 5-methylcytosine (m5C) occurs in a wide range of coding and non-coding RNAs. We describe transcriptome-wide approaches to capture the global m5C RNA methylome. We also discuss the potential functions of m5C in RNA and compare them to 6-methyladenosine modifications.
Post-transcriptional changes in RNA processing are essential regulators for most, if not all, cellular responses. Although RNA modifications are more prevalent and diverse in their chemical nature than DNA modifications, our knowledge of their occurrence and function in RNA is generally limited. There are approximately 150 known ribonucleoside modifications: most of them have been found in tRNA and rRNA, but some occur in mRNA [1–4]. Post-transcriptional modifications are highly likely to add complexity to RNA-mediated functions.
The advent of next-generation sequencing (NGS) tools has enabled the identification of RNA modifications both globally and in a substrate-specific manner. 6-Methyladenosine (m6A) was the first modification to be characterized, and is now known to be present in several types of RNA and, most notably, is highly enriched around stop codons in many mRNAs [2–4].
Another known modification in RNA is 5-methylcytosine (m5C). Although m5C is a well-characterized modification in DNA, its precise regulatory functions in RNA remain unclear [5, 6]. Until recently, the detection of RNA methylation involved the digestion of highly purified RNA followed by separation techniques, such as high performance liquid chromatography (HPLC) and mass spectrometry, which only allowed the identification of m5C in stable and highly abundant tRNAs and rRNAs [7–10]. Labeling techniques with 3H in living cells allowed detection of m5C in mRNA and viral RNAs [11, 12]. However, the deposition of m5C into mRNA remained controversial [13–17]. The recent advances in high-throughput techniques, combined with NGS, have renewed interest in the field, and have led to the identification of m5C as a widespread modification in coding and non-coding (nc)RNAs.
A fraction of these RNAs were found to be specifically methylated by the RNA methylase NSun2, including mRNAs, ncRNAs and several tRNAs. NSun2 had been previously shown to methylate tRNAs at various positions [19–22]. Additional NSun2-methylated coding and ncRNAs were identified in two studies published in 2013 that used customized RNA immunoprecipitation approaches followed by NGS [23, 24].
The regulatory functions of m5C modifications in RNA are still not fully understood. In tRNAs, in vitro cytosine-5 methylation can affect Mg2+ binding to tRNA molecules, which in turn influences the anticodon stem loop conformation and stabilizes the secondary structure [25, 26]. Cytosine-5 methylation alone, or in combination with other nonessential tRNA modifications, can also protect from degradation or cleavage [22, 27–29]. In rRNA, m5C is thought to play a role in translation. Synthetic cytosine-5 methylated mRNAs exhibit increased stability, and loss of methylation in the 3’ UTR of p16 has been reported to reduce its stability [31, 32].
The biological functions of tRNA m5C-methylation are linked to the regulation of protein translation in stress pathways and tissue differentiation in yeast, Drosophila, fish and mouse [19, 22, 29, 33–36]. Mutations in the NSUN2 gene in humans cause an autosomal recessive syndrome characterized by intellectual disability, skin disorders and growth retardation [37–40]. These findings, together with studies carried out in NSun2-deficient mice and cell lines, suggest a wide-ranging role for m5C modifications in RNA, including cellular signaling, tissue development and differentiation, and cancer [19, 21, 22, 36, 41–43].
In this review, we compare the current methods to identify m5C in the mammalian transcriptome with a particular focus on NSun2-mediated methylation. We further discuss the potential of these initial studies to comprehensively determine the global but enzyme-specific cytosine-5 RNA methylome. In addition, we compare studies focusing on m6A and m5C modifications and consider the likely benefits of characterizing and elucidating the functions of the mammalian epitranscriptome.
Bisulfite sequencing, a method based on the chemical deamination of cytosines and originally developed to detect m5C in DNA, was previously adapted for use with RNA (Figure 1A). The technique is based on the differential chemical reactivity of m5C compared with cytosine. Sodium bisulfite causes the deamination of unmethylated cytosines into uridines in single-stranded DNA or RNA, while m5C remains unconverted. A crucial parameter is the fraction of cytosine that is converted to uridine, but a high conversion rate is only achieved by prolonged incubation under consecutive acidic and alkaline conditions, which also causes RNA degradation. This degradation can compromise the subsequent reverse transcription and PCR amplification steps [18, 46].
Using this approach, the occurrence of m5C can be reproducibly and quantitatively detected in tRNA and rRNA. Coupling bisulfite conversion of RNA with NGS has led to detection of m5C in coding and non-coding RNAs, in addition to tRNAs [18, 45]. Squires et al. used polyA-enriched RNA and identified m5C as a widespread modification in coding and non-coding RNAs.
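The bisulfite readout described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the cited studies: it assumes ungapped reads already aligned to a single transcript, reads written in the DNA alphabet (so a converted cytosine appears as T), and a set of positions known to be unmethylated from which the conversion rate is estimated. All function and variable names are hypothetical.

```python
from collections import Counter


def count_bisulfite_calls(reference, reads):
    """Tally read bases observed at each reference cytosine.

    reference: transcript sequence in the DNA alphabet (A, C, G, T).
    reads: iterable of (start, sequence) pairs, 0-based and ungapped.
    Returns {position: Counter}, where 'C' counts unconverted calls
    (candidate m5C) and 'T' counts converted calls (unmethylated C).
    """
    calls = {i: Counter() for i, base in enumerate(reference) if base == "C"}
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos in calls and base in ("C", "T"):
                calls[pos][base] += 1
    return calls


def methylation_levels(calls, min_coverage=1):
    """Fraction of unconverted calls at each sufficiently covered cytosine."""
    levels = {}
    for pos, counts in calls.items():
        depth = counts["C"] + counts["T"]
        if depth >= min_coverage:
            levels[pos] = counts["C"] / depth
    return levels


def conversion_rate(calls, control_positions):
    """Fraction of C-to-T calls at positions assumed to be unmethylated."""
    converted = sum(calls[p]["T"] for p in control_positions)
    total = sum(calls[p]["C"] + calls[p]["T"] for p in control_positions)
    return converted / total if total else float("nan")


# Toy data: cytosines at positions 1, 3 and 5 of the reference.
reference = "ACGCTC"
reads = [(0, "ATGCTT"), (0, "ATGCTT"), (0, "ATGCTC")]
calls = count_bisulfite_calls(reference, reads)
levels = methylation_levels(calls)
rate = conversion_rate(calls, control_positions=[1, 5])
```

In this toy input, position 3 is never converted and so appears fully methylated, while the single unconverted read at position 5 lowers the estimated conversion rate. A real analysis would also have to deal with incomplete conversion in structured regions and with other modifications that block deamination.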
In contrast to Squires et al., Edelheit et al. performed bisulfite sequencing on size-selected mRNA from the prokaryote Sulfolobus solfataricus to avoid tRNA and rRNA contamination. Despite covering most coding genes, this approach detected low methylation levels in only 15 coding genes. However, all m5C sites were confirmed by RIP using a monoclonal antibody raised against m5C (m5C-RIP) (Figure 1B). m5C-RIP has not been performed in mammalian cells as yet.
Aza-IP exploits the formation of a covalent bond between RNA and specific RNA methylases (Figure 1C). Both DNA and RNA m5C methyltransferases form a temporary covalent bond between the catalytic cysteine in the enzymatic active site and carbon 6 of the targeted cytosine. This catalytic intermediate complex is later resolved to release the methylated cytosine at carbon 5 and to regenerate the free enzyme [47, 48]. In the first step of the Aza-IP method, 5-azacytidine, a cytidine analog, is incorporated into nascent RNA molecules by RNA polymerases (Figure 1C) [23, 49]. The only structural difference between cytidine and 5-azacytidine is a nitrogen substitution at carbon 5 that inhibits the release of the methylated RNA from its enzyme.
In a study by Khoddami and Cairns, this covalently bound protein-RNA complex was immunoprecipitated for the RNA methylases NSun2 and Dnmt2. The identification of the precise cytosine targeted by the enzymes was possible due to a specific transversion of cytosine to guanosine only at Dnmt2- and NSun2-methylated sites. Although it is unclear why and how this conversion occurs, it has been confirmed that there are Dnmt2-dependent m5C sites in the three tRNAs Asp, Gly and Val [22, 23, 29]. Putative novel targets for Dnmt2 were the type I cytokeratin KRT18 mRNA and KRT18 pseudogene mRNA. NSun2-dependent methylation sites occurred in tRNAs as well as several other ncRNAs. Although a potential site in the Src homology 2 domain containing F (SHF) mRNA was identified, statistically significant sites in other mRNAs were absent.
In an alternative approach, we utilized covalently bound catalytic intermediate RNA-protein complexes to identify methylation substrates of NSun2 (Figure 1D). NSun2 releases methylated RNA from the enzyme-substrate complex through a highly conserved cysteine residue in the catalytic domain [50, 51]. Mutating this cysteine to alanine results in a covalently bound stable RNA-protein complex [41, 50, 51]. In the human NSun2 protein, this cysteine is represented by C271 and the mutation to alanine (C271A) allowed capturing of stable NSun2-RNA complexes. Combined with a customized individual-nucleotide resolution cross-linking immunoprecipitation (iCLIP) approach where the UV-crosslinking step is omitted, and followed by NGS, we named this method methylation-iCLIP (miCLIP). This approach identified NSun2-mediated cytosine-5 methylation at nucleotide resolution [24, 52]. miCLIP not only confirmed known tRNA and non-tRNA methylation sites of NSun2, but also identified novel sites in coding and non-coding RNAs.
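The C to G transversion signature described above lends itself to a simple tally. The following Python sketch is illustrative only and does not reproduce the published Aza-IP analysis: it assumes ungapped reads aligned to one reference, and the coverage and fraction thresholds are hypothetical placeholders rather than values from the cited study.

```python
from collections import Counter


def c_to_g_sites(reference, reads, min_coverage=10, min_fraction=0.1):
    """Report reference cytosines at which reads frequently show G.

    reference: sequence in the DNA alphabet.
    reads: iterable of (start, sequence) pairs, 0-based and ungapped.
    min_coverage, min_fraction: hypothetical cut-offs for this sketch.
    Returns {position: fraction of reads carrying G at that cytosine}.
    """
    depth = Counter()
    g_count = Counter()
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos < len(reference) and reference[pos] == "C":
                depth[pos] += 1
                if base == "G":
                    g_count[pos] += 1
    sites = {}
    for pos, n in depth.items():
        fraction = g_count[pos] / n
        if n >= min_coverage and fraction >= min_fraction:
            sites[pos] = fraction
    return sites


# Toy data: 4 of 10 reads carry G at the single reference cytosine.
reference = "AACAA"
reads = [(0, "AAGAA")] * 4 + [(0, "AACAA")] * 6
sites = c_to_g_sites(reference, reads)
```

Because the transversion is absent from many targets, a tally of this kind can only give a lower bound on the number of methylated sites.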
Of the four system-wide techniques that have been used to identify m5C sites in the epitranscriptome (Figure 1) [18, 23, 24, 45], RNA bisulfite sequencing remains the gold standard. This is because it directly detects the enzymatic conversion of cytosines to 5-methylcytosines in RNA. However, RNA bisulfite sequencing has at least three major practical disadvantages: (1) cytosines are resistant to conversion when in double-stranded RNA and several modifications other than m5C inhibit the C to U conversion, leading to false positives; (2) the chemical conversion of the nucleotides leads to degradation of the RNA and requires extremely deep sequencing to reveal methylated RNAs of low abundance; and (3) it reveals no information about substrate specificity when, for instance, specific RNA methylases are knocked out. In particular, it cannot distinguish direct from indirect targets or capture the cross-reaction of additional RNA methylases.
m5C-RIP, Aza-IP and miCLIP do not rely on chemical base conversion and avoid harsh chemical and thermal conditions that can lead to the degradation of target RNA. They also enrich for RNA substrates during the immunoprecipitation step. Therefore, RNAs with lower abundances can also be captured with standard deep sequencing protocols.
A key future goal in the m5C epitranscriptomics field is to gain a methylase-specific view of the mammalian m5C RNA methylome. The m5C-RIP approach has yet to be performed in a mammalian system and, like bisulfite sequencing, analysis of enzyme-specific m5C modification requires genetic deletion or knockdown of the RNA methylase being investigated. Aza-IP and miCLIP not only allow detection of enzyme-specific methylation, but also have the advantage that the covalently bound methylase-RNA complexes can be purified under higher stringency conditions, thus reducing the background of non-specifically bound RNAs.
The mechanism of cytosine-5 RNA methylation is highly conserved, and both Aza-IP and miCLIP are applicable, in principle, for most, if not all, m5C RNA methylases. Aza-IP relies on the chemical incorporation of 5-azacytidine into the entire transcriptome prior to isolation of protein-RNA complexes, which may compromise RNA stability and integrity. In addition, the analog will be incorporated into DNA and may alter transcriptional processes. Although Aza-IP was developed using overexpressed NSun2 and Dnmt2 proteins, it can in principle be performed on the endogenous methylases using the appropriate antibodies. In contrast, miCLIP depends on the overexpression of the mutant methylase protein, yet it does not involve any chemical modification of RNA, thus offering a distinct advantage. Since miCLIP only utilizes the last step of the catalytic process, in which the methylated substrate is released from the enzyme, it does not affect the methyl-transfer step, or the endogenous RNA structure, stability and integrity.
Particular consideration should also be given to the best bioinformatics approach to analyze different m5C sites. For instance, mapping the methylation sites in tRNAs is particularly challenging because of their abundance and repetitive nature [54, 55]. One possibility is to use only uniquely mapping reads, which allows more accurate mapping but may omit methylation sites in particularly repetitive tRNA sequences. Because tRNAs are highly repetitive sequences in the genome, this often leads to discarding a large proportion of the available data. Another strategy is to use uniquely mapping reads to simulate the distribution of reads in a given area, thereby assigning multi-mapping reads proportionally. Mapping m5C sites obtained by bisulfite sequencing is hampered by the bisulfite conversion itself and RNA degradation, which may alter the base content.
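The proportional assignment strategy mentioned above can be sketched as follows. This is a simplified illustration, not the method used in any of the cited studies: each multi-mapping read is split among its candidate loci in proportion to the uniquely mapping reads at those loci. The equal split used when no candidate has unique reads is an assumption of this sketch, and the locus names are hypothetical.

```python
def assign_multireads(unique_counts, multireads):
    """Distribute multi-mapping reads in proportion to unique-read support.

    unique_counts: {locus: number of uniquely mapping reads}.
    multireads: list of tuples, each listing the candidate loci of one read.
    Returns {locus: unique reads plus fractional multi-read assignments}.
    """
    totals = {locus: float(n) for locus, n in unique_counts.items()}
    for candidates in multireads:
        weights = [unique_counts.get(locus, 0) for locus in candidates]
        weight_sum = sum(weights)
        for locus, weight in zip(candidates, weights):
            if weight_sum:
                share = weight / weight_sum
            else:
                # Assumption: split equally when no locus has unique support.
                share = 1 / len(candidates)
            totals[locus] = totals.get(locus, 0.0) + share
    return totals


# Toy data: two tRNA gene copies with a 3:1 ratio of unique reads,
# plus four reads that map equally well to both copies.
unique_counts = {"tRNA-Gly-copy1": 30, "tRNA-Gly-copy2": 10}
multireads = [("tRNA-Gly-copy1", "tRNA-Gly-copy2")] * 4
totals = assign_multireads(unique_counts, multireads)
```

Compared with discarding all multi-mapping reads, this keeps the total read count intact, at the cost of assuming that unique-read coverage reflects the true expression of each copy.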
Aza-IP also presents some difficulties for identifying the complete set of methylation sites: a large number of targets often do not contain C to G transversions, and so the methylation sites in those targets are not mapped. Mapping methylation sites in miCLIP, on the other hand, does not depend on such artificial modifications, but rather on the stalling of the reverse transcriptase at the cross-link/methylation site during the reverse transcription step. This phenomenon was utilized to efficiently map methylation sites.
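Mapping by reverse transcriptase stalling amounts to counting where cDNAs terminate. The Python sketch below is a hedged illustration for plus-strand reads only. It places the candidate site one nucleotide upstream of the read start, which is the common iCLIP convention; whether the same offset applies to miCLIP data is an assumption here, as is the minimum read threshold.

```python
from collections import Counter


def truncation_sites(read_starts, offset=-1, min_reads=3):
    """Count cDNA truncation points and keep well-supported positions.

    read_starts: 0-based 5' start coordinates of plus-strand reads.
    offset: shift from read start to candidate site (assumed -1, following
        the usual iCLIP convention; adjust for the protocol in use).
    min_reads: hypothetical support threshold for this sketch.
    Returns {position: number of reads truncating there}.
    """
    counts = Counter(start + offset for start in read_starts)
    return {pos: n for pos, n in counts.items() if n >= min_reads}


# Toy data: four reads start at 101, two isolated reads start elsewhere.
read_starts = [101, 101, 101, 101, 150, 151]
candidate_sites = truncation_sites(read_starts)
```

A fuller analysis would handle the minus strand, collapse PCR duplicates, and check that each candidate position is a cytosine in the reference.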
While all four methods have their individual drawbacks, together they have the power to reliably detect m5C modifications. We will now turn our attention to comparing the cytosine-5 methylated RNAs that have been identified in the mammalian transcriptome using the different techniques.
The degree of overlap between the three studies was surprisingly low, and this may be due to several technical differences, as well as the use of different cell lines. Despite this low degree of overlap, the analyses found identical m5C sites in tRNAs and non-tRNAs and confirmed NSun2-dependent methylation sites in several ncRNAs. Our comparison further identified methylated cytosines in mRNAs commonly found in at least two of the three studies. However, we noted that these mRNA methylation sites often shared a genomic location with tRNA genes (see section below). Further optimization of the approaches is clearly required to achieve a comprehensive identification of high-confidence methylation sites.
tRNA and tRNA-like sequences are sometimes found embedded within the intronic sequence of an mRNA, or more rarely in its UTR regions. The significance of this is unknown, but in addition to being present in the pre-mRNA/mRNA sequence, it is also possible that these tRNA sequences are independently transcribed into functionally mature tRNAs. Our analysis showed that m5C sites mapping to commonly identified mRNAs can overlap with the genomic location of tRNA genes (Figure 2D). We determined how often miCLIP-identified mRNA methylation sites overlapped with the presence of tRNA genes. Most miCLIP-identified mRNA m5C sites occurred within exons, fewer within introns and more rarely in 5′ and 3′ UTRs (Figure 2D). Of those, the vast majority of m5C sites did not coincide with the occurrence of tRNA or tRNA-like sequences, and only five miCLIP-identified mRNAs contained tRNA sequences (Figure 2D). For example, Aza-IP identified SHF as the only statistically significant mRNA containing an NSun2-dependent m5C site, and this site was also identified by miCLIP (Figure 2C). However, the m5C sites occurred in introns of the SHF mRNA and overlapped with tRNA sequences (Figure 2E).
To address the question of whether methylation is taking place in the pre-mRNA/mRNA sequence or an independently transcribed tRNA, we focused on the m5C site within the CTS telomere maintenance complex component 1 (CTC1) mRNA as an example. The CTC1 m5C sites, one of which was also detected by bisulfite sequencing (Figure 2C), localized to the 3’ UTR of the mRNA (Figure 2F). We asked whether the miCLIP sequence reads extended further into the mRNA sequence beyond the tRNA regions. If the sequence reads did not extend further into the mRNA sequence, but were confined to the tRNA sequence, it would suggest that we had detected methylation in independently transcribed tRNAs. We found that miCLIP reads usually extended only slightly beyond the annotated tRNA genes (Figure 2G). Since the annotated tRNA genes do not include 5′ leader or 3′ trailer sequences, it is most likely that these small extensions overlap with tRNA 3′ trailer sequences, which are removed post-transcriptionally. None of the sequence reads extended beyond these tRNA-trailer sequence regions, indicating that these examples represent tRNA methylation rather than methylation of pre-mRNAs/mRNAs.
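The read-extension test described above can be expressed as a small classification step. The Python sketch below is illustrative and not the analysis actually performed: it uses half-open genomic intervals and a hypothetical `flank` parameter standing in for unannotated leader and trailer sequence. The coordinates in the example are invented.

```python
from collections import Counter


def classify_read(start, end, trna_start, trna_end, flank=20):
    """Classify one read relative to an annotated tRNA gene.

    Intervals are half-open [start, end). `flank` is a hypothetical
    allowance for 5' leader / 3' trailer sequence outside the annotation.
    """
    if end <= trna_start or start >= trna_end:
        return "no_overlap"
    if start >= trna_start and end <= trna_end:
        return "within_trna"
    if start >= trna_start - flank and end <= trna_end + flank:
        return "within_flank"
    return "extends_into_mrna"


# Toy data: a tRNA gene annotated at [1000, 1073) inside an mRNA.
trna_start, trna_end = 1000, 1073
reads = [(1005, 1040), (1030, 1080), (1050, 1150), (2000, 2030)]
summary = Counter(
    classify_read(start, end, trna_start, trna_end) for start, end in reads
)
```

If essentially all overlapping reads fall into the `within_trna` or `within_flank` classes, the signal is more consistent with an independently transcribed tRNA than with methylation of the host pre-mRNA/mRNA.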
The existence of at least six more putative m5C RNA methylases in mammals may contribute to the general modest overlap between the NSun2- and Dnmt2-specific cytosine-5 methylation sites and the transcriptome-wide methylome [18, 23, 24]. The additional enzymes NSun1 and NSun3-7 are predicted to methylate RNA based on sequence conservation of key catalytic residues. Although the substrate specificities of these enzymes are unknown, NSun1 and NSun5, in addition to NSun2, have been identified as mRNA-binding proteins. The biological functions of the NSun protein family are largely unknown, although all of them are expressed during mouse embryogenesis. The transcripts of NSun2-7 are enriched in the developing brain, which is consistent with a proposed role in neurocognitive development.
The limited information we do have about the functional roles of NSun proteins indicates an essential role in a wide range of biological processes and human diseases. NSun1 (also called NOP2 or p120) is a nucleolar protein implicated in rRNA methylation and biogenesis. It has been shown to regulate the cell cycle and is involved in tumorigenesis [59–64]. Whether the enzymatic activity of NSun1 is directly required to mediate these cellular processes remains unclear. NSun4 interacts with mitochondrial transcription termination factor 4 (MTERF4) to control mitochondrial ribosomal biogenesis and translation [65, 66]. The NSUN5 gene is located in a genomic region deleted in patients with Williams-Beuren syndrome, a rare neurodevelopmental disorder. Patients with Williams-Beuren syndrome display intellectual disability and developmental delay, as well as craniofacial and cardiovascular abnormalities. Whether deletion of NSUN5 directly contributes to these symptoms is unknown. Mutations in the NSUN7 gene cause infertility in mice due to impaired sperm motility. Nothing is known regarding the functions of NSun3 and NSun6 proteins.
Together, RNA bisulfite sequencing, m5C-RIP, Aza-IP and miCLIP provide powerful tools to determine methylated substrates of all NSun protein family members. This will help to provide a more detailed and informative picture of a methylome-wide m5C landscape.
Although there are several types of RNA methylation, so far, transcriptome-wide approaches determining site-specific RNA methylation have only been performed for m6A and m5C. As already discussed, the m5C modification is most prominently associated with various ncRNAs, but recent studies suggest that it may also be a common modification of mRNAs [18, 24]. By contrast, most m6A sites were found in mRNAs and only rarely occurred in ncRNAs, though they were, for instance, found in various long intergenic non-coding RNAs (lincRNAs). In mRNAs, both m6A and m5C modifications are commonly found in coding regions and UTRs rather than in introns [4, 24]. Enrichment of m6A sites is most pronounced near stop codons [3, 4], whereas m5C is not enriched at stop codons. Both m6A and m5C can occur in 3′ UTRs [4, 18], suggesting a potential role in regulating microRNA responses. It is intriguing to speculate that m6A and m5C modifications may cooperate in regulating RNA function and this should be a focus for future studies.
A key difference between the m6A and m5C transcriptome-wide studies is that m5C sites in RNA were experimentally determined at nucleotide resolution, whereas the precise locations of m6A sites have to be computationally predicted [3, 4]. The chemistry of bisulfite sequencing makes it unsuitable for the detection of m6A and the lack of understanding of how the m6A modification is catalyzed makes experimental determination at nucleotide resolution difficult. Identification of m5C sites at nucleotide resolution relies on a covalently bound enzyme-RNA catalytic intermediate, but it is not known whether m6A modification also occurs via the formation of a temporary covalent complex. Thus, the methods available to study m6A and m5C RNA modifications differ technically. The two studies analyzing m6A in the transcriptome used almost identical approaches, resulting in high similarity between the two datasets [3, 4]. The development of novel techniques to identify RNA m6A sites may be beneficial to identify additional targets not yet detected by the RIP-seq approach. However, it is clear that the currently available techniques detecting m6A and m5C have vastly increased our knowledge regarding RNA methylation and have provided a solid platform for functional studies of these modifications.
Little is known about m6A RNA methyltransferases, except that they consist of several subunits and include the methyltransferase like 3 (METTL3) enzymatic component. No other enzyme has been identified that catalyzes RNA m6A modification in humans. A recent study showed that METTL3-dependent m6A RNA methylation in circadian transcripts is crucial for regulating their own negative transcription-translation feedback loop. Thus, the authors demonstrated a novel mechanism whereby RNA methylation regulates the circadian period and the function of the biological clock.
Enzymes associated with regulating both m6A and m5C modifications have also been associated with human diseases. Genetic polymorphisms causing upregulation of the m6A-demethylase fat mass and obesity-associated (FTO) gene are associated with increased body mass index and an increased risk of obesity. In addition, Fto knockout mice exhibit impaired synaptic transmission in dopaminergic neurons. Interestingly, m6A RIP-seq analyses on the mid-brain and striatum of Fto knockout mice revealed increased levels of m6A modification in a subset of mRNAs involved in dopamine signaling pathways. Given the intellectual disability and epilepsy phenotypes observed in NSun2-deficient patients, it is intriguing to speculate that the interplay of different RNA modifications may regulate neuronal signaling pathways. For example, one possibility is that mRNA methylation may regulate the translation rates of proteins that have been associated with synaptic plasticity. This hypothesis may, at least in part, explain the memory and learning deficits associated with the loss of NSun2. In addition, spatiotemporal control of protein translation at the growth cones of terminally differentiated neurons during neuronal circuit formation is known to be an important regulator of axonal pathfinding and synapse formation. It may also be worth considering whether mRNA methylation plays a regulatory role in such events.
A regulatory role for NSun2 in protein translation is supported by the fact that we showed no correlation of potentially methylated mRNAs to their turnover or stability. In addition, in testicular round spermatids, NSun2 is a component of the chromatoid body, which is known to be a cytoplasmic subdomain for localized protein translation control. Simultaneous deletion of NSun2 and Dnmt2 impairs protein synthesis and tRNA stability. The systematic co-ordination of protein translation steps occurs during differentiation processes of embryonic stem cells, and aberrant translational control may contribute to the neurodevelopmental disease phenotypes in humans lacking functional NSun2 [19, 38, 39, 75]. In addition, mis-regulation of protein translation has been associated with other intellectual disability associated genes, such as fragile-X mental retardation 1 (FMR1).
The most consistent NSun2-dependent m5C site identified in a non-tRNA is C69 in the vault RNA, vtRNA1.1. Using NSun2-deficient patient-derived fibroblasts, we showed that methylation of vtRNA1.1 influenced its processing into microRNA-like molecules, which regulated the calcium channel, voltage-dependent, gamma subunit (CACNG)7 and CACNG8 mRNAs. Both CACNG7 and CACNG8 encode voltage-gated calcium channels, and genes encoding ion channels are often mutated in disorders associated with epilepsy and intellectual disability. We previously observed that the NSun2 protein is expressed in Purkinje neurons of the cerebellum as well as other neurons throughout the mouse brain. Mis-regulation of synaptic transmission in neurons of NSun2-deficient patients may contribute to the human disease.
Despite all the technical advances, further optimization of current methods is required to achieve much closer agreement of the site-specific deposition of m5C, especially in the non-tRNA targets identified by each method. Only then will we have a comprehensive characterization of the global distribution of m5C in RNA that will help to further define the functional roles of this modification.
The recently adapted single-molecule, real-time (SMRT) sequencing technologies for the detection of RNA sequences, together with the successful engineering of novel reverse transcriptases more sensitive to the different RNA modifications occurring in RNAs, may enable the identification of m5C transcriptome-wide without the need of any RNA processing step, not even cDNA conversion, which is usually biased and causes the loss of information about structure and base modifications [78, 79].
In summary, the current transcriptome-wide approaches for determining m5C methylation sites are furthering our understanding of this RNA modification in terms of regulatory functions and disease. Indeed, it may not be long before our understanding of RNA m5C modification matches, or even exceeds, that of DNA m5C modification.
Aza-IP: 5-azacytidine-mediated RNA immunoprecipitation
CACNG: Calcium channel, voltage-dependent, gamma subunit
CTC1: CTS telomere maintenance complex component 1
FTO: Fat mass and obesity-associated
HPLC: High performance liquid chromatography
lincRNAs: Long intergenic non-coding RNAs
METTL3: Methyltransferase like 3
miCLIP: Methylation-individual nucleotide resolution crosslinking immunoprecipitation
SHF: Src homology 2 domain containing F
This article is published under license to BioMed Central Ltd.