Searching genomes for ribozymes and riboswitches
© BioMed Central Ltd 2007
Published: 30 April 2007
New regulatory RNAs with complex structures have recently been discovered, among them the first catalytic riboswitch, a gene-regulatory RNA sequence with catalytic activity. Here we discuss some of the experimental approaches and theoretical difficulties attached to the identification of new ribozymes in genomes.
Catalysis by RNA was discovered a quarter of a century ago. The discoveries that certain introns were capable of self-splicing and that the RNA moiety of bacterial ribonuclease P (RNase P) could process precursor tRNAs on its own were the first indications that catalytic remnants of a postulated RNA world had persisted until the present day. By the late 1980s, the catalytic scope of RNA had been extended by the discovery of the so-called small nucleolytic ribozymes (or RNA-based enzymes). This family consists of four members: the hammerhead, the hairpin [4, 5], the hepatitis delta virus (HDV) [6, 7] and the Neurospora crassa Varkud satellite (VS) [8, 9] ribozymes. All the small nucleolytic ribozymes are involved in the processing of RNA replication intermediates and catalyze a simple RNA cleavage or ligation reaction.
In contrast to this simple reaction, self-splicing of the group I and group II introns involves two consecutive reaction steps (Figure 1c,d). The first frees the 3'-OH of the 5' exon, which, in the second step, attacks the phosphodiester at the junction between the last residue of the intron and the first residue of the 3' exon. Self-splicing group I introns make use of the 3'-hydroxyl of an exogenous guanosine as the initial attacking nucleophile; the guanosine is phosphorylated in the reaction and released (Figure 1c). In the self-splicing group I introns, the formation of an intermediate with a 2',3'-cyclic phosphodiester bond has not been observed, probably because that might entail a loss of structural integrity in the spliced exons through the formation of 2',5'-phosphodiester connectivity in the second reaction step. A similar two-step strategy is adopted by the self-splicing group II introns [16, 17], but in this case the attacking nucleophile is the 2'-hydroxyl of the conserved intronic branchpoint adenosine (Figure 1d). Although this forms an RNA lariat in the intron, the structural integrity of the connected exons is ensured. It should be noted that the splicing of tRNA introns in the Eukarya and the Archaea does not result from self-splicing as in the Bacteria, but starts with the action of an endonuclease, a protein enzyme, which leaves 2',3'-cyclic phosphate termini [18–20].
The persistence of the RNA world has been splendidly confirmed by the demonstration that the ribosome is a ribozyme; that is, the ribosomal RNA components are the catalytically active elements in polypeptide synthesis. This places ribozyme activity at the heart of modern cells and shows that ribozymes can catalyze reactions other than the cleavage and ligation of RNA (Figure 1e). The first indications of catalytic RNA in the ribosome came from biochemical data that showed persistence of ribosome catalytic activity after digestion and denaturation of the ribosomal proteins. The final proof that rRNA is the catalyst in protein biosynthesis came from crystallographic work showing that the peptidyltransferase reaction center of the ribosome is devoid of any protein component and is made up exclusively of rRNA residues.
The natural occurrence of ribozymes and riboswitches
Rfam accession number†
Group I intron
Tetrahymena thermophila
More than 20,000 sequences from all three kingdoms‡
Didymium iridis (branching enzyme, group I intron derivative) 
Group II intron
Saccharomyces cerevisiae mitochondria [17,18]
More than 8,000 sequences from all three kingdoms‡
Tobacco ringspot virus satellite RNA (sTRSV) 
Several additional satellite RNAs of plant viruses§
Viroids of the Avsunviroidae family [99,100]
Carnation small viroid-like RNA (CarSV RNA) 
Satellite DNAs of various amphibian species [102,103], Schistosoma mansoni  and Dolichopoda cave crickets 
Arabidopsis thaliana genome 
Tobacco ringspot virus satellite RNA (sTRSV) 
Two additional satellite RNAs of plant viruses: sCYMV and sARMV 
Human hepatitis delta virus RNA 
Homo sapiens genome (intronic) 
Escherichia coli 
More than 1,000 sequences from various Bacterial phyla¶
Archaeal phyla: Crenarchaeota, Euryarchaeota
Neurospora crassa Varkud satellite 
Bacillus subtilis 
Bacterial phyla: Actinobacteria, Firmicutes
B. subtilis 
Bacterial phyla: Proteobacteria, Firmicutes
E. coli and Salmonella typhimurium btuB mRNAs 
Bacterial phyla: Actinobacteria, Proteobacteria, Deinococcus-thermus, Bacteroidetes, Spirochaetes, Chloroflexi, Firmicutes, Fusobacteria, Cyanobacteria, Thermotogales
Flavin mononucleotide (FMN)
20 Gram-positive and Gram-negative bacteria [23,24,108]
Bacterial phyla: Actinobacteria, Deinococcus-thermus, Thermus/deinococcus group, Proteobacteria, Firmicutes, Thermotogae, Fusobacteria, Thermotogales
B. subtilis 
Bacterial phyla: Proteobacteria, Firmicutes
B. subtilis 
Bacterial phyla: Actinobacteria, Proteobacteria, Fusobacteria, Firmicutes
B. subtilis [25,110-112]
Bacterial phyla: Proteobacteria, Thermotogales, Firmicutes
S. enterica 
B. subtilis [114-116]
Bacterial phyla: Cyanobacteria, Actinobacteria, Proteobacteria, Firmicutes
Thiamine pyrophosphate (TPP)
Rhizobium etli 
Bacterial phyla: Actinobacteria, Deinococcus-thermus, Bacteroidetes, Proteobacteria, Thermus/deinococcus group, Spirochaetes, Chloroflexi, Firmicutes, Fusobacteria, Cyanobacteria, Thermotogales
Eukaryal phyla: Metazoa, Cercozoa, Fungi, Viridiplantae
Archaeal phyla: Euryarchaeota
Riboswitches are bimodular RNAs made up of a ligand-binding region (an aptamer) and a domain that controls gene expression. They are usually located in the 5' untranslated regions of bacterial mRNAs, where they control the expression of the gene by binding a low molecular weight metabolite that triggers a conformational change in the RNA [23–26]. In recent years, many of these genetic control elements have been discovered, and it has become clear that they are structurally and functionally highly diverse [27, 28]. Riboswitches control gene expression at both the transcriptional and translational levels, and can act as 'on' or 'off' switches. The majority of riboswitches are negative control elements. Among these, the first catalytic riboswitch discovered, glmS, employs the ultimate method of switching off gene expression: when it binds its cognate ligand it cleaves itself, thus destroying the function of the mRNA of which it is a part.
The biological function of other recently discovered catalytic RNAs is less clear. Using an ingenious in vitro selection scheme, Szostak and co-workers recently discovered an HDV-ribozyme-like element in an intron of a human mRNA and demonstrated its biochemical activity. In this scheme, a library of uniformly sized, small circular DNAs was used as templates for rolling-circle transcription; self-cleaving RNAs can thus be identified by the appearance of unit-length RNA fragments. Cedergren and co-workers identified and biochemically characterized hammerhead ribozymes in the genomes of schistosomes and cave crickets. Using database searching, our group recently identified novel examples of hammerhead ribozymes, including two hammerhead sequences encoded at distinct loci in the genome of Arabidopsis thaliana, which we have characterized as catalytically active in vitro and in vivo.
A ribozyme with a new branching activity, GIR1, has recently been experimentally identified in slime molds. On the basis of its secondary structure, this ribozyme belongs to the group I intron family. However, it carries out a reaction equivalent to the first step of group II intron splicing, which leads to the formation of a small lariat with a 2',5'-linkage at the 5' end of the endonuclease mRNA of which it forms a part, thereby protecting the message from exonuclease degradation. Thus, in this case, a similar secondary-structure scaffold is the basis for two ribozymes catalyzing different chemical reactions: activation of an internal O2'-hydroxyl group in the case of the new ribozyme, compared with activation of an O3'-hydroxyl group of an external cofactor for the rest of the group I intron family (Figure 1c). A related observation was made by Schultes and Bartel, who showed that a single RNA sequence can assume two different folds and catalyze two different chemical reactions. Minor variations could convert a starting sequence into either of these highly active ribozymes, demonstrating that the evolutionary paths of RNA sequences can easily cross in sequence space. Similarly, RNA folds recognizing different ligands may be very close in sequence space: for example, a small series of 'neutral' mutations (that is, mutations that have no effect on secondary structure) transformed a flavin-binding aptamer into a GMP-binding aptamer. Extensive networks of neutral variation in sequence space interconnect RNA regions with similar function and structure [50, 51], as confirmed by the recent elucidation of more three-dimensional RNA structures (see [43, 52] for reviews).
It is now recognized that the most common RNA-RNA binding contact is the so-called A-minor motif. This occurs between two contiguous adenines in one partner RNA and the shallow (minor) groove side of two stacked Watson-Crick pairs in the other. An analysis of tertiary contacts shows that the contiguous adenines can originate from a variety of local environments (for example, bulges, apical loops or internal loops) and that the only molecular recognition requirement in the receptor RNA is the presence of two Watson-Crick base pairs [54, 55]. In other words, coupled to the vast shape space accessible through mutations neutral for secondary structure, there are weak but crucial sequence constraints imposed by the tertiary contacts. In RNA architectures, the additional structural constraints originate from the topology of the secondary structure (junctions of helices, number of base pairs within helices, and so on). In short, RNA sequences (and thus their structure and function) are characterized by neutrality at all levels, from molecular recognition between motifs to secondary structure and three-dimensional architecture.
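The weak sequence constraints described above can be illustrated with a minimal sketch that lists candidate A-minor partners in a sequence whose secondary structure is already known. It assumes dot-bracket notation without pseudoknots; the function names (`pair_table`, `adenine_donors`, `stacked_wc_receptors`) are hypothetical, and the output is only a list of candidates, since whether a contact actually forms depends on the three-dimensional fold.

```python
from typing import Dict, List, Tuple

# Canonical Watson-Crick pairs accepted on the receptor side.
WC = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def pair_table(dot_bracket: str) -> Dict[int, int]:
    """Map each paired position to its partner (no pseudoknots assumed)."""
    stack: List[int] = []
    pairs: Dict[int, int] = {}
    for k, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(k)
        elif ch == ")":
            i = stack.pop()
            pairs[i], pairs[k] = k, i
    return pairs


def adenine_donors(seq: str, dot_bracket: str, min_len: int = 2) -> List[Tuple[int, int]]:
    """Return (start, length) of runs of at least `min_len` unpaired adenines."""
    runs: List[Tuple[int, int]] = []
    start = None
    for k in range(len(seq) + 1):
        unpaired_a = k < len(seq) and seq[k] == "A" and dot_bracket[k] == "."
        if unpaired_a and start is None:
            start = k
        elif not unpaired_a and start is not None:
            if k - start >= min_len:
                runs.append((start, k - start))
            start = None
    return runs


def stacked_wc_receptors(seq: str, dot_bracket: str) -> List[Tuple[int, int]]:
    """Return (i, j) for each Watson-Crick pair stacked on a second one, (i+1, j-1)."""
    pairs = pair_table(dot_bracket)
    hits: List[Tuple[int, int]] = []
    for i, j in sorted(pairs.items()):
        if i < j and i + 1 < j - 1 and pairs.get(i + 1) == j - 1:
            if (seq[i], seq[j]) in WC and (seq[i + 1], seq[j - 1]) in WC:
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    seq, db = "GGCGAAAGCC", "(((....)))"
    print(adenine_donors(seq, db))
    print(stacked_wc_receptors(seq, db))
```

Because the receptor requirement is so permissive, almost any helix qualifies; the sketch mainly shows how little sequence information the motif itself contributes to a search.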
Are there more ribozymes that catalyze 2',5'-phosphodiester bond formation or cleavage to be discovered? Scattered evidence of the occurrence of 2',5'-bonds exists throughout the literature. A 2',5'-phosphodiester bond was observed in vitro and in vivo during circularization of the genome of the peach latent mosaic viroid, and the HDV ribozyme, unlike the hammerhead ribozyme, has been shown to cleave 2',5'-linkages efficiently.
Novel catalytic RNA entities can, in principle, be looked for either by database searches using defined consensus motifs from a given ribozyme or by experimentally testing candidate RNAs for biochemical activity. Both approaches have advantages and disadvantages. Database searches require RNA sequence alignments (as produced, for example, by Rfam) coupled with covariance analysis [63–67]. The quality of the sequence alignment is central to this process, however, and not many databases are as carefully hand-curated as the RNase P database. In database screening, the definition of what we consider to be the consensus motif of a given catalytic RNA is crucial. Even if a catalytic RNA motif is well defined, searches are complicated by the need to combine structural (hairpin) and sequence information, which rules out simple solutions such as purely sequence-based homology searches. Generally, the tools available adequately identify isolated hairpins. Given a pattern description for a catalytic RNA motif, several programs, such as PatScan or RNAMOT, can be used to screen the public databases. Hits from such searches require further analysis; as a first step, a calculation of the secondary structure is necessary, although usually not sufficient. A secondary structure calculated using a program such as RNAfold is informative if it shows that the required helical elements of the RNA motif under consideration can form in the hit sequence. Secondary-structure prediction programs have difficulty in accurately predicting large structures, however, and can also produce vast numbers of alternative structures when scanning whole genomes [73, 74].
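To make the idea of a combined structure-and-sequence descriptor concrete, the following is a minimal sketch of a hairpin pattern scan. It is not the algorithm used by PatScan or RNAMOT; the function `find_hairpins`, its parameters and the example sequence are assumptions made for illustration. The descriptor asks for a stem of fixed length whose two arms can pair (Watson-Crick or G-U), a loop within a length range, and an optional sequence motif inside the loop.

```python
from typing import Iterator, NamedTuple

# Watson-Crick pairs plus the G-U wobble, as often allowed in helix descriptors.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


class Hairpin(NamedTuple):
    start: int  # 0-based index of the first stem nucleotide
    stem: int   # number of base pairs in the stem
    loop: str   # loop sequence


def find_hairpins(seq: str, stem: int = 4, loop_min: int = 3,
                  loop_max: int = 8, loop_motif: str = "") -> Iterator[Hairpin]:
    """Yield every window that can form `stem` pairs closing an allowed loop."""
    seq = seq.upper().replace("T", "U")
    for i in range(len(seq)):
        for loop_len in range(loop_min, loop_max + 1):
            end = i + 2 * stem + loop_len  # one past the 3' end of the hairpin
            if end > len(seq):
                break
            five = seq[i:i + stem]
            three = seq[end - stem:end]
            # The 5' arm pairs with the 3' arm read in reverse.
            if all((a, b) in PAIRS for a, b in zip(five, reversed(three))):
                loop = seq[i + stem:end - stem]
                if loop_motif in loop:
                    yield Hairpin(i, stem, loop)


if __name__ == "__main__":
    for hit in find_hairpins("AAGGGCGAAAGCCCAA", stem=4, loop_min=4, loop_max=4):
        print(hit)
```

Even this toy descriptor returns many hits on genome-scale input when the stem is short or the loop range is wide, which is why hits need the follow-up structural analysis described above.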
For individual sequences found in a database search, a test of their particular biochemical activity (Figure 1) might be sufficient. However, functionally similar RNA molecules frequently exhibit numerous and highly divergent sequence insertions or deletions that interrupt the pattern of secondary-structure motifs and render the computer description of a given motif inadequate for finding sequences with similar activity. Furthermore, the use of pattern-description programs is incomplete if the complexity of the RNA structure, which goes well beyond Watson-Crick base pairing, is not taken into consideration. These issues, including whether the additional, essential tertiary interactions of a given RNA motif will form, can be addressed by combining comparative analysis of similar ribozymes with isostericity matrices, which give the geometrically equivalent base pairs for each particular type of base-base interaction. All pairwise base-base interactions present in nucleic acids have been classified into 12 families, where each family is a 4 × 4 matrix of the bases A, G, C, and U. This classification allows the deduction of all possible geometrically equivalent base pairs in a given family. The isostericity matrices have been verified for several RNA motifs using structural alignments anchored in crystal structures. Thus, for assumed structurally homologous positions in an RNA motif, one can compare the resulting pairwise interactions with the known isostericity matrices to assess the validity of an RNA motif assignment in an alignment. As this type of analysis is an iterative process, it might also lead to refinement and extension of the consensus motif with which the search was started.
If applied to large assemblies of sequence information, as has been done for the kink-turn and C-loop RNA motifs, this approach allows a broader description (the comprehensiveness of which is currently unknown) and refinement of a given motif.
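The comparison of alignment columns against an isostericity matrix can be sketched as follows. This is an illustration under stated assumptions: only the cis Watson-Crick/Watson-Crick family is represented, the table is reduced to the four canonical pairs (mutually isosteric) and the two G-U wobble orientations (each kept in its own class), and the subfamily labels and function names are hypothetical rather than the published notation. A real analysis would use the full published matrices for all 12 families.

```python
from typing import Dict, Sequence

# Simplified, illustrative subfamilies for the cis Watson-Crick/Watson-Crick family.
# Labels are placeholders chosen for this sketch, not the published matrix entries.
CIS_WC_SUBFAMILY: Dict[str, str] = {
    "AU": "canonical", "UA": "canonical", "GC": "canonical", "CG": "canonical",
    "GU": "wobble_GU", "UG": "wobble_UG",
}


def isosteric(pair_a: str, pair_b: str,
              table: Dict[str, str] = CIS_WC_SUBFAMILY) -> bool:
    """True if both pairs are listed in the table and share a subfamily."""
    sub_a = table.get(pair_a.upper())
    sub_b = table.get(pair_b.upper())
    return sub_a is not None and sub_a == sub_b


def isosteric_fraction(alignment: Sequence[str], i: int, j: int, reference: str,
                       table: Dict[str, str] = CIS_WC_SUBFAMILY) -> float:
    """Fraction of aligned sequences whose (i, j) pair is isosteric to `reference`."""
    pairs = [row[i] + row[j] for row in alignment]
    hits = sum(isosteric(p, reference, table) for p in pairs)
    return hits / len(pairs)


if __name__ == "__main__":
    # Columns 0 and 3 are assumed to be structurally homologous paired positions.
    alignment = ["GGAC", "AGAU", "GGAU", "CGAG"]
    print(isosteric_fraction(alignment, 0, 3, reference="GC"))
```

A low fraction at positions assumed to be homologous suggests either a misalignment or a wrong motif assignment, which is the signal used to iterate on the alignment.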
The analysis of co-variation of nucleotides in sequence alignments underlies most manual or automated secondary-structure determination. However, high sequence conservation (which is usually considered a marker for conservation of function) leads to serious ambiguities and difficulties in deriving secondary structures. The catalytic riboswitch glmS is a good example: the crystal structure presents a different secondary structure from that deduced from sequences. The new helices involve pairings between segments that are more than 95% conserved in sequence and thus give no co-variation signal. The requirement for a well-defined RNA motif in database searches is also an intrinsic limitation of this approach.
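Why strongly conserved segments give no co-variation signal can be seen with a small worked example. The sketch below uses mutual information between two alignment columns, one common co-variation measure; it is offered as an illustration and is not necessarily the measure used in the analyses cited above. The function name and the toy alignments are assumptions.

```python
import math
from collections import Counter
from typing import Sequence


def mutual_information(alignment: Sequence[str], i: int, j: int) -> float:
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(alignment)
    col_i = Counter(row[i] for row in alignment)
    col_j = Counter(row[j] for row in alignment)
    joint = Counter((row[i], row[j]) for row in alignment)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return mi


if __name__ == "__main__":
    # Columns 0 and 3 change together while remaining complementary.
    covarying = ["GAAC", "CAAG", "AAAU", "UAAA"]
    # The same pair is fully conserved, so there is no variation to correlate.
    conserved = ["GAAC", "GAAC", "GAAC", "GAAC"]
    print(mutual_information(covarying, 0, 3))
    print(mutual_information(conserved, 0, 3))
```

In the conserved alignment the pair may well exist, but the score is zero because neither column varies; this is the situation encountered with the highly conserved segments of glmS.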
As pointed out earlier, most of the reactions known to be naturally catalyzed by RNA (Figure 1) involve the breakage or formation of 3',5' (and occasionally 2',5') phosphodiester bonds. RNA has the potential to catalyze other chemical reactions, however. As well as peptide formation in the ribosome, Diels-Alder cycloaddition and Michael addition can be catalyzed by RNA, as shown by in vitro Darwinian evolution. Thus, reactions catalyzed by RNA in nature might be more diverse than currently known. The discovery of such activities is likely to be serendipitous and made by keen observers of RNA molecular behavior.
New small or large noncoding RNAs are regularly being discovered in both bacteria and mammals. Recent evidence shows that most of the mammalian genome is transcribed in complex patterns, producing tens of thousands of novel transcripts [82, 83]. Novel RNAs are regularly predicted on the basis of their sequence conservation or secondary-structure elements [84–87]. But these predictions do not utilize information on the non-Watson-Crick base pairing or tertiary structure so crucial to the activity of many ribozymes, and, as discussed above, these features are often not well conserved in the sequence. Nor do the predictive algorithms used give any indication of what the RNA function might be. Vertebrate genomes contain a large number of conserved noncoding elements (CNEs) or ultraconserved elements [88, 89], whose biological functions and mechanisms of action remain to be established. The evidence for transcription of most of these conserved elements is, however, still scanty [89–91]. In any case, the recent additions to the list of natural catalytic RNAs indicate that there are likely to be many more to come; new algorithms will be required that use all available information to identify and classify them.
We thank François Michel (CGM, CNRS, Gif-sur-Yvette) and Michael Pheasant (IMB, University of Queensland, St Lucia) for constructive comments on the manuscript and Neocles Leontis (Bowling Green, OH) for numerous discussions. CH acknowledges support from the Deutsche Forschungsgemeinschaft (grant HA3459-3) and from the EU-STREP Fosrak; EW acknowledges support from grants ANR-05-BLAN-0331-04 and CEE BAC RNA:LSHG-CT-2005-018618.