Comparative analysis of processed ribosomal protein pseudogenes in four mammalian genomes
© Balasubramanian et al.; licensee BioMed Central Ltd. 2009
Received: 21 November 2008
Accepted: 5 January 2009
Published: 5 January 2009
The availability of genome sequences of numerous organisms allows comparative study of pseudogenes in syntenic regions. Conservation of pseudogenes suggests that they might have a functional role in some instances.
We report the first large-scale comparative analysis of ribosomal protein pseudogenes in four mammalian genomes (human, chimpanzee, mouse and rat). To this end, we have assigned these pseudogenes in the four organisms using an automated pipeline and make the results available online. Each organism has a large number of ribosomal protein pseudogenes (approximately 1,400 to 2,800), the majority of which are processed (generated by retrotransposition). We see no correlation between the number of pseudogenes associated with a ribosomal protein gene and its mRNA abundance. Analysis of pseudogenes in syntenic regions between species shows that most are conserved between human and chimpanzee, but very few are conserved between primates and rodents. Interestingly, syntenic pseudogenes have a lower rate of nucleotide substitution than their surrounding intergenic DNA. Moreover, evidence from expressed sequence tags indicates that two pseudogenes conserved between human and mouse are transcribed. Detailed analysis shows that one of them, the pseudogene of RPS27, is likely to be a protein-coding gene. This is significant because previous reports indicated that the human genome encodes exactly 80 ribosomal protein genes.
Our analysis indicates that processed ribosomal protein pseudogenes abound in mammalian genomes, but few of these are conserved between primates and rodents. This highlights the large amount of recent retrotranspositional activity in mammals, which has been even greater in the rodent lineage.
Pseudogenes are DNA sequences similar to genes encoding functional proteins, but are presumed to be nonfunctional because of mutations and truncation by premature stop codons. In this study, we focus on the largest family of pseudogenes: processed pseudogenes of ribosomal proteins (RPs). Previous in silico studies have shown that the human genome contains thousands of processed RP pseudogenes, although there is only one functional gene for each of the 80 human RPs, with the exception of three functional RP retrotransposons [1–5]. The availability of numerous whole-genome sequences now makes it possible to compare these pseudogenes across organisms.
Processed pseudogenes are formed by reverse transcription of processed mRNA and integration of the resulting cDNA into the genome. In human, this integration has been shown to be mediated by L1 transposons, and L1-mediated retrotransposition is believed to be the primary mechanism by which processed pseudogenes are generated. We chose to focus on RP pseudogenes because they constitute the largest family of pseudogenes (approximately 2,000 RP processed pseudogenes in human). RP genes are constitutively expressed at fairly stable levels, and their sequences are highly conserved among species, which enables us to trace the lineages of their pseudogenes easily. The large dataset of RP pseudogenes, together with several completely sequenced genomes, allows us to identify orthologous RP pseudogenes in syntenic regions.
Sakai et al. estimate that processed pseudogenes are formed at a rate of about 1-2% per gene per million years, based on an analysis of processed pseudogenes in the human and mouse genomes. Gene duplications, which are believed to be an important resource for genome evolution, occur at a predicted rate of 0.9% per gene per million years in the human genome. Because the two rates are comparable, Sakai et al. suggest that processed pseudogenes might also contribute to genome diversity, much as duplication events do.
To date, there has been no large-scale systematic evaluation of processed pseudogenes in syntenic regions. A study of kinases indicated that processed pseudogenes are not conserved between human and mouse, but it was based on a small sample of about 100 kinase pseudogenes. Suyama et al. identified and annotated genes and duplicated pseudogenes under the assumption that processed pseudogenes would not be found in syntenic regions. However, there is no a priori reason to expect this. In fact, many studies have identified transcribed processed pseudogenes, both by in silico methods and by targeted experimental analyses. Harrison et al. analyzed expressed sequence tag (EST) and microarray expression data and compiled a list of about 200 processed pseudogenes that are transcribed in the human genome. The ENCODE consortium annotated 201 pseudogenes in the ENCODE regions, two-thirds of which were processed, and experimentally validated the transcription of some of them: pseudogene-specific RACE (rapid amplification of cDNA ends) analyses, combined with tiling microarray data and high-throughput sequencing, showed that at least a fifth of the 201 pseudogenes are transcribed. Recently, two studies have shown that processed pseudogenes regulate gene expression through the RNA interference pathway in mouse oocytes [13, 14]. Another study has shown that some ABC transporter pseudogenes are transcriptionally active and that, in the human genome, the expression of an ABC transporter gene is regulated by the expression of its pseudogene. Thus, processed pseudogenes are emerging as interesting, potentially functional elements of the genomic landscape.
An elegant study showed that a small number of pseudogenes with high sequence identity to their parent proteins are conserved between human and mouse. The authors argued that retaining such high identity to the parent over 70 million years (the time since human-mouse divergence) implies a functional role for these pseudogenes. Based on expression evidence and the presence of these conserved sequences in syntenic regions of human and mouse, they catalogued a set of 20 potentially functional pseudogenes, only two of which were processed pseudogenes. The large family of RP processed pseudogenes and the availability of whole-genome sequences for many organisms allow us to perform a comprehensive and systematic comparative analysis of RP processed pseudogenes in syntenic regions. If some of these pseudogenes were biologically relevant, we might expect them to be conserved across species. RP pseudogenes present a specific problem in that they are often mistakenly annotated as genes because of their very high sequence similarity to the parent protein. Here, we use a previously developed method for identifying RP pseudogenes, which is described in the Materials and methods section.
For this study, we identified processed RP pseudogenes in four genomes (human, chimpanzee, mouse and rat) using an automated pipeline. We investigated the degree to which processed RP pseudogenes are conserved among the four species. Although a significant number of papers have addressed global synteny between human, chimpanzee, mouse and rat based on DNA sequence alignments, comprehensive data on detailed local synteny are lacking [18–21]. To obtain well-defined syntenic regions, we defined them as regions whose position is conserved between flanking orthologous gene pairs in the two species. This is similar to methods used by others in which synteny is derived from local gene orthology [10, 22].
Results and discussion
Catalogue of ribosomal protein pseudogenes
Total number of processed RP pseudogenes in human, chimpanzee, mouse and rat genomes identified by the pipeline 
The number of processed pseudogenes associated with each RP in the four organisms is shown in Additional data file 2. Our analysis focuses primarily on the major group of pseudogenes: processed pseudogenes whose length is at least 70% of that of their parent proteins. Including pseudogenic fragments and low confidence matches in the calculations did not affect the comparative results [1, 23]. Moreover, we are interested in identifying candidate pseudogenes that are exceptionally well conserved over a long period. All four genomes are clearly replete with processed RP pseudogenes: the human, chimpanzee, mouse and rat genomes contain 1,822, 1,462, 2,092 and 2,848 processed RP pseudogenes, respectively. The length of the coding sequence of each human RP gene is given in parentheses in Additional data file 2; these lengths clearly show that the number of pseudogenes arising from an RP gene is not influenced by mRNA length. Our assignments can be downloaded from . The number of pseudogenes per RP varies widely, from a few to over a hundred. The higher number of processed RP pseudogenes in rat and mouse may reflect the reported higher rates of retrotranspositional activity in the rodent lineage [18, 20].
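To put a number on the statement that pseudogene counts are unrelated to coding-sequence length (and, per the abstract, to mRNA abundance), one could compute a rank correlation across the RP genes listed in Additional data file 2. The paper does not say which statistic, if any, was computed; the sketch below assumes Spearman's rank correlation, implemented with the Python standard library only, and the function names are illustrative.

```python
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation, computed as Pearson's r on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined for a constant sequence")
    return cov / (sx * sy)
```

Applied to per-gene pseudogene counts and coding-sequence lengths, a rho near zero would be consistent with the authors' conclusion; assessing significance would need an extra step (for example, a permutation test), which is omitted here.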
Analysis of expression levels
Identification and analysis of syntenic pseudogenes
Number of processed RP pseudogenes found in syntenic regions
Number of processed RP pseudogenes in syntenic regions
Sequence divergence of pseudogenes
Nucleotide substitution analysis
Comparison of the number of nucleotide substitutions per site between pseudogenes and intergenic sequences in syntenic regions of human and mouse (columns: human chromosomal location; mouse chromosomal location)
Careful manual analysis of the human-mouse syntenic pseudogenes indicates that the pseudogene of RPS27 is very likely to be a functional protein-coding gene (RPS27L) highly similar to RPS27. The proteins encoded by human RPS27 and RPS27L have the same length (84 amino acids) and differ at only three residues (positions 5, 12 and 17). This similarity at the amino acid level suggests that one locus arose by duplication of the other. The arrangement of flanking genes supports this: both RPS27 and RPS27L are flanked on one side by RAS oncogene family genes (RAB13 for RPS27, RAB8B for RPS27L) in the same tail-to-tail arrangement. However, the genes on the other flank differ (nucleoporin 210 kDa-like (NUP210L) for RPS27; lactamase, beta (LACTB) for RPS27L), and intronic conservation is very low. The very low conservation of intronic and flanking sequence suggests that any duplication event was not recent. This is supported by the conservation of synteny: LACTB/RPS27L/RAB8B is conserved in chimp, macaque, mouse, dog, cow and Monodelphis (but not in rat, chicken, Xenopus or zebrafish), and RAB13/RPS27/NUP210L shows a very similar pattern of conservation (although this synteny is conserved in rat). Further support for function comes from strong evidence of transcription at the RPS27L locus in both the human and mouse genomes, as well as in other vertebrates (Figure 7 in Additional data file 1). This finding is significant because the eighty ribosomal protein genes of the human genome have been carefully mapped, yet the RPS27-like gene was not identified in that study. The comprehensive Ribosomal Protein Gene database, which catalogues RP data for several organisms, does not include this gene either. Thus, this serendipitous finding provides the basis for further experimental study of the RPS27L locus.
Of the 1,282 human-chimpanzee pseudogene pairs found in syntenic regions, 545 pairs lie within introns of genes. After excluding these intronic pseudogenes, we calculated the number of nucleotide substitutions per site in the pseudogenes and in the intergenic DNA surrounding them. The average number of substitutions per site since the human-chimpanzee divergence is 0.020 in pseudogenes and 0.075 in intergenic regions. Pseudogenes thus accumulate substitutions significantly more slowly than their neighboring intergenic sequences (p << 0.001, pairwise t-test), which implies that the pseudogenes conserved in human and chimpanzee might be under some biological constraint.
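The comparison above pairs each syntenic pseudogene with its own flanking intergenic DNA. Assuming that "pairwise t-test" refers to a paired t-test on these per-locus differences, a minimal standard-library sketch of the test statistic looks like this; the intronic pairs (545 of 1,282) would be filtered out beforehand, leaving 737 pairs. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t_statistic(pseudo_k: Sequence[float],
                       flank_k: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for per-locus differences.

    Position i pairs a pseudogene's substitutions per site with that of its
    own flanking intergenic DNA. A negative t means pseudogenes changed less.
    """
    if len(pseudo_k) != len(flank_k) or len(pseudo_k) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [p - f for p, f in zip(pseudo_k, flank_k)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean(diffs) / (sd / sqrt(len(diffs))), len(diffs) - 1
```

The p-value comes from a t distribution with n - 1 degrees of freedom; if SciPy is available, `scipy.stats.ttest_rel(pseudo_k, flank_k)` returns the same statistic together with a two-sided p-value.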
Analysis of decayed pseudogenes
It has been noted that 22% of the human genome is composed of ancient repeats, compared with only 5% of the mouse genome. This difference has been attributed to the faster mutation rate in mouse, which renders such sequences undetectable; for the same reason, highly decayed pseudogenes are difficult to identify. Previous studies indicate that the method we use to identify pseudogenes in the human genome is fairly robust and that the cutoffs chosen for its various parameters are optimal. We performed a similar analysis for the mouse genome, and the results indicate that we have comprehensively identified all the pseudogenes in the mouse genome (data included in Additional data file 1). In our current analyses, less than 20% of RP pseudogenes in the human, chimpanzee, mouse and rat genomes are classified as either fragments or low confidence matches (Table 1). Thus, only a small fraction of RP pseudogenes are substantially decayed. Nonetheless, to ensure that older, significantly decayed pseudogenes were included in our analysis, we also analyzed human and mouse pseudogenic fragments. Of the 326 mouse pseudogenic fragments, only one has a corresponding human pseudogene in a syntenic region, and none of the low confidence matches in the human and mouse genomes has a corresponding pseudogenic match in a syntenic region. Thus, analyses of all three classes of pseudogenes, namely the longer processed pseudogenes (length ≥ 70% of parent protein), pseudogenic fragments (length < 70% of parent protein) and low confidence matches, indicate that very few processed RP pseudogenes are preserved between human and mouse.
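The three classes used above can be expressed as a simple rule on match length relative to the parent protein. The sketch below is a hypothetical illustration, not PseudoPipe's code: it assumes the match length and parent length are measured in the same units (here, amino-acid residues) and that the low-confidence flag is supplied by an upstream step, since the criteria for that class are not given in this section.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PseudogeneHit:
    """One homology hit to a parent RP (illustrative record, not a PseudoPipe format)."""
    name: str
    match_length: int             # parent-protein residues covered by the hit (assumed unit)
    parent_length: int            # full length of the parent RP, in residues
    low_confidence: bool = False  # assumed to be flagged upstream


def classify(hit: PseudogeneHit, min_fraction: float = 0.70) -> str:
    """Return 'low_confidence', 'processed' (>= 70% of parent) or 'fragment' (< 70%)."""
    if hit.low_confidence:
        return "low_confidence"  # giving this flag precedence is an assumption
    if hit.parent_length <= 0:
        raise ValueError("parent length must be positive")
    fraction = hit.match_length / hit.parent_length
    return "processed" if fraction >= min_fraction else "fragment"


def class_counts(hits):
    """Tally hits per class, e.g. to check what share are fragments or low confidence."""
    return Counter(classify(h) for h in hits)
```

For an 84-residue parent such as RPS27, a hit would need to cover at least 59 residues (59/84 ≈ 0.70) to fall into the processed class under this rule.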
We have systematically analyzed the conservation of processed pseudogenes across four species by examining a large family, RP processed pseudogenes, in syntenic regions. This is the first large-scale comparative analysis of processed pseudogenes. It indicates that while processed RP pseudogenes abound in both human and rodent genomes, there is virtually no preservation of processed RP pseudogenes between human and rodents. The divergence of RP pseudogenes from their parent genes indicates that most pseudogenes in rodents are of recent origin. This is in line with the reported increase in retrotranspositional activity in rodents relative to humans, and with research indicating that retrotransposition in the hominid lineage has decreased significantly over the past 40 million years [18, 30–32]. Our result is also consistent with the previous report that about 80% of all human processed pseudogenes are primate-specific sequences. Owing to faster neutral substitution and higher deletion rates in rodents, we did not detect older RP pseudogenes that may have originated in a common ancestor of human and mouse. Our analyses show that RP processed pseudogenes present in the human-rodent ancestor have either been deleted from the current human and mouse/rat genomes or have decayed beyond recognition by our methods. The RP pseudogenes detected by our methods are predominantly of recent origin and arose through independent, lineage-specific retrotranspositional activity. Interestingly, in both the human-mouse and the human-chimpanzee comparisons, the syntenic processed RP pseudogenes appear to have evolved more slowly than neutral DNA, which suggests a potential biological role for these conserved syntenic pseudogenes.
EST evidence of transcription in both human and mouse, together with strong conservation of exons and evidence of transcription in many vertebrates, indicates that RPS27L, which our pipeline identified as a pseudogene, is likely to be a functional gene.
Materials and methods
Synteny based on gene orthology
We derived syntenic regions on the criterion that syntenic regions in two species should contain corresponding orthologs of genes on the two sets of chromosomes. Syntenic blocks based on gene orthology between two organisms were obtained in three steps: first, we located the genes on either side of a pseudogene; second, we identified the corresponding orthologous genes in the second organism (the human gene annotations and their ortholog annotations in the other organisms were extracted directly from Ensembl release 36); third, we took the region enclosed between the two sets of orthologous genes on either side of the pseudogene as a syntenic block.
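The three steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' code: genes are given as hypothetical (gene_id, chromosome, start, end) tuples, `orthologs` maps species-A gene IDs to species-B gene IDs (in the study these came from Ensembl release 36), genes without orthologs are skipped when choosing the flanking pair (matching the relaxed search described below), and both flanking orthologs must lie on the same species-B chromosome.

```python
from typing import Dict, List, Optional, Tuple

Gene = Tuple[str, str, int, int]  # (gene_id, chromosome, start, end) in species A
Locus = Tuple[str, int, int]      # (chromosome, start, end), inclusive coordinates


def flanking_orthologs(genes_a: List[Gene], orthologs: Dict[str, str],
                       pseudo: Locus) -> Optional[Tuple[str, str]]:
    """Nearest genes on each side of the pseudogene that have an ortholog in species B."""
    chrom, start, end = pseudo
    # Genes lacking an ortholog are ignored, so any number of them may intervene.
    candidates = [g for g in genes_a if g[1] == chrom and g[0] in orthologs]
    left = [g for g in candidates if g[3] < start]
    right = [g for g in candidates if g[2] > end]
    if not left or not right:
        return None
    return max(left, key=lambda g: g[3])[0], min(right, key=lambda g: g[2])[0]


def syntenic_block(genes_a: List[Gene], orthologs: Dict[str, str],
                   loci_b: Dict[str, Locus], pseudo: Locus) -> Optional[Locus]:
    """Species-B region enclosed by the orthologs of the two flanking genes, or None."""
    flanks = flanking_orthologs(genes_a, orthologs, pseudo)
    if flanks is None:
        return None
    left_b = loci_b[orthologs[flanks[0]]]
    right_b = loci_b[orthologs[flanks[1]]]
    if left_b[0] != right_b[0]:
        return None  # flanking orthologs on different chromosomes: no block
    # min/max make the block independent of orientation (inversions allowed).
    block_start = min(left_b[2], right_b[2])
    block_end = max(left_b[1], right_b[1])
    if block_start >= block_end:
        return None  # orthologs overlap; nothing lies between them
    return (left_b[0], block_start, block_end)


def lies_within(block: Optional[Locus], locus: Locus) -> bool:
    """True if a species-B locus falls entirely inside the syntenic block."""
    return (block is not None and block[0] == locus[0]
            and block[1] <= locus[1] and locus[2] <= block[2])
```

A species-B pseudogene for which `lies_within` returns True would count as the syntenic counterpart of the species-A pseudogene. Because only the nearest ortholog-bearing genes are used, the sketch shares the limitation noted below: it cannot see local rearrangements between the flanking genes.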
Figure 3 illustrates the methodology used to define syntenic regions between human and mouse. This method defines syntenic regions rather conservatively. To make it less restrictive, we did not constrain the search to immediately neighboring genes: we considered two regions syntenic provided the RP pseudogene was sandwiched between a set of orthologous gene pairs on either side. In other words, as long as we found a pair of orthologous genes on either side of the pseudogene, we defined a syntenic block regardless of how many intervening genes lacked orthologs in the other organism. This method therefore does not take into account potential loss of local synteny due to recombination and chromosomal rearrangements. Recombination rates are non-uniform across the genome and vary between species. Moreover, segmental duplications of varying nature in different species will also affect synteny mapping. Despite these limitations, control calculations designed to test how well random genomic DNA could be located between orthologous gene regions showed that large-scale synteny is largely preserved, in agreement with earlier large-scale genome-wide alignments. We validated this method using two different controls, as discussed below.
First, we evaluated how well this method performed by identifying orthologous RP genes between human and mouse in syntenic regions. Of the 79 orthologous RP genes, 76 (96%) were identified in syntenic regions. Second, to evaluate the extent to which chromosomal rearrangements might affect the identification of syntenic blocks, we examined whether 1,000 bp DNA sequences extracted at random from the chimp and mouse genomes fall in syntenic regions. We found 94% and 86% of such randomly chosen 1,000 bp regions from the chimp and mouse genomes, respectively, to be syntenic to the human genome. A similar control calculation showed that 86% of randomly chosen 1,000 bp mouse regions were found in syntenic regions of the rat genome. More than 10,000 regions were sampled for each of these random-region controls. These results indicate that a significant portion of the genomes can be found in syntenic blocks and that errors arising from chromosomal rearrangements are small. Thus, this method of finding syntenic blocks based on gene orthology is fairly robust and provides a good way to identify pseudogenes in syntenic regions.
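The random-region control can be sketched as uniform sampling of fixed-width windows followed by a synteny check. In the sketch below, `is_syntenic` is a placeholder callable (for example, one that applies the flanking-ortholog rule to the window) and the chromosome lengths are inputs; both the sampling scheme and the names are our assumptions. In particular, the sketch does not exclude assembly gaps, which a real control would likely need to do.

```python
import random
from typing import Callable, Dict, Iterator, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), 1-based inclusive


def sample_windows(chrom_lengths: Dict[str, int], n: int, width: int = 1000,
                   seed: int = 0) -> Iterator[Region]:
    """Yield n windows of `width` bp placed uniformly over all valid start positions."""
    rng = random.Random(seed)
    chroms = [c for c, length in chrom_lengths.items() if length >= width]
    # Weight each chromosome by its number of possible start positions.
    weights = [chrom_lengths[c] - width + 1 for c in chroms]
    for _ in range(n):
        chrom = rng.choices(chroms, weights=weights, k=1)[0]
        start = rng.randint(1, chrom_lengths[chrom] - width + 1)
        yield (chrom, start, start + width - 1)


def syntenic_fraction(chrom_lengths: Dict[str, int],
                      is_syntenic: Callable[[Region], bool],
                      n: int = 10_000, width: int = 1000, seed: int = 0) -> float:
    """Fraction of random windows for which `is_syntenic` returns True."""
    hits = sum(1 for w in sample_windows(chrom_lengths, n, width, seed) if is_syntenic(w))
    return hits / n
```

The gene-based control needs no sampling: the reported 96% is simply 76/79 ≈ 0.962.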
Identification of processed RP pseudogenes
We identified processed RP pseudogenes in four organisms (human, chimpanzee, mouse and rat) using a well-established automated pipeline for pseudogene identification [1, 17]. In a nutshell, pseudogenes are identified on the basis of sequence homology to RPs. We modified the pipeline slightly, as described here. One of the pipeline steps uses gene annotations to filter genes out of the pseudogene candidate sequences. However, because processed RP pseudogenes are unusually numerous and most of them are highly similar to their parent proteins, many are mistakenly annotated as genes in gene annotation databases, including Ensembl. We therefore ran PseudoPipe without reference to the Ensembl RP gene annotations. Instead, we used RP sequences from the Ribosomal Protein Gene database as input and considered the RP genes annotated in this database to be the only functional genes. The human, chimpanzee, mouse and rat genome assemblies corresponding to Ensembl release 36 were used as input for the pipeline.
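The modified filtering step amounts to discarding candidate loci that coincide with the RP genes taken as functional (here, those annotated in the Ribosomal Protein Gene database) rather than with Ensembl RP annotations. The following is a hypothetical sketch of that idea using plain interval overlap; it is not PseudoPipe's implementation, and the coordinate convention (inclusive coordinates on the same assembly) is assumed.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), inclusive coordinates (assumed)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def remove_functional_loci(candidates: Iterable[Interval],
                           functional_genes: List[Interval]) -> List[Interval]:
    """Keep only candidate hits that do not overlap any RP gene treated as functional.

    Every candidate is checked against every gene; an interval tree or a sorted
    sweep per chromosome would scale better for genome-wide use.
    """
    return [c for c in candidates
            if not any(overlaps(c, g) for g in functional_genes)]
```

With the RPG list in place of Ensembl annotations, a highly similar pseudogene that Ensembl labels as a gene is no longer filtered out, which is the purpose of the modification.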
We calculated the nucleotide sequence divergence between the parent RP gene and each pseudogene as the evolutionary distance under the Kimura 2-parameter model, using the evolutionary analysis package MEGA3. This distance estimates the number of nucleotide substitutions per site.
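For reference, the Kimura 2-parameter distance has a closed form. With P and Q denoting the proportions of compared sites that differ by a transition and by a transversion, respectively, the estimated number of substitutions per site K is given below, followed by a worked example with illustrative values (not data from this study).

```latex
K = -\tfrac{1}{2}\ln\left(1 - 2P - Q\right) - \tfrac{1}{4}\ln\left(1 - 2Q\right)

% Illustrative values: P = 0.05, Q = 0.02
K = -\tfrac{1}{2}\ln(0.88) - \tfrac{1}{4}\ln(0.96) \approx 0.0639 + 0.0102 \approx 0.0741
```

The estimate is slightly larger than the raw proportion of differing sites (0.07), reflecting the correction for multiple substitutions at the same site.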
Nucleotide substitution analysis for syntenic pseudogenes
We calculated the number of nucleotide substitutions per site since the human-chimpanzee and human-mouse divergences for each pair of corresponding syntenic pseudogenes using the Kimura 2-parameter model. For this analysis, pairs of syntenic pseudogenes between human and chimpanzee and between human and mouse were aligned with ClustalW. We performed similar calculations on intergenic DNA by aligning the 10 kb of intergenic DNA on either side of each syntenic pseudogene. Gaps in the alignments were treated as transversions, counting only the first gap position of each indel. We excluded pseudogenes that lie within introns of genes, as intronic sequences are known to be conserved and would not serve as a good model for neutrally drifting DNA.
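The gap rule above admits more than one reading, so the sketch below spells out one interpretation: each run of consecutive gap columns in the same sequence is counted once, as a single transversion, and also adds one compared site. The site denominator, the treatment of ambiguous bases (skipped) and the function names are our assumptions; MEGA3's own gap-handling options may give different numbers.

```python
import math
from typing import Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq_a: str, seq_b: str) -> Tuple[int, int, int]:
    """Return (sites, transitions, transversions) for a pairwise alignment.

    Gaps ('-') count as transversions, but only the first column of each indel
    is counted; that column also counts as one compared site (assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    sites = transitions = transversions = 0
    prev_gap = None  # which sequence ('a' or 'b') was gapped in the previous column
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == "-" and y == "-":
            continue  # empty column; does not interrupt an indel
        if x == "-" or y == "-":
            gap = "a" if x == "-" else "b"
            if gap != prev_gap:  # first column of a new indel
                sites += 1
                transversions += 1
            prev_gap = gap
            continue
        prev_gap = None
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip ambiguous bases such as N (assumption)
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return sites, transitions, transversions


def kimura_2p(seq_a: str, seq_b: str) -> float:
    """Kimura 2-parameter distance (substitutions per site) from an alignment."""
    sites, ts, tv = count_differences(seq_a, seq_b)
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = ts / sites, tv / sites
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("too divergent for the Kimura 2-parameter correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)
```

For instance, aligning `AC---GT` with `ACTTTGT` yields five compared sites and one transversion, because the three-column indel is counted once.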
Evidence for transcription
We used EST data from dbEST to verify whether human and mouse pseudogenes in syntenic regions are transcribed. As evidence of transcription, we required a stringent 100% sequence identity between the EST and the matched region. Where identity was less than 100%, we required that the EST match the pseudogene better than it matched the parent gene or any other region in the genome.
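The two acceptance criteria can be written as a small decision rule. This is a sketch under assumptions: the identity values are taken to be precomputed percentages from an EST-to-genome alignment step that this section does not specify, and the record type is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EstMatch:
    """Alignment summary for one EST (hypothetical record)."""
    pseudogene_identity: float  # percent identity of the EST to the pseudogene locus
    best_other_identity: float  # best percent identity to the parent gene or any other region


def supports_transcription(m: EstMatch) -> bool:
    """Apply the two criteria described in the text."""
    if m.pseudogene_identity >= 100.0:
        # A perfect match is accepted outright. The text does not say how a
        # perfect match that is equally perfect elsewhere is handled; a
        # stricter variant would also require best_other_identity < 100.
        return True
    # Below 100% identity, the EST must match the pseudogene better than
    # the parent gene or any other region of the genome.
    return m.pseudogene_identity > m.best_other_identity
```

Ties below 100% identity are rejected here, since the text requires the pseudogene match to be strictly better.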
Additional data files
The following additional data are available with the online version of this paper. Additional data file 1 includes details on the sensitivity of our method for pseudogene identification and the detailed analysis of one of the human-mouse syntenic pseudogenes that appears to be a protein-coding gene. Additional data file 2 includes a table showing the number of processed pseudogenes associated with each RP gene for human, mouse, chimpanzee and rat.
Abbreviations: EST, expressed sequence tag.
SB thanks the anonymous reviewer for helpful comments and Ekta Khurana for valuable discussions. This work was funded by a grant from NIH, grant number 5U54HG004555-02.
- Zhang Z, Harrison P, Gerstein M: Identification and analysis of over 2000 ribosomal protein pseudogenes in the human genome. Genome Res. 2002, 12: 1466-1482. 10.1101/gr.331902.
- Zhang Z, Carriero N, Gerstein M: Comparative analysis of processed pseudogenes in the mouse and human genomes. Trends Genet. 2004, 20: 62-67. 10.1016/j.tig.2003.12.005.
- Uechi T, Tanaka T, Kenmochi N: A complete map of the human ribosomal protein genes: assignment of 80 genes to the cytogenetic map and implications for human disorders. Genomics. 2001, 72: 223-230. 10.1006/geno.2000.6470.
- Kenmochi N, Kawaguchi T, Rozen S, Davis E, Goodman N, Hudson TJ, Tanaka T, Page DC: A map of 75 human ribosomal protein genes. Genome Res. 1998, 8: 509-523.
- Uechi T, Maeda N, Tanaka T, Kenmochi N: Functional second genes generated by retrotransposition of the X-linked ribosomal protein genes. Nucleic Acids Res. 2002, 30: 5369-5375. 10.1093/nar/gkf696.
- Esnault C, Maestre J, Heidmann T: Human LINE retrotransposons generate processed pseudogenes. Nat Genet. 2000, 24: 363-367. 10.1038/74184.
- Nakao A, Yoshihama M, Kenmochi N: RPG: the Ribosomal Protein Gene database. Nucleic Acids Res. 2004, 32: D168-170. 10.1093/nar/gkh004.
- Sakai H, Koyanagi KO, Imanishi T, Itoh T, Gojobori T: Frequent emergence and functional resurrection of processed pseudogenes in the human and mouse genomes. Gene. 2007, 389: 196-203. 10.1016/j.gene.2006.11.007.
- Caenepeel S, Charydczak G, Sudarsanam S, Hunter T, Manning G: The mouse kinome: discovery and comparative genomics of all mouse protein kinases. Proc Natl Acad Sci USA. 2004, 101: 11707-11712. 10.1073/pnas.0306880101.
- Suyama M, Harrington E, Bork P, Torrents D: Identification and analysis of genes and pseudogenes within duplicated regions in the human and mouse genomes. PLoS Comput Biol. 2006, 2: e76. 10.1371/journal.pcbi.0020076.
- Harrison PM, Zheng D, Zhang Z, Carriero N, Gerstein M: Transcribed processed pseudogenes in the human genome: an intermediate form of expressed retrosequence lacking protein-coding ability. Nucleic Acids Res. 2005, 33: 2374-2383. 10.1093/nar/gki531.
- Zheng D, Frankish A, Baertsch R, Kapranov P, Reymond A, Choo SW, Lu Y, Denoeud F, Antonarakis SE, Snyder M, Ruan Y, Wei CL, Gingeras TR, Guigo R, Harrow J, Gerstein MB: Pseudogenes in the ENCODE regions: Consensus annotation, analysis of transcription, and evolution. Genome Res. 2007, 17: 839-851. 10.1101/gr.5586307.
- Tam OH, Aravin AA, Stein P, Girard A, Murchison EP, Cheloufi S, Hodges E, Anger M, Sachidanandam R, Schultz RM, Hannon GJ: Pseudogene-derived small interfering RNAs regulate gene expression in mouse oocytes. Nature. 2008, 453: 534-538. 10.1038/nature06904.
- Watanabe T, Totoki Y, Toyoda A, Kaneda M, Kuramochi-Miyagawa S, Obata Y, Chiba H, Kohara Y, Kono T, Nakano T, Surani MA, Sakaki Y, Sasaki H: Endogenous siRNAs from naturally formed dsRNAs regulate transcripts in mouse oocytes. Nature. 2008, 453: 539-543. 10.1038/nature06908.
- Piehler AP, Hellum M, Wenzel JJ, Kaminski E, Haug KB, Kierulf P, Kaminski WE: The human ABC transporter pseudogene family: Evidence for transcription and gene-pseudogene interference. BMC Genomics. 2008, 9: 165. 10.1186/1471-2164-9-165.
- Svensson O, Arvestad L, Lagergren J: Genome-wide survey for biologically functional pseudogenes. PLoS Comput Biol. 2006, 2: e46. 10.1371/journal.pcbi.0020046.
- Zhang Z, Carriero N, Zheng D, Karro J, Harrison PM, Gerstein M: PseudoPipe: an automated pseudogene identification pipeline. Bioinformatics. 2006, 22: 1437-1439. 10.1093/bioinformatics/btl116.
- Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, Antonarakis SE, Attwood J, Baertsch R, Bailey J, Barlow K, Beck S, Berry E, Birren B, Bloom T, Bork P, Botcherby M, Bray N, Brent MR, Brown DG, Brown SD, Bult C, Burton J, Butler J, Campbell RD, Carninci P, et al: Initial sequencing and comparative analysis of the mouse genome. Nature. 2002, 420: 520-562. 10.1038/nature01262.
- Chimpanzee Sequencing and Analysis Consortium: Initial sequence of the chimpanzee genome and comparison with the human genome. Nature. 2005, 437: 69-87. 10.1038/nature04072.
- Gibbs RA, Weinstock GM, Metzker ML, Muzny DM, Sodergren EJ, Scherer S, Scott G, Steffen D, Worley KC, Burch PE, Okwuonu G, Hines S, Lewis L, DeRamo C, Delgado O, Dugan-Rocha S, Miner G, Morgan M, Hawes A, Gill R, Celera, Holt RA, Adams MD, Amanatides PG, Baden-Tillson H, Barnstead M, Chin S, Evans CA, Ferriera S, Fosler C, et al: Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature. 2004, 428: 493-521. 10.1038/nature02426.
- Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D: Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci USA. 2003, 100: 11484-11489. 10.1073/pnas.1932072100.
- Goodstadt L, Ponting CP: Phylogenetic reconstruction of orthology, paralogy, and conserved synteny for dog and human. PLoS Comput Biol. 2006, 2: e133. 10.1371/journal.pcbi.0020133.
- Zhang Z, Harrison PM, Liu Y, Gerstein M: Millions of years of evolution preserved: a comprehensive catalog of the processed pseudogenes in the human genome. Genome Res. 2003, 13: 2541-2558. 10.1101/gr.1429003.
- Ribosomal Pseudogenes. [http://www.pseudogene.org/ribosomal-protein]
- Goncalves I, Duret L, Mouchiroud D: Nature and structure of human genes that generate retropseudogenes. Genome Res. 2000, 10: 672-678. 10.1101/gr.10.5.672.
- Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, Zhang J, Soden R, Hayakawa M, Kreiman G, Cooke MP, Walker JR, Hogenesch JB: A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci USA. 2004, 101: 6062-6067. 10.1073/pnas.0400782101.
- Pavlicek A, Gentles AJ, Paces J, Paces V, Jurka J: Retroposition of processed pseudogenes: the impact of RNA stability and translational control. Trends Genet. 2006, 22: 69-73. 10.1016/j.tig.2005.11.005.
- Kumar S, Tamura K, Nei M: MEGA3: Integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment. Brief Bioinform. 2004, 5: 150-163. 10.1093/bib/5.2.150.
- Wu CI, Li WH: Evidence for higher rates of nucleotide substitution in rodents than in man. Proc Natl Acad Sci USA. 1985, 82: 1741-1745. 10.1073/pnas.82.6.1741.
- Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, et al: Initial sequencing and analysis of the human genome. Nature. 2001, 409: 860-921. 10.1038/35057062.
- Ohshima K, Hattori M, Yada T, Gojobori T, Sakaki Y, Okada N: Whole-genome screening indicates a possible burst of formation of processed pseudogenes and Alu repeats by particular L1 subfamilies in ancestral primates. Genome Biol. 2003, 4: R74. 10.1186/gb-2003-4-11-r74.
- Marques AC, Dupanloup I, Vinckenbosch N, Reymond A, Kaessmann H: Emergence of young human genes after a burst of retroposition in primates. PLoS Biol. 2005, 3: e357. 10.1371/journal.pbio.0030357.
- Hubbard TJ, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T, Down T, Dyer SC, Fitzgerald S, Fernandez-Banet J, Graf S, Haider S, Hammond M, Herrero J, Holland R, Howe K, Johnson N, Kahari A, Keefe D, Kokocinski F, Kulesha E, Lawson D, Longden I, Melsopp C, Megy K, Meidl P, et al: Ensembl 2007. Nucleic Acids Res. 2007, 35: D610-617. 10.1093/nar/gkl996.
- Hellmann I, Prufer K, Ji H, Zody MC, Paabo S, Ptak SE: Why do human diversity levels vary at a megabase scale? Genome Res. 2005, 15: 1222-1231. 10.1101/gr.3461105.
- She X, Liu G, Ventura M, Zhao S, Misceo D, Roberto R, Cardone MF, Rocchi M, Green ED, Archidiacono N, Eichler EE: A preliminary comparative analysis of primate segmental duplications shows elevated substitution rates and a great-ape expansion of intrachromosomal duplications. Genome Res. 2006, 16: 576-583. 10.1101/gr.4949406.
- GEO. [http://www.ncbi.nlm.nih.gov/geo]
- Kimura M: A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol. 1980, 16: 111-120. 10.1007/BF01731581.
- Thompson JD, Gibson TJ, Higgins DG: Multiple sequence alignment using ClustalW and ClustalX. Curr Protoc Bioinformatics. 2002, Chapter 2: Unit 2.3.
- Hare MP, Palumbi SR: High intron sequence conservation across three mammalian orders suggests functional constraints. Mol Biol Evol. 2003, 20: 969-978. 10.1093/molbev/msg111.
- Boguski MS, Lowe TM, Tolstoshev CM: dbEST--database for "expressed sequence tags". Nat Genet. 1993, 4: 332-333. 10.1038/ng0893-332.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.