Identification and analysis of unitary pseudogenes: historic and contemporary gene losses in humans and other primates
© Zhang et al.; licensee BioMed Central Ltd. 2010
Received: 2 October 2009
Accepted: 8 March 2010
Published: 8 March 2010
Unitary pseudogenes are a class of unprocessed pseudogenes without functioning counterparts in the genome. They constitute only a small fraction of annotated pseudogenes in the human genome. However, as they represent distinct functional losses over time, they shed light on the unique features of humans in primate evolution.
We have developed a pipeline to detect human unitary pseudogenes through analyzing the global inventory of orthologs between the human genome and its mammalian relatives. We focus on gene losses along the human lineage after the divergence from rodents about 75 million years ago. In total, we identify 76 unitary pseudogenes, including previously annotated ones, and many novel ones. By comparing each of these to its functioning ortholog in other mammals, we can approximately date the creation of each unitary pseudogene (that is, the gene 'death date') and show that for our group of 76, the functional genes appear to be disabled at a fairly uniform rate throughout primate evolution - not all at once, correlated, for instance, with the 'Alu burst'. Furthermore, we identify 11 unitary pseudogenes that are polymorphic - that is, they have both nonfunctional and functional alleles currently segregating in the human population. Comparing them with their orthologs in other primates, we find that two of them are in fact pseudogenes in non-human primates, suggesting that they represent cases of a gene being resurrected in the human lineage.
This analysis of unitary pseudogenes provides insights into the evolutionary constraints faced by different organisms and the timescales of functional gene loss in humans.
Pseudogenes (ψ) are nongenic DNA segments that exhibit a high degree of sequence similarity to functional genes but contain disruptive defects. The initial pseudogenization of a functional gene is most likely a single mutagenic event that results in premature stop codons, abolished splice junctions, shifts to the coding frame, or impaired transcriptional regulatory sequences. Most pseudogenes are disabled copies of a functional 'parent' gene and can be classified as either processed or duplicated pseudogenes depending on whether they are generated by the retro-transposition of processed mRNA transcripts or the duplication of gene-containing DNA segments in the genome. Recently, the pseudogene complement of the human genome has been investigated both in gene family-specific studies [1–4] and in comprehensive surveys [5–7]. Of the approximately 20,000 pseudogenes identified in early studies, most, if not all, do not represent the extinction of a function as their 'parent' genes are intact and functional.
A third group of pseudogenes particularly relevant to functional analyses are unitary pseudogenes, which are unprocessed pseudogenes with no functional counterparts. They are generated by disruptive mutations that occur in functional genes and prevent them from being successfully transcribed or translated. They differ from duplicated pseudogenes in that the disabled gene had an established function rather than being a more recent copy of a functional gene. The initial analysis of the euchromatic sequence of the human genome identified 37 unitary pseudogene candidates. In addition to unitary pseudogenes with fixed disruptive nucleotide substitutions, human genes with polymorphic disruptive sites that are currently segregating in the human population have also been identified [8–10], and many of them provide the genetic bases of certain inheritable diseases. Such gene deactivation, which happens in situ giving rise to a unitary pseudogene, results in a loss to the functional part of the genetic repertoire of the organism. Polymorphic pseudogenes are unlikely to become fixed in a population if the gene loss is deleterious. However, various evolutionary processes, such as genetic drift, migration (population bottleneck), and in some cases, natural selection, can lead to fixation. A number of genes are known to have been lost in the human lineage in comparison with other mammals [4, 12–15].
In this study, we develop a novel comparative genomic approach to identify genes disabled in situ without a functional copy (unitary pseudogenes) using the absence of human proteins orthologous to their mouse counterparts as the signals of losses of well-established genes. Our method is able to systematically detect the sequence signature left by such genic losses, distinguishing true loss from mere loss of redundant genes following duplication or retrotransposition. We identify historic and contemporary losses of protein-coding genes in the human lineage since the last common ancestor of euarchontoglires (primates and rodents). In addition to pseudogenes in tandem gene families, we identify 76 losses of well-established genes in the human lineage since the common ancestor with mouse. Moreover, we also find 11 genes with polymorphic disruptive sites. This latter set represents gene losses on a very different timescale: the genic and pseudogenic alleles are segregating in the current human population and are subject to various evolutionary forces.
Gene loss is indicated by the absence of orthologs
After a speciation event, the increasing divergence between two resultant species reflects the diminution in their genic orthology as gains and losses of genes gradually accumulate in each of them. Thus, the presence of genes unique to one species relative to another indicates either gene gains in one or gene losses in the other. In common with many other genomic features, genes in all species are in a state of flux during evolution. However, since all species are related to one another through speciation, gains and losses of genes in one species can be identified only relative to another. Based on this observation, we developed a pipeline that uses the orthologous relationship between genes from a pair of species to detect gene losses in one of them.
Many genes were lost in the human lineage since the human-mouse divergence
Using the human-mouse genic orthology, we identify 228 pseudogenic loci - about 1% of the human gene catalog - in the human genome, which include 98 olfactory receptors, 23 vomeronasal receptors, and 1 zinc finger protein. The large number of olfactory receptors and vomeronasal receptors found in our study is consistent with previous observations [17, 18]. These gene families form tandem gene clusters and have experienced copy number changes and complex local rearrangements. Because the dynamics of gene clusters make it difficult to unambiguously discern ortholog/paralog relationships between species, it is difficult to discern the 'unitary' status of the olfactory receptor/vomeronasal receptor/zinc finger pseudogenes (Table S1 in Additional file 1) and thus they are excluded from further analyses in this study.
Human unitary pseudogenes
Human unitary pseudogene genomic location
Mouse ortholog symbol
Mouse gene name
a disintegrin and metallopeptidase domain 1b
a disintegrin and metallopeptidase domain 26B
a disintegrin and metallopeptidase domain 3 (cyritestin)
a disintegrin and metallopeptidase domain 5
acyl-coenzyme A amino acid N-acyltransferase 2
acyltransferase 3 [RIKEN cDNA 5330437I02 gene]
acyltransferase like 1B
aldehyde oxidase 3-like 1
ATP-binding cassette, sub-family A (ABC1), member 17
cytochrome P450, family 2, subfamily t, polypeptide 4
cytochrome c, testis
Desc4 [RIKEN cDNA 9930032O22 gene]
double C2, gamma
Feta [RIKEN cDNA 4930417M19 gene]
guanylate cyclase 2g
gulonolactone (L-) oxidase
histone cluster 3, H2ba
major urinary protein 4
mannose binding lectin (A) 1
neurotrophin receptor associated death domain
nuclear receptor subfamily 1, group H, member 5
preferentially expressed antigen in melanoma
protein tyrosine phosphatase, receptor type, V
protocadherin gamma subfamily B, 8
secretory blood group 1
Sirpb3 [RIKEN cDNA F830045P16 gene]
solute carrier family 7 (cationic amino acid transporter, y+ system), member 15
sulfotransferase family 1D, member 1
taste receptor, type 2, member 134
testicular cell adhesion molecule 1
testis expressed gene 16
testis expressed gene 21
testis-specific serine kinase 5
threonine aldolase 1
toll-like receptor 12
trace amine-associated receptor 3
trace amine-associated receptor 4
transient receptor potential cation channel, subfamily C, member 2
transmembrane protease, serine 11c
transmembrane protease, serine 8 (intestinal)
In a recent study, Mochida et al. showed that NEPN is a secreted N-glycosylated protein inhibitor of transforming growth factor-β signaling in mouse and also identified putative NEPN gene orthologs in pig, dog, rat, and chicken. The human ortholog was not found, and its absence was postulated to be a missed identification due to a lesser homology with its counterparts in other mammals. As this study and a previous one demonstrate, however, despite the lack of a closely related homolog in the human genome, NEPN is a pseudogene not only in human but also in chimpanzee, gorilla, and rhesus with a shared coding sequence (CDS) disruptive mutation; thus, its inactivation occurred at least 30 million years ago, before the divergence between the catarrhines and the New World monkeys.
Hydrolase-related activity and structure are enriched in human unitary pseudogenes
Compared with mouse, human has lost five testis-specific genes: testicular cell adhesion molecule 1 (TCAM1), testis expressed gene 16 (TEX16), testis expressed gene 21 (TEX21), testis-specific serine kinase 5 (TSSK5), and cytochrome c, testis (CYCT). The losses of these testis-specific genes in the human lineage may have affected the distinctive processes that occur in male germinal cells and thus contributed to the differentiated fertility between the two lineages.
Gene loss has occurred throughout primate evolution
One interesting case is the evolution of NR1H5 in primates. A previous study of the nuclear receptor pseudogenes has shown that NR1H5 is a pseudogene in human, chimpanzee, and rhesus monkey with three (out of 14 in total) disruptive mutations - one frame-shift mutation and one splice-junction mutation in the very early part of the gene structure and one nonsense mutation at the end of the CDS - shared by these three primate species. In the same study, based on sequences from human, mouse, rat, and chicken, the silencing of NR1H5 was dated to be approximately 42 million years ago (MYA), which was slightly later than 42.9 MYA, the estimated time of divergence between the catarrhines and the New World monkeys. However, because of the uncertainties in the estimates of both dates (for example, the 95% credibility interval of the divergence time estimation is from 36.1 to 51.1 MYA), it is not conclusive that the pseudogenization of NR1H5 occurred after the divergence between the catarrhines and the New World monkeys. To resolve this question, we identify NR1H5 in the recently published genomic sequences of marmoset, a New World monkey, and determine whether it contains any of the three pseudogenic mutations common to human, chimpanzee, and rhesus. Although only the first one-third of the NR1H5 CDS can be found in marmoset due to the incompleteness of its genome assembly, the two important common disruptive mutations, whose positions are covered by the partial sequence identification, are absent. This finding suggests that the pseudogenization of NR1H5 in the human lineage indeed occurred after the divergence between the catarrhines and the New World monkeys.
Using current genome sequences of human, chimpanzee, gorilla, orangutan, rhesus, marmoset, and tarsier, we identify 11 genes - ADAM3, CTF2, HIST3H2BA, MBL1, MUP, TMPRSS8, ADAM1B, ADAM5, DOC2G, HYAL6, and TAS2R134 - with human-specific CDS disruptions, which occurred after the divergence of humans and chimpanzees. Based on our sequence analysis, however, we find that the last five of them - ADAM1B, ADAM5, DOC2G, HYAL6, and TAS2R134 - are possibly also disabled in other primates with disruptions at different sites. Under the assumption that the neutral mutation rate has remained constant since the human-chimpanzee divergence at 6.6 MYA, we estimate the time in the hominid ancestor when the human-specific inactivation mutations appeared in the aforementioned 11 genes. The inactivation time of eight genes can be meaningfully calculated, and the estimates are plotted along the timeline from 6.6 MYA, when human and chimpanzee diverged, to the present (Figure 5b; Table S3 in Additional file 1). None of the unitary pseudogenes seems to have been generated by the insertion of an Alu sequence into the coding sequence of an ancestral functional gene. As the plot shows, unlike Alu sequences, which had an exceptional surge of activity around 40 MYA, the pseudogenization events occurred in a temporally random fashion - that is, there is no burst of gene losses during human evolution since the human-chimpanzee divergence. This difference in their age distributions reflects the difference in underlying generative mechanisms.
Some genes contain polymorphic disruptive sites and are segregating in the human population
Human polymorphic pseudogenes
CDS disruptive mutation
HapMap SNP ID
taT (Y) → taA
Cag (Q) → Tag
Cga (R) → Tga
Caa (Q) → Taa
Gaa (E) → Taa
Aaa (K) → Taa
Polymorphic pseudogenes with the disruptive sites typed in the HapMap Project
CDS disrupted gene
Cga (R) → Tga
Gaa (E) → Taa
Aaa (K) → Taa
Disrupted codon position
Reference allele in human
Reference allele in other primates
Test statistic for HWE in the meta-population
0.285 (P = 0.867)
8.659 (P = 0.013)
0.071 (P = 0.965)
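The P values listed with the Hardy-Weinberg equilibrium (HWE) test statistics above can be checked against a chi-square upper tail. The sketch below is only a consistency check under an assumption: the degrees of freedom are not stated in the surviving text, and two is used here because a chi-square distribution with two degrees of freedom has the closed-form tail probability exp(-x/2). The function name `chi2_sf_df2` is introduced for illustration.

```python
import math


def chi2_sf_df2(statistic: float) -> float:
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom.

    For df = 2 the survival function is exactly exp(-x / 2).
    Using df = 2 is an assumption; the text does not state it.
    """
    if statistic < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.exp(-statistic / 2.0)


# Test statistics quoted in the table above.
for stat in (0.285, 8.659, 0.071):
    print(f"{stat:.3f} -> P = {chi2_sf_df2(stat):.3f}")
```

Under that assumption, the three statistics give tail probabilities that round to the P values quoted next to them.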
The pseudogene complement of the human genome has been comprehensively surveyed in several early studies [5–7]. Using sequence similarity between the proteome and the (translated) genome as the signature, these studies found pseudogenic copies of functional genes that were generated after duplication or retrotransposition in the human genome. Such duplicated or processed pseudogenes are probably of little evolutionary significance, as the former are disabled soon after duplication and the latter are 'dead on arrival'. In this study, however, we systematically identify human unitary pseudogenes, a class of pseudogenes that are especially interesting as it is the functional genes themselves, not their genomic copies generated by duplication or retrotransposition, that have been disabled. Some human unitary pseudogenes have been identified on an individual basis when a particular gene or gene family was studied (see the references in Table S2 in Additional file 1). Using a comparative genomic approach, Zhu et al. identified 26 losses of well-established genes in the human genome that were all lost at least 50 MYA after their birth. We compared our set with theirs and found that, in spite of the different methodological approaches, the two studies had many gene losses in the human genome in common (Table S5 in Additional file 1).
Within a population, the pseudogenization of a gene does not happen instantaneously. Rather, after a disruptive mutation occurs, the alleles at the locus undergo a fixation process. Depending on the outcome, such a mutation is either fixed or lost. Thus, every gene loss goes through two stages: a polymorphic stage in the contemporary population subject to evolutionary forces; and a fixed stage freed from selective pressure. The fixed mutation becomes the base substitution in the species under study relative to the other and is identified through comparison of the genomes of two species. By comparing the human and the mouse genomes, we identify 76 fixed unitary pseudogenes. In addition, we identify 11 human genes with pseudogenic alleles, whose disruptive mutations include nonsense mutations and frameshifts. Our identification of polymorphic pseudogenes is by no means comprehensive as we search in the reference genome sequence for only the loci that are associated with both CDS disruptions and functional mRNA sequences. To obtain a comprehensive set of polymorphic pseudogenes, one approach would be to map variation sites in dbSNP to the reference genome and identify variations that can disrupt the ORF of known genes.
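As an illustration of the dbSNP-based approach suggested above, the following minimal sketch flags two kinds of ORF-disrupting variants: a substitution that turns a sense codon into a stop codon, and an insertion or deletion whose length is not a multiple of three. The function names and the allele representation are hypothetical; a real screen would also need gene annotations and codon phasing, which are omitted here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def is_nonsense(ref_codon: str, alt_codon: str) -> bool:
    """Return True if a sense codon is changed into a stop codon."""
    ref, alt = ref_codon.upper(), alt_codon.upper()
    if len(ref) != 3 or len(alt) != 3:
        raise ValueError("codons must be three nucleotides long")
    return ref not in STOP_CODONS and alt in STOP_CODONS


def is_frameshift(ref_allele: str, alt_allele: str) -> bool:
    """Return True if an indel changes the CDS length by a non-multiple of 3."""
    return (len(alt_allele) - len(ref_allele)) % 3 != 0


# Codon changes of the kind listed for the polymorphic pseudogenes.
changes = [("taT", "taA"), ("Cag", "Tag"), ("Cga", "Tga"),
           ("Caa", "Taa"), ("Gaa", "Taa"), ("Aaa", "Taa")]
print(all(is_nonsense(ref, alt) for ref, alt in changes))  # True

# A 32-bp deletion shifts the reading frame because 32 is not divisible by 3.
print(is_frameshift("A" * 33, "A"))  # True
```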
Being at a relatively early stage of pseudogenization, polymorphic pseudogenes in a population are subject to various evolutionary forces depending on the function of the normal alleles and the interaction between different genotypes and the environment. Since the loss of a single-copy gene is often deleterious and unlikely to be fixed in a population, it remains unclear under what circumstances genes were silenced and how the losses were tolerated and fixed in the ancestral population. It has been proposed that, under certain conditions, a gene could become disposable to the fitness of the organism if the function that it provides becomes redundant. When this happens, the pseudogenic allele could be fixed in the population by random genetic drift because the loss of the gene product did not constitute a disadvantage and, thus, there is little selection against the gene loss. This release from selective pressure is believed to be how the nonfunctionalization of the L-gulono-γ-lactone oxidase gene could be fixed in humans and guinea pigs: it has been hypothesized that the guinea pig and human ancestors subsisted on a naturally ascorbic acid-rich diet; therefore, the loss of the enzyme did not constitute a disadvantage.
On the other hand, as argued by the 'less is more' hypothesis, gene loss may serve as an engine of evolutionary change. Instead of being a neutral event, the silencing of a gene could be advantageous to the organism and consequently sweep through the population to fixation - the kind of adaptive evolution illustrated by the inactivation of the α-1,3-galactosyltransferase gene in catarrhines, and of the CMP-N-acetylneuraminic acid hydroxylase gene, the olfactory receptor genes, and the sarcomeric myosin gene in humans, as there seems to be a correlation between pseudogenization and physiological/anatomic changes. In addition to these fixed unitary pseudogenes, studies have also shown that some null alleles of polymorphic pseudogenes confer a selective advantage in the human population. For example, the chemokine receptor CCR5 gene in human has a pseudogenic allele with a 32-bp deletion. Homozygotes of this null allele are strongly protected from infection by various pathogens, including HIV, and heterozygotes receive some protection. Another example is the caspase-12 gene. It has been shown that carriers of the caspase-12 pseudogene are more resistant to severe sepsis, and the null allele has spread through most of the human population within the past 100,000 years because of positive selection.
There are 6,236 Mouse Genome Informatics (MGI) mouse proteins and 6,020 Ensembl human proteins outside of the InParanoid-assigned human-mouse orthologs. Such an absence of orthology is a result of both gene deaths that generated unitary pseudogenes and gene births that gave rise to novel genes in both species. Using the absence of orthologs of mouse proteins in human as the signal, we identify 76 such losses of well-established genes in the human genome. Of the 2,005 human proteins that have no mouse orthologs and cannot be mapped to the mouse reference genome, 638 passed the quality control and thus are included in the current Ensembl release of the human protein set. Because they cannot be mapped to the genome of dog, the closest out-group of the human-mouse lineage with the best genomic sequences, we believe the reason for their lack of mouse orthologs is that they are novel human genes, not that their mouse orthologs have been deleted. If we take the 15,885 human-mouse orthologs assigned by InParanoid as the set of genes before the divergence between human and mouse, the unitary pseudogenes and the novel genes generated in the human lineage since the last common ancestor of euarchontoglires, approximately 75 MYA, represent, respectively, a loss of approximately 0.5% and a gain of approximately 4% of the number of ancestral genes. Despite the aforementioned examples of gene losses under positive selection, this striking skew toward gene birth indicates strongly that gene births are a more significant force for evolutionary change than gene losses. It also confirms the notion that, as they represent functional losses to a species, unitary pseudogenes are expected to be rare.
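The loss and gain percentages quoted above follow directly from the three counts in the paragraph. The short calculation below reproduces them; the variable names are ours, and the numbers are those given in the text.

```python
ancestral_genes = 15_885      # InParanoid human-mouse orthologs
unitary_pseudogenes = 76      # losses in the human lineage
novel_human_genes = 638       # human proteins without mouse or dog mapping

loss_pct = 100 * unitary_pseudogenes / ancestral_genes
gain_pct = 100 * novel_human_genes / ancestral_genes

print(f"loss: {loss_pct:.2f}% of ancestral genes")  # loss: 0.48% of ancestral genes
print(f"gain: {gain_pct:.2f}% of ancestral genes")  # gain: 4.02% of ancestral genes
```

On these counts, novel genes outnumber unitary pseudogenes by roughly eight to one.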
Unitary pseudogenes are unprocessed pseudogenes with no functional counterparts. With complete genome sequences of model organisms, we have developed a novel method to detect such pseudogenes in a genome through analyzing the global inventory of orthologs between two organisms. Using this approach with very conservative cutoffs to look for gene losses along the human lineage after its divergence from rodents approximately 75 MYA, we identify 76 unitary pseudogenes in the human genome. As relics of genes, they shed particular light on the unique features of the human genome during evolution. By comparing orthologous sequences, we assign ages to primate unitary pseudogenes, and find that the former functional genes appear to have been disabled at a fairly uniform rate throughout primate evolution and not in a sudden burst. Furthermore, we find 11 polymorphic pseudogenes that have nonfunctional pseudogenic alleles currently segregating in the human population. Comparing them with their orthologs in other primates, we find that two are in fact pseudogenes in non-human primates, suggesting that these actually represent cases of a gene that is in the process of being resurrected in the human lineage. Identification and analysis of human unitary pseudogenes afford unique insights into the evolution and dynamics of the human genic repertoire and the human genome at large.
Materials and methods
Identification of human unitary pseudogenes
The overall strategy of our approach is depicted in Figure 1a. To discover human unitary pseudogenes, we use mouse proteins as the reference. Because, by definition, a unitary pseudogene and a functional ortholog of a specific gene from another genome are mutually exclusive within a genome, we first identify mouse proteins that do not have human orthologs. To find such mouse proteins, we use the InParanoid human-mouse ortholog set (version 6.1, based on human Ensembl 43 and mouse MGI 12 December 2006 protein sets). InParanoid is used because it balances the false negative and false positive rates and was top-ranked as an orthology tool [40, 41]. These mouse proteins are then mapped to the human reference genome (Hsap NCBI build 36.1, hg18) using BLAT with its default parameters. If the best mapping of a mouse protein to the human genome gives a gene structure similar to that of the mouse gene, the mapped human genomic region is extracted and examined for disruptions (nonsense mutations and frameshifts) to the coding sequence using GeneWise.
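The first filtering step described above, selecting mouse proteins with no assigned human ortholog, amounts to a set difference. The sketch below shows the idea on toy data; the identifiers and the `(human_id, mouse_id)` pair layout are hypothetical and do not reflect the actual InParanoid file format, and the later BLAT and GeneWise steps are not modeled.

```python
def mouse_proteins_without_human_ortholog(mouse_proteins, ortholog_pairs):
    """Return mouse protein IDs that appear in no (human, mouse) ortholog pair."""
    mouse_with_ortholog = {mouse_id for _human_id, mouse_id in ortholog_pairs}
    return sorted(set(mouse_proteins) - mouse_with_ortholog)


# Toy identifiers, purely illustrative.
mouse_proteins = ["mouseA", "mouseB", "mouseC", "mouseD"]
ortholog_pairs = [("humanA", "mouseA"), ("humanC", "mouseC")]

candidates = mouse_proteins_without_human_ortholog(mouse_proteins, ortholog_pairs)
print(candidates)  # ['mouseB', 'mouseD']
```

Only the proteins returned by this step would go on to be mapped to the human genome and inspected for CDS disruptions.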
Some of the initially discovered human pseudogenes are redundant as they could be identified by more than one mouse gene due to duplicated gene annotations or high sequence similarities among members of certain protein families. The redundancy is removed by clustering the initial set of pseudogenic candidates into pseudogenic loci based on the overlap among their genomic coordinates. These loci are grouped into four sets based on the annotation of the mouse proteins expressed from: named genes; cDNA/expressed sequences with introns; cDNA/expressed sequences without introns; and modeled/predicted genes. Given the low possibility for unitary pseudogenes to be intronless and the difficulty to assess the reliability of the modeled or predicted genes, the loci in the last two sets are excluded from further consideration.
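The redundancy removal described above can be sketched as a merge of overlapping genomic intervals. This is a minimal illustration rather than the pipeline's actual code: it assumes 1-based inclusive coordinates, ignores strand, and uses a hypothetical `Hit` record with made-up example values.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    chrom: str
    start: int       # 1-based, inclusive (assumed convention)
    end: int         # inclusive
    mouse_gene: str  # mouse gene whose protein produced this hit


def cluster_into_loci(hits):
    """Merge hits with overlapping coordinates on the same chromosome."""
    loci = []
    for hit in sorted(hits, key=lambda h: (h.chrom, h.start, h.end)):
        last = loci[-1] if loci else None
        if last and last["chrom"] == hit.chrom and hit.start <= last["end"]:
            # Overlaps the current locus: extend it and record the extra gene.
            last["end"] = max(last["end"], hit.end)
            last["mouse_genes"].append(hit.mouse_gene)
        else:
            loci.append({"chrom": hit.chrom, "start": hit.start,
                         "end": hit.end, "mouse_genes": [hit.mouse_gene]})
    return loci


hits = [Hit("chr1", 100, 200, "geneA"), Hit("chr1", 150, 300, "geneB"),
        Hit("chr1", 400, 500, "geneC"), Hit("chr2", 120, 180, "geneD")]
loci = cluster_into_loci(hits)
print(len(loci))  # 3
```

Sorting first means each hit only needs to be compared with the most recent locus, so the merge runs in a single pass.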
Loci in the first two sets are carefully examined to ascertain their pseudogene status. Prior to manual annotation, all genomic sequences are sent to an automated analysis pipeline for similarity searches and ab initio gene predictions. The searches are run on a computer farm and stored in an Ensembl MySQL database using the Ensembl analysis pipeline system, and the results are displayed in the Zmap genome viewer. Additional external predictions and annotation can be visualized in Zmap via a distributed annotation system (DAS). The otterlace annotation interface allows the user to build genes and edit annotations based on homology to aligned mRNA, expressed sequence tag and protein evidence by adding transcripts, exon coordinates, CDSs, gene names and descriptions, remarks and polyadenylation signals and sites.
All predicted unitary pseudogene loci are checked to ensure the validity of the orthologous mouse protein-coding gene, to verify the conservation of synteny between the human and mouse loci, and to confirm the pseudogenicity of the human locus. Mouse loci identified as orthologs to putative human unitary pseudogenes are fully manually annotated; that is, the complete gene structures and CDSs of all alternative splice variants are elucidated to confirm both the coding potential of the locus and the accuracy of the MGI annotated CDSs. Mouse loci identified as lacking a CDS are rejected as unitary pseudogenes. Conservation of synteny between mouse and human orthologs is established by the identification of conserved flanking loci in both the Zmap viewer and Ensembl MultiContig View. Where the position of the putative orthologs is not conserved, the human locus is rejected as a unitary pseudogene. Finally, the putative human unitary pseudogene locus is fully manually annotated. Loci are confirmed as unitary pseudogenes where the alignment of the orthologous mouse protein sequence indicates a CDS disruption (premature stop, frame-shift or truncation) fixed in the human genome.
We also identify several cases where the ORF of a gene is disrupted in the human reference genome sequence but locus-specific transcripts lack the disrupting mutation. Such a contradiction may be a result of polymorphism in the human population, as the genomic DNA and the mRNA were obtained from different individuals. However, in some cases an apparent error in the genomic sequence appears responsible. To identify and remove false positives, we check the validity of the base call under consideration in the human reference genome by examining the sequences of the reads in the trace archive. We confirm the transcript sequence by multiple independent copies available in GenBank. All errors in the genome sequence were reported to the Genome Reference Consortium.
Identification of orthologous genic or pseudogenic sequences in 43 species
We examine 44 vertebrates for genic or pseudogenic sequences orthologous and syntenic to human unitary pseudogenes. The organism, release version and time of the genomic sequence download from the Ensembl database are listed in Table S7 in Additional file 1.
To identify orthologous and syntenic sequences, we first use the Fetch Alignments tool of Galaxy to extract 'stitched' blocks of the alignment of the above 44 genomic sequences for each of the 76 human unitary pseudogenes in the human genome. Using the global multiple sequence alignment ensures the orthology and the synteny of mapped genomic sequences among species. The sequences in the alignment blocks are then mapped back using BLAT to their corresponding genomes to recover any sequences not included in the alignments. The subsequences corresponding to the 76 human unitary pseudogenes in the 44 genomes are extracted from the start minus 5 kb and the end plus 5 kb of the BLAT alignments. The mouse protein sequences are then aligned to the corresponding genomic subsequences using GeneWise to identify their orthologs in the 44 genomes.
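The 5 kb padding around each BLAT alignment can be expressed as a small coordinate calculation. The sketch below assumes 1-based inclusive coordinates and clamps the window to the sequence boundaries; the clamping behavior and the function name are our assumptions, since the text does not say how alignments near the ends of a scaffold are handled.

```python
FLANK_BP = 5_000  # 5 kb on each side, as described in the text


def flanked_window(start: int, end: int, seq_length: int, flank: int = FLANK_BP):
    """Return (start, end) extended by `flank` bp and clamped to [1, seq_length]."""
    if not 1 <= start <= end <= seq_length:
        raise ValueError("expected 1 <= start <= end <= seq_length")
    return max(1, start - flank), min(seq_length, end + flank)


print(flanked_window(10_000, 20_000, 1_000_000))  # (5000, 25000)
print(flanked_window(3_000, 9_000, 12_000))       # (1, 12000)
```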
Functional and structural analyses of human unitary pseudogenes
For functional and structural analyses, we search for GO terms and Pfam domains that are over-represented within the human unitary pseudogenes. Because pseudogenes are nonfunctional and thus not included in the human gene annotation set, such analyses cannot be performed directly. To circumvent this problem, we use the 76 mouse functional orthologs of human unitary pseudogenes as their proxies. To perform the analyses, we combine all human genes and the 76 mouse genes into one gene list and retrieve their GO and Pfam annotations from Ensembl. BiNGO is used to test the 76 mouse genes in comparison with the combined gene list for GO term association on the GO hierarchy. We also test for over-representation of Pfam domains using the standard hypergeometric test with subsequent false discovery rate correction for multiple hypotheses testing.
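A hypergeometric over-representation test with false discovery rate correction can be sketched with the standard library alone. The text does not name the specific FDR procedure, so the Benjamini-Hochberg step-up adjustment used here is an assumption, and the function names are ours.

```python
from math import comb


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) when N genes are drawn without replacement from M genes,
    n of which carry the annotation (for example, a given Pfam domain)."""
    lo, hi = max(k, 0), min(n, N)
    favourable = sum(comb(n, i) * comb(M - n, N - i) for i in range(lo, hi + 1))
    return favourable / comb(M, N)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Small worked case: 10 genes, 4 annotated, 3 drawn, at least 2 annotated.
print(hypergeom_sf(2, 10, 4, 3))  # (6*6 + 4*1) / 120 = 1/3
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

In the setting described above, `M` would be the size of the combined gene list, `N` the 76 mouse genes, and one test would be run per Pfam domain before adjusting the resulting P values together.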
Estimation of the nonfunctionalization time of a human-specific unitary pseudogene
in which ω1 is the KA/KS ratio in the human lineage. When only a small number of species are used to estimate TN, its estimated value should be viewed with caution.
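As a rough illustration of how such a date can be derived from KA/KS ratios, the sketch below uses one commonly used formulation, which is not necessarily the exact equation of this study. It assumes that the gene evolved at a constant ratio while functional (called `omega_functional` here, a name introduced for illustration), neutrally (ratio of 1) after inactivation, and that ω1 is the time-weighted average of the two over the time T since the human-chimpanzee divergence, giving TN = T(ω1 - ω_functional)/(1 - ω_functional). The input values in the example are hypothetical.

```python
def nonfunctionalization_time(omega_lineage: float,
                              omega_functional: float,
                              divergence_time_mya: float):
    """Estimate T_N (in MYA) under the two-phase assumption described above.

    omega_lineage       -- KA/KS ratio observed on the human lineage (omega_1)
    omega_functional    -- assumed KA/KS ratio while the gene was functional
    divergence_time_mya -- time T since the lineage split (6.6 in the text)

    Returns None when omega_lineage lies outside [omega_functional, 1],
    in which case no meaningful estimate exists under this model.
    """
    if not 0 <= omega_functional < 1:
        raise ValueError("omega_functional must be in [0, 1)")
    if not omega_functional <= omega_lineage <= 1:
        return None
    fraction_neutral = (omega_lineage - omega_functional) / (1 - omega_functional)
    return divergence_time_mya * fraction_neutral


# Hypothetical ratios, for illustration only.
print(nonfunctionalization_time(0.6, 0.2, 6.6))  # about 3.3
print(nonfunctionalization_time(0.1, 0.2, 6.6))  # None: no meaningful estimate
```

Returning `None` outside the valid range reflects that, under a model of this kind, an inactivation time can only be calculated for some genes.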
ADAM: a disintegrin and metallopeptidase domain
GULO: gulonolactone (L-) oxidase
MGI: Mouse Genome Informatics
MUP: major urinary protein
MYA: million years ago
ORF: open reading frame
SNP: single nucleotide polymorphism
Vmn2r putative pheromone receptor.
We thank Laurens Wilming, Marie-Marthe Suner, Charles Steward, and Ifat Barnes at the Wellcome Trust Sanger Institute for annotating some of the predicted human unitary pseudogenic loci. This work was supported by an NIH grant from the National Library of Medicine (1K99LM009770-01) to ZDZ. Additional funding was provided by NIH grants from the National Human Genome Research Institute to MG.
- Glusman G, Yanai I, Rubin I, Lancet D: The complete human olfactory subgenome. Genome Res. 2001, 11: 685-702. 10.1101/gr.171001.PubMedView ArticleGoogle Scholar
- Zhang Z, Gerstein M: The human genome has 49 cytochrome c pseudogenes, including a relic of a primordial gene that still functions in mouse. Gene. 2003, 312: 61-72. 10.1016/S0378-1119(03)00579-1.PubMedView ArticleGoogle Scholar
- Zhang Z, Harrison P, Gerstein M: Identification and analysis of over 2000 ribosomal protein pseudogenes in the human genome. Genome Res. 2002, 12: 1466-1482. 10.1101/gr.331902.PubMedPubMed CentralView ArticleGoogle Scholar
- Zhang ZD, Cayting P, Weinstock G, Gerstein M: Analysis of nuclear receptor pseudogenes in vertebrates: how the silent tell their stories. Mol Biol Evol. 2008, 25: 131-143. 10.1093/molbev/msm251.PubMedView ArticleGoogle Scholar
- Ohshima K, Hattori M, Yada T, Gojobori T, Sakaki Y, Okada N: Whole-genome screening indicates a possible burst of formation of processed pseudogenes and Alu repeats by particular L1 subfamilies in ancestral primates. Genome Biol. 2003, 4: R74-10.1186/gb-2003-4-11-r74.PubMedPubMed CentralView ArticleGoogle Scholar
- Torrents D, Suyama M, Zdobnov E, Bork P: A genome-wide survey of human pseudogenes. Genome Res. 2003, 13: 2559-2567. 10.1101/gr.1455503.PubMedPubMed CentralView ArticleGoogle Scholar
- Zhang Z, Harrison PM, Liu Y, Gerstein M: Millions of years of evolution preserved: a comprehensive catalog of the processed pseudogenes in the human genome. Genome Res. 2003, 13: 2541-2558. 10.1101/gr.1429003.PubMedPubMed CentralView ArticleGoogle Scholar
- The International Human Genome Sequencing Consortium: Finishing the euchromatic sequence of the human genome. Nature. 2004, 431: 931-945. 10.1038/nature03001.
- Dean M, Carrington M, Winkler C, Huttley GA, Smith MW, Allikmets R, Goedert JJ, Buchbinder SP, Vittinghoff E, Gomperts E, Donfield S, Vlahov D, Kaslow R, Saah A, Rinaldo C, Detels R, O'Brien SJ: Genetic restriction of HIV-1 infection and progression to AIDS by a deletion allele of the CKR5 structural gene. Hemophilia Growth and Development Study, Multicenter AIDS Cohort Study, Multicenter Hemophilia Cohort Study, San Francisco City Cohort, ALIVE Study. Science. 1996, 273: 1856-1862. 10.1126/science.273.5283.1856.
- Tournamille C, Colin Y, Cartron JP, Le Van Kim C: Disruption of a GATA motif in the Duffy gene promoter abolishes erythroid gene expression in Duffy-negative individuals. Nat Genet. 1995, 10: 224-228. 10.1038/ng0695-224.
- Stenson PD, Mort M, Ball EV, Howells K, Phillips AD, Thomas NS, Cooper DN: The Human Gene Mutation Database: 2008 update. Genome Med. 2009, 1: 13. 10.1186/gm13.
- Chou HH, Hayakawa T, Diaz S, Krings M, Indriati E, Leakey M, Paabo S, Satta Y, Takahata N, Varki A: Inactivation of CMP-N-acetylneuraminic acid hydroxylase occurred prior to brain expansion during human evolution. Proc Natl Acad Sci USA. 2002, 99: 11736-11741. 10.1073/pnas.182257399.
- Koshizaka T, Nishikimi M, Ozawa T, Yagi K: Isolation and sequence analysis of a complementary DNA encoding rat liver L-gulono-gamma-lactone oxidase, a key enzyme for L-ascorbic acid biosynthesis. J Biol Chem. 1988, 263: 1619-1621.
- Stedman HH, Kozyak BW, Nelson A, Thesier DM, Su LT, Low DW, Bridges CR, Shrager JB, Minugh-Purvis N, Mitchell MA: Myosin gene mutation correlates with anatomical changes in the human lineage. Nature. 2004, 428: 415-418. 10.1038/nature02358.
- Wu XW, Lee CC, Muzny DM, Caskey CT: Urate oxidase: primary structure and evolutionary implications. Proc Natl Acad Sci USA. 1989, 86: 9412-9416. 10.1073/pnas.86.23.9412.
- Berglund AC, Sjolund E, Ostlund G, Sonnhammer EL: InParanoid 6: eukaryotic ortholog clusters with inparalogs. Nucleic Acids Res. 2008, 36: D263-266. 10.1093/nar/gkm1020.
- Gilad Y, Man O, Paabo S, Lancet D: Human specific loss of olfactory receptor genes. Proc Natl Acad Sci USA. 2003, 100: 3324-3327. 10.1073/pnas.0535697100.
- Young JM, Trask BJ: V2R gene families degenerated in primates, dog and cow, but expanded in opossum. Trends Genet. 2007, 23: 212-215. 10.1016/j.tig.2007.03.004.
- Nishikimi M, Fukuyama R, Minoshima S, Shimizu N, Yagi K: Cloning and chromosomal mapping of the human nonfunctional gene for L-gulono-gamma-lactone oxidase, the enzyme for L-ascorbic acid biosynthesis missing in man. J Biol Chem. 1994, 269: 13685-13688.
- Derouet D, Rousseau F, Alfonsi F, Froger J, Hermann J, Barbier F, Perret D, Diveu C, Guillet C, Preisser L, Dumont A, Barbado M, Morel A, deLapeyrière O, Gascan H, Chevalier S: Neuropoietin, a new IL-6-related cytokine signaling through the ciliary neurotrophic factor receptor. Proc Natl Acad Sci USA. 2004, 101: 4827-4832. 10.1073/pnas.0306178101.
- Csoka AB, Scherer SW, Stern R: Expression analysis of six paralogous human hyaluronidase genes clustered on chromosomes 3p21 and 7q31. Genomics. 1999, 60: 356-361. 10.1006/geno.1999.5876.
- Mochida Y, Parisuthiman D, Kaku M, Hanai J, Sukhatme VP, Yamauchi M: Nephrocan, a novel member of the small leucine-rich repeat protein family, is an inhibitor of transforming growth factor-beta signaling. J Biol Chem. 2006, 281: 36044-36051. 10.1074/jbc.M604787200.
- Zhu J, Sanborn JZ, Diekhans M, Lowe CB, Pringle TH, Haussler D: Comparative genomics search for losses of long-established genes on the human lineage. PLoS Comput Biol. 2007, 3: e247. 10.1371/journal.pcbi.0030247.
- Chamero P, Marton TF, Logan DW, Flanagan K, Cruz JR, Saghatelian A, Cravatt BF, Stowers L: Identification of protein pheromones that promote aggressive behaviour. Nature. 2007, 450: 899-902. 10.1038/nature05997.
- Mouse Genome Sequencing Consortium, Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, Antonarakis SE, Attwood J, Baertsch R, Bailey J, Barlow K, Beck S, Berry E, Birren B, Bloom T, Bork P, Botcherby M, Bray N, Brent MR, Brown DG, Brown SD, Bult C, Burton J, Butler J, Campbell RD, et al: Initial sequencing and comparative analysis of the mouse genome. Nature. 2002, 420: 520-562. 10.1038/nature01262.
- Grimes SR: Testis-specific transcriptional control. Gene. 2004, 343: 11-22. 10.1016/j.gene.2004.08.021.
- Steiper ME, Young NM: Primate molecular divergence dates. Mol Phylogenet Evol. 2006, 41: 384-394. 10.1016/j.ympev.2006.05.021.
- The International Human Genome Sequencing Consortium: Initial sequencing and analysis of the human genome. Nature. 2001, 409: 860-921. 10.1038/35057062.
- The International HapMap Consortium: A haplotype map of the human genome. Nature. 2005, 437: 1299-1320. 10.1038/nature04226.
- Sabeti PC, Varilly P, Fry B, Lohmueller J, Hostetter E, Cotsapas C, Xie X, Byrne EH, McCarroll SA, Gaudet R, Schaffner SF, Lander ES, International HapMap Consortium, Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM, Pasternak S, Wheeler DA, Willis TD, Yu F, Yang H, Zeng C, Gao Y, et al: Genome-wide detection and characterization of positive selection in human populations. Nature. 2007, 449: 913-918. 10.1038/nature06250.
- Voight BF, Kudaravalli S, Wen X, Pritchard JK: A map of recent positive selection in the human genome. PLoS Biol. 2006, 4: e72. 10.1371/journal.pbio.0040072.
- Lynch M, Conery JS: The evolutionary fate and consequences of duplicate genes. Science. 2000, 290: 1151-1155. 10.1126/science.290.5494.1151.
- The Chimpanzee Sequencing and Analysis Consortium: Initial sequence of the chimpanzee genome and comparison with the human genome. Nature. 2005, 437: 69-87. 10.1038/nature04072.
- Graur D, Li W-H: Fundamentals of Molecular Evolution. 2000, Sunderland, MA: Sinauer Associates, Inc, 2nd edition.
- Olson MV: When less is more: gene loss as an engine of evolutionary change. Am J Hum Genet. 1999, 64: 18-23. 10.1086/302219.
- Galili U, Swanson K: Gene sequences suggest inactivation of alpha-1,3-galactosyltransferase in catarrhines after the divergence of apes from monkeys. Proc Natl Acad Sci USA. 1991, 88: 7401-7404. 10.1073/pnas.88.16.7401.
- Saleh M, Vaillancourt JP, Graham RK, Huyck M, Srinivasula SM, Alnemri ES, Steinberg MH, Nolan V, Baldwin CT, Hotchkiss RS, Buchman TG, Zehnbauer BA, Hayden MR, Farrer LA, Roy S, Nicholson DW: Differential modulation of endotoxin responsiveness by human caspase-12 polymorphisms. Nature. 2004, 429: 75-79. 10.1038/nature02451.
- Xue Y, Daly A, Yngvadottir B, Liu M, Coop G, Kim Y, Sabeti P, Chen Y, Stalker J, Huckle E, Burton J, Leonard S, Rogers J, Tyler-Smith C: Spread of an inactive form of caspase-12 in humans is due to recent positive selection. Am J Hum Genet. 2006, 78: 659-670. 10.1086/503116.
- Bekpen C, Marques-Bonet T, Alkan C, Antonacci F, Leogrande MB, Ventura M, Kidd JM, Siswara P, Howard JC, Eichler EE: Death and resurrection of the human IRGM gene. PLoS Genet. 2009, 5: e1000403. 10.1371/journal.pgen.1000403.
- Chen F, Mackey AJ, Vermunt JK, Roos DS: Assessing performance of orthology detection strategies applied to eukaryotic genomes. PLoS One. 2007, 2: e383. 10.1371/journal.pone.0000383.
- Hulsen T, Huynen MA, de Vlieg J, Groenen PM: Benchmarking ortholog identification methods using functional genomics data. Genome Biol. 2006, 7: R31. 10.1186/gb-2006-7-4-r31.
- Kent WJ: BLAT - the BLAST-like alignment tool. Genome Res. 2002, 12: 656-664.
- Birney E, Clamp M, Durbin R: GeneWise and Genomewise. Genome Res. 2004, 14: 988-995. 10.1101/gr.1865504.
- Searle SM, Gilbert J, Iyer V, Clamp M: The otter annotation system. Genome Res. 2004, 14: 963-970. 10.1101/gr.1864804.
- Harrow J, Denoeud F, Frankish A, Reymond A, Chen CK, Chrast J, Lagarde J, Gilbert JG, Storey R, Swarbreck D, Rossier C, Ucla C, Hubbard T, Antonarakis SE, Guigo R: GENCODE: producing a reference annotation for ENCODE. Genome Biol. 2006, 7 Suppl 1: S4.1-S4.9. 10.1186/gb-2006-7-s1-s4.
- Galaxy. [http://galaxy.psu.edu/]
- Maere S, Heymans K, Kuiper M: BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics. 2005, 21: 3448-3449. 10.1093/bioinformatics/bti551.
- Edwards DR, Handsley MM, Pennington CJ: The ADAM metalloproteinases. Mol Aspects Med. 2008, 29: 258-289. 10.1016/j.mam.2008.08.001.
- Hunt MC, Alexson SE: Novel functions of acyl-CoA thioesterases and acyltransferases as auxiliary enzymes in peroxisomal lipid metabolism. Prog Lipid Res. 2008, 47: 405-421. 10.1016/j.plipres.2008.05.001.
- Levy I, Wu YQ, Roeckel N, Bulle F, Pawlak A, Siegrist S, Mattei MG, Guellaen G: Human testis specifically expresses a homologue of the rodent T lymphocytes RT6 mRNA. FEBS Lett. 1996, 382: 276-280. 10.1016/0014-5793(96)00183-4.
- Garattini E, Mendel R, Romao MJ, Wright R, Terao M: Mammalian molybdo-flavoenzymes, an expanding family of proteins: structure, genetics, regulation, function and pathophysiology. Biochem J. 2003, 372: 15-32. 10.1042/BJ20030121.
- Piehler AP, Wenzel JJ, Olstad OK, Haug KB, Kierulf P, Kaminski WE: The human ortholog of the rodent testis-specific ABC transporter Abca17 is a ubiquitously expressed pseudogene (ABCA17P) and shares a common 5' end with ABCA3. BMC Mol Biol. 2006, 7: 28. 10.1186/1471-2199-7-28.
- Csoka AB, Frost GI, Stern R: The six hyaluronidase-like genes in the human and mouse genomes. Matrix Biol. 2001, 20: 499-508. 10.1016/S0945-053X(01)00172-X.
- Guo N, Mogues T, Weremowicz S, Morton CC, Sastry KN: The human ortholog of rhesus mannose-binding protein-A gene is an expressed pseudogene that localizes to chromosome 10. Mamm Genome. 1998, 9: 246-249. 10.1007/s003359900735.
- Birtle Z, Goodstadt L, Ponting C: Duplication and positive selection among hominin-specific PRAME genes. BMC Genomics. 2005, 6: 120. 10.1186/1471-2164-6-120.
- Kelly RJ, Rouquier S, Giorgi D, Lennon GG, Lowe JB: Sequence and expression of a candidate for the human Secretor blood group alpha(1,2)fucosyltransferase gene (FUT2). Homozygosity for an enzyme-inactivating nonsense mutation commonly correlates with the non-secretor phenotype. J Biol Chem. 1995, 270: 4640-4649. 10.1074/jbc.270.9.4640.
- Meinl W, Glatt H: Structure and localization of the human SULT1B1 gene: neighborhood to SULT1E1 and a SULT1D pseudogene. Biochem Biophys Res Commun. 2001, 288: 855-862. 10.1006/bbrc.2001.5829.
- Caenepeel S, Charydczak G, Sudarsanam S, Hunter T, Manning G: The mouse kinome: discovery and comparative genomics of all mouse protein kinases. Proc Natl Acad Sci USA. 2004, 101: 11707-11712. 10.1073/pnas.0306880101.
- Edgar AJ: Mice have a transcribed L-threonine aldolase/GLY1 gene, but the human GLY1 gene is a non-processed pseudogene. BMC Genomics. 2005, 6: 32. 10.1186/1471-2164-6-32.
- Roach JC, Glusman G, Rowen L, Kaur A, Purcell MK, Smith KD, Hood LE, Aderem A: The evolution of vertebrate Toll-like receptors. Proc Natl Acad Sci USA. 2005, 102: 9577-9582. 10.1073/pnas.0502272102.
- Lindemann L, Ebeling M, Kratochwil NA, Bunzow JR, Grandy DK, Hoener MC: Trace amine-associated receptors form structurally and functionally distinct subfamilies of novel G protein-coupled receptors. Genomics. 2005, 85: 372-385. 10.1016/j.ygeno.2004.11.010.
- Wes PD, Chevesich J, Jeromin A, Rosenberg C, Stetten G, Montell C: TRPC1, a human homolog of a Drosophila store-operated channel. Proc Natl Acad Sci USA. 1995, 92: 9652-9656. 10.1073/pnas.92.21.9652.
- Piehler AP, Hellum M, Wenzel JJ, Kaminski E, Haug KB, Kierulf P, Kaminski WE: The human ABC transporter pseudogene family: Evidence for transcription and gene-pseudogene interference. BMC Genomics. 2008, 9: 165. 10.1186/1471-2164-9-165.
- Graw J, Klopp N, Loster J, Soewarto D, Fuchs H, Becker-Follmann J, Reis A, Wolf E, Balling R, Hrabe de Angelis M: Ethylnitrosourea-induced mutation in mice leads to the expression of a novel protein in the eye and to dominant cataracts. Genetics. 2001, 157: 1313-1320.
- Sheng J, Ding X: Identification of human genes related to olfactory-specific CYP2G1. Biochem Biophys Res Commun. 1996, 218: 570-574. 10.1006/bbrc.1996.0101.
- Steinmetz M, Moore KW, Frelinger JG, Sher BT, Shen FW, Boyse EA, Hood L: A pseudogene homologous to mouse transplantation antigens: transplantation antigens are encoded by eight exons that correlate with protein domains. Cell. 1981, 25: 683-692. 10.1016/0092-8674(81)90175-6.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.