Table 2 KOGs represented by exactly one ortholog in seven analyzed eukaryotic genomes (examples)

From: A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes

KOG number (Predicted) function Multiprotein complex Functional class* Prokaryotic homologs Fitness class (yeast) Fitness class (worm§) Comments
Genes experimentally or computationally characterized previously
0392 SNF2 family DNA-dependent ATPase TBP-DNA complex   Many bacteria and archaea (COG0553) 0 1 Involved in regulation of transcription from POL II promoters [104]
0121 Nuclear cap-binding protein complex, subunit CBP20 (RRM-domain-containing RNA-binding protein) Cap-binding complex A Several bacteria (COG0724) 1 X RRM-domain proteins show scattered presence in bacteria and might have been horizontally transferred from eukaryotes
0213 U2-snRNP associated splicing factor 3b, subunit 1 Spliceosome A None 0 0  
0227 snRNA-associated protein, splicing factor 3a, subunit b (Prp11p) Spliceosome A None 0 0  
2268 Predicted nucleic-acid-binding protein kinase of the RIO1 family; 40S ribosomal subunit biogenesis/18S rRNA processing Pre-40S subunit A Orthologs in most archaea but not in bacteria (COG0478) 0 X One of the very small number of protein kinases that show a clear-cut orthologous relationship between all eukaryotes and most archaea, and, apparently, the only one containing a helix-turn-helix nucleic-acid-binding domain. [105] Associated with yeast pre-40S subunit and required for its maturation. [106]
3031 Protein required for 60S ribosomal subunit biogenesis; [107] contains the IMP4 domain, which is involved in rRNA processing [108]; paralog of KOG3095 and KOG3292, which are also represented in all analyzed genomes. Processosome A Distantly related to COG2136, represented by orthologs in most archaea, but not in bacteria (KSM, unpublished) 0 X The COG2136 proteins appear to be subunits of the predicted archaeal exosome [109]. Apparently, this gene has undergone at least two ancient duplications in eukaryotes
3045 Predicted RNA methylase involved in rRNA processing Processosome? A Distantly related to numerous Rossmann-fold methylases but prokaryotic orthologs could not be confidently identified 1 1 This protein (Rrp8p in yeast) has been shown to participate in the processing of rRNA and sequence analysis reveals the presence of a Rossmann-fold methylase domain [110]. Therefore Rrp8p probably methylates either snoRNA or rRNA itself.
3064 RNA-binding nuclear protein containing a distinct C4 Zn-finger; implicated in the biogenesis of 60S ribosomal subunits [111] Processosome A None 0 0 Initially identified in yeast as the MAK16 protein required for dsRNA virus reproduction [112]
0291, 0302, 0306, 0310, 0319, 0650, 1272 WD40-repeat proteins, subunits of rRNA processing complexes [69, 70] Processosome A WD40-repeat proteins are present in several bacterial lineages and are particularly abundant in cyanobacteria but are missing in most archaea; none of them appear to be obvious orthologs of this protein (COG2319) all 0 X,X,1,X,1,1,1  
0284 Polyadenylation factor I complex, subunit PFS2, WD40-repeat protein Poly-adenylation complex A Same as above (COG2319) 0 X  
0337 RNA helicase involved in 28S rRNA processing Processosome A Most of the archaea and bacteria (COG0513) 0 X  
0343 RNA helicase involved in 28S rRNA processing Processosome A Most of the archaea and bacteria (COG0513) 0 X  
1069 3'-5' exoribonuclease (RNAse PH), exosome subunit Rrp46 Exosome A Most bacteria and archaea (COG0689) 0 1  
1070 Exosome subunit Rrp5 (RNA-binding S1 domain fused to TPR repeats) Exosome A Most bacteria (COG0539, COG0457) 0 1  
1135 mRNA cleavage and polyadenylation complex subunit CFT2 (CPSF) Cleavage and polyadenylation complex A Most archaea and some bacteria (COG1236) 0 0  
1914 mRNA cleavage and polyadenylation factor I complex, subunit RNA14 Cleavage and polyadenylation complex A None 0 X  
1975 RNA (guanine-7-) methyltransferase (capping enzyme subunit) Capping enzyme A Numerous methyltransferases (COG0500) but no ortholog 0 1  
2051 Nonsense-mediated mRNA decay complex, subunit 2 NMD complex A None 1 X  
2554 Pseudouridylate synthase ? A Most archaea and bacteria (COG0101) 1 1  
2613 Upf1p-interacting protein, NMD complex subunit Nmd3p NMD complex A All archaea, no bacteria (COG1499) 0 X  
2771 tRNA-specific adenosine-34 deaminase subunit Tad3p Heterodimeric RNA-specific deaminase A Most bacteria and some archaea (COG0590) 0 X  
2780 Protein involved in ribosomal large subunit assembly (RPF1), contains IMP4 domain Processosome A Most archaea, no bacteria (COG2136) 0 1  
2781 Subunit of the small (ribosomal) subunit (SSU) processosome (snoRNP), IMP4 Processosome A Most archaea, no bacteria (COG2136) 0 1  
2874 Protein involved in rRNA processing and ribosomal assembly ? A All archaea, no bacteria (COG1094) 0 1 Predicted RNA-binding protein containing KH domain
3013 Exosome subunit Rrp4 Exosome A Most archaea, no bacteria (COG1097) 0 X  
3031 Protein involved in large ribosome subunit assembly and 28S rRNA processing (Rrf2) Processosome A None 0 X Contains the BRIX domain
3322 RNAse P/MRP subunit, involved in processing of pre-tRNAs and the 5.8S rRNA RNAse P/MRP holoenzyme A None 0 1  
3448 Predicted snRNP core protein Spliceosome A All archaea, no bacteria (COG1958) 0 1  
3482 Small nuclear ribonucleoprotein (snRNP) SMF subunit Spliceosome A All archaea, no bacteria (COG1958) 0 0  
2463 Predicted RNA-binding protein, consisting of a PIN domain and a Zn-ribbon. Involved in 26S proteasome assembly 26S proteasome, pre-40S subunit A,O Represented by orthologs in all archaea but no bacteria (COG1349) 0 X PIN domain has been detected in exosome subunits and is thought to have RNA-binding properties or even nuclease activity [113, 114]. The demonstration of the role of this protein (Nob1p) in proteasome assembly [115], 40S ribosome subunit assembly, and the processing of 18S rRNA 3'-end [116] supports the connection between degradation of RNA and proteins that seems to have been established already in archaea [109].
3273 Predicted RNA-binding protein containing KH domain, interacts with Nob1p 26S proteasome, pre-40S subunit A,O Orthologs in all archaea but no bacteria (COG1094) 0 0 This is the second predicted RNA-binding protein involved in proteasome assembly, [115] which emphasizes the aforementioned link between RNA and protein processing
1831 Deadenylating 3'-5' exonuclease, negative regulator of PolII transcription CCR4-NOT core complex AK None 0 0  
1159 NADP-dependent flavoprotein reductase, probably sulfite reductase subunit ? CL Many bacteria (COG0369) 0 X Genetic evidence of a role in DNA replication [117]
1800 Ferredoxin/adrenodoxin reductase ? C Most bacteria and some archaea (COG0493) 0 X  
1173 Anaphase-promoting complex (APC), Cdc16 subunit (TPR-repeat protein) APC D Most of archaea and bacteria have TPR-repeat proteins (COG0457) but no orthologs of Cdc16 0 0  
3437 Anaphase-promoting complex (APC), subunit 10 APC D None 1 1  
1358 Serine palmitoyltransferase ? I Most bacteria and some archaea (COG0156) 0 0  
1511 Mevalonate kinase ? I Most archaea and some bacteria (COG1577) 0 X  
3059 N-acetylglucosaminyltransferase complex, subunit PIG-C/GPI2, involved in phosphatidylinositol biosynthesis N-acetylglucosaminyltransferase complex I None 0 1  
0467 Translation elongation factor 2 paralog (GTPase) ? J All (COG0480) 0 X Involved in 60S ribosomal subunit maturation [118]
1147 Glutamyl-tRNA synthetase Multispecificity aminoacyl-tRNA synthetase complex J All (COG0008) 0 X  
2784 Phenylalanyl-tRNA synthetase, beta subunit Heterodimeric phenylalanyl-tRNA synthetase J All (COG0016) 0 X  
3123 Diphthamide synthase (methyltransferase) ? J All archaea, no bacteria (COG1798) 1 1  
0261 RNA polymerase III, largest subunit RNAPIII holoenzyme K All (COG0086) 0 X  
0262 RNA polymerase I, largest subunit RNAPI holoenzyme K All (COG0086) 0 X  
0215 RNA polymerase III, second largest subunit RNAPIII holoenzyme K All (COG0085) 0 X  
0216 RNA polymerase I, second largest subunit RNAPI holoenzyme K All (COG0085) 0 X  
1063 RNA polymerase II elongator complex, subunit ELP2, WD repeat protein RNA polymerase II elongator complex K WD40-repeat proteins are present in several bacterial lineages and are particularly abundant in cyanobacteria but are missing in most archaea; none of them appear to be obvious orthologs of this protein (COG2319) 1 X  
1131 RNA polymerase II transcription initiation/nucleotide excision repair factor TFIIH, 5'-3' helicase subunit RAD3 RNAPII holoenzyme K Most archaea and bacteria (COG1199) 0 X  
1920 RNA polymerase II Elongator subunit RNAP II elongator complex K None 1 X  
1932 TBP-associated factor (Taf2p) TFIID complex K None 0 X  
2009 Transcription initiation factor TFIIIB, Bdp1 subunit (Myb domain) TFIIIB K None 0 0  
2076 RNA polymerase III transcription factor TFIIIC, TPR-repeat-containing protein TFIIIC K Most of archaea and bacteria have TPR-repeat proteins (COG0457) but no orthologs of TFIIIC 0 X  
2487 RNA polymerase II transcription initiation/nucleotide excision repair factor TFIIH, subunit TFB4 TFIIH K None 0 1  
2691 RNA polymerase II subunit 9 RNAP II holoenzyme K Most archaea, no bacteria (COG1594) 1 X  
2807 RNA polymerase II transcription initiation/nucleotide excision repair factor TFIIH, SSL1 subunit TFIIH K No orthologs although von Willebrand A domains are present in a variety of prokaryotic proteins 0 0 Consists of a von Willebrand A domain most closely related to those in the proteasome subunit RPN10 [119] and a Zn-finger domain
2907 RNA polymerase I transcription factor TFIIS, subunit A12.2/RPA12 TFIIS K All archaea, no bacteria (COG1594) 1 0  
3169 RNA polymerase II transcriptional regulation mediator Mediator complex [120] K None 0 X  
3233 RNA polymerase III subunit C34 RNAP III holoenzyme K None 0 1  
3297 RNA polymerase III subunit C25 RNAP III holoenzyme K All archaea, no bacteria (COG1095) 0 0  
3438 Subunit common to RNA polymerases I (A) and III (C); Rpc19p RNAP I and III holoenzymes K   0 1  
3471 RNA polymerase II transcription initiation/nucleotide excision repair factor TFIIH, subunit TFB2 TFIIH K None 0 X  
3490 Transcription elongation factor SPT4, Zn-ribbon protein Chromatin-associated transcription complexes K None 1 1  
3497 RNA polymerase II subunit; Rpb10p RNAP II holoenzyme K All archaea, no bacteria (COG1644) 0 X  
3901 Transcription initiation factor IID subunit (Taf13p) TFIID K None 0 X  
3949 RNA polymerase II elongator complex, subunit ELP4 RNAP II elongator complex K None 1 1  
4086 SOH1 protein potentially involved in Pol II transcription regulation and repair SMCC complex [121] K None 1 X  
1532 Predicted GTPase of the XAB1 family [122] TBP-free TAF(II) complex L All archaea and several bacteria (COG1100) 0 0 XP-A-binding protein in humans, thus implicated in repair ([122] and references therein).
1533 Predicted GTPase of the XAB1 family (paralog of KOG1757) [122] TBP-free TAF(II) complex? L All archaea and several bacteria (COG1100) 0 X Might have a function in repair given the paralogous relationship with KOG1757.
1625 DNA polymerase α processivity subunit, inactivated phosphatase DNA polymerase α holoenzyme L Small subunit of archaeal DNA polymerase II (COG1311) 0 0 The small, regulatory subunit of DNA polymerase α also forms a pan-eukaryotic KOG3044, which is a paralog of KOG0861 (the only recent duplication in KOG3044 is seen in vertebrates). In contrast, another paralog, the small subunit of DNA polymerase ε, is represented in animals, fungi and the early-branching protozoan Plasmodium, but not in plants or Microsporidia. Thus, the history of this polymerase subunit apparently involved inactivation of the phosphatase (or nuclease) inherited from archaea, with subsequent duplications at early stages of eukaryotic evolution [123]
0479 DNA replication licensing factor MCM3 Pre-replication complex L All archaea, no bacteria (COG1241) 0 X  
0481 DNA replication licensing factor MCM5 Pre-replication complex L All archaea, no bacteria (COG1241) 0 X  
0482 DNA replication licensing factor MCM7 Pre-replication complex L All archaea, no bacteria (COG1241) 0 0  
0964 Structural maintenance of chromosome protein 3 (cohesin subunit SMC3) Sister chromatid cohesion complex L Many archaea and bacteria (COG1196) 0 X  
0979 Structural maintenance of chromosome protein 5 (cohesin subunit SMC5) Sister chromatid cohesion complex L Many archaea and bacteria (COG1196) 0 X  
1942 TBP-interacting protein TIP49 (DNA helicase) chromatin remodeling complex L Most of the archaea, no bacteria (COG1224) 0 0  
1979 DNA mismatch repair ATPase, MLH1 Mismatch repair complex L Most bacteria and some archaea (COG0323) 1 1  
2267 DNA primase, large subunit DNA polymerase α:primase complex L All archaea, no bacteria (COG2219) 0 0  
2299 Ribonuclease HI Replisome L All archaea, most bacteria (COG0164) 1 X  
2310 DNA repair exonuclease MRE11 MRN complex involved in double-strand break repair L All archaea, most bacteria (COG0420) 1 1  
2929 Origin recognition complex, subunit 2 (ORC2) ORC L None 1 1  
0179 20S proteasome, regulatory subunit beta type PSMB1/PRE7 (paralog of KOG0185) 20S proteasome O All archaea but only actinomycetes among bacteria (COG0638) 0 0  
0185 20S proteasome, regulatory subunit beta type PSMB4/PRE4 (paralog of KOG0179) 20S proteasome O All archaea but only actinomycetes among bacteria (COG0638) 0 0  
2708 Predicted metalloprotease with chaperone activity (RNAse H/HSP70 fold) [124] Putative complex involved in translation regulation [125] O Represented by orthologs in all archaea and bacteria (COG0533) 0 X One of the few remaining uncharacterized proteins that are universally conserved in all cellular life forms. The only experimentally demonstrated activity is that of sialoglycoprotease but fusion with a distinct protein kinase in several archaea and analysis of gene neighborhood suggest a fundamental role in signal transduction, possibly translation regulation. [125]
0301 Protein required for normal rates of ubiquitin-dependent proteolysis, contains WD40 repeats Proteasome? O Same as above (COG2319) 1 X  
0358 Chaperonin complex component, TCP-1 delta subunit (CCT4) TCP-1 O All archaea and nearly all bacteria (COG0459) 0 0  
0363 Chaperonin complex component, TCP-1 beta subunit (CCT2) TCP-1 O All archaea and nearly all bacteria (COG0459) 0 0  
0687 26S proteasome regulatory complex, subunit RPN7/PSMD6 26S proteasome O None 0 0  
1299 Vacuolar sorting protein VPS45/Stt10 (Sec1 family) t-SNARE complex O None 1 X Involved in t-SNARE complex assembly [126]
1349 GPI-anchor transamidase complex, GPI8 subunit GPI-anchor transamidase complex O Distantly related proteases in some bacteria (no COG) 0 1  
1943 Beta-tubulin folding cofactor D, involved in chromosome segregation ? O None 1 1  
2015 NEDD8-activating complex, UBA3 subunit NEDD8-activating complex O Most bacteria and some archaea (COG0476) 1 1  
2126 Phosphoethanolamine N-methyltransferase involved in GPI-anchor biosynthesis ? O Several bacteria and archaea (COG1524) 0 X  
2884 26S proteasome regulatory complex, subunit RPN10/PSMD4 26S proteasome regulatory complex O No orthologs although von Willebrand A domains are present in a variety of prokaryotic proteins 1 1 Contains von Willebrand A domain
2908 26S proteasome regulatory complex, subunit RPN9/PSMD13 26S proteasome regulatory complex O None 0 0 Contains PINT domain
0209 Endoplasmic reticulum membrane P-type ATPase ? P Many bacteria and some archaea (COG0474) 1 X  
3379 Uncharacterized member of the histidine triad superfamily of nucleotide hydrolases ? R Most archaea and bacteria (COG0537) 1 X Only biochemical function predicted.
2635 Coatomer (COPI) complex delta subunit COPI complex U None 0 0  
2927 Membrane component of ER protein translocation apparatus (Sec62) Sec complex U None 0 1  
2978 Dolichol-phosphate mannosyltransferase ? U All archaea, most bacteria (COG0463) 0 X  
3198 Signal recognition particle, subunit Srp19 Signal recognition particle U All archaea, no bacteria (COG1400) 0 X  
3315 Subunit of the targeting complex (TRAPP) involved in ER to Golgi trafficking TRAPP U None 0 X  
3369 Subunit of the targeting complex (TRAPP) involved in ER to Golgi trafficking TRAPP U None 0 X  
1992 Nuclear export receptor CSE1/CAS (importin beta) ? YU None 0 X  
New functional predictions       
2316 PP-loop family ATP pyrophosphatase domain, which in fungi, plants and insects is fused to a duplicated translation inhibitor domain. The fusion, along with the phyletic pattern of the PP-ATPase domain, suggests an essential function in translation regulation ? A Orthologs of the PP-loop domain are present in all archaea (COG2102) but not in bacteria. Orthologs of the translation inhibitor domain are found in most bacteria and several archaea (COG0251) 1 X PP-loop ATPases have been previously implicated in base thiolation in various RNAs [127] and proteins in this K/COG might have a similar function, which is likely to be conserved in eukaryotes and archaea. However, the fusion with translation inhibitor, which has been reported to have endoribonuclease activity [128] is a eukaryote-specific feature
2523 Predicted RNA-binding protein containing a PUA domain, probable role in RNA modification [129] Putative novel RNA modification complex A Orthologs present in all archaea (COG2016) but not in bacteria 1 X Several of the archaeal orthologs of this protein form fusions with a PP-loop ATPase domain implicated in base thiolation [127]. Thus, the proteins of this KOG might interact with those of KOG2840 (pan-eukaryotic, duplications in Arabidopsis and worm) or KOG2594 (missing in humans and microsporidia) to form a novel enzymatic complex involved in RNA modification
0270, 0271, 1539 WD40-repeat proteins Processosome A WD40-repeat proteins are present in several bacterial lineages and are particularly abundant in cyanobacteria but are missing in most archaea; none of them appear to be obvious orthologs of this protein (COG2319) all 0 X,1,X By analogy with other conserved WD40-repeat proteins, predicted to be subunits of rRNA processing/ribosome assembly complexes
2321 Nucleolar protein, contains WD40 repeats rRNA processosome? A WD40-repeat proteins are present in several bacterial lineages and are particularly abundant in cyanobacteria but are missing in most archaea; none of them appear to be obvious orthologs of this protein (COG2319) 0 1 Probable subunit of an rRNA-processing complex
1763 Uncharacterized conserved protein containing a CCCH Zn-finger; possible role in RNA processing or splicing ? A None 1 1 CCCH fingers have been shown to bind 3' untranslated regions in various mRNAs [130, 131]
2837 Protein containing a U1-type, RNA-binding C2H2 Zn-finger. Probable role in RNA splicing/processing Spliceosome? A None 0 0 U1-type fingers are essential for the assembly of U1 RNP [132]
3073 Predicted RNA-binding protein containing PIN domain and involved in 18S rRNA processing Pre-40S subunit A Most archaea, no bacteria (COG1412) 0 1 Interacts with Nop14p and is required for 40S subunit biogenesis and 18S rRNA maturation (11694595). The presence of the PIN domain suggests RNA-binding and, possibly, RNAse activity
3154 Uncharacterized protein with potential function in translation or ribosomal biogenesis Pre-40S subunit? A? Most archaea, no bacteria (COG2042) 1 X The general functional prediction stems from the observation that the gene for this protein forms a predicted conserved operon with the gene for ribosomal protein L40E in several archaeal genomes
3214 Small protein containing a Zn-ribbon, possibly RNA-binding; potential role in RNA processing or transcription regulation ? A? Conserved in Crenarchaeota (COG4888) 1 1  
3800 Predicted E3 ubiquitin ligase containing RING finger, subunit of transcription/repair factor TFIIH and CDK-activating kinase assembly factor TFIIH KO None 0 X  
3176 Predicted α-helical protein, possibly involved in replication/repair; paralog of KOG3636 A novel complex with PCNA involved in replication? L? Conserved in most (possibly all) archaea but not in bacteria (COG1711) 0 X A function in DNA replication/repair and/or transcription is suggested by the analysis of the genome context of archaeal orthologs which form an evolutionarily conserved association with the genes for replication sliding clamp (PCNA ortholog) (K.S.M. and E.V.K., unpublished work)
3303 Predicted α-helical protein, possibly involved in replication/repair transcription; paralog of KOG3508 A novel complex with PCNA involved in replication? L? Conserved in most (possibly all) archaea but not in bacteria (COG1711) 0 0 A function in DNA replication/repair and/or transcription is suggested by the analysis of the genome context of archaeal orthologs which form an evolutionarily conserved association with the genes for replication sliding clamp (PCNA ortholog) (K.S.M. and E.V.K., unpublished work)
0396 Predicted E3 ubiquitin ligase Ub ligase O None 1 1 The proteins in this KOG contain a modified RING domain, which might not be capable of metal-binding similarly to the U-box domain [133] that has been shown to function as E3 [134]
1443 Multitransmembrane protein, predicted drug/metabolite transporter ? R Most archaea and bacteria (COG0697) 1 X  
2647 Multitransmembrane protein, potential transporter ? R Most bacteria and some archaea (COG0628) 0 1  
2488 Predicted N-acetyltransferase ? R Most archaea and bacteria (COG0454) 1 X Putative role in ribosomal maturation?
3347 Predicted nucleotide kinase; nuclear protein (Fap7p) ? R Conserved in all archaea but not in bacteria (COG1936) 0 1 Involved in oxidative stress response in yeast [135]
3974 Predicted sugar kinase Putative novel complex with KOG2585 proteins R All archaea and most bacteria (COG0063) 1 1 Based on fusions seen in prokaryotes, predicted to interact functionally and, possibly, physically with uncharacterized proteins of KOG2585 (represented in all eukaryotes but includes paralogs in some species)
No functional prediction       
2318 Uncharacterized conserved protein ? S None 0 1  
3237 Uncharacterized conserved protein containing coiled-coil domain ? S None 0 1 Coiled-coil domains are often involved in complex assembly; this could be an uncharacterized component of the chromatin or the spliceosome
*Abbreviations for the functional categories are as in Figure 3. 0, essential gene (lethal knockout); 1, non-essential gene (non-lethal knockout); X, no data available for the given gene. Data from [85]. §Data from [86].
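The fitness-class codes defined in the footnote can be decoded mechanically. The following is a minimal sketch in Python, not part of the original article: the names `FITNESS_CODES` and `decode_fitness` are hypothetical, and the handling of the "all 0" shorthand (used in rows that list several KOGs) assumes it means that every KOG in the row carries the same code.

```python
from typing import List

# Code meanings as given in the table footnote.
FITNESS_CODES = {
    "0": "essential (lethal knockout)",
    "1": "non-essential (non-lethal knockout)",
    "X": "no data available",
}


def decode_fitness(cell: str, n_kogs: int = 1) -> List[str]:
    """Decode one fitness-class cell such as '0', 'X,X,1' or 'all 0'.

    n_kogs is the number of KOGs listed in the row; it is only used to
    expand the 'all <code>' shorthand (an assumption about its meaning).
    """
    text = cell.strip()
    if text.startswith("all "):
        # Repeat the single code once per KOG listed in the row.
        codes = [text[4:].strip()] * n_kogs
    else:
        # Comma-separated codes, one per KOG, in the order the KOGs are listed.
        codes = [part.strip() for part in text.split(",")]
    return [FITNESS_CODES[code] for code in codes]


# Example: the worm column of the row listing seven WD40-repeat KOGs.
worm = decode_fitness("X,X,1,X,1,1,1")
yeast = decode_fitness("all 0", n_kogs=7)
```

An unknown code raises `KeyError`, which is a deliberate choice here so that a garbled cell is noticed rather than silently skipped.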