Quod erat demonstrandum? The mystery of experimental validation of apparently erroneous computational analyses of protein sequences
© Iyer et al, licensee BioMed Central Ltd 2001
- Received: 3 July 2001
- Accepted: 4 October 2001
- Published: 13 November 2001
Computational predictions are critical for directing the experimental study of protein functions. Therefore it is paradoxical when an apparently erroneous computational prediction seems to be supported by experiment.
We analyzed six cases where application of novel or conventional computational methods for protein sequence and structure analysis led to non-trivial predictions that were subsequently supported by direct experiments. We show that, on all six occasions, the original prediction was unjustified, and in at least three cases, an alternative, well-supported computational prediction, incompatible with the original one, could be derived. The most unusual cases involved the identification of an archaeal cysteinyl-tRNA synthetase, a dihydropteroate synthase and a thymidylate synthase, for which experimental verifications of apparently erroneous computational predictions were reported. Using sequence-profile analysis, multiple alignment and secondary-structure prediction, we have identified the unique archaeal 'cysteinyl-tRNA synthetase' as a homolog of extracellular polygalactosaminidases, and the 'dihydropteroate synthase' as a member of the β-lactamase-like superfamily of metal-dependent hydrolases.
In each of the analyzed cases, the original computational predictions could be refuted and, in some instances, alternative strongly supported predictions were obtained. The nature of the experimental evidence that appears to support these predictions remains an open question. Some of these experiments might signify discovery of extremely unusual forms of the respective enzymes, whereas the results of others could be due to artifacts.
- Computational Prediction
- Viral Movement Protein
- Archaeal Methanogen
- Polysaccharide Hydrolase
- Bacterial Orthologs
The availability of a large number of protein sequences, including complete protein sets encoded in diverse genomes, and the rapidly growing database of protein structures have already greatly impacted on our understanding of the evolution of protein structure and function [1,2]. This process has been aided by the development of powerful algorithms and sensitive computational tools for detecting sequence and structural similarities between proteins. In particular, methods that extract information from multiple alignments to construct various types of sequence profiles and use the resulting sequence profiles for iterative database searching, such as PSI-BLAST and Hidden-Markov-Model (HMM)-based approaches, have substantially improved the detection of subtle similarities between proteins that previously were amenable only to direct structural comparison [3,4]. The sensitivity and accuracy of these methods have been extensively tested and statistical approaches for validating the observed similarities are available [5,6,7,8,9,10,11].
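The statistical validation of similarities mentioned above rests largely on Karlin-Altschul theory for local alignment scores. For a raw score S between a query of length m and a database of total length n, the expected number of chance hits is E = K·m·n·e^(-λS), and the normalized (bit) score is S' = (λS - ln K)/ln 2, so that E = m·n·2^(-S'). The sketch below computes both forms. The λ and K values in the example are assumptions (values commonly quoted for gapped BLOSUM62 with gap costs 11/1), the query and database sizes are hypothetical, and real search programs additionally correct m and n for edge effects, which is omitted here.

```python
import math

def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized (bit) score S' = (lambda*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def evalue(raw_score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul expected number of chance hits, E = K*m*n*exp(-lambda*S).

    m is the query length and n the total database length in residues.
    No edge-effect correction of m and n is applied in this sketch.
    """
    return k * m * n * math.exp(-lam * raw_score)

def evalue_from_bits(bits: float, m: int, n: int) -> float:
    """Equivalent form of the same quantity, E = m*n*2^(-S')."""
    return m * n * 2.0 ** (-bits)

if __name__ == "__main__":
    # Assumed, illustrative parameters (gapped BLOSUM62, gap open 11, extend 1).
    LAM, K = 0.267, 0.041
    m, n = 300, 200_000_000  # hypothetical query length and database size
    for s in (60, 80, 100):
        print(s, round(bit_score(s, LAM, K), 1), f"{evalue(s, m, n, LAM, K):.2e}")
```

Because E scales with m·n, the same raw score becomes less significant as the database grows, which is one reason E-values rather than raw scores are used as inclusion thresholds.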
Despite these achievements, detection and interpretation of relationships between homologous proteins that have limited sequence similarity remains a major challenge. Such studies typically require a case-by-case approach that is guided by a detailed understanding of protein sequence-structure patterns and is rooted in the biology of the proteins analyzed. Prediction of structures and function(s) of uncharacterized proteins is one of the principal outcomes of these analyses, and experimental verification of such predictions tends to increase confidence in the validity of sequence-structure comparative approaches. The negative feedback from experiments that failed to confirm a computational prediction is potentially even more important, because it could result in revision and refinement of the computational methods.
When examining cases of reported prediction followed by experimental validation, however, we encountered several paradoxical situations. In each of these, a prediction that has been reportedly confirmed by experiment was incompatible with results obtained with several standard computational procedures. More importantly, alternative predictions, supported by statistically significant sequence and/or structural similarity, were made in some of these cases. Here we present several such mysteries, describe the refutation of the original predictions and the new predictions, wherever feasible, and discuss the discrepancy between the computational and experimental results. The choice of the cases was not systematic; rather, those chosen were notable because they relied on novel computational techniques, exploited particularly subtle sequence or structural motifs, and dealt with crucial biological problems.
MJ1477: a predicted archaeal cysteinyl-tRNA synthetase
Aminoacyl-tRNA synthetases (aaRSs) specific for 17 of the 20 amino acids are universally present in cellular life forms. The three exceptions are GlnRS, AsnRS and CysRS. GlnRS and AsnRS are missing in many bacteria and archaea because glutamine and asparagine are incorporated into proteins through tRNA-dependent transamidation of glutamate and aspartate, respectively. CysRS is missing in two archaeal methanogens whose genomes have been sequenced: Methanobacterium thermoautotrophicum and Methanococcus jannaschii. No alternative mechanism for cysteine incorporation into proteins is known; hence the absence of CysRS in these organisms was an enigma.
Two solutions to this puzzle, both unusual, have recently been proposed and experimentally validated. One involves non-orthologous gene displacement, a situation in which the same essential function is carried out by distantly related or even unrelated proteins in different organisms [13,14]. It has been shown that M. jannaschii ProRS, a class II synthetase that is unrelated to the class I CysRS, substitutes for the missing CysRS activity [15,16,17]. The other solution involved a new candidate for the role of CysRS, the MJ1477 protein from M. jannaschii. This protein and its orthologs (direct evolutionary counterparts related by vertical descent from a common ancestor) from the bacteria Thermotoga maritima and Deinococcus radiodurans were identified as 'distant orthologs' of the Bacillus subtilis CysRS by using a computational method specifically designed to detect distantly related orthologs. The method applies discriminant analysis to alignment scores in order to separate the scores for pairs of functionally identical proteins from different genomes from the scores for pairs of proteins with different functions. This prediction was then validated experimentally by showing that MJ1477 had CysRS activity in vitro and that an ortholog of MJ1477 from D. radiodurans, DR0705, complemented a CysRS-deficient, temperature-sensitive lethal E. coli mutant strain. An important corollary of these surprising findings is that the MJ1477 family must have diverged from CysRS so rapidly that all the catalytic and otherwise functionally important residues characteristic of this enzyme, and also present in other class I aaRSs, have changed. Furthermore, MJ1477 and its orthologs lack the accessory domains found in all known CysRSs, namely the DALR domain (named after a distinct amino-acid signature), which is shared by aaRSs of several specificities, and another domain specific to CysRS.
Therefore we are forced to conclude that MJ1477 and its homologs are not related to CysRS and there is nothing in the computational analysis of these proteins that would point to an aaRS activity. In contrast, we predict these proteins to be extracellular polygalactosaminidases or similar polysaccharide hydrolases. The polysaccharide hydrolase and aaRS functions seem to be essentially incompatible. First, a secreted enzyme is unlikely to function as an aaRS whose site of action is, by definition, intracellular. Second, even if an entirely new class of aaRSs is postulated, the reaction catalyzed by this new aaRS does not resemble polysaccharide hydrolysis or its reversal. Aminoacyl-tRNA synthetases catalyze a succession of reactions, which involve: hydrolysis of the α-β phosphate bond in ATP; condensation of AMP with the cognate amino acid, resulting in the formation of an aminoacyl-adenylate; displacement of the AMP moiety of the aminoacyl-adenylate with the cognate tRNA, producing aminoacyl-tRNA. Even if the two condensation reactions, in very general terms, could be considered a reversal of the polysaccharide hydrolysis reaction, there is no indication that polysaccharide hydrolases could bind and hydrolyze ATP, and the multiple alignment of the MJ1477 family did not include any conserved signatures typical of potential phosphate-binding loops (Figure 1). Neither does this family contain any recognizable RNA-binding domains. Finally, M. thermoautotrophicum does not encode any homologs of MJ1477, ruling out the possibility that this family encompasses CysRS of both archaeal methanogens. Taken together, these observations appear to effectively refute the prediction of a CysRS activity, thus pitting computational results against experimental data.
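Checking sequences for phosphate-binding signatures can be partly automated with pattern matching. The sketch below scans sequences for the Walker A (P-loop) pattern as written in PROSITE entry PS00017, [AG]-x(4)-G-K-[ST], and for a simplified H-x-G-H form of the class I aaRS 'HIGH' signature. The simplified HIGH pattern and the toy sequences are assumptions for illustration only. A single pattern hit is weak evidence by itself, and the absence of hits is informative mainly when conserved positions are examined across a whole family alignment, as was done for the MJ1477 family.

```python
import re

# PROSITE PS00017 (ATP/GTP-binding site motif A, Walker A / P-loop): [AG]-x(4)-G-K-[ST]
WALKER_A = re.compile(r"(?=([AG].{4}GK[ST]))")
# Simplified class I aaRS "HIGH"-like signature (assumption: real HIGH motifs are more variable)
HIGH_LIKE = re.compile(r"(?=(H.GH))")

def scan(seq: str) -> dict[str, list[tuple[int, str]]]:
    """Return 1-based start positions and matched text for each motif.

    The patterns are wrapped in lookaheads so that overlapping matches
    are all reported, not just non-overlapping ones.
    """
    seq = seq.upper()
    hits = {}
    for name, pattern in (("Walker A", WALKER_A), ("HIGH-like", HIGH_LIKE)):
        hits[name] = [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq)]
    return hits

if __name__ == "__main__":
    # Hypothetical toy sequences, not real proteins.
    for s in ("MGPNGSGKSL", "PLYVGIHIGHLR"):
        print(s, scan(s))
```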
MJ0301: a predicted dihydropteroate synthase
MJ0757: a predicted thymidylate synthase
An alternative TS or its subunit is predicted to be encoded by a gene from Dictyostelium that rescues a slime mold mutant auxotrophic for thymidylate. This protein is not homologous to the canonical TS, but its orthologs in bacteria and archaea (COG1351) show a phyletic distribution that is almost perfectly complementary to that of the canonical TS.
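A complementary phyletic distribution can be quantified from the presence/absence profiles of two protein families across a set of genomes. The sketch below scores the fraction of shared genomes in which exactly one of the two families is present and lists the genomes that break the pattern. The genome names and profiles are hypothetical placeholders, not data taken from COG1351 or from the COG of the canonical TS.

```python
def complementarity(profile_a: dict[str, bool], profile_b: dict[str, bool]) -> float:
    """Fraction of shared genomes in which exactly one of the two families is present.

    1.0 means perfectly complementary (never together, never both absent);
    lower values indicate co-occurrence or joint absence.
    """
    genomes = profile_a.keys() & profile_b.keys()
    if not genomes:
        raise ValueError("profiles share no genomes")
    exclusive = sum(profile_a[g] != profile_b[g] for g in genomes)
    return exclusive / len(genomes)

def discordant_genomes(profile_a: dict[str, bool], profile_b: dict[str, bool]) -> list[str]:
    """Genomes that break complementarity: both families present, or both absent."""
    shared = profile_a.keys() & profile_b.keys()
    return sorted(g for g in shared if profile_a[g] == profile_b[g])

if __name__ == "__main__":
    # Hypothetical presence/absence profiles across five genomes.
    canonical_ts = {"g1": True, "g2": True, "g3": False, "g4": False, "g5": True}
    alternative_ts = {"g1": False, "g2": False, "g3": True, "g4": True, "g5": True}
    print(complementarity(canonical_ts, alternative_ts))
    print(discordant_genomes(canonical_ts, alternative_ts))
```

An 'almost perfect' complementary pattern corresponds to a value close to 1; the discordant genomes are the ones worth inspecting individually.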
Cmpp16: a plant 'paralog' of plant viral movement proteins
Viral movement proteins (MPs) are encoded by diverse, unrelated families of plant viruses, such as positive-strand RNA, negative-strand RNA, single-stranded DNA and double-stranded DNA viruses, and are essential for cell-to-cell movement of all these viruses [31,32]. To isolate potential host homologs of the red clover necrotic mosaic virus (RCNMV) MP, antibodies to this protein were used to screen phloem extracts of Cucurbita maxima, resulting in the detection of a protein designated Cmpp16. This protein was identified as a 'paralog' (generally, this term refers to homologous genes related by duplication within the same genome) of the viral MPs on the basis of sequence similarity detected using the Megalign program. Subsequently, Cmpp16 was shown to bind RNA, which is a common property of viral MPs, and to induce an increase in the size-exclusion limit of plasmodesmata, also a mechanism associated with the MPs.
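One common way to ask whether a pairwise similarity reported by an alignment program is meaningful is a shuffle test: the real alignment score is compared with scores obtained after randomly shuffling one sequence, which preserves amino-acid composition but destroys order. The sketch below uses a Smith-Waterman local alignment with a deliberately simple scoring scheme (match +2, mismatch -1, linear gap -2) instead of a substitution matrix such as BLOSUM62. These parameters, the number of shuffles and the example sequence are assumptions for illustration, and a shuffle Z-score is only a rough guide compared with the E-value statistics of BLAST-type searches.

```python
import random
import statistics

def sw_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # row i-1 of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best

def shuffle_zscore(a: str, b: str, n_shuffles: int = 200, seed: int = 0) -> float:
    """Z-score of the real score relative to scores against shuffled copies of b."""
    rng = random.Random(seed)
    real = sw_score(a, b)
    letters = list(b)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        null_scores.append(sw_score(a, "".join(letters)))
    sd = statistics.stdev(null_scores)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance")
    return (real - statistics.mean(null_scores)) / sd
```

A score several standard deviations above the shuffled mean is consistent with homology but does not prove it, whereas a score within the shuffled range offers no support for a homologous relationship.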
Human activating transcription factor-2 (ATF-2): a predicted histone acetyltransferase
Histone acetyltransferases (HATs) are key regulators of eukaryotic transcription. GCN5-like HATs, which modulate chromatin-associated transcription, belong to a vast superfamily of amino-group acetyl- and myristoyl-transferases with extremely diverse functions. ATF-2 is a basic leucine zipper (b-ZIP) family transcription factor that binds to cyclic AMP-response elements (CREs) and activates transcription. Vertebrate ATF-2 also has an amino-terminal zinc finger, which is involved in transcription activation. Non-vertebrate orthologs of ATF-2, in Drosophila, Caenorhabditis elegans and yeasts, lack the zinc finger. In experiments designed to isolate an ATF-2-associated HAT, ATF-2 alone was shown to be sufficient for the acetyltransferase activity. On examining the region of ATF-2 that showed HAT activity, the authors found some sequence similarity and at least one motif resembling those of the acetyltransferase superfamily, and concluded that ATF-2 contained a GCN5-like acetyltransferase domain. Subsequent site-directed mutagenesis supported the importance of the reported acetyltransferase motifs for the HAT activity of ATF-2.
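Short motifs are weak evidence of homology because they are expected to occur by chance in many proteins of moderate length. The sketch below estimates the expected number of chance matches of a degenerate pattern, assuming independent residues with uniform background frequencies (1/20 each). As an example it uses the pattern Q/R-x-x-G-x-G/A, often cited as the acetyl-CoA-binding motif A of GCN5-related acetyltransferases; this consensus is taken here as an assumption and is not necessarily the motif reported for ATF-2.

```python
# A pattern is a list of positions; each position is a set of allowed residues,
# or None for a wildcard ("x").
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def match_probability(pattern, freqs=None):
    """Probability that one random window matches the pattern (independent residues)."""
    if freqs is None:
        # Uniform background frequencies (an assumption; real proteins are not uniform).
        freqs = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    p = 1.0
    for allowed in pattern:
        if allowed is not None:
            p *= sum(freqs[aa] for aa in allowed)
    return p

def expected_matches(pattern, seq_len, freqs=None):
    """Expected number of chance matches in a sequence of length seq_len."""
    windows = max(0, seq_len - len(pattern) + 1)
    return windows * match_probability(pattern, freqs)

# Q/R-x-x-G-x-G/A (assumed consensus for GNAT motif A)
MOTIF_A = [set("QR"), None, None, set("G"), None, set("GA")]

if __name__ == "__main__":
    for length in (200, 500, 1000):
        print(length, round(expected_matches(MOTIF_A, length), 3))
```

Under these assumptions a 500-residue protein carries about 0.25 expected chance matches, so (by a Poisson approximation) roughly one protein in five of that size would contain such a motif by chance. This is why conserved sequence context and secondary structure, rather than an isolated motif, are needed to support a domain assignment.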
Predicted PAS domain in the phytochrome-interacting transcription factor PIF3
PAS domains are sensory modules in various signal transduction proteins from all major lineages of cellular life. PAS domains are typically implicated in sensing oxygen, redox potential, light and small ligands. In addition, PAS domains are sites for protein-protein interactions and are responsible for the formation of homo- and hetero-dimers in several signal transduction pathways that involve transcriptional activation. A PAS domain has been reported in the transcription factor PIF3 from Arabidopsis, which interacts with a phytochrome photoreceptor and transduces light signals to photoresponsive plant genes. It has been hypothesized that the purported PAS domain of PIF3 directly interacts with the PAS domains of the phytochrome. This hypothesis was later tested experimentally and evidence was presented that the PAS domain of PIF3 indeed was a major contributor to the interaction between the two proteins.
In the six cases described above, we provide evidence for rejecting the homologous relationships and functional predictions that were inferred for the proteins in question by using computational methods. The number of examples in this category could be increased, and some have already been considered in the literature, for example the spurious discovery of a 'functional PDZ domain' in the molecular chaperone ClpA (Levchenko et al.; see the refutation by Neuwald et al.) or the finding of an ATPase domain and death effector domains in the apoptosis-associated protein FLASH (Imai et al.; see the refutation by Koonin et al.). The common and most striking aspect of all these cases is that the predictions based on apparently erroneous computational analysis were supported by experiments. What are the solutions to this clash between computational and experimental evidence?
We envisage three main possibilities. The first, experiment-centered view would hold that experimental evidence always has the upper hand and that, even if the alternative computational solutions that we describe here seem more plausible than the original predictions, the latter are correct insofar as they are supported by experiment. Epistemologically, this argument is not sound, because hypotheses (computational predictions in this case) cannot be proved by the success of the experiments they prompt; they can only be falsified by experiments producing results incompatible with the predictions. Simply put, the experiments could have worked for the wrong reason. This seems particularly likely, for example, in the case of the site-directed mutagenesis of the transcription factor ATF-2 discussed above. The mutated residues probably are indeed important for the function of this protein, but not because they are part of a GCN5-like acetyltransferase domain, which this protein does not contain. Similar logic applies to the case of the predicted, but apparently nonexistent, PAS domain in the transcription factor PIF3. More important, however, computational predictions are falsifiable within the realm of computational analysis itself. Falsification is offered by alternative, unequivocally supported predictions that are incompatible with the original ones. In four of the six cases described (CysRS, DHPS, TS and MP), such evidence was obtained by computational methods.
The second possibility is that, although the computational predictions described here are correct, whereas the original ones are wrong, the experimental evidence is also solid. In each of the described cases, this would elevate the biochemical activities identified through these experiments to the status of major, unexpected discoveries, because the chemistry underlying them would have to be extremely unusual. In particular, if the identification of the M. jannaschii cysteinyl-tRNA synthetase is indeed correct, this enzyme would have to be a derivative of a specific family of polysaccharide hydrolases containing a signal peptide but no recognizable ATP-binding or RNA-binding domains.
The third explanation is that the original computational predictions triggered over-interpretation of experimental results that, in reality, might have been due to nonspecific activities, contamination or other artifacts. In this regard, it is important to realize that not only computational predictions but also biological experiments are intrinsically error-prone and open to conflicting interpretations. The probabilistic nature of computational analyses is well appreciated (and at times, perhaps, overrated) by most researchers, probably because explicit calculation of probability or likelihood is at the core of most widely used computer methods for sequence and structure analysis. Likewise, the alternative computational predictions presented here should be regarded as 'more likely' than the original ones, rather than as contradicting the latter in an absolute sense. As we attempted to show above, however, the difference in likelihood between two mutually incompatible predictions can be overwhelming when one is supported by multiple lines of evidence and the other is not. In contrast to computational studies, experimental ones are often, consciously or unconsciously, treated as a demonstration of 'final truth'. In reality, however, probabilistic inference is inherent in practically any interpretation of experimental results, whenever questions are asked such as "How likely is it that the protein under study has a particular biochemical activity in vivo?" or "How central is this activity to the in vivo function of the protein under study, given the results of a surrogate in vitro assay?" Thus, certain experimental designs may not be appropriate for ascertaining the actual in vivo biochemistry of a protein. Furthermore, even if the particular activities detected under these conditions are genuine, the likelihood of their being relevant in vivo needs to be assessed separately.
Accordingly, when strong computational predictions seem not to be borne out by experiment, the conditions and design of the experiments deserve special scrutiny: they might have given a negative result for a wrong reason. A case in point is the MJ0107 protein, the apparent archaeal ortholog of DHPS, which failed to show dihydropteroate synthase activity. We strongly believe that this issue needs to be revisited. All this considered, the results of independent application of computational and experimental techniques tend to be complementary, and useful in adding or reducing confidence in the biological conclusions of a particular study.
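The way independent lines of evidence add to or reduce confidence can be made explicit with Bayes' rule in odds form: the posterior odds of a hypothesis equal its prior odds multiplied by the likelihood ratio of each piece of evidence, provided the pieces are conditionally independent. The sketch below sums log likelihood ratios. The prior and likelihood ratios in the example are hypothetical numbers chosen only to show the arithmetic, and in practice evidence from related computational methods is rarely fully independent.

```python
import math

def posterior_probability(prior: float, likelihood_ratios: list[float]) -> float:
    """Posterior P(H | evidence) from a prior P(H) and likelihood ratios
    P(e_i | H) / P(e_i | not H), assuming conditionally independent evidence."""
    if not 0 < prior < 1:
        raise ValueError("prior must be strictly between 0 and 1")
    log_odds = math.log(prior / (1 - prior))
    for lr in likelihood_ratios:
        if lr <= 0:
            raise ValueError("likelihood ratios must be positive")
        log_odds += math.log(lr)  # each independent observation adds its log ratio
    return 1 / (1 + math.exp(-log_odds))

if __name__ == "__main__":
    # Hypothetical: three independent observations, each favoring H tenfold.
    print(round(posterior_probability(0.5, [10, 10, 10]), 4))
    # Hypothetical: one observation favoring the alternative tenfold.
    print(round(posterior_probability(0.5, [0.1]), 4))
```

With these hypothetical numbers, three independent tenfold observations move an even prior to about 99.9%, whereas a single observation against the hypothesis lowers it to about 9%.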
Finally, it should be emphasized that these cautionary notes on application of computational methods in protein function prediction in no way suggest that new computational approaches that depart sharply from more established ones are doomed to failure. Indeed, the most popular advanced search methods based on sequence profiles - PSI-BLAST and Hidden Markov Model (HMM) search - are rather recent innovations [11,51,52]. Furthermore, methods based on a different principle, such as protein sequence-structure threading, have a recent history of success despite uncertainties in their statistical foundations [22,53,54,55,56]. It does seem, however, that when a structurally and functionally plausible prediction is produced, with a high confidence, by a well tested, statistically sound computational method, an incompatible prediction yielded by a new method without a clear statistical foundation is most likely to be incorrect.
The non-redundant protein-sequence database at the National Center for Biotechnology Information (NCBI) was searched using the gapped version of the BLAST program. Sequence-profile searches were carried out using the PSI-BLAST program, with the cut-off for inclusion of sequences into the profile set at E = 0.01 [3,9], and the HMMER program package. Multiple alignments of amino-acid sequences were generated using the T-Coffee program. Protein secondary-structure predictions were generated using the PHD program [59,60], with multiple alignments of individual protein families used as queries. Sequence-structure threading was carried out using the combined-fold-prediction algorithm or the 3D-PSSM algorithm, which is based on a three-dimensional position-specific scoring matrix. Signal peptides in protein sequences were predicted using the SignalP program. The COG database [62,63] was used as a source of information on orthologous relationships between proteins.
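Profile searches such as PSI-BLAST score database sequences against a position-specific scoring matrix (PSSM) derived from a multiple alignment. The sketch below builds a simple base-2 log-odds PSSM from ungapped aligned sequences, using a uniform background and one pseudocount per residue type, and then finds the best-scoring ungapped window in a query. PSI-BLAST itself uses sequence weighting, substitution-matrix-based pseudocounts and real background frequencies, so this illustrates the principle only; the toy alignment and query are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(alignment: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Per-column log-odds scores log2(q_ia / p_a) with a uniform background p_a."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        residues = [s[col] for s in alignment if s[col] in AMINO_ACIDS]  # gaps skipped
        total = len(residues) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            q = (residues.count(aa) + pseudocount) / total  # smoothed frequency
            column[aa] = math.log2(q / background)
        pssm.append(column)
    return pssm

def score_window(pssm: list[dict[str, float]], window: str) -> float:
    """Sum of position-specific scores for a window of the same length as the PSSM."""
    if len(window) != len(pssm):
        raise ValueError("window length must match PSSM length")
    return sum(col[aa] for col, aa in zip(pssm, window))

def best_hit(pssm: list[dict[str, float]], seq: str) -> tuple[int, float]:
    """0-based start and score of the highest-scoring ungapped window in seq."""
    if len(seq) < len(pssm):
        raise ValueError("sequence shorter than profile")
    width = len(pssm)
    scores = [(i, score_window(pssm, seq[i:i + width])) for i in range(len(seq) - width + 1)]
    return max(scores, key=lambda t: t[1])

if __name__ == "__main__":
    toy_alignment = ["GKST", "GKSS", "GKTT", "AKST"]  # hypothetical aligned block
    profile = build_pssm(toy_alignment)
    print(best_hit(profile, "MMGKSTMM"))
```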
- Bork P, Dandekar T, Diaz-Lazcoz Y, Eisenhaber F, Huynen M, Yuan Y: Predicting function: from genes to genomes and back. J Mol Biol. 1998, 283: 707-725. 10.1006/jmbi.1998.2144.
- Koonin EV, Aravind L, Kondrashov AS: The impact of comparative genomics on our understanding of evolution. Cell. 2000, 101: 573-576.
- Aravind L, Koonin EV: Gleaning non-trivial structural, functional and evolutionary information about proteins by iterative database searches. J Mol Biol. 1999, 287: 1023-1040. 10.1006/jmbi.1999.2653.
- Murzin AG: Progress in protein structure prediction. Nat Struct Biol. 2001, 8: 110-112. 10.1038/84088.
- Karlin S, Bucher P, Brendel V, Altschul SF: Statistical methods and insights for protein and DNA sequences. Annu Rev Biophys Biophys Chem. 1991, 20: 175-203. 10.1146/annurev.bb.20.060191.001135.
- Karlin S, Brendel V: Chance and statistical significance in protein and DNA sequence analysis. Science. 1992, 257: 39-49.
- Karlin S, Altschul SF: Applications and statistics for multiple high-scoring segments in molecular sequences. Proc Natl Acad Sci USA. 1993, 90: 5873-5877.
- Karlin S: Statistical studies of biomolecular sequences: score-based methods. Phil Trans R Soc Lond B. 1994, 344: 391-402.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
- Altschul SF, Bundschuh R, Olsen R, Hwa T: The estimation of statistical parameters for local alignment score distributions. Nucleic Acids Res. 2001, 29: 351-361. 10.1093/nar/29.2.351.
- Durbin R, Eddy S, Krogh A, Mitchison G: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge, UK: Cambridge University Press; 1998.
- Ibba M, Soll D: The renaissance of aminoacyl-tRNA synthesis. EMBO Rep. 2001, 2: 382-387.
- Koonin EV, Mushegian AR, Bork P: Non-orthologous gene displacement. Trends Genet. 1996, 12: 334-336. 10.1016/0168-9525(96)20010-1.
- Galperin MY, Walker DR, Koonin EV: Analogous enzymes: independent inventions in enzyme evolution. Genome Res. 1998, 8: 779-790.
- Stathopoulos C, Li T, Longman R, Vothknecht UC, Becker HD, Ibba M, Soll D: One polypeptide with two aminoacyl-tRNA synthetase activities. Science. 2000, 287: 479-482. 10.1126/science.287.5452.479.
- Lipman RS, Sowers KR, Hou YM: Synthesis of cysteinyl-tRNA(Cys) by a genome that lacks the normal cysteine-tRNA synthetase. Biochemistry. 2000, 39: 7792-7798. 10.1021/bi0004955.
- Stathopoulos C, Jacquin-Becker C, Becker HD, Li T, Ambrogelly A, Longman R, Soll D: Methanococcus jannaschii prolyl-cysteinyl-tRNA synthetase possesses overlapping amino acid binding sites. Biochemistry. 2001, 40: 46-52. 10.1021/bi002108x.
- Fabrega C, Farrow MA, Mukhopadhyay B, de Crecy-Lagard V, Ortiz AR, Schimmel P: An aminoacyl tRNA synthetase whose sequence fits into neither of the two known classes. Nature. 2001, 411: 110-114. 10.1038/35075121.
- Wolf YI, Aravind L, Grishin NV, Koonin EV: Evolution of aminoacyl-tRNA synthetases - analysis of unique domain architectures and phylogenetic trees reveals a complex history of horizontal gene transfer events. Genome Res. 1999, 9: 689-710.
- Nielsen H, Brunak S, von Heijne G: Machine learning approaches for the prediction of signal peptides and other protein sorting signals. Protein Eng. 1999, 12: 3-9. 10.1093/protein/12.1.3.
- Tamura J-I, Kaname H, Kadowaki K, Igarashi Y, Kodama T: Molecular cloning and sequence analysis of the gene encoding an endo α-1,4 polygalactosaminidase of Pseudomonas sp. 881. J Ferment Bioeng. 1995, 80: 305-310. 10.1016/0922-338X(95)94196-X.
- Fischer D: Hybrid fold recognition: combining sequence derived properties with evolutionary information. Pac Symp Biocomput. 2000, 119-130.
- Kelley LA, MacCallum RM, Sternberg MJ: Enhanced genome annotation using structural profiles in the program 3D-PSSM. J Mol Biol. 2000, 299: 499-520. 10.1006/jmbi.2000.3741.
- Hampele IC, D'Arcy A, Dale GE, Kostrewa D, Nielsen J, Oefner C, Page MG, Schonfeld HJ, Stuber D, Then RL: Structure and function of the dihydropteroate synthase from Staphylococcus aureus. J Mol Biol. 1997, 268: 21-30. 10.1006/jmbi.1997.0944.
- Xu H, Aurora R, Rose GD, White RH: Identifying two ancient enzymes in Archaea using predicted secondary structure alignment. Nat Struct Biol. 1999, 6: 750-754. 10.1038/11525.
- COG: Phylogenetic classification of proteins encoded in complete genomes. [http://www.ncbi.nlm.nih.gov/COG/]
- Aurora R, Rose GD: Seeking an ancient enzyme in Methanococcus jannaschii using ORF, a program based on predicted secondary structure comparisons. Proc Natl Acad Sci USA. 1998, 95: 2818-2823. 10.1073/pnas.95.6.2818.
- Aravind L: An evolutionary classification of the metallo-β-lactamase fold proteins. In Silico Biol. 1998, 1: 8. [http://www.bioinfo.de/isb/1998/01/0008/]
- Matthews DA, Appelt K, Oatley SJ: Crystal structure of Escherichia coli thymidylate synthase with FdUMP and 10-propargyl-5,8-dideazafolate. Adv Enzyme Regul. 1989, 29: 47-60. 10.1016/0065-2571(89)90093-9.
- Dynes JL, Firtel RA: Molecular complementation of a genetic marker in Dictyostelium using a genomic DNA library. Proc Natl Acad Sci USA. 1989, 86: 7966-7970.
- Mushegian AR, Koonin EV: Cell-to-cell movement of plant viruses. Insights from amino acid sequence comparisons of movement proteins and from analogies with cellular transport systems. Arch Virol. 1993, 133: 239-257.
- Melcher U: The '30K' superfamily of viral movement proteins. J Gen Virol. 2000, 81: 257-266.
- Xoconostle-Cazares B, Xiang Y, Ruiz-Medrano R, Wang HL, Monzer J, Yoo BC, McFarland KC, Franceschi VR, Lucas WJ: Plant paralog to viral movement protein that potentiates transport of mRNA into the phloem. Science. 1999, 283: 94-98.
- CD-search. [http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi]
- Ponting CP, Parker PJ: Extending the C2 domain family: C2s in PKCs delta, epsilon, eta, theta, phospholipases, GAPs, and perforin. Protein Sci. 1996, 5: 162-166.
- Xoconostle-Cazares B, Ruiz-Medrano R, Lucas WJ: Proteolytic processing of CmPP36, a protein from the cytochrome b(5) reductase family, is required for entry into the phloem translocation pathway. Plant J. 2000, 24: 735-747. 10.1046/j.1365-313X.2000.00916.x.
- Neuwald AF, Landsman D: GCN5-related histone N-acetyl-transferases belong to a diverse superfamily that includes the yeast SPT10 protein. Trends Biochem Sci. 1997, 22: 154-155. 10.1016/S0968-0004(97)01034-7.
- Hai TW, Liu F, Coukos WJ, Green MR: Transcription factor ATF cDNA clones: an extensive family of leucine zipper proteins able to selectively form DNA-binding heterodimers. Genes Dev. 1989, 3: 2083-2090.
- Nagadoi A, Nakazawa K, Uda H, Okuno K, Maekawa T, Ishii S, Nishimura Y: Solution structure of the transactivation domain of ATF-2 comprising a zinc finger-like subdomain and a flexible subdomain. J Mol Biol. 1999, 287: 593-607. 10.1006/jmbi.1999.2620.
- Kawasaki H, Schiltz L, Chiu R, Itakura K, Taira K, Nakatani Y, Yokoyama KK: ATF-2 has intrinsic histone acetyltransferase activity which is modulated by phosphorylation. Nature. 2000, 405: 195-200.
- Wootton JC: Non-globular domains in protein sequences: automated segmentation using complexity measures. Comput Chem. 1994, 18: 269-285. 10.1016/0097-8485(94)85023-2.
- Taylor BL, Zhulin IB: PAS domains: internal sensors of oxygen, redox potential, and light. Microbiol Mol Biol Rev. 1999, 63: 479-506.
- Anantharaman V, Koonin EV, Aravind L: Regulatory potential, phyletic distribution and evolution of ancient, intracellular small-molecule-binding domains. J Mol Biol. 2001, 307: 1271-1292. 10.1006/jmbi.2001.4508.
- Ni M, Tepperman JM, Quail PH: PIF3, a phytochrome-interacting factor necessary for normal photoinduced signal transduction, is a novel basic helix-loop-helix protein. Cell. 1998, 95: 657-667.
- Zhu Y, Tepperman JM, Fairchild CD, Quail PH: Phytochrome B binds with greater apparent affinity than phytochrome A to the basic helix-loop-helix factor PIF3 in a reaction requiring the PAS domain of PIF3. Proc Natl Acad Sci USA. 2000, 97: 13419-13424. 10.1073/pnas.230433797.
- Levchenko I, Smith CK, Walsh NP, Sauer RT, Baker TA: PDZ-like domains mediate binding specificity in the Clp/Hsp100 family of chaperones and protease regulatory subunits. Cell. 1997, 91: 939-947.
- Neuwald AF, Aravind L, Spouge JL, Koonin EV: AAA+: A class of chaperone-like ATPases associated with the assembly, operation, and disassembly of protein complexes. Genome Res. 1999, 9: 27-43.
- Imai Y, Kimura T, Murakami A, Yajima N, Sakamaki K, Yonehara S: The CED-4-homologous protein FLASH is involved in Fas-mediated activation of caspase-8 during apoptosis. Nature. 1999, 398: 777-785. 10.1038/19709.
- Koonin EV, Aravind L, Hofmann K, Tschopp J, Dixit VM: Apoptosis. Searching for FLASH domains. Nature. 1999, 401: 662-663. 10.1038/44317.
- Popper K: The Logic of Scientific Discovery. New York/London: Routledge; 1999.
- Altschul SF, Koonin EV: PSI-BLAST - a tool for making discoveries in sequence databases. Trends Biochem Sci. 1998, 23: 444-447. 10.1016/S0968-0004(98)01298-5.
- Eddy SR: Profile hidden Markov models. Bioinformatics. 1998, 14: 755-763. 10.1093/bioinformatics/14.9.755.
- Bryant SH, Altschul SF: Statistics of sequence-structure threading. Curr Opin Struct Biol. 1995, 5: 236-244. 10.1016/0959-440X(95)80082-4.
- Jones DT: GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences. J Mol Biol. 1999, 287: 797-815. 10.1006/jmbi.1999.2583.
- Panchenko A, Marchler-Bauer A, Bryant SH: Threading with explicit models for evolutionary conservation of structure and sequence. Proteins. 1999, 37: 133-140. 10.1002/(SICI)1097-0134(1999)37:3+<133::AID-PROT18>3.3.CO;2-4.
- Panchenko AR, Marchler-Bauer A, Bryant SH: Combination of threading potentials and sequence profiles improves fold recognition. J Mol Biol. 2000, 296: 1319-1331. 10.1006/jmbi.2000.3541.
- Eddy SR: Hidden Markov models. Curr Opin Struct Biol. 1996, 6: 361-365. 10.1016/S0959-440X(96)80056-X.
- Notredame C, Higgins DG, Heringa J: T-Coffee: A novel method for fast and accurate multiple sequence alignment. J Mol Biol. 2000, 302: 205-217. 10.1006/jmbi.2000.4042.
- Rost B, Schneider R, Sander C: Protein fold recognition by prediction-based threading. J Mol Biol. 1997, 270: 471-480. 10.1006/jmbi.1997.1101.
- Rost B, Sander C, Schneider R: PHD - an automatic mail server for protein secondary structure prediction. Comput Appl Biosci. 1994, 10: 53-60.
- Nielsen H, Engelbrecht J, Brunak S, von Heijne G: A neural network method for identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Int J Neural Syst. 1997, 8: 581-599. 10.1142/S0129065797000537.
- Tatusov RL, Koonin EV, Lipman DJ: A genomic perspective on protein families. Science. 1997, 278: 631-637. 10.1126/science.278.5338.631.
- Tatusov RL, Natale DA, Garkavtsev IV, Tatusova TA, Shankavaram UT, Rao BS, Kiryutin B, Galperin MY, Fedorova ND, Koonin EV: The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res. 2001, 29: 22-28. 10.1093/nar/29.1.22.