Identification of plant-specific proteins. (a) Classification of Arabidopsis proteins based on their pattern of sequence similarity to other organisms. The 27,288 Arabidopsis proteins were classified on the basis of their phylogenetic profiles (PP). Each PP recorded whether similar sequences were found in the protein sets of the following organisms: Homo sapiens, Rattus norvegicus, Drosophila melanogaster, Caenorhabditis elegans, Mus musculus, Schizosaccharomyces pombe, Saccharomyces cerevisiae, a combined set of 88 species of Bacteria, and a combined set of 16 species of Archaea. The diagram is not drawn to scale. (b) Identification of putative plant-specific proteins. The Arabidopsis proteins that lack similarity to proteins of any other organism (7,868 proteins, represented by the black circle in (a)) were compared against sequences in the expressed sequence tag (EST) databases of Arabidopsis and 13 other plant species. A total of 3,848 Arabidopsis proteins were identified as plant specific because they showed sequence similarity to sequences in the Arabidopsis EST database and in the EST databases of at least four other plant species (at E-value ≤ 10⁻¹⁰). In addition, 892 other Arabidopsis proteins show similarity to sequences in the Arabidopsis EST database and in those of one to three other plant species, 2,691 Arabidopsis proteins exhibit similarity to sequences in the Arabidopsis EST database only, and 437 lack similarity to any sequence in the EST databases used.
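The grouping rule in panel (b) can be sketched in a few lines of Python. This is only an illustration of the thresholds stated in the caption, not the authors' pipeline: the function name, the category labels, and the input representation (a flag for an Arabidopsis EST hit plus a count of other plant species with a hit at the E-value cutoff) are all assumptions. The caption does not say how a protein with hits in other plants but none in the Arabidopsis EST database would be handled, so that case is returned as "not_described".

```python
def classify_est_support(in_arabidopsis_est: bool, n_other_plant_species: int) -> str:
    """Assign a protein to one of the four panel (b) groups.

    Hypothetical helper: labels and inputs are illustrative assumptions;
    only the thresholds (>= 4, 1 to 3, 0 other species) come from the caption.
    """
    if n_other_plant_species < 0:
        raise ValueError("species count cannot be negative")
    if in_arabidopsis_est:
        if n_other_plant_species >= 4:
            return "plant_specific"
        if n_other_plant_species >= 1:
            return "arabidopsis_plus_1_to_3"
        return "arabidopsis_only"
    if n_other_plant_species == 0:
        return "no_est_similarity"
    # Hits in other plants without an Arabidopsis EST hit: not covered by the caption.
    return "not_described"


# Group sizes as reported in the caption; they partition the 7,868 proteins
# that lack similarity to any non-plant organism.
reported_counts = {
    "plant_specific": 3848,
    "arabidopsis_plus_1_to_3": 892,
    "arabidopsis_only": 2691,
    "no_est_similarity": 437,
}
total_reported = sum(reported_counts.values())
```

The four reported group sizes add up to 7,868, which matches the size of the black-circle set in panel (a).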