Visualization of the phylogenetic content of five genomes using dekapentagonal maps
© Zhaxybayeva et al.; licensee BioMed Central Ltd. 2004
Received: 4 November 2003
Accepted: 13 January 2004
Published: 16 February 2004
The methods presented here summarize phylogenetic relationships of genomes in visually appealing and informative figures. Dekapentagonal maps depict phylogenetic information for orthologous genes present in five genomes, and provide a pre-screen for putatively horizontally transferred genes. If the majority of individual gene phylogenies are unresolved, bipartition histograms provide a means of uncovering and analyzing the plurality consensus. Analyses of genomes representing five photosynthetic bacterial phyla and of the prokaryotic contributions to the eukaryotic cell illustrate the utility of the methods.
Transfer of genetic information between divergent organisms has turned the tree of life into a net or web, and genomes into mosaics. Different parts of genomes have different histories; therefore representing the history of genome evolution as a single tree appears inconsistent with the data. Nevertheless, the assumption of a tree-like process still underlies many approaches. Recently, we developed a tool that provides an assessment and graphic illustration of the mosaic nature of microbial genomes. The tool is based on maximum likelihood (ML) mapping developed by Korbinian Strimmer and Arndt von Haeseler. They utilized Bayesian posterior probabilities to assess the phylogenetic information contained in an alignment of four homologous sequences. With four sequences there are only three possible tree topologies, and thus the three posterior probabilities corresponding to these three trees must sum to one. Utilizing a barycentric coordinate system, the resulting probability vector is represented as a point in an equilateral triangle, where the distances of the point to the three sides represent the three probabilities. Strimmer and von Haeseler applied this approach to depict the phylogenetic information content present in a multiple sequence alignment. We adapted this approach to represent the phylogenetic information content present in four completely sequenced genomes (for details and methodology see ; for an extension that improves taxon sampling and uses bootstrap support values see ). Unfortunately, this approach is limited to the analysis of only four genomes at a time. In many instances, it is interesting to compare more than four genomes simultaneously (for example ). The number of possible unrooted tree topologies for N taxa is $(2N - 5)!/[2^{N-3}(N - 3)!]$, and therefore rises dramatically as N increases. There are 15 possible unrooted tree topologies for five taxa, 105 for six taxa, and so on. Creating a visually appealing graphic representation poses a difficult challenge.
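The growth in the number of topologies can be checked with a few lines of Python. This is only a sketch of the standard counting formula for unrooted, fully resolved trees; the function name is our own choice for illustration.

```python
from math import factorial


def unrooted_topologies(n_taxa: int) -> int:
    """Number of unrooted, fully resolved tree topologies for n_taxa taxa.

    Implements (2N - 5)! / [2^(N-3) (N - 3)!], valid for N >= 3.
    """
    if n_taxa < 3:
        raise ValueError("the formula needs at least three taxa")
    n = n_taxa
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


if __name__ == "__main__":
    # Four taxa give 3 topologies, five give 15, six give 105.
    for n in range(4, 9):
        print(n, unrooted_topologies(n))
```

With five taxa the 15 topologies can still be placed on the vertices of a single polygon; with six taxa a comparable map would already need 105 vertices.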
Here we report a new mapping approach to visualize data from the analyses of five genomes. The utility of this approach is illustrated by applying it to the evolution of photosynthetic bacteria and by dissecting the eukaryotic genome with respect to different prokaryotic contributions. Where the majority of the individual gene phylogenies are unresolved, a histogram giving the frequency of well-supported bipartitions provides a useful complement to the support-value maps.
Results and discussion
It is worth noting that while every probability vector maps to a unique place in the optimized dekapentagonal map, the reverse is not true. A single point inside the dekapentagonal map corresponds to infinitely many probability vectors. For example, a point in the center just indicates that the probabilities for topologies on opposing sides of the dekapentagon cancel each other out, but it does not indicate the identities of these topologies. Also, some points might be located close to one vertex only because the probability vector equally supports the topologies located on the two vertices neighboring that vertex. However, these points are only 'misplaced' because the corresponding datasets do not strongly favor one or the other topology; that is, these vectors represent unresolved relationships.
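The loss of information can be illustrated numerically. The sketch below assumes a unit-radius dekapentagon with vertex i at an angle of i*360/15 degrees and maps a vector to its weighted center of gravity; the function names and the example vectors are hypothetical and serve only to illustrate the point.

```python
import math


def vertex(i, n=15, radius=1.0):
    """Cartesian coordinates of vertex i (1-based) of a regular n-gon."""
    angle = math.radians(i * 360.0 / n)
    return radius * math.cos(angle), radius * math.sin(angle)


def map_point(probs):
    """Center of gravity of the vertices, weighted by probs (vertex k+1 <- probs[k])."""
    total = sum(probs)
    x = sum(p * vertex(k + 1)[0] for k, p in enumerate(probs)) / total
    y = sum(p * vertex(k + 1)[1] for k, p in enumerate(probs)) / total
    return x, y


def spread(vertices, n=15):
    """Probability vector with equal support on the listed vertices only."""
    probs = [0.0] * n
    for i in vertices:
        probs[i - 1] = 1.0 / len(vertices)
    return probs


if __name__ == "__main__":
    # Three very different vectors that all land on the center of the map.
    for label, vec in [("uniform", spread(range(1, 16))),
                       ("three topologies", spread([5, 10, 15])),
                       ("five topologies", spread([3, 6, 9, 12, 15]))]:
        print(label, map_point(vec))
    # Equal support for the two neighbors of vertex 2 lands next to vertex 2.
    print("vertex 2:", vertex(2), "mapped:", map_point(spread([1, 3])))
```

In the last case the point lies on the line from the center to vertex 2, at about 91% of the radius, although the topology at vertex 2 receives no support at all.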
We use a genetic algorithm to find the optimal arrangement of the topologies at the polygon vertices. The optimality criterion is to minimize the sum of shortest distances for each mapped probability vector to the polygon's circumference. We found that the algorithm quickly converges towards solutions that are related to one another by rotation; that is, the neighborhood relations between the different topologies are the same. As our genetic optimization algorithm is a stochastic process, we measure its success on the basis of the probability of convergence. Our confidence that the algorithm did indeed find an optimal solution rises with the probability that on subsequent runs the algorithm can reproduce the same solution and that other solutions found are always inferior to the one deemed optimal. We consistently obtained a convergence rate in the range of 66% to 100%: from 50 independent runs, 33 in one case and 50 in the other converged on the same arrangement, while 17 arrangements in the former case were suboptimal. This suggests that our genetic optimization algorithm does indeed converge on the optimal arrangement.
Comparative studies have shown that bootstrap values are more conservative measures of support than Bayesian posterior probabilities [2, 4, 8, 9], and therefore they provide a more realistic assessment of the support that the different topologies receive. Also, simulation studies have shown that increase of the size of a dataset by introducing additional homologous sequences improves the accuracy of the reconstruction  (see  and  for recent discussion). Therefore, in addition to plotting posterior probabilities, we also calculated and mapped bootstrap support values for each QuintOP from extended datasets - that is, the datasets containing additional homologous sequences (see  for details on the calculation of bootstrap support values from extended datasets).
We applied both probability mapping according to  and bootstrap support-value mapping to two different genome quintets. The first is the case of five bacterial genomes representing the five phyla that contain organisms with chlorophyll-based photosynthesis. The other is an interdomain genome quintet consisting of representatives of all three domains of life.
Analysis of five photosynthetic bacterial genomes
Contributions to a eukaryotic genome during its evolution
List of QuintOPs that support the indicated tree topology with bootstrap support above 65%
Undecaprenyl diphosphate synthase homologs
Tree 11 
Tree 11 
Arginyl-tRNA synthetase homologs
Tree 11 
Succinyl-CoA synthetase, beta subunit
Tree 11 
Signal recognition particle, subunit SRP54
Tree 11 
Tree 11 
Tree 11 
Glu-tRNA amidotransferase, subunit A homologs
Tree 11 
Phenylalanyl-tRNA synthetase alpha subunit
Tree 11 
Tree 11 
Tree 12 
Carbamoyl-phosphate synthase, small subunit
Tree 12 
Ketol-acid reductoisomerase homologs
Tree 12 
Tree 12 
Tree 12 
Tree 12 
NH3-dependent NAD+ synthetase
Tree 9: genes of mitochondrial origin
Tree 9: genes of mitochondrial origin
Carbamoyl-phosphate synthase large subunit
Tree 2 
Translation initiation factor eIF-2B homologs
Ribosomal protein S3 homologs
The dekapentagonal maps depicted in Figures 7 and 8 emphasize the mosaicism of the eukaryotic genome of yeast, and delineate different contributions to the yeast genome that have occurred over the course of evolution. The map reveals that individual datasets support different, in some instances conflicting, hypotheses proposed to explain the origin of eukaryotes. While the resulting maps illustrate the mosaic nature of the eukaryotic genome, their discriminatory power regarding different proposed contributions is limited. For example, the datasets that support the traditional topology (number 11) are equally compatible with genes that were contributed to the eukaryotic cell via the chronocyte . Because our approach only considers unrooted trees, the two scenarios result in identical topologies, with only the branch lengths differing under the two scenarios, that is, the genes contributed by the chronocyte are expected to have the eukaryotic genes on very long branches . Another shortcoming is that the map includes only two bacterial taxa. Without inspecting the phylogenies inferred from the extended datasets (see above) it is impossible to decide if many genes were contributed from a single bacterium, as assumed in hypotheses proposed in [23, 24, 28], or were acquired through many independent transfers .
Dekapentagonal mapping provides a useful extension to the earlier developed ML-, posterior probability, and bootstrap support-values mapping for four genomes described in  and . For the analyses of four genomes the mapping of the support values to the two-dimensional space is unique; for analyses of five genomes we had to select one out of the many possible projections of the 15-dimensional support-value vectors to two-dimensional space. We used an optimality criterion to perform a heuristic search for a map that would emphasize genome mosaicism and frequently unresolved bifurcations. Support-value mapping using an optimized barycentric coordinate system allows us to dissect genomes into parts that have different evolutionary histories, and to focus attention on genes that contain atypical phylogenetic information.
If most of the individual molecular phylogenies are unresolved, analysis of individual bipartitions provides a means to assess a plurality phylogenetic signal. The modified Lento plot  applied to extended datasets provides both the bipartitions supported by the plurality of genes, and the number of genes that significantly disagree with these bipartitions.
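A tally of the kind underlying such a histogram could look as follows. This is a simplified sketch and not the modified Lento plot itself: it assumes that all gene trees share one taxon set, that each bipartition is given as the set of taxa on one side, and that a support threshold of 70% is used; the threshold and all names are assumptions made for illustration.

```python
from collections import Counter


def normalize(side, taxa):
    """Represent a bipartition by the side that excludes the smallest taxon name."""
    side, taxa = frozenset(side), frozenset(taxa)
    return taxa - side if min(taxa) in side else side


def compatible(a, b, taxa):
    """Two bipartitions fit on one tree if one of the four intersections is empty."""
    a_rest, b_rest = taxa - a, taxa - b
    return not (a & b) or not (a & b_rest) or not (a_rest & b) or not (a_rest & b_rest)


def tally(genes, taxa, threshold=70):
    """Count, per bipartition, the genes that support it and the genes that conflict.

    genes: one dict per gene, mapping a bipartition side to its bootstrap support.
    """
    taxa = frozenset(taxa)
    strong = [{normalize(side, taxa) for side, bs in gene.items() if bs >= threshold}
              for gene in genes]
    support, conflict = Counter(), Counter()
    for split in set().union(*strong):
        for gene_splits in strong:
            if split in gene_splits:
                support[split] += 1
            elif any(not compatible(split, other, taxa) for other in gene_splits):
                conflict[split] += 1
    return support, conflict


if __name__ == "__main__":
    taxa = "ABCDE"
    genes = [{frozenset("AB"): 95, frozenset("DE"): 80},
             {frozenset("AB"): 85, frozenset("CD"): 40},
             {frozenset("AC"): 90}]
    support, conflict = tally(genes, taxa)
    for split in support:
        print(sorted(split), "supported by", support[split],
              "genes, contradicted by", conflict[split])
```

Plotting the supporting counts upward and the conflicting counts downward for each bipartition gives a histogram in the spirit of the one described above.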
Materials and methods
The first genome quintet consists of five photosynthetic bacteria from five bacterial phyla: Rhodobacter capsulatus, Chlorobium tepidum, Chloroflexus aurantiacus, Heliobacillus mobilis and Synechocystis sp. PCC 6803.
The second genome quintet consists of genomes representing all three domains of life: the yeast genome of Saccharomyces cerevisiae, the alpha-proteobacterium Rhodobacter capsulatus, the Gram-positive bacterium Bacillus subtilis, the euryarchaeote Archaeoglobus fulgidus and the crenarchaeote Sulfolobus solfataricus.
The Rhodobacter capsulatus and Heliobacillus mobilis genome data were obtained from Integrated Genomics. Genome sequence for Chlorobium tepidum was downloaded from The Institute for Genomic Research (TIGR). The Chloroflexus aurantiacus genome was downloaded from the DOE Joint Genome Institute. Other genomes for the genome quintets were downloaded from the National Center for Biotechnology Information (NCBI).
Assembly of quintets of orthologous proteins (QuintOPs)
Detection of QuintOPs was analogous to detection of quartets of orthologous proteins. In brief, for each genome in a genome quintet, BLAST searches of every ORF in one genome against the other four genomes were performed using the blastp program. The E-value cutoff for the BLAST searches was set to $10^{-4}$. We defined QuintOPs as those sets of genes that mutually pick each other as the top-scoring hit in all pairwise genome BLAST comparisons. The amino-acid sequences for each QuintOP were retrieved and the datasets were aligned with ClustalW. Maximum likelihoods for 15 tree topologies for each QuintOP were calculated using TREE-PUZZLE version 5.1 under the auto-detected substitution model. Posterior probability vectors were calculated from ML values.
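The two computational steps of this procedure can be sketched in Python. The first function selects gene sets that are reciprocal top-scoring hits in every pairwise comparison; it assumes that the BLAST results have already been parsed and filtered by E-value into a dictionary of best hits, a data layout that is our own assumption. The second function converts log-likelihoods into posterior probabilities under the assumption of equal prior probabilities for all topologies, as in likelihood mapping; the text above states only that the vectors were calculated from ML values.

```python
import math
from itertools import permutations


def find_quintops(best_hit, genomes):
    """Return gene tuples (one gene per genome) that are mutual top hits.

    best_hit[(a, b)][gene] is the top-scoring hit in genome b for a gene of
    genome a, already restricted to hits passing the E-value cutoff.
    """
    first = genomes[0]
    queries = set()
    for other in genomes[1:]:
        queries |= set(best_hit[(first, other)])
    found = []
    for gene in sorted(queries):
        member = {first: gene}
        for other in genomes[1:]:
            member[other] = best_hit[(first, other)].get(gene)
        if None in member.values():
            continue  # no acceptable hit in at least one genome
        # Every ordered pair of genomes must confirm the same gene set.
        if all(best_hit[(a, b)].get(member[a]) == member[b]
               for a, b in permutations(genomes, 2)):
            found.append(tuple(member[g] for g in genomes))
    return found


def posterior_probabilities(log_likelihoods):
    """p_i = L_i / sum_j L_j, assuming a uniform prior over the topologies."""
    top = max(log_likelihoods)  # subtract the maximum to avoid underflow
    weights = [math.exp(value - top) for value in log_likelihoods]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    print(posterior_probabilities([-1000.0, -1000.0, -1002.0]))
```

For a genome quintet, `best_hit` would hold 20 directed comparisons (five genomes, four targets each), and `posterior_probabilities` would receive the 15 log-likelihoods reported for one QuintOP.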
Assembly of extended datasets for the QuintOPs
For each sequence in a QuintOP we detect the top-scoring BLAST hit with an E-value below $10^{-8}$ in each of 60 completely sequenced archaeal and bacterial reference genomes (Aeropyrum pernix, Archaeoglobus fulgidus, Anabaena sp., Aquifex aeolicus, Agrobacterium tumefaciens, Borrelia burgdorferi, Bradyrhizobium japonicum, Bifidobacterium longum, Bacillus subtilis, Brucella suis, Buchnera sp., Clostridium acetobutylicum, Caulobacter crescentus, Corynebacterium glutamicum, Campylobacter jejuni, Chlamydophila pneumoniae, Deinococcus radiodurans, Escherichia coli K12, Fusobacterium nucleatum, Halobacterium sp., Haemophilus influenzae, Helicobacter pylori, Leptospira interrogans, Lactococcus lactis, Listeria monocytogenes, Lactobacillus plantarum, Mycoplasma genitalium, Methanococcus jannaschii, Methanopyrus kandleri, Mesorhizobium loti, Methanosarcina mazei, Methanobacterium thermoautotrophicum, Mycobacterium tuberculosis, Neisseria meningitidis, Oceanobacillus iheyensis, Pseudomonas aeruginosa, Pyrobaculum aerophilum, Pyrococcus horikoshii, Pasteurella multocida, Rickettsia conorii, Ralstonia solanacearum, Staphylococcus aureus, Streptomyces coelicolor, Sinorhizobium meliloti, Shewanella oneidensis, Sulfolobus solfataricus, Salmonella typhi, Synechocystis sp., Thermoplasma acidophilum, Thermosynechococcus elongatus, Thermotoga maritima, Treponema pallidum, Thermoanaerobacter tengcongensis, Tropheryma whipplei, Ureaplasma urealyticum, Vibrio cholerae, Wigglesworthia brevipalpis, Xanthomonas campestris, Xylella fastidiosa, Yersinia pestis). These genomes were downloaded from the NCBI. The resulting sequences were added to the QuintOP dataset and duplicated sequences were eliminated. The datasets were aligned with ClustalW, and 100 bootstrap samples were generated using the SEQBOOT program from the PHYLIP package version 3.6a2.1. The distances were generated using TREE-PUZZLE version 5.1 under the auto-detected substitution model. Neighbor-joining trees were calculated from these distances using NEIGHBOR from the PHYLIP package version 3.6a2.1. The resulting trees were parsed with respect to which of the 15 five-taxon subtrees they contain.
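The final parsing step can be sketched as follows. The original tree-parsing programs were written in Java with PAL; the Python below is an independent illustration that assumes each bootstrap tree is available as nested tuples of taxon names. It relies on the fact that a resolved five-taxon tree is fully described by its two two-taxon groups, which yields exactly 15 distinct topologies.

```python
from collections import Counter


def leaf_sets(tree, out):
    """Collect the set of leaves below every internal node of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.append(leaves)
    return leaves


def five_taxon_topology(tree, five):
    """Return the five-taxon subtree displayed by tree as a set of two taxon pairs.

    Returns None if the relationships among the five taxa are unresolved.
    """
    five = frozenset(five)
    clades = []
    leaf_sets(tree, clades)
    pairs = set()
    for clade in clades:
        inside = clade & five
        if len(inside) == 2:
            pairs.add(inside)
        elif len(inside) == 3:
            pairs.add(five - inside)  # same split, seen from the other side
    return frozenset(pairs) if len(pairs) == 2 else None


def bootstrap_support(trees, five):
    """Percentage of bootstrap trees that contain each five-taxon topology."""
    counts = Counter(five_taxon_topology(tree, five) for tree in trees)
    counts.pop(None, None)
    return {topology: 100.0 * n / len(trees) for topology, n in counts.items()}


if __name__ == "__main__":
    tree = (("A", "B"), "X", (("C", "D"), ("E", "Y")))
    print(five_taxon_topology(tree, "ABCDE"))
```

Here `X` and `Y` stand for additional homologs from the extended dataset; they influence the reconstruction but are ignored when the five-taxon subtree is read off.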
Calculation of posterior probability vector locations for individual QuintOPs
The dekapentagon was placed into the Cartesian coordinate system with its center coinciding with the origin. The coordinates $(x_i, y_i)$ of vertex $i$ are then $x_i = R\cos(i \cdot 360^\circ/15)$ and $y_i = R\sin(i \cdot 360^\circ/15)$, where $R$ is the distance from the origin to the vertex (equal for all vertices because the origin lies at the center of the polygon) and $1 \le i \le 15$. For each pair of vertices $i$ and $j$, the coordinates of the center of gravity $M_{ij} = (x_M, y_M)$ are calculated according to the law of the lever: $x_M = x_i + (x_j - x_i)\,p_j/(p_i + p_j)$, $y_M = y_i + (y_j - y_i)\,p_j/(p_i + p_j)$, where $p_i$ and $p_j$ are the posterior probabilities of vertices $i$ and $j$. The process is repeated for all pairs of vertices, and then iteratively for all 'intermediate' centers of gravity until only one pair of coordinates remains, which gives the center of gravity of the dekapentagon and is equivalent to the location of the probability vector. The resulting coordinates of the dekapentagon's center of gravity do not depend on the order in which the masses are combined.
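A minimal Python sketch of this calculation is given below. It assumes that each intermediate center of gravity carries the combined mass of the points it replaces, and it combines the masses one after another rather than in pairs, which gives the same result because the order does not matter. The function names are hypothetical.

```python
import math


def vertex_coordinates(n_vertices=15, radius=1.0):
    """Coordinates of vertices 1..n of a regular polygon centered on the origin."""
    return [(radius * math.cos(math.radians(i * 360.0 / n_vertices)),
             radius * math.sin(math.radians(i * 360.0 / n_vertices)))
            for i in range(1, n_vertices + 1)]


def combine(mass_i, mass_j):
    """Law of the lever for two point masses given as (x, y, p)."""
    (xi, yi, pi), (xj, yj, pj) = mass_i, mass_j
    total = pi + pj
    if total == 0:
        return xi, yi, 0.0  # two empty vertices: position is irrelevant
    return (xi + (xj - xi) * pj / total,
            yi + (yj - yi) * pj / total,
            total)


def center_of_gravity(probs, radius=1.0):
    """Location of a probability vector; probs[k] belongs to vertex k+1."""
    masses = [(x, y, p)
              for (x, y), p in zip(vertex_coordinates(len(probs), radius), probs)]
    point = masses[0]
    for mass in masses[1:]:
        point = combine(point, mass)
    return point[0], point[1]


if __name__ == "__main__":
    probs = [0.0] * 15
    probs[0], probs[1] = 0.5, 0.5  # equal support for vertices 1 and 2
    print(center_of_gravity(probs))
```

A vector with all of its probability on one topology maps onto that topology's vertex, and the result agrees with the direct weighted sum of the vertex coordinates.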
Finding the optimal arrangement and testing its reproducibility
There are $(15 - 1)!/2 = 14!/2 \approx 4 \times 10^{10}$ possible arrangements of topologies on the dekapentagon's vertices (only free circular permutations are counted; that is, arrangements that become equivalent by rotating the dekapentagon or flipping it over are considered the same). An arrangement was considered optimal when the topologies were arranged at the polygon vertices in a way that maximizes the sum of the distances of all barycentric points from the center of the polygon. There are too many arrangements of topologies around the dekapentagon to search for the optimal arrangement exhaustively. Therefore, we used a heuristic search for optimal solutions based on a hybrid genetic algorithm. Each tree topology was assigned a numerical identifier (1 through 15), and the arrangements of topologies around the dekapentagon's vertices were encoded as arrays of the tree topology identifiers, where each position in the array represents a position on the polygon circumference. The genetic algorithm applies mutation and cross-over operations to each successive generation of arrangements until the optimal solution is obtained. Each generation consisted of a population of 300 individuals. To preserve diversity among the individuals as much as possible and to prevent premature convergence of the algorithm, the population was divided into 10 demes (subpopulations), each with 30 individuals, with controlled migration between demes.
We hybridized the genetic algorithm by equipping it with a local search heuristic, in addition to the global search strategy based on the genetic operators, to better explore the space of possible arrangements. A manuscript reporting details on the algorithm for finding the optimal arrangements is in preparation (L.H., O.Z. and J.P.G., unpublished work). The program calculating the optimal arrangement of topologies is available on request.
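The optimality criterion and the array encoding can be illustrated with a much simpler search than the one used here. The sketch below is not the hybrid genetic algorithm (it has no population, cross-over or demes); it is a plain swap-based hill climb, comparable in spirit only to a local search step, and all names, the number of steps and the unit radius are assumptions made for illustration.

```python
import math
import random


def score(arrangement, vectors):
    """Sum of distances of all mapped vectors from the polygon center.

    arrangement[k] is the topology identifier (1..n) placed on vertex k+1;
    vectors[v][t - 1] is the support of dataset v for topology t.
    """
    n = len(arrangement)
    total = 0.0
    for vec in vectors:
        mass = sum(vec)
        x = sum(vec[t - 1] * math.cos(2 * math.pi * (k + 1) / n)
                for k, t in enumerate(arrangement)) / mass
        y = sum(vec[t - 1] * math.sin(2 * math.pi * (k + 1) / n)
                for k, t in enumerate(arrangement)) / mass
        total += math.hypot(x, y)
    return total


def hill_climb(vectors, n=15, steps=2000, seed=0):
    """Swap two vertices at random and keep the swap only if the score improves."""
    rng = random.Random(seed)
    current = list(range(1, n + 1))
    rng.shuffle(current)
    best = score(current, vectors)
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        current[i], current[j] = current[j], current[i]
        candidate = score(current, vectors)
        if candidate > best:
            best = candidate
        else:
            current[i], current[j] = current[j], current[i]  # undo the swap
    return current, best


if __name__ == "__main__":
    # Datasets that cannot decide between topologies 1 and 2 pull both
    # topologies onto neighboring vertices.
    vectors = [[0.5, 0.5] + [0.0] * 13 for _ in range(3)]
    print(hill_climb(vectors))
```

Maximizing this score places topologies that frequently share support on neighboring vertices, so that unresolved datasets fall close to the circumference instead of the center.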
To test the reproducibility, the search for the optimal arrangement was repeated independently 50 times with different starting seeds.
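Comparing the outcomes of independent runs requires that arrangements differing only by rotation or reflection are recognized as identical. One possible way to do this, sketched below with hypothetical function names, is to reduce every arrangement to a canonical representative and then count how often each representative occurs. The sketch reports the most frequent arrangement; whether that arrangement is also the best-scoring one has to be checked separately.

```python
from collections import Counter


def canonical(arrangement):
    """Smallest sequence among all rotations and reflections of an arrangement."""
    seq = list(arrangement)
    n = len(seq)
    variants = []
    for version in (seq, seq[::-1]):
        for shift in range(n):
            variants.append(tuple(version[shift:] + version[:shift]))
    return min(variants)


def convergence_rate(solutions):
    """Most frequent arrangement and the fraction of runs that produced it."""
    counts = Counter(canonical(solution) for solution in solutions)
    arrangement, hits = counts.most_common(1)[0]
    return arrangement, hits / len(solutions)


if __name__ == "__main__":
    base = list(range(1, 16))
    other = [3, 2, 1] + list(range(4, 16))
    # 33 runs return rotated copies of one arrangement, 17 runs another one.
    runs = [base[r % 15:] + base[:r % 15] for r in range(33)] + [other] * 17
    print(convergence_rate(runs))
```

With 33 of 50 runs agreeing, as in one of the cases reported above, the function returns a rate of 0.66.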
The resulting posterior probability and bootstrap support vectors were plotted into dekapentagonal maps using GNUPLOT version 3.7 .
Analyses of genes from the chlorophyll biosynthesis pathway
Sequences from the genome quintet were supplemented with homologous sequences from other photosynthetic bacteria to improve taxon sampling and aligned with ClustalW, and phylogenetic trees were reconstructed. For distance and parsimony analyses, 100 bootstrap samples were generated with SEQBOOT. Distances were calculated in TREE-PUZZLE v. 5.1 with among-site rate variation taken into account. Neighbor-joining trees were calculated with NEIGHBOR, Fitch-Margoliash trees with FITCH, and protein parsimony trees with PROTPARS. MrBayes version 3.0B4 analyses were run three times independently for 500,000 generations per run (of which 100,000 were discarded as burn-in), under the JTT substitution model, and with an exponential prior set for branch length.
Software packages used
Scripts for data manipulation were written in Perl and used many of the SEALS package subroutines . Tree-parsing programs were written in Java utilizing PAL library classes . The genetic algorithm was written in C++ and is based on the genetic algorithm library GALIB version 2.4.5 .
Additional data files
Additional data file 1 contains accession numbers for the datasets in two genome quintets analyzed in this article.
We thank Korbinian Strimmer for useful comments on the manuscript. This work was supported through the NASA Astrobiology Institute at Arizona State University, the NASA Exobiology Program, and in part through the NSF Microbial Genetics Program.
1. Gogarten JP: The early evolution of cellular life. Trends Ecol Evol. 1995, 10: 147-151. 10.1016/S0169-5347(00)89024-2.
2. Zhaxybayeva O, Gogarten JP: Bootstrap, Bayesian probability and maximum likelihood mapping: Exploring new tools for comparative genome analyses. BMC Genomics. 2002, 3: 4. 10.1186/1471-2164-3-4.
3. Strimmer K, von Haeseler A: Likelihood-mapping: a simple method to visualize phylogenetic content of a sequence alignment. Proc Natl Acad Sci USA. 1997, 94: 6815-6819. 10.1073/pnas.94.13.6815.
4. Zhaxybayeva O, Gogarten JP: An improved probability mapping approach to assess genome mosaicism. BMC Genomics. 2003, 4: 37. 10.1186/1471-2164-4-37.
5. Raymond J, Zhaxybayeva O, Gogarten JP, Gerdes SY, Blankenship RE: Whole-genome analysis of photosynthetic prokaryotes. Science. 2002, 298: 1616-1620. 10.1126/science.1075558.
6. Li W-H: Molecular Evolution. 1997, Sunderland, MA: Sinauer Associates.
7. Billera LJ, Holmes SP, Vogtmann K: Geometry of the space of phylogenetic trees. Adv Appl Math. 2001, 27: 733-767. 10.1006/aama.2001.0759.
8. Douady CJ, Delsuc F, Boucher Y, Doolittle WF, Douzery EJ: Comparison of bayesian and maximum likelihood bootstrap measures of phylogenetic reliability. Mol Biol Evol. 2003, 20: 248-254. 10.1093/molbev/msg042.
9. Alfaro ME, Zoller S, Lutzoni F: Bayes or bootstrap? A simulation study comparing the performance of bayesian markov chain monte carlo sampling and bootstrapping in assessing phylogenetic confidence. Mol Biol Evol. 2003, 20: 255-266. 10.1093/molbev/msg028.
10. Graybeal A: Is it better to add taxa or characters to a difficult phylogenetic problem?. Syst Biol. 1998, 47: 9-17. 10.1080/106351598260996.
11. Hillis DM, Pollock DD, McGuire JA, Zwickl DJ: Is sparse taxon sampling a problem for phylogenetic inference?. Syst Biol. 2003, 52: 124-126. 10.1080/10635150309356.
12. Rosenberg MS, Kumar S: Taxon sampling, bioinformatics, and phylogenomics. Syst Biol. 2003, 52: 119-124. 10.1080/10635150309344.
13. Daubin V, Moran NA, Ochman H: Phylogenetics and the cohesion of bacterial genomes. Science. 2003, 301: 829-832. 10.1126/science.1086568.
14. Xiong J, Fischer WM, Inoue K, Nakahara M, Bauer CE: Molecular evidence for the early evolution of photosynthesis. Science. 2000, 289: 1724-1730. 10.1126/science.289.5485.1724.
15. Woese CR, Kandler O, Wheelis ML: Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc Natl Acad Sci USA. 1990, 87: 4576-4579.
16. Lake JA: Origin of the eukaryotic nucleus determined by rate-invariant analysis of rRNA sequences. Nature. 1988, 331: 184-186. 10.1038/331184a0.
17. Lawson FS, Charlebois RL, Dillon JA: Phylogenetic analysis of carbamoylphosphate synthetase genes: complex evolutionary history includes an internal duplication within a gene which can root the tree of life. Mol Biol Evol. 1996, 13: 970-977.
18. Schofield JP: Molecular studies on an ancient gene encoding for carbamoyl-phosphate synthetase. Clin Sci (Lond). 1993, 84: 119-128.
19. van den Hoff MJ, Jonker A, Beintema JJ, Lamers WH: Evolutionary relationships of the carbamoylphosphate synthetase genes. J Mol Evol. 1995, 41: 813-832.
20. Olendzenski L, Gogarten JP: Deciphering the molecular record for the early evolution of life: Gene duplication and horizontal gene transfer. In: Thermophiles: The Keys to Molecular Evolution and the Origin of Life?. Edited by: Wiegel J, Adams MWW. 1998, Philadelphia: Taylor & Francis, 165-176.
21. Olendzenski L, Hilario E, Gogarten JP: Horizontal gene transfer and fusing lines of descent: the archaebacteria - a chimera?. In: Horizontal Gene Transfer. Edited by: Syvanen M, Kado C. 1998, London: Chapman and Hall, 349-362.
22. Cammarano P, Gribaldo S, Johann A: Updating carbamoylphosphate synthase (CPS) phylogenies: occurrence and phylogenetic identity of archaeal CPS genes. J Mol Evol. 2002, 55: 153-160. 10.1007/s00239-002-2312-6.
23. Zillig W, Palm P, Klenk H-P: A model of the early evolution of organisms: the arisal of the three domains of life from the common ancestor. In: The Origin and Evolution of the Cell. Edited by: Hartman H, Matsuno K. 1992, Singapore: World Scientific Publishing, 163-182.
24. Gupta RS, Golding GB: Evolution of HSP70 gene and its implications regarding relationships between archaebacteria, eubacteria, and eukaryotes. J Mol Evol. 1993, 37: 573-582. 10.1007/BF00182743.
25. Doolittle WF: You are what you eat: a gene transfer ratchet could account for bacterial genes in eukaryotic nuclear genomes. Trends Genet. 1998, 14: 307-311. 10.1016/S0168-9525(98)01494-2.
26. Hartman H: The origin of the eukaryotic cell. Speculations Sci Technol. 1984, 7: 77-81.
27. Sogin ML: Early evolution and the origin of eukaryotes. Curr Opin Genet Dev. 1991, 1: 457-463. 10.1016/S0959-437X(05)80192-3.
28. Lake JA, Rivera MC: Was the nucleus the first endosymbiont?. Proc Natl Acad Sci USA. 1994, 91: 2880-2881.
29. Lento GM, Hickson RE, Chambers GK, Penny D: Use of spectral analysis to test hypotheses on the origin of pinnipeds. Mol Biol Evol. 1995, 12: 28-52.
30. Integrated Genomics. [http://www.integratedgenomics.com]
31. The Institute for Genomic Research. [http://www.tigr.org]
32. DOE Joint Genome Institute. [http://www.jgi.doe.gov/JGI_microbial/html/index.html]
33. National Center for Biotechnology Information. [http://www.ncbi.nlm.nih.gov]
34. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
35. Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22: 4673-4680.
36. Schmidt HA, Strimmer K, Vingron M, von Haeseler A: TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics. 2002, 18: 502-504. 10.1093/bioinformatics/18.3.502.
37. Felsenstein J: PHYLIP (Phylogeny Inference Package) version 3.6. Distributed by the author: Department of Genetics, University of Washington, Seattle. 1993.
38. MathWorld: circular permutations. [http://mathworld.wolfram.com/CircularPermutation.html]
39. Goldberg DE: Genetic Algorithms in Search, Optimization and Machine Learning. 1989, Boston, MA: Addison-Wesley.
40. Holland JH: Adaptation in Natural and Artificial Systems. 1975, Ann Arbor: University of Michigan Press.
41. GNUPLOT central. [http://www.gnuplot.info]
42. Huelsenbeck JP, Ronquist F: MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics. 2001, 17: 754-755. 10.1093/bioinformatics/17.8.754.
43. Jones DT, Taylor WR, Thornton JM: The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci. 1992, 8: 275-282.
44. Walker DR, Koonin EV: SEALS: a system for easy analysis of lots of sequences. Proc Int Conf Intell Syst Mol Biol. 1997, 5: 333-339.
45. Drummond A, Strimmer K: PAL: an object-oriented programming library for molecular evolution and phylogenetics. Bioinformatics. 2001, 17: 662-663. 10.1093/bioinformatics/17.7.662.
46. Wall M: GALIB: A C++ library of genetic algorithm components. [http://lancet.mit.edu/ga]
47. Gogarten JP, Kibak H: The bioenergetics of the last common ancestor and the origin of the eukaryotic endomembrane systems. In: The Origin and Evolution of the Cell. Edited by: Hartman H, Matsuno K. 1992, Singapore: World Scientific Publishing, 131-154.
48. Cavalier-Smith T: Origin of the cytoskeleton. In: The Origin and Evolution of the Cell. Edited by: Hartman H, Matsuno K. 1992, Singapore: World Scientific Publishing, 79-106.
49. Sagan L: On the origin of mitosing cells. J Theor Biol. 1967, 14 (3): 255-274.
50. Martin W: Gene transfer from organelles to the nucleus: Frequent and in big chunks. Proc Natl Acad Sci USA. 2003, 100: 8612-8614. 10.1073/pnas.1633606100.
This article is published under license to BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.