The imbalanced supertree of flowering-plant phylogeny
© BioMed Central Ltd 2004
Published: 15 July 2004
Two contrasting approaches have been used to construct the overall tree of life from molecular data: one involves the analysis of single large datasets, while the other involves joining many independent smaller analyses into a supertree. A recent study uses the latter approach to produce the most complete phylogeny yet of flowering plant families.
Many questions in biology cannot be fully examined without a phylogenetic framework. Examples are developmental questions, such as the nature and origin of leaves; character-evolution questions, such as how many times leaves have evolved; and ecological questions, such as the correlation between function and morphology during leaf evolution. Thus, when large, relatively reliable phylogenies first became available for flowering plants - fueled by the same technical advances in computational and molecular biology that promoted the rise of the genomic sciences (see page 136 of ) - an explosion of new biology resulted. Answering the most interesting questions, particularly those concerning the origin of particular traits, requires the phylogenies to be as large and as complete as possible. We are still far from completeness, however: millions of species of organism (many still uncollected) are thought to be on our planet, and only a fraction have been subject to comparative gene-sequence analysis for phylogenetic studies.
There are two basic approaches to adding species to the tree of life. One is to perform ever larger single analyses; this is, in theory at least, the most advantageous approach, as all species are analyzed using comparable data. But the analysis of even a few hundred taxa can pose serious computing challenges, although these may be overcome in the future by using gridded computer power. Another approach to generating a complete tree of life is to use existing data, which often take the form of numerous small independent analyses containing some overlap of species. In a type of 'meta-analysis', these independent analyses can be 'stitched together' into supertrees using various algorithms. This supertree approach may have the shortcoming that it is a composite of disparate analyses, but its main advantages are that it mirrors how molecular systematics is being done in practice, and that it can use datasets that already exist.
Methods for constructing supertrees were developed in the early 1990s [3, 4] and most commonly use matrix representation with parsimony (MRP). In the MRP approach, each source tree is represented as a matrix; the matrices are combined and analyzed using parsimony, and the most parsimonious tree that fits all the matrix information is selected. The matrix representation may take different forms to accommodate various theoretical considerations [5, 6] and may be weighted to allow for differences in the reliability of the data. The MRP method, as implemented in a recent software program, has been used by Davies et al. to build the most complete evolutionary tree of the families of flowering plants to date. The authors then use this tree to answer a comparative biology question: why have some lineages led to groups of very high diversity while other lineages of equal age have produced groups of very low diversity?
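The matrix-coding step of MRP can be illustrated with a minimal sketch. It assumes the standard binary coding, in which each clade of each source tree becomes one matrix column: taxa inside the clade score 1, other taxa in that tree score 0, and taxa absent from that tree score '?'. The nested-tuple tree format, the function names and the taxon labels A-E are hypothetical choices made for illustration, not taken from the software used by Davies et al., and the parsimony search itself is not shown.

```python
def leaves(tree):
    """Return the set of taxon names under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def clades(tree):
    """Yield the taxon set of every internal node below the root."""
    if isinstance(tree, str):
        return
    for child in tree:
        if not isinstance(child, str):
            yield frozenset(leaves(child))
            yield from clades(child)


def mrp_matrix(trees):
    """Build a combined binary matrix: one row per taxon, one column per clade."""
    all_taxa = sorted(set().union(*(leaves(t) for t in trees)))
    matrix = {taxon: "" for taxon in all_taxa}
    for tree in trees:
        present = leaves(tree)
        for clade in clades(tree):
            for taxon in all_taxa:
                if taxon not in present:
                    matrix[taxon] += "?"   # taxon was not sampled in this tree
                elif taxon in clade:
                    matrix[taxon] += "1"   # taxon belongs to this clade
                else:
                    matrix[taxon] += "0"   # taxon sampled but outside the clade
    return matrix


# Two hypothetical source trees with partly overlapping taxa.
tree1 = ((("A", "B"), "C"), "D")
tree2 = (("B", "E"), "C")

for taxon, row in mrp_matrix([tree1, tree2]).items():
    print(taxon, row)
```

The combined matrix would then be passed to a parsimony program, usually with an all-zero outgroup row added to root the result; columns could also be weighted, for example by the support for each clade.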
The 20th century biologist John Haldane is said to have mused about God's inordinate fondness for beetles. A similar predilection could be construed for the flowering plants (angiosperms), which number in the hundreds of thousands of species. The earliest discernible branch point in the lineage of flowering plants yields two branches; one seems to comprise almost all of the living species, but the other has only a single modern survivor, Amborella trichopoda. The discovery that this hitherto obscure South Pacific shrub represents a major branch arising from the deepest point of angiosperm phylogeny resulted in a flurry of exciting new research on its biology, and a massive re-evaluation of our understanding of the early evolution of modern angiosperms. Similar remarkable numerical disparities are also found scattered throughout the angiosperm portion of the tree of life. There are about 10,000 species of grass, for example, making Poaceae one of the largest families - it is also, of course, one of the most economically and ecologically important plant groups. The closest relatives of the grasses are the relatively obscure families Ecdeiocoleaceae (tussocky cord rush) and Joinvilleaceae (joinvillea), consisting of two species apiece.
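One simple way to gauge how surprising such a disparity is can be sketched under an equal-rates null model, in which every lineage is equally likely to speciate. Under that model, a node with n descendant species splits into sister clades of any size from 1 to n - 1 with equal probability, so the chance that the smaller clade has r or fewer species is 2r/(n - 1). The null model is an assumption introduced here for illustration and is not necessarily the measure used by Davies et al.; the function name is hypothetical, and pairing a single two-species family against roughly 10,000 grasses is an illustrative simplification.

```python
def imbalance_probability(small, large):
    """Probability that the smaller of two sister clades has at most `small`
    species, given `small + large` species in total, under an equal-rates
    (Yule) null model where all splits of the total are equally likely."""
    if small < 1 or large < small:
        raise ValueError("expected 1 <= small <= large")
    total = small + large
    # 2r / (n - 1), capped at 1 for the perfectly balanced case.
    return min(1.0, 2 * small / (total - 1))


# Illustrative figures: a two-species clade versus about 10,000 grass species.
p = imbalance_probability(2, 10_000)
print(f"P(split at least this uneven) = {p:.5f}")
```

A small probability indicates that a split this lopsided would rarely arise by chance alone if all lineages diversified at the same rate.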
Are these and other imbalances in the tree of life mere accidents of history? And why do some groups prosper, while others fade, or persist as 'living fossils' - faint echoes, perhaps, of what might have been? Davies et al. address the puzzle of differential disparity by bringing to the table a new global estimate of angiosperm phylogeny. Since the first broad phylogenetic study of the gene encoding the large subunit of the Rubisco protein (rbcL) in 1993, numerous broad-level angiosperm phylogenies have accumulated. Using the MRP method, Davies et al. constructed an angiosperm supertree by stitching together a patchwork of approximately 50 overlapping phylogenetic studies - based on a variety of different gene combinations and morphological characters - into a single (nearly) complete family-level angiosperm supertree with almost 400 terminal 'twigs'.
Table: Sister taxa at the top ten most imbalanced nodes of flowering-plant phylogeny
Column heading: Less diverse clade
Clades named in the table: Lamiales I (mints); Ecdeiocoleaceae (tussocky cord rush); Acoraceae (sweet flag); Lamiales II (mints); Caryophyllales I (carnations); Asteropeiaceae and Physenaceae; Caryophyllales II (carnations); Cyperaceae and Juncaceae (sedges and rushes)
Distributions named in the table: Old World and North America; New Zealand and New Caledonia; New Zealand and Patagonia; Pan subtropical to tropical; North and Central America; North and South America
The supertree constructed by Davies et al. can be viewed as the first major family-level treatment of the angiosperm portion of the tree of life - something of a landmark event. But it is rather a coarse approximation; the 'pixels' of resolution are entire families of flowering plants, rather than individual species. Improving the resolution and accuracy of angiosperm phylogeny remains a major goal. A further goal is a robust species-level tree of all organisms, but this is a challenge substantially greater in scope than most genome projects, because of the number of species involved, the desperate need for taxonomic work to define what the units (species) are, and the need to better characterize the degree to which the tree of life metaphor breaks down among closely related species as a result of lateral gene transfer and related processes. Addressing the latter question will ultimately require the fusion of two disparate fields: comparative genomics and tree of life studies. A 'tree of all genomes' would provide the most fundamental insights into the kinds of molecular evolutionary processes and patterns that underpin all of biology. Such a tree would be complex, however, as organellar and nuclear genomes from the same organism may have different histories, and the nuclear genome is a composite of elements that, to a greater or lesser extent, also have independent histories. In this context, supertree reconstruction, although a pragmatic option, will always be more problematic to interpret than large primary analyses in which the data are consistent across the tree.
More information is needed on the relative contributions that speciation and extinction make to species-diversification rates. Currently these two distinct processes are conflated in a single measure of diversity; teasing them apart will require substantial new evidence from the fossil record. More realistic short-term tasks include the use of large-scale phylogenies for explicit reconstructions of character evolution in order to assess better the circumstances under which differential diversification rates may occur. A recent study that used a less complete phylogenetic framework of the angiosperms has demonstrated that a particular floral characteristic (bilateral symmetry) can play a key role in angiosperm diversification rates. This is in contrast to the absence of correlations found among the set of characters examined by Davies et al. With the advent of large-scale trees and supertrees, addressing whether there is a detectable correlation between parameters of interest is now becoming a more tractable problem at the level of angiosperm phylogeny as a whole.