Rooting the eutherian tree: the power and pitfalls of phylogenomics
© Nishihara et al.; licensee BioMed Central Ltd. 2007
Received: 15 December 2006
Accepted: 21 September 2007
Published: 21 September 2007
Ongoing genome sequencing projects have led to a phylogenetic approach based on genome-scale data (phylogenomics), which is beginning to shed light on longstanding unresolved phylogenetic issues. The use of large datasets in phylogenomic analysis results in a global increase in resolution due to a decrease in sampling error. However, a fully resolved tree can still be wrong if the phylogenetic inference is biased.
Here, in an attempt to root the eutherian tree using genome-scale data with the maximum likelihood method, we demonstrate a case in which a concatenate analysis strongly supports a putatively wrong tree, whereas the total evaluation of separate analyses of different genes grossly reduced the bias of the phylogenetic inference. A conventional method of concatenate analysis of nucleotide sequences from our dataset, which includes a more than 1 megabase alignment of 2,789 nuclear genes, suggests a misled monophyly of Afrotheria (for example, elephant) and Xenarthra (for example, armadillo) with 100% bootstrap probability. However, this tree is not supported by our 'separate method', which takes into account the different tempos and modes of evolution among genes, and instead the basal Afrotheria tree is favored.
Our analysis demonstrates that in cases in which there is great variation in evolutionary features among different genes, the separate model, rather than the concatenate model, should be used for phylogenetic inference, especially in genome-scale data.
In the post-genomic era, genome-scale approaches to phylogenetic inference (phylogenomics) are being applied extensively to overcome the large sampling errors inherent in commonly used approaches based on a single or a small number of genes [1–3]. Sampling error diminishes as the number of genes provided for the analysis increases, but the fully resolved tree can still be wrong if the phylogenetic inference is biased (systematic error), and several such cases have been reported [4–11]. To estimate a reliable tree from large genomic datasets, it is imperative to establish how best to overcome such an error. Currently, genome projects of various mammalian species are ongoing at a rapid pace, and their genome-scale sequence data are now available. Therefore, an analysis of mammalian phylogeny based on such datasets is expected to be useful in evaluating problems that are inherent to phylogenomics.
Mammalian phylogenetics has developed rapidly during the past decade, and most of the higher order relationships have been resolved [12–16]. All eutherian (placental) mammals can be classified into 18 orders, which are grouped into three higher-level groups: Afrotheria (for example, elephants, sirenians, and hyraxes, which originated in Africa), Xenarthra (for example, armadillos, sloths, and anteaters, which originated in South America), and Boreotheria (all other eutherians, comprising 11 orders that originated in Laurasia of the Northern hemisphere). Phylogenetic relationships have been analyzed primarily using sequences of several nuclear or mitochondrial genes. However, the root of the eutherian tree remains unclear. Even extensive phylogenetic analyses based on several gene sequences failed to resolve the relationship among the three groups [17–21]. On the other hand, two retrotransposon inserted loci analyses have supported the basal Xenarthra hypothesis, whereas Murphy and coworkers identified two loci that support the monophyly of Xenarthra and Afrotheria. However, such a small number of loci does not provide conclusive evidence to resolve the relationship because of a possible ascertainment bias. The monophyly of Xenarthra + Afrotheria might be considered a reasonable hypothesis from a biogeographic point of view, because the South American and African continents (where Xenarthra and Afrotheria, respectively, originated) constituted the supercontinent Gondwana until about 105 million years ago. Indeed, the early split of eutherians is estimated to be about 100 million years ago, which is consistent with the biogeographic viewpoint. Thus, rooting the eutherian tree is important not only to clarify the origin of eutherians but also to elucidate the correlation between long-term continental drift and mammalian migration and diversification.
Although genome-scale approaches have become popular during the past few years, at most a few hundred genes (a few hundred kilobases for each species) have thus far been used for phylogenetic inference [1, 3, 4, 8]. In the present study we collected 2,789 genes from ten mammalian genomic sequences by screening whole-genome data, providing 1 megabase (Mb) of sequence data for each species, and performed an extensive maximum likelihood (ML) analysis to determine the root of the eutherian tree.
Results and discussion
Megabase data collection to analyze the root of eutherian tree
Incongruent maximum likelihood tree provided by concatenate analyses
We mainly used the ML method because maximum parsimony and neighbor-joining analyses led to an apparently artificial tree with rodents at the basal position among eutherians, probably because of long-branch attraction (see Additional data file 1 [Supplementary Text and Figure S1]). In contrast, the ML analyses robustly supported the monophyly of Boreotheria. The concatenated dataset of the 2,789 gene sequences was analyzed at the nucleotide level with the GTR (General Time Reversible) + Γ8 model and the codon substitution model with Γ4, and at the amino acid level with the JTT-F (Jones-Taylor-Thornton with the F-option) + Γ8 model, using PAML version 3.15 with the relationships within Boreotheria fixed as shown in Figure 1.
Table 1. Comparison of the log-likelihood for the three hypotheses with each model
Entries: < ln L > (Δ ln L ± SE)

Concatenate or separate model         Tree 1              Tree 2             Tree 3
Concatenate model
  GTR + Γ8                            -117.2 ± 31.1       -147.3 ± 29.7      < -4,076,316.3 >
  Codon + Γ4                          < -3,828,351.7 >    -77.8 ± 64.5       -142.7 ± 65.0
  JTT-F + Γ8                          < -1,905,933.9 >    -84.1 ± 37.4       -1.7 ± 41.9
Separate model (among 2,789 genes)
  GTR + Γ8                            < -3,963,489.9 >    -117.4 ± 72.3      -91.4 ± 72.7
  Codon + Γ4                          < -3,621,322.1 >    -128.0 ± 103.2     -527.9 ± 96.3
  JTT-F + Γ8                          < -1,799,245.4 >    -134.9 ± 88.5      -317.6 ± 85.5
ML analysis using the separate method
Because our dataset was composed of a large number of genes, variations in the tempos and modes of evolution among genes were expected to be very large. Therefore, we next carried out ML analyses with the separate model, which takes account of this variety by assigning different parameters to different genes. Interestingly, the nucleotide, amino acid, and codon substitution models all consistently supported tree 1 (Table 1). The separate model was superior to the concatenate model based on the Akaike Information Criterion (AIC), except for the codon substitution model, in which separation into 2,789 genes might have introduced too many parameters.
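The AIC comparison described above can be illustrated with the GTR + Γ8 log-likelihoods reported in Table 1. This is a minimal sketch, not the authors' calculation. The number of free parameters per partition (17 branch lengths for an unrooted ten-taxon tree, 5 GTR rate parameters, 3 free base frequencies, and 1 Γ shape parameter) is an assumption made here for illustration, as is the use of the number of aligned nucleotide sites as the sample size in AICc.

```python
# Sketch: compare the concatenate and separate models with AIC and AICc.
# The log-likelihoods are the best-tree values reported for GTR + Γ8.
# The parameter counts are ASSUMPTIONS for illustration, not values from the paper.

def aic(ln_l: float, k: int) -> float:
    """Akaike Information Criterion: -2 ln L + 2k."""
    return -2.0 * ln_l + 2.0 * k


def aicc(ln_l: float, k: int, n: int) -> float:
    """Second-order correction of AIC for sample size n."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return aic(ln_l, k) + 2.0 * k * (k + 1) / (n - k - 1)


N_GENES = 2789
N_SITES = 1_011_870                      # aligned nucleotide sites per species
PARAMS_PER_PARTITION = 17 + 5 + 3 + 1    # assumed: branches + GTR rates + base freqs + alpha

LNL_CONCATENATE = -4_076_316.3           # concatenate model, best tree
LNL_SEPARATE = -3_963_489.9              # separate model (among 2,789 genes), best tree

k_concatenate = PARAMS_PER_PARTITION             # one parameter set for all genes
k_separate = PARAMS_PER_PARTITION * N_GENES      # one parameter set per gene

scores = {
    "concatenate": aicc(LNL_CONCATENATE, k_concatenate, N_SITES),
    "separate": aicc(LNL_SEPARATE, k_separate, N_SITES),
}
preferred = min(scores, key=scores.get)  # smaller AICc means better fit

for model, score in scores.items():
    print(f"{model:12s} AICc = {score:,.1f}")
print("preferred model:", preferred)
```

Under these assumed parameter counts the gain in log-likelihood outweighs the penalty for the additional parameters, which agrees with the preference for the separate model reported for the nucleotide analysis. A richer per-gene model, such as a codon model with many frequency parameters, increases the penalty and can reverse the preference.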
Table 2. Comparison of BPs among trees 1 to 3 analyzed with concatenate and separate models. Columns: Nucleotide (GTR + Γ8); Codon (+ Γ4); Amino acid (JTT-F + Γ8).
Removal of fast-evolving gene data
Additionally, for each of the 56 datasets, we applied the separate method with categories of 50 genes each and monitored the BPs. The shift of BPs for each tree was very similar to that of the concatenate analysis with any model (Figure 2d-f). In the amino acid analysis, the separate analysis for this categorization (50 genes per category), using all of the 2,789 genes, showed ambiguous support for trees 1 and 3 with the smallest AIC, but removal of rapidly evolving genes was associated with a decline in support for tree 3 (Figure 2f).
Furthermore, we conducted the separate analysis with separation into each gene along with the nucleotide, amino acid, and codon substitution models for each of the 56 datasets (Figure 2g-i). Note that the separate analysis among each gene showed the smallest AICc in the nucleotide analysis (Table 2). In this analysis, tree 3 was not supported in any model.
Therefore, our large dataset exhibits serious incongruence among models: tree 3 is strongly supported (100% BP) by a conventional method with a concatenate model of nucleotide analysis, whereas the separate model among each gene, which has the smallest AICc, supported tree 1. Overall, tree 1 (basal Afrotheria) appeared to be the most likely tree by comparing BPs (Figure 2 and Table 1), but the alternative hypotheses cannot be dismissed. Hallstrom and coworkers recently analyzed a dataset of 2,840 genes (> 2 Mb) with the concatenate model to resolve the root of the eutherian tree, and concluded that the most likely tree supports the monophyly of Xenarthra and Afrotheria (tree 3 in the present study). Based on our results, however, we believe that further analysis of their dataset with the separate model is necessary to take heterogeneity among the genes into account.
Possible cause of the misled tree
There are several factors that can lead to an incorrect tree, even with use of genome-scale data: nucleotide or amino acid compositional bias [1, 5, 9]; long-branch attraction caused by unequal evolutionary rates among lineages [2, 7, 8, 34]; sparse taxon sampling [2, 4, 8]; and heterotachy (the shift of position-specific evolutionary rates) [8, 32, 35–39]. If the long-branch attraction artifact were operating, then large differences among the relevant branch lengths would be seen in the tree. In tree 3 analyzed with the concatenate GTR + Γ8 model (Additional data file 1 [Figure S2]), large differences in branch lengths are observed only in the rodent (mouse/rat) and cow lineages, which are within the densely sampled Boreotheria. Concerning compositional bias, significant differences among eutherians are likewise notable only in rodents and cow (Additional data file 1 [Table S3]).
To examine whether the misled support for tree 3 resulted from long-branch attraction or compositional biases of the rodent and cow sequences, we performed a concatenate analysis with the GTR + Γ8 model excluding the rodent (mouse and rat) and/or cow data. If the rodent and cow data produced such misleading effects in our concatenate analysis shown in Tables 1 and 2, then support for tree 3 should be reduced when we remove these sequences. Contrary to this expectation, however, tree 3 was still supported robustly (100% BP; Additional data file 1 [Table S4]). Therefore, we conclude that neither long-branch attraction nor compositional bias caused the misled support for tree 3. Furthermore, if they had actually caused the problem, the separate model would not be expected to improve the situation as drastically as demonstrated in this work. We therefore expect that the heterogeneity among genes caused the problem.
If the inclusion of paralogous genes is causing the problem in our case, then it is expected that the genes supporting tree 3 will tend to contain more paralogous comparisons, and accordingly their TBLs will tend to be longer than average. We therefore investigated the distribution of TBLs of the 848 genes that prefer tree 3, and compared the distribution with that of all 2,789 genes (Additional data file 1 [Figure S3]). The TBL was calculated using PAML 3.15, with the GTR + Γ8 model for each gene. However, no sign of more paralogs in the genes supporting tree 3 than in the others was observed (Additional data file 1 [Figure S3]). Therefore, the specific cause of the misled support for tree 3 remains unclear.
The number of genes that can be used for phylogenetic analysis becomes large when genome-scale data are used. We showed here an extreme case in which an analysis of a large concatenated dataset of genes yields different results depending on the substitution model used. In our analysis, the differing results were not due to long-branch attraction or compositional bias, but probably to large variation in tempos and modes of evolution among genes. This serious pitfall is more difficult to detect than long-branch attraction or compositional bias. Furthermore, we demonstrated that this hidden but probably common problem can be overcome using the separate model. Therefore, although increasing the sequence length certainly reduces sampling error and large amounts of data are very powerful in phylogenetic analyses, it must be noted that a simple concatenated dataset carries with it the possibility of a seriously misleading artifact. To estimate a true phylogenetic relationship, it is necessary to give close attention to the data analysis and to improve the method by explicitly taking into account variation in tempo and mode of evolution among different genes.
Root of the eutherian tree
Rooting the eutherian tree is important in order to clarify when and where early eutherians evolved in association with ancient large-scale continental drift. With the best available models (the separate and concatenated codon substitution + Γ models), although tree 1 was preferred, we could not completely exclude the alternative hypotheses. Given that even the genome-scale sequence analyses with the best available model could not provide a definitive conclusion, as demonstrated in this paper, it is important to increase the species sampling and the number of genes in the phylogenetic analyses of sequence data with improved models of molecular evolution. Recently, it was demonstrated that extensive phylogenetic analysis with increased taxon sampling tends to prefer the concatenate model over the separate one based on AICc in the case of plant phylogeny. Therefore, because dozens of mammalian genome sequencing projects are currently in progress, increased sampling may allow the root of the eutherian tree to be resolved without application of the completely separate model (among 2,789 genes). It is also important to apply more extensive and multilateral analyses such as retrotransposon insertion analysis [15, 16, 22, 41] in order to make full use of the explosively growing genomic data. In the near future, the evolutionary history of mammals and its association with ancient continental drift will be resolved.
The availability of large genomic sequence datasets for various mammals allows us to perform an extensive ML analysis of the phylogenetic relationship among Boreotheria, Xenarthra, and Afrotheria, in order to determine the root of the eutherian tree based on 2,789 genes collected from ten mammalian species. Although a conventional method of concatenate analysis with a GTR + Γ model suggests the monophyly of Afrotheria and Xenarthra with 100% BP, this tree is rejected by ML analyses with the separate model, which takes into account the different tempos and modes of evolution among genes. We demonstrate that the separate model should be used for phylogenetic inference in cases of large variation in evolutionary features among different genes, such as for genome-scale data.
Materials and methods
Collection of the gene dataset
A large sequence dataset was collected using the following five steps: extraction of all exon sequences of greater than 200 bp from the human genome database; removal of duplicated (paralog) sequences from the human data; search of the armadillo and elephant genomic data for homologs of the human exons; collection of the homologous exons from other mammalian genomic data; and alignment of all of the sequences and removal of ambiguous nucleotide sites. Details for each step are shown below.
Step 1: extraction of all exon sequences of greater than 200 bp from the human genome database
We obtained human whole-genomic sequence data (version hg17) and an annotation data file (refFlat) for gene positions from the University of California, Santa Cruz Genome Bioinformatics database. Protein-coding exon sequences longer than 200 bp, identified from the annotation file, were used because it is difficult to evaluate the homology of short exon sequences by BLAST search.
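The exon extraction in step 1 can be sketched as follows. This is an illustrative reading, not the authors' script. It assumes the standard tab-separated refFlat column order (geneName, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds) with 0-based, half-open coordinates, and it assumes that "protein-coding exon sequence" means the part of each exon that lies inside the coding region. The function name and the example record are hypothetical.

```python
# Sketch: list the coding portions of exons longer than a minimum length
# from one refFlat record. Clipping exons to the CDS is an ASSUMPTION.

def coding_exons(refflat_line: str, min_len: int = 200):
    """Return (chrom, start, end, strand) for coding exon segments > min_len bp."""
    fields = refflat_line.rstrip("\n").split("\t")
    chrom, strand = fields[2], fields[3]
    cds_start, cds_end = int(fields[6]), int(fields[7])
    # exonStarts and exonEnds are comma-separated lists with a trailing comma
    starts = [int(x) for x in fields[9].split(",") if x]
    ends = [int(x) for x in fields[10].split(",") if x]

    segments = []
    for start, end in zip(starts, ends):
        # Clip the exon to the coding region; non-coding exons give length <= 0
        seg_start, seg_end = max(start, cds_start), min(end, cds_end)
        if seg_end - seg_start > min_len:
            segments.append((chrom, seg_start, seg_end, strand))
    return segments


# Hypothetical record with three exons, of which two have > 200 coding bp
example = "GENE1\tNM_000000\tchr1\t+\t100\t2000\t150\t1500\t3\t100,600,1200,\t400,700,2000,"
print(coding_exons(example))
```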
Step 2: removal of duplicated (paralog) sequences from the human data
To find and remove duplicated sequence data from the human exon data, we performed a pair-wise homology search among the exon sequences using the local Basic Local Alignment Search Tool (BLAST) program. In this step, an exon sequence was removed from the sequence collection if a similar sequence, excepting the exon itself, was detected by the search in the human sequence data. The criterion for the similarity was set at an E-value of 1 × 10⁻¹¹. Thus, each of the resulting 50,527 exons was regarded as a single-copy sequence in the human genome.
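The single-copy filter in step 2 can be sketched as a pass over the search results. This is a minimal illustration under stated assumptions: the hits are assumed to be available as (query, subject, E-value) tuples from an all-against-all search, and a hit is treated as "similar" when its E-value is at or below the threshold. The function name and the example identifiers are hypothetical.

```python
# Sketch: keep only exons that have no similar sequence other than themselves.
# The input layout and the "<=" comparison are ASSUMPTIONS for illustration.

def single_copy_exons(exon_ids, hits, max_evalue=1e-11):
    """Return exon IDs with no non-self hit at or below max_evalue."""
    duplicated = set()
    for query, subject, evalue in hits:
        if query != subject and evalue <= max_evalue:
            # In an all-against-all search the reciprocal hit flags the partner
            duplicated.add(query)
    return [exon for exon in exon_ids if exon not in duplicated]


hits = [
    ("exonA", "exonA", 0.0),     # self hit, ignored
    ("exonA", "exonB", 1e-30),   # similar pair: both exons are removed
    ("exonB", "exonA", 1e-30),
    ("exonC", "exonD", 1e-5),    # too weak to count as a duplicate
]
print(single_copy_exons(["exonA", "exonB", "exonC", "exonD"], hits))
```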
Step 3: search of the armadillo and elephant genomic data for homologs of the human exons
We obtained whole-genome shotgun sequences of the nine-banded armadillo (Dasypus novemcinctus) and the African elephant (Loxodonta africana) from the DNA Data Bank of Japan. We next performed a local BLAST search with a cut-off of 1 × 10⁻¹¹ to obtain homologs of the human single-copy exon sequences from the two species. To avoid comparing paralogous exons, we removed the exon information from the collection if multiple sequences were detected in either of the two genomic datasets. However, failure to detect duplicated sequences does not guarantee that only orthologous comparisons were made, both because whole-genome data were not always available and because one of the duplicated genes in a genome may have been lost during evolution. Next, the regions shared among human, armadillo, and elephant were extracted for each of the 7,068 exons obtained.
Step 4: collection of the homologous exons from other mammalian genomic data
Whole-genome pair-wise alignment data of human versus various animals are available in the University of California, Santa Cruz Genome Bioinformatics database. The seven mammalian species used for our data collection were chimpanzee (Pan troglodytes; data ver. panTro1), rhesus macaque (Macaca mulatta; rheMac1), mouse (Mus musculus; mm7), rat (Rattus norvegicus; rn3), dog (Canis familiaris; canFam2), cow (Bos taurus; bosTau1), and opossum (Monodelphis domestica; monDom1). The orthologs of the human exons were obtained from the seven species by referring to the alignment data, so that ten sequences, including those from human, armadillo, and elephant, were obtained for each exon. To exclude possible pseudogenes from the analysis, we removed from the dataset any exon for which any of the species contained a stop codon in the middle of the sequence. The remaining 4,782 exons were used for the subsequent alignment and analysis.
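The pseudogene filter described above can be sketched as a check for in-frame stop codons. This is an illustration only. It assumes that each exon sequence is supplied already in the correct reading frame, and it interprets "in the middle of the sequence" as any complete codon other than the last one. Both are assumptions, and the function names are hypothetical.

```python
# Sketch: drop exons in which any species carries an internal in-frame stop codon.
# Reading frame and the meaning of "internal" are ASSUMPTIONS for illustration.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_internal_stop(seq: str, frame: int = 0) -> bool:
    """True if a stop codon occurs before the last complete codon."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    return any(codon in STOP_CODONS for codon in codons[:-1])


def drop_pseudogene_candidates(exons):
    """exons maps exon ID -> {species: sequence}; keep exons with no internal stop."""
    return {
        exon_id: seqs
        for exon_id, seqs in exons.items()
        if not any(has_internal_stop(s) for s in seqs.values())
    }


exons = {
    "exon1": {"human": "ATGGCCGGG", "dog": "ATGGCTGGG"},
    "exon2": {"human": "ATGGCCGGG", "dog": "ATGTAAGGG"},  # internal TAA in dog
}
print(sorted(drop_pseudogene_candidates(exons)))
```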
Step 5: alignment of all of the sequences and removal of ambiguous nucleotide sites
All of the exon sequences were concatenated for each species to avoid the technical difficulty of alignment. We aligned the sequences using the blastz and multiz programs. Phylogenetic information can be taken into account in the alignment program, and thus, with the exception of the three hypotheses shown in Figure 1, we fixed the relationships of the mammalian species analyzed as follows: ((((((human, chimpanzee), macaque), (mouse, rat)), (dog, cow)), armadillo, elephant), opossum). Next, we divided the concatenated sequences into individual exons and removed codons in which insertions or deletions were found for any species. When multiple exons were parts of the same gene in our dataset, we concatenated the exons and used the resulting concatenation as one gene sequence, thereby obtaining 3,148 genes in total. Because very short sequences of homologous exons were detected in the BLAST search (step 3) for some genes, such sequences (< 120 bp) were removed before the phylogenetic analysis that followed. We finally collected a 2,789 gene dataset composed of 1,011,870 bp (337,290 codons) for each species. Consequently, these gene sequences differ from the actual gene sequences because of the removal of exons and codons that were ambiguous in the alignment. Our dataset is suitable for phylogenetic analysis in terms of both quality (exclusion of missing/ambiguous alignment codons, paralogs, and pseudogenes) and quantity (> 1 Mb per species).
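The removal of codons containing insertions or deletions can be sketched as a column filter on a codon alignment. This is a minimal sketch that assumes the alignment is held as a mapping from species name to an aligned, in-frame sequence of equal length, with "-" as the gap character. The function name and the toy alignment are hypothetical.

```python
# Sketch: delete every codon column in which any species has a gap.
# The input layout and the gap symbol "-" are ASSUMPTIONS for illustration.

def remove_gapped_codons(alignment):
    """Return the alignment restricted to codons that are gap-free in all species."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_sites = lengths.pop()
    if n_sites % 3 != 0:
        raise ValueError("alignment length must be a multiple of 3")

    # Start positions of codons with no gap in any species
    keep = [
        i for i in range(0, n_sites, 3)
        if all("-" not in seq[i:i + 3] for seq in alignment.values())
    ]
    return {
        species: "".join(seq[i:i + 3] for i in keep)
        for species, seq in alignment.items()
    }


toy = {"human": "ATGCCCGGG", "mouse": "ATG-CCGGG", "opossum": "ATGCCAGGG"}
print(remove_gapped_codons(toy))
```

After this step every sequence length remains a multiple of three, which is consistent with the final dataset of 337,290 codons, or 1,011,870 bp, per species.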
Phylogenetic analysis with the ML method
ML analyses were carried out using the Phylogenetic Analysis by Maximum Likelihood (PAML) version 3.15 package at the nucleotide and amino acid levels with both the concatenate and separate models. The data were analyzed as nucleotide sequences with the GTR + Γ8 model and the codon-substitution + Γ4 model, or as amino acid sequences with the JTT-F + Γ8 model. The rate parameters of the GTR model, parameters of the codon substitution model, and the shape parameter (α) of the Γ distribution were optimized. In the concatenate analyses, the concatenated sequences (1,011,870 bp from 2,789 genes) were regarded as homogeneous, whereas in the separate analyses the differences among the gene categories or among the 2,789 genes were taken into account by assigning different parameters (branch lengths and other parameters of the substitution model, such as the shape parameter of the Γ model) to different categories or to different genes.
We performed the analyses by separating the 2,789 genes into 5, 10, 56, 100, 200, 558, 930, 1,395, or 2,789 (each gene) categories according to the TBL estimated from ML analyses for each gene. In the latter analyses, log-likelihood scores for the respective genes were estimated with PAML, and the total log-likelihood of the whole dataset was then calculated with the TotalML program in the MOLPHY package. The test of Kishino and Hasegawa and the wSH test were performed using the CONSEL program. BPs shown in Tables 1 and 2 and in Additional data file 1 (Table S4) were calculated using the resampling estimated log-likelihood method with 10,000 replications. The AIC and the AICc were applied to evaluate the fit of the model to the data.
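The two calculations named above, summing per-gene log-likelihoods into a total and estimating BPs by the resampling estimated log-likelihood (RELL) method, can be sketched in a few lines. This is a generic illustration of the technique, not a reimplementation of TotalML, PAML, or CONSEL. It assumes that per-site log-likelihoods are available for each candidate tree and that sites are resampled across the whole alignment. The function names and the toy numbers are hypothetical.

```python
# Sketch: total log-likelihood under a separate model and RELL bootstrap
# proportions. Input layout and resampling unit are ASSUMPTIONS for illustration.
import random


def total_log_likelihood(per_gene_lnl):
    """Separate model: the total ln L of a tree is the sum over genes."""
    return sum(per_gene_lnl)


def rell_bootstrap(site_lnl, n_replicates=10_000, seed=0):
    """site_lnl maps tree name -> list of per-site ln L values (equal lengths).

    Sites are resampled with replacement without re-estimating parameters;
    the BP of a tree is the fraction of replicates in which it has the
    highest resampled total ln L (ties go to the first tree listed).
    """
    rng = random.Random(seed)
    trees = list(site_lnl)
    n_sites = len(site_lnl[trees[0]])
    if any(len(site_lnl[t]) != n_sites for t in trees):
        raise ValueError("all trees need the same number of sites")

    wins = {t: 0 for t in trees}
    for _ in range(n_replicates):
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = {t: sum(site_lnl[t][i] for i in sample) for t in trees}
        wins[max(totals, key=totals.get)] += 1
    return {t: wins[t] / n_replicates for t in trees}


# Toy per-site values in which no single tree is best at every site
toy = {
    "tree1": [-1.0, -2.0, -1.5, -1.0],
    "tree2": [-1.2, -1.9, -1.6, -1.3],
    "tree3": [-1.1, -2.2, -1.4, -1.2],
}
print(rell_bootstrap(toy, n_replicates=1000))
```

For an alignment of about one million sites this pure-Python loop would be slow; it is written for clarity rather than speed.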
Removal of rapidly evolving gene data
In our data, rapidly evolving genes might cause artificial effects more extensively than slowly evolving genes, and paralogous genes might still be included among seemingly 'rapidly evolving' genes. To evaluate the influence of such genes, we constructed datasets by successively removing the 50 most rapidly evolving genes at a time, starting from the 2,789 gene dataset, producing 56 concatenated datasets. In this procedure, the evolutionary rate of each gene was evaluated from the estimated total branch length of the ML tree. We applied both the concatenate model and the separate model to each of the 56 datasets. In the concatenate model, ML analyses with the nucleotide (GTR + Γ8), amino acid (JTT-F + Γ8), and codon (with Γ4) substitution models were performed, and changes in relative BPs among the three hypotheses were monitored, as shown in Figure 2. In the concatenate analysis with the codon substitution model, we analyzed 28 datasets produced by removing 100 fast-evolving genes at a time. Because the number of replications for the BP calculation is changed in the default setting of the PAML package depending on the length of the sequence analyzed, 500 replications were applied when 2,450 or fewer genes were removed, and 10,000 replications when more than 2,450 genes were removed. We also used the nucleotide (GTR + Γ8), amino acid (JTT-F + Γ8), and codon (with Γ4) substitution models in the separate model analysis, in which different parameters were provided to each category (a category includes 50 genes; Figure 2d-f) or each gene (Figure 2g-i), and the total evidence was evaluated with the TotalML program in the MOLPHY package. BPs in the separate model were calculated using the resampling estimated log-likelihood method with 10,000 replications.
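The successive removal procedure can be sketched as follows. This is a minimal illustration that assumes the total branch length of each gene is available in a mapping from gene ID to TBL. The function name and the synthetic TBL values are hypothetical.

```python
# Sketch: build nested datasets by dropping the fastest-evolving genes
# (largest total branch length) in fixed-size steps.
# The gene IDs and TBL values below are synthetic placeholders.

def successive_removal(tbl_by_gene, step=50):
    """Return gene lists after removing 0, step, 2*step, ... of the fastest genes."""
    ranked = sorted(tbl_by_gene, key=tbl_by_gene.get, reverse=True)  # fastest first
    return [ranked[removed:] for removed in range(0, len(ranked), step)]


tbl = {f"gene{i}": float(i) for i in range(2789)}  # synthetic TBLs

by_50 = successive_removal(tbl, step=50)
by_100 = successive_removal(tbl, step=100)
print(len(by_50), len(by_100), len(by_50[-1]))
```

With 2,789 genes, steps of 50 give 56 datasets, the last of which retains 39 genes, and steps of 100 give 28 datasets. These counts match the numbers of datasets stated in the text.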
Additional data files
The following additional data are available with the online version of this paper. Additional data file 1 includes additional explanatory text and several additional tables and figures.
Abbreviations
AIC, Akaike Information Criterion
AICc, second order correction of AIC
BLAST, Basic Local Alignment Search Tool
GTR, General Time Reversible
JTT-F, Jones-Taylor-Thornton (with the F-option)
TBL, total branch length
wSH, weighted test of Shimodaira and Hasegawa.
This work was supported by research grants from the Ministry of Education, Culture, Sports, Science and Technology of Japan (to NO). This study was also supported in part by grants from the Japan Society for the Promotion of Science (to MH), and from TRIC, Research Organization of Information and Systems (to HN).
- Rokas A, Williams BL, King N, Carroll SB: Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature. 2003, 425: 798-804. 10.1038/nature02053.
- Soltis DE, Albert VA, Savolainen V, Hilu K, Qiu YL, Chase MW, Farris JS, Stefanovic S, Rice DW, Palmer JD, et al: Genome-scale data, angiosperm relationships, and 'ending incongruence': a cautionary tale in phylogenetics. Trends Plant Sci. 2004, 9: 477-483. 10.1016/j.tplants.2004.08.008.
- Delsuc F, Brinkmann H, Chourrout D, Philippe H: Tunicates and not cephalochordates are the closest living relatives of vertebrates. Nature. 2006, 439: 965-968. 10.1038/nature04336.
- Blair JE, Ikeo K, Gojobori T, Hedges SB: The evolutionary position of nematodes. BMC Evol Biol. 2002, 2: 7. 10.1186/1471-2148-2-7.
- Phillips MJ, Delsuc F, Penny D: Genome-scale phylogeny and the detection of systematic biases. Mol Biol Evol. 2004, 21: 1455-1458. 10.1093/molbev/msh137.
- Delsuc F, Brinkmann H, Philippe H: Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet. 2005, 6: 361-375. 10.1038/nrg1603.
- Dopazo H, Dopazo J: Genome-scale evidence of the nematode-arthropod clade. Genome Biol. 2005, 6: R41. 10.1186/gb-2005-6-5-r41.
- Philippe H, Lartillot N, Brinkmann H: Multigene analyses of bilaterian animals corroborate the monophyly of Ecdysozoa, Lophotrochozoa, and Protostomia. Mol Biol Evol. 2005, 22: 1246-1253. 10.1093/molbev/msi111.
- Jeffroy O, Brinkmann H, Delsuc F, Philippe H: Phylogenomics: the beginning of incongruence?. Trends Genet. 2006, 22: 225-231. 10.1016/j.tig.2006.02.003.
- Gadagkar SR, Rosenberg MS, Kumar S: Inferring species phylogenies from multiple genes: concatenated sequence tree versus consensus gene tree. J Exp Zoolog B Mol Dev Evol. 2005, 304: 64-74. 10.1002/jez.b.21026.
- Seo TK, Kishino H, Thorne JL: Incorporating gene-specific variation when inferring and evaluating optimal evolutionary tree topologies from multilocus sequence data. Proc Natl Acad Sci USA. 2005, 102: 4436-4441. 10.1073/pnas.0408313102.
- Madsen O, Scally M, Douady CJ, Kao DJ, DeBry RW, Adkins R, Amrine HM, Stanhope MJ, de Jong WW, Springer MS: Parallel adaptive radiations in two major clades of placental mammals. Nature. 2001, 409: 610-614. 10.1038/35054544.
- Murphy WJ, Eizirik E, Johnson WE, Zhang YP, Ryder OA, O'Brien SJ: Molecular phylogenetics and the origins of placental mammals. Nature. 2001, 409: 614-618. 10.1038/35054550.
- Murphy WJ, Eizirik E, O'Brien SJ, Madsen O, Scally M, Douady CJ, Teeling E, Ryder OA, Stanhope MJ, de Jong WW, et al: Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science. 2001, 294: 2348-2351. 10.1126/science.1067179.
- Kriegs JO, Churakov G, Kiefmann M, Jordan U, Brosius J, Schmitz J: Retroposed elements as archives for the evolutionary history of placental mammals. PLoS Biol. 2006, 4: e91. 10.1371/journal.pbio.0040091.
- Nishihara H, Hasegawa M, Okada N: Pegasoferae, an unexpected mammalian clade revealed by tracking ancient retroposon insertions. Proc Natl Acad Sci USA. 2006, 103: 9929-9934. 10.1073/pnas.0603797103.
- Waddell PJ, Okada N, Hasegawa M: Towards resolving the interordinal relationships of placental mammals. Syst Biol. 1999, 48: 1-5. 10.1080/106351599260391.
- Delsuc F, Scally M, Madsen O, Stanhope MJ, de Jong WW, Catzeflis FM, Springer MS, Douzery EJ: Molecular phylogeny of living xenarthrans and the impact of character and taxon sampling on the placental tree rooting. Mol Biol Evol. 2002, 19: 1656-1671.
- Waddell PJ, Shelley S: Evaluating placental inter-ordinal phylogenies with novel sequences including RAG1, gamma-fibrinogen, ND6, and mt-tRNA, plus MCMC-driven nucleotide, amino acid, and codon models. Mol Phylogenet Evol. 2003, 28: 197-224. 10.1016/S1055-7903(03)00115-5.
- Amrine-Madsen H, Koepfli KP, Wayne RK, Springer MS: A new phylogenetic marker, apolipoprotein B, provides compelling evidence for eutherian relationships. Mol Phylogenet Evol. 2003, 28: 225-240. 10.1016/S1055-7903(03)00118-0.
- Springer MS, Stanhope MJ, Madsen O, de Jong WW: Molecules consolidate the placental mammal tree. Trends Ecol Evol. 2004, 19: 430-438. 10.1016/j.tree.2004.05.006.
- Murphy WJ, Pringle TH, Crider TA, Springer MS, Miller W: Using genomic data to unravel the root of the placental mammal phylogeny. Genome Res. 2007, 17: 413-421. 10.1101/gr.5918807.
- Smith AG, Smith DG, Funnell BM: Atlas of Cenozoic and Mesozoic Coastlines. 2004, New York: Cambridge University Press.
- Kumar S, Hedges SB: A molecular timescale for vertebrate evolution. Nature. 1998, 392: 917-920. 10.1038/31927.
- Yang Z, Nielsen R, Hasegawa M: Models of amino acid substitution and applications to mitochondrial protein evolution. Mol Biol Evol. 1998, 15: 1600-1611.
- Yang Z: PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci. 1997, 13: 555-556.
- Shimodaira H, Hasegawa M: Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol Biol Evol. 1999, 16: 1114-1116.
- Kishino H, Hasegawa M: Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in hominoidea. J Mol Evol. 1989, 29: 170-179. 10.1007/BF02100115.
- Akaike H: Information theory and an extension of the maximum likelihood principle. Second International Symposium on Information Theory: 1973. Edited by: Petrov BN, Csaki F. 1973, Budapest, Hungary: Akademiai Kiado, 267-281.
- Burnham KP, Anderson DR: Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. 2003, New York, NY: Springer, 2nd edition.
- Posada D, Buckley TR: Model selection and model averaging in phylogenetics: advantages of akaike information criterion and bayesian approaches over likelihood ratio tests. Syst Biol. 2004, 53: 793-808. 10.1080/10635150490522304.
- Brinkmann H, van der Giezen M, Zhou Y, Poncelin de Raucourt G, Philippe H: An empirical assessment of long-branch attraction artefacts in deep eukaryotic phylogenomics. Syst Biol. 2005, 54: 743-757. 10.1080/10635150500234609.
- Hallstrom B, Kullberg M, Nilsson M, Janke A: Phylogenomic data analyses provide evidence that Xenarthra and Afrotheria are sistergroups. Mol Biol Evol. 2007, 24: 2059-2068. 10.1093/molbev/msm136.
- Felsenstein J: Cases in which parsimony or compatibility methods will be positively misleading. Syst Zool. 1978, 27: 401-410. 10.2307/2412923.
- Kolaczkowski B, Thornton JW: Performance of maximum parsimony and likelihood phylogenetics when evolution is heterogeneous. Nature. 2004, 431: 980-984. 10.1038/nature02917.
- Lopez P, Casane D, Philippe H: Heterotachy, an important process of protein evolution. Mol Biol Evol. 2002, 19: 1-7.
- Spencer M, Susko E, Roger AJ: Likelihood, parsimony, and heterogeneous evolution. Mol Biol Evol. 2005, 22: 1161-1164. 10.1093/molbev/msi123.
- Lockhart P, Novis P, Milligan BG, Riden J, Rambaut A, Larkum T: Heterotachy and tree building: a case study with plastids and eubacteria. Mol Biol Evol. 2006, 23: 40-45. 10.1093/molbev/msj005.
- Shalchian-Tabrizi K, Skanseng M, Ronquist F, Klaveness D, Bachvaroff TR, Delwiche CF, Botnen A, Tengs T, Jakobsen KS: Heterotachy processes in rhodophyte-derived secondhand plastid genes: implications for addressing the origin and evolution of dinoflagellate plastids. Mol Biol Evol. 2006, 23: 1504-1515. 10.1093/molbev/msl011.
- Rodriguez-Ezpeleta N, Philippe H, Brinkmann H, Becker B, Melkonian M: Phylogenetic analyses of nuclear, mitochondrial, and plastid multigene data sets support the placement of Mesostigma in the Streptophyta. Mol Biol Evol. 2007, 24: 723-731. 10.1093/molbev/msl200.
- Shedlock AM, Okada N: SINE insertions: powerful tools for molecular systematics. Bioessays. 2000, 22: 148-160. 10.1002/(SICI)1521-1878(200002)22:2<148::AID-BIES6>3.0.CO;2-Z.
- Hinrichs AS, Karolchik D, Baertsch R, Barber GP, Bejerano G, Clawson H, Diekhans M, Furey TS, Harte RA, Hsu F, et al: The UCSC Genome Browser Database: update 2006. Nucleic Acids Res. 2006, 34 (Database issue): D590-D598. 10.1093/nar/gkj144.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-410.
- Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W: Human-mouse alignments with BLASTZ. Genome Res. 2003, 13: 103-107. 10.1101/gr.809403.
- Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED, et al: Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 2004, 14: 708-715. 10.1101/gr.1933104.
- Adachi J, Hasegawa M: MOLPHY version 2.3: programs for molecular phylogenetics based on maximum likelihood. Comput Sci Monogr. 1996, 28: 1-150.
- Shimodaira H, Hasegawa M: CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics. 2001, 17: 1246-1247. 10.1093/bioinformatics/17.12.1246.
- Kishino H, Miyata T, Hasegawa M: Maximum likelihood inference of protein phylogeny and the origin of chloroplasts. J Mol Evol. 1990, 31: 151-160. 10.1007/BF02109483.
- Swofford DL: PAUP*. Phylogenetic Analysis Using Parsimony (*And Other Methods). Version 4. 2003, Sunderland, Massachusetts: Sinauer Associates.
- Kumar S, Tamura K, Nei M: MEGA3: Integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief Bioinform. 2004, 5: 150-163. 10.1093/bib/5.2.150.
- Schmidt HA, Strimmer K, Vingron M, von Haeseler A: TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics. 2002, 18: 502-504. 10.1093/bioinformatics/18.3.502.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.