Metastats: an improved statistical method for analysis of metagenomic data

Paulson, Joseph N; Pop, Mihai; Bravo, Hector Corrada

doi:10.1186/1465-6906-12-S1-P17

Poster presentation
Published: 19 September 2011

Joseph N Paulson^1,2,
Mihai Pop^2,3 &
Hector Corrada Bravo^2,3

Genome Biology volume 12, Article number: P17 (2011)

Metagenomic studies were originally focused on exploratory/validation projects but are rapidly being applied in a clinical setting. In this setting, researchers are interested in finding characteristics of the microbiome that correlate with the clinical status of the corresponding sample. Comparatively few computational/statistical tools have been developed that can assist in this process. Rather, most developments in the metagenomics community have focused on methods that compare samples as a whole. Specifically, the focus has been on developing robust methods for determining the level of similarity or difference between samples, rather than on identifying the specific characteristics that distinguish different samples from each other. Metastats [1] was the first statistical method developed specifically to address the questions asked in clinical studies. Metastats allows a comparison of metagenomic samples (represented as counts of individual features such as organisms, genes and functional groups) from two treatment populations (for example, healthy versus disease) and identifies those features that statistically distinguish the two populations.

Here, we present major improvements to the Metastats software and the underlying statistical methods. First, we describe new approaches for data normalization that allow a more accurate assessment of differential abundance by reducing the covariance between individual features implicitly introduced by the traditionally used ratio-based normalization. These normalization techniques are also of interest for time-series analyses or in the estimation of microbial networks. A second extension of Metastats is a mixed-model zero-inflated Gaussian distribution that allows Metastats to account for a common characteristic of metagenomic data: the presence of many features with zero counts owing to undersampling of the community. The number of ‘missing features’ (zero counts) correlates with the amount of sequencing performed, thereby biasing abundance measurements and the differential abundance statistics derived from them.
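
The two ideas in the paragraph above can be illustrated with a minimal sketch. This is not the Metastats implementation: the function names, the toy counts and the exact form of the mixture are assumptions made for illustration. The first function shows how ratio-based (total-sum) normalization couples features: when one feature grows, the proportions of all other features shrink even though their raw counts are unchanged. The second function shows one common way to write a zero-inflated Gaussian mixture, a point mass at zero mixed with a Gaussian, and computes the posterior probability that an observed zero belongs to the point-mass component.

```python
import math


def total_sum_normalize(counts):
    """Ratio-based normalization: divide each count by the sample total."""
    total = sum(counts)
    return [c / total for c in counts]


def zero_posterior(y, pi, mu, sigma):
    """Posterior probability that y came from the point mass at zero.

    Assumed illustrative mixture (not taken from the abstract):
        f(y) = pi * 1[y == 0] + (1 - pi) * Normal(y; mu, sigma)
    """
    if y != 0:
        return 0.0  # a non-zero value cannot come from the zero component
    gauss_at_zero = math.exp(-0.5 * (mu / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    return pi / (pi + (1 - pi) * gauss_at_zero)


# Toy samples: only the first feature changes between the two samples.
sample_a = [100, 50, 50]
sample_b = [1000, 50, 50]
prop_a = total_sum_normalize(sample_a)
prop_b = total_sum_normalize(sample_b)

# The second feature has the same raw count in both samples,
# yet its proportion drops once the first feature increases.
print(prop_a[1], prop_b[1])

# A zero is more likely a "missing feature" when the Gaussian component
# sits far from zero (mu = 5) than when it is centred on zero (mu = 0).
print(zero_posterior(0, pi=0.5, mu=0.0, sigma=1.0))
print(zero_posterior(0, pi=0.5, mu=5.0, sigma=1.0))
```

In a fitted model the mixing weight would presumably depend on sequencing depth, since the text notes that the number of zero counts correlates with the amount of sequencing performed; here `pi` is simply a fixed, hypothetical parameter.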

Using simulated and real data, we show that these methods significantly improve the accuracy of Metastats. We also describe the addition of several new statistical tests to our code (including presence/absence and the corresponding odds ratio, and penetrance calculations) that improve the usability of our software in clinical practice.
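
As a rough illustration of a presence/absence comparison and its odds ratio, the sketch below builds a 2×2 table for one feature across two groups of samples. The function names, the toy counts and the 0.5 continuity correction applied when a cell is empty are assumptions for illustration, not details taken from the Metastats code.

```python
def presence_absence_table(group1, group2):
    """Return (present1, absent1, present2, absent2) for one feature.

    A feature counts as present in a sample when its count is above zero.
    """
    present1 = sum(1 for count in group1 if count > 0)
    present2 = sum(1 for count in group2 if count > 0)
    return present1, len(group1) - present1, present2, len(group2) - present2


def odds_ratio(a, b, c, d, correction=0.5):
    """Odds ratio (a*d)/(b*c) for the 2x2 table [[a, b], [c, d]].

    If any cell is zero, add `correction` to every cell (an assumed,
    commonly used continuity correction) to avoid division by zero.
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + correction for x in (a, b, c, d))
    return (a * d) / (b * c)


# Hypothetical counts of a single feature in each sample of two populations.
disease = [5, 2, 0, 7, 1, 4]
healthy = [0, 0, 3, 0, 1, 0]

table = presence_absence_table(disease, healthy)
print(table)               # (5, 1, 2, 4)
print(odds_ratio(*table))  # 10.0
```

An odds ratio above 1 here means the feature is more often present in the first group than in the second; a significance test on the same table (for example Fisher's exact test) would be needed before drawing conclusions.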

References

1. White JR, Nagarajan N, Pop M: Statistical methods for detecting differentially abundant features in clinical metagenomic samples. PLoS Comput Biol 2009, 5:e1000352.

Author information

Authors and Affiliations

Applied Mathematics and Scientific Computing Program, University of Maryland, College Park, MD, 20742, USA
Joseph N Paulson
Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, 20742, USA
Joseph N Paulson, Mihai Pop & Hector Corrada Bravo
Department of Computer Science, University of Maryland, College Park, MD, 20742, USA
Mihai Pop & Hector Corrada Bravo

About this article

Cite this article

Paulson, J.N., Pop, M. & Bravo, H.C. Metastats: an improved statistical method for analysis of metagenomic data. Genome Biol 12 (Suppl 1), P17 (2011). https://doi.org/10.1186/1465-6906-12-S1-P17
