Viral diversity in children with diarrhea in Gambia


Despite a decrease in the rate of mortality due to diarrhea in the past few decades, diarrhea remains one of the leading causes of childhood deaths worldwide, especially in developing countries. The known causes of disease include infection with bacteria (for example, Salmonella or Shigella), viruses (for example, rotaviruses, noroviruses or hepatitis viruses) or parasites (for example, Giardia lamblia or Cryptosporidium); however, the true agent remains unknown in up to 40% of clinical cases [1].

Recent advances in sequencing technologies allow us to explore microbial diversity in a sample, making metagenomic analysis a promising technique to characterize the viral spectrum (that is, the viral sequences and their abundances) in stool samples. By studying the genomes of particular viruses that are present in vivo, we may obtain a complete picture of the causes of diarrhea and potentially identify unknown viral pathogens.


In this project, we explored the viral communities present in diarrheal samples from 40 Gambian children aged 18 months or younger. Each sample contained between 4,829 and 57,778,454 pyrosequencing shotgun reads, with read lengths varying from 50 to 930 bp.

In our pipeline, we first assembled the genomes of known diarrhea-causing viruses by aligning the reads to the available references in the National Center for Biotechnology Information database and reconstructing the haplotypes from the mapped reads. Additional care needs to be taken with RNA viruses because they exist as a set of closely related but nonidentical genomes (quasispecies). We therefore reconstructed the set of the most plausible haplotypes [2] rather than a single consensus genome. Next, we estimated the abundances of the assemblies with an expectation-maximization algorithm that takes sequencing error into account and marks reads that are not adequately covered by the assemblies. We then assembled the uncovered reads and attempted to identify them. Finally, we analyzed the viral spectrum across all of the samples to decide whether specific genomes are responsible for causing diarrhea.
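The abstract does not spell out the expectation-maximization step, so the following is only a minimal sketch of how abundances could be estimated from ambiguously mapped reads. It assumes a simple per-base substitution error model with a fixed error rate, and an input format (one dictionary per read, mapping a candidate genome index to the read length and mismatch count of its alignment) that is hypothetical and chosen for illustration. The function names, the `error_rate` default and the `max_mismatch_frac` threshold used to mark uncovered reads are likewise assumptions, not the authors' implementation.

```python
def read_likelihood(read_len, mismatches, error_rate=0.01):
    """P(read | genome) assuming independent per-base substitution errors."""
    match_p = (1.0 - error_rate) ** (read_len - mismatches)
    mismatch_p = (error_rate / 3.0) ** mismatches
    return match_p * mismatch_p


def uncovered_reads(alignments, max_mismatch_frac=0.1):
    """Indices of reads with no alignment below the mismatch threshold."""
    marked = []
    for i, hits in enumerate(alignments):
        covered = any(m / length <= max_mismatch_frac
                      for length, m in hits.values())
        if not covered:
            marked.append(i)
    return marked


def em_abundances(alignments, n_genomes, error_rate=0.01,
                  n_iter=200, tol=1e-10):
    """Estimate relative genome abundances with a basic EM loop.

    alignments: list with one dict per read,
                {genome_index: (read_len, mismatches)}.
    Returns a list of abundances that sums to 1.
    """
    # Precompute P(read | genome) for every candidate alignment.
    lik = [{g: read_likelihood(length, m, error_rate)
            for g, (length, m) in hits.items()}
           for hits in alignments]

    pi = [1.0 / n_genomes] * n_genomes  # uniform starting point
    for _ in range(n_iter):
        counts = [0.0] * n_genomes
        # E-step: split each read among its candidate genomes.
        for row in lik:
            total = sum(pi[g] * p for g, p in row.items())
            if total == 0.0:
                continue  # read with no usable alignment
            for g, p in row.items():
                counts[g] += pi[g] * p / total
        assigned = sum(counts)
        if assigned == 0.0:
            return pi  # nothing to learn from
        # M-step: abundances are the normalized expected read counts.
        new_pi = [c / assigned for c in counts]
        delta = max(abs(a - b) for a, b in zip(pi, new_pi))
        pi = new_pi
        if delta < tol:
            break
    return pi
```

For example, with two reads that map only to genome 0, one read that maps only to genome 1, and one read that maps equally well to both, the ambiguous read is shared in proportion to the current abundance estimates, and the estimates settle near two thirds and one third. Reads returned by `uncovered_reads` would be the ones passed on to a separate assembly step.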


We were able to detect and assemble sequences from known diarrhea-causing viruses (such as rotaviruses, adenoviruses and noroviruses), known human viruses (such as herpesviruses and enteroviruses) and potential diarrhea-causing viruses (such as bocaviruses, astroviruses and parechoviruses). These findings were consistent with independent virology results.

In some clinical cases, sequences from classic viruses were found, but the virology results were negative.


Annually, diarrhea causes about 1.8 million deaths worldwide. Although many causative agents are known, as many as 40% of clinical cases are attributed to unknown viral pathogens. The metagenomic analysis of pyrosequencing data allows us to investigate the role of viruses in causing diarrhea.


  1. Finkbeiner SR, Allred AF, Tarr PI, Klein EJ, Kirkwood CD, Wang D: Metagenomic analysis of human diarrhea: viral detection and discovery. PLoS Pathog 2008, 4:e1000011.

  2. Astrovskaya I, Tork B, Mangul S, Westbrooks K, Mandoiu II, Balfe P, Zelikovsky A: Inferring viral quasispecies spectra from 454 pyrosequencing reads. BMC Bioinformatics 2011, (Suppl 6):S1.


Astrovskaya, I., Liu, B. & Pop, M. Viral diversity in children with diarrhea in Gambia. Genome Biol 12 (Suppl 1), P2 (2011).


  • Diarrhea
  • Clinical Case
  • Childhood Death
  • Metagenomic Analysis
  • Human Virus