Fig. 5 | Genome Biology

Fig. 5

From: Taxonomer: an interactive metagenomics analysis portal for universal pathogen detection and host mRNA expression profiling

Case studies: detection of highly pathogenic viruses (a–c). To simulate viral detection and discovery in public health emergencies by Taxonomer, we removed all viral target protein sequences (as per corresponding publications [41–43]) from the reference database and analyzed published RNA-Seq data with Taxonomer. The predicted viruses were detected in all cases: (a) novel Rhabdovirus in RNA-Seq data (SRR533978) from serum of a patient with hemorrhagic fever in the Democratic Republic of Congo (DRC), now known as Bas Congo Virus [41]; approximately 13 % of target reads from this highly divergent virus were classified at the family level (Rhabdoviridae) with genus-level assignments of Lyssavirus (1), Ephemerovirus (2), unassigned Rhabdoviridae (3), Tibrovirus (4), Sigmavirus (5); (b) avian influenza virus H7N9 in RNA-Seq data (SRR900273) from a throat swab of a patient in Shanghai with H7N9 infection [42]; (c) Ebola virus, strain Zaire 1995, in RNA-Seq data (SRR1553464) from serum of a patient with suspected Ebola virus disease in Sierra Leone [43]. Detection of previously unrecognized infections (d–f). (d) Taxonomer detected a previously unrecognized Chlamydophila psittaci infection (psittacosis) in plasma from a patient with suspected Ebola virus disease in Sierra Leone (SRR1564804) [43]. The 16S rRNA gene was covered at a mean depth of 7035-fold, and the consensus 16S rRNA sequence from this isolate shared 99.9 % identity with the type strain (6BC, ATCC VR-125, CPU68447), enabling reliable identification. Positions of two single nucleotide polymorphisms are highlighted in red. (e) Taxonomer detected a novel Anellovirus in a nasopharyngeal swab. Forty-four reads were classified at the family level (Anelloviridae) or below. Mapping reads back to a manually constructed viral consensus genome sequence showed 14-fold mean coverage, 68.5 % pairwise nucleotide-level identity and 44–60 % predicted protein identity with TTV-like mini virus isolate LIL-y1 (EF538880.1).
(f) Identification of Mycoplasma yeatsii contamination in RNA-Seq data from a cultured iPS cell line (right) compared to a non-contaminated iPS cell culture (left), based on read binning (top). High expression of rRNA is demonstrated by 32 % of RNA-Seq reads mapping to the M. yeatsii 16S rRNA gene (245,000X coverage, 99.4 % sequence identity with type strain GIH (MYU67946))
