Fig. 1 | Genome Biology

Fig. 1

From: Robust taxonomic classification of uncharted microbial sequences and bins with CAT and BAT

Contig and MAG classification with CAT and BAT. a, b Step 1: ORF prediction with Prodigal. CAT analyses all ORFs on a contig; BAT analyses all ORFs in a MAG. c Step 2: predicted ORFs are queried with DIAMOND against the NCBI non-redundant protein database (nr). d Step 3: ORFs are individually classified based on the LCA of all hits falling within a certain range of the top hit (parameter r), and the top-hit bit-score is assigned to the classification. Bit-scores of hits are depicted within brackets. Hits in gray are not included in the final annotation of the ORF. Parameter f defines the minimal bit-score support (mbs). e Step 4: contig or MAG classification is based on a voting approach over all classified ORFs, summing the bit-scores of the ORFs that support a certain classification. The contig or MAG is classified as the lowest (most specific) classification reaching mbs. The example illustrates the benefit of including multiple ORFs when classifying contigs or MAGs; a best-hit approach might have selected Bacteroides vulgatus as its classification, or Bacteroidetes if an LCA algorithm was applied, as this part has the highest score to proteins in the database in a local alignment-based homology search. In the example, only six taxonomic ranks are shown for brevity; in reality, CAT and BAT will interpret the entire taxonomic lineage.
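
Steps 3 and 4 of the caption can be illustrated with a minimal Python sketch. This is not the CAT/BAT implementation. It assumes that r is a percentage of the top-hit bit-score and that mbs equals f times the sum of the bit-scores assigned to the classified ORFs; the caption does not spell out either formula. The function names (`lca`, `classify_orf`, `classify_sequence`), the simplified four-rank lineages, and all bit-scores are hypothetical and are not taken from the figure.

```python
from collections import defaultdict


def lca(lineages):
    """Return the longest common prefix of root-to-leaf lineages."""
    prefix = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        prefix.append(ranks[0])
    return tuple(prefix)


def classify_orf(hits, r=10):
    """Step 3 (sketch): LCA of hits within r percent of the top bit-score.

    hits is a list of (lineage, bit_score) pairs.
    Returns (lca_lineage, top_hit_bit_score).
    """
    top = max(score for _, score in hits)
    cutoff = top * (1 - r / 100)  # assumed meaning of parameter r
    kept = [lineage for lineage, score in hits if score >= cutoff]
    return lca(kept), top


def classify_sequence(orf_hits, r=10, f=0.5):
    """Step 4 (sketch): bit-score voting across all ORFs of a contig or MAG."""
    support = defaultdict(float)
    total = 0.0
    for hits in orf_hits:
        if not hits:
            continue  # ORFs without hits do not vote
        lineage, score = classify_orf(hits, r)
        total += score
        # An ORF supports its classification and every ancestor of it.
        for depth in range(1, len(lineage) + 1):
            support[lineage[:depth]] += score
    mbs = f * total  # assumed definition of minimal bit-score support
    passing = [lin for lin, s in support.items() if s >= mbs]
    if not passing:
        return ()
    # Deepest lineage reaching mbs; ties are broken by higher support.
    return max(passing, key=lambda lin: (len(lin), support[lin]))


# Invented toy data: simplified lineages and made-up bit-scores.
vulgatus = ("Bacteria", "Bacteroidetes", "Bacteroides", "Bacteroides vulgatus")
dorei = ("Bacteria", "Bacteroidetes", "Bacteroides", "Bacteroides dorei")
butyricum = ("Bacteria", "Firmicutes", "Clostridium", "Clostridium butyricum")
perfringens = ("Bacteria", "Firmicutes", "Clostridium", "Clostridium perfringens")

orfs = [
    [(vulgatus, 500), (dorei, 480), (butyricum, 200)],  # strongest single ORF
    [(butyricum, 300), (perfringens, 290)],
    [(butyricum, 400)],
]

print(classify_sequence(orfs))
# prints ('Bacteria', 'Firmicutes', 'Clostridium')
```

In this toy input the single highest-scoring hit points to Bacteroides vulgatus, but the summed bit-scores of the other two ORFs (700 of 1200) outweigh it, so the vote settles on Clostridium. This mirrors the caption's point that a best-hit approach can be misled by one high-scoring region.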
