Prediction results on four real metagenomic datasets using only viral abundances. Boxplots show the random forest AUC scores obtained with different viral features. “Viral known” refers to classification using only known viral abundances, while “Viral combined” refers to classification using both known and unknown viral abundances. Each random forest classification model was trained and tested 30 times. P values from Student’s t test are given.
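The comparison summarized in this caption (30 AUC scores per feature set, compared with Student’s t test) can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes an unpaired, pooled-variance t test, which the caption does not specify. The random forest is replaced by a hypothetical `simulated_repeat` placeholder that draws synthetic labels and scores, so the numbers it produces carry no biological meaning.

```python
import math
import random
from statistics import mean, variance


def auc_score(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs in which the
    positive sample gets the higher score; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance.
    Returns (t, degrees of freedom). Assumes unpaired samples."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


rng = random.Random(0)  # fixed seed so the sketch is reproducible


def simulated_repeat(signal):
    """Hypothetical stand-in for one train/test run of a classifier:
    synthetic labels and scores only, no real abundance data."""
    labels = [i % 2 for i in range(40)]
    scores = [rng.gauss(signal * y, 1.0) for y in labels]
    return auc_score(labels, scores)


# 30 repeats per feature set, mirroring the repetition count in the caption.
aucs_a = [simulated_repeat(0.5) for _ in range(30)]
aucs_b = [simulated_repeat(1.0) for _ in range(30)]

t_stat, dof = student_t(aucs_b, aucs_a)
print(f"t = {t_stat:.2f}, df = {dof}")
```

Turning the statistic into a p value requires the t distribution with `dof` degrees of freedom. If SciPy is available, `scipy.stats.ttest_ind` with its default `equal_var=True` computes both the statistic and the p value.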

