Fig. 2 | Genome Biology

Fig. 2

From: Estimating the quality of eukaryotic genomes recovered from metagenomic analysis with EukCC


Comparison of EukCC and BUSCO using simulated data. We compared EukCC to BUSCO (versions 3.1 and 4.0) using a set of 19 genomes from RefSeq belonging to Alveolates, Amoebozoa, Apusozoa, Fungi, Rhizaria, Stramenopiles and Viridiplantae. We fragmented the genomes and added varying amounts of contamination from another genome in the same clade. We then ran BUSCO and EukCC to estimate completeness and contamination. The red line marks 0% deviation from the ground truth. (a) We defined completeness in BUSCO as 100% minus the percentage of missing BUSCOs. For genomes with contamination between 0 and 5%, EukCC underestimates completeness by a median of 2.74%, while BUSCO 3.1 underestimates completeness across all genomes by a median above 20%. BUSCO 4.0 underestimates completeness on average by 5.75%. As contamination increases, EukCC underestimates completeness less often. Only when genome completeness falls below 50% and/or contamination exceeds 15% does EukCC consistently overestimate completeness. (b) To evaluate contamination, we counted the number of duplicated BUSCOs or, in the case of EukCC, duplicated marker genes. For genomes with 0–5% contamination and high completeness (> 90%), EukCC overestimates contamination, but by less than 5%. As contamination increases, EukCC tends to underestimate contamination, but it still outperforms BUSCO 4.0, which consistently underestimates contamination by a larger fraction.
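The BUSCO-derived quantities described in the caption can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the `BuscoCounts` container, its field names, and the helper functions are hypothetical, and expressing duplicated BUSCOs as a percentage of all BUSCOs is an assumption (the caption only says that duplicates were counted).

```python
from dataclasses import dataclass


@dataclass
class BuscoCounts:
    """Hypothetical container for BUSCO category counts (illustrative names)."""
    single: int
    duplicated: int
    fragmented: int
    missing: int

    @property
    def total(self) -> int:
        # Total number of BUSCOs searched in the lineage set.
        return self.single + self.duplicated + self.fragmented + self.missing


def completeness_pct(counts: BuscoCounts) -> float:
    # Caption definition: 100% minus the percentage of missing BUSCOs.
    return 100.0 - 100.0 * counts.missing / counts.total


def contamination_pct(counts: BuscoCounts) -> float:
    # Assumption: duplicated BUSCOs as a percentage of all BUSCOs.
    return 100.0 * counts.duplicated / counts.total


def deviation(estimate_pct: float, truth_pct: float) -> float:
    # Negative values are underestimates, positive values are overestimates;
    # 0 corresponds to the red line in the figure.
    return estimate_pct - truth_pct


if __name__ == "__main__":
    # Made-up counts for illustration only, not data from the study.
    counts = BuscoCounts(single=80, duplicated=5, fragmented=5, missing=10)
    print(completeness_pct(counts))
    print(contamination_pct(counts))
    print(deviation(completeness_pct(counts), truth_pct=95.0))
```

Under this sketch, fragmented BUSCOs count toward completeness because only missing BUSCOs are subtracted, which follows the definition given for panel a.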
