Table 3 Performance comparison of “partial seed” and regular k-means++ in terms of recovered high-quality bins on the CAMI Airways dataset

From: MetaBinner: a high-performance and stand-alone ensemble binning method to recover individual genomes from complex microbial communities

| Feature | Methods | #bins (>50% comp <10% cont) | #bins (>70% comp <10% cont) | #bins (>90% comp <10% cont) | #bins (>50% comp <5% cont) | #bins (>70% comp <5% cont) | #bins (>90% comp <5% cont) |
|---|---|---|---|---|---|---|---|
| \(X_{combo}\) | k-means++ average (no length weighting) | 32.67 | 26.33 | 9.33 | 27.67 | 21.67 | 6.67 |
| \(X_{combo}\) | k-means++ average | 75.33 | 60.33 | 39.00 | 65.67 | 52.67 | 35.33 |
| \(X_{combo}\) | seed k-means average | 100.33 | 91.33 | 63.00 | 74.67 | 67.33 | 49.00 |
| \(X_{combo}\) | partial seed average | **109.33** | **96.00** | **65.67** | **81.67** | **71.33** | **50.67** |
| \(X_{cov}\) | k-means++ average (no length weighting) | 30.67 | 26.67 | 20.33 | 18.00 | 16.00 | 11.33 |
| \(X_{cov}\) | k-means++ average | 57.00 | 51.00 | 45.33 | 43.67 | 38.67 | 35.00 |
| \(X_{cov}\) | seed k-means average | 89.00 | 82.67 | 75.00 | 66.00 | 61.33 | 57.00 |
| \(X_{cov}\) | partial seed average | **93.00** | **86.33** | **78.67** | **71.00** | **66.33** | **61.00** |
| \(X_{com}\) | k-means++ average (no length weighting) | 14.33 | 7.33 | 1.00 | 11.33 | 7.00 | 1.00 |
| \(X_{com}\) | k-means++ average | 30.33 | 22.67 | 13.33 | 21.00 | 16.00 | 7.33 |
| \(X_{com}\) | seed k-means average | 40.67 | **29.67** | 15.67 | **24.00** | 16.00 | 7.67 |
| \(X_{com}\) | partial seed average | **41.00** | 29.00 | **16.67** | 23.67 | **16.33** | **8.67** |

  1. The best results based on each feature matrix are in bold. \(X_{combo}\) denotes the feature matrix combining coverage and composition information, \(X_{cov}\) denotes the feature matrix using coverage information, and \(X_{com}\) denotes the feature matrix using composition information (see the “Methods” section for more details). “#bins (>50% comp <10% cont)” denotes the number of recovered bins that have >50% completeness and <10% contamination. “no length weighting” denotes that all contigs are assigned equal weight while running k-means++.
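
As an illustration of how the "#bins" columns are defined, the following minimal Python sketch counts bins that pass each completeness/contamination threshold pair. It is not the authors' evaluation code: the function names `count_bins` and `summarize` are hypothetical, and the `example_bins` values are made up for illustration and are not taken from the CAMI Airways results.

```python
from typing import Dict, Iterable, List, Tuple

# (min completeness %, max contamination %) pairs, in the order of the table columns.
THRESHOLDS: List[Tuple[int, int]] = [
    (50, 10), (70, 10), (90, 10),
    (50, 5), (70, 5), (90, 5),
]


def count_bins(bins: Iterable[Tuple[float, float]],
               min_comp: float, max_cont: float) -> int:
    """Count bins with completeness > min_comp and contamination < max_cont."""
    return sum(1 for comp, cont in bins if comp > min_comp and cont < max_cont)


def summarize(bins: Iterable[Tuple[float, float]]) -> Dict[Tuple[int, int], int]:
    """Return the bin count for every threshold pair in THRESHOLDS."""
    bins = list(bins)  # allow any iterable to be scanned once per threshold
    return {(c, k): count_bins(bins, c, k) for c, k in THRESHOLDS}


# Hypothetical (completeness %, contamination %) pairs, for illustration only.
example_bins = [(95.2, 1.3), (72.0, 4.9), (55.0, 8.0), (91.0, 7.5), (40.0, 2.0)]
counts = summarize(example_bins)

for (comp, cont), n in counts.items():
    print(f"#bins (>{comp}% comp <{cont}% cont): {n}")
```

The table reports averages over repeated runs, which is why its entries are fractional; a single run evaluated this way always yields integer counts.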