Table 3 Estimating the number of clusters in simulated data

From: A prediction-based resampling method for estimating the number of clusters in a dataset

Method     Estimated number of clusters

Model 1    1*    2     3     4     5     >5
   Clest   48    2     0     0     0     0
   gap     48    0     1     1     0     0
   gapPC   50    0     0     0     0     0
   sil     -     37    6     4     3     0
   ch      -     42    7     1     0     0
   kl      -     12    14    11    13    0
   hart    0     5     22    16    7     0

Model 2    1     2     3*    4     5     >5
   Clest   0     1     49    0     0     0
   gap     0     0     50    0     0     0
   gapPC   0     0     50    0     0     0
   sil     -     5     45    0     0     0
   ch      -     0     50    0     0     0
   kl      -     0     41    2     7     0
   hart    0     0     0     2     2     46

Model 3    1     2     3     4*    5     >5
   Clest   0     1     20    29    0     0
   gap     0     1     16    33    0     0
   gapPC   0     1     12    37    0     0
   sil     -     17    24    9     0     0
   ch      -     8     20    22    0     0
   kl      -     3     11    35    1     0
   hart    0     0     8     42    0     0

Model 4    1     2     3     4*    5     >5
   Clest   0     0     1     49    0     0
   gap     0     0     0     50    0     0
   gapPC   0     0     1     49    0     0
   sil     -     5     8     37    0     0
   ch      -     5     7     38    0     0
   kl      -     0     1     49    0     0
   hart    0     0     0     50    0     0

Model 5    1     2*    3     4     5     >5
   Clest   0     44    0     6     0     0
   gap     0     0     0     19    31    0
   gapPC   0     50    0     0     0     0
   sil     -     50    0     0     0     0
   ch      -     3     0     47    0     0
   kl      -     50    0     0     0     0
   hart    0     0     0     0     0     50

Model 6    1     2*    3     4     5     >5
   Clest   0     43    7     0     0     0
   gap     47    3     0     0     0     0
   gapPC   43    5     1     1     0     0
   sil     -     41    5     4     0     0
   ch      -     43    5     2     0     0
   kl      -     16    9     17    8     0
   hart    0     1     0     5     14    30

Model 7    1     2*    3     4     5     >5
   Clest   26    15    6     3     0     0
   gap     25    22    2     1     0     0
   gapPC   31    17    2     0     0     0
   sil     -     42    6     1     1     0
   ch      -     39    10    0     1     0
   kl      -     13    15    10    12    0
   hart    6     39    5     0     0     0

Model 8    1     2*    3     4     5     >5
   Clest   0     16    34    0     0     0
   gap     0     22    28    0     0     0
   gapPC   0     28    21    1     0     0
   sil     -     50    0     0     0     0
   ch      -     50    0     0     0     0
   kl      -     25    17    4     4     0
   hart    0     3     43    4     0     0

  1. For each simulation model, the table records how the 50 estimates of the number of clusters are distributed for each method. An asterisk marks the true number of clusters. The mode of each method's 50 estimates, shown in bold in the published table, is the largest count in its row. Because sil, ch, and kl cannot return an estimate of 1 cluster, their entries in that column are shown as -.
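
The footnote's two summaries, the mode of each row and how often a method lands on the starred column, can be recomputed from the counts. The following is a minimal sketch, not part of the original article: the counts are copied from the Model 1 block of the table, while the names COLUMNS, TRUE_K, model1 and summarize are hypothetical and chosen here for illustration. A "-" cell is stored as None and treated as zero estimates.

```python
# Counts copied from the Model 1 block of Table 3.
# None marks a "-" cell (sil, ch and kl cannot return an estimate of 1 cluster).
COLUMNS = ["1", "2", "3", "4", "5", ">5"]
TRUE_K = "1"  # the starred column for Model 1

model1 = {
    "Clest": [48, 2, 0, 0, 0, 0],
    "gap":   [48, 0, 1, 1, 0, 0],
    "gapPC": [50, 0, 0, 0, 0, 0],
    "sil":   [None, 37, 6, 4, 3, 0],
    "ch":    [None, 42, 7, 1, 0, 0],
    "kl":    [None, 12, 14, 11, 13, 0],
    "hart":  [0, 5, 22, 16, 7, 0],
}


def summarize(counts, columns=COLUMNS, true_k=TRUE_K):
    """Return (mode column, fraction of estimates equal to true_k) for one row."""
    filled = [0 if c is None else c for c in counts]
    total = sum(filled)  # 50 estimates per method in this table
    # If two columns tie for the largest count, the leftmost one is reported.
    mode = columns[filled.index(max(filled))]
    hit_rate = filled[columns.index(true_k)] / total
    return mode, hit_rate


summary = {method: summarize(row) for method, row in model1.items()}

for method, (mode, hit_rate) in summary.items():
    print(f"{method:6s} mode={mode:>2s} correct={hit_rate:.2f}")
```

To summarize another model, replace the counts and set TRUE_K to that model's starred column.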