
Table 1 Predictor performance

From: ISsaga is an ensemble of web-based methods for high throughput identification and semi-automatic annotation of insertion sequences in prokaryotic genomes

  GB -IS +IS Manual
A. dehalogenans 2CP-C (NC_007760)     
   Total IS ORF 1 4 4 2
Complete ORF - 0 0 0
Partial ORF - 1 1 1
Pseudogene 1 2 2 1
Unknown ORF - 1 1 0
   Total IS - 4 4 2
   Different IS - 4 4 2
Anaeromyxobacter sp. Fw109-5 (NC_009675)     
   Total IS ORF 15 22 24 19
Complete ORF - 4 12 12
Partial ORF - 1 2 6
Pseudogene 1 4 4 1
Unknown ORF - 13 6 0
   Total IS - 20 21 16
   Different IS - 16 17 12
Anaeromyxobacter sp. K (NC_011145)     
   Total IS ORF 14 25 28 27
Complete ORF - 12 26 26
Partial ORF - 2 0 0
Pseudogene - 1 1 1
Unknown ORF - 10 1 0
   Total IS - 19 19 18
   Different IS - 10 10 9
A. dehalogenans 2CP-1 (NC_011891)     
   Total IS ORF 15 33 35 35
Complete ORF - 18 24 27
Partial ORF - 4 2 3
Pseudogene - 8 8 5
Unknown ORF - 3 1 0
   Total IS - 25 25 23
   Different IS - 12 12 14
A. aeolicus VF5 (NC_000918)     
   Total IS ORF - 7 7 3
Complete ORF - 0 2 2
Partial ORF - 1 1 1
Pseudogene - 0 0 0
Unknown ORF - 6 4 0
   Total IS - 7 7 3
   Different IS - 6 6 2
C. thermocellum 27405 (NC_009012)     
   Total IS ORF 75 143 144 160
Complete ORF - 81 123 125
Partial ORF - 43 11 27
Pseudogene - 7 7 8
Unknown ORF - 12 3 0
   Total IS - 115 115 119
   Different IS - 27 27 26
S. maltophilia R551-3 (NC_011071)     
   Total IS ORF 11 21 22 20
Complete ORF - 13 19 19
Partial ORF - 7 1 1
Pseudogene - 1 1 0
Unknown ORF - 0 1 0
   Total IS - 18 19 16
   Different IS - 6 7 4
S. maltophilia K279a (NC_010943)     
   Total IS ORF 49 53 54 57
Complete ORF - 18 45 47
Partial ORF - 27 5 9
Pseudogene - 3 3 1
Unknown ORF 3 5 1 0
   Total IS - 38 39 36
   Different IS - 18 19 18
  1. The table compares the IS annotations contained in the GenBank files (GB) of eight bacterial genomes with those obtained by manual annotation (Manual) and by the ISsaga predictor using two different IS reference databases. In one database (-IS), the reference ISs contained in the genome under test were removed, while in the other (+IS) these ISs were included. The total number of IS-associated ORFs (Total IS ORF) is divided into four categories: Complete ORF, Partial ORF, Pseudogene and Unknown ORF. The category 'Unknown' includes all ORFs that the predictor cannot classify as complete or partial because the reference database lacks a sufficient number of closely related examples. The categories 'Total IS' and 'Different IS' are based on nucleotide predictions. In these predictions the number of ORFs carried by the IS is taken into account. For example, if an IS includes two ORFs, it is counted as two examples in 'Complete ORF' but as a single IS in 'Total IS'.
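The counting rule in the footnote can be illustrated with a minimal sketch. This is not ISsaga code: the `ISElement` class, the `summarize` function and the IS names `ISx1` and `ISx2` are hypothetical, and the sketch assumes that 'Different IS' counts distinct IS names among the copies found. It only shows how an IS carrying two ORFs contributes two entries to the ORF categories but one entry to 'Total IS'.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class ISElement:
    """One IS copy in a genome (hypothetical structure, for illustration only)."""
    name: str                  # IS name, e.g. "ISx1" (made up)
    orf_categories: List[str]  # one category label per ORF carried by this copy


def summarize(elements: List[ISElement]) -> dict:
    """Tally ORF-level and IS-level counts in the way the footnote describes."""
    # ORF-level counts: every ORF is counted individually.
    orf_counts = Counter(cat for el in elements for cat in el.orf_categories)
    return {
        "Total IS ORF": sum(orf_counts.values()),
        "Complete ORF": orf_counts["Complete"],
        "Partial ORF": orf_counts["Partial"],
        "Pseudogene": orf_counts["Pseudogene"],
        "Unknown ORF": orf_counts["Unknown"],
        # IS-level counts: each copy counts once, however many ORFs it carries.
        "Total IS": len(elements),
        # Assumption: 'Different IS' means the number of distinct IS names.
        "Different IS": len({el.name for el in elements}),
    }


# Made-up example: two copies of a two-ORF IS and one copy with a partial ORF.
example = [
    ISElement("ISx1", ["Complete", "Complete"]),
    ISElement("ISx1", ["Complete", "Complete"]),
    ISElement("ISx2", ["Partial"]),
]
summary = summarize(example)
print(summary)
```

In this made-up example the five ORFs yield three ISs of two different kinds. The real table is not guaranteed to follow this arithmetic row by row, because its IS-level counts come from nucleotide predictions rather than from the ORF tallies.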