ISsaga is an ensemble of web-based methods for high throughput identification and semi-automatic annotation of insertion sequences in prokaryotic genomes

Table 1 Predictor performance

	GB	-IS	+IS	Manual
*A. dehalogenans* 2CP-C (NC_007760)
Total IS ORF	1	4	4	2
Complete ORF	-	0	0	0
Partial ORF	-	1	1	1
Pseudogene	1	2	2	1
Unknown ORF	-	1	1	0
Total IS	-	4	4	2
Different IS	-	4	4	2
*Anaeromyxobacter* sp. Fw109-5 (NC_009675)
Total IS ORF	15	22	24	19
Complete ORF	-	4	12	12
Partial ORF	-	1	2	6
Pseudogene	1	4	4	1
Unknown ORF	-	13	6	0
Total IS	-	20	21	16
Different IS	-	16	17	12
*Anaeromyxobacter* sp. K (NC_011145)
Total IS ORF	14	25	28	27
Complete ORF	-	12	26	26
Partial ORF	-	2	0	0
Pseudogene	-	1	1	1
Unknown ORF	-	10	1	0
Total IS	-	19	19	18
Different IS	-	10	10	9
*A. dehalogenans* 2CP-1 (NC_011891)
Total IS ORF	15	33	35	35
Complete ORF	-	18	24	27
Partial ORF	-	4	2	3
Pseudogene	-	8	8	5
Unknown ORF	-	3	1	0
Total IS	-	25	25	23
Different IS	-	12	12	14
*A. aeolicus* VF5 (NC_000918)
Total IS ORF	-	7	7	3
Complete ORF	-	0	2	2
Partial ORF	-	1	1	1
Pseudogene	-	0	0	0
Unknown ORF	-	6	4	0
Total IS	-	7	7	3
Different IS	-	6	6	2
*C. thermocellum* 27405 (NC_009012)
Total IS ORF	75	143	144	160
Complete ORF	-	81	123	125
Partial ORF	-	43	11	27
Pseudogene	-	7	7	8
Unknown ORF	-	12	3	0
Total IS	-	115	115	119
Different IS	-	27	27	26
*S. maltophilia* R551-3 (NC_011071)
Total IS ORF	11	21	22	20
Complete ORF	-	13	19	19
Partial ORF	-	7	1	1
Pseudogene	-	1	1	0
Unknown ORF	-	0	1	0
Total IS	-	18	19	16
Different IS	-	6	7	4
*S. maltophilia* K279a (NC_010943)
Total IS ORF	49	53	54	57
Complete ORF	-	18	45	47
Partial ORF	-	27	5	9
Pseudogene	-	3	3	1
Unknown ORF	3	5	1	0
Total IS	-	38	39	36
Different IS	-	18	19	18

The table compares the IS annotations in the GenBank files (GB) of eight bacterial genomes with those obtained by manual annotation (Manual) and by the ISsaga predictor using two different IS reference databases. In one database (-IS) the reference ISs contained in the genome under test were removed, while in the other (+IS) these ISs were included. The total number of IS-associated ORFs (Total IS ORF) is divided into four categories: Complete ORF, Partial ORF, Pseudogene and Unknown ORF. The 'Unknown ORF' category includes all examples that the predictor cannot classify as complete or partial because the reference database contains too few closely related examples. The categories 'Total IS' and 'Different IS' are based on nucleotide predictions. In these predictions the number of ORFs carried by each IS is taken into account. For example, an IS that includes two ORFs is counted as two examples in 'Complete ORF' but as a single IS in 'Total IS'.
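The counting convention described in the caption can be illustrated with a minimal Python sketch. The IS names (`IS_A`, `IS_B`, `IS_C`), the record layout and the `summarize` function are hypothetical and are not part of ISsaga. The sketch also assumes that 'Different IS' is the number of distinct IS names among the predicted copies.

```python
from collections import Counter

# Hypothetical predictions, one entry per predicted IS copy:
# (IS name, categories of the ORFs carried by that copy).
predicted_is = [
    ("IS_A", ["complete", "complete"]),  # one IS copy carrying two complete ORFs
    ("IS_A", ["complete", "complete"]),  # a second copy of the same IS
    ("IS_B", ["partial"]),
    ("IS_C", ["pseudogene"]),
]


def summarize(predictions):
    """Tabulate ORF-level and IS-level counts using the rows of Table 1."""
    # ORF-level counts: every ORF is counted, even when several share one IS.
    orf_counts = Counter(cat for _, orfs in predictions for cat in orfs)
    return {
        "Total IS ORF": sum(orf_counts.values()),
        "Complete ORF": orf_counts["complete"],
        "Partial ORF": orf_counts["partial"],
        "Pseudogene": orf_counts["pseudogene"],
        "Unknown ORF": orf_counts["unknown"],
        # IS-level counts: each copy is counted once, whatever its ORF number.
        "Total IS": len(predictions),
        # Assumption: distinct IS names among the predicted copies.
        "Different IS": len({name for name, _ in predictions}),
    }


summary = summarize(predicted_is)
for row, value in summary.items():
    print(f"{row}\t{value}")
```

In this toy input the two copies of `IS_A` contribute four entries to 'Complete ORF' but only two to 'Total IS' and one to 'Different IS'. This is why 'Total IS ORF' can exceed 'Total IS' in the table.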

ISSN: 1474-760X