Table 1. Accuracy of amosvalidate mis-assembly signatures and suspicious regions summarized for 16 bacterial genomes assembled with Phrap

From: Genome assembly forensics: finding the elusive mis-assembly

                                         Mis-assembly signatures   Suspicious regions
                                         -----------------------   ---------------------
Species           Len (Mbp)  Ctgs  Errs     Num   Valid    Sens      Num   Valid    Sens
B. anthracis            5.2    87     2   1,336      21   100.0      127       2   100.0
B. suis                 3.4   120    10   1,047      30    80.0      158       9    90.0
C. burnetii             2.0    55    22   1,375      70   100.0      124      19   100.0
C. caviae               1.4   270    12     625      16    83.3       50       8    66.7
C. jejuni               1.8    53     5     290      11    80.0       61       3    60.0
D. ethenogenes          1.8   632    12     688      22    91.7       88       9   100.0
F. succinogenes         4.0   455    21   1,670      27    95.2      266      14    66.7
L. monocytogenes        2.9   172     1   1,381       5   100.0      201       1   100.0
M. capricolum           1.0    17     3      83       0     0.0       16       0     0.0
N. sennetsu             0.9    16     0      91       0      NA       13       0      NA
P. intermedia           2.7   243    21   1,655      57   100.0      201      20   100.0
P. syringae             6.4   274    64   2,841     200    98.4      366      55    98.4
S. agalactiae           2.1   127    21     687      53    95.2      112      18    85.7
S. aureus               2.8   824    41   1,850      69    97.6      227      18    75.6
W. pipientis            3.3  2017    31     761      92   100.0      132      30   100.0
X. oryzae               5.0    50   151   2,569     379   100.0      100      69   100.0
Totals                 46.8  5412   417  18,949   1,052    96.9    2,242     275    92.6
Species name, genome length in Mbp (Len), number of assembled contigs (Ctgs), and number of alignment-inferred mis-assemblies (Errs) are given in the first four columns. Column 5 (Num) gives the number of mis-assembly signatures output by amosvalidate, column 6 (Valid) the number of signatures coinciding with a known mis-assembly, and column 7 (Sens) the percentage of known mis-assemblies identified by one or more signatures. Columns 8-10 give the same values for the suspicious regions output by amosvalidate. Each signature represents a single line of evidence, whereas each suspicious region represents at least two different, coinciding lines of evidence. A signature or region is deemed 'validated' if its location interval overlaps a mis-assembled region identified by dnadiff. Thus, a single signature or region can identify multiple mis-assemblies, and conversely, a single mis-assembly can be identified by multiple signatures or regions. NA: sensitivity is undefined for N. sennetsu because no mis-assemblies were inferred for its assembly (Errs = 0).
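The validation rule in the note reduces to an interval-overlap test, and the Valid and Sens columns count matches in opposite directions. The Python sketch below is a hypothetical illustration, not code from amosvalidate or dnadiff. It assumes that each signature, suspicious region, and known mis-assembly is a closed interval [start, end] on a named contig. The names Interval, overlaps, count_valid and sensitivity are invented for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Interval:
    """Hypothetical representation: a closed interval [start, end] on a named contig.

    Assumption: this is not amosvalidate's or dnadiff's actual data model.
    """
    contig: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    # Two closed intervals on the same contig overlap unless one ends before the other starts.
    return a.contig == b.contig and a.start <= b.end and b.start <= a.end


def count_valid(calls: List[Interval], known: List[Interval]) -> int:
    # "Valid": number of signatures (or regions) overlapping at least one known mis-assembly.
    return sum(1 for c in calls if any(overlaps(c, k) for k in known))


def sensitivity(calls: List[Interval], known: List[Interval]) -> Optional[float]:
    # "Sens": percentage of known mis-assemblies overlapped by one or more calls.
    # Returns None (reported as NA in the table) when there are no known mis-assemblies.
    if not known:
        return None
    found = sum(1 for k in known if any(overlaps(c, k) for c in calls))
    return 100.0 * found / len(known)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    known = [Interval("ctg1", 100, 200), Interval("ctg1", 500, 600), Interval("ctg2", 10, 50)]
    calls = [
        Interval("ctg1", 150, 160),  # hits the first known mis-assembly
        Interval("ctg1", 190, 520),  # hits the first and second known mis-assemblies
        Interval("ctg3", 0, 10),     # different contig: no hit
        Interval("ctg2", 60, 70),    # same contig, but starts after the known interval ends
    ]
    print(count_valid(calls, known))              # → 2
    print(round(sensitivity(calls, known), 1))    # → 66.7
    print(sensitivity(calls, []))                 # → None
```

In the toy data, the second call overlaps two known mis-assemblies, and the first known mis-assembly is hit by two calls. This is the many-to-many matching the note describes. Valid counts calls while Sens counts mis-assemblies, which is why a genome can have a low Valid count alongside a high Sens value, as for L. monocytogenes.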