
Table 2 Genome assembly continuity and correctness using hybrid and self-correction approaches

From: Reducing assembly complexity of microbial genomes with single-molecule sequencing

| Organism | Corrected by | Assembly bp | Number of contigs (expected) | Number of contigs (actual) | N50 (expected) | N50 (actual) | LAP | Number of discordant bases | QV |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| E. coli K12 | Reference | 4,639,675 |  | 1 | 4,639,675 | NA | -9.65E+07 | 4 | >60 |
|  | MiSeq 100× | 4,647,253 | 1 | 2 |  | 2,367,319 | -9.64E+07 | 3 | >60 |
|  | 454 50× | 4,649,004 | 1 | 1 |  | 4,649,004 | -9.64E+07 | 3 | >60 |
|  | CCS 25× | 4,653,267 | 1 | 1 |  | 4,653,267 | -9.64E+07 | 3 | >60 |
|  | Self | 4,653,486 | 1 | 1 |  | 4,653,486 | -9.64E+07 | 3 | >60 |
| E. coli O157:H7 | Near neighbor | 5,594,477 |  | 3 | 3,776,951 | NA | -3.82E+07 | 1,282 | 36.40 |
|  | MiSeq 100× | 5,624,394 | 10 | 10 |  | 3,089,011 | -3.66E+07 | 4 | >60 |
|  | 454 40× | 5,613,057 | 10 | 12 |  | 927,294 | -3.67E+07 | 13 | 56.35 |
|  | Self | 5,611,389 | 10 | 9 |  | 4,324,437 | -3.66E+07 | 0 | >60 |
| B. trehalosi | MiSeq 100× | 2,402,545 |  | 6 |  | 1,603,511 | -3.28E+07 | 1 | >60 |
|  | 454 50× | 2,413,761 |  | 4 |  | 1,051,672 | -3.27E+07 | 2 | >60 |
|  | CCS 25× | 2,411,501 |  | 1 |  | 2,411,501 | -3.27E+07 | 0 | >60 |
|  | Self | 2,411,068 |  | 1 |  | 2,411,068 | -3.27E+07 | 0 | >60 |
| M. haemolytica | MiSeq 100× | 2,712,467 |  | 1 |  | 2,712,467 | -3.31E+07 | 0 | >60 |
|  | CCS 25× | 2,739,949 |  | 2 |  | 2,686,992 | -3.31E+07 | 0 | >60 |
|  | Self | 2,736,037 |  | 1 |  | 2,736,037 | -3.31E+07 | 0 | >60 |
| F. tularensis | Near neighbor | 1,895,727 |  | 1 | 965,253 | NA | -1.33E+07 | 113 | 42.25 |
|  | MiSeq 100× | 1,879,071 | 3 | 10 |  | 357,518 | -1.33E+07 | 0 | >60 |
|  | 454 50× | 1,863,947 | 3 | 15 |  | 201,203 | -1.33E+07 | 0 | >60 |
|  | Self | 1,828,135 | 3 | 8 |  | 401,731 | -1.33E+07 | 0 | >60 |
|  | Self (300×) | 1,877,407 | 3 | 3 |  | 573,021 | -1.33E+07 | 0 | >60 |
| S. enterica Newport | Near neighbor | 5,007,719 |  | 2 | 4,827,641 | NA | -2.26E+07 | 20 | 53.99 |
|  | MiSeq 56× | 5,027,784 | 4 | 2 |  | 4,918,796 | -2.24E+07 | 2 | >60 |
|  | 454 25× | 5,034,500 | 4 | 3 |  | 4,095,943 | -2.24E+07 | 2 | >60 |
|  | CCS 22× | 5,030,885 | 4 | 2 |  | 4,921,886 | -2.24E+07 | 2 | >60 |
|  | Self | 5,029,197 | 4 | 2 |  | 4,919,684 | -2.24E+07 | 2 | >60 |

Organism: the genome being assembled. Corrected by: the short-read data used for correction. Assembly bp: the total number of base pairs in all contigs (only contigs containing at least 100 reads are included in all results). Number of contigs (expected): the predicted number of contigs for a known reference (or near neighbor). Number of contigs (actual): the number of contigs comprising the assembly. N50: the length N such that 50% of the genome is contained in contigs of length ≥N. LAP: the assembly likelihood score; a score closer to zero indicates a better assembly. Number of discordant bases: the number of SNPs and indels identified by mapping MiSeq sequences back to the assembly and recording discrepancies. Each incorrect base is counted (that is, an indel that deletes two bases from the assembly counts as two in this column). QV: estimated from the number of discordant bases as $\mathrm{QV} = 10 \cdot \log_{10}\left(\frac{\text{assembly length}}{\text{number of incorrect bases}}\right)$. The QV can be converted to an error probability $P = 10^{-\mathrm{QV}/10}$. Assemblies were generated by Celera Assembler [31] followed by post-processing with Quiver [32]. NA, not available.
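The QV, error-probability, and N50 definitions in the footnote can be illustrated with a minimal Python sketch. The function names (`quality_value`, `error_probability`, `n50`) are hypothetical and are not taken from Celera Assembler or Quiver. Returning infinity for zero discordant bases and defaulting the genome size to the summed contig lengths are assumptions made here for illustration. The table itself reports high values simply as ">60".

```python
import math


def quality_value(assembly_length, discordant_bases):
    """QV = 10 * log10(assembly length / number of incorrect bases)."""
    if assembly_length <= 0:
        raise ValueError("assembly_length must be positive")
    if discordant_bases < 0:
        raise ValueError("discordant_bases must be non-negative")
    if discordant_bases == 0:
        # Assumption: with no discordant bases the estimate is unbounded.
        return math.inf
    return 10.0 * math.log10(assembly_length / discordant_bases)


def error_probability(qv):
    """Convert a QV to a per-base error probability, P = 10^(-QV/10)."""
    return 10.0 ** (-qv / 10.0)


def n50(contig_lengths, genome_size=None):
    """Length N such that 50% of the genome lies in contigs of length >= N.

    Assumption: when genome_size is not given, the total assembly length
    is used in its place.
    """
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    total = genome_size if genome_size is not None else sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the total is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= total / 2:
            return length
    return None  # the contigs cover less than half of genome_size


if __name__ == "__main__":
    # E. coli O157:H7 near-neighbor row: 5,594,477 bp, 1,282 discordant bases.
    qv = quality_value(5_594_477, 1_282)
    print(f"QV = {qv:.2f}, P = {error_probability(qv):.2e}")
```

Applied to the rows that list a numeric QV, this calculation should reproduce the tabulated values, for example 36.40 for the E. coli O157:H7 near-neighbor row and 56.35 for the 454 40× row.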