A cost-effective and universal strategy for complete prokaryotic genome sequencing proposed by computer simulation

Jiang, Jingwei; Li, Jun; Leung, Frederick C

doi:10.1186/gb-2011-12-s1-p6

Volume 12 Supplement 1

Beyond the Genome 2011

Poster presentation
Published: 19 September 2011


Genome Biology volume 12, Article number: P6 (2011)


Background

Pyrosequencing allows a draft prokaryotic genome sequence to be obtained within a few days. However, the resulting assemblies typically contain several hundred contigs, so a multiplex PCR procedure is then needed to fill the gaps and link the contigs into one full-length genome sequence [1–10]. The full-length genome sequence is the gold standard for comparative prokaryotic genome analysis. This study assessed pyrosequencing strategies using a simulation with 100 prokaryotic genomes.

Results

Our simulation shows the following. First, combining a single-end 454 Jr Titanium run with a paired-end 454 Jr Titanium run may assemble about 90% of the 100 genomes into <10 scaffolds and 95% of them into <150 contigs. Second, the average contig N50 size is more than 331 kb (Table 1). Third, the average single-base accuracy is >99.99% (Table 1). Fourth, the average false gene duplication rate is <0.7% (Table 1). Fifth, the average false gene loss rate is <0.4% (Table 1). Sixth, the total size of long repeats (at both the >300 bp and >700 bp repeat-length thresholds) is significantly correlated with the number of contigs (Table 4). Seventh, increasing the read length of a pyrosequencing run could significantly improve assembly quality (Tables 1, 2 and 3).
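
For readers unfamiliar with the contig N50 index reported above, the following is a minimal sketch of the standard definition: the length of the contig at which the running total of contig lengths, taken from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the toy contig lengths are assumptions made for illustration; they are not taken from the study, and the abstract does not say which tool the authors used to compute this index.

```python
def n50(contig_lengths):
    """Return the N50 (in bp) of a collection of contig lengths.

    Standard definition: sort contigs from longest to shortest and
    return the length of the contig at which the cumulative length
    first reaches at least half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths in bp, invented for illustration only
# (not data from the simulation described in this abstract).
toy_contigs = [400_000, 300_000, 150_000, 100_000, 50_000]
print(n50(toy_contigs))
```

A larger N50 means that more of the assembly sits in long contigs, which is why it serves as a summary of assembly contiguity alongside the contig and scaffold counts.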

Table 1 Main average indices for different sequencing strategies for 100 genomes (400-bp read length)


Table 2 Main average indices for different sequencing strategies for 100 genomes (100-bp read length)


Table 3 Main average indices for different sequencing strategies for 100 genomes (200-bp read length)


Table 4 Linear regression results for 100 genomes, between the genome quality indicators and, for various read lengths, the number of repeats in the genome, the total repeat length of the genome and the percentage of the total repeat length of the genome
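
As a rough illustration of the kind of analysis summarized in Table 4, the sketch below fits an ordinary least-squares line relating a repeat measure (for example, total repeat length) to a quality indicator (for example, the number of contigs) and reports the Pearson correlation coefficient. The abstract does not describe the authors' regression procedure, so the function `linear_fit`, the variable names and the toy values are all assumptions for illustration; the values are invented and are not results from the study. No significance test is computed here.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r), where r is the Pearson
    correlation coefficient between xs and ys.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("xs and ys must each contain varying values")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical per-genome values, invented for illustration only:
# total long-repeat length (kb) and number of contigs in the assembly.
toy_repeat_kb = [10, 20, 30, 40]
toy_contigs = [25, 45, 65, 85]

slope, intercept, r = linear_fit(toy_repeat_kb, toy_contigs)
print(f"slope={slope:.2f} intercept={intercept:.2f} r={r:.2f}")
```

In a real analysis, each point would be one of the 100 simulated genomes, and the fit would be repeated for each read length and each repeat measure listed in the table caption.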


Conclusions

A single-end 454 Jr run combined with a paired-end 454 Jr run is a good strategy for prokaryotic genome sequencing. This strategy can produce a high-quality draft genome sequence of almost any randomly selected prokaryotic organism within days, and could be the first step towards the full-length genome sequence. It also makes the subsequent multiplex PCR procedure for gap filling much easier, because the order and orientation of most of the contigs are known. As a result, large-scale full-length prokaryotic genome-sequencing projects could be finished within weeks.

References

1. Arnold IC, Zigova Z, Holden M, Lawley TD, Rad R, Dougan G, Falkow S, Bentley SD, Müller A: Comparative whole genome sequence analysis of the carcinogenic bacterial model pathogen Helicobacter felis. Genome Biol Evol. 2011, 3: 302-308. 10.1093/gbe/evr022.
2. Stephan R, Lehner A, Tischler P, Rattei T: Complete genome sequence of Cronobacter turicensis LMG 23827, a food-borne pathogen causing deaths in neonates. J Bacteriol. 2011, 193: 309-310. 10.1128/JB.01162-10.
3. Wibberg D, Blom J, Jaenicke S, Kollin F, Rupp O, Scharf B, Schneiker-Bekel S, Szczepanowski R, Goesmann A, Setubal JC, Schmitt R, Pühler A, Schlüter A: Complete genome sequencing of Agrobacterium sp. H13-3, the former Rhizobium lupini H13-3, reveals a tripartite genome consisting of a circular and a linear chromosome and an accessory plasmid but lacking a tumor-inducing Ti-plasmid. J Biotechnol. 2011, 155: 50-62. 10.1016/j.jbiotec.2011.01.010.
4. Song JY, Jeong H, Yu DS, Fischbach MA, Park HS, Kim JJ, Seo JS, Jensen SE, Oh TK, Lee KJ, Kim JF: Draft genome sequence of Streptomyces clavuligerus NRRL 3585, a producer of diverse secondary metabolites. J Bacteriol. 2010, 192: 6317-6318. 10.1128/JB.00859-10.
5. Gao F, Wang Y, Liu YJ, Wu XM, Lv X, Gan YR, Song SD, Huang H: Genome sequence of Acinetobacter baumannii MDR-TJ. J Bacteriol. 2011, 193: 2365-2366. 10.1128/JB.00226-11.
6. Powney R, Smits THM, Sawbridge T, Frey B, Blom J, Frey JE, Plummer KM, Beer SV, Luck J, Duffy B, Rodoni B: Genome sequence of an Erwinia amylovora strain with pathogenicity restricted to Rubus plants. J Bacteriol. 2011, 193: 785-786. 10.1128/JB.01352-10.
7. Nam SH, Choi SH, Kang A, Kim DW, Kim RN, Kim A, Kim DS, Park HS: Genome sequence of Lactobacillus farciminis KCTC 3681. J Bacteriol. 2011, 193: 1790-1791. 10.1128/JB.00003-11.
8. Chen C, Kittichotirat W, Chen W, Downey JS, Si Y, Bumgarner R: Genome sequence of naturally competent Aggregatibacter actinomycetemcomitans serotype a strain D7S-1. J Bacteriol. 2010, 192: 2643-2644. 10.1128/JB.00157-10.
9. Seth-Smith HMB, Harris SR, Rance R, West AP, Severin JA, Ossewaarde JM, Cutcliffe LT, Skilton RJ, Marsh P, Parkhill J, Clarke IN, Thomson NR: Genome sequence of the zoonotic pathogen Chlamydophila psittaci. J Bacteriol. 2011, 193: 1282-1283. 10.1128/JB.01435-10.
10. Lyons E, Freeling M, Kustu S, Inwood W: Using genomic sequencing for classical genetics in E. coli K12. PLoS ONE. 2011, 6: e16717. 10.1371/journal.pone.0016717.

Author information

Authors and Affiliations

School of Biological Sciences, Faculty of Science, The University of Hong Kong, China
Jingwei Jiang, Jun Li & Frederick C Leung


Cite this article

Jiang, J., Li, J. & Leung, F.C. A cost-effective and universal strategy for complete prokaryotic genome sequencing proposed by computer simulation. Genome Biol 12 (Suppl 1), P6 (2011). https://doi.org/10.1186/gb-2011-12-s1-p6
