A cost-effective and universal strategy for complete prokaryotic genome sequencing proposed by computer simulation

Background

Pyrosequencing techniques allow scientists to sequence prokaryotic genomes and obtain draft sequences within a few days. However, the sequencing results always turn out to contain several hundred contigs. A multiplex PCR procedure is then needed to fill all of the gaps and to link the contigs into one full-length genome sequence [1–10]. The full-length prokaryotic genome sequence is the gold standard for comparative prokaryotic genome analysis. This study assessed pyrosequencing strategies by simulation on 100 prokaryotic genomes.

Results

Our simulation shows the following: first, a single-end 454 Jr Titanium run combined with a paired-end 454 Jr Titanium run may assemble about 90% of the 100 genomes into <10 scaffolds and 95% of the 100 genomes into <150 contigs; second, the average contig N50 size is more than 331 kb (Table 1); third, the average single-base accuracy is >99.99% (Table 1); fourth, the average false gene duplication rate is <0.7% (Table 1); fifth, the average false gene loss rate is <0.4% (Table 1); sixth, the total size of long repeats (both for repeats longer than 300 bp and for repeats longer than 700 bp) is significantly correlated with the number of contigs (Table 4); and, seventh, increasing the read length of a pyrosequencing run could improve the assembly quality significantly (Tables 1, 2 and 3).
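For readers unfamiliar with the contig N50 index reported above, the following is a minimal Python sketch of the common definition: the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the total assembly length. The function name `n50` and the example lengths are illustrative assumptions; they are not the authors' pipeline or data.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    # Unreachable for non-empty positive input; kept as a safeguard.
    raise ValueError("contig lengths must be positive")


if __name__ == "__main__":
    # Hypothetical contig lengths, chosen only to illustrate the calculation.
    toy_contigs = [80, 70, 50, 40, 30, 20]
    print(n50(toy_contigs))
```

A larger N50 means that more of the assembly sits in long contigs, which is why it serves as a summary index of assembly contiguity.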

Table 1 Main average indices for different sequencing strategies for 100 genomes (400-bp read length)
Table 2 Main average indices for different sequencing strategies for 100 genomes (100-bp read length)
Table 3 Main average indices for different sequencing strategies for 100 genomes (200-bp read length)
Table 4 Linear regression results for 100 genomes, between the genome quality indicators and, for various read lengths, the number of repeats in the genome, the total repeat length of the genome and the percentage of the total repeat length of the genome
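
The kind of linear regression summarized in Table 4 can be sketched as an ordinary least-squares fit of a quality indicator (for example, the number of contigs) against a repeat measure (for example, the total repeat length). This is a minimal sketch under stated assumptions: the helper `fit_line` and the toy numbers are hypothetical, and the source does not say which software or exact model the authors used.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r), where r is the Pearson correlation
    coefficient between xs and ys.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (>= 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("xs and ys must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


if __name__ == "__main__":
    # Hypothetical values, NOT data from the study:
    # total length of long repeats (kb) and contig count per genome.
    repeat_kb = [10, 25, 40, 60, 90]
    contigs = [20, 45, 60, 95, 140]
    slope, intercept, r = fit_line(repeat_kb, contigs)
    print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.3f}")
```

A correlation coefficient near 1 on real data would correspond to the reported finding that genomes with more long-repeat sequence assemble into more contigs; a significance test would be needed on top of this fit.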

Conclusions

A single-end 454 Jr run combined with a paired-end 454 Jr run is a good strategy for prokaryotic genome sequencing. This strategy provides a way to produce a high-quality draft genome sequence of almost any prokaryotic organism, selected at random, within days. It could be the first step toward achieving the full-length genome sequence. It also makes the subsequent multiplex PCR procedure (for gap filling) much easier, because the order and orientation of most of the contigs are already known. As a result, large-scale full-length prokaryotic genome-sequencing projects could be finished within weeks.

References

  1. Arnold IC, Zigova Z, Holden M, Lawley TD, Rad R, Dougan G, Falkow S, Bentley SD, Müller A: Comparative whole genome sequence analysis of the carcinogenic bacterial model pathogen Helicobacter felis. Genome Biol Evol 2011, 3:302–308.

  2. Stephan R, Lehner A, Tischler P, Rattei T: Complete genome sequence of Cronobacter turicensis LMG 23827, a food-borne pathogen causing deaths in neonates. J Bacteriol 2011, 193:309–310.

  3. Wibberg D, Blom J, Jaenicke S, Kollin F, Rupp O, Scharf B, Schneiker-Bekel S, Sczcepanowski R, Goesmann A, Setubal JC, Schmitt R, Pühler A, Schlüter A: Complete genome sequencing of Agrobacterium sp. H13-3, the former Rhizobium lupini H13-3, reveals a tripartite genome consisting of a circular and a linear chromosome and an accessory plasmid but lacking a tumor-inducing Ti-plasmid. J Biotechnol 2011, 155:50–62.

  4. Song JY, Jeong H, Yu DS, Fischbach MA, Park HS, Kim JJ, Seo JS, Jensen SE, Oh TK, Lee KJ, Kim JF: Draft genome sequence of Streptomyces clavuligerus NRRL 3585, a producer of diverse secondary metabolites. J Bacteriol 2010, 192:6317–6318.

  5. Gao F, Wang Y, Liu YJ, Wu XM, Lv X, Gan YR, Song SD, Huang H: Genome sequence of Acinetobacter baumannii MDR-TJ. J Bacteriol 2011, 193:2365–2366.

  6. Powney R, Smits THM, Sawbridge T, Frey B, Blom J, Frey JE, Plummer KM, Beer SV, Luck J, Duffy B, Rodoni B: Genome sequence of an Erwinia amylovora strain with pathogenicity restricted to Rubus plants. J Bacteriol 2011, 193:785–786.

  7. Nam SH, Choi SH, Kang A, Kim DW, Kim RN, Kim A, Kim DS, Park HS: Genome sequence of Lactobacillus farciminis KCTC 3681. J Bacteriol 2011, 193:1790–1791.

  8. Chen C, Kittichotirat W, Chen W, Downey JS, Si Y, Bumgarner R: Genome sequence of naturally competent Aggregatibacter actinomycetemcomitans serotype a strain D7S-1. J Bacteriol 2010, 192:2643–2644.

  9. Seth-Smith HMB, Harris SR, Rance R, West AP, Severin JA, Ossewaarde JM, Cutcliffe LT, Skilton RJ, Marsh P, Parkhill J, Clarke IN, Thomson NR: Genome sequence of the zoonotic pathogen Chlamydophila psittaci. J Bacteriol 2011, 193:1282–1283.

  10. Lyons E, Freeling M, Kustu S, Inwood W: Using genomic sequencing for classical genetics in E. coli K12. PLoS ONE 2011, 6:e16717.

Author information

Correspondence to Jingwei Jiang.

Keywords

  • Titanium
  • Genome Sequence
  • Genome Analysis
  • Good Strategy
  • Sequencing Result