The properties and applications of single-molecule DNA sequencing
Genome Biology volume 12, Article number: 217 (2011)
Single-molecule sequencing enables DNA or RNA to be sequenced directly from biological samples, making it well-suited for diagnostic and clinical applications. Here we review the properties and applications of this rapidly evolving and promising technology.
Classical DNA sequencing (sometimes referred to as first generation sequencing) was developed in the late 1970s and evolved from a low-throughput, almost 'artisan' approach, in which the same radiolabeled DNA sample was run on a gel with one lane for each nucleotide [1, 2], to an automated method in which all four fluorescently labeled dye terminators for a single sample  were loaded onto individual capillaries. These capillary-based instruments, introduced in 1998, could handle hundreds of individual samples per week, in a manner sufficiently powerful that the first draft sequence of a human genome was finished in 2001 using this technology. In the intervening years, incremental improvements have been made in dye chemistry, DNA polymerases, and electrophoresis conditions, pushing read lengths up to 1,000 bp; however, the underlying technology has remained the same, sequencing individual clones or samples.
After more than 25 years of steady improvements in first generation sequencing technology, the next generation of sequencing technology (now called second generation; see Box 1 for discussion of third generation sequencing terminology) emerged in 2005 with an immediate 100-fold increase in sequencing throughput using the 454 pyrosequencing approach . This advance was followed by introductions of other technologies (such as Solexa/Illumina and ABI SOLiD) that varied in their technological details but increased sequencing throughput and reduced costs by additional orders of magnitude (reviewed in [5–7]). These second generation technologies drastically increased throughput because the sequencing target had changed from single clones or samples to many independent DNA fragments, enabling large sets of DNAs to be sequenced in parallel. Until recently, all second generation technologies achieved massively parallel sequencing by imaging light emission from the sequenced DNA, although the new sequencing system from Ion Torrent will probably be the first commercial system to change that paradigm by detecting hydrogen ions instead of light . However, the key advance in all second generation technologies has been the avoidance of the bottleneck that resulted from the individual preparation of DNA templates that first generation approaches required. When coupled with powerful new bioinformatic tools and computational capabilities optimized for these new technologies, a prodigious increase in data output has resulted. This is highlighted in Figure 1, where the accumulation of sequence in classical GenBank from its inception in 1982 is compared with data in the Sequence Read Archive (originally known as the Short Read Archive, both abbreviated SRA). Less than a year after its initiation, the SRA had already surpassed classical GenBank and it now accounts for over 95% of all new sequence deposits. Furthermore, this is likely to be an under-representation of the level of new sequencing results because of the challenges of incorporating the new data types and difficulties in transferring the large volume of data.
Although the second generation technologies were initially inferior to classical sequencing in terms of read length (about 35 nucleotides (nt) for Illumina versus about 700 nt for classical sequencing) and single-read error rate (about 2% versus less than 0.1%), these shortcomings could be overcome by the sheer volume of data. Furthermore, continuous improvements in sequencing chemistry have narrowed the gap with respect to read length and errors, as exemplified by Roche 454 now routinely achieving read lengths of 400 nt at >99% accuracy  and Illumina moving from an initial read length of 36 nt to the current 76 nt or more and raw error rates well below 1%. These technologies have allowed DNA sequencing to move beyond a method for accumulating genomic information to another level at which sequencing has become the digital measuring stick for a host of important biological processes, including gene expression, splicing, characterizing complex mixed populations of organisms, detecting protein binding, and defining genome methylation sites .
Single-molecule sequencing provides solutions to some of the most vexing problems that face second generation sequencing by simplifying sample preparation, reducing sample mass requirements, and eliminating amplification of DNA templates. Every sample manipulation and especially amplification can cause quantitative and qualitative artifacts ; these have especially detrimental impacts on quantitative applications, such as chromatin immunoprecipitation sequencing (ChIP-Seq) and RNA/cDNA sequencing. Amplification also places limitations on the size of the DNA being sequenced because molecules that are too short or too long will not be amplified well. The simplified sample preparation and higher consistency caused by eliminating amplification makes single-molecule sequencing well suited for diagnostic and clinical applications . Thus, the need for continuing advances in sequencing technology is apparent.
Because the properties of different single-molecule sequencing technologies vary so much from each other and from other generations of sequencing technologies, it is important to understand those properties and their impact on experimental design and output. Some advantages of single-molecule sequencing may be universal, such as the ability to resequence the same molecule multiple times for improved accuracy and the ability to sequence molecules that cannot be readily amplified because of extremes of GC content, secondary structure, or other reasons. The ability to make use of some of these advantages, such as long read length and quantitative superiority, depends on the details of the technology, as not all single-molecule technologies have the same performance characteristics - in terms of long read length or the throughput in terms of read count - to be appropriate for all applications. Other reviews have examined various aspects of different single-molecule technologies [5–7, 13–17]. Here, we focus on the unique attributes of each technology (Figure 2) and how each might be best used to answer questions of biological interest that are not well addressed by current first and second generation sequencing.
Single-molecule sequencing technologies
When considering the properties of single-molecule sequencing technologies, the focus is most frequently on read length, error rate, and throughput (Figure 3); however, input sample quantity and quality requirements, simplicity and parallelizability of sample preparation, and data analysis are also important components that must be factored in when considering whether a technology, single-molecule or otherwise, is appropriate for a given problem. Some of the applications frequently undertaken with current sequencing technologies and the relative importance of various properties of different sequencing methods are shown in Table 1. Important properties of single-molecule technologies that relate to these various applications are discussed below.
Sequencing by synthesis
The first commercially available single-molecule sequencing system was developed by our colleagues at Helicos BioSciences . In this system, individual molecules are hybridized to a flow cell surface containing covalently attached oligonucleotides. Fluorescently labeled nucleotides and a DNA polymerase are added sequentially and incorporation events detected by laser excitation and recording with a charge coupled device (CCD) camera. The fluorescent 'Virtual Terminator' nucleotide prevents the incorporation of any subsequent nucleotide until the nucleotide dye moiety is cleaved . The images from each cycle are assembled to generate an overall set of sequence reads. On a standard run, 120 cycles of nucleotide addition and detection are carried out. Well over a billion molecules can be followed simultaneously in this approach. Because there are two 25-channel flow cells in a standard run, 50 different samples can be sequenced simultaneously, with the additional possibility of significantly greater throughput of samples through multiplexing. Sample requirements are the simplest of all technologies: sub-nanogram amounts are necessary and very poor quality DNA, including degraded or modified DNA, can be sequenced [20, 21]. Average read lengths are relatively short (about 35 nt) with raw individual nucleotide error rates currently about 3 to 5%, occurring randomly throughout the sequence reads and predominantly in the form of a 'dark base' or deletion error, which is accounted for in the alignment algorithm . This error rate is not an issue when detecting polymorphisms because 30× coverage is typically used for diploid genomes with second generation systems to overcome the uneven coverage induced by amplification. Over-sampling is needed to overcome the stochastic nature of heterozygote detection, with 30× coverage advisable to ensure that nearly all heterozygotes are called correctly. At this coverage level, accurate consensus sequences are generated regardless of error rates within this range. Single-molecule systems have a much more even coverage and thus do not require as much depth for complete detection of heterozygotes. The even coverage relative to second generation systems was shown with ChIP experiments, in which sequence reads were relatively constant with respect to GC content with single-molecule sequencing, whereas significant deviations were observed at both high and low GC content with amplification-based sequencing  and with whole-genome sequencing of a human sample .
The Helicos Sequencer system can also sequence RNA molecules directly, thus avoiding the many artifacts associated with reverse transcriptase and providing unparalleled quantitative accuracy for RNA expression measurements . The very high read count per sample allows precise expression measurements to be made with either RNA or cDNA [26–29], a feature not yet possible with other single-molecule technologies. Indeed, whole classes of RNA molecules that cannot be visualized using other technologies can be detected using a single-molecule approach [30, 31]. As with many single-molecule systems, repeated reads of the same molecule can markedly improve the error rate and also allow detection of very rare variants in a mixed sample. For example, a rare variant in a sample containing a mixture of few tumor cells among many normal cells might not be detectable with amplified DNA. With repeat sequencing of the same molecule, the error rate can be driven sufficiently low that mutations in heterogeneous samples such as tumors can be readily detected. Because of the minimal sample preparation needs, the ability to use exceptionally small starting quantities, and the high read count, this technology is ideal for quantitative applications such as ChIP, RNA expression, and copy number variation, and situations in which sample quantity is limiting or degraded [20, 23]. Standard, whole human genome resequencing is readily accomplished , but it is currently less expensive on second generation systems.
Pacific Biosciences has developed another sequencing-by-synthesis approach using fluorescently labeled nucleotides. In this system, DNA is constrained to a very small volume in a zero-mode wave guide  and the presence of a fluorescently labeled cognate nucleotide near the DNA polymerase is measured. The dimensions of the wave guide are so small that light can penetrate only the region very close to the edge, where the polymerase used for sequencing is constrained. Only nucleotides in that small volume near the polymerase can be illuminated and fluoresce for detection. Because the nucleotide that is being incorporated in the extending DNA strand spends a longer time near the polymerase, it can, to a large extent, be distinguished from non-cognate nucleotides. All four potential nucleotides are included in the reaction, each labeled with a different color fluorescent dye so that they can be distinguished from each other. Each nucleotide has a characteristic incorporation time that can further aid in improving base calls. Sequence reads of up to thousands of bases, longer than possible with second generation systems, are obtained in real time for each individual molecule [33–36]. However, the current throughput is less than 100,000 reads per run, so the overall sequence yield is much lower than second generation systems and the Helicos system. In addition, the raw error rate, currently 15 to 20% [37, 38], is significantly higher than with any other current sequencing technology, creating challenges in using the data for some applications, such as variant detection.
Much longer reads, referred to as 'strobe reads' , can be generated by turning off the laser for periods of time during sequencing, which prevents premature termination caused by laser-induced photodamage to the polymerase and nucleotides. If long reads are not necessary, the high raw error rate can be overcome by ligating a hairpin oligonucleotide to each end of the DNA, creating a circular template (called SMRTbell for single molecule real time), and then repeatedly sequencing the same molecule . This procedure works when the molecules are relatively short but it cannot be used with long reads, so those retain the high raw error rate. Even with a high error rate, the very long reads can be productively used for joining sequence contigs. An additional benefit for this system is the ability to potentially detect modified bases. It is possible to detect 5-methylcytosine , although the role of sequence context and other factors in affecting the accuracy of such assignments remains to be clarified. In principle, direct RNA sequencing should also be possible with this system, but this has not been reported yet for natural RNA molecules because nucleotides bind repeatedly to the reverse transcriptase before nucleotide incorporation, thereby giving false signals with multiple insertions that prevent determination of a meaningful sequence. In addition, the low read count of this system will limit it to the identification of common mRNA isoforms rather than quantitative expression profiling or complete transcriptome coverage, both of which require a much higher read count than possible in the foreseeable future. In general, the long reads and short turnaround time make this system most useful for helping to assemble genomes, assessing the analysis of structural variation, haplotyping, metagenomics, and identification of splicing isoforms.
Life Technologies, a major provider of both first and second generation sequencing systems, is developing the fluorescence resonance energy transfer (FRET)-based single-molecule sequencing-by-synthesis technology initially introduced by Visigen . Substantial advances have been made, with commercial release of the 'Starlight' system expected in the near future. The current technology consists of a quantum-dot-labeled polymerase that synthesizes DNA using four distinctly labeled nucleotides in a real-time system . Quantum dots, which are fluorescent semiconducting nanoparticles, have an advantage over fluorescent dyes in that they are much brighter and less susceptible to bleaching, although they are also much larger and more susceptible to blinking. The genomic sample to be sequenced is ligated to a surface-attached oligonucleotide of defined sequence and then read by extension of a primer complementary to the surface oligonucleotide. When a fluorescently labeled nucleotide binds to the polymerase, it interacts with the quantum dot, causing an alteration in the fluorescence of both the nucleotide and the quantum dot. The quantum dot signal drops, whereas a signal from the dye-labeled phosphate on each nucleotide rises at a characteristic wavelength. The real-time sequence is captured for each extending primer. Because each sequence is bound to the surface, it can be reprimed and sequenced again for improved accuracy. It is not clear what the sequence specifications will be but its similarity to the Pacific Biosciences technology make that a likely reference point. If so, it will have the same strengths in terms of applications (genome assembly, structural variation, haplotyping, metagenomics) whereas potentially being challenged with quantitative applications requiring a high read count (such as ChIP or RNA expression).
Optical sequencing and mapping
There are other technologies that enable very long reads to be produced but at the cost of significantly lower throughput. For example, it is possible to adhere very long DNA molecules, up to hundreds of kilobases long, to surfaces and interrogate them for particular sequences by cutting them with various restriction enzymes or labeling them after treatment with sequence-specific nicking enzymes. The lengths of the examined molecules are dependent on the ability to handle such long DNA without mechanically shearing it. Complete restriction digests that allow ordering of sequence contigs have been generated for human and other genomes from collections of single molecules spanning entire genomes . Highly repetitive and duplicated genomes, such as maize, are particularly difficult to assemble with traditional sequencing but have been successfully analyzed with this single-molecule system . The restriction sites provide sequence landmarks on the DNA and thus long repeat regions and other intricate structural variations can be assigned in an unambiguous manner. Specialized applications such as genome-wide methylation mapping can also be undertaken .
Similarly, DNA molecules can be constrained to nanotubes and specifically labeled for viewing . Single molecules of RNA have been visualized using scanning tip Raman spectroscopy . In an alternative method also using adsorption of long DNA molecules to a surface, guanines could be distinguished from all other bases and the partial sequence read with a scanning electron microscope . Possibilities for reading other bases through insertion of heavy atoms such as bromine or iodine on particular nucleotides have been suggested by ZS Genetics . Although the low strand throughput and incomplete sequence reading are currently limiting, there is potential for reads that are hundreds of kilobases long, again limited primarily by the ability to handle the DNA without shearing it. Other technologies using direct reading of stretched DNA have been reviewed elsewhere . These optical sequencing technologies provide a powerful view of genome structure, but they cannot provide the detailed sequence data or access to many other sequencing applications that require high read counts, such as gene expression measurements.
All of the sequencing techniques described so far require some kind of label on the DNA or nucleotide substrates to detect the individual base for sequencing. However, nanopore approaches generally do not require an exogenous label but rely instead on the electronic or chemical structure of the different nucleotides for discrimination. The advantages and potential means of using nanopores have been reviewed [14, 50]. Nanopores of greatest interest thus far include those assembled with solid-state systems constructed of materials such as carbon nanotubes or thin films [51–54] and the biologically based α-hemolysin [55–59] or MspA [60, 61]. These bacterial pore proteins have been extensively studied and engineered to optimize the detection of specific bases and the translocation rate of DNA through the pore. Although sequencing native DNA based on its natural properties would eliminate the labeling step and potentially allow very long reads with minimal sample preparation, thus reducing costs, the differences among nucleotides are very modest and their detection is compounded by difficulties in controlling the pace and directionality of the DNA through the nanopore. Specific detection and unidirectional flow are required for high accuracy sequencing.
A variety of methods have been used to slow the pace of DNA through nanopores, including attachment of polystyrene beads , salt concentrations , viscosity , magnetic fields , and the introduction of regions of double-stranded DNA on a single-stranded target [54, 58]. At the high translocation speeds typically found (potentially millions of bases per second), detecting a signal over background noise from each nucleotide can be a challenge, and this has been overcome in some cases by reading groups of nucleotides (such as by using hybridization of known sequences as is being developed by NabSys ) or encoding the original sequence in a more complex manner by converting the nucleotide sequence using a binary code of molecular beacons (as is being developed by NobleGen ). Maintaining a unidirectional flow of DNA has been enhanced by coupling an exonuclease to the process and reading the cleaved nucleotides (as developed by Oxford Nanopore ).
Although nanopore sequencing technologies continue to advance, simply showing the ability to sequence DNA, something not yet demonstrated by nanopores with natural DNA, is not sufficient. There needs to be a path to lower costs, longer reads, or higher accuracy relative to other technologies that will provide nanopores with a unique advantage relative to other methods. Even if reagent costs can be significantly reduced, sample preparation and informatic costs remain and these may become the dominant costs of sequencing and will vary depending on the technology being used. The ever-rising hurdles created by extant technology will not be easy to overcome. With the variety of second generation and single-molecule technologies already commercialized and others on the horizon, there will need to be substantial advances on many fronts to make these technologies commercially viable.
The development of single-molecule sequencing in which individual DNA or RNA molecules, derived directly from biological samples, are sequenced not only in a massively parallel manner but also without any type of amplification before or during the sequencing reaction promises yet another inflection point in terms of technology. It offers the potential for lower costs, higher throughput, improved quantitative accuracy, increased read lengths, and the ability to directly sequence RNA and detect methylation and other nucleotide modifications. Just as second generation sequencing did not completely displace first generation sequencing, single-molecule sequencing will not immediately displace earlier technologies. First generation, second generation, and single-molecule sequencing will each be used for the biological problems to which they are most suited, with that range of problems changing over time as each technology is improved.
Although no one mode of single-molecule sequencing can yet provide all the advantages potentially attainable, rapid progress is being made to achieve these goals. Technologies such as the Helicos system using fluorescent detection are now available commercially, and others such as Pacific Biosciences and Life Technologies Starlight will be available soon. New methods that rely on the natural chemical properties of native DNA are still in their infancy but raise the hope that sequencing might at some point become even simpler and cheaper by avoiding the need for labeling DNA and the use of enzymatic activities. Because experimenters have such widely varying requirements for the problems that need to be addressed, each should consider whether the best and most complete answer can be generated with older, sequencing-by-committee approaches or whether a true understanding of their biological questions requires the exquisite quantitative accuracy and in-depth sequencing power of a single-molecule approach. Single-molecule sequencing will continue to advance and offer researchers a variety of options, especially in the diagnostic and clinical fields.
The logical term for the next round of sequencing technology advances would be 'third generation sequencing' and this has frequently been used to describe single-molecule sequencing. However, third generation sequencing has also been defined by some as real-time sequencing or solid-state sequencing, so the term has achieved Alice in Wonderland status of meaning whatever its user wants it to mean, and it will therefore not be used here. Instead, the more precise term of single-molecule sequencing will be used and only for those technologies that actually generate a sequencing signal from a single nucleic acid molecule. The definition of single-molecule sequencing has been stretched by some to include systems that start with a single molecule but then make multiple copies of the DNA before sequencing or detection . However, the properties of any sequencing system could be stretched to assert that the sequencing process actually started with a single molecule, even though the unique advantages of single-molecule sequencing would be lost. On the other hand, there are also technologies that do not strive to generate sequence information from every nucleotide but only from a subset of positions. When such partial sequences are generated from a single molecule, this single-molecule mapping data can be combined with sequence data for a complete genomic view. Indeed, no current technology can provide individual read lengths sufficient for whole genome coverage of even the smallest organisms, so methods for combining partial reads into a complete coverage map are an important component of the overall sequencing process for whole genomes.
Maxam AM, Gilbert W: A new method for sequencing DNA. Proc Natl Acad Sci USA. 1977, 74: 560-564. 10.1073/pnas.74.2.560.
Sanger F, Nicklen S, Coulson AR: DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci USA. 1977, 74: 5463-5467. 10.1073/pnas.74.12.5463.
Prober JM, Trainor GL, Dam RJ, Hobbs FW, Robertson CW, Zagursky RJ, Cocuzza AJ, Jensen MA, Baumeister K: A system for rapid DNA sequencing with fluorescent chain-terminating dideoxynucleotides. Science. 1987, 238: 336-341. 10.1126/science.2443975.
Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z, Dewell SB, Du L, Fierro JM, Gomes XV, Godwin BC, He W, Helgesen S, Ho CH, Irzyk GP, Jando SC, Alenquer ML, Jarvie TP, Jirage KB, Kim JB, Knight JR, Lanza JR, Leamon JH, Lefkowitz SM, Lei M, Li J, et al: Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005, 437: 376-380.
Voelkerding KV, Dames SA, Durtschi JD: Next-generation sequencing: from basic research to diagnostics. Clin Chem. 2009, 55: 641-658. 10.1373/clinchem.2008.112789.
Metzker ML: Sequencing technologies - the next generation. Nat Rev Genet. 2010, 11: 31-46. 10.1038/nrg2626.
Pettersson E, Lundeberg J, Ahmadian A: Generations of sequencing technologies. Genomics. 2009, 93: 105-111. 10.1016/j.ygeno.2008.10.003.
Delseny M, Han B, Hsing YL: High throughput DNA sequencing: the new sequencing revolution. Plant Sci. 2010, 179: 407-422. 10.1016/j.plantsci.2010.07.019.
454 Sequencing: Products & Solutions. [http://454.com/products-solutions/system-benefits.asp]
Kahvejian A, Quackenbush J, Thompson JF: What would you do if you could sequence everything?. Nat Biotechnol. 2008, 26: 1125-1133. 10.1038/nbt1494.
Dohm JC, Lottaz C, Borodina T, Himmelbauer H: Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Res. 2008, 36: e105-10.1093/nar/gkn425.
Milos PM: Emergence of single-molecule sequencing and potential for molecular diagnostic applications. Expert Rev Mol Diagn. 2009, 9: 659-666. 10.1586/erm.09.50.
Gupta PK: Single-molecule DNA sequencing technologies for future genomics research. Trends Biotechnol. 2008, 26: 602-611. 10.1016/j.tibtech.2008.07.003.
Branton D, Deamer DW, Marziali A, Bayley H, Benner SA, Butler T, Di Ventra M, Garaj S, Hibbs A, Huang X, Jovanovich SB, Krstic PS, Lindsay S, Ling XS, Mastrangelo CH, Meller A, Oliver JS, Pershin YV, Ramsey JM, Riehn R, Soni GV, Tabard-Cossa V, Wanunu M, Wiggin M, Schloss JA: The potential and challenges of nanopore sequencing. Nat Biotechnol. 2008, 26: 1146-1153. 10.1038/nbt.1495.
Xu M, Fujita D, Hanagata N: Perspectives and challenges of emerging single-molecule DNA sequencing technologies. Small. 2009, 5: 2638-2649. 10.1002/smll.200900976.
Efcavitch JW, Thompson JF: Single-molecule DNA analysis. Annu Rev Anal Chem (Palo Alto Calif). 2010, 3: 109-128. 10.1146/annurev.anchem.111808.073558.
Treffer R, Deckert V: Recent advances in single-molecule sequencing. Curr Opin Biotechnol. 2010, 21: 4-11. 10.1016/j.copbio.2010.02.009.
Harris TD, Buzby PR, Babcock H, Beer E, Bowers J, Braslavsky I, Causey M, Colonell J, Dimeo J, Efcavitch JW, Giladi E, Gill J, Healy J, Jarosz M, Lapen D, Moulton K, Quake SR, Steinmann K, Thayer E, Tyurina A, Ward R, Weiss H, Xie Z: Single-molecule DNA sequencing of a viral genome. Science. 2008, 320: 106-109. 10.1126/science.1150427.
Bowers J, Mitchell J, Beer E, Buzby PR, Causey M, Efcavitch JW, Jarosz M, Krzymanska-Olejnik E, Kung L, Lipson D, Lowman GM, Marappan S, McInerney P, Platt A, Roy A, Siddiqi SM, Steinmann K, Thompson JF: Virtual terminator nucleotides for next-generation DNA sequencing. Nat Methods. 2009, 6: 593-595. 10.1038/nmeth.1354.
Hart C, Lipson D, Ozsolak F, Raz T, Steinmann K, Thompson J, Milos PM: Single-molecule sequencing: sequence methods to enable accurate quantitation. Methods Enzymol. 2010, 472: 407-430. full_text.
Thompson JF, Steinmann KE: Single molecule sequencing with a HeliScope genetic analysis system. Curr Protoc Mol Biol. 2010, Chapter 7 (Unit7): 10-
Giladi E, Healy J, Myers G, Hart C, Kapranov P, Lipson D, Roels S, Thayer E, Letovsky S: Error tolerant indexing and alignment of short reads with covering template families. J Comput Biol. 2010, 17: 1397-1411. 10.1089/cmb.2010.0005.
Goren A, Ozsolak F, Shoresh N, Ku M, Adli M, Hart C, Gymrek M, Zuk O, Regev A, Milos PM, Bernstein BE: Chromatin profiling by directly sequencing small quantities of immunoprecipitated DNA. Nat Methods. 2010, 7: 47-49. 10.1038/nmeth.1404.
Pushkarev D, Neff NF, Quake SR: Single-molecule sequencing of an individual human genome. Nat Biotechnol. 2009, 27: 847-852. 10.1038/nbt.1561.
Ozsolak F, Platt AR, Jones DR, Reifenberger JG, Sass LE, McInerney P, Thompson JF, Bowers J, Jarosz M, Milos PM: Direct RNA sequencing. Nature. 2009, 461: 814-818. 10.1038/nature08390.
Lipson D, Raz T, Kieu A, Jones DR, Giladi E, Thayer E, Thompson JF, Letovsky S, Milos P, Causey M: Quantification of the yeast transcriptome by single-molecule sequencing. Nat Biotechnol. 2009, 27: 652-658. 10.1038/nbt.1551.
Ozsolak F, Goren A, Gymrek M, Guttman M, Regev A, Bernstein BE, Milos PM: Digital transcriptome profiling from attomole-level RNA samples. Genome Res. 2010, 20: 519-525. 10.1101/gr.102129.109.
Ozsolak F, Ting DT, Wittner BS, Brannigan BW, Paul S, Bardeesy N, Ramaswamy S, Milos PM, Haber DA: Amplification-free digital gene expression profiling from minute cell quantities. Nat Methods. 2010, 7: 619-621. 10.1038/nmeth.1480.
Ozsolak F, Kapranov P, Foissac S, Kim SW, Fishilevich E, Monaghan AP, John B, Milos PM: Comprehensive polyadenylation site maps in yeast and human reveal pervasive alternative polyadenylation. Cell. 2010, 143: 1018-1029. 10.1016/j.cell.2010.11.020.
Kapranov P, Ozsolak F, Kim SW, Foissac S, Lipson D, Hart C, Roels S, Borel C, Antonarakis SE, Monaghan AP, John B, Milos PM: New class of gene-termini-associated human RNAs suggests a novel RNA copying mechanism. Nature. 2010, 466: 642-646. 10.1038/nature09190.
Kapranov P, St Laurent G, Raz T, Ozsolak F, Reynolds CP, Sorensen PH, Reaman G, Milos P, Arceci RJ, Thompson JF, Triche TJ: The majority of total nuclear-encoded non-ribosomal RNA in a human cell is 'dark matter' unannotated RNA. BMC Biol. 2010, 8: 149-10.1186/1741-7007-8-149.
Korlach J, Marks PJ, Cicero RL, Gray JJ, Murphy DL, Roitman DB, Pham TT, Otto GA, Foquet M, Turner SW: Selective aluminum passivation for targeted immobilization of single DNA polymerase molecules in zero-mode waveguide nanostructures. Proc Natl Acad Sci USA. 2008, 105: 1176-1181. 10.1073/pnas.0710982105.
Levene MJ, Korlach J, Turner SW, Foquet M, Craighead HG, Webb WW: Zero-mode waveguides for single-molecule analysis at high concentrations. Science. 2003, 299: 682-686. 10.1126/science.1079700.
Korlach J, Bjornson KP, Chaudhuri BP, Cicero RL, Flusberg BA, Gray JJ, Holden D, Saxena R, Wegener J, Turner SW: Real-time DNA sequencing from single polymerase molecules. Methods Enzymol. 2010, 472: 431-455. full_text.
Eid J, Fehr A, Gray J, Luong K, Lyle J, Otto G, Peluso P, Rank D, Baybayan P, Bettman B, Bibillo A, Bjornson K, Chaudhuri B, Christians F, Cicero R, Clark S, Dalal R, Dewinter A, Dixon J, Foquet M, Gaertner A, Hardenbol P, Heiner C, Hester K, Holden D, Kearns G, Kong X, Kuse R, Lacroix Y, Lin S, et al: Real-time DNA sequencing from single polymerase molecules. Science. 2009, 323: 133-138. 10.1126/science.1162986.
Korlach J, Bibillo A, Wegener J, Peluso P, Pham TT, Park I, Clark S, Otto GA, Turner SW: Long, processive enzymatic DNA synthesis using 100% dye-labeled terminal phosphate-linked nucleotides. Nucleosides Nucleotides Nucleic Acids. 2008, 27: 1072-1083. 10.1080/15257770802260741.
Travers KJ, Chin CS, Rank DR, Eid JS, Turner SW: A flexible and efficient template format for circular consensus sequencing and SNP detection. Nucleic Acids Res. 2010, 38: e159-10.1093/nar/gkq543.
Chin CS, Sorenson J, Harris JB, Robins WP, Charles RC, Jean-Charles RR, Bullard J, Webster DR, Kasarskis A, Peluso P, Paxinos EE, Yamaichi Y, Calderwood SB, Mekalanos JJ, Schadt EE, Waldor MK: The origin of the Haitian cholera outbreak strain. N Engl J Med. 2010, 364: 33-42. 10.1056/NEJMoa1012928.
Ritz A, Bashir A, Raphael BJ: Structural variation analysis with strobe reads. Bioinformatics. 2010, 26: 1291-1298. 10.1093/bioinformatics/btq153.
Flusberg BA, Webster DR, Lee JH, Travers KJ, Olivares EC, Clark TA, Korlach J, Turner SW: Direct detection of DNA methylation during single-molecule, real-time sequencing. Nat Methods. 2010, 7: 461-465. 10.1038/nmeth.1459.
Hardin SH: Real-time DNA sequencing. Next-Generation Genome Sequencing: Towards Personalized Medicine. Edited by: Janitz M. 2008, Weinheim: Wiley-VCH, 97-102.
Pennisi E: Genomics. Semiconductors inspire new sequencing technologies. Science. 2010, 327: 1190-10.1126/science.327.5970.1190.
Teague B, Waterman MS, Goldstein S, Potamousis K, Zhou S, Reslewic S, Sarkar D, Valouev A, Churas C, Kidd JM, Kohn S, Runnheim R, Lamers C, Forrest D, Newton MA, Eichler EE, Kent-First M, Surti U, Livny M, Schwartz DC: High-resolution human genome structure by single-molecule analysis. Proc Natl Acad Sci USA. 2010, 107: 10848-10853. 10.1073/pnas.0914638107.
Zhou S, Wei F, Nguyen J, Bechner M, Potamousis K, Goldstein S, Pape L, Mehan MR, Churas C, Pasternak S, Forrest DK, Wise R, Ware D, Wing RA, Waterman MS, Livny M, Schwartz DC: A single molecule scaffold for the maize genome. PLoS Genet. 2009, 5: e1000711-10.1371/journal.pgen.1000711.
Ananiev GE, Goldstein S, Runnheim R, Forrest DK, Zhou S, Potamousis K, Churas CP, Bergendahl V, Thomson JA, Schwartz DC: Optical mapping discerns genome wide DNA methylation profiles. BMC Mol Biol. 2008, 9: 68-10.1186/1471-2199-9-68.
Das SK, Austin MD, Akana MC, Deshpande P, Cao H, Xiao M: Single molecule linear analysis of DNA in nano-channel labeled with sequence specific fluorescent probes. Nucleic Acids Res. 2010, 38: e177-10.1093/nar/gkq673.
Bailo E, Deckert V: Tip-enhanced Raman spectroscopy of single RNA strands: towards a novel direct-sequencing method. Angew Chem Int Ed Engl. 2008, 47: 1658-1661. 10.1002/anie.200704054.
Tanaka H, Kawai T: Partial sequencing of a single DNA molecule with a scanning tunnelling microscope. Nat Nanotechnol. 2009, 4: 518-522. 10.1038/nnano.2009.155.
ZS Genetics, Inc: [http://www.zsgenetics.com/thetech/prep/index.html]
Griffiths J: The realm of the nanopore. Interest in nanoscale research has skyrocketed, and the humble pore has become a king. Anal Chem. 2008, 80: 23-27. 10.1021/ac085995z.
Schneider GF, Kowalczyk SW, Calado VE, Pandraud G, Zandbergen HW, Vandersypen LM, Dekker C: DNA translocation through graphene nanopores. Nano Lett. 2010, 10: 3163-3167. 10.1021/nl102069z.
Merchant CA, Healy K, Wanunu M, Ray V, Peterman N, Bartel J, Fischbein MD, Venta K, Luo Z, Johnson AT, Drndic M: DNA translocation through graphene nanopores. Nano Lett. 2010, 10: 2915-2921. 10.1021/nl101046t.
Balagurusamy VS, Weinger P, Ling XS: Detection of DNA hybridizations using solid-state nanopores. Nanotechnology. 2010, 21: 335102-10.1088/0957-4484/21/33/335102.
Garaj S, Hubbard W, Reina A, Kong J, Branton D, Golovchenko JA: Graphene as a subnanometre trans-electrode membrane. Nature. 2010, 467: 190-193. 10.1038/nature09379.
Cockroft SL, Chu J, Amorin M, Ghadiri MR: A single-molecule nanopore device detects DNA polymerase activity with single-nucleotide resolution. J Am Chem Soc. 2008, 130: 818-820. 10.1021/ja077082c.
Stoddart D, Heron AJ, Mikhailova E, Maglia G, Bayley H: Single-nucleotide discrimination in immobilized DNA oligonucleotides with a biological nanopore. Proc Natl Acad Sci USA. 2009, 106: 7702-7707. 10.1073/pnas.0901054106.
Purnell RF, Schmidt JJ: Discrimination of single base substitutions in a DNA strand immobilized in a biological nanopore. ACS Nano. 2009, 3: 2533-2538. 10.1021/nn900441x.
Stoddart D, Heron AJ, Klingelhoefer J, Mikhailova E, Maglia G, Bayley H: Nucleobase recognition in ssDNA at the central constriction of the alpha-hemolysin pore. Nano Lett. 2010, 10: 3633-3637. 10.1021/nl101955a.
Lathrop DK, Ervin EN, Barrall GA, Keehan MG, Kawano R, Krupka MA, White HS, Hibbs AH: Monitoring the escape of DNA from a nanopore using an alternating current signal. J Am Chem Soc. 2010, 132: 1878-1885. 10.1021/ja906951g.
Butler TZ, Pavlenok M, Derrington IM, Niederweis M, Gundlach JH: Single-molecule DNA detection with an engineered MspA protein nanopore. Proc Natl Acad Sci USA. 2008, 105: 20647-20652. 10.1073/pnas.0807514106.
Derrington IM, Butler TZ, Collins MD, Manrao E, Pavlenok M, Niederweis M, Gundlach JH: Nanopore DNA sequencing with MspA. Proc Natl Acad Sci USA. 2010, 107: 16060-16065. 10.1073/pnas.1001831107.
de Zoysa RS, Jayawardhana DA, Zhao Q, Wang D, Armstrong DW, Guan X: Slowing DNA translocation through nanopores using a solution containing organic salts. J Phys Chem B. 2009, 113: 13332-13336. 10.1021/jp9040293.
Kawano R, Schibel AE, Cauley C, White HS: Controlling the translocation of single-stranded DNA through alpha-hemolysin ion channels using viscosity. Langmuir. 2009, 25: 1233-1237. 10.1021/la803556p.
Peng H, Ling XS: Reverse DNA translocation through a solid-state nanopore by magnetic tweezers. Nanotechnology. 2009, 20: 185101-10.1088/0957-4484/20/18/185101.
McNally B, Singer A, Yu Z, Sun Y, Weng Z, Meller A: Optical recognition of converted DNA nucleotides for single-molecule DNA sequencing using nanopore arrays. Nano Lett. 2010, 10: 2237-2244. 10.1021/nl1012147.
Clarke J, Wu HC, Jayasinghe L, Patel A, Reid S, Bayley H: Continuous base identification for single-molecule nanopore DNA sequencing. Nat Nanotechnol. 2009, 4: 265-270. 10.1038/nnano.2009.12.
McCaughan F, Dear PH: Single-molecule genomics. J Pathol. 2010, 220: 297-306.
GenBank Statistics. [http://www.ncbi.nlm.nih.gov/genbank/genbankstats.html]
Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW: GenBank. Nucleic Acids Res. 2011, 39: D32-D37. 10.1093/nar/gkq1079.
Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW: GenBank. Nucleic Acids Res. 2010, 38: D46-D51. 10.1093/nar/gkp1024.
Shumway M, Cochrane G, Sugawara H: Archiving next generation sequencing data. Nucleic Acids Res. 2010, 38: D870-D871. 10.1093/nar/gkp1078.
Leinonen R, Sugawara H, Shumway M, International Nucleotide Sequence Database Collaboration: The sequence read archive. Nucleic Acids Res. 2010, D19-21. 39 Database
Sayers EW, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Edgar R, Federhen S, Feolo M, Geer LY, Helmberg W, Kapustin Y, Landsman D, Lipman DJ, Madden TL, Maglott DR, Miller V, Mizrachi I, Ostell J, Pruitt KD, Schuler GD, Sequeira E, Sherry ST, Shumway M, Sirotkin K, Souvorov A, Starchenko G, Tatusova TA, et al: Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2009, 37: D5-D15. 10.1093/nar/gkn741.
Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, Dicuccio M, Edgar R, Federhen S, Feolo M, Geer LY, Helmberg W, Kapustin Y, Khovayko O, Landsman D, Lipman DJ, Madden TL, Maglott DR, Miller V, Ostell J, Pruitt KD, Schuler GD, Shumway M, Sequeira E, Sherry ST, Sirotkin K, Souvorov A, Starchenko G, Tatusov RL, et al: Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2008, 36: D13-D21. 10.1093/nar/gkm1000.
The NCBI Short Read Archive (SRA), a New Primary Data Archive Resource. [http://www.ncbi.nlm.nih.gov/Traces/sra/static/AGBT_Poster_Final.pdf]
This publication was made possible by grants R01 HG004144 and RC2 HG005598 from the National Human Genetics Research Institute (NHGRI) at the National Institutes of Health. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NHGRI.
About this article
Cite this article
Thompson, J.F., Milos, P.M. The properties and applications of single-molecule DNA sequencing. Genome Biol 12, 217 (2011). https://doi.org/10.1186/gb-2011-12-2-217
- Sequencing Technology
- Read Length
- Pacific Bioscience
- Single Molecule Real Time
- High Read Count