The advantages of SMRT sequencing
© BioMed Central Ltd 2013
Published: 3 July 2013
Of the current next-generation sequencing technologies, SMRT sequencing is sometimes overlooked. However, attributes such as long reads, modified base detection and high accuracy make SMRT a useful technology and an ideal approach to the complete sequencing of small genomes.
Pacific Biosciences' single molecule, real-time sequencing technology, SMRT, is one of several next-generation sequencing technologies that are currently in use. In the past, it has been somewhat overlooked because of its lower throughput compared with methods such as Illumina and Ion Torrent, and because of persistent rumors that it is inaccurate. Here, we seek to dispel these misconceptions and show that SMRT is indeed a highly accurate method with many advantages when used to sequence small genomes, including the possibility of facile closure of bacterial genomes without additional experimentation. We also highlight its value in being able to detect modified bases in DNA.
So-called next-generation technologies for sequencing DNA are penetrating every aspect of biology thanks to the immense amount of information that is encoded within nucleic acid sequences. However, today's next-generation sequencing technologies, such as Illumina, 454 and Ion Torrent, have several significant limitations, especially short read lengths and amplification biases, that restrict our ability to fully sequence genomes. Unfortunately, with the rise of next-generation sequencing, even less emphasis is being placed on trying to understand at the biological and biochemical levels just what functions newly discovered genes have and how those functions allow an organism to work, which is surely why we are sequencing DNA in the first place. Now a new technology, SMRT sequencing from Pacific Biosciences, has been developed that not only produces considerably longer and highly accurate DNA sequences from individual unamplified molecules, but can also show where methylated bases occur (and thereby provide functional information about the DNA methyltransferases encoded by the genome).
SMRT sequencing is a sequencing-by-synthesis technology based on real-time imaging of fluorescently tagged nucleotides as they are incorporated along individual DNA template molecules. Because the technology uses a DNA polymerase to drive the reaction, and because it images single molecules, there is no degradation of signal over time. Instead, the sequencing reaction ends when the template and polymerase dissociate. As a result, instead of the uniform read length seen with other technologies, the read lengths have an approximately log-normal distribution with a long tail. The average read length from the current PacBio RS instrument is about 3,000 bp, but some reads may be 20,000 bp or longer. This is roughly 30 to 200 times longer than the read length from a next-generation sequencing instrument, and more than a four-fold improvement since the original release of the instrument two years ago. It is notable that the recently announced PacBio RS II platform claims to have a further four-fold improvement, with twice the mean read length and twice the throughput of the current machine.
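To give a feel for what an approximately log-normal read-length distribution with a mean near 3,000 bp implies, the following minimal Python sketch may help. The shape parameter `sigma = 0.9` is an illustrative assumption, not a value from the instrument or from this article, and the function names are hypothetical. Only the 3,000 bp mean and the 20,000 bp tail threshold come from the text above.

```python
import math
import random


def lognormal_mu(mean_length: float, sigma: float) -> float:
    """Return mu such that a log-normal(mu, sigma) has the requested mean."""
    # The mean of a log-normal distribution is exp(mu + sigma^2 / 2).
    return math.log(mean_length) - sigma ** 2 / 2


def fraction_at_least(threshold: float, mu: float, sigma: float) -> float:
    """Fraction of a log-normal(mu, sigma) distribution at or above threshold."""
    z = (math.log(threshold) - mu) / sigma
    # Upper-tail probability of a standard normal, written with erfc.
    return 0.5 * math.erfc(z / math.sqrt(2))


def simulate_read_lengths(n_reads: int, mean_length: float = 3000.0,
                          sigma: float = 0.9, seed: int = 0) -> list:
    """Draw read lengths from the assumed log-normal model (sigma is a guess)."""
    rng = random.Random(seed)
    mu = lognormal_mu(mean_length, sigma)
    return [rng.lognormvariate(mu, sigma) for _ in range(n_reads)]


if __name__ == "__main__":
    sigma = 0.9  # assumed shape parameter, for illustration only
    mu = lognormal_mu(3000.0, sigma)
    lengths = simulate_read_lengths(100_000, sigma=sigma)
    print("simulated mean length:", sum(lengths) / len(lengths))
    print("fraction of reads >= 20,000 bp:", fraction_at_least(20_000, mu, sigma))
```

Under these assumed parameters only a small fraction of reads reaches 20,000 bp, which matches the qualitative picture of a long tail. The "30 to 200 times" comparison is consistent with a short-read length of roughly 100 bp (3,000 / 100 = 30 and 20,000 / 100 = 200).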
Second, consider DNA methyltransferases. These can exist as solitary entities or as parts of restriction-modification systems. In both cases, they methylate relatively short sequence motifs that can easily be recognized from SMRT sequencing data because of the change in DNA polymerase kinetics, as it moves along the template molecule, that results from the presence of epigenetic modifications. The altered kinetics cause a change in the timing of when the fluorescent colors are observed, thus enabling direct detection of epigenetic modifications, which can ordinarily only be inferred, and bypassing the usual necessity of enrichment or chemical conversion. Often, thanks to bioinformatics, the gene responsible for any given modification can be matched to the sequence motif in which the modification lies [7, 8]. When it cannot, then simply cloning the gene into a plasmid, which is subsequently grown in a non-modifying host and re-sequenced, can provide the match. Moreover, SMRT sequencing has also been able to identify RNA base modifications through the same approach as DNA base modifications, but using a reverse transcriptase in place of the DNA polymerase. In fact, SMRT sequencing represents an important step toward uncovering the biology that happens between DNA and proteins, including not only the study of mRNA sequences but also the regulation of translation [11, 12]. Thus, functional information emerges directly from the SMRT sequencing approach.
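The kinetic principle described above can be sketched in a few lines of Python. This is a toy illustration, not the vendor's analysis pipeline. It assumes that the timing signal is summarized as an inter-pulse duration per template position, that an unmodified control (for example, amplified DNA) is available for comparison, and that a simple ratio threshold of 2.0 is adequate. The function names, the threshold and the numbers are all hypothetical; real analyses use statistical tests over many reads.

```python
from statistics import mean


def ipd_ratios(native_ipds, control_ipds):
    """Per-position ratio of mean inter-pulse duration, native vs control."""
    return [mean(n) / mean(c) for n, c in zip(native_ipds, control_ipds)]


def flag_modified(native_ipds, control_ipds, min_ratio=2.0):
    """Return 0-based positions whose ratio meets the (assumed) threshold."""
    ratios = ipd_ratios(native_ipds, control_ipds)
    return [i for i, r in enumerate(ratios) if r >= min_ratio]


# Toy data: three template positions, three observations each (seconds).
# The polymerase pauses at position 1 on native DNA only.
native = [[0.21, 0.19, 0.20], [0.95, 1.10, 1.05], [0.18, 0.22, 0.20]]
control = [[0.20, 0.20, 0.21], [0.20, 0.22, 0.19], [0.19, 0.21, 0.20]]

if __name__ == "__main__":
    print("ratios:", ipd_ratios(native, control))
    print("candidate modified positions:", flag_modified(native, control))
```

Positions flagged in this way would then be grouped by their surrounding sequence to recover the short motifs that each methyltransferase recognizes.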
Third, we must consider the persistent rumor that SMRT sequencing is much less accurate than other next-generation sequencing platforms, which has now been demonstrated to be untrue in several ways. First, a direct comparison of several approaches to determining genetic polymorphisms has shown that SMRT sequencing has comparable performance to other sequencing technologies. Second, the accuracy of assembling a complete genome using SMRT sequencing in combination with other technologies has proved to be as reliable and accurate as more traditional approaches [3, 6, 14]. Moreover, Chin et al. showed that an assembly using only long SMRT sequencing reads achieves comparable or even higher performance than other platforms (99.999% accuracy in three organisms with known reference sequences), including 11 corrections to the Sanger reference of these genomes. Koren et al. showed that most microbial genomes could be assembled into a single contig per chromosome with this approach; it is by far the least expensive option for doing so.
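For scale, the 99.999% figure can be translated into a Phred-style quality value and an expected number of residual errors per genome. The short Python sketch below does this arithmetic. The 5 Mb genome length is a hypothetical example of a typical bacterial genome, not a value from the studies cited, and the function names are illustrative.

```python
import math


def phred_quality(accuracy: float) -> float:
    """Convert a per-base accuracy into a Phred-scaled quality value."""
    # Q = -10 * log10(error rate), with error rate = 1 - accuracy.
    return -10 * math.log10(1 - accuracy)


def expected_errors(accuracy: float, genome_length: int) -> float:
    """Expected number of erroneous bases in a genome of the given length."""
    return (1 - accuracy) * genome_length


if __name__ == "__main__":
    accuracy = 0.99999         # 99.999%, as quoted in the text
    genome_length = 5_000_000  # hypothetical 5 Mb bacterial genome
    print("Phred quality:", round(phred_quality(accuracy), 3))
    print("expected errors:", round(expected_errors(accuracy, genome_length), 3))
```

An accuracy of 99.999% corresponds to one error in 100,000 bases, that is, a quality value of about 50 and on the order of 50 residual errors in a genome of the assumed size.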
Another approach that benefits from the stochastic nature of the SMRT error profile is the use of circular consensus reads, where a sequencing read produces multiple observations of the same base in order to generate high-accuracy consensus sequence from single molecules. This strategy trades read length for accuracy, which can be effective in some cases (targeted re-sequencing, small genomes) but is not necessary if one can achieve some redundancy in the sequencing data (8x is recommended). With this redundancy, it is preferable to benefit from the improved mapping of longer inserts than to opt for circular consensus reads, because the longer reads will be able to span more repeats and high accuracy will still be achieved from their consensus.
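Why redundancy helps when errors are random can be illustrated with a deliberately simplified model: treat each observation of a base as independently right or wrong, and call the consensus by majority vote. The Python sketch below computes the resulting consensus error. The 15% per-read error rate is an assumption chosen for illustration, not a figure from this article, and real consensus callers are more sophisticated (they weigh quality values and handle insertions and deletions), so these numbers should not be read as instrument performance.

```python
from math import comb


def consensus_error(per_read_error: float, coverage: int) -> float:
    """Probability that a majority vote over `coverage` independent
    observations is wrong, counting an exact tie as wrong half the time."""
    p = per_read_error
    total = 0.0
    for k in range(coverage + 1):  # k = number of erroneous observations
        prob = comb(coverage, k) * p ** k * (1 - p) ** (coverage - k)
        if 2 * k > coverage:
            total += prob          # erroneous observations form a majority
        elif 2 * k == coverage:
            total += 0.5 * prob    # tie, broken at random
    return total


if __name__ == "__main__":
    assumed_error = 0.15  # illustrative per-read error rate (assumption)
    for coverage in (1, 8, 15, 30):
        print(coverage, consensus_error(assumed_error, coverage))
```

In this model the consensus error falls steeply as coverage grows, whether the repeated observations come from multiple passes over one circular molecule or from independent long reads covering the same position. This would not hold for systematic errors, which recur at the same positions in every read.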
The considerations above make a strong case for combining the more traditional, sequence-dense data from other technologies with at least moderate coverage of SMRT data so that genomes can be improved, their methylation patterns obtained, and the functional activity of their methyltransferase genes deduced. We would especially urge all groups currently sequencing bacterial genomes to adopt this policy. That said, SMRT sequencing has also substantially improved eukaryotic genome assemblies, and we expect it to become more widely applied in this context over time, in light of the greater read lengths and throughput of the PacBio RS II instrument.
Perhaps it would even be worth redoing many genomes so that existing shotgun dataset-based assemblies could be closed and their complete methylomes obtained. The resultant assembled (epi)genomes would be inherently more valuable: the usefulness of a closed genome with associated functional annotation of its methyltransferase genes is far greater than the uncertainties left with a shotgun data set. Whereas we currently know much about the importance of epigenetic phenomena for higher eukaryotes, very little is known about the epigenetics of bacteria and the lower eukaryotes. SMRT sequencing opens a new window that may have a dramatic effect on our understanding of this biology.
MOC acknowledges Ryan Poplin for kindly sharing data on error rates. RJR acknowledges support from New England Biolabs and NIH (4R44GM105125). MCS acknowledges support from NIH (R01-HG006677).