High-efficiency RNA cloning enables accurate quantification of miRNA expression by deep sequencing

Small RNA cloning and sequencing is uniquely positioned as a genome-wide approach to quantify miRNAs with single-nucleotide resolution. However, significant biases introduced by RNA ligation in current protocols lead to inaccurate miRNA quantification by 1000-fold. Here we report an RNA cloning method that achieves over 95% efficiency for both 5′ and 3′ ligations. It achieves accurate quantification of synthetic miRNAs with less than two-fold deviation from the anticipated value and over a dynamic range of four orders of magnitude. Taken together, this high-efficiency RNA cloning method permits accurate genome-wide miRNA profiling from total RNAs.


Background
MicroRNAs (miRNAs) are a class of small (approximately 20 to 22 nucleotides), non-coding RNA species that are broadly expressed in almost all eukaryotes. They regulate diverse biological processes by targeting a large number of protein-coding mRNA transcripts [1]. The widespread regulation of mRNAs by miRNAs is supported by the evidence obtained from computational analysis [2,3], transcriptome analysis [4,5] and proteome profiling [6,7]. With recent developments in quantitative mRNA profiling, it is estimated that the median copy number of an mRNA is 17 molecules, and the dynamic range for mRNA copy number is approximately 10 4 in a single mammalian cell [8,9]. Because miRNAs regulate mRNA expression in a stoichiometric manner [10,11] and a conserved miRNA is predicted to recognize approximately 400 target sites [1,12], such broad variation in mRNA expression suggests miRNAs could differentially regulate highly or lowly expressed mRNAs. Indeed, it was recently demonstrated that miRNAs could function either as a switch or a fine-tuner, depending on the copy number of both miRNAs and their cognate mRNA targets [10]. Adding to the complexity of miRNA-mediated regulation, other RNA species, including competing endogenous mRNAs or circular RNAs, were recently shown to function as a molecular decoy by titrating miRNAs from their functional mRNA targets [13][14][15][16]. Collectively, these studies suggest that a highly expressed miRNA is capable of regulating many more mRNA targets and/or repressing mRNA targets more potently than a lowly expressed miRNA within a cell. Therefore, these findings highlight the importance of accurately determining the relative expression levels of different miRNAs in a given cellular context.
Three major platforms, quantitative real-time reverse transcription PCR (qRT-PCR), microarray and small RNA cloning followed by deep sequencing (miRNA-Seq), have been developed for genome-wide miRNA profiling. However, both qRT-PCR and microarray platforms are hybridization-based techniques suitable only for known miRNAs and are unable to provide single-nucleotide resolution for miRNA detection. Furthermore, due to the limitation of probe design for such a short sequence of miRNAs, absolute quantification of individual miRNAs requires extensive optimization and individual internal controls [17]. In contrast, with the rapid development of next-generation sequencing techniques, miRNA-Seq has the potential to provide single-nucleotide resolution of miRNA species, facilitate de novo miRNA discovery, and offer nearly unlimited dynamic range for miRNA quantification [18]. The enthusiasm towards miRNA-Seq, however, has been severely dampened by the identification of significant biases introduced during small RNA cDNA library preparation, often more than three orders of magnitude for individual miRNAs [19,20]. Although the cloning biases are likely caused by many factors, the RNA ligation has been identified as one of the major sources [20]. Consistent with this view, recent studies with modifications to the ligation reactions have shown some promise to reduce the cloning bias [21][22][23]. However, a highly effective method for genome-wide miRNA quantification by deep sequencing has yet to be successfully developed.
In this study, we investigate the highly biased miRNA quantification by current miRNA-Seq methods with 29 representative miRNAs. We demonstrate that inefficient and biased RNA ligation for both 3′ and 5′ adapter ligation is the major contributing factor for the inaccurate quantification. To develop an unbiased RNA cloning protocol, we have performed extensive tests with multiple customized adapters, RNA ligases as well as ligation conditions. As a result, we have established an RNA cloning strategy that allows nearly 100% ligation efficiency for both 3′ and 5′ ligation steps. When applied to a pool of either equimolar or differentially mixed synthetic miRNAs, our method achieves less than two-fold deviation from the expected results for each miRNA. We further demonstrate that the unbiased cloning is applicable to complex RNA mixtures using Caenorhabditis elegans total RNA with the synthetic mix spiked in. Finally, we apply our method to biological samples isolated from multiple mouse tissues as well as HeLa cells. Our results provide a quantitative view of miRNA expression for each of these samples. These data also reveal that most cistronic miRNAs that are derived from a single transcript are expressed at similar levels, and that miRNAs repress their target expression in a dosedependent manner. Taken together, we have established a highly efficient miRNA-Seq method that accurately quantifies individual miRNAs whose expression levels are across four orders of magnitude. Furthermore, the optimized RNA ligation can be readily applied to other RNA ligation-based RNA-Seq methods to provide quantitative insights for RNA transcripts.

Results and discussion
Characterization of systematic bias in current miRNA-Seq method Since the original small RNA cloning technique was developed for de novo miRNA discovery, it has played an instrumental role in uncovering the fascinating world of small RNAs that includes miRNAs [24][25][26], piwi-interacting RNAs (piRNAs) [27,28] and endogenous small-interfering RNAs (siRNAs) [29,30]. To characterize the molecular identity of small RNAs, 3′ and 5′ adapters are sequentially attached to these tiny molecules by RNA ligase ( Figure 1A). The ligation steps provide anchor sequences for subsequent reverse transcription and PCR reactions that amplify and convert the pool of small RNA molecules into a double-stranded DNA library ( Figure 1A). To investigate the systematic biases and optimize the quantitative performance of the existing method, we synthesized a small but representative pool of 29 miRNAs (Table S1 in Additional file 1). This pool included closely related miRNA family members that differ by one, two or multiple nucleotides at different positions (for example, miR-19, miR-26, let-7 and miR-200 families), miRNAs that are clustered in the same genomic locus but with unrelated sequences (for example, miR-143/145), as well as miRNA isoforms that are derived from the same hairpin but with a single-nucleotide difference ( Figure 1B). In addition, we also included in the pool miR-31, one of the least cloned miRNAs from a previous study [20]. When tested with the current protocol, this pool of 29 synthetic miRNAs has recapitulated the >1,000-fold deviation in cloning frequency as reported previously with a large collection of synthetic miRNAs [20] (see below for details). Finally, with the exception of let-7a, which is identical to let-7 in C. elegans, 28 out of these 29 miRNAs are also specific to vertebrates. This design allows the synthetic miRNAs to be mixed with total RNAs obtained from C. elegans and, in doing so, simulates complex RNA substrates represented in biological samples (see below for details).
We first prepared an equimolar mixture of these 29 synthetic miRNAs and performed miRNA-Seq with the existing method [18,20,31]. Consistent with previous reports [19,20], the cloning frequency of individual miRNAs varied by more than 1,000-fold ( Figure 1C). We noted that miR-143, the second highest cloned miRNA in our pool, shows a much higher cloning efficiency (249-fold higher) than miR-145, its clustered, cistronic partner. Remarkably, such bias favoring miR-143 has been observed in many miRNA-Seq experiments using total RNA, including in our results from mouse hair follicles ( Figure 1D; Figure S1A in Additional file 1). In many independent studies, miR-143 is consistently reported as one of the most highly cloned miRNAs; in sharp contrast to this, miR-145 often falls outside of the top 50 most cloned miRNAs [32,33] ( Figure 1D). However, miR-143 and miR-145 show very similar expression levels when measured by northern blotting [34] or qPCR-based absolute quantification ( Figure S1B in Additional file 1). Similar to the results for miR-143/145, when we examined other clustered, cistronic miRNAs from the datasets generated by the current method, we observed significant variation in cloning frequency ( Figure 1D). Thus, significant and widespread biases in the current miRNA-Seq method severely compromise the ability to quantify individual miRNAs and hinder the application of miRNA-Seq for identifying highly expressed miRNAs in a given cellular context.

RNA folding contributes to the systematic bias
We noted that miRNAs with >50% GC content were consistently among the poorest cloned ones in our synthetic pool ( Figure 1C, red bars; Table S2 in Additional file 1). These results suggest the possibility that stable hybridization between 3′ ligated RNA substrates and the 5′ adapter may inhibit ligation. We then took two approaches to reduce the RNA folding during 5′ ligation. First, we employed a thermostable RNA ligase and performed the 5′ ligation at 60°C instead of room temperature as required for T4 RNA ligase 1. We reasoned that with the elevated temperature we would significantly reduce intermolecular base-pairing. The second approach made use of a splint adapter that is the reverse complement of the full-length 5′ adapter plus a randomized four-nucleotide 5′ overhang at the 3′ end of the 5′ adapter ( Figure 2A). This preannealed splinted adapter should largely eliminate any intermolecular hybridization between the 5′ adapter and 3′ ligated RNA substrates. Consistent with our hypothesis, both modifications achieved considerable improvement in quantitative representation of individual miRNAs by miRNA-Seq. Compared to >1,000-fold deviation from the anticipated value (each miRNA's cloning frequency is expected to be 3.45% from the equimolar mix) by the current method ( Figure 1C), the deviation was reduced to approximately 14.7-fold by the thermostable ligase ( Figure 2B) and to approximately 4.5-fold by the splint adapter ( Figure 2C). When we repeated the experiments under identical conditions for each modification, we obtained reproducible results (r = 0.98 and 0.77, respectively; Figure 2D,E). However, the correlation coefficient between these two approaches is low (r = 0.32; Figure 2F). This suggests that different ligation reactions have variable efficiency for individual miRNAs.

High-efficiency RNA ligation
We then examined RNA ligation efficiency for both 3′ and 5′ ligation steps with different ligases and RNA substrates. Because let-7d was routinely found in our miR-Seq dataset at expected or greater than expected levels and miR-205 was often under-represented, we selected these two miRNAs for the initial test. We first 5′ end-labeled let-7d and miR-205 with 32 P and examined 3′ ligation efficiency with a previously published 3′ linker [24]. We observed low ligation efficiency for both substrates. Whereas let-7d showed modest levels of ligation, miR-205 was a particularly poor substrate in the reaction ( Figure 3A). This result prompted us to include polyethylene glycol in subsequent reactions and to design several additional 3′ adapters, as previous reports indicate these factors influence ligation preference [21,35]. We synthesized DNA oligonucleotides beginning with T 10 , A 10 , and NN randomized adapters followed by Illumina 3′ adapter sequences (see Table S3 in Additional file 1 for details) and converted them into pre-adenylated adapters with a recently established reaction ( Figure S2 in Additional file 1) [22]. Interestingly, miR-205 and let-7d showed strong preferences for 5′-A and 5′-T, respectively ( Figure 3B). Importantly, both miR-205 and let-7d ligated most efficiently to the 5′-NN randomized adapter ( Figure 3B). These observations suggest that different miRNAs exhibit different preferences for the 5′ nucleotide(s) of the 3′ adapter, and that the randomized adapter has the best potential to be ligated to diverse RNA substrates with high efficiency. Furthermore, we reasoned if ligation efficiency can be enhanced to near 100% for all miRNAs, the bias introduced by RNA ligase in the miRNA-Seq method could be largely eliminated. Previous studies of individual RNA ligation reactions suggest that a high concentration of polyethylene glycol (PEG)8000 improves ligation efficiency [35], and that RNA ligation efficiency is correlated with the concentration of RNA ligase [36], and that dimethylsulfoxide (DMSO) stimulates ligation to highly structured tRNAs [37]. We therefore tested all of these conditions and determined that with 10% PEG8000 and gradually increased 3′ adapter concentration we could approach 100% ligation efficiency even for miR-205, a particularly poor substrate ( Figure 3C). To test whether our optimized condition is suitable for a broad range of substrates, we labeled a mix of all 29 synthetic miRNAs and performed the 3′ ligation. As shown in Figure 3D, all RNA substrates were effectively ligated by a 2-hour ligation reaction.
Having established an effective ligation method for the 3′ ligation, we re-visited the 5′ ligation. Similar to the 3′ ligation, the 5′ ligation efficiency in the current method was also very low. We observed 9% efficiency collectively for the mixed synthetic miRNAs ( Figure 3E). In addition, the ligation efficiency for the thermostable ligase and the 5′ splint adapter was also very low at 7% and 2%, respectively ( Figure 3E). These results prompted us to optimize the 5′ ligation efficiency. We took a similar approach based on our success in the 3′ ligation. This included: 1) the use of 3′ end NN randomized adapter; 2) further increasing the PEG8000 concentration to 20% in the reaction buffer; and 3) elevating the reaction temperature to 37°C. As shown in Figure 3F, these modifications optimized the 5′ ligation to approximately 97% efficiency. Taken together, these results demonstrate that we have established a highly efficient method for attaching both 3′ and 5′ adapters to miRNA substrates. High-efficiency RNA ligation allows quantitative measurement of synthetic miRNAs by miRNA-Seq We next applied our optimized ligation method to miRNA-Seq. We first tested with the equimolar mix of 29 synthetic miRNAs. As shown in Figure 4A, our method drastically improved the quantitative performance of miRNA-Seq. In sharp contrast to the >1,000-fold deviation obtained by the current method, each miRNA was cloned within 1.8-fold deviation by our optimized miRNA-Seq. The reproducibility of our method was high (r > 0.9) even when we used different amounts of input RNAs ( Figure S3 in Additional file 1). We further tested the dynamic range of our optimized method by mixing four individual let-7 members at concentrations across 4 orders of magnitude together with 25 other synthetic miRNAs. We observed a cloning ratio of 8.0:1.0:0.1:0.03 for the four let-7 members, compared with the anticipated ratio of 10:1.0:0.1:0.01 ( Figure 4B). This further validated that our method is quantitative over a broad dynamic range (approximately 10 4 ). Finally, to examine the performance of our approach with biological samples, we made use of a more complex RNA mixture where C. elegans total RNA was spiked with 28 vertebratespecifi miRNAs. When we examined the cloning frequency of these 28 miRNAs, they were indistinguishable from the results obtained from synthetic miRNA-only reactions ( Figure 4C). This result indicates that our optimized miRNA-Seq is applicable to total RNA samples. Collectively, these results demonstrate that our optimized miRNA-Seq method has achieved a highly quantitative miRNA measurement with a broad dynamic range.

Quantitative miRNA-Seq provides insights into miRNA expression in biological samples
We then applied our quantitative miRNA-Seq to biological samples obtained from neonatal mouse (postnatal day 7.5) heart and brain in which many individual miRNAs have been functionally characterized. Consistent with previous individual studies, we identified miR-125-5p, miR-124 and miR-9/9* as the most abundantly expressed miRNAs in the brain [38][39][40][41][42] and miR-1, miR-133a, miR-451, miR-208 and miR-126 as the most abundantly expressed miRNAs in the heart ( Figure 5A,B) [43][44][45][46]. Interestingly, for both brain and heart, the total miRNA pool is dominated by a small number (approximately 10) of highly expressed miRNAs, although more than 700 miRNAs were detectable in both samples ( Figure 5A,B). To further examine the quantitative performance of our method in these biological samples, we analyzed the cloning frequency of several clustered, cistronic miRNAs in both samples. We obtained very similar cloning frequency for many of these clustered miRNAs across a broad range of cloning frequencies ( Figure 5C). For example, two highly expressed miRNAs in the heart but not in the brain, miR-143 and miR-145, showed very similar read numbers within each sample despite their expression levels differing by 50-fold amongst the two samples ( Figure 5C). As an example of lowly expressed miRNAs, miR-546 and miR-201 were both cloned 51 times in the heart and yet only one time in the brain ( Figure 5C). The ability to clone these clustered miRNAs very similarly in each biological sample indicates that our method quantitatively represents miRNAs likely across four orders of magnitude in biological samples, similar to the benchmark with synthetic miRNAs (compared with Figure 4B). We also noted that among all of the clustered miRNAs examined, miR-451 and miR-144 consistently showed approximately 10-fold differential expression in both heart and brain ( Figure 5C). This is likely explained by the fact that miR-451's biogenesis is uniquely Ago2-dependent and Dicer-independent [47][48][49], unlike all other miRNAs known to date, including miR-144. Thus, such differential expression captured by our method is likely reflective of the differential processing efficiency between different miRNA biogenesis pathways.
Quantitative miRNA-Seq also provides the opportunity to accurately quantify miRNA isoforms differing by only one nucleotide. Although most miRNA isoforms vary at the 3′ end, these sequence changes are thought to have little impact on targeting as the 5′ seed region is the primary determinant for target recognition [1]. However, miRNA isoforms that differ at the 5′ end would cause a shift in seed sequence, likely altering their target recognition. In a recent study, we identified in vivo targets of miR-203, a highly expressed miRNA in the skin. We found that down-regulated mRNAs upon miR-203 induction showed enrichment for not only the 2-8 seed matches in their 3′ UTRs, but at positions 3-9 as well [50]. Although the 2-8 seed match is expected, the 3-9 seed match on the mRNA indicates a shifted seed region. We then examined the cloning results for miR-203 in the skin. Interestingly, we found a major miR-203 isoform that was trimmed by one nucleotide from the 5′ end ( Figure 6A). Importantly, this isoform was cloned at approximately 50% of the frequency of the canonical miR-203 (6.7% compared to 13.3% total cloning frequency). The apparent 3-9 seed match we observed previously was likely the result of targeting of the shortened isoform at position 2-8. To validate the expression of these two miR-203 isoforms, we performed 5′ end primer extension analysis that allows specific detection and quantification of each isoform. As expected, the canonical form was indeed expressed at approximately two-fold higher than the shortened isoform ( Figure 6B). As a control, we did not detect any signal in a Dicer knockout sample where all miRNAs, including miR-203, are depleted [31]. To further confirm the identity of these two isoforms, we performed northern blotting on RNA isolated from mouse total epidermis and again observed two differentially expressed miR-203 isoforms that differ by a single nucleotide ( Figure 6C). Taken together, these results further illustrate the application of our quantitative miRNA-Seq in the identification of functionally important miRNA species with singlenucleotide resolution.

Differentially expressed miRNAs repress their targets in a dose-dependent manner
Having demonstrated the quantitative capacity of our optimized miRNA-Seq, we asked whether we can leverage the information of relative expression level of individual miRNAs to determine their functional potency in a cellular context. Because miRNAs function in a dose-dependent manner [10], we reasoned that a highly expressed miRNA would exert a stronger degree of repression on its targets than lowly expressed ones. To test this hypothesis, we performed miRNA-Seq with HeLa cell total RNAs and determined miR-21 as the most highly expressed miRNA in these cells ( Figure 7A). In addition, independent experiments with these biological samples were also highly reproducible ( Figure 7B). To determine whether highly expressed miRNAs are indeed more potent in repressing their targets, we generated a series of heterologous luciferase reporters for three miRNAs expressed at different levels in HeLa cells ( Figure 7A). Each reporter contained 2× targeting sites for each miRNA with a central bulge located between 10 to 12 nucleotides ( Figure 7C). As expected, miR-21, the most highly expressed miRNA, showed strongest repression; miR-130a, a modestly expressed miRNA, showed intermediate repression; whereas miR-502, a lowly expressed miRNA, showed weak repression of the luciferase activity ( Figure 7D). This result supports the functional significance of highly expressed miRNAs in a specific cellular context and that our quantitative miRNA-Seq allows rapid identification of these miRNAs in a high-throughput manner.

Conclusions
In this study, we have developed a highly efficient miRNA-Seq method to quantify individual miRNAs within a biological sample ( Figure 7E). Although the RNA-ligaseintroduced bias has been widely recognized [19][20][21][22], a systematic solution generally applicable for a broad spectrum of small RNA substrates has not previously been developed. Such a bias, often leading to more than 1,000fold deviation, has significantly hindered the utilization of miRNA-Seq for quantitative miRNA profiling in a highthroughput manner. Although in some studies the sequencing depth has approached billions of reads [33], the power of accurate quantification of miRNAs based on the cloning frequency remains poor [20]. Because intermolecular RNA hybridization contributes to bias, the application of a thermostable RNA ligase or a 5′ splint adapter significantly reduces such bias. However, the ligation efficiency is low and the improvement by each strategy appears to vary for different RNA substrates. By examining individual steps during the small RNA cDNA library preparation, we have defined optimal conditions to achieve high ligation efficiency for both 3′ and 5′ ligation steps. We determined that the nucleotide composition of the linkers, the concentration of linkers, molecular crowding through the use of PEG8000, and ligation temperature are the primary determinants for highly efficient RNA ligation. We demonstrated with a wide spectrum of small RNA substrates that the ligation efficiency can approach 100%, a drastic improvement compared with the existing method. As a result, the high-efficiency RNA cloning leads to quantitative measurement of miRNAs by miRNA-Seq. To achieve such quantitative results for a pool of approximately 700 miRNAs for a single sample, we usually need approximately 4 to 5 million reads. With the data output routinely over 200 million reads from a single lane of a sequencer, in combination with the multiplexing capacity of the Illumina Hi-Seq platform, our miRNA-Seq method provides an extremely cost-effective and accurate approach for large scale miRNA analyses from numerous samples. The ability to measure relative expression levels of different miRNAs within a single sample also offers an important approach to identify functionally significant miRNAs. Conventional gene expression analysis by qRT-PCR and microarray usually characterizes differential expression levels of individual miRNAs and identifies miRNAs with the largest fold changes from two or more samples. With the detection sensitivity of qRT-PCR approaching single molecules, some very lowly expressed miRNAs could be identified as highly differentially expressed, though their biological importance may be limited due to low copy number. Since miRNAs function by guiding the RNA-induced silencing complex to their cognate mRNAs and sequestering the mRNAs from productive translation [51], the ratio between miRNAs and mRNA targets is a key factor in determining the potency of a miRNA [10]. In support of this, we demonstrate that the most highly expressed miRNA in HeLa cells, miR-21, has the highest capacity to repress the reporter expression, and the intermediately and lowly expressed ones (for example, miR-130a and miR-502) show much weaker repression. Thus, functionally important miRNAs could be identified as the most highly expressed miRNAs within a cellular context without the knowledge of differential expression patterns. Interestingly, in our analysis of miRNAs in both brain and heart, we observed that a small number (approximately 10) of miRNAs count for a major portion of the miRNA pool. This raises a possibility that the function of the miRNA pathway could be mediated by a handful of highly expressed miRNAs in each cell population.
Finally, our approach to enhance RNA ligation efficiency offers a general guide to improve all RNA ligation-based RNA-Seq methods. The enhanced ligation not only reduces the required amount of RNA input but also, more importantly, improves the representation of different RNA species/fragments in the sequencing libraries. These improvements will allow quantitative analysis of gene expression by fully harnessing the power of deep sequencing.

Materials and methods
Linker adenylation DNA oligonucleotides to serve as 3′ linkers for miRNA cloning were ordered from Integrated DNA Technologies Coralville, Iowa, USA(see Table S3 in Additional file 1 for sequences). All oligonucleotides were ordered with a 5′ phosphate group; those used in sequencing experiments also included a 3′ dideoxycytosine. Prior to ligation, 3′ linkers were adenylated at the 5′ end using the MTH adenylation kit (New England Biolabs #E2610L Ipswich, Massachusetts, USA). Oligonucleotides were phenol/chloroform extracted, precipitated with 2.5 volumes of ethanol, and quantified by A 260 . Adenlyation was confirmed by 18% PAGE (Figure S2 in Additional file 1).

5′ ligation reactions
Synthetic 5′ 32 P labeled RNA was subject to 3′ ligation and resolved on 15% PAGE-urea gels as described previously. The ligated product was excised and eluted and served as substrate for 5′ ligation tests. Initial 5′ ligation reactions were composed of the following: 50 mM Tris-HCl pH 7.5, 10 mM MgCl 2 , 1 mM dithiothreitol, 1 mM ATP, 20% (v/v) DMSO, 10 units T4 RNA Ligase 1 (New England Biolabs M0204S), 20 units RNase Inhibitor (Invitrogen N2611), 5 μM 5′ RNA cloning linker, and 25 femtomoles of eluted substrate RNA. Reactions were carried out at room temperature for 2 hours. Subsequent high efficiency ligations were identical with the following modifications: no DMSO, 20% (w/v) PEG8000, 5′ RNA cloning linker increased to 10 μM, and temperature increased to 37°C. Reactions were stopped, resolved, and visualized as described above. For the thermostable ligation, a thermostable RNA ligase (Epicentre TRL8101K Madison, Wisconsin, USA) was used following the manufacture's instruction.
Synthetic RNA library preparation RNA oligoribonucleotides were produced by Thermo Fisher Scientific (Lafayette, Colorado, USA)(for complete list and sequences see Table S1 in Additional file 1). Oligos were resuspended in distilled H 2 O and mixed at an equal volume to generate an equimolar mixture of the 29 synthetic RNAs.
Small RNA cloning RNA samples (5 to 1.25 picomoles for synthetic samples and 1 to 4 μg of total RNA for biological samples) were subject to high efficiency 3′ ligation as described above. Products were resolved on 15% PAGE-urea gels and stained with SybrGold (Molecular Probes S-11494 Carlsbad, California, USA). The region corresponding to ligated miRNAs (46 nucleotides) was excised, gel slices were thoroughly minced, and eluted in HSCB (400 mM NaCl, 25 mM Tris-HCl pH 7.5, 0.1% SDS) overnight at 4°C. Nucleic acids were precipitated in the presence of glycogen carrier via 2.5 volumes ethanol and 0.1 volumes sodium acetate. Pellets were washed in 70% ethanol, and resuspended in the 5′ ligation mix without enzyme. Samples were briefly heated to 70°C, snap chilled on ice, and enzyme added. Following ligation reactions cDNA was prepared using Superscript III RT (Invitrogen 18080-085) according to the manufacturer′s recommendation using 3′ linker-specific RT primer. Subsequent cDNA libraries served as templates for PCR amplification; amplicons were resolved on 8% native acrylamide gels. Bands of the correct molecular weight were isolated and used for high-throughput sequencing on the Illumina HiSeq2000.

Read mapping
A customized miRNA reference database was built with mature miRNA sequences obtained from miRBase [52]. Sequencing reads were mapped to the customized database with BLAST [53].

Primer extension
A 17-bp DNA primer reverse complementary with nucleotides 6 to 22 of the canonical sequences of miR-203 was used in the primer extension experiment to detect the 5′ length variants of miR-203. Each isoform was quantified using Image J software where the signal from the two bands was summed and each presented as a proportion of the total signal. The assay was performed following the manufacture's instructions (Promega #E3030 Madison, Wisconsin, USA).

Northern blotting
Total RNA (10 μg) was electrophoresed through 20% denaturing urea-PAGE gels and electrophoretically transferred to GeneScreen Plus® Nylon Membranes (Perkin-Elmer, Inc. Waltham, Massachusetts, USA). Radiolabeled oligonucleotides complementary to mature miR-203 sequence (miRbase accession MIMAT0000236) were hybridized to the membranes in ExpressHyb Hybridization Solution™ (Clontech Laboratories, Inc. Mountain View, California, USA). Synthetic miR-203 and miR-203iso were loaded as size standards. Bands were quantified as described in the primer extension experiment.

Luciferase assay
HeLa cell cultures were propagated according to ATCC guidelines. Cells were plated in 24-well trays at a density of 0.1 × 10 6 cells per well. Sixteen hours post-plating cells were transfected with LT1 (Mirrus Bio Madison, Wisconsin, USA) according to the manufacturer's instructions with 20 ng of the following constructs: pGL3 control, pGL3-miR502, pGL3-miR130a, and pGL3-miR21 along with 2 ng of Renilla (pRL-CMV), and 378 ng of 'filler' DNA (pLL3.7). Twenty-four hours post-transfection cells were assayed for luciferase activity using the Promega Dual Luciferase Assay Kit according to the manufacturer's instructions. Data are the ratio of firefly to renilla luciferase activity, normalized to the control construct. Shown is a representative result from an experiment done in triplicate; error bars represent standard deviations.

Data access
All miRNA-Seq data have been deposited to the National Center for Biotechnology Information Gene Expression Omnibus with accession number GSE47858.

Additional file
Additional file 1: Tables S1 to S3 and figures S1 to S3. Table S1: synthetic miRNA library. Complete list of 29 synthetic miRNAs and the primary sequences used in miRNA-Seq studies. Table S2: GC content of synthetic miRNA library. List of 29 synthetic miRNAs used in this study, their corresponding GC content listed as a percentage of total nucleotide number, and their observed cloning frequency listed as a percentage of total mapped miRNA reads. Table S3: oligonucleotides used in this study. List of oligonucleotides used in the ligation optimization and miRNA-Seq experiments. Figure S1: clustered miRNAs 143/145 exhibit differential quantification in a method-specific manner. (A) miRNA-Seq from mouse hair follicle. (B) qRT-PCR quantification of mouse epidermis determined by ΔΔCt method where sno25 serves as the reference gene. Figure S2: DNA oligos can be readily adenylated by MTH (Methanobacterium thermoautotrophicum) RNA ligase. 18% urea-PAGE of 3′ DNA linkers shows shift in electrophoretic mobility corresponding to 5′ adenylation where plus and minus signs indicate the presence or lack of MTH enzyme, respectively. Figure S3: enhanced miRNA-Seq approach is highly correlative for varying amounts of input RNA. Differing amounts of the 29 synthetic miRNA mix were subjected to enhanced miRNA-Seq.

Competing interests
EMA is an employee of Thermo Fisher Scientific, Inc.
Authors' contributions RY, JEL and ZZ designed the experiments. ZZ performed initial characterizations of the miRNA-Seq biases and initial optimizations, prepared libraries from biological samples, and performed the primer extension assay and all bioinformatics analyses. JEL optimized 5′ and 3′ ligation reactions, prepared the optimized synthetic and HeLa libraries and performed the luciferase assays. KR performed the northern blotting. EMA provided the synthetic miRNAs. RY, JEL and ZZ wrote the manuscript. All authors read and approved the final manuscript.