New gene expression pipelines gush lncRNAs
© BioMed Central Ltd 2013
Published: 24 May 2013
Genome-wide techniques provide robust and comprehensive identification of lncRNAs in adult mouse neural stem cells and their derivatives, illuminating the functions of these underappreciated transcripts.
Mammalian genomes have unexpectedly few (20,000 or so) protein-encoding genes. Our view of the mammalian genome has been further revolutionized by the realization that the number of non-protein-coding genes and their products, non-coding RNAs (ncRNAs), has long been underestimated. The finding that more than 70% of the mammalian transcriptome consists of ncRNAs has prompted a search for ncRNA functions. In particular, long ncRNAs (lncRNAs) are one category of ncRNAs that has been the target of recent intensive research. lncRNAs are defined as transcripts of at least 200 nucleotides that possess little theoretical protein-coding potential. These intriguing RNAs resemble mRNAs in many ways: they are transcribed by RNA polymerase II and capped, they can undergo splicing and polyadenylation, and a small fraction of those exported to the cytoplasm can associate with ribosomes, although with uncertain consequences. However, unlike mRNAs, most lncRNAs are retained primarily within nuclei. Moreover, lncRNA sequences have changed more rapidly over the course of evolution than have those of mRNAs.
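The working definition above can be written as a simple filter. The following is a minimal sketch, not the procedure used in any of the studies discussed here: the `Transcript` record, the `coding_score` field and the 0.5 cutoff are hypothetical stand-ins for whatever coding-potential metric an analysis might use, and the example records are invented. Only the 200-nucleotide length threshold comes from the definition in the text.

```python
from dataclasses import dataclass

MIN_LNCRNA_LENGTH = 200  # nucleotides; taken from the definition in the text
MAX_CODING_SCORE = 0.5   # hypothetical cutoff, for illustration only


@dataclass(frozen=True)
class Transcript:
    name: str
    length: int          # transcript length in nucleotides
    coding_score: float  # hypothetical coding-potential score in [0, 1]


def is_candidate_lncrna(t: Transcript) -> bool:
    """Return True if the transcript is long enough and scores low for coding potential."""
    return t.length >= MIN_LNCRNA_LENGTH and t.coding_score < MAX_CODING_SCORE


# Toy records, invented for illustration (not real transcripts).
transcripts = [
    Transcript("tx_a", 1500, 0.05),  # long, low coding score
    Transcript("tx_b", 150, 0.02),   # too short
    Transcript("tx_c", 2200, 0.95),  # long, but likely protein-coding
]
candidates = [t.name for t in transcripts if is_candidate_lncrna(t)]
```

In practice the coding-potential criterion is the harder part of the definition, which is why it is left here as an abstract score rather than tied to a particular tool.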
Since lncRNAs generally do not encode proteins, many researchers had dismissed them as transcriptional noise. However, a growing body of evidence suggests that lncRNAs can regulate mammalian gene expression at multiple levels, and that they are responsible for a number of key cellular and developmental processes (see Rinn and Chang and the references therein). Rapid advances in high-throughput techniques, especially RNA-seq, have enabled extensive efforts to identify lncRNAs and to generate lncRNA databases for various species [3, 6]. Nevertheless, since the expression of lncRNAs appears to be more cell type-specific than the expression of mRNAs, and as most lncRNA databases are derived from a mixture of cell types, reliable lncRNA information for individual cell types is currently lacking.
To fully appreciate the functions of lncRNAs, one key task that remained to be undertaken was to construct accurately annotated cell type-specific lncRNA expression maps for a dynamic, developmental process in vivo. Towards this goal, a recent study by Alexander Ramos and colleagues employed complementary high-throughput methods to identify more than 12,000 lncRNAs expressed during mouse brain development. The authors examined the expression patterns of lncRNAs during subventricular zone (SVZ) neurogenesis in adult mice. They subsequently established an online resource to predict regulatory roles of lncRNAs in (1) the SVZ, which contains neural stem cells (NSCs) that can migrate to the olfactory bulb (OB), (2) the OB, where NSCs terminally differentiate into interneurons, and (3) the dentate gyrus (DG), which harbors a complete neuronal lineage.
To investigate the relationship between lncRNAs and adult mouse-brain development, an issue of emerging interest, Ramos et al. sequenced cDNA libraries from microdissected SVZ, OB and DG. After including RNA-seq data from mouse embryonic stem cells (ESCs) and ESC-derived neural progenitor cells (ESC-NPCs) to increase coverage of potential lncRNAs, the authors used ab initio transcriptome reconstruction to identify 8,992 lncRNAs that derived from 5,731 genomic loci. To incorporate lncRNAs that might have been missed by short-read Illumina-based sequencing, the authors employed long-read RNA CaptureSeq to sequence SVZ cDNAs hybridized to probe libraries that tiled across 100 Mbp of putative lncRNA loci. The additional >3,500 lncRNAs brought the number of lncRNAs identified in neuronal lineages in vivo to an unprecedented >12,000, which is two- to three-fold more than previously known (see Mitchell Guttman et al., for example). The surprising increase in the number of lncRNAs was explained by the focus of previous studies on only one or a combination of a few closely related cell or tissue types. This focus would inherently fail to capture certain sets of lncRNAs, given the finding that lncRNAs exhibit greater spatiotemporal expression specificity than mRNAs (see below). Furthermore, previous studies were limited by the use of relatively insensitive techniques. For instance, custom microarrays do not cover the entire transcriptome, and Illumina-based RNA-seq rarely picks up lower-abundance transcripts, many of which are lncRNAs.
Considering that lncRNA expression is highly specific to cell type and strictly regulated during development, those lncRNAs identified by Ramos et al. are anticipated to be only part of the mouse lncRNA repertoire. It is also likely that the number of lncRNAs in other organisms has been underestimated, since no other comparably thorough analysis spanning both the genome and development has been performed. It follows that existing underestimates of lncRNA numbers are accompanied by an under-appreciation of lncRNA functions, some of which have been conserved through evolution from zebrafish to humans.
The finding that lncRNAs exhibit greater spatiotemporal expression specificity than do mRNAs - a finding that derived in part from published RNA-seq data from different regions of the mouse brain and during different stages of mouse brain development - indicated that lncRNAs have specific spatiotemporal roles. Thus, determining lncRNA expression patterns is as important as identifying lncRNAs. To map the expression patterns of lncRNAs in distinct cell types in vivo, Ramos et al. used specific markers and fluorescence-activated cell sorting (FACS) to sort SVZ-derived cells that represent the three main neurogenic cell types - namely, activated NSCs, transit-amplifying cells and migratory neuroblasts - and they did likewise for niche astrocytes. The authors then interrogated the cDNAs generated from these cells using a microarray of probes corresponding to the lncRNAs that they had previously identified. They found a unique lncRNA expression pattern for each of the three stages of neurogenesis analyzed, and each could be distinguished from the expression pattern in niche astrocytes. Thus, the differential expression of lncRNAs at different stages of the same lineage likely contributes to the specification of these stages.
In addition, Ramos and co-workers found that lncRNAs are transcriptionally regulated in a manner analogous to mRNAs. Using ChIP-seq, they showed that, as for mRNAs dynamically regulated in neurogenesis, the transcription start sites (TSSs) of many of the identified lncRNAs had both an activating and a repressive histone mark (H3K4me3 and H3K27me3, respectively) in NSCs. With this bivalent mark, their promoters are held inactive but poised for either activation or repression upon differentiation via loss of one of the two histone modifications. The presence of these marks was consistent with the expression patterns of the particular lncRNAs as determined using microarrays. These findings enable the prediction of lncRNAs that may function in NSC maintenance and/or differentiation. The authors have incorporated their annotations of putative lncRNAs that derive from RNA-seq and RNA CaptureSeq, along with lncRNA expression patterns determined using microarray analyses, into an online database. This lncRNA identification and expression analysis pipeline constitutes an important resource for future analyses of lncRNA function in the mouse brain and during SVZ neurogenesis.
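The bivalent-promoter logic described above can be illustrated with a small classification function. This is a sketch under stated assumptions rather than the authors' analysis: it assumes that the presence or absence of each mark at a TSS has already been decided (for example, by peak calling, which is not shown), and the function name and state labels are hypothetical.

```python
def promoter_state(h3k4me3: bool, h3k27me3: bool) -> str:
    """Classify a promoter from the presence of two histone marks at its TSS.

    h3k4me3  -- activating mark present (assumed to be pre-computed)
    h3k27me3 -- repressive mark present (assumed to be pre-computed)
    """
    if h3k4me3 and h3k27me3:
        return "bivalent"   # inactive but poised
    if h3k4me3:
        return "active"     # only the activating mark remains
    if h3k27me3:
        return "repressed"  # only the repressive mark remains
    return "unmarked"


# A bivalent promoter in NSCs can resolve in either direction on
# differentiation, depending on which mark is lost.
in_nsc = promoter_state(h3k4me3=True, h3k27me3=True)
after_losing_h3k27me3 = promoter_state(h3k4me3=True, h3k27me3=False)
after_losing_h3k4me3 = promoter_state(h3k4me3=False, h3k27me3=True)
```

Comparing such per-stage states with measured expression is the kind of consistency check the text describes between the ChIP-seq marks and the microarray data.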
Ramos et al. used their newly established pipeline to predict lncRNAs that may function in SVZ neurogenesis. One of these lncRNAs, Six3os, was expressed specifically in NSCs but not in subsequently differentiated cells. Downregulating Six3os using a short hairpin RNA (shRNA) reduced two-fold the number of NSCs that, after differentiation, stained positive for the neuron-specific class III beta-tubulin (TUJ1), indicating an shRNA-mediated defect in neurogenesis.
While only a few identified lncRNAs were functionally validated, the results demonstrate the utility of the authors' workflow for predicting with high confidence lncRNAs that function in the neurogenic process. Notably, when the authors generated different transcript modules, consisting of lncRNAs and known protein-coding transcripts whose variation in expression typifies a brain region or brain developmental stage, they found that some modules are closely related to human neurodegenerative diseases such as Huntington's disease and Alzheimer's disease. This suggests that lncRNAs classified into such modules are potentially associated with these diseases. For example, 88 lncRNAs were found in a module that correlates with a gene expression set that is misregulated in mouse models of Huntington's disease, implying potential roles for these lncRNAs in this neurodegenerative condition. Beyond settling the debate over whether these lncRNAs are functionally relevant, it now becomes imperative to understand lncRNA function in order to understand how they contribute to neurodegenerative disorders.
ChIP-seq: chromatin immunoprecipitation followed by high-throughput DNA sequencing
ESC: embryonic stem cell
ESC-NPC: ESC-derived neural progenitor cell
FACS: fluorescence-activated cell sorting
H3K4me3/H3K27me3: histone H3 containing a trimethylated lysine at position 4 or 27
lncRNA: long non-coding RNA
NSC: neural stem cell
RNA CaptureSeq: a combination of cDNA capture on tiling arrays that enriches for particular transcripts or genomic regions followed by 454 sequencing-to-saturation to mine the depth of a particular part of the transcriptome
RNA-seq: high-throughput whole-transcriptome shotgun sequencing of cellular cDNAs
TSS: transcription start site
TUJ1: neuron-specific class III beta-tubulin.
We thank Max Popp for comments on the manuscript. Work on lncRNAs in the Maquat laboratory is supported by R01 grant GM074593 from the NIH to LEM.