Mutations within lncRNAs are effectively selected against in fruitfly but not in human
© Haerty and Ponting; licensee BioMed Central Ltd. 2013
Received: 25 February 2013
Accepted: 27 May 2013
Published: 27 May 2013
Previous studies in Drosophila and mammals have revealed levels of long non-coding RNA (lncRNA) sequence conservation that are intermediate between those of neutrally evolving and protein-coding sequence. These analyses compared conservation between species that diverged up to 75 million years ago. However, analysis of sequence polymorphisms within a species' population can provide an understanding of essentially contemporaneous selective constraints that are acting on lncRNAs and can quantify the deleterious effect of mutations occurring within these loci.
We took advantage of polymorphisms derived from the genome sequences of 163 Drosophila melanogaster strains and 174 human individuals to calculate the distribution of fitness effects of single nucleotide polymorphisms (SNPs) occurring within intergenic lncRNAs and compared this to distributions for SNPs present within putatively neutral or protein-coding sequences. Our observations show that in D. melanogaster there is a significant excess of rare variants within intergenic lncRNAs relative to neutrally evolving sequences, whereas selection on human intergenic lncRNAs appears to be effectively neutral. Approximately 30% of mutations within these fruitfly lncRNAs are estimated as being weakly deleterious.
These contrasting results can be attributed to the large difference in effective population sizes between the two species. Our results suggest that while the sequences of lncRNAs will be well conserved across insect species, such loci in mammals will accumulate greater proportions of deleterious changes through genetic drift.
Although protein-coding sequence occupies a little over 1% of the human genome, approximately 10-fold more non-coding sequence is predicted to have been under purifying selection. For smaller genomes, larger proportions (for example, 50% of all Drosophila sequence) have been predicted to have been under selective constraints. These estimates are founded on the assumption that sequence conservation is caused not by low rates of mutation, but instead by the high rates at which deleterious alleles are purged from the population by natural selection, an assumption that is well supported.
A considerable fraction of conserved non-coding sequences in human and fruitfly genomes are transcribed [8, 9]. Non-coding transcripts can be classified into small RNAs (<200 nt, such as microRNA) and long RNAs (>200 nt, lncRNA). Many lncRNAs are spliced and/or polyadenylated, and they tend to contain fewer exons than protein-coding genes and to be expressed in a tissue- and/or developmental stage-specific manner.
A handful of lncRNAs have been functionally characterised as being involved in dosage compensation in either human (Xist) or Drosophila (roX1, roX2), or having roles in imprinting or chromatin modification (AIRN; HOTAIR), in alternative splicing regulation or in cell differentiation (MALAT1, Tug1). More broadly, many lncRNAs appear to be involved in gene expression regulation in either cis or trans, through the local modification of chromatin and/or direct interaction with protein complexes, DNA or RNA sequences [11, 12, 21]. Recently lncRNAs have also been associated with the maintenance of embryonic stem cell pluripotency [24, 25]. Furthermore, there is limited evidence to link some lncRNAs, such as ANRIL or HOTAIR, to human pathologies [26, 27]. However, the functional contribution to biology from the vast majority of lncRNAs remains unknown.
If a lncRNA has retained functionality over a long evolutionary time-period then mutations that abolish or diminish the function would be deleterious and would preferentially be purged from the species lineage. This would be reflected in a greater level of sequence conservation between species. Indeed, lncRNAs have been found to be significantly better conserved between species than are putatively neutrally evolving sequences, such as ancestral repeats in mammals or small introns in Drosophila. Furthermore, mammalian lncRNAs are enriched in conserved sequences identified either by elevated conservation scores (for example, phastCons) or by applying a neutral model based on sequence insertions and deletions [28, 30]. Additionally, increased conservation of the dinucleotide splice sites and a suppressed transversion rate have also been reported for mammals. However, in each organism analysed thus far, lncRNA sequences have been shown to diverge far more rapidly than have protein-coding sequences [13, 28]. These observations indicate an intermediate state in selective constraints between protein-coding sequences and neutrally evolving sequences. The rapid divergence of lncRNA sequences between species complicates the identification of orthologous sequences for many of the lncRNA loci. Therefore, instead of nucleotide conservation, the conservation of orientation and position relative to an orthologous protein-coding gene can be used to define positionally equivalent lncRNAs between species [13, 32].
To date, most evolutionary analyses on lncRNAs have been conducted at the interspecies level using species that diverged approximately 75 million (human-mouse) or 5 million years (Drosophila melanogaster-D. simulans) ago. Although there is mounting evidence for purifying selection acting on lncRNAs, we note that previous analyses have used only a single reference genome per species. Previous studies reported an increased conservation level relative to a neutral reference [13, 28], but they have not directly determined the strength of selection acting on these non-coding sequences, nor do they provide an understanding of the fitness effects of mutations occurring within these transcripts, in terms of the product of the effective population size (Ne) and selection coefficient (s).
It is important to compare interspecific indicators of constraint to intraspecific estimates of fitness effects since recent findings have demonstrated rapid evolution of lncRNAs that are specific to individual lineages. A comparison between species can inform on past events but rarely does it have the power to identify contemporaneous or lineage-specific selective constraints. Even when employing comparisons among multiple species it is challenging to ascertain, within a specific lineage, the nature and the strength of the selective pressures acting on rapidly evolving loci.
For instance, the HOTAIR locus has evolved rapidly since the last common ancestor of mouse and human, and differences in the consequences of knockout in these species' cell lines have been interpreted as indicating the evolution of lineage-specific biological functions. Additionally, it was recently demonstrated that expression of a large number of lncRNA loci has altered rapidly among murid lineages. Consequently, a low level of sequence conservation between two species could reflect, at one extreme, a historically low level of sequence constraint in both lineages, or, at the other extreme, it could reflect sequence that is constrained in only a portion of a single species lineage. Deciding among this range of possibilities relies on determining constraint within extant populations, for example by identifying whether derived low frequency alleles are enriched, relative to neutral sequence, within human or Drosophila lncRNA sequence. A recent study indicated that this was, indeed, the case for human lncRNAs identified by the ENCODE consortium.
In such studies we need to consider that most human variants are recent [7, 37], and that there is a negative correlation between the age of a variant and its deleterious effect. Consequently, the bulk of deleterious mutations within a species are unlikely to be detected when comparing distantly related species, because they rarely reach fixation. Inter-species comparisons therefore focus on substitution events that are at most weakly deleterious. Once again this underscores the importance of analysing, at the population level, nucleotide variation occurring within lncRNA loci if we are to better understand the relationships linking their evolution and function. A potentially important confounding issue that needs to be considered in such analyses is that of background selection as well as selective sweeps, where selection at one site reduces genetic diversity, but not divergence, at linked sites. To account for this effect, variation at tested sites needs to be compared against variation in physically linked putatively neutral sites.
For this study, we have taken advantage of recent high-throughput sequencing projects in D. melanogaster and humans, and the annotation of intergenic lncRNAs in both species [13, 41]. The availability of these large population datasets permits polymorphism and divergence distributions to be investigated in both species across both coding and non-coding gene models. If the function of a lncRNA locus is mediated through the act of transcription rather than through the RNA transcript itself [42, 43] then we expect no difference in nucleotide conservation between exons and introns. In contrast, if the spliced transcript primarily has an RNA sequence-dependent function then its exonic sequence is expected to be well conserved relative to its introns, as has been observed for protein-coding genes.
Our results reveal hitherto unappreciated distinctions in constraint between lncRNA exons and introns which are abundantly evident for Drosophila but are far less so for humans. In Drosophila, striking differences in conservation between exons and introns suggest that the spliced transcript is often important in mediating the biological functions of lncRNA loci. Our analysis of site frequency spectra indicates that purifying selection has been effective on D. melanogaster lncRNA sequence but, importantly, not on human lncRNAs. Selection on mutations within human lncRNAs appears to be effectively neutral as a consequence of our species' unusually low effective population size.
Conservation of intergenic lncRNA exons in Drosophila
Our previous evolutionary rate analyses of Drosophila or mammalian [28, 30, 45] intergenic lncRNAs considered the degree of constraint associated with transcribed lncRNA sequence under the assumption that small introns and preserved transposable element sequences ('ancestral repeats') evolve neutrally [3, 46–48].
For Drosophila lncRNAs, we observed a strong contrast in median phastCons scores between their exons and their introns (Figure 1). While protein-coding exons exhibit the greatest degree of conservation, as expected, lncRNA exons are associated with intermediate conservation levels, greater than those for protein-coding or lncRNA introns or indeed randomly sampled intergenic sequence (P < 0.001, Figure 1A). Strong purifying selection in exonic, but not intronic, sequence implies that the molecular functions of these multi-exonic fruitfly lncRNAs are predominantly RNA-sequence specific rather than requiring only the process of transcription, for example during chromatin remodelling [11, 42, 43].
Performing the identical analysis on a set of human lncRNAs revealed their median phastCons scores to be low not just for introns but also for exons (Figure 1B). Conservation is significantly greater for lncRNA exons than for introns (P < 0.05), except for the 3′-most exon, whose conservation is not significantly different from that of introns (P > 0.05 in all comparisons, Additional File 1). Moreover, sequence conservation in human lncRNA exons or introns is little different from conservation of intergenic sequence. We found similar results when using different human lncRNA sets as well as a set of positionally equivalent lncRNAs between human and mouse (Additional File 2).
Interestingly, when, instead of median values, mean phastCons scores for human lncRNA exons are considered, these are marginally higher than intronic scores (Additional File 1). We conclude from these observations that there is substantial heterogeneity in conservation among human lncRNA loci, yet sequence for the majority of such loci shows little or no conservation.
We noted that D. melanogaster lncRNAs exhibit no elevation of phastCons scores at their 5′ or 3′ splice sites using either the median or mean conservation scores (Figure 1A, Additional File 1). To investigate this further we compared the conservation of splice site dinucleotides ('GT' and 'AG') across five species with that of randomly selected 'GT' and 'AG' dinucleotides, yet found no significant difference in their levels of conservation (Additional File 3). One conceivable explanation is that across the approximately 300 million years of evolution represented in the Diptera and Coleoptera phastCons scores, splice site dinucleotides have been conserved less than over the approximately 450 million years represented in the vertebrate phastCons scores.
Lowered polymorphism levels within intergenic lncRNA exons relative to introns
The conservation analysis that we present above illustrates qualitatively the relative conservation between exons or introns, and differences in constraint between fruitfly and mammalian lncRNA sequences. This analysis is based on aligned sequences from highly divergent species and therefore provides us with evidence on past selection but unfortunately not on more contemporary evolutionary processes. To address this, we looked to DNA polymorphism data from both D. melanogaster and human populations.
Number of polarised polymorphic sites among 162 D.
Average (standard deviation) polymorphism estimates for lncRNA loci and their flanking protein coding genes (within 5 kb) in D.
4.8 × 10^-3 (4.4 × 10^-3)
5.39 × 10^-3 (3.8 × 10^-3)
4.94 × 10^-3 (3.2 × 10^-3)
5.88 × 10^-3 (3.2 × 10^-3)
1.01 × 10^-2 (1.12 × 10^-2)
8.96 × 10^-3 (8.16 × 10^-3)
Average (standard deviation) polymorphism estimates for lncRNA and their flanking protein coding genes in human.
1.05 × 10^-3 (1 × 10^-3); 1.03 × 10^-3 (0.07 × 10^-4); 1.51 × 10^-3 (1.74 × 10^-3)
1.06 × 10^-3 (8.85 × 10^-4); 1.16 × 10^-3 (6.91 × 10^-4); 1.59 × 10^-3 (1.58 × 10^-3)
1.09 × 10^-3 (1.07 × 10^-3); 1.19 × 10^-3 (8.26 × 10^-4); 1.64 × 10^-3 (1.79 × 10^-3)
PE lncRNA exons: 9.73 × 10^-4 (7.87 × 10^-4); 1.13 × 10^-3 (6.61 × 10^-4); 1.46 × 10^-3 (1.41 × 10^-3)
PE lncRNA introns: 1.04 × 10^-3 (7.62 × 10^-4); 1.08 × 10^-3 (4.67 × 10^-4); 1.42 × 10^-3 (9.2 × 10^-4)
Controls lncRNA exons: 1.04 × 10^-3 (8.62 × 10^-4); 1.15 × 10^-3 (6.57 × 10^-4); 1.46 × 10^-3 (1.54 × 10^-3)
Controls lncRNA introns: 9.84 × 10^-4 (6.48 × 10^-4); 1.08 × 10^-3 (5.09 × 10^-4); 1.47 × 10^-3 (1.33 × 10^-3)
1.51 × 10^-3 (1.81 × 10^-3); 1.68 × 10^-3 (1.14 × 10^-3); 2.34 × 10^-3 (3.48 × 10^-3)
Evidence for strong purifying selection on intergenic lncRNAs in Drosophila
Next, to test for the strength of selection within exons or introns from fruitfly or human lncRNA loci, we compared the nucleotide variation within lncRNA and protein-coding exons and introns to that of putatively neutral sequences. We used two per-site estimators of nucleotide diversity, one based on the average number of pairwise nucleotide differences (πT) and one based on the number of segregating sites (θW) [49, 50], together with Tajima's D, which tests for departures from neutrality. We also assessed the nucleotide divergence between D. melanogaster and D. simulans, and between human and macaque, using the Jukes-Cantor corrected divergence (k).
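The summary statistics named above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the input format (one allele count per segregating site) and the toy data are assumptions made for illustration, and the variance constants follow the standard published definition of Tajima's D.

```python
import math

def diversity_stats(allele_counts, n, length):
    """Per-site pi, per-site Watterson's theta and Tajima's D.

    allele_counts: for each segregating site, the count (1..n-1) of one allele
    n: number of sampled chromosomes; length: number of sites surveyed
    (hypothetical input format, chosen for this illustration).
    """
    s = len(allele_counts)
    # Average number of pairwise differences, summed over sites
    pi = sum(2.0 * i * (n - i) / (n * (n - 1)) for i in allele_counts)
    a1 = sum(1.0 / k for k in range(1, n))
    a2 = sum(1.0 / k ** 2 for k in range(1, n))
    theta_w = s / a1
    # Constants for the variance of (pi - theta_w) under neutrality
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    var = e1 * s + e2 * s * (s - 1)
    tajima_d = (pi - theta_w) / math.sqrt(var) if var > 0 else float("nan")
    return pi / length, theta_w / length, tajima_d

def jukes_cantor(p):
    """Jukes-Cantor corrected divergence from the observed proportion p of differing sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Toy data: an excess of singletons gives a negative D,
# an excess of intermediate-frequency variants gives a positive D.
rare = diversity_stats([1] * 20, n=10, length=1000)
common = diversity_stats([5] * 20, n=10, length=1000)
```

In this sketch a negative D reflects an excess of rare variants relative to the neutral expectation, which is the signal that the comparisons against small introns or ancestral repeats are designed to detect.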
As expected, we inferred stronger selective constraints on the protein-coding exons and introns of fruitfly genes than on small introns, our neutral evolution proxy, from their lower Tajima's D and divergence (k) values (Kruskal-Wallis test, P < 0.05 in all comparisons, Table). Likewise, D. melanogaster lncRNA exons and introns were associated with lower Tajima's D and k values relative to small introns (P < 0.001 in both comparisons). Greater selective constraint on Drosophila lncRNA exonic sequence was observed: values for lncRNA exons were significantly lower than for lncRNA introns (P ≤ 0.01 in both comparisons). Although we found no difference in πT, θW or Tajima's D values between lncRNA and protein-coding upstream sequences (P > 0.05 in all comparisons), we found lncRNA upstream sequences to be less diverged than those of protein-coding sequences (P < 0.001). This observation of lower interspecific divergence is likely to be the consequence of lncRNA gene models being incomplete, which in turn is a consequence of their low expression levels.
As in fruitfly, human protein-coding exons are under stronger selective constraints than lncRNA exons, lncRNA introns or protein-coding introns, as indicated by lower πT, Tajima's D and k values (P < 0.001, Table). In contrast to Drosophila, we found no significant difference in Tajima's D values computed for human lncRNA exons, introns and their flanking ancestral repeats. Additionally, intergenic lncRNAs that are positional equivalents between human and mouse do not show a significant reduction of polymorphism or Tajima's D value relative to a control set of intergenic lncRNAs (P > 0.05, Table, Additional Files 5 and 6).
Excess of low frequency variants in Drosophila intergenic lncRNAs relative to neutral sequences
An equivalent analysis on the set of human lncRNAs, using data from the 1000 Genomes Project, revealed no enrichment of rare variants within human lncRNA exons relative to candidate neutrally evolving sequences such as four-fold degenerate sites, introns or ancestral repeats (P > 0.05, Figure 3). This result is important in allowing us to extend from our previous observation of a low degree of conservation between species, to effectively neutral or weak negative selection occurring since the emergence of modern humans. We similarly found that the derived allele frequency (DAF) of SNPs within positionally conserved lncRNAs does not depart significantly from the distribution observed for neighbouring ancestral repeats. While we observe a departure in the human lncRNA SNP DAF with respect to that for ancestral repeats sampled genome-wide, this is likely attributable to the effects of background selection: negative selection acting on the genomically proximal protein-coding genes.
Deleterious effect of mutations within intergenic lncRNAs in fruitfly but not in human
In our final analysis we estimated the distribution of fitness effects of new mutations within D. melanogaster or human lncRNA exons from their respective site frequency spectra. Because the DAF spectra can be influenced by past variation in effective population size, we employed the method of Keightley and Eyre-Walker, which estimates the distribution of fitness effects of new mutations and demographic parameters from the folded frequency spectrum.
Our estimates of the distribution of fitness effects of newly arising mutations within non-degenerate sites are in agreement with previous analyses conducted in human. Boyko et al. as well as Keightley and Eyre-Walker identified between 22% and 34% of newly arising mutations within the African population as being effectively neutral (our estimate: 26.69%).
Previous between-species comparisons predict lncRNAs to have evolved under a regime of purifying selection that is considerably weaker than for protein-coding sequences [13, 28]. Because of their design, virtually all of these experiments consider evolutionarily ancient selective events. However, by taking advantage of available sequenced genomes of individuals from within the same species, we can now: (1) infer the evolution of these sequences at a considerably shorter time scale; (2) quantify more precisely the strength of recent or contemporaneous selection acting on lncRNAs; and (3) assess the distribution of fitness effects of new deleterious mutations occurring within these sequences. From the reported importance of a limited subset of lncRNAs in gene regulation [23, 25, 26], it might have been expected that human lncRNAs would exhibit a weak signature of purifying selection at the population level.
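As a minimal illustration of what a folded frequency spectrum is (not of the Keightley and Eyre-Walker estimation procedure itself), the Python sketch below folds per-site allele counts into minor allele counts, so that the spectrum no longer depends on knowing which allele is ancestral. The function name and the input format are assumptions made for this example.

```python
from collections import Counter

def folded_sfs(allele_counts, n):
    """Fold per-site allele counts into a minor-allele-count spectrum.

    allele_counts: count of one allele at each site among n sampled chromosomes
    Returns a list whose entry j-1 is the number of sites with minor allele
    count j, for j = 1 .. n // 2. Monomorphic sites (count 0 or n) are skipped.
    """
    minor = Counter(min(i, n - i) for i in allele_counts if 0 < i < n)
    return [minor.get(j, 0) for j in range(1, n // 2 + 1)]

# Toy example with 10 chromosomes: counts 1 and 9 both fold to a minor count of 1.
example = folded_sfs([1, 9, 5, 2, 8, 0, 10], n=10)
```

Folding trades information for robustness: sites with a rare derived allele and sites with a rare ancestral allele fall into the same class, so errors in inferring the ancestral state cannot distort the spectrum.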
D. melanogaster intergenic lncRNA evolution
Our results show that D. melanogaster intergenic lncRNAs are subject to moderately strong selective constraints. SNPs occurring within fruitfly lncRNAs are characterised by an excess of rare variants relative to neutral sequences (either small introns or randomly sampled sites within flanking intergenic sequences), leading to a negative estimate of Tajima's D, and an L-shaped site frequency spectrum. We reached the same conclusion when considering the minor allele frequency or the derived allele frequency or when taking account of mutational biases (AT→GC, GC→AT). Although this effect could be explained by a recent population expansion, we reached identical conclusions when using an algorithm that estimates population parameters before testing for the distribution of fitness effects of newly arising mutations [54, 57].
Our findings of fruitfly lncRNA constraint at the population level are confirmed at the interspecific level by comparing nucleotide conservation between lncRNA exons and introns, an extension to our previous findings. LncRNA exons were shown to exhibit an intermediate level of conservation between protein-coding exons and intergenic sequences, while conservation of lncRNA introns does not differ significantly from that of intergenic sequence.
These differences in conservation between Drosophila lncRNA exons and introns, as well as the observation of a greater proportion of low frequency variants within lncRNA exons relative to lncRNA introns, argue strongly for spliced transcripts being important for the function of many fruitfly lncRNAs and not RNA sequence-independent biological function as found for some lncRNA loci such as HSI and Airn [42, 43].
In contrast to results for human lncRNAs (which confirm our previous observations [28, 31]) we found no significantly increased conservation for splice sites in Drosophila lncRNAs relative to randomly selected 'GT' and 'AG' dinucleotides within intergenic and intronic sequences. This lack of increased splice site conservation, despite an increased nucleotide conservation of the lncRNA exons, may indicate a rapid divergence of splicing elements within these long non-coding RNAs. This observation could, however, also result from the mis-annotation of splice sites as a consequence of typically low sequence coverage for lncRNA models in RNA-Seq experiments.
Human intergenic lncRNA evolution
In contrast to evidence in flies, we found no evidence from human population data for widespread purifying selection acting on lncRNA sequence, and only a weak signal of elevated sequence conservation between vertebrate species. Few human lncRNAs were as highly conserved as those from Drosophila (Additional File 9).
As evidence for lncRNA sequence conservation across species is scarce, potentially orthologous transcripts transcribed with the same orientation and syntenic position relative to an orthologous protein-coding locus have been identified among human, mouse and zebrafish. If such positionally equivalent lncRNAs are orthologous and retain ancestral function then purifying selection acting on these loci might be expected to be stronger than for the remaining lncRNAs. However, these positionally equivalent lncRNAs' sequence conservation across vertebrates, as well as their site frequency spectra, were found not to differ from those of a control set of human lncRNAs. Once again this highlights the weak selective constraints that have acted both recently and more historically on vertebrate lncRNAs. Accordingly, zebrafish lncRNAs with positional equivalents in human or mouse were found not to exhibit sequence conservation between these species.
The lack of evidence for strong or widespread purifying selection, or the weak selective effect of mutations, within non-coding sequences in human has been reported previously, although not specifically for transcribed non-coding sequence. Torgerson et al. compared polymorphisms in human within conserved intergenic sequences (>5 kb upstream and downstream of annotated transcripts) with synonymous site polymorphisms and found no evidence for selection on intergenic conserved sequences. Likewise, Kryukov et al. and Chen et al. found that, despite purifying selection acting on the most conserved non-coding elements in human, mutations within them have only weak effects on fitness.
Why might fly intergenic lncRNA evolution differ from human intergenic lncRNA evolution?
We estimated that an average of 35.82% of new mutations within D. melanogaster intergenic lncRNAs are effectively negatively selected. However, selection on all mutations within human intergenic lncRNAs, even those with a positional equivalent in mouse, was predicted to be effectively neutral.
Some of the observed differences in conservation and selection acting on lncRNAs between D. melanogaster and humans could be due to different origins of the two datasets. Our set of human lncRNAs was derived from adult tissues whereas the fruitfly lncRNAs were identified from a developmental time course gene-expression analysis [9, 13] and could therefore be subject to stronger selective constraints. Previous studies showed increased purifying selection on protein-coding genes expressed early during development relative to genes expressed during the adult stage.
A second explanation for the observed differences between D. melanogaster and human lncRNAs in conservation and allele frequency distribution relates to differences in the effective population sizes of the two species. The influence of effective population size on the probability of fixation of a deleterious mutation is well documented. According to the nearly neutral theory of molecular evolution, the probability of fixation of such a mutation is a function of 4Neμs (μ: mutation rate, s: selection coefficient), and thus a weakly deleterious mutation will be effectively neutral if the absolute value of the product of its selection coefficient (s) and the effective population size (Ne) is near to or below one [63–65]. There is a considerable difference in the estimated effective population sizes of D. melanogaster and H. sapiens: 1,450,000 versus 1,200-15,000, respectively [66–68]. This results in a wide range of low selection coefficients s for which deleterious mutations have widely varying fixation probabilities between the two species. A deleterious mutation with a small selection coefficient in human is likely to evolve essentially neutrally, while a mutation with the same selection coefficient in Drosophila will tend to be subject to stronger purifying selection. More formally, any mutation with |s| > 1/Ne(human) will be under the scrutiny of selection in either species, while any mutation with 1/Ne(human) > |s| > 1/Ne(Drosophila) will be under a selectively near neutral regime in human but will be under more effective negative selection in D. melanogaster. According to the effective population size estimates cited above, the minimum value of s for selection to act on deleterious variants ranges from approximately 7 × 10^-5 in human to three orders of magnitude lower, 7 × 10^-8 in D. melanogaster. This difference in effective population size between human and Drosophila is a likely explanation of the striking differences in the DAF distributions of variants within lncRNAs in D. melanogaster and human.
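The dependence of fixation on the product of Ne and s can be made concrete with a short Python sketch. It uses Kimura's diffusion approximation for the fixation probability of a new mutation in a diploid population, under the simplifying assumptions (ours, for illustration) that selection is additive and that the census and effective population sizes are equal. The function name and the chosen value of s are hypothetical, and the human Ne is taken at the upper end of the range quoted above.

```python
import math

def fixation_probability(s, ne):
    """Kimura's approximation for a new mutation at initial frequency 1/(2*ne).

    s: selection coefficient (negative for deleterious mutations)
    ne: effective population size (assumed equal to the census size here)
    """
    if s == 0:
        return 1.0 / (2.0 * ne)          # neutral expectation
    exponent = -4.0 * ne * s
    if exponent > 700:                   # avoid float overflow; probability is ~0
        return 0.0
    # (1 - exp(-2s)) / (1 - exp(-4*ne*s)), written with expm1 for accuracy
    return math.expm1(-2.0 * s) / math.expm1(exponent)

NE_HUMAN = 15_000         # upper bound of the range quoted in the text
NE_FLY = 1_450_000        # estimate quoted in the text for D. melanogaster
s = -1e-5                 # hypothetical weakly deleterious mutation

# Fixation probability relative to a neutral mutation in the same population
rel_human = fixation_probability(s, NE_HUMAN) * 2 * NE_HUMAN
rel_fly = fixation_probability(s, NE_FLY) * 2 * NE_FLY

# Approximate boundary |s| = 1/Ne below which drift dominates selection
threshold_human = 1.0 / NE_HUMAN
threshold_fly = 1.0 / NE_FLY
```

With these inputs the same mutation fixes at a rate close to the neutral expectation in the human-sized population but almost never in the fruitfly-sized one, which is the contrast the argument relies on. The exact thresholds follow directly from whichever Ne estimates are supplied.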
A third explanation might be that the repertoires of fruitfly or human lncRNA molecular mechanisms are very different, leading to differences in the signatures of selection in their lncRNA sequences. If this is indeed the case then we speculate that fruitfly lncRNA mechanisms will be more critical to its biology than are lncRNA mechanisms to human biology.
From these results testable predictions can be made regarding the evolution and conservation of lncRNA sequences. Deleterious mutations with a particular value of s within lncRNAs in species with large effective population size, such as insects [59, 69], are more likely to be purged, leading to greater sequence conservation. In contrast, within species with low effective population size, such as human, weakly to mildly deleterious mutations are more likely to be fixed, leading to a greater turnover of non-coding transcribed sequences. This effect explains the difference in the distribution of fitness effects of deleterious mutations at genes annotated with disease/lethal phenotypes in human and fruitflies.
Comparison with Ward and Kellis 
Our conclusion that negative selection is highly inefficient within human lncRNA variants appears to be at odds with evidence from Ward and Kellis that their variants exhibit a lower mean DAF than genomic samples . This apparent discrepancy could not be explained by the different lncRNA sets being considered. This was because results from our reanalysis of the Ward and Kellis lncRNA set from ENCODE were equivalent to those we report above. It could also not be explained by Ward and Kellis'  consideration only of SNPs of Yoruba origin, since when we re-ran our approach using only Yoruba SNPs, no substantive differences were found (Additional Files 10 and 11). Instead, we believe the discrepancy likely arises from the differences in the choice of proxy for neutral sequence. In our analysis, we account for the otherwise potentially confounding factors of background selection and mutational variation by considering sites either within ancestral repeats that flank lncRNA loci, or within flanking intergenic sequence that has been masked for conserved sequence. By contrast, the approach of Ward and Kellis  samples sites from concatenated unannotated intergenic sequences drawn from across all autosomes, and thus does not account for background selection or mutational rate variation.
Although interspecies sequence conservation over long evolutionary time is rightly considered as an indicator of functionality, the lack of conservation within lncRNAs does not necessarily imply their lack of functionality . Sequences encoding heart enhancers have been found to be as poorly conserved as randomly sampled sequence . The accumulation of weakly to mildly deleterious mutations within poorly conserved sequence, such as human lncRNA loci, raises the question of how a population can carry an ever increasing burden of deleterious variants within loci that regulate gene expression? Previous hypotheses proposed that such sequences interact with only a limited number of factors or that only a very restricted proportion of sequence is required to convey biological function . Others suggest that compensatory mutations within the locus maintain secondary structure  or similarly within the sequence of its interacting partner maintain molecular function. Such compensatory mechanisms  and network redundancy have been proposed to explain the rapid sequence evolution of lncRNAs and the absence of mutant phenotypes for some lncRNA knockout models. Finally, the accumulation of slightly deleterious mutations could also be explained by synergistic epistasis, when interactions between mutations produce a greater effect than expected from the sum of their independent effects. This hypothesis was first proposed to explain the mutational load paradox in species with low effective population sizes  but may also help to explain the accumulation of potentially deleterious mutations at synonymous sites  and within conserved non-coding sequences .
The inefficiency, or low degree, of selection acting on mutations within human lncRNAs suggests that for the great majority of these loci extensive phenotyping will be necessary to identify the potential deleterious effects of their disruption. Accordingly, several recent studies have reported that despite phenotypes being observed in cell-based assays for several lncRNA loci (HOTAIR, Malat1, Neat1), no overt phenotype (for example, litter size, body weight or viability) was found in the knockout mice under normal laboratory conditions ([34, 75–78]).
However, an absence of overt phenotype in laboratory conditions does not necessarily imply that the knockout has no deleterious effect. Although the knockout mice did not differ from wild-type individuals, further analyses found evidence for phenotypes in Evf2 and Bc1 [80, 81] mutants. Analyses in yeast and in worm have revealed that, although the vast majority of knockout mutants show no apparent phenotype, fitness effects measured as population growth under a wide range of conditions are apparent for up to 97% of Saccharomyces cerevisiae genes and for between 42% and 60% of genes assayed in Caenorhabditis elegans. Finally, because lncRNAs are most often expressed at low levels and in a developmental stage-specific and/or tissue-specific manner, phenotypes associated with their disruption are all the more difficult to identify.
Genetic drift appears to be the main driving force in the evolution of intergenic lncRNAs, at least in humans, as a consequence of our small effective population size. Weakly to mildly deleterious mutations are therefore likely to have accumulated rapidly within intergenic lncRNAs. The consequences of such an accumulation for lncRNA function and for human biology have yet to be experimentally assessed. Our observations highlight the pressing need to extend the study of these loci to in vivo systems combined with extensive phenotyping. Our results support a less prominent biological role for many of these non-coding loci than has been proposed previously [83, 84].
Materials and Methods
In all analyses described below, calculated P values were corrected for multiple testing using a Bonferroni correction.
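As an illustration of the correction named above, the following minimal sketch multiplies each P value by the number of tests and caps the result at 1. The function name `bonferroni` and the example P values are hypothetical and are not taken from the study.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted P values: each p is multiplied by the
    number of tests m and capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical P values from three tests (illustration only).
adjusted = bonferroni([0.001, 0.02, 0.4])
print(adjusted)
```

A test is then declared significant when its adjusted P value falls below the chosen threshold, which is equivalent to comparing the raw P value with the threshold divided by the number of tests.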
Our analysis in D. melanogaster was conducted on the set of 1,115 long non-coding intergenic RNAs defined by Young et al. using polyA+-selected transcriptome data from the modENCODE Project, having excluded four loci owing to their overlap with recently predicted small open reading frames. For comparison, we also analysed a set of 4,662 human lncRNAs identified by Cabili et al. from polyA+-selected libraries using conservative criteria, namely one isoform reconstructed in at least two tissues or by two assemblers.
Because mono-exonic lncRNA models are not stranded, we limited our analysis to multi-exonic loci. Furthermore, in order to avoid the confounding effects arising from selection acting on protein-coding genes, we focused our analysis on intergenic lncRNA loci, rather than on intronic or antisense lncRNAs or on lncRNAs that overlap untranslated regions of protein-coding genes.
We used the mouse lncRNAs annotated by Ensembl and by Belgard et al. to identify positionally equivalent lncRNAs between mouse and human. Using protein-coding genes with 1-to-1 orthologous relationships between human and mouse that flank a lncRNA locus in both species, we defined as positional equivalents those lncRNAs that were found in the same transcriptional orientation and the same location relative to a protein-coding gene in both species. Furthermore, in order to take into account potential selection acting on the nearby protein-coding gene, we also identified a control set composed of lncRNAs flanking protein-coding genes with 1-to-1 orthologs but with different transcriptional orientations and/or positions relative to the protein-coding gene. We identified 374 positionally equivalent loci between human and mouse, and 802 control lncRNAs.
We collected 2,993 genes described as being involved in syndromes and genetic diseases from the OMIM database [88, 89]. Using the FlyBase database, we collated 2,125 genes with lethal mutant phenotypes.
Polymorphism data for 162 D. melanogaster strains from Raleigh, North Carolina were downloaded from the Drosophila Genetic Reference Panel [39, 92, 93]. Sites covered by at least 10 reads and without base ambiguity in at least 150 strains were retained for further analysis. A total of 3,172,754 sites across the five major chromosomal elements were used for analysis. For the human dataset, we discarded SNPs within 10 bp of indel calls and chose a quality score threshold to give a 0.1% FDR. The allele frequencies for polymorphic sites were retrieved from the 1000 Genomes Project data. We collected 18,745,840 SNPs in 174 individuals of African origin (a highly polymorphic population) called by the 1000 Genomes Project Consortium [40, 94].
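The Drosophila site filter described above (at least 10 reads and no base ambiguity in at least 150 strains) can be sketched as follows. The per-strain `(base, depth)` tuple layout and the function name `site_passes` are assumptions made for illustration; the actual file formats and pipeline are not described in the text.

```python
UNAMBIGUOUS = {"A", "C", "G", "T"}


def site_passes(calls, min_depth=10, min_strains=150):
    """Decide whether a genomic site is retained.

    calls: one (base, depth) tuple per strain (hypothetical layout).
    A strain counts as usable when its base call is unambiguous and
    it is covered by at least min_depth reads.
    """
    usable = sum(
        1 for base, depth in calls
        if base in UNAMBIGUOUS and depth >= min_depth
    )
    return usable >= min_strains


# Illustration: 150 well-covered strains plus 12 ambiguous calls.
example = [("A", 12)] * 150 + [("N", 30)] * 12
print(site_passes(example))
```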
For both datasets, we polarised the alleles into ancestral or derived states using the pairwise alignments of D. melanogaster with D. simulans and D. yakuba, and of H. sapiens with the chimpanzee (Pan troglodytes) and macaque (Macaca mulatta), which are available from the UCSC genome database website. We used maximum parsimony to infer the ancestral state of each site, and ambiguous sites were removed from the final dataset. Using genome annotations, we collated sites found within exons and introns of protein-coding genes, lncRNA loci, intergenic sequences or ancestral repeats (transposable elements shared between human, mouse and rat) (Table ).
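A minimal sketch of parsimony-based polarisation with two outgroups is given below. It assumes a strict rule in which a site is kept only when both outgroup bases agree and match one of the two segregating alleles; the text states only that maximum parsimony was used and that ambiguous sites were removed, so this rule and the function names are illustrative assumptions.

```python
BASES = {"A", "C", "G", "T"}


def polarise(allele_a, allele_b, outgroup1, outgroup2):
    """Return (ancestral, derived) for a biallelic site, or None if ambiguous.

    Assumed rule: both outgroups must carry the same base, and that base
    must be one of the two alleles segregating in the ingroup.
    """
    alleles = {allele_a, allele_b}
    if len(alleles) != 2 or not alleles <= BASES:
        return None
    if outgroup1 != outgroup2 or outgroup1 not in alleles:
        return None
    ancestral = outgroup1
    derived = (alleles - {ancestral}).pop()
    return ancestral, derived


def derived_allele_frequency(genotypes, derived):
    """Frequency of the derived allele among unambiguous base calls."""
    called = [g for g in genotypes if g in BASES]
    if not called:
        return None
    return called.count(derived) / len(called)


print(polarise("A", "G", "A", "A"))
print(derived_allele_frequency(["A", "A", "G", "G", "N"], "G"))
```

Sites for which `polarise` returns `None` would simply be dropped, which mirrors the removal of ambiguous sites described above.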
Evolutionary rates and sequence conservation
PhastCons scores computed using the alignments of 11 Drosophila species, Anopheles gambiae, Tribolium castaneum and Apis mellifera (whose divergence spans approximately 300 My) were downloaded from the UCSC database. We computed the median phastCons score for each of 10 successive windows, each representing a 10% portion of lncRNA exon or intron sequence; exons and introns were further subdivided into 'first', 'middle', 'last' or 'unique' classes with respect to their genomic position. We also collected 1,000 nt of 5′ and 3′ flanking intergenic sequence for both lncRNA and protein-coding loci.
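The windowing step can be sketched as follows. Each feature (exon or intron) is split into 10 windows of roughly 10% of its length, scores are pooled per window across features, and the median is taken. Pooling across features before taking the median is an assumption, as are the function names and the toy scores.

```python
from statistics import median


def split_into_windows(scores, n_windows=10):
    """Split a list of per-site scores into n_windows consecutive chunks."""
    n = len(scores)
    return [
        scores[i * n // n_windows:(i + 1) * n // n_windows]
        for i in range(n_windows)
    ]


def pooled_window_medians(features, n_windows=10):
    """Median score per relative window, pooled over all features.

    Features shorter than n_windows sites are skipped because they
    cannot contribute a site to every window.
    """
    pooled = [[] for _ in range(n_windows)]
    for scores in features:
        if len(scores) < n_windows:
            continue
        for w, chunk in enumerate(split_into_windows(scores, n_windows)):
            pooled[w].extend(chunk)
    return [median(p) if p else None for p in pooled]


# Toy per-site scores for two hypothetical exons.
exons = [[i / 10 for i in range(10)], [1.0] * 20]
print(pooled_window_medians(exons))
```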
We computed, for each window, 95% confidence intervals using 10,000 bootstraps. As a control, we randomly selected intergenic sequences lying away (>1 kb) from any annotated gene whose size distribution matched that of the lncRNA exons or introns. One thousand such sets of control sequences were defined to permit confidence intervals to be calculated. For comparison this analysis was also performed on the set of protein-coding genes that flank lncRNA loci.
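A percentile bootstrap is one common way to obtain such intervals and is sketched below for the median of a single window. The text does not state which bootstrap variant was used, so the percentile method, the fixed seed and the function name `bootstrap_ci` are assumptions.

```python
import random
from statistics import median


def bootstrap_ci(values, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the median.

    Resamples `values` with replacement n_boot times and returns the
    alpha/2 and 1 - alpha/2 percentiles of the resampled medians.
    """
    rng = random.Random(seed)
    n = len(values)
    stats = sorted(median(rng.choices(values, k=n)) for _ in range(n_boot))
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy scores for one window (illustration only).
scores = [0.01, 0.02, 0.02, 0.05, 0.10, 0.30, 0.45, 0.60, 0.80, 0.95]
print(bootstrap_ci(scores))
```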
This procedure was repeated for human lncRNA loci and their neighbouring protein-coding genes using phastCons scores computed from the alignments of 46 vertebrate genomes (approximately 400 My), downloaded from the UCSC database.
In order to assess the difference in nucleotide conservation between lncRNA exons and introns, we implemented a resampling analysis in which we randomly sampled a single site per feature (exon or intron) within a locus. In total, 1,000 resampling analyses were performed.
We estimated the conservation of the splice sites of both protein-coding and lncRNA loci in flies using the sequence alignments of 50 nucleotides upstream and downstream of the D. melanogaster splice sites with D. simulans, D. sechellia, D. yakuba and D. erecta. For 5′ and 3′ splice sites and the 20 adjacent intronic sites of protein coding genes and lncRNA loci we computed the information content using the Shannon-Weaver index.
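The per-position calculation can be sketched as below. The Shannon entropy of an alignment column is H = -Σ p log2 p over the observed bases; expressing information content as 2 - H bits for DNA is a common convention and is an assumption here, as are the function names and the toy alignment.

```python
from collections import Counter
from math import log2


def shannon_entropy(column):
    """Shannon entropy (bits) of one alignment column; gaps and
    ambiguous characters are ignored."""
    bases = [b for b in column.upper() if b in "ACGT"]
    if not bases:
        return None
    n = len(bases)
    counts = Counter(bases)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def information_content(column):
    """Information content as 2 - H bits (assumed DNA convention)."""
    h = shannon_entropy(column)
    return None if h is None else 2.0 - h


# Toy alignment of five sequences around a hypothetical 5' splice site.
alignment = ["CAGGTAAGT", "CAGGTAAGA", "CTGGTGAGT", "CAGGTAAGC", "AAGGTAAGG"]
columns = ["".join(col) for col in zip(*alignment)]
print([round(information_content(col), 2) for col in columns])
```

Under this convention an invariant column such as the 'GT' dinucleotide scores 2 bits, and a column with all four bases at equal frequency scores 0 bits.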
As a control, we randomly selected 'GT' and 'AG' dinucleotides within intergenic sequences flanking the lncRNA loci and applied the same procedure.
We used VariScan to compute polymorphism indicators (πT, θW, Tajima's D). Genomic alignments with D. simulans and rhesus macaque, for D. melanogaster and human respectively, were used to compute the Jukes-Cantor corrected per-site divergence (k). To avoid any potential bias arising from local variations in recombination rate, mutation rate, efficacy of selection or nucleotide composition, we limited our analysis in D. melanogaster to only those protein-coding genes, small introns or ancestral repeats that are found in the neighbouring genomic regions of lncRNA loci (within 5 kb). Likewise, in human we analysed lncRNA loci flanked by proximal (≤10 kb) ancestral repeats and their flanking protein-coding genes. Similar conclusions were reached from analyses with distance thresholds of 5 kb and 20 kb (Additional Files 5 and 6).
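The statistics named above follow standard textbook formulas, sketched here for reference. This is not the VariScan implementation; the function names, the input layout (one allele count per segregating site among n sampled chromosomes) and the toy values are assumptions.

```python
from math import log, sqrt


def theta_w_and_pi(allele_counts, n):
    """Watterson's theta and nucleotide diversity (totals, not per site).

    allele_counts: for each segregating site, the count of one allele
    among n sampled chromosomes.
    """
    s = len(allele_counts)
    a1 = sum(1.0 / i for i in range(1, n))
    theta_w = s / a1
    pi = sum(2.0 * c * (n - c) / (n * (n - 1)) for c in allele_counts)
    return theta_w, pi


def tajimas_d(allele_counts, n):
    """Tajima's D (Tajima 1989); None when undefined (no sites or n < 4)."""
    s = len(allele_counts)
    if s == 0 or n < 4:
        return None
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    theta_w, pi = theta_w_and_pi(allele_counts, n)
    return (pi - theta_w) / sqrt(e1 * s + e2 * s * (s - 1))


def jukes_cantor(p):
    """Jukes-Cantor corrected divergence from the observed proportion p
    of differing sites: k = -3/4 * ln(1 - 4p/3)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75)")
    return -0.75 * log(1.0 - 4.0 * p / 3.0)


# Five singleton sites in a sample of 10 chromosomes (toy data).
print(tajimas_d([1, 1, 1, 1, 1], 10))
print(jukes_cantor(0.1))
```

An excess of rare variants, as in the singleton-only toy example, yields a negative Tajima's D, whereas an excess of intermediate-frequency variants yields a positive value.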
Similarly we compared the derived allele frequency of polymorphic sites within lncRNA exons or lncRNA introns to sites within small introns, non-degenerate sites and four-fold degenerate sites.
Because the putatively neutral sites we used are not interdigitated with our sites of interest (such as lncRNA exonic nucleotides), there remains the possibility that our indicators of purifying selection are artificially inflated. To take such biases into account, when considering N sites from each lncRNA locus associated with an intergenic flanking sequence (≥1,000 nt following the masking of conserved non-coding elements with nucleotide identity ≥90% over ≥20 nt), we randomly sampled the same number, N, of sites from this masked flanking sequence to be used as a neutral proxy. For the study of non-degenerate sites in human, we used four-fold degenerate sites within the same protein as a neutral proxy. However, because there is evidence for selection having acted on four-fold degenerate sites in Drosophila, we instead used small introns (≤86 nt) as our neutral proxy and limited our analysis to just those protein-coding genes that contain such small introns. This analysis permits the strength of selection acting on lncRNAs to be estimated while controlling for variations in the local mutation rate, as well as for background selection associated with nearby functional elements, including protein-coding genes and well conserved non-transcribed non-coding regulatory elements. We used this methodology to assess the degree of selective constraint acting on intergenic lncRNAs through a generalised McDonald-Kreitman test [98–100]. We compared the numbers of polymorphic and divergent sites within lncRNA exons and lncRNA introns to the numbers observed within the sampled putatively neutral sites using a χ2 test with one degree of freedom.
For both D. melanogaster and human lncRNAs, we used the site frequency spectra of mutations occurring within the sampled putatively neutral sites to estimate the distribution of fitness effects of new deleterious mutations within lncRNAs (in terms of -Nes) using DFE-alpha [54, 57, 103]. Confidence intervals for the proportion of sites in the different Nes categories were estimated through 200 bootstraps per locus. This analysis should therefore also take into account the effects of background selection, as for each locus a 'neutral' reference is drawn from the same region.
Comparisons between locus classes for the polymorphism estimators were performed using Kruskal-Wallis tests. The minor and derived allele frequency distributions for each class were compared using Kolmogorov-Smirnov tests.
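The comparison at the end of the paragraph above amounts to a χ2 test on a 2×2 table of polymorphic and divergent site counts in the test class versus the neutral class. The sketch below uses the uncorrected Pearson statistic; whether a continuity correction was applied is not stated, and the function name and counts are hypothetical.

```python
from math import erfc, sqrt


def mk_chi2(p_test, d_test, p_neutral, d_neutral):
    """Pearson chi-square test (1 degree of freedom) on the 2x2 table
    [[P_test, D_test], [P_neutral, D_neutral]].

    Returns (chi2, p_value). No continuity correction is applied.
    """
    table = [[p_test, d_test], [p_neutral, d_neutral]]
    rows = [sum(r) for r in table]
    cols = [p_test + p_neutral, d_test + d_neutral]
    n = sum(rows)
    if 0 in rows or 0 in cols:
        raise ValueError("every row and column total must be non-zero")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square variable with 1 df.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: polymorphic and divergent sites in lncRNA exons
# versus sampled flanking neutral sites.
print(mk_chi2(20, 10, 10, 20))
```

A relative excess of polymorphism over divergence in the test class compared with the neutral class is the pattern expected when weakly deleterious variants segregate but rarely reach fixation.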
Abbreviations
DAF: derived allele frequency
DFE: deleterious fitness effect
FDR: false discovery rate
lncRNA: long non-coding RNA
SNP: single nucleotide polymorphism.
We are extremely grateful to Dr. Peter Keightley for making DFE-alpha available and for the use of his servers. We are grateful to Andrew Bassett, Ana Marques, Christoffer Nellåker and Chris Rands for their comments on early versions of this manuscript. WH and CPP are supported by the Medical Research Council (MRC). This work was also supported by an ERC Advanced Grant to CPP.
- Meader S, Ponting CP, Lunter G: Massive turnover of functional sequence in human and other mammalian genomes. Genome Res. 2010, 20: 1335-1343.
- Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, Weinstock GM, Wilson RK, Gibbs RA, Kent WJ, Miller W, Haussler D: Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005, 15: 1034-1050.
- Halligan DL, Keightley PD: Ubiquitous selective constraints in the Drosophila genome revealed by a genome-wide interspecies comparison. Genome Res. 2006, 16: 875-884.
- Asthana S, Noble WS, Kryukov G, Grant CE, Sunyaev S, Stamatoyannopoulos JA: Widely distributed noncoding purifying selection in the human genome. Proc Natl Acad Sci USA. 2007, 104: 12410-12415.
- Casillas S, Barbadilla A, Bergman CM: Purifying selection maintains highly conserved noncoding sequences in Drosophila. Mol Biol Evol. 2007, 24: 2222-2234.
- Katzman S, Kern AD, Bejerano G, Fewell G, Fulton L, Wilson RK, Salama SR, Haussler D: Human genome ultraconserved elements are ultraselected. Science. 2007, 317: 915.
- Fu W, O'Connor TD, Jun G, Kang HM, Abecasis G, Leal SM, Gabriel S, Altshuler D, Shendure J, Nickerson DA, Bamshad MJ, Akey JM: Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature. 2013, 493: 216-220.
- Carninci P, Yasuda J, Hayashizaki Y: Multifaceted mammalian transcriptome. Curr Opin Cell Biol. 2008, 20: 274-280.
- modENCODE Consortium: Identification of functional elements and regulatory circuits by Drosophila modENCODE. Science. 2010, 330: 1787-1797.
- Rinn JL, Euskirchen G, Bertone P, Martone R, Luscombe NM, Hartman S, Harrison PM, Nelson FK, Miller P, Gerstein M, Weissman S, Snyder M: The transcriptional activity of human Chromosome 22. Genes Dev. 2003, 17: 529-540.
- Ponting CP, Oliver PL, Reik W: Evolution and functions of long noncoding RNAs. Cell. 2009, 136: 629-641.
- Ponting CP, Belgard TG: Transcribed dark matter: meaning or myth?. Hum Mol Genet. 2010, 19: R162-168.
- Young RS, Marques AC, Tibbit C, Haerty W, Bassett AR, Liu JL, Ponting CP: Identification and properties of 1119 lincRNA loci in the Drosophila melanogaster genome. Genome Biol Evol. 2012, 4: 427-442.
- Brown CJ, Ballabio A, Rupert JL, Lafreniere RG, Grompe M, Tonlorenzi R, Willard HF: A gene from the region of the human X inactivation centre is expressed exclusively from the inactive X chromosome. Nature. 1991, 349: 38-44.
- Franke A, Baker BS: The rox1 and rox2 RNAs are essential components of the compensasome, which mediates dosage compensation in Drosophila. Mol Cell. 1999, 4: 117-122.
- Sleutels F, Zwart R, Barlow DP: The non-coding Air RNA is required for silencing autosomal imprinted genes. Nature. 2002, 415: 810-813.
- Rinn JL, Kertesz M, Wang JK, Squazzo SL, Xu X, Brugmann SA, Goodnough LH, Helms JA, Farnham PJ, Segal E, Chang HY: Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell. 2007, 129: 1311-1323.
- Young TL, Matsuda T, Cepko CL: The noncoding RNA taurine upregulated gene 1 is required for differentiation of the murine retina. Curr Biol. 2005, 15: 501-512.
- Bernard D, Prasanth KV, Tripathi V, Colasse S, Nakamura T, Xuan Z, Zhang MQ, Sedel F, Jourdren L, Coulpier F, Triller A, Spector DL, Bessis A: A long nuclear-retained non-coding RNA regulates synaptogenesis by modulating gene expression. EMBO J. 2010, 29: 3082-3093.
- Tripathi V, Ellis JD, Shen Z, Song DY, Pan Q, Watt AT, Freier SM, Bennett CF, Sharma A, Bubulya PA, Blencowe BJ, Prasanth SG, Prasanth KV: The nuclear-retained noncoding RNA MALAT1 regulates alternative splicing by modulating SR splicing factor phosphorylation. Mol Cell. 2010, 39: 925-938.
- Orom UA, Derrien T, Beringer M, Gumireddy K, Gardini A, Bussotti G, Lai F, Zytnicki M, Notredame C, Huang Q, Guigo R, Shiekhattar R: Long noncoding RNAs with enhancer-like function in human cells. Cell. 2010, 143: 46-58.
- Wang KC, Chang HY: Molecular mechanisms of long noncoding RNAs. Mol Cell. 2011, 43: 904-914.
- Guttman M, Rinn JL: Modular regulatory principles of large non-coding RNAs. Nature. 2012, 482: 339-346.
- Sheik Mohamed J, Gaughwin PM, Lim B, Robson P, Lipovich L: Conserved long noncoding RNAs transcriptionally regulated by Oct4 and Nanog modulate pluripotency in mouse embryonic stem cells. RNA. 2010, 16: 324-337.
- Guttman M, Donaghey J, Carey BW, Garber M, Grenier JK, Munson G, Young G, Lucas AB, Ach R, Bruhn L, Yang X, Amit I, Meissner A, Regev A, Rinn JL, Root DE, Lander ES: lincRNAs act in the circuitry controlling pluripotency and differentiation. Nature. 2011, 477: 295-300.
- Huarte M, Rinn JL: Large non-coding RNAs: missing links in cancer?. Hum Mol Genet. 2010, 19: R152-R161.
- Wapinski O, Chang HY: Long noncoding RNAs and human disease. Trends Cell Biol. 2011, 21: 354-361.
- Ponjavic J, Ponting CP, Lunter G: Functionality or transcriptional noise? Evidence for selection within long noncoding RNAs. Genome Res. 2007, 17: 556-65.
- Guttman M, Amit I, Garber M, French C, Lin MF, Feldser D, Huarte M, Zuk O, Carey BW, Cassady JP, Cabili MN, Jaenisch R, Mikkelsen TS, Jacks T, Hacohen N, Bernstein BE, Kellis M, Regev A, Rinn JL, Lander ES: Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature. 2009, 458: 223-227.
- Marques AC, Ponting CP: Catalogues of mammalian long noncoding RNAs: modest conservation and incompleteness. Genome Biol. 2009, 10: R124.
- Chodroff RA, Goodstadt L, Sirey TM, Oliver PL, Davies KE, Green ED, Molnar Z, Ponting CP: Long noncoding RNA genes: conservation of sequence and brain expression among diverse amniotes. Genome Biol. 2010, 11: R72.
- Ulitsky I, Shkumatava A, Jan CH, Sive H, Bartel DP: Conserved function of lincRNAs in vertebrate embryonic development despite rapid sequence evolution. Cell. 2011, 147: 1537-1550.
- Kutter C, Watt S, Stefflova K, Wilson MD, Goncalves A, Ponting CP, Odom DT, Marques AC: Rapid turnover of long noncoding RNAs and the evolution of gene expression. PLoS Genet. 2012, 8: e1002841.
- Schorderet P, Duboule D: Structural and functional differences in the long non-coding RNA hotair in mouse and human. PLoS Genet. 2011, 7: e1002071.
- Nielsen R: Molecular signatures of natural selection. Annu Rev Genet. 2005, 39: 197-218.
- Ward LD, Kellis M: Evidence of abundant purifying selection in humans for recently acquired regulatory functions. Science. 2012, 337: 1675-1678.
- 1000 Genomes Project Consortium, Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT, McVean GA: An integrated map of genetic variation from 1,092 human genomes. Nature. 2012, 491: 56-65.
- Charlesworth B: The effects of deleterious mutations on evolution at linked sites. Genetics. 2012, 190: 5-22.
- Mackay TFC, Richards S, Stone EA, Barbadilla A, Ayroles JF, Zhu D, Casillas S, Han Y, Magwire MM, Cridland JM, Richardson MF, Anholt RRH, Barrón M, Bess C, Blankenburg KP, Carbone MA, Castellano D, Chaboub L, Duncan L, Harris Z, Javaid M, Jayaseelan JC, Jhangiani SN, Jordan KW, Lara F, Lawrence F, Lee SL, Librado P, Linheiro RS, Lyman RF, et al: The Drosophila melanogaster Genetic Reference Panel. Nature. 2012, 482: 173-178.
- 1000 Genomes Project Consortium: A map of human genome variation from population-scale sequencing. Nature. 2010, 467: 1061-1073.
- Cabili MN, Trapnell C, Goff L, Koziol M, Tazon-Vega B, Regev A, Rinn JL: Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev. 2011, 25: 1915-1927.
- Yoo EJ, Cooke NE, Liebhaber SA: An RNA-independent linkage of non-coding transcription to long-range enhancer function. Mol Cell Biol. 2012, 32: 2020-2029.
- Latos PA, Pauler FM, Koerner MV, Senergin HB, Hudson QJ, Stocsits RR, Allhoff W, Stricker SH, Klement RM, Warczok KE, Aumayr K, Pasierbek P, Barlow DP: Airn transcriptional overlap, but not its lncRNA products, induces imprinted Igf2r silencing. Science. 2012, 338: 1469-1472.
- Mouse Genome Sequencing Consortium: Initial sequencing and comparative analysis of the mouse genome. Nature. 2002, 420: 520-562.
- Ponjavic J, Oliver PL, Lunter G, Ponting CP: Genomic and transcriptional co-localization of protein-coding and long non-coding RNA pairs in the developing brain. PLoS Genet. 2009, 5: e1000617.
- Haddrill PR, Charlesworth B, Halligan DL, Andolfatto P: Patterns of intron sequence evolution in Drosophila are dependent upon length and GC content. Genome Biol. 2005, 6: R67.
- Lunter G, Ponting CP, Hein J: Genome-wide identification of human functional DNA using a neutral indel model. PLoS Comput Biol. 2006, 2: e5.
- Parsch J, Novozhilov S, Saminadin-Peter SS, Wong KM, Andolfatto P: On the utility of short intron sequences as a reference for the detection of positive and negative selection in Drosophila. Mol Biol Evol. 2010, 27: 1226-1234.
- Watterson GA: On the number of segregating sites in genetical models without recombination. Theor Popul Biol. 1975, 7: 256-276.
- Tajima F: Evolutionary relationship of DNA sequences in finite populations. Genetics. 1983, 105: 437-460.
- Tajima F: Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989, 123: 585-595.
- Jukes T, Cantor C: Mammalian protein metabolism III. 1969, New York NY: Academic Press.
- Drake JA, Bird C, Nemesh J, Thomas DJ, Newton-Cheh C, Reymond A, Excoffier L, Attar H, Antonarakis SE, Dermitzakis ET, Hirschhorn JN: Conserved noncoding sequences are selectively constrained and not mutation coldspots. Nat Genet. 2006, 38: 223-227.
- Keightley PD, Eyre-Walker A: Joint inference of the distribution of fitness effects of deleterious mutations and population demography based on nucleotide polymorphism frequencies. Genetics. 2007, 177: 2251-61.
- Boyko AR, Williamson SH, Indap AR, Degenhardt JD, Hernandez RD, Lohmueller KE, Adams MD, Schmidt S, Sninsky JJ, Sunyaev SR, White TJ, Nielsen R, Clark AG, Bustamante CD: Assessing the evolutionary impact of amino acid mutations in the human genome. PLoS Genet. 2008, 4: e1000083.
- Williamson SH, Hernandez R, Fledel-Alon A, Zhu L, Nielsen R, Bustamante CD: Simultaneous inference of selection and population growth from patterns of variation in the human genome. Proc Natl Acad Sci USA. 2005, 102: 7882-7887.
- Keightley PD, Eyre-Walker A: Estimating the rate of adaptive molecular evolution when the evolutionary divergence between species is small. J Mol Evol. 2012, 74: 61-68.
- Torgerson DG, Boyko AR, Hernandez RD, Indap A, Hu X, White TJ, Sninsky JJ, Cargill M, Adams MD, Bustamante CD, Clark AG: Evolutionary processes acting on candidate cis-regulatory regions in humans inferred from patterns of polymorphism and divergence. PLoS Genet. 2009, 5: e1000592.
- Kryukov GV, Schmidt S, Sunyaev S: Small fitness effect of mutations in highly conserved non-coding regions. Hum Mol Genet. 2005, 14: 2221-2229.
- Chen CTL, Wang JC, Cohen BA: The strength of selection on ultraconserved elements in the human genome. Am J Hum Genet. 2007, 80: 692-704.
- Artieri CG, Haerty W, Singh RS: Ontogeny and phylogeny: molecular signatures of selection, constraint, and temporal pleiotropy in the development of Drosophila. BMC Biol. 2009, 7: 42.
- Charlesworth B: Fundamental concepts in genetics: effective population size and patterns of molecular evolution and variation. Nat Rev Genet. 2009, 10: 195-205.
- Ohta T: Slightly deleterious mutant substitutions in evolution. Nature. 1973, 246: 96-98.
- Ohta T, Gillespie JH: Development of neutral and nearly neutral theories. Theor Popul Biol. 1996, 49: 128-142.
- Eyre-Walker A, Keightley PD: The distribution of fitness effects of new mutations. Nat Rev Genet. 2007, 8: 610-618.
- Eyre-Walker A, Keightley PD, Smith NG, Gaffney D: Quantifying the slightly deleterious mutation model of molecular evolution. Mol Biol Evol. 2002, 19: 2142-2149.
- Tenesa A, Navarro P, Hayes BJ, Duffy DL, Clarke GM, Goddard ME, Visscher PM: Recent human effective population size estimated from linkage disequilibrium. Genome Res. 2007, 17: 520-526.
- Li H, Durbin R: Inference of human population history from individual whole-genome sequences. Nature. 2011, 475: 493-496.
- Keightley PD, Lercher MJ, Eyre-Walker A: Evidence for widespread degradation of gene control regions in hominid genomes. PLoS Biol. 2005, 3: e42.
- Pang KC, Frith MC, Mattick JS: Rapid evolution of noncoding RNAs: lack of conservation does not mean lack of function. Trends Genet. 2006, 22: 1-5.
- Blow MJ, McCulley DJ, Li Z, Zhang T, Akiyama JA, Holt A, Plajzer-Frick I, Shoukry M, Wright C, Chen F, Afzal V, Bristow J, Ren B, Black BL, Rubin EM, Visel A, Pennacchio LA: ChIP-Seq identification of weakly conserved heart enhancers. Nat Genet. 2010, 42: 806-810.
- Derrien T, Johnson R, Bussotti G, Tanzer A, Djebali S, Tilgner H, Guernec G, Martin D, Merkel A, Knowles DG, Lagarde J, Veeravalli L, Ruan X, Ruan Y, Lassmann T, Carninci P, Brown JB, Lipovich L, Gonzalez JM, Thomas M, Davis CA, Shiekhattar R, Gingeras TR, Hubbard TJ, Notredame C, Harrow J, Guigò R: The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure evolution, and expression. Genome Res. 2012, 22: 1775-1789.
- Kondrashov AS: Contamination of the genome by very slightly deleterious mutations: why have we not died 100 times over?. J Theor Biol. 1995, 175: 583-594.
- Chamary JV, Parmley JL, Hurst LD: Hearing silence: non-neutral evolution at synonymous sites in mammals. Nat Rev Genet. 2006, 7: 98-108.
- Nakagawa S, Naganuma T, Shioi G, Hirose T: Paraspeckles are subpopulation-specific nuclear bodies that are not essential in mice. J Cell Biol. 2011, 193: 31-39.
- Eißmann M, Gutschner T, Hämmerle M, Günther S, Caudron-Herger M, Groß M, Schirmacher P, Rippe K, Braun T, Zörnig M, Diederichs S: Loss of the abundant nuclear non-coding RNA MALAT1 is compatible with life and development. RNA Biol. 2012, 9: 1076-1087.
- Nakagawa S, Ip JY, Shioi G, Tripathi V, Zong X, Hirose T, Prasanth KV: Malat1 is not an essential component of nuclear speckles in mice. RNA. 2012, 18: 1487-1499.
- Zhang B, Arun G, Mao YS, Lazar Z, Hung G, Bhattacharjee G, Xiao X, Booth CJ, Wu J, Zhang C, Spector DL: The lncRNA Malat1 is dispensable for mouse development but its transcription plays a cis-regulatory role in the adult. Cell Rep. 2012, 2: 111-123.
- Bond AM, Vangompel MJW, Sametsky EA, Clark MF, Savage JC, Disterhoft JF, Kohtz JD: Balanced gene regulation by an embryonic brain ncRNA is critical for adult hippocampal GABA circuitry. Nat Neurosci. 2009, 12: 1020-1027.
- Lewejohann L, Skryabin BV, Sachser N, Prehn C, Heiduschka P, Thanos S, Jordan U, Dell'Omo G, Vyssotski AL, Pleskacheva MG, Lipp HP, Tiedge H, Brosius J, Prior H: Role of a neuronal small non-messenger RNA: behavioural alterations in BC1 RNA-deleted mice. Behav Brain Res. 2004, 154: 273-289.
- Zhong J, Chuang SC, Bianchi R, Zhao W, Lee H, Fenton AA, Wong RKS, Tiedge H: BC1 regulation of metabotropic glutamate receptor-mediated neuronal excitability. J Neurosci. 2009, 29: 9977-9986.
- Hillenmeyer ME, Fung E, Wildenhain J, Pierce SE, Hoon S, Lee W, Proctor M, St Onge RP, Tyers M, Koller D, Altman RB, Davis RW, Nislow C, Giaever G: The chemical genomic portrait of yeast: uncovering a phenotype for all genes. Science. 2008, 320: 362-365.
- Mercer TR, Dinger ME, Mattick JS: Long non-coding RNAs: insights into functions. Nat Rev Genet. 2009, 10: 155-159.
- Dinger ME, Amaral PP, Mercer TR, Mattick JS: Pervasive transcription of the eukaryotic genome: functional indices and conceptual implications. Brief Funct Genomic Proteomic. 2009, 8: 407-423.
- Bonferroni C: Teoria statistica delle classi e calcolo delle probabilità. 1936, Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di
- Ladoukakis E, Pereira V, Magny EG, Eyre-Walker A, Couso JP: Hundreds of putatively functional small open reading frames in Drosophila. Genome Biol. 2011, 12: R118.
- Belgard TG, Marques AC, Oliver PL, Abaan HO, Sirey TM, Hoerder-Suabedissen A, García-Moreno F, Molnár Z, Margulies EH, Ponting CP: A transcriptomic atlas of mouse neocortical layers. Neuron. 2011, 71: 605-616.
- Amberger J, Bocchini CA, Scott AF, Hamosh A: McKusick's Online Mendelian Inheritance in Man (OMIM). Nucleic Acids Res. 2009, 37 (Database): D793-D796.
- The OMIM database. [http://omim.org]
- FlyBase database. [http://www.flybase.org]
- Ensembl database. [http://www.ensembl.org]
- Stone EA: Joint genotyping on the fly: Identifying variation among a sequenced panel of inbred lines. Genome Res. 2012, 22: 966-974.
- Drosophila Genetic Reference Panel. [http://www.hgsc.bcm.tmc.edu/project-species-i-Drosophila_genRefPanel.hgsc2013.05.21]
- 1000 Genomes Project Consortium. [http://www.1000genomes.org]
- UCSC database. [http://genome.ucsc.edu]
- Hutter S, Vilella AJ, Rozas J: Genome-wide DNA polymorphism analyses using VariScan. BMC Bioinformatics. 2006, 7: 409.
- Andolfatto P: Controlling type-I error of the McDonald-Kreitman test in genomewide scans for selection on noncoding DNA. Genetics. 2008, 180: 1767-1771.
- McDonald JH, Kreitman M: Adaptive protein evolution at the Adh locus in Drosophila. Nature. 1991, 351: 652-654.
- Jenkins DL, Ortori CA, Brookfield JF: A test for adaptive change in DNA sequences controlling transcription. Proc Biol Sci. 1995, 261: 203-207.
- Ludwig MZ, Kreitman M: Evolutionary dynamics of the enhancer region of even-skipped in Drosophila. Mol Biol Evol. 1995, 12: 1002-1011.
- Eyre-Walker A, Keightley PD: Estimating the rate of adaptive molecular evolution in the presence of slightly deleterious mutations and population size change. Mol Biol Evol. 2009, 26: 2097-2108.
- Edgar RC: MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics. 2004, 5: 113.
- DFE-alpha. [http://homepages.ed.ac.uk/eang33]
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.