- Open Access
LncRNAs in polyploid cotton interspecific hybrids are derived from transposon neofunctionalization
© The Author(s). 2018
- Received: 26 July 2018
- Accepted: 23 October 2018
- Published: 12 November 2018
Interspecific hybridization and whole genome duplication are driving forces of genomic and organism diversification. But the effect of interspecific hybridization and whole genome duplication on the non-coding portion of the genome in particular remains largely unknown. In this study, we examine the profile of long non-coding RNAs (lncRNAs), comparing them with that of coding genes in allotetraploid cotton (Gossypium hirsutum), its putative diploid ancestors (G. arboreum; G. raimondii), and an F1 hybrid (G. arboreum × G. raimondii, AD).
We find that most lncRNAs (80%) that were allelic expressed in the allotetraploid genome. Moreover, the genome shock of hybridization reprograms the non-coding transcriptome in the F1 hybrid. Interestingly, the activated lncRNAs are predominantly transcribed from demethylated TE regions, especially from long interspersed nuclear elements (LINEs). The DNA methylation dynamics in the interspecies hybridization are predominantly associated with the drastic expression variation of lncRNAs. Similar trends of lncRNA bursting are also observed in the progress of polyploidization. Additionally, we find that a representative novel lncRNA XLOC_409583 activated after polyploidization from a LINE in the A subgenome of allotetraploid cotton was involved in control of cotton seedling height.
Our results reveal that the processes of hybridization and polyploidization enable the neofunctionalization of lncRNA transcripts, acting as important sources of increased plasticity for plants.
Interspecific hybridization and polyploidization are known as intrinsic powers behind genome evolution. Polyploidization, also known as whole genome duplication (WGD), is commonly observed in the evolution of angiosperm plants . Polyploidy, especially allopolyploidy, stabilizes the vigor traits created by hybridization. The genomic interactions in hybrids and polyploids trigger a rapid and extensive reprogrammed response, associated with dramatic changes in the epigenetic modifications involved with, but not limited to, the following: DNA methylation, siRNAs, transposable elements (TEs), and histone modification [2, 3]. The integrative results of these genome-wide modifications lead to expression changes in about 20–50% of mRNA, which is the proposed molecular basis for the vigor of hybridization and polyploidization [4, 5].
Small interfering RNAs (siRNAs), especially those derived from TEs, can suppress mobile element activity via RNA-directed DNA methylation (RdDM) [6–8]. In hybrids of Arabidopsis thaliana, rice, and maize, expression levels of siRNAs changed dramatically upon polyploidization [9–11]. DNA methylation changes also coincided with activation of TEs in analysis of intraspecific hybrids of A. thaliana, rice, and maize [12–16]. However, the impact of hybridization and polyploidization on the lncRNA of whole genomes remains largely unknown.
Long noncoding RNAs (lncRNAs) are typically transcribed from the intergenic regions of the genome, while some lncRNAs originate from the antisense strands of coding genes . In the last few years, lncRNAs have been widely identified in both animal and plant genomes [18–22]. In animal genomes, lncRNAs are associated with X chromosome inactivation , disease development , etc. Epigenetic modifications on lncRNA are reported to play critical roles on its expression and function. More than 1000 lncRNA genes are found to be hypo-methylated in cancer cell lines recurrently . In plants, lncRNAs are reported to play critical roles in multiple regulation functions, such as developmental regulation [26–28] and both biotic and abiotic stress responses [26, 29–31]. Although the many functions of lncRNA have gradually been elucidated, the exact origin of lncRNA is still obscure.
According to previous genome-wide investigations, lncRNA transcriptomes appeared unique to each species [32, 33]. For example, mRNA similarity between genomes of human and mouse is 92%, but the lncRNA similarity between them is as low as 35% . Less than 6% of zebrafish lincRNAs (long intergenic RNAs) have any detectable sequence similarity to human and mouse lincRNAs . In the comparison of 16 vertebrate species and the sea urchin, > 70% of lincRNAs cannot be traced to homologs in species that diverged > 50 million years ago . Similar trends have been observed in plant species. For example, less than 0.4% of predicted lncRNAs were reported to present in two different tomato species . These data suggest that a genome can generate a large amount of novel lncRNAs efficiently when a new species comes into being. The origin of these species-specific lncRNAs is still unknown.
The latest studies have reported that transposon elements (TEs) might be involved with lncRNA origin and diversification [38–40]. For example, Xist originated from a coding gene, Lnx3, with accumulated TEs in its exons . We previously found that an NAT (natural antisense transcript) originating from a locus on the coding gene GhMML3 is associated with TE insertion. This NAT caused the fuzzless seed mutant of N1 by suppressing GhMML3 expression . TEs are abundant in advanced organisms, especially in plants. For example, TEs comprise 80% of the maize genome and 65% of the cotton genome [43–46]. TEs can be classified as retrotransposons and DNA transposons, each with diverse patterns in sequence and activity . We still do not know which type of TE is related to lncRNA origin.
The behavior of lncRNAs during hybridization and polyploidization provides an important clue to the origin of lncRNA. Here we utilized a simplified model of cotton hybridization and polyploidization to study the origin of lncRNA. Cotton is not only a source of natural and renewable fiber for textiles, but also a fine model for heterosis studies. Regarding evolutionary lineage, upland cotton (Gossypium hirsutum, (AADD)1, Gh), an allotetraploid species, was formed after the hybridization and polyploidization of its two closest extant progenitors, G. herbaceum (A1 genome) or G. arboreum (A2 genome), and G. raimondii (D5 genome), about 1–1.5 million years ago (MYA) [44, 48]. The two diploid progenitors diverged 6–6.3 MYA [44, 48]. We crossed G. arboreum (Ga) accession from Shixiya with G. raimondii (Gr), generating an F1 hybrid. Using Ga, Gr, (Ga × Gr) F1, and Gh (accession Texas Marker-1 (TM-1)), we constructed a system to mimic the evolution of Gossypium spp. from diploid to allotetraploid. To determine the origin and behavior of lncRNAs in plant genome evolution, we used methods of interspecific comparative genomics, after identifying 1:1 lncRNA orthologs between species. Based on our integrative analysis of lncRNA sequencing, small RNA sequencing, ChIP-Seq, and DNA methylation data, our results suggested that LINEs arising from TEs play a crucial role in the origin of lncRNAs.
The Gossypium lineage-specific lncRNA transcriptome
For comparative genome analysis, the genome sequences of Arabidopsis thaliana, Oryza sativa, and Theobroma cacao were selected to test conservation of lncRNA. Less than 3.83% (n = 590) of lncRNAs showed homologous sites in A. thaliana and O. sativa, from which the cotton genome diverged 87 and 115 million years ago (MYA) , respectively. But the primary sequences of most lncRNAs (86%, n = 13,282) were common to all three cotton genomes (Fig. 1b, Additional file 2: Figure S2A), which diverged ~ 2–8 MYA [44, 48]. A small portion of lncRNAs (12.43%) had homologs in Theobroma cacao, in contrast to the much greater number of PCG homologs (69.25% homologs in T. cacao) (Additional file 2: Figure S2B). The above data confirmed that cotton lncRNAs are predominantly Gossypium lineage-specific.
To obtain a reliable cross-species comparison, the lncRNA loci were classified into syntenic and non-syntenic groups based on their chromosomal locations (Fig. 1c, d, details in method). We first defined the syntenic lincRNA loci, which represent one-to-one homology with lncRNA loci (Fig. 1d, Additional file 1: Table S4). Syntenic groups were then further categorized into three subgroups: (1) syntenic and transcribed (ST), (2) syntenic and allelic-transcribed (SA), and (3) syntenic to a PCG (Fig. 1d, e). NATs and intronic RNAs were excluded from this analysis because of their partial overlap with PCGs. The sequence similarity of ST lncRNA (mean 85.71%) was much lower than that of mRNA (mean 96.60%) (Additional file 2: Figure S3). The SA category constituted the majority (71.67–86.00%) of syntenic lncRNA loci in all four comparisons (Fig. 1e). Importantly, the SA lncRNAs comprised over 80% (1269 out of 1517) of syntenic homologous lncRNA loci (Fig. 1e) in the comparison between GhAT (a sub-genome of the allotetraploid (AADD)1 genome) and GhDT, based on data from identical genome and tissues. The above data further confirmed that expression of lncRNAs are predominantly species-specific.
LncRNAs are reprogramed in synthetic interspecies F1
To exclude the possibility that the lncRNA burst was triggered by inbreeding, two near-isogenic lines (NILs) from inbred lines of cotton, upland cotton (Gh, accession Zhong12), and Zhong12 GL (a dominant, glandless line produced by multiple generations of backcrossing with Zhong12) were selected as control group. These NILs share identical genetic backgrounds. Using the same pipeline we had constructed earlier, a total of 4615 lncRNAs from the control group were identified (Additional file 1: Table S2). In the comparison between Zhong12 and Zhong12 GL, only 1.71% (73 out of 4281) of lncRNAs were differentially expressed (p < 0.05). In contrast to the lncRNA expression pattern in the NILs, the in silico parents and F1 exhibited a total of 34.75% (1999 out of 5752) differentially expressed lncRNA loci in leaves and 50.00% (2594 out of 5184) in ovules. The global changes in the lncRNAs of F1 suggest that interspecies hybridization stimulated a reprogramming of transcription on the non-coding region of the genome.
The conserved lncRNAs are overlapped with TEs
To investigate the effective factors affecting lncRNA preservation in polyploidization, we anchored our analysis on the F1 genome. The shared lncRNAs in F1 and parents tended to have long transcripts and were overlapped with TEs (Fig. 2d, e). The proportions of F1-specific (F1S) and parent-specific (PS) lncRNAs overlapping with TEs were 63.49% and 60.41%, respectively, while in conserved lncRNAs, this proportion was as high as 84.47%. This phenomenon was also observed in intronic RNA and NATs (Fig. 2e). Considering all of these results, we hypothesized that TE was involved with lncRNA retention and burst in hybridization.
LncRNAs are constrained on LINE and Gypsy-overlapped loci
Previous comparative studies of human, mouse, and zebrafish genomes indicated that non-TE lncRNAs might suffer relatively high evolutionary constraint than TE-derived lncRNAs do . TE might contribute to the evolution of lncRNA in both the short (i.e., interspecies hybrid) and long term (i.e., polyploidization).
The association of TEs and lncRNA expression in the ST and SA groups were examined. The comparisons were conducted between the presence frequency of TEs in the ST and SA groups. The distribution of LINEs was skewed toward ST lncRNAs, while Gypsy was significantly enriched in the SA lncRNAs (Fisher’s exact test, typical p < 0.01) (Fig. 3b). The correlation coefficient of TE-overlapped lncRNA expression levels between the parents and F1 was calculated. Surprisingly, an even stronger correlation coefficient was observed for LINE-overlapped lncRNAs compared to Gypsy-overlapped lncRNAs, both in ovule (LINE r = 0.92; Gypsy r = 0.66) and leaf tissues (LINE r = 0.87; Gypsy r = 0.63) (Fig. 3c and Additional file 2: Figure S4). These association test results suggested that TEs in the categories of LINE and Gypsy represented distinct functional pattern in genome shock.
LncRNAs are transcribed from siRNA-depleted LINEs/TEs
Similar to findings in model plant genomes ), cotton siRNAs were abundantly enriched in TE bodies, but less in the upstream and downstream of TE bodies (Fig. 4c, Additional file 1: Figure S5). For the PCGs, sRNAs were enriched in the upstream and downstream regions of the gene body and less in the gene body (Fig. 4c, Additional file 1: Figure S5). Meanwhile, the siRNA distribution pattern on lncRNA loci was distinctive from that on both TEs and PCGs (Fig. 4c, Additional file 1: Figure S5). Compared to common TEs, lncRNA-associated TEs were covered by less siRNA (Fig. 4c, Additional file 1: Figure S5).
SiRNAs are known to suppress TE activity via siRNA-directed DNA methylation (RdDM) pathway in plant genomes. Accordingly, we predicted that the expression of TE-overlapped lncRNA could be affected by siRNA. Since LINE-overlapped lncRNAs were more stably transcribed compared to Gypsy-overlapped lncRNAs (Fig. 3c, Additional file 1: Figure S5), we hypothesized that siRNA distribution pattern on lncRNA-overlapped LINE and Gypsy might be different. To test this, we examined the distribution density of siRNA over the Gypsy and LINEs that overlapped with lncRNA regions respectively. As expected, the mapping densities of siRNAs in the transcribed regions of LINEs were much lower than those in Gypsy (Fig. 4c, Additional file 1: Figure S5). Twenty-four nucleotides and 21 nt siRNA were both enriched on lncRNA with similar pattern during the genomic shock (Additional file 1: Figures S6 and S7).
LncRNAs were primarily transcribed from demethylated LINEs/TEs
To confirm this observation, we examined the DMR patterns on differentially expressed lncRNAs by plotting DNA methylation distributions. As shown in Fig. 6c, the lncRNA-overlapped TEs were significantly less methylated in the upregulated lncRNAs in F1, which was true in all three DNA methylation contents (Fig. 6c). The CHH methylation on lnRNAs was constantly low in the F1, which was consistent with the hypo-methylation status in general. This trend suggested the RdDM might be active on lncRNA genes in F1 hybrid.
DNA methylation is negatively associated with lncRNA expression in F1 hybrid
Although genome-wide DMRs were identified in multiple hybridization tests, it is still inconclusive whether DNA methylation is associated with PCG expression changes in hybridization. We performed a correlation test for DNA methylation changes versus PCG and lncRNA expression changes in F1. There was no correlation between DNA CG methylation and the expression of PCGs in F1 (r = 0.03, p < 8.53 × 10−02) (Fig. 6d), but lncRNA was negatively correlated with DNA CG methylation (r = − 0.63, p < 2.2 × 10−16) (Fig. 6d). The correlation remained significant for CHG and CHH methylation (CHG r = − 0.74, p < 2.20 × 10−16; CHH for, r = − 0.71, p < 2.2 × 10−16). The representative example of XLOC_035525 activated in F1 clearly showed the difference in DNA methylation between the parent Ga and F1 (Fig. 6e). We therefore concluded that the DNA methylation level changes on lncRNA-overlapped TE regions were the major cause of lncRNA expression changes in F1. Specific demethylated TE regions contributed to the origin of novel lncRNA in the F1 genome.
TE-derived lncRNAs as a source of functional genes
To further investigate whether the non-coding transcripts stimulated by genome shock have potential functions, we selected 10 lncRNAs (Additional file 1: Table S9) from the ST and SA groups in Gh vs F1 for functional tests and comparison. One lncRNA among these candidates, XLOC_409583, was expressed from a demethylated TE locus. The primary sequence of XLOC_409583 was identified in both the DT and Gr (D) genomes, while the AT subgenome lacked an apparent orthologous sequence (Fig. 7c). In the DT subgenome, XLOC_409583 originated from a LINE locus (Fig. 7d). In contrast to F1 and its diploid ancestor Gr, the active expression of XLOC_409583 in the cultivated upland tetraploid cotton TM-1was associated with the demethylation of LINE. The active expression of XLOC_409583 was also detected in the wild upland cotton yucatanense, 4 land races (latifolium, punctatum, morrilli, palmeri), and 40 up-land cotton cultivars (Fig. 7d and Additional file 1: Table S8), indicating that XLOC_409583 transcription is stable after polyploidization.
To refine our understanding of the biological role of XLOC_409583, we performed virus-induced gene silencing (VIGS) tests in TM-1. The plants that underwent XLOC_409583 silencing showed increased height compared to the control group, indicating that the novel lncRNA XLOC_409583 played a role in plant development in the tetraploid cotton genome (n = 15 in each treatment, with two repetitions) (Fig. 7f, g). Discovery of the activation of XLOC_409583 by demethylation provides insight into the role of DNA demethylation in the emergence of novel lncRNA in hybrids and polyploids. Functional analysis of these novel lncRNAs will further uncover their biological significance in hybrids and polyploids .
RNA polymerase II is essential for the transcription of TE-overlapped lncRNA
To confirm that lncRNAs were transcribed by RNA polymerase II (Pol II), we used Pole II antibody to pull down the binding DNA fragments in diploid cotton (Ga and Gr), the F1, and the allotetraploid (Gh) cotton species (Additional file 1: Table S10). Then, using the model-based analysis for ChIP-Seq, we identified 1952–7576 high-confidence peaks (Additional file 1: Table S11). Pol II signals were enriched on both lncRNA and PCG in similar patterns (Fig. 8b, Additional file 2: Figure S8). Compared to the diploid parents, the binding efficiency of Pol II in F1 is not associated with transcription efficiency on either lncRNA loci or on PCGs (Pearson’s correlation test, p > 0.05). In addition, most Pol II-associated lncRNA transcripts contain TEs (Fig. 8c). These observations suggest that Pol II is the major RNA polymerase binding to lncRNA loci, especially on TEs (Fig. 8d). These results imply that, in addition to Pol IV and V, Pol II is also involved with TE transcription.
Interspecies hybrid is a model for the study of lncRNA evolution
Evolutionary conservation of lncRNA is poorly understood due to the lack of sufficiently close species with finely sequenced genomes for study . Taking advantage of the three published genomes of closely related cotton species [44–46], we identified that 83.97% of cotton lncRNAs were conserved in Gossypium spp. Furthermore, approximately 59.29 to 76.34% of PCGs were one-to-one syntenic in all four genomes. These highly syntenic genomes helped to identify the homologous lncRNA loci.
Using the collinear method, homologous lncRNA loci with low sequence similarities were identified at a high confidence level. Our research model applied to the F1 and allotetraploid genomes was designed to facilitate the examination of homologous lncRNA loci. Therefore, the genome specificity observed in this system provided solid evidence of the fast turnover of lncRNA. We found that only 10.86–26.15% of syntenic lncRNA loci were constantly expressed in multiple cotton species. These results further confirm the previous report that in animal genomes both sequence divergence and expression turnover contribute to the species specificity of lncRNA .
Rapid turnover of lncRNA in hybrid
Hybridization and polyploidization are both common and crucial in genome evolution. Genome-wide changes can be ascribed to variations in PCG expression and alteration of epigenetic modifications, such as DNA methylation, histone modification, and sRNA generation. Most PCGs in synthetic allopolyploid are expressed at mid-parent level . However, in this study, we found that lncRNA expression was changed dramatically in hybrid. Our data indicated that lncRNA was not substantially gained and lost during evolution, but was instead induced by the genomic shock of interspecies hybridization, provoking new species formation. Transcription of lncRNA underwent tremendous variation during genome shock. Given that lncRNAs participate in critical biological process, such as Xist silencing in animals  or miRNA target mimicry in plants , it is reasonable to assume that lncRNA reprograming in hybrids can affect genes regulated by non-coding RNAs. Therefore, we hypothesized that the rapid transcriptional turnover of lncRNAs might further affect the lineage-specific emergence or disappearance of specific traits.
The epigenetic modifications on TE affect the lncRNA origin
TEs have been reported to be involved in miRNA origin and evolution [66, 67]. TEs also contribute to alternative gene structures such as novel promoters, splice sites, or polyadenylation signals . Previous reports elucidated that TEs are major contributors to the origin of some lncRNA in vertebrates [38–40]. Many functional lncRNAs such as Xist , TUG1 , linc-ROR , PCAT-1 , and SLC7A2-IT1A  are overlapped with TEs. We also observed a strong correlation between TEs and lncRNAs along the evolutionary path from diploid to allotetraploid. In our simulation model of cotton evolution, lncRNAs tended to be retained with TEs, indicating the potential impact of TEs on lncRNA origin as well as heritability. We found that TEs exhibited biased distribution toward lncRNA loci rather than coding genes, and LINEs especially contributed disproportionally to lncRNAs in all cotton species.
As a mobile element, TEs are normally transcriptionally silent regions due to DNA methylation via RdDM. But TEs overlapping with lncRNA loci are transferred to a transcriptionally active status, implying a possible difference in local regulation or modification. The sRNA distribution pattern and DNA methylation levels of lncRNA loci in the F1 hybrid confirmed that these regions were activated. Therefore, we hypothesized that the lncRNA loci originated from select TEs, such as LINEs, with few suppressive modifications. Based on our analysis, the de novo methylation as well as reprogramming of DNA methylation in hybridization created novel lncRNAs arising from LINEs (Fig. 8). A latest study on the epigenetic landscape of cancer cells finds that the lncRNA genes are hypo-methylated . Some oncogenic lncRNA genes are under the diverse epigenetic modifications, such as CpG methylation. The de-methylated lcnRNA gene EPIC1 can promote the cell propagation in cancer . These reports suggested that the DNA methylation-directed lncRNA regulation is a general mechanism both in plant and animal genomes.
Interspecies hybrids of Gossypium arboreum (AA, 2n = 2x = 26) and G. raimondii (DD, 2n = 2x = 26) were generated by hand pollination. Three biological replicates of 0 DPA (days post anthesis) ovules and leaves from each of G. arboreum, G. raimondii, interspecies hybrid (G. arboreum × G. raimondii) F1, and G. hirsutum (AADD, 2n = 4x = 52) were collected from the greenhouse of Nanjing Agricultural University. All plants were under the same controlled growing conditions at 25 °C, 16/8 h day/night. Samples were frozen in liquid nitrogen immediately upon collection and stored at − 70 °C in preparation for RNA isolation.
LncRNA library construction and sequencing
Total RNA was isolated from the plant tissues using the Spectrum Plant Total RNA Kit (Sigma-Aldrich). After RNA isolation, ribosomal RNA was removed using the Epicentre Ribo-zero™ rRNA Removal Kit (Epicentre, USA). Next, sequencing libraries were generated from rRNA-depleted RNA using the NEBNext® Ultra™ Directional RNA Library Prep Kit for Illumina® (NEB, USA). Finally, strand-specific sequencing was performed with the Illumina HiSeq 2000 system (paired-end 125-bp reads).
Pipeline for lncRNA identification
All strand-specific RNA-seq reads were quality-trimmed and quality-filtered using fastx_toolkit (http://hannonlab.cshl.edu/fastx_toolkit/). Clean reads were then mapped separately to the corresponding references using TopHat (v.2.0.14) (N = 2, library type = fr-first strand). The genome sequences and annotation files for G. arboreum (v.2.0), G. raimondii, and G. hirsutum (v.1.1) were collected from the Cotton Genome Project (CGP: https://www.ncbi.nlm.nih.gov/genome/?term=Gossypium+arboreum), Phytozome (v.9.0) (http://www.phytozome.net), and the website (http://mascotton.njau.edu.cn/Data.htm), respectively. The F1 genome was generated by mixing the genome sequences of G. arboretum and G. raimondii. The transcripts from each dataset were assembled independently using the Cufflinks program (v.2.2.1) (-u, library type= fr-first strand) and Scripture program (--coverage =0.2) [53, 72]. All transcripts from each species were merged to generate final transcripts using Cuffmerge . Transcripts less than 200 nt were discarded. Using Cuffcompare, transcripts were given the class code “u,” “x,” or “i,” representing intergenic sequences, antisense sequences of known genes, and intronic sequences, respectively . The Coding Potential Calculator (CPC) software was used to calculate the coding potential of the remaining transcripts . All transcripts with CPC scores > 0 were discarded. The remaining transcripts were subjected to HMMER (v. 3.1b2) in order to exclude transcripts containing known protein domains (cutoff < 0.001) . The remaining transcripts were candidate lncRNAs. To reduce the isoform complexity of lncRNA, only the longest transcript of each loci was used for further analysis.
Identification of transposable element-derived lncRNA
We annotated transposable elements in the genome using RepeatMasker (v.4.0.6) (http://www.repeatmasker.org). RepeatModeler (v.1.0.8) (http://www.repeatmasker.org/RepeatModeler.html) was used to create three de novo transposable element (TE) libraries based on the G. raimondii, G. arboreum, and G. hirsutum reference genomes using default parameters. We then used RepeatMasker to identify repeat elements using both the de novo libraries and the MIPS repeat database (mipsREdat_9.3p) . The annotation from RepeatMasker was then parsed to exclude low complexity and non-TE repeats. Next, transposons were classified into Gypsy, Copia, LTR, LINE, DNA, unknown, and other categories. LncRNA-derived TEs were identified by determining overlapping genomic coordinates of TEs or TE fragments of at least 1 bp using the intersectBed program from BEDTools (v.2.17.0) . When multiple TE features were found for a single lncRNA, the longer TE feature was counted.
Identification of ST and SA lncRNAs
ST lncRNA was reconstructed based on sequence similarity and position. NATs and intronic RNAs that overlapped protein-coding gene (PCG) loci were removed. Then the PCGs and lincRNAs in different subgenomes were reciprocally aligned using BLASTN (v.2.2.27) (E value < 1 × 10−10, -max_target_seq 1) . ST lncRNA, in syntenic blocks between two subgenomes, were identified using the MCScanX (-b 2, -s 5) . To identify SA lncRNA, the lncRNA sequences of species A were aligned to the syntenic blocks of species B using BLASTN (E value < 1 × 10−10, -max_target_seq 1) .
HTSeq-count software (v.0.6.0)  was used to obtain read counts for each lncRNA or gene module (-s yes –m union). Read counts were normalized to RPKM (reads per kilobase per million reads). To assess the accumulated gene expression divergence between the parent lines and the hybrid F1, an in silico parental mix was constructed by combining clean reads of G. raimondii and G .arboreum at a ratio of 1:1. Spearman’s correlation between biological replicates was calculated using R from the RPKM values. Differentially expressed transcripts were calculated using the R package, edgeR .
Small RNA library construction and sequencing
Total RNA was extracted from the 0 DPA ovules and leaves of two biological replicates. Small RNAs were then separated from total RNA by polyacrylamide gel electrophoresis. Three micrograms of total RNA per sample was used as the input material for construction of the small RNA library. Sequencing libraries were generated using NEBNext® Multiplex Small RNA Library Prep Set for Illumina® (NEB, USA), following the manufacturer’s recommendations. The library preparations were sequenced on an Illumina HiSeq 2000 platform and 50-bp single-end reads were generated.
Processing of sRNA sequencing data
After sRNA sequencing, adapters and low-quality nucleotides were trimmed from the data. sRNA clean reads were then aligned with the F1 genome (a mixture of the Ga and Gr genomes) using Bowtie, with no mismatch (-m 50, -v 0) . Any aligned small RNA reads that mapped to more than 50 loci were removed. The remaining mapped reads were aligned with noncoding RNAs using Rfam release (http://rfam.sanger.ac.uk/) and the known miRNA database in miRBase release 19 (http://www.mirbase.org/) , in order to identity miRNA, snRNA, tRNA, and rRNA. miREvo  and mirdeep2 software  were integrated to predict novel miRNAs. All reads originating from miRNA, TAS genes, rRNA, tRNA, snRNA, and snoRNA were removed. The remaining 20–25-nt-long reads were selected as siRNA. The distribution of siRNA across different features was drawn using deeptools .
Analysis of MethylC-seq reads
MethylC-seq and differentially methylated regions (DMRs) were retrieved from a previous study in the supplemental information (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1229-8) . The methylation level across different features was calculated using an in-house perl and R script.
Virus-induced gene silencing technology
A 300-bp fragment of XLOC409583 was amplified (F primer, AATAAG TGTGAAATTGTCGGGC; R primer, ATTCATGGCGATAAAGTCGGA) and cloned to form a XbaI/BamHI-digested pTRV2 vector, creating a VIGS vector named pTRV2- XLOC409583 (F primer: ATTCTGTGAGTAAGGTTACCGAATTCGAAA GTCCTTCGCTACAAAT; R primer: AGACGCGTGAGCTCGGTACCGGATCC ACTATTGCCAATCGTCTTCA). The vectors pTRV1 and pTRV2-XLOC409583 were then transformed by the Agrobacterium strain GV3101 via electroporation (Bio-Rad, Hercules, CA, USA) . For the VIGS assay, the transformed Agrobacterium colonies were incubated overnight at 28 °C in an antibiotic selection medium containing 50 mg/L rifampicin and 50 mg/L kanamycin. Agrobacterium cells were centrifuged and resuspended in infiltration buffer (10 mM MgCl2, 10 mM MES, and 200 mM acetosyringone), adjusted to an OD600 = 0.5. Agrobacterium strains containing pTRV1 and pTRV2 vectors were mixed in a ratio of 1:1. Seedlings with mature cotyledons but without a visible true leaf (7 days after germination) were infiltrated by inserting the Agrobacterium suspension into the cotyledons via syringe. The plants were grown in pots at 25 °C in a growth chamber under a 16/8 h light/dark cycle with 60% humidity. For each treatment group, 32 individual plants were employed.
RNA extraction and qRT-PCR
RNA was extracted from leaf tissue and treated with a BioFlux kit. First-strand cDNA was generated using TransScript One-Step gDNA Removal and cDNA Synthesis SuperMix (TransGen Biotec Co., Ltd.), according to the manufacturer’s instructions. Quantitative RT-PCRs were performed with the primers F: CCTTGTCAGAGTCCTCTGGTAG; R: GAGTTGAATGGGCATTCTTG.
Chromatin immunoprecipitation and sequencing (ChIP-Seq)
Chromatin immunoprecipitation (ChIP) was performed as described and with several adaptations . One gram of leaves for each sample of AA, DD, AD, and AADD genomes was used in the assay. After plant material crosslinking, nuclei isolation, cell lysis, and chromatin sonication as described in the protocol, ChIP reaction was performed using Anti-RNA polymerase II antibody (ABcam, Anti-RNA polymerase II CTD repeat YSPTSPS antibody [8WG16] - ChIP Grade, ab817) and protein A+G magnetic beads (Millpore), referred to as the “ChIP” group. The control group for each sample was set up similarly with the experimental group using sonicated chromatin with protein A+G magnetic beads but without antibody, referred to as the “Mock” group, which served as the background of the ChIP reaction. The ChIP reaction was performed overnight at 4° with gentle rotation, followed by separation and washing of beads using magnetic separation device (Millpore Magna GrIP Rack). DNA purification was performed using a commercial spin column kit. To verify the DNA enrichment, ChIP-Seq libraries were constructed with the NEBNext ChIP-Seq Library Prep Master Mix Set for Illumina (NEB) using NEBNext Multiplex Oligos for Illumina (NEB). DNA libraries including the “ChIP” and “Mock” groups respectively for each sample were pair-end sequenced with 150 bp reads using an Illumina HiSeq2500.
Analysis of ChIP-Seq
All ChIP-Seq reads were quality-trimmed and quality-filtered using fastx_toolkit (http://hannonlab.csh.edu/fastx_toolkit/). Clean reads were then mapped separately to the corresponding references using Bowtie (v.2.0.14) with no mismatch . Peak calling analysis was used for model-based analysis for ChIP-seq (MACS) . Profile of ChIP-Seq in PCG and lncRNA was visualized using Deeptools with default parameter .
Scan lncRNA expression variation in upland cotton population
One wild upland cotton (accession yucatanense),4 land races (latifolium, punctatum, morrilli, palmeri) and 40 up-land cotton cultivars were download from SRA project SRX2326742  (Additional file 1: Table S7). Reads were quality-trimmed and quality-filtered using fastx_toolkit (http://hannonlab.cshl.edu/fastx_toolkit/). Clean reads were then mapped to the corresponding references of Gossypium hirsutum using TopHat (v.2.0.14) . HTSeq-count software (v.0.6.0)  was used to obtain read counts for each lncRNA or gene module (-s yes –m union), then read counts were normalized to RPKM. Expressed lncRNA were determined by applying a threshold of RPKM > 0.5.
We thank CAAS Cotton Research Institute for providing the G. raimondii cotton ovule and leaf tissue used in the present study. We thank Prof. Xiaoya Chen and Yingbo Mao from SIPPE for kindly sharing the data of G. hirsutum Zhong12 and Zhong12 GL. We thank Dr. Qingxin Song for the insightful suggestions and kindly sharing the DNA methylation analysis data. We thank Pengchuan Sun from North China University of Science and Technology for generous advices on data visualization.
This work was financially supported in part by grants from the National Key Research and Development Program (2016YFD0101006), the National Natural Science Foundation of China (NSFC, 31600989), the Ministry of Science and Technology (2016YFA0500800), the Natural Science Foundation of Jiangsu Province (BK20150653), and the JCIC-MCP project.
Availability of data and materials
The lncRNA sequences and genome coordinate files can be accessed from github repositories (https://github.com/epi-cotton/LncRNA-in-polyploid-cotton) . Sequences of ssRNA-seq and Chip-seq have been deposited in the NCBI Nucleotide and Sequence Read Archive (SRA) under the accession PRJNA373801 . Sequences of sRNA-seq have been deposited in the Sequence Read Archive (SRA) under the accession PRJNA375828 . BS-seq are publicly available under the accession SRP071640 . For the comparison with rRNA-depleted RNA-seq, 35 tissues of Gossypium hirsutum were downloaded from SRA project SRP044705 . To scan lncRNA expression variation in upland cotton population, one wild upland cotton (accession yucatanense), 4 land races (latifolium, punctatum, morrilli, palmeri), and 40 up-land cotton cultivars were download from SRA project SRX2326742 . DMRs of BS-seq were downloaded from the link (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1229-8).
XG and BZ conceptualized the research program. XG designed experiments and coordinated the project. TZ and SF collected the tissues. XT conducted the ChIP-Seq operation. TZ and XG analyzed all data and wrote the manuscript. HH prepared the Zhong12 and Zhong12 GL material and ssRNA seq data. BZ provided the (G. arboreum × G. raimondii) F1 leaf and ovule tissue for the study. TZ, XG, and LW visualized the data. WM, GS, SG, and YH conducted the experiments. All authors discussed the results and commented on the manuscript. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Ethics approval was not needed for this study.
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Jiao Y, Wickett NJ, Ayyampalayam S, Chanderbali AS, Landherr L, Ralph PE, Tomsho LP, Hu Y, Liang H, Soltis PS, et al. Ancestral polyploidy in seed plants and angiosperms. Nature. 2011;473:97–100.PubMedPubMed CentralView ArticleGoogle Scholar
- Chen ZJ. Genomic and epigenetic insights into the molecular bases of heterosis. Nat Rev Genet. 2013;14:471–82.PubMedView ArticlePubMed CentralGoogle Scholar
- Song Q, Chen ZJ. Epigenetic and developmental regulation in plant polyploids. Curr Opin Plant Biol. 2015;24:101–9.PubMedPubMed CentralView ArticleGoogle Scholar
- Li A, Liu D, Wu J, Zhao X, Hao M, Geng S, Yan J, Jiang X, Zhang L, Wu J, et al. mRNA and small RNA transcriptomes reveal insights into dynamic homoeolog regulation of allopolyploid heterosis in nascent hexaploid wheat. Plant Cell. 2014;26:1878–900.PubMedPubMed CentralView ArticleGoogle Scholar
- Yoo MJ, Szadkowski E, Wendel JF. Homoeolog expression bias and expression level dominance in allopolyploid cotton. Heredity (Edinb). 2013;110:171–80.View ArticleGoogle Scholar
- Tran RK, Zilberman D, de Bustos C, Ditt RF, Henikoff JG, Lindroth AM, Delrow J, Boyle T, Kwong S, Bryson TD, et al. Chromatin and siRNA pathways cooperate to maintain DNA methylation of small transposable elements in Arabidopsis. Genome Biol. 2005;6:R90.PubMedPubMed CentralView ArticleGoogle Scholar
- Zheng X, Zhu J, Kapoor A, Zhu JK. Role of Arabidopsis AGO6 in siRNA accumulation, DNA methylation and transcriptional gene silencing. EMBO J. 2007;26:1691–701.PubMedPubMed CentralView ArticleGoogle Scholar
- Havecker ER, Wallbridge LM, Hardcastle TJ, Bush MS, Kelly KA, Dunn RM, Schwach F, Doonan JH, Baulcombe DC. The Arabidopsis RNA-directed DNA methylation argonautes functionally diverge based on their expression and interaction with target loci. Plant Cell. 2010;22:321–34.PubMedPubMed CentralView ArticleGoogle Scholar
- Ha M, Lu J, Tian L, Ramachandran V, Kasschau KD, Chapman EJ, Carrington JC, Chen X, Wang XJ, Chen ZJ. Small RNAs serve as a genetic buffer against genomic shock in Arabidopsis interspecific hybrids and allopolyploids. Proc Natl Acad Sci U S A. 2009;106:17835–40.PubMedPubMed CentralView ArticleGoogle Scholar
- Barber WT, Zhang W, Win H, Varala KK, Dorweiler JE, Hudson ME, Moose SP. Repeat associated small RNAs vary among parents and following hybridization in maize. Proc Natl Acad Sci U S A. 2012;109:10444–9.PubMedPubMed CentralView ArticleGoogle Scholar
- Groszmann M, Greaves IK, Albertyn ZI, Scofield GN, Peacock WJ, Dennis ES. Changes in 24-nt siRNA levels in Arabidopsis hybrids suggest an epigenetic contribution to hybrid vigor. Proc Natl Acad Sci U S A. 2011;108:2617–22.PubMedPubMed CentralView ArticleGoogle Scholar
- Madlung A, Masuelli RW, Watson B, Reynolds SH, Davison J, Comai L. Remodeling of DNA methylation and phenotypic and transcriptional changes in synthetic Arabidopsis allotetraploids. Plant Physiol. 2002;129:733–46.PubMedPubMed CentralView ArticleGoogle Scholar
- He G, Chen B, Wang X, Li X, Li J, He H, Yang M, Lu L, Qi Y, Wang X, Deng XW. Conservation and divergence of transcriptomic and epigenomic variation in maize hybrids. Genome Biol. 2013;14:R57.PubMedPubMed CentralView ArticleGoogle Scholar
- Chodavarapu RK, Feng S, Ding B, Simon SA, Lopez D, Jia Y, Wang GL, Meyers BC, Jacobsen SE, Pellegrini M. Transcriptome and methylome interactions in rice hybrids. Proc Natl Acad Sci U S A. 2012;109:12040–5.PubMedPubMed CentralView ArticleGoogle Scholar
- Rigal M, Becker C, Pelissier T, Pogorelcnik R, Devos J, Ikeda Y, Weigel D, Mathieu O. Epigenome confrontation triggers immediate reprogramming of DNA methylation and transposon silencing in Arabidopsis thaliana F1 epihybrids. Proc Natl Acad Sci U S A. 2016;113:E2083–92.PubMedPubMed CentralView ArticleGoogle Scholar
- Ng DW, Miller M, Yu HH, Huang TY, Kim ED, Lu J, Xie Q, McClung CR, Chen ZJ. A role for CHH methylation in the parent-of-origin effect on altered circadian rhythms and biomass heterosis in Arabidopsis intraspecific hybrids. Plant Cell. 2014;26:2430–40.PubMedPubMed CentralView ArticleGoogle Scholar
- Ponting CP, Oliver PL, Reik W. Evolution and functions of long noncoding RNAs. Cell. 2009;136:629–41.PubMedView ArticlePubMed CentralGoogle Scholar
- Iyer MK, Niknafs YS, Malik R, Singhal U, Sahu A, Hosono Y, Barrette TR, Prensner JR, Evans JR, Zhao S, et al. The landscape of long noncoding RNAs in the human transcriptome. Nat Genet. 2015;47:199–208.PubMedPubMed CentralView ArticleGoogle Scholar
- Pauli A, Valen E, Lin MF, Garber M, Vastenhouw NL, Levin JZ, Fan L, Sandelin A, Rinn JL, Regev A, Schier AF. Systematic identification of long noncoding RNAs expressed during zebrafish embryogenesis. Genome Res. 2012;22:577–91.PubMedPubMed CentralView ArticleGoogle Scholar
- Zhang YC, Liao JY, Li ZY, Yu Y, Zhang JP, Li QF, Qu LH, Shu WS, Chen YQ. Genome-wide screening and functional analysis identify a large number of long noncoding RNAs involved in the sexual reproduction of rice. Genome Biol. 2014;15:512.PubMedPubMed CentralView ArticleGoogle Scholar
- Li L, Eichten SR, Shimizu R, Petsch K, Yeh CT, Wu W, Chettoor AM, Givan SA, Cole RA, Fowler JE, et al. Genome-wide discovery and characterization of maize long non-coding RNAs. Genome Biol. 2014;15:R40.PubMedPubMed CentralView ArticleGoogle Scholar
- Liu J, Jung C, Xu J, Wang H, Deng S, Bernad L, Arenas-Huertero C, Chua NH. Genome-wide analysis uncovers regulation of long intergenic noncoding RNAs in Arabidopsis. Plant Cell. 2012;24:4333–45.PubMedPubMed CentralView ArticleGoogle Scholar
- Duret L, Chureau C, Samain S, Weissenbach J, Avner P. The Xist RNA gene evolved in eutherians by pseudogenization of a protein-coding gene. Science. 2006;312:1653–5.PubMedView ArticlePubMed CentralGoogle Scholar
- Prensner JR, Iyer MK, Balbin OA, Dhanasekaran SM, Cao Q, Brenner JC, Laxman B, Asangani IA, Grasso CS, Kominsky HD, et al. Transcriptome sequencing across a prostate cancer cohort identifies PCAT-1, an unannotated lincRNA implicated in disease progression. Nat Biotechnol. 2011;29:742–9.PubMedPubMed CentralView ArticleGoogle Scholar
- Wang ZH, Yang B, Zhang M, Guo WW, Wu ZY, Wang Y, Jia L, Li S, Xie W, Yang D, Network CGAR. lncRNA epigenetic landscape analysis identifies EPIC1 as an oncogenic lncRNA that interacts with MYC and promotes cell-cycle progression in cancer. Cancer Cell. 2018;33:706–720.e9.PubMedView ArticlePubMed CentralGoogle Scholar
- Franco-Zorrilla JM, Valli A, Todesco M, Mateos I, Puga MI, Rubio-Somoza I, Leyva A, Weigel D, Garcia JA, Paz-Ares J. Target mimicry provides a new mechanism for regulation of microRNA activity. Nat Genet. 2007;39:1033–7.PubMedView ArticleGoogle Scholar
- Wang Y, Fan X, Lin F, He G, Terzaghi W, Zhu D, Deng XW. Arabidopsis noncoding RNA mediates control of photomorphogenesis by red light. Proc Natl Acad Sci U S A. 2014;111:10359–64.PubMedPubMed CentralView ArticleGoogle Scholar
- Ding J, Lu Q, Ouyang Y, Mao H, Zhang P, Yao J, Xu C, Li X, Xiao J, Zhang Q. A long noncoding RNA regulates photoperiod-sensitive male sterility, an essential component of hybrid rice. Proc Natl Acad Sci U S A. 2012;109:2654–9.PubMedPubMed CentralView ArticleGoogle Scholar
- Nadal-Ribelles M, Sole C, Xu Z, Steinmetz LM, de Nadal E, Posas F. Control of Cdc28 CDK1 by a stress-induced lncRNA. Mol Cell. 2014;53:549–61.PubMedPubMed CentralView ArticleGoogle Scholar
- Mach J. The long-noncoding RNA ELENA1 functions in plant immunity. Plant Cell. 2017;29:916.PubMedPubMed CentralView ArticleGoogle Scholar
- Yatusevich R, Fedak H, Ciesielski A, Krzyczmonik K, Kulik A, Dobrowolska G, Swiezewski S. Antisense transcription represses Arabidopsis seed dormancy QTL DOG1 to regulate drought tolerance. EMBO Rep. 2017;18:2186–96.PubMedView ArticlePubMed CentralGoogle Scholar
- Kutter C, Watt S, Stefflova K, Wilson MD, Goncalves A, Ponting CP, Odom DT, Marques AC. Rapid turnover of long noncoding RNAs and the evolution of gene expression. PLoS Genet. 2012;8:e1002841.PubMedPubMed CentralView ArticleGoogle Scholar
- Ulitsky I. Evolution to the rescue: using comparative genomics to understand long non-coding RNAs. Nat Rev Genet. 2016;17:601–14.PubMedView ArticlePubMed CentralGoogle Scholar
- Washietl S, Kellis M, Garber M. Evolutionary dynamics and tissue specificity of human long noncoding RNAs in six mammals. Genome Res. 2014;24:616–28.PubMedPubMed CentralView ArticleGoogle Scholar
- Ulitsky I, Shkumatava A, Jan CH, Sive H, Bartel DP. Conserved function of lincRNAs in vertebrate embryonic development despite rapid sequence evolution. Cell. 2011;147:1537–50.PubMedPubMed CentralView ArticleGoogle Scholar
- Hezroni H, Koppstein D, Schwartz MG, Avrutin A, Bartel DP, Ulitsky I. Principles of long noncoding RNA evolution derived from direct comparison of transcriptomes in 17 species. Cell Rep. 2015;11:1110–22.PubMedPubMed CentralView ArticleGoogle Scholar
- Wang X, Ai G, Zhang C, Cui L, Wang J, Li H, Zhang J, Ye Z. Expression and diversification analysis reveals transposable elements play important roles in the origin of Lycopersicon-specific lncRNAs in tomato. New Phytol. 2016;209:1442–55.PubMedView ArticlePubMed CentralGoogle Scholar
- Davis MP, Carrieri C, Saini HK, van Dongen S, Leonardi T, Bussotti G, Monahan JM, Auchynnikava T, Bitetti A, Rappsilber J, et al. Transposon-driven transcription is a conserved feature of vertebrate spermatogenesis and transcript evolution. EMBO Rep. 2017;18:1231–47.PubMedPubMed CentralView ArticleGoogle Scholar
- Kelley D, Rinn J. Transposable elements reveal a stem cell-specific class of long noncoding RNAs. Genome Biol. 2012;13:R107.PubMedPubMed CentralView ArticleGoogle Scholar
- Kapusta A, Kronenberg Z, Lynch VJ, Zhuo X, Ramsay L, Bourque G, Yandell M, Feschotte C. Transposable elements are major contributors to the origin, diversification, and regulation of vertebrate long noncoding RNAs. PLoS Genet. 2013;9:e1003470.PubMedPubMed CentralView ArticleGoogle Scholar
- Elisaphenko EA, Kolesnikov NN, Shevchenko AI, Rogozin IB, Nesterova TB, Brockdorff N, Zakian SM. A dual origin of the Xist gene from a protein-coding gene and a set of transposable elements. PLoS One. 2008;3:e2521.PubMedPubMed CentralView ArticleGoogle Scholar
- Wan Q, Guan X, Yang N, Wu H, Pan M, Liu B, Fang L, Yang S, Hu Y, Ye W, et al. Small interfering RNAs from bidirectional transcripts of GhMML3_A12 regulate cotton fiber development. New Phytol. 2016;210:1298–310.PubMedView ArticlePubMed CentralGoogle Scholar
- Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, Pasternak S, Liang C, Zhang J, Fulton L, Graves TA, et al. The B73 maize genome: complexity, diversity, and dynamics. Science. 2009;326:1112–5.PubMedPubMed CentralView ArticleGoogle Scholar
- Zhang T, Hu Y, Jiang W, Fang L, Guan X, Chen J, Zhang J, Saski CA, Scheffler BE, Stelly DM, et al. Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a resource for fiber improvement. Nat Biotechnol. 2015;33:531–7. https://doi.org/10.1038/nbt.3207.View ArticlePubMedPubMed CentralGoogle Scholar
- Paterson AH, Wendel JF, Gundlach H, Guo H, Jenkins J, Jin D, Llewellyn D, Showmaker KC, Shu S, Udall J, et al. Repeated polyploidization of Gossypium genomes and the evolution of spinnable cotton fibres. Nature. 2012;492:423–7.PubMedView ArticlePubMed CentralGoogle Scholar
- Li F, Fan G, Wang K, Sun F, Yuan Y, Song G, Li Q, Ma Z, Lu C, Zou C, et al. Genome sequence of the cultivated cotton Gossypium arboreum. Nat Genet. 2014;46:567–72.PubMedView ArticlePubMed CentralGoogle Scholar
- Wicker T, Sabot F, Hua-Van A, Bennetzen JL, Capy P, Chalhoub B, Flavell A, Leroy P, Morgante M, Panaud O, et al. A unified classification system for eukaryotic transposable elements. Nat Rev Genet. 2007;8:973–82.PubMedPubMed CentralView ArticleGoogle Scholar
- Wendel JF, Cronn RC. Polyploidy and the evolutionary history of cotton. Adv Agron. 2003;78:139–86.View ArticleGoogle Scholar
- Zhao T, Wang L, Li S, Xu M, Guan X, Zhou B. Characterization of conserved circular RNA in polyploid Gossypium species and their ancestors. FEBS Lett. 2017;591:3660–9.PubMedView ArticlePubMed CentralGoogle Scholar
- Wang M, Yuan D, Tu L, Gao W, He Y, Hu H, Wang P, Liu N, Lindsey K, Zhang X. Long noncoding RNAs and their proposed functions in fibre development of cotton (Gossypium spp.). New Phytol. 2015;207:1181–97.PubMedView ArticleGoogle Scholar
- Kumar S, Stecher G, Suleski M, Hedges SB. TimeTree: a resource for timelines, Timetrees, and divergence times. Mol Biol Evol. 2017;34:1812–9.PubMedView ArticleGoogle Scholar
- Yoo MJ, Wendel JF. Comparative evolutionary and developmental dynamics of the cotton (Gossypium hirsutum) fiber transcriptome. PLoS Genet. 2014;10:e1004073.PubMedPubMed CentralView ArticleGoogle Scholar
- Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H, Salzberg SL, Rinn JL, Pachter L. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc. 2012;7:562–78.PubMedPubMed CentralView ArticleGoogle Scholar
- Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–2.PubMedPubMed CentralView ArticleGoogle Scholar
- Martienssen RA, Colot V. DNA methylation and epigenetic inheritance in plants and filamentous fungi. Science. 2001;293:1070–4.PubMedView ArticlePubMed CentralGoogle Scholar
- Bird A. DNA methylation patterns and epigenetic memory. Genes Dev. 2002;16:6–21.PubMedView ArticleGoogle Scholar
- Chan SW, Henderson IR, Jacobsen SE. Gardening the genome: DNA methylation in Arabidopsis thaliana. Nat Rev Genet. 2005;6:351–60.PubMedView ArticleGoogle Scholar
- Law JA, Jacobsen SE. Establishing, maintaining and modifying DNA methylation patterns in plants and animals. Nat Rev Genet. 2010;11:204–20.PubMedPubMed CentralView ArticleGoogle Scholar
- Lu J, Zhang C, Baulcombe DC, Chen ZJ: Maternal siRNAs as regulators of parental genome imbalance and gene expression in endosperm of Arabidopsis seeds. Proc Natl Acad Sci U S A. 2012;109:5529–34.PubMedPubMed CentralView ArticleGoogle Scholar
- Song Q, Zhang T, Stelly DM, Chen ZJ. Epigenomic and functional analyses reveal roles of epialleles in the loss of photoperiod sensitivity during domestication of allotetraploid cottons. Genome Biol. 2017;18:99. https://doi.org/10.1186/s13059-017-1229-8; DMRs between F1 hybrid and the parents (Gr/Ga). https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1229-8.
- Kim KD, El Baidouri M, Abernathy B, Iwata-Otsubo A, Chavarro C, Gonzales M, Libault M, Grimwood J, Jackson SA. A comparative epigenomic analysis of polyploidy-derived genes in soybean and common bean. Plant Physiol. 2015;168:1433–47.PubMedPubMed CentralView ArticleGoogle Scholar
- Shen H, He H, Li J, Chen W, Wang X, Guo L, Peng Z, He G, Zhong S, Qi Y, et al. Genome-wide analysis of DNA methylation and gene expression changes in two Arabidopsis ecotypes and their reciprocal hybrids. Plant Cell. 2012;24:875–92.PubMedPubMed CentralView ArticleGoogle Scholar
- Wang M, Tu L, Lin M, Lin Z, Wang P, Yang Q, Ye Z, Shen C, Li J, Zhang L, et al. Asymmetric subgenome selection and cis-regulatory divergence during cotton domestication. Nat Genet. 2017;49:579–87. https://doi.org/10.1038/ng.3807.View ArticlePubMedPubMed CentralGoogle Scholar
- Wang H, Niu QW, Wu HW, Liu J, Ye J, Yu N, Chua NH. Analysis of non-coding transcriptome in rice and maize uncovers roles of conserved lncRNAs associated with agriculture traits. Plant J. 2015;84:404–16.PubMedView ArticlePubMed CentralGoogle Scholar
- Brown CJ, Hendrich BD, Rupert JL, Lafreniere RG, Xing Y, Lawrence J, Willard HF. The human XIST gene - analysis of a 17 kb inactive X-specific RNA that contains conserved repeats and is highly localized within the nucleus. Cell. 1992;71:527–42.PubMedView ArticlePubMed CentralGoogle Scholar
- Li Y, Li C, Xia J, Jin Y. Domestication of transposable elements into MicroRNA genes in plants. PLoS One. 2011;6:e19212.PubMedPubMed CentralView ArticleGoogle Scholar
- Zhou ZK, Wang Z, Li WY, Fang C, Shen YT, Li CC, Wu YS, Tian ZX. Comprehensive analyses of microRNA gene evolution in paleopolyploid soybean genome. Plant J. 2013;76:332–44.PubMedPubMed CentralGoogle Scholar
- Cowley M, Oakey RJ. Transposable elements re-wire and fine-tune the transcriptome. PLoS Genet. 2013;9:e1003234.PubMedPubMed CentralView ArticleGoogle Scholar
- Yang L, Lin C, Liu W, Zhang J, Ohgi KA, Grinstein JD, Dorrestein PC, Rosenfeld MG. ncRNA- and Pc2 methylation-dependent gene relocation between nuclear structures mediates gene activation programs. Cell. 2011;147:773–88.PubMedPubMed CentralView ArticleGoogle Scholar
- Loewer S, Cabili MN, Guttman M, Loh YH, Thomas K, Park IH, Garber M, Curran M, Onder T, Agarwal S, et al. Large intergenic non-coding RNA-RoR modulates reprogramming of human induced pluripotent stem cells. Nat Genet. 2010;42:1113–7.PubMedPubMed CentralView ArticleGoogle Scholar
- Cartault F, Munier P, Benko E, Desguerre I, Hanein S, Boddaert N, Bandiera S, Vellayoudom J, Krejbich-Trotot P, Bintner M, et al. Mutation in a primate-conserved retrotransposon reveals a noncoding RNA as a mediator of infantile encephalopathy. Proc Natl Acad Sci U S A. 2012;109:4980–5.PubMedPubMed CentralView ArticleGoogle Scholar
- Guttman M, Garber M, Levin JZ, Donaghey J, Robinson J, Adiconis X, Fan L, Koziol MJ, Gnirke A, Nusbaum C, et al. Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nat Biotechnol. 2010;28:503–10.PubMedPubMed CentralView ArticleGoogle Scholar
- Kong L, Zhang Y, Ye ZQ, Liu XQ, Zhao SQ, Wei L, Gao G. CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res. 2007;35:W345–9.PubMedPubMed CentralView ArticleGoogle Scholar
- Finn RD, Clements J, Eddy SR. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 2011;39:W29–37.PubMedPubMed CentralView ArticleGoogle Scholar
- Nussbaumer T, Martis MM, Roessner SK, Pfeifer M, Bader KC, Sharma S, Gundlach H, Spannagl M. MIPS PlantsDB: a database framework for comparative plant genome research. Nucleic Acids Res. 2013;41:D1144–51.PubMedView ArticlePubMed CentralGoogle Scholar
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–402.PubMedPubMed CentralView ArticleGoogle Scholar
- Wang Y, Tang H, Debarry JD, Tan X, Li J, Wang X, Lee TH, Jin H, Marler B, Guo H, et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 2012;40:e49.PubMedPubMed CentralView ArticleGoogle Scholar
- Anders S, Pyl PT, Huber W. HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015;31:166–9.View ArticleGoogle Scholar
- Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26:139–40.PubMedView ArticlePubMed CentralGoogle Scholar
- Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25.PubMedPubMed CentralView ArticleGoogle Scholar
- Griffiths-Jones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ. miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res. 2006;34:D140–4.PubMedView ArticlePubMed CentralGoogle Scholar
- Wen M, Shen Y, Shi S, Tang T. miREvo: an integrative microRNA evolutionary analysis platform for next-generation sequencing experiments. BMC Bioinformatics. 2012;13:140.PubMedPubMed CentralView ArticleGoogle Scholar
- Friedländer MR, Mackowiak SD, Li N, Chen W, Rajewsky N. miRDeep2 accurately identifies known and hundreds of novel microRNA genes in seven animal clades. Nucleic Acids Res. 2012;40:37–52.PubMedView ArticlePubMed CentralGoogle Scholar
- Ramirez F, Dundar F, Diehl S, Gruning BA, Manke T. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res. 2014;42:W187–91.PubMedPubMed CentralView ArticleGoogle Scholar
- Gao X, Wheeler T, Li Z, Kenerley CM, He P, Shan L. Silencing GhNDR1 and GhMKK2 compromises cotton resistance to Verticillium wilt. Plant J. 2011;66:293–305.PubMedPubMed CentralView ArticleGoogle Scholar
- Haring M, Offermann S, Danker T, Horst I, Peterhansel C, Stam M. Chromatin immunoprecipitation: optimization, quantitative analysis and data normalization. Plant Methods. 2007;3:11.PubMedPubMed CentralView ArticleGoogle Scholar
- Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W, Liu XS. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9:R137.PubMedPubMed CentralView ArticleGoogle Scholar
- Zhao T, Tao XY, Feng SL, Wang LY, Hong H, Ma W, Shang GD, Guo SS, He YY, Zhou BL, Guan XY. The lncRNA sequences and genome coordinate files. LncRNA-in-polyploid-cotton. http://github.com/epi-cotton/LncRNA-in-polyploid-cotton. Accessed 9 Oct 2018.
- Zhao T, Tao XY, Feng SL, Wang LY, Hong H, Ma W, Shang GD, Guo SS, He YY, Zhou BL, Guan XY. Gossypium arboreum, Gossypium raimondii, Gossypium hirsutum and Gossypium arboreum x Gossypium raimondii F1 Raw sequence reads. Sequence Read Archive. 2018. https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA373801. Accessed 9 Oct 2018.
- Zhao T, Tao XY, Feng SL, Wang LY, Hong H, Ma W, Shang GD, Guo SS, He YY, Zhou BL, Guan XY. Small RNA sequencing of Gossypium arboreum x Gossypium raimondii F1. Sequence Read Archive. 2018. https://www.ncbi.nlm.nih.gov/bioproject/PRJNA375828/. Accessed 9 Oct 2018.