Transcript and protein expression decoupling reveals RNA binding proteins and miRNAs as potential modulators of human aging

Background In studies of development and aging, the expression of many genes has been shown to undergo drastic changes at mRNA and protein levels. The connection between mRNA and protein expression level changes, as well as the role of posttranscriptional regulation in controlling expression level changes in postnatal development and aging, remains largely unexplored. Results Here, we survey mRNA and protein expression changes in the prefrontal cortex of humans and rhesus macaques over developmental and aging intervals of both species’ lifespans. We find substantial decoupling of mRNA and protein expression levels in aging, but not in development. Genes showing increased mRNA/protein disparity in primate brain aging form expression patterns conserved between humans and macaques and are enriched in specific functions involving mammalian target of rapamycin (mTOR) signaling, mitochondrial function and neurodegeneration. Mechanistically, aging-dependent mRNA/protein expression decoupling could be linked to a specific set of RNA binding proteins and, to a lesser extent, to specific microRNAs. Conclusions Increased decoupling of mRNA and protein expression profiles observed in human and macaque brain aging results in specific co-expression profiles composed of genes with shared functions and shared regulatory signals linked to specific posttranscriptional regulators. Genes targeted and predicted to be targeted by the aging-dependent posttranscriptional regulation are associated with biological processes known to play important roles in aging and lifespan extension. These results indicate the potential importance of posttranscriptional regulation in modulating aging-dependent changes in humans and other species. Electronic supplementary material The online version of this article (doi:10.1186/s13059-015-0608-2) contains supplementary material, which is available to authorized users.


Supplemental Figures
. Effect of age and ethnicity on expression of human datasets.
Cumulative frequency curves showing the proportions of mRNA and protein expression variance explained by age (solid lines) and ethnicity (dashed lines) across genes expressed in the human mRNA (green) and protein (blue) datasets. The proportions were calculated using polynomial regression models. The grey dotted lines show median proportions of the variance explained by the factors: 3% of mRNA and 6% of protein expression variation -by ethnicity; and 64% of mRNA and 61% of protein expression variation -by age.  Distribution of Spearman's rank correlation coefficients between human and macaque time-series data, calculated for 1,609 orthologous genes of both mRNAs (B) and proteins (D), out of 1,963 genes showing significant expression changes with age in humans and reliably detected in macaques. Shaded grey curves show the chance distribution generated by 1,000 permutations of ages. The median values of both simulations of mRNA and protein are 0.0 (shown by a grey dashed line), and are significantly less positively correlated compared to real data (shaded colored part; p < 0.001, Wilcoxon test, FDR < 0.001, permutation). (A) Bar plot shows the numbers of significantly positively/negatively correlated genes identified in the human PFC time-series data in development (light color) and aging (dark color) intervals. The y-axis on the right shows the ratio of gene numbers compared to the total amount of 1,963. Pvalues of chi-square tests are marked for both comparisons.
(B-C) The chance distribution of cumulative frequency of Spearman's rank correlation coefficients based on mRNA (B) and protein (C) expression in developmental (light grey curves) and aging (dark grey curves) intervals, of human time-series data. Distribution of curves were determined by subsampling 1,000 times using an equalized distribution of coefficient of variation between development and aging intervals. The dashed lines show median values of the curve distributions, the p-values show significance of the difference between medians estimated by Wilcoxon test.
(D-E) The chance distribution of cumulative frequency of Spearman's rank correlation coefficients based on mRNA and protein expression changes in developmental (light grey curves) and aging (dark grey curves) intervals, of human (D) and macaque (E) time-series data. Distribution of curves were made by 1,000 times of subsampling using equalized distribution between development and aging intervals, of mRNA expression, protein expression, mRNA expression change amplitudes, and protein expression change amplitudes. The dashed lines show median values of the curve distributions, the p-values show significance of the difference between medians estimated by Wilcoxon test.
(F-G) The cumulative frequency of Spearman's rank correlation coefficients based on mRNA and protein expression changes in developmental (light grey curves) and aging (dark grey curves) intervals respectively. The dashed lines show median values of the curves, the p-values show significance of the difference between medians estimated by Wilcoxon tests. The histograms at the bottom of the panels show distributions of Spearman's rank correlation coefficients in developmental (light grey) and aging (dark grey) intervals composing the curves. The results are shown for human mRNA with macaque protein (F), and vice versa (G). The y-axis shows numbers of genes used in each comparison. Two-dimensional density plot showing distribution of mRNA-protein Spearman's rank correlation coefficients measured during developmental (x-axis) and aging (y-axis) intervals of macaque timeseries data (A), of another human RNA-seq time-series dataset (B), of human mRNA with macaque protein (D) and vice versa (F). The grey dashed lines show the correlation coefficient cut-offs used to define concordant and discordant gene groups (p < 0.05, Spearman's rank correlation; FDR < 0.05, permutations).
The overlap of concordant and discordant gene groups between human time-series and another human RNA-seq time-series dataset (C), of human mRNA with macaque protein (E) and vice versa (G). The arrows show numbers of overlap found in actual data; the distributions show chance overlap estimated by 1,000 permutations of gene labels. The dashed line indicates 95% quantile of the distribution. Figure S6. Relationship of age-dependent mRNA and protein level changes measured in the same set of 10 human tissue samples.
(A) The cumulative frequency of Spearman's rank correlation coefficients based on mRNA and protein expression changes in developmental (light grey curves) and aging (dark grey curves) intervals respectively. The dashed lines show median values of the curves, the p-values show significance of the difference between medians estimated by Wilcoxon test. The histograms at the bottom of the panels show distributions of Spearman's rank correlation coefficients in developmental (light grey) and aging (dark grey) intervals composing the curves. The y-axis shows numbers of genes used in each comparison.
(B) Two-dimensional density plot showing distribution of mRNA-protein Spearman's rank correlation coefficients measured during developmental (x-axis) and aging (y-axis) intervals in human PFC, including only ten samples matched between mRNA dataset 1 and the protein dataset. The grey dashed lines show the correlation coefficient cut-offs used to define concordant and discordant gene groups (p < 0.05, Spearman's rank correlation; FDR < 0.05, permutations).
(C) The overlap of concordant and discordant gene groups between human full list and human time-series including only ten samples matched between mRNA dataset 1 and protein dataset. The arrows show numbers of overlap found in actual data; the distributions show chance overlap estimated by 1,000 permutations of gene labels. The dashed line indicates 95% quantile of the distribution.   Enrichment of RBP binding sites within concordant genes in each of the four main expression patterns, compared to discordant genes from the same pattern. The x-axis shows the enrichment fold-change based on the binding site number, the y-axis shows significance of binding site density difference. Each circle represents one RBP. The circle radius shows the proportion of concordant genes targeted by the RBP within the group. Colors indicate RBPs showing significant enrichment (red), no difference (grey) or significant depletion (blue) of binding sites within concordant genes compared to discordant ones.  (A-B) Difference between the distribution of predicted miRNA-target correlations and the chance background. The curves show predicted miRNA-target distribution of Spearman's rank correlation coefficients measured based on expression profiles of age-dependent miRNA and protein expression of their predicted target transcripts in aging interval. miRNA target prediction is made by mirDB3.0 (A) and COMETA (B) [1, 2]. The background chance distribution was estimated by generating the same number of predicted miRNA-target pairs based on randomly chosen agedependent miRNA and target genes within discordant groups of each pattern for 1,000 times. The shaded areas show the 95% confidence intervals of the correlation coefficients' chance distributions. The median of background chance distribution was subtracted from both the predicted miRNA-target and the chance background distributions.  The curves show the mRNA (grey) and protein (colored) expression of the six discordant genes in the PI3K-Akt_mTORC1 signaling cascade, calculated using cubic spline regression. The symbols show the expression in each individual. The y-axis shows mRNA and protein expression, normalized to the mean and standard deviation of corresponding expression levels in developmental interval. The vertical dashed line marks separation of developmental and aging intervals.

Ethics Statement
Informed consent for the use of human tissues for research was obtained in writing from all donors or their next of kin. All macaques used in this study suffered sudden deaths for reasons other than their participation in this study and without any relation to the tissue used. The Biomedical Research Ethics Committee of Shanghai Institutes for Biological Sciences completed the review of the use and care of animals in this research project (approval ID: ER-SIBS-260802P).

Sample collection
Healthy human tissues were obtained from the NICHD Brain and Tissue Bank for Developmental Disorders at the University of Maryland, USA and the Netherlands Brain Bank. Macaque tissues were obtained from the Suzhou Experimental Animal Centre in China.
The role of the NICHD Brain and Tissue Bank and the Netherlands Brain Bank are to distribute tissue and, therefore, cannot endorse the studies performed or the interpretation of results. All human subjects were defined as normal controls by forensic pathologists at the NICHD Brain and Tissue Bank and the Netherlands Brain Bank. No subjects who suffered a prolonged agonal state were included.
PFC dissections were made from the frontal part of the superior frontal gyrus, a cortical region approximately corresponding to Brodmann area 9. All samples contained an ~2:1 grey matter to white matter volume ratio.

Human-macaque consensus reference genome construction
Chained and netted alignment files for human (hg19) and macaque (rheMac3), aligned using BLASTZ [12], were downloaded from the UCSC Genome Browser. On the basis of these alignments, we constructed a human-macaque consensus genome. Specifically, we used the human genome as a reference for all discordant genomic sites, and replaced insertions and deletions with "N"s in the consensus genome. Furthermore, 6 bp regions flanking each insertion or deletion site were masked by "N"s in the resulting human-macaque consensus reference genome.

Computational pre-processing of mRNA deep sequencing
The deep sequencing data of human and rhesus superior frontal gyrus of the prefrontal cortex (PFC) were obtained from the Sequence Reads Archive (http://www.ncbi.nlm.nih.gov/sra) under the accessions SRP005169 for [13] (Table S1-S3).
To quantify and compare gene expression in the two species in an unbiased manner, we mapped human and rhesus RNA-seq reads to the human-macaque consensus reference genome, using STAR (v. 2.3.0e; [14]) with parameters allowing a maximum of three mismatches and a minimum 90% percentage of non-'N' matches. Potential PCR duplicates were removed by samtools rmdup (v. 0.1.19; [15]). For each mapping procedure, only uniquely mapped reads were used in further analyses.
Gene expression was quantified as the number of reads per kilo-base per million of total mapped reads (RPKM) by GENCODE annotation (v.17; http://www.gencodegenes.org) [16]. The number of reads that map to each gene was counted by Python package HTSeq (v. 0.5.4p2; http://wwwhuber.embl.de/users/anders/HTSeq/), adding the only parameter of whether the data was from a strand-specific assay. The gene length was defined as the union of all its exons.
Only genes with RPKM ≥ 1 in more than 2/3 of the samples in one species (human or macaque) were classified as reliably expressed in that species and were used in subsequent analyses.

Protein sample preparation and Label-Free 2D-MS/MS Thermo-LTQ Proteomics Methods
For extraction, we used 100mg frozen prefrontal cortex samples from 12 human and 12 macaque individuals (Table S5-S6). The 12 human and 12 macaque samples were processed in two batches with 6 individuals in each, with similar age distributions in both batches. Tissue samples were minced, washed in ice-cold PBS and homogenized in ice-cold lysis buffer (8 M urea, 4% CHAPS, 65 mM DTT, 40 mM Tris, cocktail protease inhibitor, 100 mg of tissue/1 ml) using an electric homogenizer. The resulting protein solutions were sonicated on ice for a total of 3 minutes and then centrifuged at 25,000g for 1 hour at 4°C to remove DNA, RNA and other cell debris. Next, the protein supernatants were precipitated using 5× volumes of precipitation solution (ethanol: acetone: acetic acid = 50:50:0.1, volume ratio) at 4°C overnight, centrifuged and re-suspended in denaturing buffer [6 M guanidine hydrochloride, 100mM Tris, cocktail protease inhibitor, phosphatase inhibitors (1mM sodium orthovanadate and 1mM sodium fluoride), pH 8.3]. Protein concentration was determined using the Bradford assay. Next, 600μg of protein from each sample was reduced with DTT (100μg / 1μl 1M DTT), alkylated with IAA (100μg / 2μl 1M IAA), and precipitated again at 4°C overnight (as described above). After centrifugation, the resulting precipitates were re-suspended in digestion buffer (100mM ammonium bicarbonate) and incubated with Trypsin (enzyme:protein = 1:40, mass ratio) at 37°C for 20 hours, followed by ultrafiltration and lyophilization.
Each peptide sample was re-suspended in 50μl SCX loading buffer, loaded on a SCX (Strong Cation Exchange) column (Column Technology Inc., CA, USA) and eluted using a pH continuous gradient buffer (from pH 2.5-8.5), resulting in 10 fractions. Each of these fractions was then automatically loaded on one of two RP (Reversed Phase) alternative trap columns, by switching to the other RP column every 3 hours. Analysis was performed on the LTQ mass spectrometer equipped with a metal needle electrospray interface mass spectrometer (ThermoFinnigan, San Jose, CA, USA) in a data-dependent collection model (each full scan followed by ten MS/MS scans of most intense ions).

Computational pre-processing of quantitative proteomics
Peptides were identified by searching spectrums against the UniProt Knowledgebase (UniProKB) complete proteome human set [17], using the database search engine MS-GF+ [18] with default modification parameters. Each peptide had to be fully tryptic, and had a length of at least 6 amino acids. The parent mass tolerance was set to 2.5 Da. We estimated peptide-spectrum matches (PSM) level FDR of peptide identification by searching the combined database of target dataset and the reversed decoy database. Only peptides with FDR<1% were considered. For each database searching procedure, only spectrums matched to unique peptide were used in further analyses. Ten MS/MS scans of each sample were combined together.
Peptide data were mapped per gene to Ensembl genes by UniProtKB/Swiss-Prot and UniProtKB/TrEMBL annotations. Criteria for protein identification included detection of at least 2 unique peptides. Ambiguous peptides and redundant proteins were removed.
Quantitation of protein expression of each gene was achieved by the normalized spectral abundance factor (NSAF) [19] to allow the comparison of abundance of individual proteins in multiple independent samples. The count of MS/MS spectra assigned to a protein was normalized to the length of the protein (total number of amino acids), resulting in a Spectral Abundance Factor (SAF). Each SAF was further normalized against the sum of all SAFs in one sample, resulting in the NSAF value.
Only genes with mean NSAF≥1 and a positive NSAF value in at least 6 of the 12 individuals were classified as reliably detected in a species (human or macaque), and were included in the downstream analysis.

Age-scales for modeling expression change.
A log 2 -age scale is suited for modeling exponential changes during early development (e.g. [20]) while linear age allows better identification of linear changes that occur during aging [21,22]. To avoid bias toward either age stage, we used the power of 0.25 of donor age to simultaneously capture developmental and aging-dependent changes. Using the power of 0.25 of ages also ensures more uniform distribution of samples across age scale, compared to the other two scales. At the same time, it limits heteroscedasticity across ages, which pertains to the assumption of parametric regression models that errors be normally distributed and homogeneous among samples. We therefore used 0.25-power-based estimates when testing for age-effects. The choice of age scale does not qualitatively affect our results regarding mRNA-protein decoupling.

Computational pre-processing of age-dependent genes
We followed the steps of [23] to test the effect of age on mRNA expression level, using polynomial regression models. We used the power of 0.25 of donor age to simultaneously capture developmental and aging-dependent changes (as mentioned above). For each detected gene, we choose the best regression model with scaled age as predictor and mRNA expression level as response, using families of polynomial regression models and the "adjusted r 2 " criterion. The most complex of these models is defined in the formula: where Y ij is the expression level for gene i and subject j, A j is the age of the subject j, and ε ij is the error term. The sum of squared errors in this model are compared to the null model, Alternative models are the above models' subsets, including linear and quadratic ones.
The significance of the chosen regression model was estimated using the F-test, and Benjamini-Hochberg correction was carried out for all tested genes as multiple test correction.
We used the power of 0.25 of donor age (as mentioned above) and fitted spline curves to the expression data to describe change with age (degree of freedom=4).
We corrected for life-history differences between humans and macaques following the formula based on a number of resources [24, 25]: age human = 2.257 + 2.953*age macaque to transform macaque age into the same scale as human. We used the power of 0.25 of re-scaled macaque ages and fitted spline curves to the expression data to describe change with age (degree of freedom=4) in all downstream analysis.

Analysis of mRNA-protein disparity
To analyze the correlation between transcriptome and proteomics in the classical framework of human lifespan, we separated "development" interval (postnatal life interval prior to sexual maturity) and "aging" interval (life interval posterior to sexual maturity), with 14 years as borderline following the commonly-used age intervals for pre-and post-adulthood [21,26]. For both periods we used data from similar numbers of individuals. About 4 years was used as borderline for macaques based on the age formula mentioned above.
While our samples' ages are not uniform along the age-range, and the mRNA and protein datasets have different age windows, we interpolated 10 uniformed points along the age range of each period, at the 0.25-powered age scale (see also Figure S2).
Spearman's rank correlation coefficient rho was calculated for each gene for development and aging periods respectively, between mRNA and protein expression levels, based on interpolated points of spline curves.
We carried out a simulation 10,000 times, with a randomized subset of the original data under the same scale and distribution. We calculated ±0.552, as the 95% cut-off points of Spearman's rank correlation coefficient two-tailed distribution by simulation, which are used as cut-offs with FDR < 5% of significant correlation coefficient in the downstream analysis.

Equalization of variation, expression and amplitude
To preclude the effects led by different variation (standard deviation (sd) and coefficient of variation (cov)), by different expression levels during development and aging life-intervals, or by higher absolute expression changing rate within development, expression profiles were equalized by sd and cov, by average expression levels, as well as equalization of gene expression change amplitudes, between developmental and aging intervals.
We simulated the distribution of the common part of the expression/amplitude curves of developmental and aging intervals. Then, according to this distribution, we subsampled 1,000 times from age-dependent genes, for each species. In each subsampling, we compared the median of correlation distributions between developmental and aging intervals by Wilcoxon test.

Overlap of concordant/discordant gene groups between databases
We estimated the reproducibility of concordant and discordant gene groups resulted from original human time-series dataset, with gene groups resulted independently from (1) macaque time-series dataset, (2) a different human RNA-seq time-series dataset, (3) human mRNA and macaque protein expression, and (4) vice versa. We subsampled 1,000 times from all the age-dependent genes, of two datasets. In each subsampling, we selected genes from each dataset of the same number as concordant/discordant gene groups and checked the number of overlapped genes, and compared real overlap amount with simulated overlap distribution.

Clustering genes into groups
We grouped concordant and discordant genes into 4 patterns of mRNA expression using k-means clustering. Before clustering, each gene was standardized to mean=0 and standard deviation=1. Because k-means is a heuristic algorithm, we repeated the procedure 1,000 times to determine the most frequent constellation and used this clustering in downstream analysis.

Functional analysis
We used the Gene Ontology (GO) categories of biological process (BP), molecular function (MF) and cellular component (CC), together with Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway databases [27,28] for testing functional enrichment of gene groups. To identify overrepresentation, we used GeneCoDis3, a non-redundant and modular enrichment analysis tool for functional annotation of gene sets [29] to investigate the putative functions of concordant and discordant gene groups in each of four patterns. Validated detected age-dependent genes were taken as background, and Hypergeometric test p-values were adjusted for multiple testing by the Benjamini-Hochberg correction.

Extraction of exonic RBP binding sites
All data sets of CLIP experiments (including Hits-CLIP, Par-CLIP and iCLIP) used in the analysis were collected from published literatures. Precise positional target site information were extracted from the NCBI Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo), from the Sequence Reads Archive (http://www.ncbi.nlm.nih.gov/sra), from the EMBL-EBI ArrayExpress (http://www.ebi.ac.uk/arrayexpress/), or directly from the supplementary data of those literature without accession IDs (Table S10).
The precise positional RBP target sites extracted from the CLIP data sets were mapped to gene locations by Ensembl gene annotation. The versions of Ensembl annotation were selected corresponding to the versions of the human genome used by the original publications.
We followed the cut-off used in the original literature for each CLIP experiment. Only binding sites passing the cut-off were used in subsequent analyses. To focus on post-transcriptional regulations corresponding to the mature mRNA, we excluded the binding sites located in intronic regions or in intron/exon splicing junctions.

Key regulator RBP identification
To define a RBP as a key regulator of the disparity gene group of a pattern, we checked three conditions: (1) the regulator should target (has at least one validated binding site on the gene) a significantly higher percentage of discordant genes, compared to the concordant genes in the same pattern. The significance is indicated by p < 0.05 in a binomial test after Bonferroni correction between percentages of targeted genes in concordant and discordant gene groups. (2) The regulator should have significantly more binding sites on the exon union of discordant genes compared to the concordant genes in the same pattern. The binding site number enrichment is classified with p < 0.05 of one-sided Wilcoxon test after Bonferroni correction between distributions of binding site number on genes in concordant and discordant gene groups. (3) After correction by the length of exon union for each gene, the regulator should have significantly higher density of binding sites in discordant genes compared to the concordant genes in the same pattern. The significance is indicated by the p < 0.05 of one-sided Wilcoxon test after Bonferroni correction between distributions of binding site density of genes in concordant and discordant gene groups. A regulator is classified as a key regulator if it fulfills condition (1) and at least one of conditions (2) and (3).

Computational pre-processing of miRNA deep sequencing
The miRNA deep sequencing data of human superior frontal gyrus of the prefrontal cortex (PFC) were obtained from the NCBI Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo) under series accession no. GSE18069 for [30] (Table S1 and S5).
We followed the procedures of small RNA sequencing data pre-processing and miRNA expression quantification from [31]. In brief, for the small RNA sequencing data pre-processing procedure, all unique sequences were trimmed using the custom trimming procedure, to remove the adapter sequence at the 3'-end. Trimmed sequences (18-28-nt) were mapped to the human genome (hg19) by Bowtie algorithm [32], requiring a perfect match. For the miRNA expression quantification procedure, we downloaded human miRNA annotations, including the sequences and genomic locations of miRNA precursors and mature miRNA sequences, from miRBase version 17 (ref). All perfectly-mapped sequences were used for the following quantification procedure. First, all sequences mapping within three nucleotides upstream or downstream of the annotated 5'-position of the mature miRNAs were retained. Then, for each mature miRNA, the sequence with a maximal copy number was designated as the reference sequence. Finally, the expression level of each miRNA was calculated as a sum of the copy number of the reference sequence and the sequences mapping at the same 5'-end position as the reference sequence. The expression level of each miRNA was calculated as the Transcripts Per Million reads (TPM): the number of reads mapped to the transcript normalized by the number of total mapped reads.

miRNA binding site extraction from in silico prediction
Human miRNA target prediction was based on miRNA target site identified using Targetscan [33] and additional target site conservation criterion. The mature miRNA sequences were downloaded from miRBase (v17). The 3'UTR sequences of human, mouse, rat, dog and chicken were downloaded from Ensemble Genes (v64) using biomart. The 3'UTR sequence alignments of five species was conducted using MUSCLE [34] with default parameters, aside from requiring the "stable" parameter to keep the input sequence order. Only the alignments with less than 50% of the gap sequence in human, mouse, rat, dog and chicken were used for downstream analysis. The scripts from the TargetScan website (v5.2) were used to predict miRNA target sites across five species. Only target sites that were conserved in at least 3 out of 4 species (mouse, rat, dog and chicken) were classified as reliable conserved targets.
These results were compared to target predictions obtained by tools based on different prediction strategies in downstream analysis, with mirDB [1] (version 3.0) based on support vector machines (SVMs) trained by microarray experiment data, and CoMeTa [2] based on co-expression metaanalysis ranking pooled targets predicted by multiple algorithms.

Enriched targeting miRNA identification
To identify miRNA with enrichment of predicted targets in discordant gene groups in each of the four patterns, we compared miRNA binding site number with concordant genes in the same pattern. To avoid the influence of miRNA gene families, we restricted this comparison to a single test per unique miRNA seed. If a miRNA's predicted targets were enriched in a pattern at hypergeometric test p < 0.05 after Bonferroni correction, we considered that miRNA "specific" to that discordant gene group. All the other miRNAs were considered "non-specific" between the concordant and discordant gene groups in this pattern. These results were compared to chance distributions obtained using 1,000 permutations by randomized labeling of concordant/discordant genes within each pattern.

Functional miRNA regulation estimation
To estimate the negative effect of miRNAs on the translation of discordant genes in each pattern, we calculated the predicted miRNA-target distribution of the Spearman's rank correlation coefficients measured based on expression profiles of age-dependent miRNA and protein expression of their predicted target transcripts in aging interval.
We estimated the background chance distribution by generating the same number of predicted miRNA-target pairs based on randomly chosen age-dependent miRNA and target genes within the discordant group of each pattern for 1,000 times.
We then estimated the difference between predicted miRNA-target distribution and background chance distribution by subtraction of the median of background chance distribution. A discordant group of one pattern was considered negatively regulated by miRNAs on the translation level, if and only if significant excess of negative correlation (p < 0.05, FDR < 5%, Spearman's rank correlation) was observed. The significance is indicated by 95% confident intervals of chance difference obtained in background chance distribution.