Animal models
Six-week-old male HXB/BXH RI rats (n = 30; left ventricle and liver), congenic SHR.BN-D3Rat108/D3Rat56 rats (SHR.BN-(3L); n = 5; left ventricle), congenic SHR.BN-D3Rat108/D3Rat124 rats (SHR.BN-(3S); n = 5; left ventricle), transgenic SHR/Ola-Tg rats expressing Endog (n = 5; left ventricle), and wild type SHR/Ola rats carrying an Endog LoF mutation (n = 5; left ventricle) were housed, bred, and fed ad libitum with a natural diet (Altromin 1314) in an air-conditioned animal facility at the Czech Academy of Sciences, Prague, Czech Republic. The congenic rat lines were designed as follows (as described in [20]): For (SHR.BN-D3Rat108/D3Rat56 or “SHR.BN-(3L)”), a longer genomic BN fragment (Chr3 0–60 Mb) replaces the entire Chr. 3p teQTL in an otherwise fully SHR/Ola genetic background. For (SHR.BN-D3Rat108/D3Rat124 or “SHR.BN(3S)”), only a shorter fragment (Chr3 11.2–60 Mb) adjacent to, but not overlapping, the identified teQTL is replaced with a BN fragment. Transgenic SHR/Ola-Tg(CMV-Endog)136 strain (hereafter referred to as the SHR-Endog transgenic) was derived by microinjecting fertilized eggs with a mix of the Sleeping Beauty construct containing Endog cDNA of BN origin under control of the universal EF-1α promoter and mRNA of the SB100X transposase. Transgenic rats were detected using PCR with the following primers: Endog-F 5′-CGA CAC CTT CTA CCT GAG CA-3′ and Endog-R 5′-GGC CCT GTG CAG ACA TAA AC-3′. The rats were killed by cervical dislocation without prior anesthesia.
The Endog KO mouse was derived from a C57BL/6J background and provided by Dr. Michael Lieber, University of Southern California, LA, CA, USA [92]. From the provided founder animals, a colony was established and actively maintained for multiple years within the lab of Daniel Sanchis (Institut de Recerca Biomedica de Lleida, Spain) [20]. The investigation with the Endog KO line was approved by the Experimental Animal Ethic Committee of the University of Lleida and conforms to the Guide for the Care and Use of Laboratory Animals, 8th Edition, published in 2011 by the US National Institutes of Health. The 6-week-old male mice used for Ribo-seq and RNA-seq experiments were housed in Tecniplast GM500 cages (391 × 199 × 160 mm) never exceeding 5 adults / cage. Animals were anesthetized with a lethal dose of inhaled isofluorane and decapitated within the facility by expert staff.
Generating a genotype map of the HXB/BXH panel
To refine an existing [3, 30] genotype map of the HXB/BXH panel and convert this map to the latest rat genome assembly (rn6), we genotyped the 30 HXB/BXH recombinant inbred panel lines using a custom-designed Affymetrix RATDIV single-nucleotide polymorphism (SNP) Array at 805,399 variable genetic positions, as described previously [93]. In short, genotyping was performed according to the Affymetrix SNP chip 6.0 protocol using 250 ng (RNase A-treated) genomic DNA, isolated from rat liver tissue and digested with StyI and NspI, respectively. Genotypes were called and high-quality markers were selected from the 805,399 genotyped SNPs. For this, the original 25-mer Affymetrix probes were first remapped to the latest Ensembl rat genome build (Rnor 6.0) [94] using BLAST [95], requiring the wild type or variant probe to map uniquely within the entire rat genome (as described previously in [93]). We furthermore excluded (i) SNPs within 13 base pairs of an indel, (ii) missing or heterozygous variant calls, (iii) monomorphic markers, and (iv) SNPs with a call rate lower than 0.99. The resulting genotype calls could be collapsed into 2957 genotype blocks, or strain distribution patterns (SDPs), with an average size of 0.75 Mb. Collapsing genotypes into SDPs increased the power for downstream QTL mapping, as not every SNP had to be tested individually. An SDP changed to a next SDP as soon as one of the 30 lines consistently switched genotypes. As some SDPs can occur more than once in the genome, e.g., by chance or because of genotyping or genome assembly errors, we merged such SDPs into a single, globally uniquely occurring SDP, while preserving positional information. Subsequently, we merged identical SDPs if separated by a single alternating SDP (e.g., due to a SNP genotyping error). This results in a set of 1685 unique SDPs that we subsequently used for QTL mapping (Additional file 6: Table S5).
Ribosome profiling of heart and liver tissue
For ribosome profiling and mRNA-seq, snap-frozen and powdered tissue was obtained from the animals described in the “Animal models” section. For all samples except for the transgenic Endog rats and the Endog knockout mice (see below), ribosome profiling was performed using the TruSeq Ribo Profile (Mammalian) Library Prep Kit (Illumina, San Diego, CA, USA), according to a TruSeq Ribo Profile protocol optimized for use on tissue material, as described previously [31, 96]. In short, ± 50–100 mg powdered tissue was lysed for 10 min on ice in 1 mL lysis buffer consisting of 1 × TruSeq Ribo Profile mammalian polysome buffer, 1% Triton X-100, 0.1% NP-40, 1 mM dithiothreitol, 10 U ml− 1 DNase I, cycloheximide (0.1 mg ml− 1), and nuclease-free H2O. Using immediate repeated pipetting and multiple passes through a syringe with a 21G needle, we dissociated tissue clumps to create a homogenous lysate that facilitates quick and equal lysis of the tissue powder. Samples were next centrifuged at 20,000g for 10 min at 4 °C to pellet cell and tissue debris. Per sample, 400–800 μl of lysate was further processed according to the TruSeq Ribo Profile (Mammalian) Reference Guide with the additional modification of 8% PAGE selection directly after PCR amplification of the final library. For all samples, ribosome profiling library size distributions were checked on the Bioanalyzer 2100 using a high-sensitivity DNA assay (Agilent; 5067-4626), multiplexed, and sequenced on an Illumina HiSeq 2500 producing single end 1 × 51 nt reads. HXB/BXH RI panel samples were always processed in large batches of maximum 30 samples to avoid a sample processing bias.
For heart tissue of transgenic and wild type SHR/Ola rats, as well as Endog knockout and wild type C57BL/6 mice, a slightly modified procedure was used due to the termination of the TruSeq RiboProfile kit production by Illumina. The isolation of ribosome footprints is identical to the procedure with the TruSeq kit and as described in [31], except for the use of 7.5 μL Ambion RNase 1 (Thermo Fisher Scientific AM2295; 100 U/μL). Following footprint isolation and PAGE purification, footprints were phosphorylated (NEB T4 PNK; New England Biolabs M0201) and used as input for small RNA library prep using the NEXTflex Small RNA-Seq Kit v3 (Bioo Scientific - PerkinElmer NOVA-5132-06). Libraries were prepared according to the manufacturer’s instructions (V19.01), size-selected on 8% PAGE gels (Thermo Fisher Scientific EC6215BOX), and quality checked on a Bioanalyzer 2100 (high sensitivity DNA assay; Agilent; 5067-4626). Libraries displayed an average size of 157 bp and were sequenced in a multiplexed manner averaging 4 samples per lane on an Illumina HiSeq 4000. Downstream Ribo-seq data QC shows identical read quality, library complexity, and footprint periodicity as libraries generated by Illumina’s TruSeq RiboProfile procedure.
Replicate HXB/BXH Ribo-seq experiments
On average, each genomic locus within the HXB/BXH RI panel is shared by 15 animals, as all 30 RI lines are a homozygous mixture of 2 genetic backgrounds (BN-Lx and SHR/Ola). To assess the biological variability across individual animals of each HXB/BXH RI line, we performed replicate Ribo-seq experiments on liver tissue of 3 animals (i.e., biological replicates) for 2 of the 30 RI lines: BXH12 and BXH13. For each, we find Pearson correlations > 0.99 across biological replicates, reassuring the high quality of our data and reproducibility of the library preparation and sequencing approach (Additional file 1: Figure S1C).
mRNA-seq and totRNA-seq
For mRNA-seq and totRNA-seq, total RNA was isolated using TRIzol Reagent (Invitrogen; 15596018) using 5–10 mg rat and mouse tissue of the exact same powdered tissue samples (from the exact same animals) used for Ribo-seq. RNA was DNase treated and purified using the RNA Clean & Concentrator™-25 kit (Zymo Research; R1018). RIN scores were measured on a BioAnalyzer 2100 using the RNA 6000 Nano assay (Agilent; 5067-1511). Poly(A)-purified mRNA-seq libraries or ribosomal RNA-depleted totRNA-seq libraries were generated from the same sample of high-quality RNA (average RNA integrity number (RIN) for HXB/BXH rats of 9.1 (Additional file 1: Figure S1A). RNA-seq library preparation was performed according to the TruSeq Stranded mRNA or total RNA Reference Guide, using 500 ng of total RNA as input. Libraries were multiplexed and sequenced on an Illumina HiSeq 2500 or 4000 producing paired-end 2 × 101 nt reads.
Polysome profiling of congenic rat hearts
Powdered left ventricular heart tissue (3 replicates per congenic line) was lysed in polysome lysis buffer composed of 20 mM Hepes pH 7.5, 5 mM MgCl2, 300 mM KCl, 2 mM DTT, 100 μg/mL cycloheximide, 0.2% NP-40, and 40 U/μl RNAseOut (Invitrogen). Following a 30-min incubation at 4 °C in rotation, the lysed tissue samples were centrifugated for 15 min at 20,000×g at 4 °C. An aliquot of the lysate was used to quantify total RNA concentration using the Direct-zol RNA kit (R2051; Zymo, USA) according to the manufacturer’s instructions. From the clear supernatants of the lysates, 15 μg of total RNA was loaded onto 10–50% linear sucrose gradients prepared in polysome buffer (20 mM Hepes pH 7.5, 5 mM MgCl2 and 300 mM KCl, 2 mM DTT), and centrifuged at 32,000 rpm (129,311×g) (SW40Ti rotor, Beckman) for 177 min at 4 °C. Sucrose gradient fractions were separated using a Biocomp Piston gradient fractionation system associated to a Biorad fraction collector (Biorad model 2110 Fraction Collector) into 42 fractions of 300 μl each, and the absorbance was monitored at 254 nm with an ultraviolet absorbance detector (Biorad model EM-1 Econo UV monitor) to record the polysome profile. Fractions corresponding to the monosomes, light, medium, and heavy polysomes were pooled separately. RNA was extracted with 3× volumes of TriFast-FL (VWR, USA) and purified using Direct-zol RNA kit (Zymo, USA) according to the manufacturer’s instructions. RNA was DNase treated and purified using the RNA Clean & Concentrator™-25 kit (Zymo Research; R1018). RIN scores were measured on a BioAnalyzer 2100 using the RNA 6000 Nano assay (Agilent; 5067-1511). Ribosomal RNA-depleted totRNA-seq libraries were generated from high-quality RNA (Additional file 2: Table S1). RNA-seq library preparation was performed according to the TruSeq Stranded total RNA Reference Guide, using 200 ng of total RNA as input. Libraries were multiplexed and sequenced on an Illumina HiSeq 4000 producing paired 2 × 78 nt reads.
Sequencing data alignment
Prior to mapping, ribosome profiling reads were clipped for residual adapter sequences and filtered for mitochondrial, ribosomal RNA, and tRNA sequences. Next, we trimmed the 2 × 101 nt mRNA-seq reads to 29-mers (matching Ribo-seq footprint lengths, which peak at 28-29 nt) and processed those mRNA reads exactly the same as the ribosome profiling data, in order to avoid a downstream mapping or quantification bias due to read length or filtering. For mapping of the HXB/BXH rat RI panel data, we first used Tophat2 v2.1.0 [97] to align the full-length 2 × 101 nt mRNA-seq against the rat reference genome (Rattus Norvegicus rn6, Ensembl release 82), in order to obtain all splicing events naturally occurring in heart and liver tissue. Next, all 29-mer trimmed mRNA and ribosome profiling data were mapped using the splice junction information gathered from the alignment of the full-length mRNA-Seq reads. TopHat2 was used for the initial sequencing data alignment and splice junction determination of the HXB/BXH data analysis, as at the time this project was initiated current state-of-the-art alignment tools were not yet available. Sequencing data was aligned to the reference genome, and not to reconstructed SNP-infused genomes, because the number of allowed mismatches per 29-mer (2 mismatches) suffices to overcome a mapping bias caused by SHR-specific SNPs. We tested this reasoning extensively by aligning replicate trimmed mRNA-seq and Ribo-seq data of SHR/Ola animals (5 replicates) [96] to the BN reference genome or to an SHR/Ola SNP-infused genome. Moreover, we detected no significantly differentially expressed genes, i.e., genes for which the expression change could be attributed to a mapping bias driven by local genetic variation. On average, for the HXB/BXH Ribo-seq data, we can uniquely align 27.8 M Ribo-seq reads for left ventricular tissue samples and 41.5 M Ribo-seq reads for liver tissue samples, equaling between 71 and 87% of the total number of sequenced reads used for mapping.
For Ribo-seq and RNA-seq data obtained from congenic rats, transgenic rats, knockout mice, and polysome fractionation experiments, sequencing alignment strategies were identical to described above, but using STAR 2-pass v2.7.1a [98] instead of TopHat2 to improve mapping accuracy and speed. Mice data was mapped to the Mus Musculus reference genome mm10, Ensembl release 85. We used STAR to align the previous datasets mapped with Tophat2 and we found Pearson correlations > 0.99 across both methods, supporting the reproducibility of the data regardless of the mapping algorithm. Data QC of all Ribo-seq libraries was performed using Ribo-seQC v1.1 [99].
Identifying translated open reading frames
To define the set of translated genes in rat heart and liver, we used RiboTaper v1.3 [100] with standard settings to detect open reading frames that display the characteristic 3-nt codon movement of actively translating ribosomes. For each sample, we selected only the read lengths for which at least 70% of the reads matched the primary ORF in a meta-gene analysis. This results in the inclusion of footprints of the most prominent read lengths: 28 and 29 nucleotides. The final list of translation events was stringently filtered requiring the translated gene to have an average mRNA-seq RPKM ≥ 1 and be detected as translated by RiboTaper in at least 10 out of 30 HXB/BXH RI lines. We did not only retain canonical translation events, but also translated short ORFs (sORFs) detected in long noncoding RNAs (lncRNAs), or upstream ORFs (uORFs) positioned in front of primary ORFs of annotated protein-coding genes. LncRNA sORFs were required to not show sense and in-frame overlap with annotated protein-coding genes. We categorically grouped noncoding genes with antisense, lincRNA, and processed transcript biotypes as long noncoding RNAs (lncRNAs), if they matched specific filtering criteria described previously [31]. Upstream ORFs encompass both independently located (non-overlapping) and primary ORF-overlapping translation events. Primary ORF-overlapping uORFs were distinguished from in frame, 5′ extensions of the primary ORF requiring each overlapping uORF to have a translation start site before the start of the canonical CDS, to end within the canonical CDS (prior to the annotated termination codon) and to be translated in a different frame than the primary ORF, i.e., to produce a different peptide. We combined both types of uORFs into a single uORF category as we detect no differential impact of each uORF category on the primary ORF TE, in accordance with previous work [31]. For the visualization of P-site tracks (Additional file 1: Figure S4E), we used plots generated by Ribo-seQC [99].
Quantifying mRNA expression and translation
Gene- or feature-specific expression quantification was restricted to annotated and identified translated (coding) sequence and performed using HTSeq v0.9.1 [101] with default parameters. For quantifying ribosome association in small and long noncoding RNAs, i.e., genes without annotated coding sequences (CDSs), we additionally ran HTSeq on exonic gene regions. For quantification of the Ttn gene, which codes for the longest protein existing in mammals, we used a custom annotation [31, 102] as Ttn is not annotated in the current rat gene annotation. For this reason, Ttn was initially not included in the QTL mapping analyses, but later on added to define the effect of its length on Ttn’s translational efficiency. Moreover, we masked one of the two identical SURF cluster regions in the rat genome (chr3:4,861,753-4,876,317 was masked and chr3:5,459,480-5,459,627 was included), as both regions shared 100% of nucleotide identity and the six expressed SURF genes could not be unambiguously quantified. Since 406 snoRNAs have paralogs with 100% of sequence identity and unique counts cannot be unambiguously assigned to these sequences, these RNAs were not considered for quantification. In summary, we thus used (i) uniquely mapping CDS-centric counts for mRNA and translational efficiency quantifications, and (ii) uniquely mapping exonic counts for noncoding RNA quantifications (e.g., SNORA48) after excluding snoRNAs clusters sharing 100% of sequence similarity.
The mRNA-seq and Ribo-seq count data was normalized using a joint normalization procedure (estimateSizeFactorsForMatrix; DESeq2 v1.26.0 [103]) as suggested previously [104]. This allows for the determination of size factors for both datasets in a joint manner, as both count matrices follow the same distribution. This is crucial for the comparability of the two sequencing-based measures of gene expression, which for instance becomes important for calculating a gene’s translational efficiency (TE). The TE of a gene can be calculated by taking the ratio of Ribo-seq reads over mRNA-seq reads [22], or, when biological replicates are available, calculated via specialized DESeq2-based tools [104,105,106]. As we here require sample-specific TE values for downstream genetic association testing with QTL mapping, we regress out the measured mRNA-seq expression from the Ribo-seq expression levels using a linear model. This allows us to derive residuals for each sample-gene pair, that we subsequently subject to QTL mapping. Thus, the TE refers to the residuals of the linear model: resid (lm (normalized_Ribo-seq_read_counts ~ normalized_mRNA-seq_read_counts)). The main advantage of TE values obtained with this calculation is that we retain a quantitative range suitable for QTL mapping, which would not be the case for ratio-based TEs.
Pairwise association testing using Matrix eQTL
In order to understand the impact of genetic variants on gene expression regulation, we performed quantitative trait locus (QTL) mapping using the linear regression model-based Matrix eQTL v2.1.1 [107]. For association testing, non-unique SDPs are grouped and associations of surrounding SDPs are considered when defining the correct SDP location, thereby avoiding falsely assigned distant QTLs because of misplaced contigs in the rn6 rat genome assembly. For this, we reasoned that true associations are likely visible in surrounding SDPs, as genotype changes between two neighboring SDPs are usually gradual, and only a statistically unlikely multitude of recombination events between two neighboring SDPs would fully quench the detected association. After each association is assigned to the correct SDP, we performed a Benjamini-Hochberg correction on local and distant associations separately. Subsequently, we performed permutation testing to determine the significance of local and distant associations, by deriving the distribution of test statistics under the null hypothesis that there is no association. We therefore randomized all samples in the gene expression matrix and performed 10,000 runs of Matrix eQTL on the original genotype matrix. A significant association was defined as having an empirical p value ≤ 0.0015 (less than 15 more extreme p values in 10,000 permutations). For all types of QTLs tested in this study (eQTLs, riboQTLs, teQTLs, and uORF-QTLs), the same association settings and filtering criteria are applied.
A QTL is defined as “local” when it locates within the SDP block of the gene locus for which the association was detected. Similarly, a distant QTL is defined as a trait-associated locus when it is located on a chromosome different from the one that hosts the associated gene. In order to evaluate the presence of cross-mappability artifacts when identifying distant QTLs, we adapted a published method [108] and identified pairs of sequences with shared 29-bp k-mer sequences that are susceptible to be cross-mapped, allowing a maximum of 2 mismatches. In the heart, only 1 gene (ENSRNOG00000054609) with a distant QTL was cross-mappable with a cis-gene (ENSRNOG00000019925) in the same SDP. This distant QTL was therefore filtered out. Moreover, no cross-mappable genes with distant QTLs were found in liver. Therefore, our analysis reassured that cross-mapping did not affect the detection of distant QTLs. Throughout the manuscript, QTL numbers reported are gene-centric, i.e., if for instance two neighboring SDPs show significant association with the same gene, a single association is counted. When a given gene associates with both local and distant SDPs, these associations are reported separately. We additionally tested all available technical covariates for a potential impact on our results. These included (i) date of tissue processing, (ii) individual who prepared the libraries, (iii) RIN of the sample, (iv) library concentration (after PCR amplification), (v) date of library PCR, and (vi) sequencing batch. None of these technical covariates showed a significant impact on our data (ANOVA p values between 0.11 and 0.97; using the first PC1 that describes 50–80% of the variance in the data). In addition, we also tested additional confounding factors for a possible impact on our results. We calculated the level of relatedness by assessing the covariance of the genotypes across all 30 recombinant inbred lines (average covariance of 0.506; Additional file 1: Figure S1I + J). Additionally, we ran fastSTRUCTURE [109] to investigate the population structure of our HXB/BXH rat RI panel, identifying five distinct subpopulations of 3–12 individuals defined by different SDP allele frequencies. We used these allele frequencies to estimate the fixation index (Fst): Fst = 1 − (Hs / HT), where Hs is the average expected heterozygosity within subpopulations and HT corresponds to the expected heterozygosity of the total population. A fixation index value of 0 indicates no differentiation between the defined subpopulations, whereas a value of 1 corresponds to complete differentiation [110]. In this case, the average fixation index for the panel was 0.203, suggesting a limited effect of the population structure in the observed genetic differences across individuals. Next, we used lme4qtl [111] to build a linear mixed model considering both relatedness and population structure, and we estimated the robustness of the identified QTLs by statistically comparing the linear mixed model with a null model where the genetic effects were not included. Reassuringly, 91.88–96.17% of cis and 80.00–100.00% of trans QTLs displayed significance in the linear mixed model (adjusted ANOVA p values < 0.0001). Additionally, we tested if QTL effects were largely driven by the presence of hidden surrogate variables. First, we evaluated the specific effect of hidden confounders in the matrix eQTL calculations by using the “sva” R package [112]. Correlation of original and corrected effect sizes for heart local QTLs confirmed that these covariates did not significantly impact our data (Pearson correlation > 0.99; Additional file 1: Figure S1K). Noteworthy, for distant teQTLs, the correlations corrected by surrogate variable were less significant (Pearson correlation 0.79–0.99). Second, we added the top three, five, and ten predicted covariates automatically predicted by the PEER package [113] to the linear mixed model, finding a similar fraction of QTLs with robust significance regardless of the inclusion of PEER covariates (90.63–96.60% of cis and 76.47–100.00% of trans QTLs). In large cohorts such as the Human GTEx Project, data is often collected from different sources and hidden covariates can explain a large fraction of the total variance in trans QTLs, stressing the need of aggressive corrections of potential confounders, which are common practice [114]. However, including hidden covariates did not significantly affect the results of our data as these added covariates explained a small fraction of the total variance. Moreover, correcting for hidden surrogate variables can negatively impact the detection of trans loci that associate with multiple genes [113]. Consequently, because of the robustness of our results when correcting for known and unknown effects on the identified set of QTLs, we decided to use the full set of detected QTLs for subsequent analyses.
All detected significant association results, including additional significance values for QTLs after correcting for relatedness, population structure, and hidden covariates, are reported in Additional file 3: Table S2 (eQTLs, riboQTLs and teQTLs).
Detection of tissue-specific and recurrent QTLs
Gene expression can be regulated in a highly tissue- and cell-type-specific manner and genetic effects on mRNA expression can similarly be both specific to, or shared amongst, tissues or cell types [8, 32]. Nevertheless, the difference in QTL significance between tissues can represent an artifact because of the presence of false negatives in one of the tissues. Hence, we estimated the statistical power to replicate QTLs across tissues and traits within the HXB/BXH panel adapting a method previously applied to RI lines [115]. For this, we calculated narrow-sense trait heritabilities (h2trait) using the formula reported by Bottolo et al. [116], based on the method of Hegmann and Possidente [117]: h2trait = 0.5VA/(0.5VA + VE), where VA is the additive genetic component, representing the variance of the strain means, and VE is the environmental component, representing the average variance across all strains. We estimated both components using the two replicated lines (BXH12 and BXH13, see “Replicate HXB/BXH Ribo-seq experiments”). The calculated average heritability was 0.443 for all expressed genes and 0.506 for the set of genes significantly associated to QTLs (Additional file 1: Figure S2D).
We estimated a power of 1 for standardized effect sizes above 0.7, which corresponds to the median effect size for the whole set of tissue-specific QTLs. The estimated power was ~ 0.7 for QTLs with standardized effect sizes lower than 0.55 (percentile 5th of the distribution of QTL effect sizes). Hence, only a very small fraction of low effect QTLs are expected to display tissue-specific significance because of undetected false negatives. In these calculations, standardized effect size estimates are fractions that represent differences in mean values of expression between homozygotes as a proportion of the total genetic variance. These values were calculated by running the R function “VarProp” [111] on the previously generated linear mixed models.
Considering only genes expressed in both tissues, both eQTLs and teQTLs show limited recurrence in QTL detection, indicative of high tissue specificity. Even though 83% of genes with cardiac eQTLs (605 out of 726) and 66% of genes with liver eQTLs (248 out of 377) are expressed in both tissues, we could only detect the same eQTL for 126 of these (17%). Similarly, the vast majority genes with teQTLs are expressed in both tissues (88% and 100% in heart and liver, respectively), though only a small fraction of teQTLs (n = 20; 9%) was independently detected in both. Moreover, tissue-specific QTLs exhibited a stronger effect size and missed associations manifested strongly reduced effect sizes, while the distribution of effect sizes remained constant in shared QTLs across both tissues (Additional file 1: Figure S2B). All but one of these recurrent eQTLs and teQTLs result from local associations (Additional file 3: Table S2), indicating strong enrichment of recurrent local over distant QTLs. This is in line with previous observations across human tissues [8, 32] and, in our study, likely influenced by the higher detection sensitivity for local over distant QTLs. A single distant eQTL for Tmcc2 forms the exception being regulated in trans in both tissues (Additional file 3: Table S2). Although isoform-specific expression regulation of human TMCC2, driven by local changes in chromatin dynamics, was previously shown to be of biological importance [118], its distant control was not yet known.
Finding causal variants for local teQTLs
To identify potential causal variants underlying teQTLs, we infused our genotype maps with known SHR/Ola- and BN-Lx-specific indels and SNVs that were previously identified through whole-genome sequencing [16, 17, 119]. Among all genes with a local QTL (either eQTL, riboQTL, or teQTL; Additional file 3: Table S2), we detect only 8 coding sequence variants with a predicted deleterious consequence resulting in one stop gain, one essential splice-site mutation, and six missense mutations [120, 121]. Of these, only a single missense variant in the Lss gene is associated with TE in the heart (teQTL padj = 0.0014; Additional file 3: Table S2). We find no variants altering the local translation initiation context or Kozak sequence—a previously proposed frequent cause of local teQTLs [10].
Detecting distant QTL hotspots with HESS
HESS [42] is a generic Bayesian variable selection approach, associated with an evolutionary stochastic search algorithm [122], and developed to tackle the challenging integrative task of linking parallel high-dimensional multivariate regressions in a computationally efficient way. When q genes are predicted by the same set p of SNPs, in HESS the prior probability of association between gene k (k = 1,…,q) and SNP j (j = 1,…,p) is decomposed into its marginal effects, i.e., πkj = πk × ρj, πkj ∈ [0, 1]. In this formulation, ρj ≥ 0 captures the relative “propensity” for SNP j to influence several genes at the same time. The SNP specific “propensity” ρj inflates/deflates the probability πk of selecting any SNP to be associated with gene k in a multiplicative fashion, i.e., the baseline risk for gene k to be associated to any SNP is increased/decreased by the “propensity” of SNP j to be a key regulatory marker or “hotspot.” For each gene k, the a priori baseline risk and the corresponding level of sparsity are controlled through a suitable choice of the hyper-parameters of the density p(πk). We ran R2HESS v1.0.1 [41] with default parameters. The marginal posterior probability of inclusion \( {\hat{\gamma}}_{kj}=\mathrm{E}\left({\gamma}_{kj}|\boldsymbol{Y},{\gamma}_{\backslash kj}\right) \) indicates the strength of association between gene k and SNP j after observing the data Y, and it is calculated as the number of times a particular gene-SNP pair has been selected. Significant gene-SNP associations were declared using a non-parametric FDR approach, where a mixture model of two beta densities was chosen to model the null H0 and the alternative H1 distributions. We ran the Expectation-Maximization algorithm [123] on \( {\hat{\gamma}}_{kj} \) (k = 1,…,q, j = 1,…,p) to estimate the parameters of mixture model and, for a fixed FDR level, we calculated the optimal cut-off point on t such that the estimated FDR is not greater that the desired one. Finally, the proportion of genes associated with each SNP is defined as the average number of genes that are significantly predicted by each SNP. This measure helps to prioritize SNPs that influence multiple genes at the same time and allows the discovery of so-called regulatory hot spots, i.e., genetic loci that are associated with a large number of mRNAs.
Western blot analysis and quantification
Frozen left ventricle tissues from congenic rats (SHR.BN-(3S) and SHR.BN-(3L)) were lysed in ice-cold modified RIPA buffer (150 mM NaCl, 50 mM Tris HCL pH 7.4, 1% Triton X-100, 0.5% sodium deoxycholate, 0.1% SDS, 5 mM EDTA, and 2 mM EDTA) containing protease (cOmplete™, EDTA-free Protease Inhibitor Cocktail) and phosphatase (PhosSTOP) inhibitors as described in [66]. After incubation on ice for 30 min, samples were centrifuged at 20,000g for 15 min at 4 °C and supernatants were transferred to new pre-chilled tubes. Proteins were denatured for 10 min at 70 °C in NuPAGE LDS Sample Buffer (× 4) (Invitrogen; NP0007) and NuPAGE Sample Reducing Agent (× 10) (Invitrogen; NP0009) and separated on NuPAGE 4-12% Bis-Tris Protein Gels (Invitrogen; NP0343BOX) for 30 min in MES buffer (Invitrogen; NP0002) at 200 V. Gels were blotted on PVDF membranes (Immobilon-PSQ Membrane, Merck Millipore; ISEQ00010), and membranes were stained with the following primary antibodies: p-IRE1α Ser724 (NB100-2323, Novus Biological), IRE1α (3294, Cell Signaling Technology), XBP1s (83418S, Cell Signaling Technology), GAPDH (ab125247, Abcam), HSP60 (12165, Cell Signaling Technology), and TOM20 (42406, Cell Signaling Technology). Protein expression was measured with chemiluminescence and quantified using Image Studio Lite software (version 5.2, LI-COR). Background-subtracted densitometric signals from SHR.BN-(3S) and SHR.BN-(3L) samples were normalized against the loading control (and the unphosphorylated protein form in case of p-IRE1α Ser724) and statistically significant differences between SHR.BN-(3S) and SHR.BN-(3L) samples were determined using unpaired Student’s t tests. Uncropped, full-size western blots including ladders can be found in Additional file 1: Figure S6.
Stoichiometry of the cardiac thin filament
The thin filament is composed of five sarcomere subunits—Actc1, Tpm1, Tnnc1, Tnnt2, Tnni3—where each unit has a known proportion of 7:1:1:1:1 [44]. So as to study how the production rates of the five thin filament proteins deviate from the compositionally stoichiometric optimal ones in the HXB/BXH RI and the congenic rat samples, we estimated the observed proportions by correcting the DESeq2-normalized counts by CDS length and by gene turnover rate. Gene turnover rates for Actc1, Tpm1, Tnnc1, Tnnt2, and Tnni3 have been previously estimated to be 10.3, 5.3, 3.2, 3.5, and 5.5 days, respectively [45].
Excluding a technical basis for the length effect
Theoretically, sample-specific gene length biases can artificially induce length-related expression differences that in turn contribute to incorrect enrichment of GO terms related to short (e.g., ribosomal) or long (e.g., ECM) proteins [124]. However, for multiple reasons, we deem it highly unlikely that a technical or analytical bias could be responsible for the length-dependent effect observed in our study. First, the RI lines are all genetic mosaics, and the length dependency is specific for a single locus. Second, the length effect is specific to the heart and absent in liver. Third, data generation, normalization, and statistical analysis are all identical for all sequencing samples analyzed. Fourth, no single documented technical covariate explains any of the variance across samples (e.g., date of tissue processing, library preparation batch, sequencing flow cell, or RNA integrity of the sample; see Additional file 2: Table S1). Fifth, Ribo-seq and polysome fractionation experiments in congenic lines fully reproduce the translation phenotype, indicating a model- and technology-independent effect. Sixth, the effect is absent in RNA-seq data and the correlation with length is stronger for CDS length than for total transcript length. Lastly, previous work on SNORD24 revealed a highly similar polysome half-mer phenotype accompanied by a length-dependent effect on TE [79].
Thermodynamic properties of 5′ untranslated regions
5′ UTRs of mRNAs are known to be important regulators of translation. The folding free energy is the difference in free energy between an unfolded and folded state. For a given 5′ UTR, a lower folding free energy corresponds to a more stable secondary structure, and it is associated with low rates of translation initiation [60, 125]. Here, we calculated folding free energies for 5′ UTRs using the RNAfold program (v.2.4.13) from ViennaRNA package [126].
Reanalysis of translational machinery mutant datasets
We downloaded processed [75, 78,79,80, 83,84,85] and raw [82, 86] data from nine Ribo-seq studies that have investigated the consequences of various translational machinery mutants in translation. For each dataset, RNA-seq and Ribo-seq counts were normalized and translational efficiencies were calculated following a similar approach than for the rat datasets (see “Methods”—“Quantifying mRNA expression and translation”). We divided the datasets into wild type and mutant, averaging the number of normalized counts when multiple replicates were available. For each human, mouse, and yeast gene, we calculated the maximum CDS length by extracting all CDSs from Ensembl v.85.
We highlighted ten datasets where the TE calculations were strongly influenced by a (likely technical) length-specific artifact in the RNA-seq data (Pearson’s correlation r > 0.15) that was absent in Ribo-seq data—an effect presumably amplified by length normalization of poly(A)-purified RNA of low integrity [124, 127] (Additional file 1: Figure S5D).
General remarks on quantification and statistical analyses
The generation of figures and execution of statistical tests were performed using R [128]. GO enrichment analyses were performed using gProfiler2 v0.1.8 [129]. A detailed list of software used for data processing, quantification, and analysis is stated in the respective “Methods” sections. We used DESeq2 v1.26.0 [103] to perform differential gene expression analyses for mRNA-seq and Ribo-seq data. Differentially expressed genes were defined with an FDR ≤ 0.05 and a log2 fold change ≤ 1/1.2 or ≥ 1 × 1.2 for downregulated and upregulated genes respectively. Correlation coefficients between coding sequence (CDS) length and fold changes (FC) in gene expression were based on the Standardized Major Axis (sma) Estimation model (R package “smatr”) [130]. Only CDS with a minimum length of 100 nucleotides and an average number of DESeq2-normalized counts higher than 10 were considered for correlation analyses and plotting. Statistical parameters such as the value of n, mean/median, standard deviation (SD), and significance level are reported in the figures and/or in the figure legends.