Comparative regulomics supports pervasive selection on gene dosage following whole genome duplication

Whole genome duplication (WGD) events have played a major role in eukaryotic genome evolution, but the consequence of these extreme events in adaptive genome evolution is still not well understood. To address this knowledge gap, we used a comparative phylogenetic model and transcriptomic data from seven species to infer selection on gene expression in duplicated genes (ohnologs) following the salmonid WGD 80–100 million years ago. We find rare cases of tissue-specific expression evolution but pervasive expression evolution affecting many tissues, reflecting strong selection on maintenance of genome stability following genome doubling. Ohnolog expression levels have evolved mostly asymmetrically, by diverting one ohnolog copy down a path towards lower expression and possible pseudogenization. Loss of expression in one ohnolog is significantly associated with transposable element insertions in promoters and likely driven by selection on gene dosage including selection on stoichiometric balance. We also find symmetric expression shifts, and these are associated with genes under strong evolutionary constraints such as ribosome subunit genes. This possibly reflects selection operating to achieve a gene dose reduction while avoiding accumulation of “toxic mutations”. Mechanistically, ohnolog regulatory divergence is dictated by the number of bound transcription factors in promoters, with transposable elements being one likely source of novel binding sites driving tissue-specific gains in expression. Our results imply pervasive adaptive expression evolution following WGD to overcome the immediate challenges posed by genome doubling and to exploit the long-term genetic opportunities for novel phenotype evolution.


Background
Whole genome duplication (WGD) events have played a major role in eukaryotic evolution by increasing genomic complexity and functional redundancy [1]. This can allow gene duplicates (referred to as ohnologs) to escape selective constraints and thereby accumulate previously forbidden mutations that may become adaptive [2]. In agreement with this idea, WGD has been associated with the evolution of adaptive traits in yeast [3], plants [4,5], and vertebrates [6][7][8]. At the same time, it is also evident that most polyploids go extinct shortly after formation [9] and that becoming a successful new polyploid likely requires new adaptations to overcome fitness costs stemming from having a doubled genome [10,11]. Yet, the importance of selection in shaping polyploid genome evolution in the aftermath of WGDs is still not well understood [1,12].
Gene expression levels are relatively easy to measure and compare, and represent a major source of complex trait variation [13] and novel adaptive phenotypes [14,15]. Hence, there has been substantial interest in understanding consequences of WGDs on gene regulatory evolution. Comparative transcriptomics has both revealed immediate plastic responses to adjust gene dosages [16], as well as widespread regulatory divergence at evolutionary timescales (e.g., [17][18][19][20]). Ohnolog regulatory evolution is also mostly asymmetric, with one copy retaining an ancestral-like regulation, and the other copy losing and/or gaining expression in one or more tissue [12]. Although this observation can be reconciled with adaptive evolution of gene regulatory phenotypes following WGD, methodological limitations have made it difficult to distinguish between the outcomes of selection and neutral drift [12,21].
Here we take a novel approach to improve our understanding of how selection shapes novel gene regulatory phenotypes following WGD. We first developed a flexible and user friendly version of a phylogenetic Ornstein-Uhlenbeck (OU) model of gene expression evolution [22,23] in R (https://gitlab.com/sandve-lab/evemodel). The crux of this model is that it allows us to evaluate if changes in expression evolution deviate from the null hypothesis of stabilizing selection, and thereby identify putative adaptive shifts in expression regulation. We then used this model to analyze the liver transcriptome of four salmonids and three non-salmonid fish species to assess the impact of the 80-100million-year-old salmonid-specific WGD (Ss4R) [24,25]. We find that this WGD led to a burst of gene expression evolution, leading to rare tissue-specific gains in expression and pervasive tissue non-specific dosage selection, reflecting both adaptive possibilities afforded by genome doubling and immediate challenges that must be overcome to succeed as a polyploid lineage.

Adaptive shifts in expression levels following WGD
To study expression level evolution following WGD, we generated RNA-seq datasets from livers (four biological replicates) of four salmonids and three non-salmonid outgroup species (Fig. 1a). We then computed gene trees to identify retained ohnologs from the salmonid WGD. In total, we included 10,154 gene trees in our analyses (Additional file 1: Figure S1), of which 65% (6689 trees) contained ohnologs derived from the salmonid WGD. For each gene tree, we then applied a phylogenetic Ornstein-Uhlenbeck (OU) process model to test for adaptive shifts in expression evolution (referred to simply as 'shifts') in the ancestor of the salmonids included in this study (Fig. 1b, Additional file 1: Figures S2, S3 and S4).
Two major observations arise from this analysis. First, it is evident that the rate of adaptive gene expression evolution is increased for salmonid ohnologs. Forty percent of trees (1649) with retained ohnologs display evolution of novel expression levels in at least one ohnolog compared to only 20 % of trees with a single copy gene (Fig. 1c). Secondly, there is a clear difference in the nature of the expression evolution between ohnologs and singleton genes. Ohnologs are strongly biased towards evolving decreased expression levels following WGD (Fig. 1c), with 75% (1234/1649) of the ohnolog pairs displaying a shift down in either one or both copies. Conversely, singletons show a Tissues with higher expression in up-shifted copy  Fig. 1 Expression level evolution following WGD. a Phylogenetic tree of the species included in the study, with the estimated time of the salmonid-specific whole genome duplication (Ss4R) indicated. b Conceptual illustration of the expression level evolution tests. c Proportion of complete singleton (top) and ohnolog (bottom) gene trees with significant shifts in expression level in a salmonid ancestor. d, e Heatmaps show tissue expression, from an independent tissue atlas in Atlantic salmon, of ohnolog pairs where one copy has shifted up (d) or down (e) in liver. Barplots show the distribution of the number of tissues where the shifted copy has lower or higher expression than the conserved copy. Only ohnologs from complete orthogroups (panel c) are included in the heatmap. Each ohnolog pair (row) is scaled so that red signifies the highest expression across the two copies and blue the lowest. The color bar indicates the number of tissues that are experiencing a shift in expression in the same direction as that of liver (down (d), up (e)) between the shifted and conserved copy. f Proportion of partial gene trees (i.e., trees with some gene loss) with significant shifts in expression level in a salmonid ancestor. The shadings indicate that we report here up/down shifts for the complete salmonid clade and the partial salmonid clade separately, which is in contrast to panel c where both salmonid clades are complete and therefore indistinguishable. g Cumulative proportion of dN/dS for ohnologs with one copy shifted down, versus their conserved counterpart. Results are shown for all ohnologs with one copy shifted down (down-shift) and for the subset that is downshifted in all tissues in the tissue atlas (down-shift all tissues affected). h Cumulative proportion of TE content in promoters of ohnologs with one copy shifted down small bias towards evolving increased expression (Fig. 1c). This difference could not be explained by differences in statistical power related to systematic differences in gene expression levels between singletons and ohnologs (Additional file 1: Figure S5). To test if the identified expression level shifts following WGD were tissue-specific, we analyzed RNA-seq data from 15 Atlantic salmon tissues (Additional file 1: Figure  S6A). We find that most cases of expression evolution are not liver-specific (Fig. 1d, e), and that this is true both for genes evolving increased and decreased expression following WGD. When one ohnolog copy had evolved a shift in liver expression level, this copy also displayed similar trends in the majority of the other 14 tissues compared to its conserved ohnolog partner (shift down 77% (682/885), shift up 70% (221/317)). Hence, evolution of liver-specific changes in ohnolog expression following WGD is rare, irrespective of the directionality of change.
Upon reaching a new optimal ohnolog gene dosage, the expectation is that the copy with the highest expression level contributes the most to the proteome and cell function, which will result in reduced purifying selection pressure on the more lowly expressed copy [26]. Several lines of evidence support this expectation. Firstly, species-specific gene loss events (expected for genes evolving under relaxed selection) are associated with increased probability of evolving lower liver expression in one copy (Fig. 1f) and with increased probability of the downshifted copy to have reduced expression levels across all the other 14 tissues (Fisher's exact test, p = 3.1e−07, Additional file 1: Figure S6B). Secondly, we find that the down-shifted copy shows increased signatures of relaxed purifying selection on coding sequences in the form of elevated dN/dS rates (Fig. 1g, p = 2.1e −6, N = 732, one-sided paired Wilcoxon test, Additional file 1: Figure S7). Lastly, we also observe that down-shifted ohnolog copies have a significantly higher load of potentially destructive transposable element (TE) insertions in promoters compared to the conserved partner (Fig. 1h, one-sided paired Wilcoxon test, p = 6.5e −4, Additional file 1: Figure S8). Importantly, the effect size of increased dN/dS and TE-load were larger when only considering ohnologs with signatures of down-shift across all tissues (Fig. 1g, h).
Pervasive differences in purifying selection pressure within individual ohnolog pairs raise the question of whether these ohnologs might belong to duplicated genome blocks experiencing large-scale differences in selective constraints. This could lead to uneven ohnolog loss rates, a process that is referred to as biased fractionation [27]. In line with previous studies on teleosts [28,29], we found significant biases in local gene loss, albeit only in 9 of 47 syntenic duplicate blocks. However, we did not find equivalent large-scale biases in expression loss (Additional file 1: Figure S9), thus rendering regional differences in selection constraints an unlikely explanation for the large number of ohnologs experiencing loss of expression in one copy.
In conclusion, we find widespread signatures of adaptive regulatory evolution in retained ohnologs following WGD; however, most adaptive events were associated with ohnolog gene dose reduction across many tissues. Thus, ohnolog copies that evolve lower expression levels compared to their partner continue to evolve under relaxed purifying selection pressure, following a likely path towards pseudogenization.

Strong selection on housekeeping gene dose after WGD
To test if selection on gene regulation following WGD was linked to particular cellular functions or pathways, we performed KEGG enrichment analyses for two ohnolog gene sets that had evolved either increased (up) or decreased (down) expression levels. Genes with increased expression level were enriched (Fisher's exact test, p < 0.05) in three pathways: "fatty acid elongation," "fatty acid metabolism," and the "cell cycle" (Additional file 1: Table S1). Detailed analysis identified 29 up-shifted genes encoding proteins with essential cell division functions. These genes were highly enriched in protein-protein interactions conserved in both unicellular and multicellular eukaryotes (Additional file 1: Table  S2, Additional file 1: Figure S10) and suggest compensatory regulatory adaptation to maintain a functional cell division and ensure genome stability.
Down-shifted genes had comparatively stronger functional signatures (Additional file 1: Table S1) with nine enriched pathways (Fisher's exact test, p < 0.05). The three pathways with the strongest enrichment were "oxidative phosphorylation" (p = 0.003) involved in mitochondrial-associated cellular energy production, "ribosome biogenesis in eukaryotes" (p = 0.008) which consists of genes involved in assembly of the ribosome, and "ribosome" (p = 5.6e−9) which consists of ribosomal subunit genes (Supplementary figures 11, 12 and 13). These results support strong selection on gene dosage for many housekeeping functions following WGD, which aligns well with our observation (Fig.  1d, e) that most expression level shifts occurred across most tissues.
The gene balance hypothesis predicts that selection operates to maintain stoichiometry of interacting gene products [30], and this is believed to result in long-term retention of ohnologs. Using the human orthologs of salmonid genes, we queried the CORUM database of protein complexes and found that the proportion of ohnologs in protein complexes was slightly higher (22%) than the proportion of singletons (18%) (Fisher's exact test, p = 1.04e−5, Additional file 1: Figure S14A). We also found that complexes tended to contain only singletons (p = 0.03) or only duplicates (p = 1E−3, Additional file 1: Figure S14B) more often than expected by chance. It is also plausible that stoichiometric imbalances could be rescued through evolution of novel gene dosage. Under this model, we predict that singletons in protein complexes that contain ohnologs should be enriched for shifts up in expression, while shifts down are predicted for ohnologs in complexes with singletons. These predictions are not well supported for singletons (Fisher's exact test, p = 0.07) nor ohnologs (Fisher's exact test, p > 0.48) (Additional file 1: Table S3).
Taken together, we find strong evidence for dosage selection following WGD on genes involved in basic cellular maintenance and cell division. In addition, we find evidence for selection to retain stoichiometric balance both at the sequence and expression level.
Mechanism driving ohnolog regulatory divergence is associated with functional constraints Our analysis allows us to assign ohnolog pairs to different regulatory categories (Fig. 2a) that potentially represent distinct evolutionary routes to new gene dosage optimums after WGD. Indeed our results show that ohnolog pairs with expression evolution shifts in the same direction evolve more symmetrically (down+down and up+up) while ohnologs where expression shifts occur in only one copy or in opposite directions display stronger asymmetric divergence (e.g., up/down+conserved) (Fig. 2a). To explore the links between these modes of regulatory divergence and gene function, we performed KEGG enrichment on each expression evolution category. Twenty-seven pathways were found enriched across these categories (Fig. 2b, Additional file 1: Table S4), which is more than twice as many as when grouping ohnologs into up-or downshifted genes (Additional file 1: Table S1). This supports that different pathways are biased towards either symmetric or asymmetric regulatory evolution. The three most enriched pathways were the same as when testing up-and down-shifted genes only, but our stratification on regulatory categories of ohnologs reveals that ribosomal subunit ohnologs ("Ribosome") evolved lower gene dosage through highly symmetrical down-shifts, while "oxidative phosphorylation" and "ribosome biogenesis in eukaryotes" are biased towards asymmetric divergence (Fig. 2c). Fig. 2 Symmetry of regulatory divergence. a Ohnolog expression evolution categories and expression evolution asymmetry for ohnologs in each evolutionary category. The expression asymmetry is calculated as the absolute value of the mean difference between ohnolog pair expression levels in all salmonid species. One sided Wilcoxon test pvalues are reported for significant asymmetry differences between symmetric and asymmetric regulatory categories. b KEGG pathways significantly enriched (p < 0.05) in different expression evolution categories. Larger circles indicate a higher proportion of genes in the pathway with the shift. c Expression asymmetry between salmonid ohnolog pairs in selected pathways, calculated by taking the absolute value of the mean difference in expression between ohnolog pairs in all salmonid samples. d Correlation between expression asymmetry (see (c) for details) and the dN/dS of the ortholog in the pike sister lineage. e Predicted bound TFBS from TF-footprinting in promoters of ohnologs in the five expression evolution categories as well as those ohnologs with no significant shift in expression levels. For each ohnolog pair in each category, copies are grouped based on the lowest (to the left) and highest (to the right) p-value in the OU-test for expression level shift. p-values from significant paired Wilcoxon tests are indicated above boxplots: *** < 1e−03, **** < 1e−04, ***** = 0 As ribosome subunit genes are known to be extremely slowly evolving genes (i.e., high sequence evolution constraints), we tested whether there is a broader correlation between sequence constraints and regulatory symmetry. Indeed, we find that ohnolog expression level symmetry is significantly correlated with the level of purifying selection on coding sequences (Spearman correlation, p = 1e−8, Fig. 2d).
To further dissect regulatory mechanisms driving ohnolog expression level evolution, we generated high coverage ATAC-seq data from the liver of Atlantic salmon and identified bound transcription factor binding sites (bTFBSs) using a footprinting approach (Additional file 1: Figure S15). We hypothesized that ohnolog regulatory evolution symmetry is shaped by the relative importance of selection on cis-versus trans-mutations. One simple prediction from this is that ohnolog pairs where one copy has evolved novel expression would have higher promoter divergence than ohnolog pairs with symmetric evolution. The divergence of bTFBSs in promoters (− 3000/+ 200 bps from transcription start site) largely matched this prediction (Fig. 2e) with ohnologs having more asymmetric expression shifts (up+cons and down+cons) differing more with respect to the number of bTFBSs in their promoters compared to symmetrically evolving ohnologs (up+up, down+down, and cons+cons) (Fig. 2e). This offers a simple explanation of expression divergence after WGD, where genes with decreased expression level have lost TFBSs, and genes with increased expression have gained TFBSs, compared to the ancestral promoter structure. Comparing the overall similarity of promoters, computed as the correlation of bTFBS between symmetrically evolving (down+down) and asymmetrically evolving (down+cons) ohnolog pairs, did not reveal a similar trend (Wilcoxon test, p = 0.234, Additional file 1: Figure S16), which is consistent with high turnover of bTFBS even for highly conserved genes [31].
Together these results support that evolutionary constraints at the coding sequence divert ohnologs down different evolutionary routes towards novel gene dosage-either in an asymmetric or symmetric fashion.
Adaptive gain in liver expression through acquisition of tissue-specific cis-regulatory elements Although the vast majority of adaptive expression evolution was associated with selection on lower gene dosage, our OU-analyses did reveal 30 ohnolog pairs where one copy had evolved liver-specific adaptive gains in expression following WGD. These genes are predicted to be involved in a variety of functions such as developmental processes, cell fate specificity, and more liver-centric functions such as endocrine signaling and lipid-and fatty-acid metabolism (Additional file 1: Table S5). To better understand the regulatory mechanisms involved in the evolution of these potential novel liver functions, we used our TF-footprinting data to test the hypothesis that adaptive gains in liver expression are linked to the acquisition of binding sites for TFs controlling liverspecific regulatory networks. Indeed, we found that promoters of up-shifted copies were occupied by many more liver-specific TFs than their non-shifted partners (Fig. 3a, Wilcoxon paired test, p = 7.7e-05). These liver-specific TFs are thus candidates for being involved in regulatory rewiring of up-shifted ohnologs (Fig. 3b). Interestingly, many TFs with the strongest bias towards occupying the promoters of up-shifted ohnolog copies have known general liver functions (i.e., hepatocyte nuclear factors; FOX1A, HNF4A) [32] and roles in lipid metabolism (RXR, PPARG, KLF15) [33,34] (Fig. 3c, see the "Methods" section for details).
Next, we hypothesized that liver-specific increases in expression are driven by gains in new TFBSs. One way promoters can gain novel TFBSs is through insertions of TEs that either contain a functional TFBS or subsequently accumulate mutations that give rise to new TFBSs [35]. Indeed, we did find that TFBSs predicted to be bound by liverspecific TFs overlapped TEs more often in up-shifted copies than in conserved copies (Wilcoxon paired test, p = 0.037, Additional file 1: Figure S17A). Furthermore, at the level of TE superfamilies we found that the TIR TC1-Mariner TE superfamily were associated with gain in liver-specific bTFBS in up-shifted copies (p = 0.018, Additional file 1: Figure S17B), which included known liver and lipid metabolism transcription factors such as HNF4A, KLF15, and RXRA (Additional file 1: Table S6). b Tissue expression of the 30 ohnolog pairs where one copy has evolved a liver-specific gain in expression (color bar: up-shifted copies are red and conserved copies are gray) and 22 liver-specific TFs predicted to bind at least one-third of the targets (purple). TFs are named according to their motif(s) in JASPAR. Liver-specific genes are defined as having liver expression levels in the 90% quantile and tau-scores > 0.6. Each gene (row) is scaled so that red signifies the highest expression across the tissues and blue the lowest. c Regulatory network reconstructed for the ohnologs and selected TFs from b using footprinting data. Ohnologs are represented by circles sized by their regulatory complexity (in-degree) and colored according to their evolutionary expression shift with red signifying up-shift and blue downshift. TFs are represented by diamonds with the nine most up-shift-biased TFs shown. A directed gray edge means that the TF has at least one bTFBS in the promoter of the gene. A dotted undirected green edge connects ohnologs In conclusion, we find that adaptive gain in liver-specific expression is strongly associated with gain in liver-specific bound TFBSs, some of which have been facilitated by transposable element insertions.

Discussion
The consequence of WGDs for evolution of novel adaptations, including gene expression levels, has been an actively debated topic within evolutionary biology [1]. A key challenge has been to distinguish neutral from adaptive evolution in systems where experimental evolution is not possible [12]. Here, we generated a large comparative transcriptomics dataset and for the first time applied a formal phylogenetic model to infer selection on gene expression in the aftermath of a vertebrate WGD that occurred 80-100 million years ago.

Selection on gene dosage ameliorates immediate polyploid fitness costs
Newly formed polyploids often display augmented rates of abnormal mitosis, chromosome loss, and gross chromosomal rearrangements [36,37]. Hence, a primary challenge for the evolutionary success of polyploids is to maintain genomic stability. In line with this, we find that adaptive evolution of gene expression was highly biased towards cellular functions not specific to the liver (Figs. 1e, f and 3b) and with a clear potential impact on genome stability. Firstly, we find genes directly involved in the cell cycle to be enriched for adaptive evolution (higher dosage). Related genes have experienced selective sweeps following WGD in plants [38,39]. Furthermore, we find strong evidence for selection on genes involved in oxidative phosphorylation (lower dosage). Polyploidization in plants, fungi, and mammalian cells have been shown to increase levels of reactive oxygen species, which is causally linked to increased cellular stress, cell cycle failure, and increased genome instability [40][41][42]. Lastly, we find adaptive expression evolution (lower dosage) for genes involved in translation (ribosome subunits and ribosome assembly) after WGD. Regulation of translation also interacts with cell cycle regulation, with potential implications for genome stability [43]. However, selection for decreased expression of translation-related genes could also be linked to direct fitness costs of wasteful protein translation or harmful effects linked to the over-production of particular proteins. Overall, our study provides evidence for a scenario where a critical first step in becoming a successful polyploid lineage is pervasive adaptive evolution on gene dosage to ameliorate fitness costs linked to genome stability.

Long-term ohnolog retention and selection on gene dosage
Following an early phase of selection on gene dosage, the long-term fates of ohnologs can be shaped by various adaptive processes [21,44], including adaptive regulatory evolution. One potential outcome is adaptive divergence between ohnologs, resulting in two functionally non-redundant ohnologs under purifying selection. Our results demonstrate that tissue-specific shifts (up+cons) in expression are rare (Fig. 1c), and interpret this to mean that adaptive evolution of novel tissue-specific regulation likely has very little impact on genome wide ohnolog retention.
Selection on molecular stoichiometry has been proposed to play a major role in genome evolution after WGDs [30]. This narrative is supported by our finding that molecular complexes are both enriched for retained ohnologs and biased to include only singletons or only ohnologs (Additional file 1: Figure S14). However, selection on molecular stoichiometry could also drive evolution of gene expression. Indeed, we do find some (but weak) support for selection on stoichiometric balance also operating through selection for higher expression levels of singleton genes that are in complexes with ohnologs (Additional file 1: Table S3, p = 0.07). Moreover, it is plausible that the strong bias of "oxidative phosphorylation"-ohnologs towards highly asymmetric expression regulation also is linked to selection on stoichiometry (Fig. 2b, c). These genes are nuclear encoded genes involved in energy-related functions in mitochondria. As WGDs do not double the plastid numbers it has been proposed that in plants stoichiometric imbalances between nuclear and plastid genomes act as selection pressure to reduce the ratio of nuclear to plastid gene dosage following WGD [45]. In line with this reasoning, the driver behind the strong asymmetric down shift of "oxidative phosphorylation"-ohnologs could be the reinstatement of stoichiometric balance between the nuclear and mitochondrial genes.
At the other end of the ohnolog expression evolution symmetry spectrum, we find ohnologs belonging to the "ribosomal protein" pathway evolving lower expression in a highly symmetric fashion (Fig. 2b, c). We also demonstrate a significant correlation between constraints at the coding sequence level and symmetry of ohnolog regulatory evolution (Fig. 2d). This is in line with findings from plants that ribosomal proteins are retained over long evolutionary times and evolve slowly at both sequence and expression levels following WGD [46]. One potential explanation for this pattern could be the "toxic effects model" where long-term conservation of ohnologs is intrinsically linked to the "danger" of accumulating highly toxic coding sequence mutations [47,48]. We therefore hypothesize that in situations where lowering the total gene dosage increases fitness, and the tolerance for accumulation of deleterious mutations is low (i.e., the toxic effect), symmetric ohnolog evolution towards lower gene dosage could be favored over pseudogenization of one copy. Eventually, mutations can arise that create completely non-functional pseudogenes without toxic-effects, and these can then be fixed in the population. This would result in an enrichment of singletons among genes that are likely to produce toxic effects, as observed in plants [45].

Divergence of chromatin landscapes and ohnolog expression
Regulatory divergence after gene duplication is hypothesized to be linked to evolution of local chromatin landscapes [18,49]. Using ATAC-seq data we show that signals of adaptive expression level shifts are associated with the numbers of bound TFBSs (Fig.  2e), consistent with a billboard-like model of gene regulation [50]. Furthermore, we find that both loss of expression (Fig. 1h) and tissue-specific gains in expression level (Additional file 1: Figure S17) is linked to TE activity, highlighting the dual role of TEs in regulatory evolution following WGD.

Conclusion
Our study supports pervasive selection on gene dosage across millions of years following WGD, in particular for genes involved in basic cellular maintenance and genome stability. Interestingly, many of the homologous genes and pathways also show similar responses in gene dosage adjustments immediately after polyploidization in plants [16]. Reconciling these immediate effects of polyploidization with our findings strongly supports the following model: Plastic genome regulatory response to polyploidization alleviate immediate fitness costs following genome doubling. Since gene loss is absent in early generations polyploids, and all genes are duplicated and in stoichiometric balance, early plastic changes in gene regulatory phenotypes is likely a result of deleterious fitness effects due to suboptimal absolute gene dosages. Over evolutionary time-scales however, selection will favor and fix regulatory mutations that can "hard code" novel transcriptional phenotypes to optimize gene dosages (as seen following the salmonid WGD). Together, this paper points to critical genome regulatory adjustments for becoming a successful polyploid lineage.

Ortholog inference
For ortholog inference, we used thirteen species including six salmonids (Thymallus thymallus, Hucho hucho, Salmo salar, Salvelinus sp., Oncorhynchus mykiss, and Oncorhynchus kisutch), four telosts as outgroups to the salmonids (Danio rerio, Oryzias latipes, Gasterosteus aculeatus, and Esox lucius), one non-teleost fish (Lepisosteus oculatus) and two mammals as outgroups to the teleosts (Homo sapiens and Mus musculus). We only report the genus name for the char (Salvelinus sp.) because it was recently discovered that the material used for sequencing Salvelinus alpinus could have been a very closely related sister species (Salvelinus malma) or a hybrid between the two [51]. Protein sequences were obtained from ENSEMBL (release 92) for H. sapiens, M. musculus, L. oculatus, D. rerio, O. latipes, and G. aculeatus, from NCBI RefSeq assemblies for S. salar (GCF_000233375.1), Salvelinus sp. (GCF_002910315.2), O. mykiss (GCF_002163495.1), O. kisutch (GCF_002021735.1), and E. lucius (GCF_000721915.3), from the genome paper for T. thymallus [52] and from an in-house annotation using Transdecoder (https://github.com/TransDecoder/TransDecoder/wiki) for H. hucho (GCA_003317085). The single longest protein per gene was assigned to gene ortholog groups (orthogroups) using OrthoFinder (v2.3.1) [53]. For each orthogroup, the corresponding CDS sequences were aligned using MACSE (v2.03) before gene trees were generated and reconciled against the species tree using TreeBest (v1.9.2). The gene trees were then split at the level of monophyletic teleost clades, defining what we refer to as trees in this article, and again at the level of the salmonid clade (excluding T. thymallus and H. hucho), defining the Ss4R duplicate clades. Trees were then selected based on their topology (Additional file 1: Figure S1). Specifically, this filtered any trees that showed more than two salmonid clades or that contained additional paralogs inside the salmonid clades or in the outgroup species. Trees with all orthologs retained in the salmonid clade(s) were designated as complete, and otherwise as partial. In addition, trees were excluded from further analysis if (1) one or both salmonid clades had no expressed genes (zero mapped reads, RNA-seq data described below), (2) the E. lucius ortholog was missing or not expressed, and (3) both the D. rerio and O. latipes orthologs were missing or not expressed.

RNA-sequencing data
Liver tissue samples were collected from adult individuals of D. rerio (zebrafish), O. latipes (medaka), E. lucius (pike), O. mykiss (rainbow trout), S. alpinus (Arctic char), and O. kisutch (coho salmon) (Fig. 1a). Samples were taken in replicates of four, or three in the case of rainbow trout. All fish were raised in fresh water under standard rearing conditions in aquaculture facilities (salmonids), animal laboratory facilities (zebrafish and medaka), or restocking hatcheries (pike). Total RNA was extracted from the liver samples using the RNeasy Plus Universal Kit (QIAGEN). Quality was determined on a 2100 Bioanalyzer using the RNA 6000 Nano Kit (Agilent). Concentration was determined using a Nanodrop 8000 spectrophotometer (Thermo Scientific). cDNA libraries were prepared using the TruSeq Stranded mRNA HT Sample Prep Kit (Illumina). Library mean length was determined by running on a 2100 Bioanalyzer using the DNA 1000 Kit (Agilent) and library concentration was determined with the Qbit BR Kit (Thermo Scientific). Paired-end sequencing of sample libraries was completed on an Illumina HiSeq 2500 with 125-bp reads. Raw RNA-seq and processed count data have been deposited into ArrayExpress under the projects E-MTAB-8959 and E-MTAB-8962. For S. salar (Atlantic salmon), RNA-seq data was obtained from a feeding trial using four samples from individuals in freshwater fed a marine based diet [54], available in the European Nucleotide Archive (ENA) under project PRJEB24480 (samples: ERS2101563, ERS2101567, ERS2101568, ERS2101569).
To generate gene expression data, RNA-seq reads were mapped to the annotated reference genomes using the STAR aligner with default settings [55]. RSEM [56] was used to estimate read counts and Transcripts Per Million reads (TPM)-expression values that are normalized for average transcript lengths and the total number of reads from each sample.
The trimmed mean of M values (TMM), from the R package edgeR [57], was used to compute normalization factors for the gene expression data. The replicates were first normalized within each species and then between species (Additional file 1: Figure S2). Between-species normalization was accomplished by first computing species-specific normalization factors using genes from singleton orthogroups (i.e., groups containing only one gene from each species) and their mean expression values (i.e., mean of the replicates within each species), and then by normalizing the individual replicates from each species using these normalization factors. All expression values were log transformed (log2(TPM+0.01)) prior to testing for expression shifts.

Evolutionary shifts in gene expression
The EVE model [22] was used to test for shifts in gene expression levels in the salmonid clade(s) within each gene tree. For this paper, we developed and implemented a user friendly version of the EVE algorithm in R (https://gitlab.com/sandve-lab/ evemodel). This method models an OU process, i.e., random drift in expression level that is constrained around an optimal level. The test compares a model with two optimal expression levels, one for the salmonid branch and another for the outgroup species, against the null-model which has the same optimal expression level across the entire tree (Additional file 1: Figure S3C). For ohnolog gene trees which contain two duplicate salmonid clades, each clade was tested separately by removing the other salmonid clade.
EVE was given the expression data for each species (four samples/replicates per species) and the species tree produced by OrthoFinder. For every ortholog, a likelihood ratio test (LRT) score is calculated, representing the likelihood of the alternative hypothesis over the null hypothesis. LRT scores were compared to a chi-squared distribution with one degree of freedom and scores above the 95% quantile were considered to be significant. EVE reports estimates of the expression optimum for the salmonid branch and the rest of the tree (i.e., outgroup species), and the difference between salmonid estimates and outgroup estimates provided the direction of the expression shift.

Tissue atlas
Gene expression data from an Atlantic salmon tissue atlas [17] was clustered using Pearson correlation and the R function hclust with method = "ward.D". Heatmaps were drawn using the R function pheatmap with scale = "row".

Coding sequence selection pressure
We estimated branch-specific selection pressure on coding sequences in ohnolog gene trees by calculating dN/dS measured at the branch from the WGD node to the root of each duplicate clade using the aBSREL (adaptive Branch-Site Random Effects Likelihood) method [58] in Hyphy (Hypothesis Testing using Phylogenies) [59]. A one-sided paired Wilcoxon test was then performed to test if there is a difference in selection pressure between ohnolog pairs classified as asymmetrically shifted at the expression level.

Transposable elements
Transposable element (TE) annotations were taken from [17]. For Atlantic salmon genes, we calculated the proportion of gene promoter sequence (+ 2 kb/−200b from TSS) that was overlapped with TEs using bedtools intersect of promoter and TE annotations. We used a one-sided paired Wilcoxon test to test the hypothesis that, for ohnologs with an asymmetric shift down in expression, the shifted copy had a higher proportion of TE overlap than the conserved copy.

Gene function enrichment
We assigned KEGG pathway annotations to the orthogroups based on the Northern pike ortholog and its KEGG annotations. We then tested each set of ohnologs within an expression shift category for the enrichment of KEGG pathways using the kegga function from the R package limma, with all tested ohnologs as the background.

Protein complexes
We assigned orthogroups as being in a protein complex or not based on the human ortholog and its protein complex annotations from the CORUM database [60]. We used the Fisher's exact test, for singleton and ohnolog genes, to test whether more genes within an expression shift category were in a protein complex than expected by chance. To test if complexes were biased towards only containing singletons or only ohnologs, we randomized the singleton/ohnolog label 10,000 times and reported empirical p-values.

ATAC-seq generation and TF footprinting
Four Atlantic salmon (freshwater stage, 26-28 g) were euthanized using a Schedule 1 method following the Animals (Scientific Procedures) Act 1986. Around 50-mg homogenized brain and liver tissue was processed to extract nuclei using the Omni-ATAC protocol for frozen tissues [61]. Nuclei were counted on an automated cell counter (TC20 BioRad, range 4-6 um) and further confirmed intact under microscope. A total of 50,000 nuclei were used in the transposition reaction including 2.5 μL Tn5 enzyme (Illumina Nextera DNA Flex Library Prep Kit), incubated for 30 min at 37°C in a shaker at 200 rpm. The samples were purified with the MinElute PCR purification kit (Qiagen) and eluted in 12 μL elution buffer. qPCR was used to determine the optimal number of PCR cycles for library preparation [62] (8-10 cycles used). Sequencing libraries were prepared with short fragments and fragments > 1000 bp removed using AMPure XP beads (Beckman Coulter, Inc.). Fragment length distributions and confirmation of nucleosome banding patterns were determined on a 2100 Bioanalyzer (Agilent) and the library concentration estimated using a Qubit system (Thermo Scientific). Libraries were sent to the Norwegian Sequencing Centre, where paired-end 2 × 75 bp sequencing was done on an Illumina HiSeq 4000. The raw sequencing data for brain and liver is available through ArrayExpress (Accession: E-MTAB-9001).
Reads were mapped using BWA-MEM [63]. Duplicate reads and reads mapping to mitochondrial or unplaced scaffolds were removed. Peaks were called using MACS2 [64]. TF footprinting was performed with TOBIAS [65] based on the aligned reads, peaks, and TF motifs from JASPAR (JASPAR 2020 non-redundant vertebrate CORE PFMs) [66]. TOBIAS performs Tn5 bias correction, generates footprint scores for each base within the peaks, scans for TFBSs using the given TF motifs, and finally classifies each TFBS as bound or unbound based on the footprint scores.
For the analysis of ohnolog pairs with evolved liver-specific expression increases in one copy, we identified 30 up+cons pairs (60 target genes) where the liver expression of the up-copy was at least 90% of the maximum expression in the tissue atlas and the up-copy had a tissue specificity score (tau) > 0.6 [17]. To identify regulators of these genes, we BLASTed UniProt TF sequences with a motif in JASPAR to the Atlantic salmon proteome, and retained the top four hits with E-value <1E−10 and alignment length > 100. We then filtered these TFs for having bTFBS in the promoter of at least 20 of the target genes and for having liver-specific expression (same criteria as for uptargets). This resulted in 22 liver-specific TFs predicted to bind 17 different JASPAR motifs in 52 target promoters (Fig. 3b, c). Finally, to draw the network in Fig. 3c, we (1) selected, for each JASPAR motif, the single TF with the strongest evolutionary shift in expression; (2) removed JASPAR motifs with highly similar binding profiles (> 80% overlap in target genes, retaining the TF with the strongest evolutionary shift); and (3) merged TFs associated with more than one JASPAR motif into one node and selected the nine TFs with the strongest bias towards up-shifted targets.