Exploiting the GTEx resources to decipher the mechanisms at GWAS loci

The resources generated by the GTEx consortium offer unprecedented opportunities to advance our understanding of the biology of human diseases. Here, we present an in-depth examination of the phenotypic consequences of transcriptome regulation and a blueprint for the functional interpretation of genome-wide association study-discovered loci. Across a broad set of complex traits and diseases, we demonstrate widespread dose-dependent effects of RNA expression and splicing. We develop a data-driven framework to benchmark methods that prioritize causal genes and find no single approach outperforms the combination of multiple approaches. Using colocalization and association approaches that take into account the observed allelic heterogeneity of gene expression, we propose potential target genes for 47% (2519 out of 5385) of the GWAS loci examined. Supplementary Information The online version contains supplementary material available at (10.1186/s13059-020-02252-4).


Terminology
For clarity and to reduce ambiguities, we provide the definition of some of the key terms used in the manuscript.
Trait: Here, trait (or complex trait) is used for observable, quantitative trait of individuals, such as presence of a disease or an anthropometric measurement. When speaking about traits, we do not include molecular phenotypes like gene expression or intron splicing quantification.
LD block/LD region: Region of the genome containing variants in LD among themselves, as determined from empirical LD patterns observed in 1000 Genomes [Berisa and Pickrell, 2016]. Variants in different LD blocks are unlikely to be correlated.

GWAS locus:
This term is, in general, used somewhat loosely to refer to a region with a significantly associated variant which may span from tens to hundreds of kilobases depending on the LD of the region. However, here for quantification, we define it as one of the approximately independent LD blocks from [Berisa and Pickrell, 2016] that harbor a GWAS significant association. If multiple traits exist for a GWAS significant association in the block, we count them as distinct.

eQTL, eVariant:
Here an eVariant is a genetic variant that is associated (FDR< 0.05) with the expression of a gene. eQTL refers to the variant-gene pair, in which the variant is an eVariant for the gene. sQTL, sVariant: An sVariant is a genetic variant that is associated (FDR< 0.05) with the splicing (quantified as intron excision ratio) of a gene. sQTL refers to the variant-gene pair, in which the variant is an sVariant for the gene.

Fine-mapped variant:
We call fine-mapped variant to the proxy for causal variant which we selected using dap-g's posterior inclusion probabilities. These variants that are within credible sets with total posterior inclusion probability of at least 0.25 and have variant-level pip> 0.01. Within each credible set, one such variant is selected for our analysis.
LD contamination: This phenomenon occurs when the variant that alters the expression or splicing is distinct from the one that alters the complex trait, but they are in LD. In these circumstances, the QTL will be associated with the GWAS trait and the GWAS variant will be associated with the molecular trait, but there is no causal relationship between the gene and the complex trait.

Posterior inclusion probability (pip):
This is the probability that a variant has a causal effect on a trait. These probabilities are calculated by Bayesian fine-mapping approaches such as dapg and susier.
PrediXcan: This term refers to the family of methods that seeks to identify causal genes by correlating the genetic component of gene expression (mRNA level and splicing) with the trait. This family includes S-PrediXcan (which uses GWAS summary statistics rather than individual level data) and MultiXcan (which aggregates evidence of associations across all tissues leveraging the fact that e/sQTLs are shared across tissues). We use PrediXcan as a generic term to refer to this family of methods.
6 Silver standard genes: To test the ability of colocalization and association methods to identify true causal genes, we curated a set of 'causal' genes. To emphasize the imperfect nature, we use the term silver standard genes. In this context, the term OMIM gene is used as the causal gene for the trait.

Genotype-Tissue Expression (GTEx) Project
All processed Genotype-Tissue Expression (GTEx) Project v8 data have been made available on dbGAP (accession ID: phs000424.v8). Primary and extended results generated by consortium members are available on the Google Cloud Platform storage accessible via the GTEx Portal (see URLs).The GTEx Project v8 data, based on 17,382 RNA-sequencing samples from 54 tissues of 948 post-mortem subjects, has established the most comprehensive map of regulatory variation to date. In addition to the larger sample size and greater tissue coverage compared to v6, v8 data also included whole-genome sequencing data, facilitating high resolution QTL map of 838 subjects for 49 tissues with at least 70 samples. The GTEx consortium mapped complex trait associations for 23,268 cis-eGenes and 14,424 cis-sGenes [The GTEx Consortium, 2020]. We did not include trans QTLs in our analyses due to limited power after correcting for confounders and potential pleiotropic effect in complex trait associations. Below, we briefly describe the whole-genome sequencing, RNA-sequencing and QTL data processing protocols. Detailed description of subject ascertainment, sample procurement, and sequencing data processing are available elsewhere [The GTEx Consortium, 2020].

Whole-genome sequence data processing and quality control
Out of 899 WGS samples sequenced at an average coverage of 30x on HiSeq200 (68 samples) and HiSeqX (all other samples), variant call files (VCF) for 866 GTEx donors were included in downstream analyses after excluding one each from 30 duplicate samples and three donors. Of these, 838 subjects with RNA-seq data were included for QTL mapping and subsequent complex trait association analyses in our study. All wholegenome sequencing data were mapped to GRCh38/hg38 reference.

Harmonization of GWAS summary statistics
The process followed for the harmonization and imputation are depicted in Fig. S1. For each standardized GWAS summary statistics, we mapped all variants to hg38 (GRCh38) references using pyliftover (see URLs). For missing chromosome or genomic position information in the original GWAS summary statistics file, we queried dbSNP build 125 (hg17), dbSNP build 130 (hg18/GRCh36), and dbSNP build 150 (hg19/GRCh37) using the provided variant rsID information and the original reference build of the GWAS summary statistics file. Variants with missing chromosome, genomic position, and rsID information were excluded from further analyses. Only autosomal variants were included in our analyses. Missing allele frequency information was filled using the allele frequencies estimated in the GTEx (v8) individuals of genotype-based European genetic ancestry (here onwards, GTEx-EUR) whenever possible. We excluded variants with discordant reference and alternate allele information between GTEx and the GWAS study. We included only the alleles with the highest MAF among multiple alternate alleles if the variant was reported as multiallelic in GTEx. When more than one GWAS variant mapped to a given GTEx variant (i.e., the same chromosomal location in hg38), only the one with the highest significance was retained. For binary traits, if the sample size was present but the number of cases was missing, we filled the missing count with the sample size and number of cases reported in the paper. For continuous traits, if the file contained the sample size for each variant, the reported number was used. If not, we filled this value using the number reported in the corresponding publication. If only some variants were missing sample size information, we filled the missing value with the median of all reported values.

Imputation of GWAS summary statistics
To standardize the number of variants across tissue-trait pairs, all processed GWAS results were imputed. We implemented the Best Linear Unbiased Prediction (BLUP) approach [Lee et al., 2013;Pasaniuc et al., 2014] in-house (https://github.com/hakyimlab/summary-gwas-imputation) to impute z-scores for those variants reported in GTEx without matching data in the GWAS summary statistics. This algorithm does not impute raw effect sizes (β coefficients). The imputation was performed in specific regions assumed to have sufficiently low correlations between them, defined by approximately independent linkage disequilibrium (LD) blocks [Berisa and Pickrell, 2016] lifted over to hg38/GRCh38.
Only GTEx variants with MAF > 0.01 in GTEx-EUR subjects were used in downstream analyses. Covariance matrices (reference LD information) were estimated on these GTEx-EUR subjects. The corresponding (pseudo-)inverse matrices for covariances C were calculated via Singular Value Decomposition (SVD) using ridge-like regularization C + 0.1I. To avoid ambiguous strand issues homogeneously, palindromic variants (i.e. CG) were excluded from the imputation input. Thus, an imputed z-score was generated for palindromic variants available in the original GWAS; for them, we report the absolute value of the original entry with the sign from the imputed z-score. The sample size that we report for the imputed variants is the same as the sample size for the observed ones if it is reported as constant across variants, or their median if it changes across the observed variants, which occurs in the case of meta-analyses.
We initially considered publicly available GWAS summary statistics for 114 complex traits provided by large-scale consortia and the UK Biobank [Bycroft et al., 2018] (Additional Table S1). Of these, 27 studies with a relatively small intersection of variants with the GTEx panel (number of variants< 2 × 10 6 , compared to almost 9 × 10 6 variants available in GTEx) exhibited significant deflation of their association p-values (Fig. S3). Thus, all analyses focused on 87 traits where missing variants could be properly imputed unless otherwise stated explicitly (Table S2). We observed noteworthy association prediction performance across the selected 87 traits (e.g., with a median r 2 = 0.90 (IQR = 0.0268) between the original and imputed zscores on chromosome 1). The median slope was 0.94 (IQR = 0.0164), as the imputed zscore values tend to be more conservative than the original ones. Imputation quality was consistent across traits, depending    Table S1.  Table S2.

IGAP GWAS
We used summary results from an Alzheimer's Disease study from International Genomics of Alzheimer's Project (IGAP). IGAP is a large two-stage study based upon genome-wide association studies (GWAS) on individuals of European ancestry. In stage 1, IGAP used genotyped and imputed data for 7,055,881 single nucleotide polymorphisms (SNPs) to meta-analyze four previously-published GWAS datasets consisting of 17,008 Alzheimer's disease cases and 37,154 controls (The European Alzheimer's disease Initiative -EADI the Alzheimer Disease Genetics Consortium -ADGC The Cohorts for Heart and Aging Research in Genomic Epidemiology consortium -CHARGE The Genetic and Environmental Risk in AD consortium -GERAD). In stage 2, 11,632 SNPs were genotyped and tested for association in an independent set of 8,572 Alzheimer's disease cases and 11,312 controls. Finally, a meta-analysis was performed combining results from stages 1 & 2.

NHGRI-EBI GWAS catalog
In addition to the GWAS summary statistics described above, we obtained the list of trait-associated SNPs from the GWAS catalog [Buniello et al., 2019] (downloaded on 9/7/2018), which, at download, contained 80,727 entries. To measure the enrichment of e/sQTL in the GWAS Catalog, we computed the proportion of e/sQTL in the GWAS catalog relative to the proportion of e/sQTL among all GTEx V8 variants. We then obtained a measure of the uncertainty in the proportion and enrichment-fold using block jackknife. See [The GTEx Consortium, 2020] for details.

Correlated t-test to summarize across traits and tissues
Most statistics shown in these analyses are at the tissue-trait level. There are 4,263 statistics, generated from 49 tissues and 87 traits. Typical statistical tests assume the data from which the statistic is computed is sampled independently and identically distributed (IID). Among different tissues, there are wide ranges of standard errors and different patterns of correlation. Because of this, the IID assumption can not be applied to the tissue-trait statistics. Therefore, we describe our derivation of standard errors when statistics are summarized across traits for a given tissue, and when statistics are summarized across tissue and trait pairs. In the following paragraphs, we use S tp to indicate a statistic estimated in tissue t and trait p. This statistic has standard error se(S tp ).
Summarizing across traits for a given tissue. When we have one statistic per tissue-trait pair and summarize across traits in a given tissue, we assume the traits are independent, but we take into account the differences in standard errors. For each tissue t, we summarized S t1 , · · · , S tP by fitting the following linear model: Soμ t S is an estimate for the statistic S summarized across all traits in tissue t, and this estimate has standard error se(μ t S ). This is essentially a weighted average across traits.
Summarizing across trait and tissue pairs. When we summarize across all tissue-trait pairs, S 11 , · · · , S tp , · · · , S T P , we fit a similar linear model, which allows for correlation between tissues and correlation between traits, and corrects for differences in the standard errors.
Here, µ t S is the tissue-specific random intercept, and µ p S is the trait-specific random intercept. These components account for features common across traits that are specific to tissue t and features common across tissues that are specific to trait p respectively. The estimateμ S is the weighted average of S tp across all tissue-trait pairs, and its standard error is se(μ S ).
Testing whether two statistics have different mean. We would often like to test whether two statistics are different, e.g. enrichment signal measured for sQTL as µ S1 versus enrichment signal measured for eQTL as µ S2 . For this, we need to construct a test aggregating pairwise differences across all tissue-trait pairs. For this purpose, we constructed the following paired test. Our test statistic is We calculateμ T by summarizing across all tissue-trait pairs as described in the previous paragraph. Under the null H 0 : µ S1 = µ S2 and µ T ∼ N (0, se(μ T )).

Enrichment of QTLs among trait-associated variants
To estimate the proportion of SNPs considered as associated with expression (for at least one gene) at various p-value thresholds, we used the most significant p-value (tested using all GTEx individuals) for each SNP from all associations in all tissues (including all genes and variants tested). We observed that the proportion of variants associated with expression and splicing at different significance threshold was much larger for trait-associated variants from the GWAS catalog than for the full set of tested common variants ( Fig. 2). At a nominal threshold, the proportion of common variants associated with the expression of a gene in some tissue increased from 92.7% in the V6 release [GTEx Consortium et al., 2017] to 97.3% in V8. For splicing, the proportion was 97.7%. These results should serve as a cautionary note that assigning function to a GWAS locus based on QTL association p-value alone, even with a more stringent threshold, could be misleading.

Cis-region and covariates used in fine-mapping and prediction of expression and splicing traits
For each gene, we considered all variants within the cis-window (1Mbps) with MAF>0.01, and used the same covariates as in the GTEx v8 main eQTL analysis: sex, WGS plaform, WGS library preparation protocol, top 5 genetic principal components, and PEER factors. The number of PEER factors was determined from the sample size: 15 for n < 150, 30 for 150 ≤ n < 250, 45 for 250 ≤ n < 350, 60 for 350 ≤ n. section 6. This yielded a list of clusters (variants related by LD), and posterior inclusion probabilities (pip) that provide an estimate of the probability of a variant being causal. We repeated this process for splicing ratios from Leafcutter, using a cis-window ranging from 1Mbps upstream of the splicing event start location to 1Mbps downstream of the end location. We used individual-level data for GTEx-EUR subjects both for expression and splicing. We note that the main report of the GTEx v8 included individuals of non-European descent and reported only expression QTL fine-mapping. Sample sizes ranged from 65 in kidney cortex to 602 in skeletal muscle tissues. All results are made publicly available (https://github.com/hakyimlab/gtex-gwas-analysis).

Mediation analysis to quantify the dose-dependent effects of expression and splicing on traits
Enrichment of expression and splicing QTLs suggest a causal role of molecular trait regulation on complex traits. However, confounders such as LD contamination could be inflating these results limiting their interpretation. Here, we sought to gather stronger evidence for a causal link. We tested whether there is a dose-dependent effect of expression and splicing QTLs on complex traits and also whether independent QTLs provided similar measures of the mediated effects.

Selection of fine-mapped variants as instrumental variables and their effect sizes
To investigate the relationship between GWAS and QTL effect sizes in the transcriptome, we generated a set of fine-mapped QTL signals derived from dap-g fine-mapping performed in the GTEx-EUR individuals to serve as proxy for causal QTLs. For splicing, we utilized sQTLs at the splicing event/variant level rather than the gene/variant level. We considered only variants within credible sets with at least 25% total probability. Within each credible set, the variant with highest posterior inclusion probability was selected as the finemapped variant. Only variants with variant-level pip of at least 0.01 were considered. For each of the selected QTLs, we used the QTL effect size estimated from the marginal test (using the GTEx-EUR individuals) and the GWAS effect size reported by the study or if missing, calculated from the imputed z-score from the GWAS imputation byβ ≈ z/ f (1 − f )N , where f is the allele frequency and N is the GWAS sample size.

Correlation between GWAS and QTL effect sizes
To get a first-order approximation to the mediated effect sizes without imposing any modeling assumptions, we calculated the Pearson correlation of the magnitude of observed GWAS effect size and of cis-eQTL effect size, Cor(|δ k |, |γ k |), for the list of selected fine-mapped QTLs. This was done for each tissue-trait pair separately. The observed Pearson correlation captures the mediated effect (see details in Section 8.5). To obtain a null distribution for the correlation that accounts for the potential confounding effect of different local LD score values, we computed the Pearson correlation under the shuffled data within each LD-score bin defined by quantiles (100 bins were used). The significance of the difference between observed and null distribution was calculated using the correlated t-test method described in Section 4.

Modeling effect mediated by regulatory process
We compared the magnitude of GWAS and cis-QTL effect sizes, which is the basis of multi-SNP Mendelian randomization approaches [Bowden et al., 2015].
To formalize the relationship between the GWAS effect size (δ) and the QTL effect size (γ), we assumed an additive genetic model for the GWAS trait. Specifically, for variant k, where X k is the allele count of variant k, Y is the trait, and is the un-explained variation. We decomposed GWAS effect size into its mediated and un-mediated components, where G k represents the set of genes regulated by variant k with corresponding QTL effect size as γ k,g , and ν k is the un-mediated effect of variant k on trait. And β g is the downstream effect of gene g on the trait.

Transcriptome-wide estimation of mediated effects
To estimate the transcriptome-wide contribution of the mediated effects on complex traits, we proposed a mixed-effects model on the basis of Eq. 8, where b 0 , b 1 are the fixed effect capturing the un-mediated effect and β g is the mediated effect of the gene or splicing event g. In short, we assumed a random effects model to account for the heterogeneity of the β's and aimed at estimating σ 2 gene as the transcriptome-wide average of the mediated effect. For each tissue-trait pair, we fitted the model using selected fine-mapped QTLs, as described in Section 8.1, along with the correspondingδ k (GWAS effect for variant k),γ k,g (QTL effect for variant k, gene g). To obtain the distribution of σ 2 gene under the null, we performed the same calculation using shuffled GWAS effect sizes. The effect allele choice is arbitrary and we chose them so that all the GWAS effects are positive. This choice made the modeling of the effect of local LD more straightforward since we expect that variants in high LD regions may tag more causal variants and end up with a larger estimated GWAS effect, which would result in a positive b 1 . The square root of LD-score represents better the potential effect of LD score on the effect size. Using absolute value of the GWAS effects (|δ k |) and sign(δ k ) · γ k,g in equation (9) is a convenient way to implement the recoding of the effect allele.

Robustness of the estimation of the mediating effect to LD contamination
We illustrate the intuition behind the LD-contamination correction when the average mediated effects are estimated using the approximate method (correlation of absolute values) or the mixed-effects approach.
Additional Fig. A1: Schematic representation of LD contamination. SNP1 has a causal effect on the expression level of a gene but not on the trait (Disease here), δ 1 = 0 and γ 1 = 0. SNP2 has a causal effect on the trait but not on the expression of the gene, δ 2 = 0 and γ 2 = 0.
Consider the LD-contamination scenario where SNP 1 and SNP 2 are in LD with correlation R 2 (suppose LD is fixed) and have a non-zero effect on gene expression and trait, respectively (as shown in Fig. A1). The marginal effect estimates of SNP 1, i.e.δ 1 andγ 1 , are given bŷ where Eq. 12 holds because the marginal effect size depends on LD. To determine the covariance of the magnitude of the GWAS and QTL estimates for SNP 1, we consider E(|δ 1 ||γ 1 |).
where Eq. 16 holds since the last three terms in the previous line are zeros, due to the independence among GWAS , QTL , and true effect sizes, δ and γ. Hence, the covariance of the GWAS and QTL effect sizes under the LD contamination scenario is which implies that conditioning on LD, the observed correlation betweenδ andγ should be very small.

Additional Fig. A2: Diagram representation of mediation model.
Similarly, we can derive the correlation between GWAS and QTL effect size estimates under the simple mediation model shown in Fig. A2, where we havê where Eq. 22 follows by definition of the mediation model considering no direct effect. So, So, if we consider a gene locus, which naturally conditions on local LD and gene-level effect β, we can conclude that

Concordance of mediated effects for allelic series of independent eQTLs
Under the mediation model in Eq. 8, we expect that for a given gene with multiple QTL signals, these signals should share the same downstream effect, β g . Since the number of splicing events with multiple QTL signals was limited, we restricted this analysis to eQTLs only. We tested for concordance of downstream effect size obtained from the primary and secondary eQTL of a gene (ranked by QTL significance or QTL effect size estimate). Specifically, for a given trait and gene g, we defined the observed downstream effect for the kth variant asβ k,g =δ k /γ k,g . Thus, for each gene, we obtainedβ prim andβ sec as the observed downstream effect for the primary and secondary eQTLs if more than one eQTL signal was detected by dap-g. Ideally, for a mediating gene in a causal tissue (or a good proxy tissue), we would expect thatβ prim andβ sec should be similar. We measured the concordance in two ways: 1) correlation betweenβ prim andβ sec ; 2) percent concordant, defined as the fraction of eQTL pairs having the same sign inβ prim andβ sec . The results of 1) were reported in [The GTEx Consortium, 2020].
Visualizing the concordance among colocalized genes. To visualize the concordance of β prim andβ sec , we first scaledδ andγ by their standard deviation among all eQTLs selected in Section 8.1.
Then, we extracted the set of genes with at least two dap-g eQTLs (defined in 8.1) and labelled the top two eQTLs (rank by QTL effect size magnitude) as primary and secondary based on QTL significance or QTL effect size. We computedβ prim andβ sec and removed the genes withβ prim orβ sec in the top and bottom 5%. As a control, we also simulated random δ to compute simulated β sim for downstream analysis. We further filtered the genes by selecting only those with enloc rcp > 0.1.

Identifying patterns of regulation of expression across tissues
We used FLASH Sparse Factor Analysis  to identify latent factors specific to different tissue clusters. We ran flashr on a set of top eQTLs (obtained from all GTEx individuals) per gene which had been tested in all 49 tissues (around 16,000 eQTLs in total were selected) and shown strong evidence of being active in at least one tissue. Then, for each selected variant-gene pair, the marginal effect size estimates were extracted for all 49 tissues regardless of whether it was significant in that tissue or not. The resulting estimated effect-size matrix (of dimension ∼ 16, 000 × 49) was the input to flashr (with normal prior on loading and uniform with positive support as prior on factor) to obtain the sparse factors.
The flashr run yielded 31 FLASH factors (Fig. S13), which were used to assign the tissue-specificity of an eQTL.
We defined the eQTL cross-tissue patterns by projecting the estimated effect-size vector across 49 tissues onto the FLASH factors and computed the quality of the projection, PVE, as PVE k = β k 2 2 β 2 2 . PVE k represented the quality score for using FLASH factor k to explain the cross-tissue pattern of eQTL. The eQTL was assigned to a FLASH factor k if PVE k was maximal among all FLASH factors and PVE k > 0.2 and for those with PVE k ≤ 0.2 in all FLASH factors, NA (short for not assigned) was assigned instead. These "not assigned" eQTLs had more complex tissue-sharing pattern than the factors captured in the FLASH analysis. To obtain an interpretable tissue-specificity category, we labeled Factor1 as the shared factor, Factor2, Factor13, Factor14, Factor29, and Factor30 as brain-specific factors, and the rest of the factor assignment as other factors.
We applied the multivariate adaptive shrinkage implemented in mashr  to smooth cis-eQTL effect size estimates (obtained from all GTEx individuals) by taking advantage of correlation between tissues. To fit the mashr model, we used the set of ∼ 16, 000 cis-eQTLs as stated in Section 9 to learn the mashr prior, and then fit the mashr model using ∼ 40, 000 randomly selected variant-gene pairs for the same set of eGenes. We learned data-driven mashr priors in three ways: 1) FLASH factors as described above; 2) PCA with number of PC = 3; 3) empirical covariance of observed z-scores. The data-driven covariances were further denoised by calling cov_ed in mashr. Furthermore, we included the set of canonical covariances as described in  as an additional mashr prior. We fit the mashr model using the set of randomly selected variant-gene pairs with the error correlation estimated by applying estimate_null_correlation function in mashr and the priors obtained above. The resulting mashr model was used to compute the posterior mean, standard deviation, and local false sign rate (LFSR) for any variant-trait pair.

Causal gene prioritization
Two classes of methods can be used to identify the target genes of GWAS loci. One class is based on the colocalization of GWAS and QTL loci, which seeks to determine whether the causal variant for the trait is the same as the causal variant for the molecular phenotype. The other class is based on the association between the genetically regulated component of gene expression (or splicing) with the trait. We applied representative examples of each class of methods.

Colocalization
For a given variant associated with multiple traits such as gene expression (eQTL) and complex disease (trait-associated variant), extensive LD makes it challenging to identify the underlying true causal mechanisms. Colocalization approaches attempt to address this problem. Here, we conducted colocalization analysis using two independent approaches: coloc [Giambartolomei et al., 2014] and enloc [Wen et al., 2017]), to estimate whether a gene's expression or a splicing event shares a causal variant with a trait.

enloc
We computed Bayesian regional colocalization probability (rcp) using enloc, to estimate the probability of a GWAS region and a gene's cis window sharing causal variants. We used the dap-g results described in 7, which was based on EUR individuals only. We split the GWAS summary statistics into approximately LD-independent regions [Berisa and Pickrell, 2016], each region defining a GWAS locus. For each tissuetrait combination, we computed the rcp of every overlapping GWAS locus to a gene's or splicing event's cis window with enloc's default execution mode.
For each trait, we counted the number of GWAS loci that contain a GWAS significant hit, and among these, the number of loci that additionally contain a gene with enloc colocalization rcp > 0.5. As shown in Fig. A4C, across traits, a median 29% of loci with a GWAS signal contain an enloc colocalized signal. Given enloc's conservative nature, we caution that rcp < 0.5 does not mean that there is no causal relationship between the molecular phenotype and the complex trait; rather, it should be interpreted as lack of sufficient evidence with current data. We summarize the findings in Fig. A5. We observed a smaller proportion of GWAS loci containing a colocalized splicing event (median 11% across traits).

coloc
We computed coloc on all cis-windows with at least one eVariant (cis-eQTL per-tissue q-value< 0.05) or sVariant. For each gene's cis-window, we used summary statistics from the GWAS traits and the main GTEx eQTL/sQTL analysis. For binary traits, case proportion and 'cc' trait type parameters were used. For continuous traits, sample size and 'quant' trait type parameters were used. In both cases, imputed or calculated z-scores were used as effect coefficients in Bayes factor calculations.
Coloc is very sensitive to the choice of priors. We used enloc's enrichment estimates to define databased priors in a consistent manner. First, we defined likely LD-independent blocks of variants using definitions provided previously [Berisa and Pickrell, 2016]. The probability of eQTL signal, Pr(d i = 1), was estimated using dap-g [Wen, 2016]. Subsequently, we calculated priors p 1 , p 2 , and p 12 for colocalization analyses as follows: × Pr(d i = 1), and where α 0 and α 1 indicate intercept effect estimate and log odds ratio estimate for the enrichment using enloc, respectively. We ran coloc using variants in the cis-window for each gene and the intersection with each GWAS trait, obtaining five probabilities for each gene-tissue-trait tuple: P0 for the probability of neither expression nor GWAS having a causal variant; P1 for the probability of only expression having a causal variant; P2 for only the GWAS having a causal variant; P3 for the GWAS and expression traits to have distinct causal variants; P4 for the GWAS and expression traits to have a shared causal variant. We repeated this process using sQTL results.

Fine-mapping of height GWAS using summary statistics
To investigate the robustness of fine-mapping, we fine-mapped "height" from the GIANT GWAS metaanalysis and "standing height" from the UK Biobank using susieR . We performed fine-mapping using susie_bhat within each LD block [Berisa and Pickrell, 2016]. We used GWAS effect sizesβ imputed from z-scores byβ = z/ N f (1 − f ) and se(β) =β/z, where f is allele frequency and N is GWAS sample size. The GTEx-EUR individuals were used to calculate the reference LD panel. We recorded 95% credible set which has posterior probability 95% to capture a causal signal. To compare the fine-mapping results of two GWASs, we defined their 95% credible sets as "overlapped" if they shared at least one variant. To see how 95% in GIANT GWAS is colocalized with UK Biobank GWAS, we calculated colocalization probability as i:varianti∈95%CS of GIANT PIP i,GIANT × PIP i,UKB .

Predicting the genetically regulated components of expression and splicing
To predict expression, we constructed linear prediction models [Barbeira et al., 2020], using only individuals of European ancestry, and variants with MAF > 0.01, for genes annotated as protein-coding, pseudo-gene, or lncRNA. For each gene-tissue pair, we selected the variants with highest pip in their cluster, and kept those achieving pip > 0.01 in dap-g [Wen, 2016]. We used mashr  effect sizes (as computed in 9) for each selected variant. For each model, we computed the covariance matrix between variants using only individuals of European ancestries, with sample sizes ranging from 65 (kidney -cortex) to 602 (skeletal muscle). This allowed us to build LD panels for every tissue. For every gene, we also computed the covariance of all the variants present across the different tissue models, compiling a crosstissue LD panel to compute the correlation between predicted expression levels across tissues. We refer to these models as fine-mapped-mashr models. We compared the number of mashr models to the number of Elastic Net models from GTEx version 7 (Fig. S5). We generated analogous prediction models for splicing ratios, as computed by Leafcutter , applying the same model-building methodology to the data from the sQTL analysis. Expression phenotypes were adjusted for unwanted variation using the following covariates: sex, sequencing plaform, the top 3 principal components from genotype data, and PEER factors. The number of PEER factors was determined from the sample size: 15 for n < 150, 30 for 150 ≤ n < 250, 45 for 250 ≤ n < 350, 60 for 350 ≤ n. We obtained 686,241 models for different (gene, tissue) pairs.
We also generated analogous prediction models for splicing ratios, with the same model-building methodology applied to the data from the sQTL analysis, obtaining 1,816,703 (splicing event, tissue) pairs.

PrediXcan
We performed PrediXcan analysis [Barbeira et al., 2018] on the 87 complex traits, using the GWAS summary statistics described in 3.2, to identify trait-associated genes (typically p < 2.5×10 −7 ). We used the 49 models and LD panels described in 12.1, separately on each trait, to obtain 59,485,548 gene-tissue-trait tuples. Repeating this process to generate splicing event ratio models, we obtained 154,891,730 splicing event-tissue-trait tuples; for each trait, the Bonferroni-significance threshold was p < 9.5 × 10 −8 .

Colocalized and significantly associated genes
We assessed how many genes present evidence of trait association and colocalization, using both expression and splicing event. First, we counted the proportion of genes that showed a colocalized expression signal with any trait in any tissue, and observed 15% such genes at rcp> 0.5. Then, for each gene, we considered the splicing event with highest colocalization value in any trait or tissue, and found evidence for 5% at rcp> 0.5.
Then we repeated this process for PrediXcan associations at different signifcance thresholds. About 30% of genes showed a significant PrediXcan association to any trait, and only 8% when filtered for associations with rcp> 0.5. When using the highest splicing association and colocalization value for a gene, these proportions were 20% and 3%, respectively.
These proportions gauge our power to predict causal genes affecting complex traits on the GTEx resource, with expression yielding more findings than splicing.

Fig S6. Proportion of genes with a colocalized or associated signal using expression or splicing event.
A shows the proportion of genes with colocalization evidence in expression data, for different rcp thresholds. 3,477 genes show evidence at rcp> 0.5 (15% out of 23,963 genes with enloc results). B shows the proportion of genes with colocalization evidence in splicing data; 1,277 genes (5% of all 23,963) show evidence at rcp> 0.5. C shows the proportion of genes with association evidence in expression data, additionally filtered by colocalization on different thresholds. About 30% of genes show associations at the bonferroni threshold (p < 0.05/686, 241), while 8% also show colocalization evidence. D shows the proportion with association and colocalization evidence in splicing data; about 20% show association evidence (p < 0.05/1, 816, 703) and 3% are also colocalized.

S-MultiXcan
Given the substantial sharing of eQTLs across tissues [GTEx Consortium et al., 2017], we aggregated PrediXcan results across tissues using S-MultiXcan [Barbeira et al., 2019]. MultiXcan has been shown to exploit the tissue sharing of regulatory variation, to improve our ability to identify trait-associated genes. The method extends the single-tissue PrediXcan approach, leveraging GWAS summary statistics and taking into account the correlation between tissues. We obtained association statistics for 1,958,220 gene-trait pairs and 11,986,329 splicing event-trait pairs.

PrediXcan replication in BioVU
We replicated the significant gene-level associations for a prioritized list of traits (Additional Table A1) using BioVU [Denny et al., 2013], Vanderbilt University's DNA Biobank tied to a large-scale Electronic Health Records (EHR) database. We sought BioVU replication in the exact discovery tissues for the significant gene-trait associations. We restricted our analysis to subjects of European ancestries, using principal component analysis as implemented in EIGENSOFT (version 7.1.2; [Price et al., 2006]). First, we estimated the genetically determined component of gene expression in the BioVU individuals using the PrediXcan imputation models. We then conducted association analysis for the prioritized traits using logistic regression, with sex and age as covariates.

Additional Fig. A3: Causal gene prioritization using PrediXcan and enloc. (A)
Fine-mapped-mashr model predictions: summary of GWAS loci that also contain an associated PrediXcan or colocalized signal, for expression (left) and splicing (right). Significance was defined at Bonferroni-adjusted threshold for number of tests in each trait: p < 0.05/(gene-tissue pairs) = 7.28 × 10 −8 for expression, p < 0.05/(intro-tissue pairs) = 2.75 × 10 −8 for splicing. Colocalization status was defined as enloc rcp> 0.5. (B) Elastic net predictions: summary of GWAS loci that also contain an associated PrediXcan or enloc signal, for expression (left) and splicing (right), using Elastic Net models. Significance was defined at Bonferroni-adjusted threshold for number of tests in each trait: p < 0.05/(gene-tissue pairs) = 1.77 × 10 −7 for expression, p < 0.05/(intro-tissue pairs) = 9.51 × 10 −8 for splicing. Colocalization status was defined as enloc rcp> 0.5. The number of loci with potential target genes according to both S-PrediXcan and enloc went up from 1989 with Elastic Net models to 2125 with the improved fine-mapped-mashr models. The number of S-PrediXcan backed loci increased only 7, so that the added loci were mostly due to an increased overlap with enloc.

Summary-data-based Mendelian Randomization (SMR) and HEIDI
For comparison, we also performed top-eQTL based Summary-data-based Mendelian Randomization (SMR) [Zhu et al., 2016] analysis of the 4,263 tissue-trait pairs. SMR, which integrates summary statistics from GWAS and eQTL data, has been used to prioritize genes underlying GWAS associations.  [Berisa and Pickrell, 2016]) with at least one GWAS-significant variant (dark gray), and among them those with at least one gene reaching rcp > 0.5 (dark green). Panel C shows the proportion of loci with at least one GWAS-significant hit that contain at least one colocalized gene. Across traits, a median of 21% of the GWAS loci contain colocalized results. See trait abbreviation list in Table S2. These results were also presented in [The GTEx Consortium, 2020] and are shown here for completeness.

Additional
Additional Fig. A5: Colocalization of splicing QTLs for each of the 87 GWAS traits aggregated across the 49 tissues. The traits are ordered by number of GWAS-significant variants. GWAS loci are shown in gray, colocalized results are shown in dark green. Panel A shows the number of colocalized splicing event, achieving enloc rcp > 0.5 in at least one tissue, for each GWAS trait. As with gene expression results, the number of colocalized results tends to increase with the number of GWAS-significant variants. Panel B shows the number of loci (approximately independent LD regions from [Berisa and Pickrell, 2016]) with at least one GWAS-significant variant (dark gray), and among them those with one splicing event achieving rcp > 0.5 (dark green). Panel C shows the proportion of loci with at least one GWAS-significant hit loci with at least one colocalized splicing event. Across traits, a median of 11% of the GWAS loci contain a colocalized result, lower than the gene expression counterpart (29%), indicating a decreased power in the sQTL study. See trait abbreviation list in Table S2.  [Berisa and Pickrell, 2016]) with a significant GWAS association (gray), a significant S-MultiXcan association (purple), and a significant S-MultiXcan association that is colocalized (dark green). Anthropometric and Blood traits tend to present the largest number of associated loci, with Height from two independent studies leading the number of associations. Panel C) shows the proportion of loci with significant GWAS associations (gray) that contain S-Multixcan (purple) and colocalized S-MultiXcan associations (dark green). Across traits, a median of 70% of GWASassociated loci show a S-MultiXcan detection, while 19% show a colocalized S-MultiXcan detection. See trait abbreviation list in Table S2.

Fig S8. PrediXcan splicing associations aggregated across tissues.
This figure summarizes S-MultiXcan associations for each of the 87 traits using splicing models. The traits are ordered by number of GWAS-significant variants. Panel A) shows in purple the number of S-MultiXcan significant splicing events, and in dark green the subset also achieving enloc rcp > 0.5 in any tissue. The proportion of colocalized, significantly associated splicing events is typically 2%, much lower than the proportion from gene expression (12%). Panel B) shows the number of loci (approximately independent LD regions [Berisa and Pickrell, 2016]) with a significant GWAS association (gray), a significant S-MultiXcan association (purple), and a significant S-MultiXcan association that is colocalized (dark green). As in the case of expression models, Anthropometric and Blood traits tend to present the largest number of associated loci. Panel C) shows the proportion of loci with significant GWAS associations (gray) that contain S-Multixcan (purple) and colocalized S-MultiXcan associations (dark green). Across traits, a median of 63% of GWASassociated loci show an S-MultiXcan association, while 11% show a colocalized S-MultiXcan association. These proportions are lower than the corresponding ones for expression (70% and 19% respectively). See trait abbreviation list in Table S2.

Assessing the performance of association and colocalization methods to identify causal genes
To assess the performance of colocalization and association methods to identify causal genes, we curated two sets of 'causal' gene-trait pairs. One set is based on the OMIM database and the other one is based on rare variant association results from exome-wide association studies. To quantify the performance, we framed the causal gene identification problem as one of classification and used the standard tools such as ROC and precision recall curves, which have the advantage of not needing ad-hoc thresholds and show the full trade-off between true positives and false positives as well as precision vs. power. Throughout this section, we limited our scope to only the protein-coding genes.  To obtain a curated set of trait-gene pairs from the OMIM database [Hamosh et al., 2005], we mapped our GWAS traits to the OMIM traits and linked them to the corresponding genes in the OMIM database. The mapping process is illustrated in Fig. S9 for a specific example GWAS trait, fasting glucose by the MAGIC consortium. First, the GWAS trait was mapped to the GWAS catalog trait names by searching for relevant keywords (defined manually Additional Table A3) in the description field of the GWAS catalog. Second, the GWAS catalog trait names were linked to phecodes using the mapping in the phewas catalog [Denny et al., 2013]. Third, we mapped phecodes to OMIM traits ids (MIM) as described in [Bastarache et al.,  The keywords used for the each of the initial selected set of 114 GWAS traits is listed in Additional Table A3). For a subset of datasets with GWAS results from more than one source (public GWAS vs UKB) in our collection, we kept the dataset with higher number of GWAS loci to avoid double counting. The number of GWAS loci was determined based on counting the lead variants, using the PLINK V1.9 command -clump-r2 0.2 -clump-p1 5e-8 at genome-wide significance 5 × 10 −8 ) for each trait. Furthermore, for this analysis, we excluded GWAS traits with fewer than 50 GWAS loci. The full list of OMIM based trait-gene pairs is listed in Additional Table S4.
With this procedure, we curated a list of 1,592 gene-trait pairs with evidence of causal associations in the OMIM database (hereafter, OMIM genes), which was downloaded from omim.org/downloads (accessed on Aug 12th 2019). After matching traits, we retained 29 unique traits and 631 unique genes that were within the same LD block [Berisa and Pickrell, 2016] as the GWAS hit (Additional Table S4).

Rare variant association-based curation of causal genes
In addition to the OMIM-based curation, we collected a set of genes in which rare protein-coding variants were reported to be significantly associated with our list of complex traits. Given the power of existing rare variant association studies, we focused on height and lipid traits (low-density lipid cholesterol, high-density lipid cholesterol, triglycerides, and total cholesterol levels) [Marouli et al., 2017;Liu et al., 2017;Locke et al., 2019]. We collected significant coding/splicing variants reported previously [Marouli et al., 2017] and kept variants with effect allele frequency < 0.01 (table S6 therein: ExomeChip variants with Pdiscovery <2e-07 in the European-ancestry meta-analysis (N=381,625)). Similarly, we collected significant variants reported by [Liu et al., 2017] (table S12 therein: Association Results for 444 independently associated variants with lipid traits) and filtered out variants with minor allele frequency < 0.01. For the whole-exome sequencing study conducted in Finnish isolates [Locke et al., 2019], we extracted significant genes identified by a genebased test using protein truncating variants (table S9 therein: Gene-based associations from aggregate testing with EMMAX SKAT-O with P<3.88E-6) and significant variants (table S7 therein: A review of all variants that pass unconditional threshold of P<5E-07 for at least one trait) with gnomAD MAF < 0.01. The full list of trait-gene pairs constructed from the process is available in Additional Table S5.

Setting up the classification problem to quantify performance for identifying causal genes
Fig S10. Selection of genes to assess ability to identify silver standard genes. The GWAS summary statistics were binned into independent LD blocks (boundaries of LD block are shown as gray vertical lines).
Only genes within LD blocks that contain both a silver standard gene (red triangle) and a GWAS significant variant (points above − log 10 (p) > − log 10 (5 · 10 −8 )) were used in the calculation of performance (ROC and PR curves).

Additional Fig. A6: Schematic representation of data used for classification.
We partitioned the genome into approximately independent LD blocks [Berisa and Pickrell, 2016] and for each GWAS trait, we kept only genes located in LD blocks where there were at least one silver standard gene and a GWAS significant hit for the trait as illustrated in Fig. S10. Then, we labelled the silver standard genes as 1 and all the others were labelled as 0, as represented schematically in Fig. A6. We calculated the ROC and precision recall curves for classifying the silver standard gene correctly. Note that we used a universal cutoff across all GWAS loci, hence highly correlated genes would be classified as causal or non-causal as a cluster.
In more detail, for each of tested gene-trait pairs, we obtained the gene-level statistics for the corresponding trait from the application of various methods, i.e. enloc, coloc, SMR, and PrediXcanmashr. Since we had results across tissues, we selected the 'best' scores (highest regional colocalization probability (rcp) in enloc; highest posterior probability under hypothesis 4 in coloc; smallest p-value in SMR and PrediXcan-mashr ) to build the Table A6. For splicing (with statistics reported at the intron excision event level), we obtained gene-level statistics by taking the 'best' score among all splicing events of the gene, across all tissues. Additional Table A6: Count of GWAS loci with predicted causal effects overlapping likely functional genes. The number of GWAS loci and the number of silver standard genes included for analysis after taking the intersection between GWAS loci and silver standard genes are shown.

Additional
Additional Fig. A7: Distribution of the number of tested genes per GWAS locus overlapping OMIM-and rare variant-based silver standard. The distributions of the number of candidate genes per GWAS locus are shown for OMIM-based curation (top) and rare variant associationbased curation (bottom).
The number of GWAS loci and silver standard genes that remained after the above filtering steps can be found in Table A6. The number of genes tested per LD block is shown in Fig. A7. is in the gene, otherwise BPS from the gene boundary, rank_proximity: ranking by proximity within LD block (rank starts from 0 and the closer the lower rank), percentage_proximity: rank_proximity / number of genes in the locus, predixcan_mashr_eur_score: -log10 p-value (most significant across tissues is used) of PrediXcan-MASH trained on European data, enloc_score: rcp (max across tissues), predixcan_mashr_eur_rank: PrediXcan significance ranking within LD block (rank starts from 0 and the higher significance the lower rank), enloc_rank: enloc rcp ranking within LD block (rank starts from 0 and the higher rcp the lower rank), predixcan_mashr_eur_percentage: predixcan_mashr_eur_rank / number of genes in the locus, enloc_percentage: enloc_rank / number of genes in the locus, gene_name: Official gene symbol, gene_type: Gencode annotsted gene type, chromosome: Chromosome for the gene, start: Gencode annotated gene start position. All isoforms are combined, end: Gencode annotated gene end position. All isoforms are combined, strand: Gencode annotated gene strand. Table A8: PrediXcan and enloc results for presumed causal genes in the rare variant based silver standard. (See Additional file 9) Columns are: lead_var: the most significant variant within the LD block, trait: trait name, gene: Ensembl ID for the gene, is_ewas: Is included in the EWAS . TRUE if included, FALSE if not, proximity: 0 if variant is in the gene, otherwise BPS from the gene boundary, rank_proximity: ranking by proximity within LD block (rank starts from 0 and the closer the lower rank), percentage_proximity: rank_proximity / number of genes in the locus, predixcan_mashr_score: -log10 p-value (most significant across tissues is used) of PrediXcan-MASH trained on European data, enloc_score: rcp (max across tissues), predixcan_mashr_rank: PrediXcan significance ranking within LD block (rank starts from 0 and the higher significance the lower rank), enloc_rank: enloc rcp ranking within LD block (rank starts from 0 and the higher rcp the lower rank), predixcan_mashr_percentage: predixcan_mashr_eur_rank / number of genes in the locus, enloc_percentage: enloc_rank / number of genes in the locus, gene_name: Official gene symbol, gene_type: Gencode annotsted gene type, chromosome: Chromosome for the gene, start: Gencode annotated gene start position. All isoforms are combined, end: Gencode annotated gene end position. All isoforms are combined, strand: Gencode annotated gene strand.

Additional
And the resulting data tables are Table A7 and Table A8.

AUC of the ROC curves
For expression, the areas under the curve (AUC) of were, in increasing performance, 0.553, 0.591, 0.669, and 0.672 for coloc, SMR, enloc, and PrediXcan using the OMIM silver standard ( Figure 4C). AUC were higher when using the rare variant silver standard with SMR at the bottom of the ranking followed by coloc, PrediXcan, and enloc at the top (Additional Table A9). For splicing enloc had higher 0.650 vs. 0.632 for PrediXcan using OMIM silver standard and 0.714 and 0.686 using the rare variant silver standard.

ROC and PR curves under permuted data
To examine if the shapes of PR and ROC curves were driven by the bias buried in the data, we plotted the PR and ROC curves under the permuted data. Specifically, for each GWAS locus, we permuted the genes that overlapped with the locus while keeping the scores unchanged. We compared the PR and ROC curves for observed and permuted data for OMIM silver standard as shown in Fig. S11

Assessing the contribution of proximity, colocalization, and association significance
To investigate the usefulness of the colocalization and association statistics reported by enloc and PrediXcan respectively, we performed logistic regression, as described in Eq. 28, to fit log odds of being a 'causal' gene against the ranking of: 1) proximity to GWAS lead variant (from close to distal), 2) rcp from enloc (from high to low), and 3) gene-level association p-value from PrediXcan-mashr or SMR (from significant to non-significant). logit(Pr(causal i )) = β 0 + β 1 · rank(proximity i ) + β 2 · rank(rcp i ) + β 3 · rank(P-value i ), in which non-zero β k meant that the kth variable contributed independently on predicting whether a gene was causal. Moreover, negative β k indicated that the direction of contribution of the variable was as expected.
We note that here the analysis is performed by LD blocks rather than genome-wide as was done for calculating the ROC and precision recall curves. More specifically, the ranking within each LD block is used rather than genome-wide.

Causal tissue analysis
To identify tissues of relevance for the etiology of complex traits, we investigated the patterns of tissue specificity and tissue sharing of PrediXcan association results across 49 tissues. For each trait-gene pair, the PrediXcan z-score can be represented as a 49 × 1 vector with each entry being the gene-level z-score in the corresponding tissue (if the prediction model of the gene is not available in that tissue, we filled in zero). To explore the tissue-specificity of the PrediXcan z-score vector, we proceeded by assigning the z-score vector to a tissue-pattern category and tested whether certain tissue-pattern categories were overrepresented among colocalized PrediXcan genes as compared to non-colocalized genes. We used the FLASH factors identified from matrix factorization applied to the cis-eQTL effect size matrix, as described in Section 9 (as PrediXcan and cis-eQTL shared similar tissue-sharing pattern, data not shown). To obtain a set of detailed and biologically interpretable tissue-pattern categories from the 31 FLASH factors, we manually merged them into 18 categories as shown in Fig. S13. For each trait, we projected the z-score vector of each gene to one of the 31 FLASH factors (as described in Section 9) so that the gene was assigned to the corresponding tissue-pattern category. We defined a 'positive' set of genes as the ones with PrediXcan p-value that meets Bonferroni significance at α = 0.05 in at least one tissue and enloc rcp > 0.01 in at least one tissue, which could be thought as a set of candidate genes affecting the trait through expression level. We chose a rather low threshold used for the rcp due to the stringent conservative nature of colocalization probabilities. We also constructed a 'negative' set of genes with enloc rcp = 0, which could be thought as a set of genes whose expressions were unlikely to affect the trait. We proceeded to test whether certain tissue-pattern categories were enriched in 'positive' set as compared to 'negative' set. Since the main focus of this analysis was tissue-specific patterns, we excluded Factor1 (the cross-tissue factor) and Factor25 (likely to be a tissue-shared factor capturing tissues with large sample size). Additionally, we excluded Factor7 (testis), as it was unlikely to be the mediating tissue but might introduce false positives. We tested the enrichment of each tissue-pattern category by Fisher's exact test ('positive'/'negative' sets and in/not in tissue-patter category). Among 87 traits, 82 traits had enloc signal and the enrichment of these was calculated accordingly.
Fig S13. Patterns of tissue sharing identified via factor analysis using flashr. Tissue-pattern categories generated from FLASH applied to the cis-eQTLs are shown. Factor 1 represents cross tissue category covering all tissues, with higher weight for larger sample size tissues. These tissue categories (on y-axis) were used in the analysis of causal tissue identification. Tissues are ordered by sample size.