MARS: leveraging allelic heterogeneity to increase power of association testing

In standard genome-wide association studies (GWAS), the standard association test is underpowered to detect associations between loci with multiple causal variants with small effect sizes. We propose a statistical method, Model-based Association test Reflecting causal Status (MARS), that finds associations between variants in risk loci and a phenotype, considering the causal status of variants, only requiring the existing summary statistics to detect associated risk loci. Utilizing extensive simulated data and real data, we show that MARS increases the power of detecting true associated risk loci compared to previous approaches that consider multiple variants, while controlling the type I error. Supplementary Information The online version contains supplementary material available at (10.1186/s13059-021-02353-8).


Background
Over the past decade, genome-wide association studies (GWAS) have successfully identified many variants significantly associated with diseases and complex traits. Unfortunately, those variants only explain an extremely small proportion of phenotypic variation [1,2] and there are many more variants with even smaller effects that we have yet to identify [1,[3][4][5]. Detecting all loci that harbor associated risk loci can help elucidate the biological mechanisms of diseases and complex traits. All biological follow-up studies have been performed on loci that harbor at least one significant variant. The standard association test used in GWAS examines one variant at a time to identify associated variants; we refer to this method as univariate testing.
Previous works have shown that many loci in the genome harbor more than one causal variant for a given disease or a trait [6][7][8][9][10][11][12][13][14][15]. The phenomenon is known as allelic heterogeneity, which is very common in Mendelian traits [16] and recent works have demonstrated widespread allelic heterogeneity in expression quantitative trait loci (eQTLs) and complex traits [17,18]. The univariate test may be underpowered for a locus containing multiple causal variants with small effect sizes. Alternatively, an approach that considers the effects of multiple causal variants simultaneously may have increased statistical power to detect signals for the locus by aggregating the effects of causal variants.
In this paper, we propose a new model-based method for identifying the association between multiple variants in a locus and a trait we call Model-based Association test Reflecting causal Status (MARS). Our approach builds upon recent progress in finemapping approaches that attempt to identify causal variants in a locus. Causal variants are responsible for the association signal at a locus; however, each locus often contains tens to hundreds of variants tightly linked (linkage disequilibrium, LD) to the reported associated single nucleotide polymorphism (SNP). Therefore, the LD hinders the identification of causal variants at the risk locus. CAVIAR [13] is a recent fine-mapping approach that estimates the probability of each variant being causal, thus allowing an arbitrary number of causal variants by jointly modeling the association statistics at all variants. We extend the likelihood model of CAVIAR to explicitly incorporate the LD structure of data utilizing multivariate normal (MVN) distribution conditional on the causal status of the variants. MARS computes a likelihood ratio of a null model, where none of the variants are causal against an alternate model, where at least one variant is causal. Then, an efficient re-sampling approach is applied for the significance test.
Our method does not require individual-level data, which is often not provided in GWAS. MARS only requires summary statistics such as Z-scores and the LD of variants in a locus, which can be obtained from a reference dataset such as HapMap [19,20] or the 1000 Genome project [21], and reports a p-value that indicates the significance of the association between the locus and the corresponding trait. This approach is related to set-based association tests that examine an association between a set of variants and a trait [22][23][24]. MARS outperforms these previous methods because its underlying model, which builds upon the model of CAVIAR, explicitly models the joint distribution of observed statistics given multiple signals of associations. Furthermore, MARS uses a significance level that corresponds to the standard GWAS significance level, thus facilitating interpretation.
When applied to several simulated data sets, we show that MARS robustly controls type I errors and has improved statistical power compared to the univariate test and widely utilized set-based association tests, a fast and flexible set-Based Association Test (fastBAT) [25], Deterministic Approximation of Posteriors (DAP-G) [26], and Sequence Kernel Association Test (SKAT) [27]. In addition, to show the performance of MARS on both eQTL studies and GWAS, we have applied MARS to representative eQTL and GWAS datasets; Genotype-Tissue Expression (GTEx) data and Northern Finland Birth Cohort (NFBC) data, respectively. Applied to the data of 44 tissues provided by the GTEx consortium [28,29], MARS identified more eGenes, which are genes with at least one variant significantly associated with cis compared to those reported by the GTEx consortium in most tissues, e.g., in the Whole Blood data, MARS identified 29% more eGenes than the consortium; 57% of the extra eGenes that had only been identified by MARS, i.e., not by consortium, were reported in studies elsewhere. To demonstrate the increased power of MARS on real data, we followed a strategy of applying MARS to an older data set and validated the additionally discovered loci using current datasets that have higher statistical power because they are much larger. Applied to the 2009 NFBC data, we show that MARS effectively identifies more association loci than the univariate test and show that many of the new loci have since been discovered in recent GWAS studies.

Overview of MARS
Causal variants are those that are responsible for the association signal at a locus. The ultimate goal of the standard association test, which examines the association between each variant and a trait, is to find causal variants; we refer to this method as univariate testing. However, multiple causal variants with small effect sizes often exist in a locus. For these cases, the univariate test may not detect those associations due to its low statistical power. Alternatively, we can examine the aggregated effect of multiple variants simultaneously on the trait to increase statistical power.
We developed a novel statistical method referred to as Model-based Association test Reflecting causal Status (MARS). MARS examines the association between a set of variants and a trait. MARS requires summary statistics estimated for variants (e.g., z-score) for a locus of interest and a correlation structure, LD, between the variants, which can be readily obtained from a reference dataset. To test the association between a set of variants of a locus and a trait, MARS estimates a likelihood ratio to compute a test statistic, which is referred to as Likelihood Ratio Test (LRT) statistic; LRT stat . We consider the likelihood of a null model (L 0 ) and the alternative model (L 1 ). Note that we are computing the likelihood ratio of a null model against the alternative model, not the full model, which is the standard form of the "Likelihood ratio test" uses. The null model assumes that there is no causal variant to the trait while the alternative model assumes that there is at least one causal variant to the trait. Then, we compute the LRT stat as L 1 /L 0 . Suppose that we test the association between m number of variants and a trait. Given the observed summary statistics, we can compute the LRT stat as follows: Here, S =[ s 1 , · · · , s m ] T indicates summary statistics of m variants and C indicates the causal status of m variants. C is a binary vector of length m, where 0 indicates that a variant is non-causal and 1 indicates that a variant is causal. Specifically, C 0 indicates the causal status where none of the variants are causal and ζ is a set that contains all possible casual statuses except for C 0 . Since there are m number of variants, there are 2 m possible causal statuses. In practice, we limited the number of allowed causal SNPs as well as the number of variants considered for a region to reduce the running time in experiments throughout the paper. We find that considering up to 3 causal variants and use 50 variants in a region to be reasonable in respect to both the running time and the accuracy in our experiments. However, a user may increase the numbers, which is amendable in high computing servers and may provide more accurate results in expense of the running time. For details, see the "Methods" section. To assess the association significance for a locus, we utilize the re-sampling approach, where we sample null statistics from a MVN distribution with a corresponding LD and estimate the LRT stats for the null statistics to generate a null panel of LRT stats ;LRT NULL stats . From the null panel, we estimate the significance of LRT DATA stat computed from the data. Figure 1 shows the basic overall process of MARS. The "Methods" section describes the details and techniques to make this process computationally feasible for big genomic data. Here, we assume that we are testing an association between a locus of m variants and a trait. The leftmost panel shows the input of MARS; m number of summary statistics for the locus and an n × m matrix that contains genotypes of m SNPs for n samples. The next two panels on the bottom show the re-sampling process in which we sample the null statistics K times from an MVN distribution with a variance-covariance matrix of that contains LD of the genotypes X. The rightmost panel shows the process by which we estimate LRT stats for the null panel from which we can compute a p-value for the data

MARS controls type I error while improving power in simulation studies
We demonstrate that MARS controls type I errors through simulations of null panels utilizing the GTEx data as a starting point and consider the SNPs ±1Mb around the transcription start site (TSS) of 10 genes of Whole Blood data from the GTEx consortium [28,29]. Half of the genes are randomly selected from those reported as eGenes by the GTEx consortium and the other half are randomly selected from other genes, i.e., non-eGenes, of the GTEx consortium [28,29]. For each gene locus, we simulate 10 7 null summary statistics according to the generative model described in the Materials and Methods section, which uses the LD structure estimated from the genotypes of the SNPs in the locus and applies MARS to compute the LRT stats .
To show that MARS controls type I error, the false-positive rates are estimated for different thresholds of α = 5×10 −6 to 5×10 −2 . Half of the simulated data is used to compute a threshold of LRT stats for the corresponding α; LRT threshold α and the other half of the simulated data is used to compute a quantile of LRT stats smaller than the LRT threshold α . Figure 2a shows that MARS robustly controls type I error for all examined gene loci as the false-positive rates for different gene loci are very close to the corresponding α = 5×10 −6 to 5 × 10 −2 , respectively.
To show that MARS increases the statistical power, we performed extensive simulation studies for various scenarios and compared the power of MARS with those of the univariate test. Here, we defined the univariate test as a set-based association test that uses the maximum summary statistic among SNPs in the locus we are testing (for details see the "Methods" section). The same gene loci from the previous section are used for the test and we estimate power of each gene locus for cases with two causal variants implanted with different effect sizes of λ = 4, 4.5, 5, 5.5, and 6. For a fair comparison of the powers between the univariate test and MARS, we utilized the standard GWAS p-value threshold of 5×10 −8 . We simulated 10 8 summary statistics under the null model of no effect to generate a null panel and 10 8 summary statistics under the alternative model of effect size λ for two causal variants to examine the power. To set a threshold for computing the power, we utilized the concept of the univariate test (see "The standard univariate test, fastBAT, and DAP-G" sub-section of the "Methods" section). For each null statistic, we selected the maximum p-value to get 10 8 maximum p-value from the null panel. Then, we ordered the maximum p-values to get the quantile, q, where the maximum p-values corresponds to 5 × 10 −8 ; q is used for setting the LRT thresholds. Then, the power is estimated as a quantile of alternative cases that show LRT stats greater than the LRT threshold. The details of the whole processes of computing the threshold and estimated the power is described in the "Power estimation" sub-section of the "Methods" section. The percentage of power improvement is defined as (power of MARS -power of the univariate test)/(power of the univariate test) ×100. Figure 2b shows that MARS has increased statistical power compared to the univariate test. While the extent of power improvements differs between the gene loci as LD structures differ between loci, it is clear for all cases that the powers are improved over the univariate test. Depending on the effect size λ implanted in the simulated data, the power has improved from 5.2 to 41.18% in our experiments and as expected, the smaller the effect size, the better MARS performs over the univariate test. The results do not show noticeable differences between the loci of the eGenes and of the non-eGenes used for the simulations.
In addition, we examine the cases where two and three causal variants, each with an effect size of λ = 4.5, are implanted in the simulated data. As the number of causal variants increases from two to three, MARS shows a bigger power improvement over the univariate test (Fig. 2c). The result shows that the more causal variants that exist in a locus, the better MARS performs over the univariate test.
Besides the univariate test, we compared MARS with the widely used set-based association test methods, fastBAT [25], DAP-G [26], and SKAT [27]. Due to the heavy I/O of fastBAT and DAP-G (an extended version of DAP), we used 10 5 simulations and a threshold of 10 −5 . We computed the power of MARS, DAP-G, and fastBAT for the different effect sizes of λ = 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6 and show that MARS outperforms fastBAT and DAP-G for all cases by improving the power from 0.07 to 23.38% and from 0.6 to 10.61%, respectively, depending on the effect sizes in the experiments (Fig. 2d). In addition, we compared MARS with SKAT, optimal unified SKAT (SKAT-O) [27] and Meta-SKAT [30], which is designed for meta-analysis but applicable for a single study as well. As SKAT does not allow z-scores that we have used for our simulated studies as its input, we generated phenotypes with a range of effect sizes β for the comparison. The results show that MARS outperforms SKAT, SKAT-O, and Meta-SKAT. Additional file 1: Fig. S1 describes the experiments and results in detail.

MARS detects novel eGenes in GTEx data
Recently, a larger number of expression quantitative trait loci (eQTLs) studies have been reported. In particular, numerous cis-eQTLs, which are eQTLs that map to the approximate location of their gene-of-origin, have been identified. As part of this effort, the GTEx consortium reported eGenes, which are genes with at least one cis-eQTL. We applied MARS to GTEx data to show that it can detect more eGenes than those reported by the GTEx consortium. Among the 44 tissues provided by the GTEx consortium, we first applied MARS to the Whole Blood data for evaluation as this data contains the largest number of samples among all tissues.
For simulation studies and GWAS, we applied a threshold that corresponds to the pvalues threshold of 5 × 10 8 utilizing the univariate test. However, for a fair comparison of the MARS results with those reported by the GTEx consortium, a different strategy has been used. We used 10,000 simulations, which is the number of simulations used by the GTEx consortium to compute their "empirical p-values" and select eGenes. To identify eGenes for MARS, we set the threshold as the border of empirical p-values between eGenes and other genes, referred to as non-eGenes, reported by the GTEx consortium. Figure 3 is a Venn diagram that compares the identifications of eGenes by MARS and those reported by the GTEx consortium. MARS identified 2043 extra eGenes that were not reported by the consortium, while MARS missed only 98 eGenes that were reported by the consortium [28,29]. MARS and the GTEx consortium detected 6686 eGenes in common.
To verify that the eGenes identified by MARS are true associations, we compared the extra eGenes with those reported by other studies that have larger sample sizes. Note that the results throughout the paper used data from GTEx version 6. GTEx version 7 has recently been published with more samples and improved technology in experiments. We expect more eGenes are detected in the newer version of the data as the power increases with the number of samples and etc. We compared the extra eGenes with those reported by GTEx version 7. In addition, we utilized a Whole Blood data of Framingham Heart Study (FHS) [31], which is independent of GTEx data but contains a larger sample size (5257 samples), to validate the extra eGenes. Figure 3b is a Venn diagram that compares  [32] and chronic lymphocytic leukemia [33]. Sille et al. have demonstrated that the expression level of SP140 is regulated by cis-eQTLs in lymphoblastoid cell lines [34]. Besides, the SP140 protein levels are shown to be downregulated by a cis-acting mechanism in peripheral blood mononuclear cells (PBMCs) from MS patients [35]. HSPB8 (ENSG00000152137) has been recently identified as an eGene using PBMCs and the expression level of HSPB8 is known to be regulated by several SNPs [36]. Surfactant protein D encoded by SFTPD (ENSG00000133661) gene is known to be regulated by in a cis-acting manner in human blood [37], and CD83 (ENSG00000112149) has been recently identified as cis-eQTLs gene in CD19+ B lymphocyte [38]. Additionally, in Additional file 1: Fig. S2, we thoroughly analyzed 100 randomly selected genes and compared the p-values for MARS, the univariate test, and those reported by the GTEx consortium to show that MARS could identify more eGenes with better p-values. These results show that MARS is capable of identifying novel eGenes that cannot be detected using standard association test approaches. Additional file 2: Table S1 lists the 2048 extra eGenes and their identifications in GTEx version 7 and FHS.
One advantage of MARS is that once the null panel of LRT stats for each gene has been established, this can be applied to the gene in any other tissues. Utilizing the null panel of LRT stats estimated from the Whole Blood data of the GTEx consortium, we computed the p-values of the genes in all 44 tissues of GTEx using their summary statistics and LD structures. Figure 4 shows that MARS identifies comparable or more eGenes than the univariate test in addition to those reported by the GTEx consortium in all tissues. As expected, the number of eGenes identified by the univariate test and those reported by the GTEx consortium are very close to each other in all of the tissues. The numbers of genes differ between tissues due to factors such as sample size differences and we only used the common genes in each tissue and the Whole Blood data of the GTEx consortium because the null panel of LRT stats was estimated for genes in the Whole Blood data. Additional file 3: Table S2 provides a comparison of eGenes identified by MARS, the univariate test, and the GTEx consortium for each tissue. In addition, we compared eGenes identified by MARS, GTEx version 6, and GTEx version 7 and provided Venn diagrams as in Fig. 3 for all tissues (Additional file 1: Fig. S3).

MARS detects more set-based associations in GWAS
We show the effectiveness of our method on GWAS by applying MARS to the Northern Finland Birth Cohort (NFBC) data [39]. The NFBC data consist of 10 traits collected from 5327 individuals, namely triglycerides (TG), high-density lipoproteins (HDL), low-density lipoproteins (LDL), glucose (GLU), insulin (INS), body mass index (BMI), and C-reactive protein (CRP) as a measure of inflammation, systolic blood pressure (SBP), diastolic blood pressure (DBP), and height. For NFBC data, we examined 51,762 loci, where each locus is defined as ±1 Mb of TSS of genes provided by the GTEx consortium.
MARS requires a lot of sampling to apply the standard GWAS p-value threshold of 5 × 10 −8 . To reduce the running time, we apply the idea of importance sampling [40] on MARS for the GWAS data, which well approximates the p-value estimated from the original sampling approach while reducing the sampling number dramatically, from 10 8  Fig. S4). For further details, see the "Fast and space-efficient sampling for MARS" sub-section of the "Methods" section. Figure 5a shows that, for all traits, MARS identifies more or comparable loci that are likely significantly associated with the traits. A total of 471 loci were only identified by MARS and not by the univariate test. Additional file 4: Table S3 lists significantly associated loci identified by MARS and by the univariate test and Additional file 1: Fig. S5 provides Venn diagrams that compare the identifications of MARS and the univariate test. To verify those extra loci, we searched the loci from other GWAS by utilizing the GWAS catalog [41]. As a result, several variants associated with 311 loci among those 471 extra loci have been previously reported [7,. For example, the rs6060369 locus associated with height was reported by large GWAS [49][50][51]. The rs1800961 locus related to HDL was previously reported by large GWAS and meta-analysis GWAS [59-61, 65, 66]. The rs6511720 locus related to LDL was discovered by several previous studies [45,55,56,60,61,65]. A Venn diagram in Fig. 5b compares the number of loci identified by MARS and the univariate test as well as showing the number of loci for which at least one associated variant has been reported by the GWAS catalog. The list of SNPs, the corresponding loci found by previous studies, and their detailed information including PubMed id and SNP position are provided in Additional file 5: Table S4. Note that loci are defined based on the gene map of GTEx (±1Mb of TSS), so some loci may overlap (Additional file 1: Fig. S6). For the height, the univariate test found no associations while MARS found 53 associations. To verify the 53 identifications of MARS on the height, we performed the univariate test on the genetic investigation of anthropometric traits (GIANT) consortium data set [67] that contains 131,547 samples and is thus much larger than the NFBC data set and expected to have greater power on the association test. As a result, we found 5788 associations where all 53 associations that MARS found are included to show that MARS's identifications on height are true-positive signals. These results demonstrate that MARS can efficiently identify novel associations in GWAS.

Discussion
Great efforts have been spent on finding the hidden heritability and many studies suspect that single-level variant tests miss signals due to the small effect sizes and power problems. Approaches that examine multiple variants together may have increased statistical power to detect risk loci with small effect sizes. Moreover, interpreting Genome-wide Association Studies (GWAS) at the gene level is an important step toward understanding the molecular processes that lead to disease [68,69]. Several statistical approaches have been proposed that test the association between a set of variants and a trait or disease status; however, they simply use naïve statistics such as the mean or sum of χ 2 for statistics in the risk loci [24,25,70,71].
Our method examines the association between a set of variants and a trait considering the causal status and LD between variants, utilizing the model used in a recent finemapping approach [13,72]. One of the advantages of MARS is that it may be applicable to data with only summary statistics, utilizing LD estimated from a global reference dataset such as 1000 genome data [73]. Note that a special attention may be required for the data from multiancestry or human leukocyte antigen (HLA) region, for which a reference dataset may not provide an accurate LD estimate due to population stratification [73] or complex LD patterns [74]. Another advantage of our method is that once a null panel of test statistics has been established for a locus, it can be applied to the locus in other studies, only if the analyzed variants (thus, the LD structures) of the locus are the same. For example, in our GTEx analysis, the null panel statistics for genes were established only once and applied to all of the 44 tissues. This may reduce the running time significantly as most of the running burden comes from the null panel generation.
Applied to extensive simulated data sets with different effect sizes and the number of causal variants, our method shows improved power compared to previous approaches including the widely used set-based association test, fastBAT and DAP-G, while successfully controlling type I errors. Especially, when there are many causal variants with small effect sizes, our method shows superior performance to the standard univariate association test approach. Applied to Genotype-Tissue Expression (GTEx) data, our method identifies more or comparable eGenes compared to the standard univariate approach as well as those reported by the GTEx consortium in all tissues. In addition, using Whole Blood data, we show that a large portion of the eGenes only identified by MARS have been reported by other larger studies and some of them have biological evidence of being eGenes based on previous literature. Lastly, utilizing the Northern Finland Birth Cohort (NFBC) data, we show the effectiveness of our method to the GWAS in that our method effectively identifies more association loci in GWAS compared to the standard association test approach. In the experiments, we have defined each locus as a gene; however, it could be defined as any set of variants that a user wants to apply to.
We note some limitations in our work. First, MARS is computationally costly compared to the standard GWAS method as MARS tests the significance of an association based on the re-sampling approach. However, in practice, we introduce fast and space-efficient sampling techniques including importance sampling to dramatically reduce the sampling time, which closely approximates the original result, while we were able to successfully handle big eQTL data sets that contain tens of thousands of genes and GWAS data sets with thousands of samples. Second, we limited the number of causal variants in a locus up to three in the simulated studies to reduce the running time in experiments. We believe this is a reasonable assumption, as it has been reported that a relatively small number of variants exist in a region [13] and MARS showed better results in practice, utilizing real eQTL and GWAS datasets: GTEx data and NFBC data. However, this may not be a general assumption and further investigations may require for the cases with a larger number of causal variants (>3), comparing with previous methods such as DAP-G and fastBAT, which may give better results for the cases. Third, our model is based on a linear model and can only be applied to common variants, not rare variants. For data that does not follow a normal distribution, we recommend to fit the data into a normal distribution using techniques such as inverse normal transformation. Also, we assume the summary statistics are corrected for population stratification. We can extend our likelihood ratio model using CAVIAR-gene [15] instead of CAVIAR to consider the population stratification in the future study. Lastly, MARS does not utilize existing functional data; some current methods utilize functional data to detect more eGenes [75][76][77]. We can extend the statistical framework of MARS to utilize functional data in future work. Despite these limitations, MARS is a novel statistical method that can detect newly associated loci and increase the number of loci in follow-up studies. Through this, MARS can increase our biological understanding of diseases and complex traits.

Methods
In this section, we assume that phenotypic values are continuous values to ease some of mathematical derivations of computing the summary statistics. MARS only require the joint distribution of summary statistics to follow multivariate normal distribution. It has been shown that for binary phenotypes the joint distribution of summary statistics follows a multivariate normal distribution [15,78,79]; thus, summary statistics obtained from case/control phenotypes are applicable for MARS as well.

GWAS statistics
Consider GWAS on a quantitative trait where we genotype n individuals and collect a phenotype for them. Let X i be a vector of length n with the standardized genotypic values (i.e., mean zero and variance one) of the ith marker that we are testing and Y be a vector of length n with the phenotypic values. We assume that the data-generation model follows the following linear additive model: Here, μ is the mean of the phenotypic values, 1 n is a vector of n ones, β i is their coefficients, and e is a vector of length n sampled from N (0, σ 2 I) accounting for the residual errors, where I is an n × n identity matrix.
Under this model, the phenotype follows a MVN with the following mean and variance: By maximizing the likelihood of the model, we can estimate β i as follows: and the summary statistic is computed as follows: , where λ i is non-centrality parameter (NCP) and is equal to β i σ e X T i X i . We obtained the estimated values for μ, e, and σ e asμ = μ1 n T X i n ,ê = Y − 1 nμ −β i X i , andσ e = ê Tê n−2 , respectively.

The effect of linkage disequilibrium on the statistics
Consider the case where that the ith SNP is causal to a phenotype and the jth SNP is noncausal but in LD with the ith SNP. The correlation between the two variants is r, which is approximated by 1 n X T j X i . The effect size of the jth SNP is computed as follows: e (X T j X) −1 and the statistics for the jth SNP are computed as follows: We can show that the covariance between the statistics is equal to the correlation of the genotypes as follows: Then, the joint distribution of the summary statistics for the two variants given their NCPs, λ i and λ j , follows a multivariate normal distribution as follow:

CAVIAR generative model
Now we consider the case with m SNPs. Given the true effect sizes of m SNPs, = [ 1 , 2 , · · · , m ], the summary statistics of m SNPs, S =[ S 1 , · · · , S m ] T , is as follows: Here, is a correlation matrix, where {i, j} = r ij . We utilize Fisher's polygenic model and assume that the effect sizes follow a normal distribution. Let C be a binary vector of length m that indicates the causal status of m SNPs; 0 indicates that a SNP is non-causal and 1 indicates that a SNP is causal. Given the causal status C, we assume that the true effect size is as follows: where is a diagonal matrix, where {i, i} = σ 2 if the ith SNP is causal and {i, j} = , otherwise. From Eqs. (1) and (2), the likelihood of summary statistics follows a multivariate normal distribution as follows: Then the likelihood function is given as follows: We use a simple model where the probability that an SNP begins causal is γ , which is independent from other SNPs. To compute the prior of causal status, we use the same assumptions that are widely used in fine-mapping methods, and γ is set to 0.01 [13,78,[80][81][82][83]. We have shown that the choice of γ does not make big differences on the results (for details, see Additional file 1: Fig. S7) We assume each SNP is independent and that the probability of a SNP to be causal is equal to 0.01 [81,82]. Therefore, we compute the prior probability as follows: Here, |c i | = 1 if the ith SNP is causal and |c i | = 0, otherwise. Although we use a simple prior, we can incorporate external information by using the SNP-specific prior γ i , which is the prior for the ith SNP, and then the prior probability for a more general case is

Model-based Association test Reflecting causal Status (MARS)
MARS examines the association between a set of SNPs and a phenotype of interest. For the test statistic, we utilize a likelihood ratio test (LRT). We consider the likelihoods of two models: that of the null model (L 0 ) and that of the alternative model (L 1 ). The null model assumes that there is no causal SNP to the phenotype while the alternative model assumes that there is at least one causal SNP for the phenotype. Then, we can compute the test statistic as LRT stat = L 1 /L 0 . Given the observed marginal association statistics S and correlation matrix , we can compute the LRT stat as follows: Here, we can compute the prior using Eq. (5) and the likelihood using Eq. (4). Since there are m SNPs, there are 2 m potential causal statuses. In practice, we limit the number of allowed causal SNPs to two or three as which is consistent with reports from previous studies that a relatively small number of causal SNPs exist in a region. In addition, as the size of genes are often very large-many genes contain more than 10,000 SNPs within ±1Mb of TSS for the GTEx data-we order the SNPs by the value of its summary statistics and only used the top 50 SNPs for computing the LRT stats to reduce the running time and the space. Figure 6a shows this practical implementation of MARS used for the experiments. This strategy dramatically reduces the running time while well approximating the results using all the SNPs in the loci (Additional file 1: Fig. S8) because the causal SNPs are expected to be included in the top 50 SNPs. When limiting the number of causal SNPs up to three and using care 3 i casual statuses to consider and ζ becomes a set that contains all the possible casual statuses with 1, 2, or 3 causal SNPs. However, depending on the available computational power and size and properties of the data, the number of possible causal variants for running MARS may increase using options in the MARS program.

eGene detection in GTEx data
To identify an eGene, we examine the association between the gene expression levels and SNPs within ±1Mb of TSS of the gene, which can be the candidates of cis-eQTLs for the gene. To assess the significance of a gene, we sample summary statistics from an MVN distribution under the null hypothesis, S ∼ N(0, ). Here, is a variance-covariance matrix estimated from the SNPs within ±1Mb of TSS of the gene. Based on the simulation data, we order the SNPs by values of its summary statistics and used only top 50 SNPs for computing the LRT stats ; LRT NULL stats using equation (6). Then, we also select the top 50 SNPs of summary statistics to compute the LRT stat of the gene; LRT DATA stat , using the equation (6). The p-value of the gene is estimated as the quantile of LRT DATA stat among LRT NULL stats . One of the advantages of MARS is that once the null panel, LRT NULL stats , has been estimated for a locus, the panel can be rapidly applied to the locus in any other tissues or traits to compute a p-value. We use the Whole Blood data, which contains the greatest number of samples among the 44 tissues, to estimate the null panels of 23,163 genes and applied the panels to all the other tissues. To compare the MARS results with GTEx's results, we use 10 4 simulations, the number used by GTEx Consortium to compute their "empirical p-values" to select eGenes. To identify eGenes for MARS, we set the threshold as the border of empirical p-value between eGenes and genes other than those eGenes, referred to as non-eGenes, reported by the GTEx consortium, which is differ by tissues as GTEx used the FDR approach to find their eGenes. A similar process can be applied to detect eGenes in the univariate test except using the maximum summary statistic as the test statistic instead of LRT stat .

Power estimation
To show MARS increases statistical power over the univariate test, we compare the power between MARS and the univariate test. For a fair comparison, we utilized the standard GWAS p-value threshold of 5 × 10 −8 . We sampled 10 8 number of summary statistics under the null hypothesis, S NULL ∼ N(0, ) and 10 8 number of summary statistics under the alternative hypothesis, S ALT ∼ N( , ). Here, is a vector of length m, where m is the number of SNPs that contain zeros except for the causal SNPs. For example, for a simulation, in which two SNPs (e.g., SNP 1 and SNP 2) with effect size λ are implanted in the data, is [ λ, λ, 0, ..., 0]. We examined the power for cases with two causal or three causal SNPs implanted in the simulated data, where the causal SNPs are randomly selected for each simulation. Then, we computed the p-value of S NULL using the univariate test, UNIp NULL , and found the quantile q, where the p-value equals to the standard GWAS p-value threshold of 5 × 10 −8 as follows: We compute the LRT stats of S NULL as LRT NULL stats , using MARS and set the LRT stat at the quantile q as the threshold of LRT stats as LRT THR stat , which satisfies the following equation: Here, LRT THR stat corresponds to the standard GWAS p-value threshold of 5 × 10 −8 . Now, we compute the LRT stats of S ALT as LRT ALT stats , and the power of MARS is defined as the number of LRT ALT stats that are greater than the LRT THR stat as follows: Power of MARS = Number of(LRT ALT stat > LRT THR stat ) 10 8 × 100 Similarly, the power of the univariate test is defined similarly by computing the p-value of S ALT using the univariate test; UNIp ALT , as follows: Power of the univariate test = Number of (UNIp ALT < 5 × 10 −8 ) 10 8 × 100 In the power comparison of MARS, fastBAT and DAP-G, the power estimation process is the same as that described above except for that 10 5 simulations and a threshold of 10 −5 is used instead of 10 8 simulations and a threshold of 5 × 10 −8 , respectively.

Fast and space-efficient sampling for MARS
To access the significance of associations, MARS uses a re-sampling approach that requires a lot of sampling from MVN distribution. There are two main obstructions to this standard re-sampling approach. One is that a locus may contain many SNPs; for example, many genes in the GTEx data contain >10,000 SNPs within ±1Mb of their TSS. When the number of SNPs m is very large, the standard re-sampling approach, S ∼ N(0, m×m ), using the Cholesky decomposition [84] is impractical. This takes a lot of time and space as m×m itself often requires a few gigabytes of space. We can reduce the space and time complexity dramatically by utilizing the fact that m×m is a covariance matrix of X; m×m = X T X/n, where n is the number of samples. Instead of sampling statistics from MVN with the variance-covariance matrix of m×m ; S ∼ N(0, m×m ), we sample statistics from MVN with the variance-covariance matrix of I n×n ; S * ∼ N(0, I n×n ). This neither takes time nor space because in general n << m and n is not large. Then we multiply S * by X T / √ N to compute the statistics S = X T √ N S * .
The other main obstruction of the standard sampling approach is that the number of sampling required to find a proper threshold for MARS may be very large. For the GTEx data, we compared the eGenes to those reported by the GTEx consortium and performed 10,000 samplings to determine the number of samples used for computing their empirical p-values. However, for the GWAS analysis, MARS needs to perform a lot of samplings to find a LRT threshold that corresponds to the standard GWAS p-value threshold of 5 × 10 −8 . Thus, for GWAS, we have applied importance sampling, which is an approximation method of standard sampling. The main idea of importance sampling is that it draws the sample from a distribution with thicker tails than a target distribution. Then, it uses importance weights so that the correct distribution is targeted [40]. The procedure is as follows. Instead of sampling from MVN with the variance-covariance matrix of I n×n ; S * ∼ N(0, I n×n ), we sample statistics from MVN with the variance-covariance matrix of √ 2I n×n ; S * imp ∼ N(0, √ 2I n×n ). Then, the new statistics from importance sam- We record an additional information, referred to as the importance weight, which defined as follows: Here, f indicates the probability density function of MVN. We repeat the process of sampling statistics S * imp and computing S imp and W, K times. We call each process as a run and after K runs we have a set of statistics {S imp 1 , S imp 2 , · · · , S imp K } and a set of weights {W 1 , W 2 , · · · , W K }. Then, we estimate a univariate p-value from each S imp and compute the p-value threshold as the ratio of the sum of weights that have the univariate p-value< 5 × 10 −8 over the sum of all the weights as follows: Here, i indicates the index of a run and C is a set containing the indices of runs where the univariate p-value< 5 × 10 −8 . Given summary statistics of a locus, we can access the significance of the locus by computing LRT stat of the summary statistics as LRT DATA stat . In addition, we compute the K number of LRT stats for the top 50 SNPs of the S imp as LRT NULL stats , as well. Then we compute the p-value of the locus as the ratio of the sum of weights where LRT NULL stats > LRT DATA stats over the sum of all the weights as follows: Here, D is a set containing indices of runs with LRT NULL stats > LRT DATA stats . The association is significant if the p-value estimated from the Eq. (8) is smaller than the p-value threshold estimated from the Eq. (7). Applied to 10 randomly selected genes, we find that the p-value estimated from the 10 4 number of importance sampling well approximates the p-values estimated from the 10 8 number of the original re-sampling (Additional file 1: Fig. S4). Utilizing the importance sampling, we can reduce the number of samplings dramatically from 10 8 to 10 4 in GWAS experiments. Figure 6 shows an overview of the fast and efficient association strategy for MARS (Fig. 6b).
For the GTEx data analysis, we used MARS as described in Fig. 6a, where 10 4 number of samplings were performed and up to two causal variants were considered. In this case, MARS took approximately 3.5 min to test the significance for an average-size gene with 7522 SNPs for 338 samples in our system. Using parallel processing, we were able to run the 23,163 genes over several hours, which was approximately 3 h for sampling and computing LRT stats and some extra time for pre-processing and post-processing the data. For the GWAS data analysis, we applied the fast and efficient strategy of MARS as described in Fig. 6b, where 10 4 number of importance sampling was performed and up to two causal variants were considered. In this case, MARS took approximately 50 min to test a significance for an average-sized locus with 299 SNPs and 5326 samples in our system. Using parallel processing, we were able to run the 56,319 number of genes in approximately tow days.

The standard univariate test, fastBAT, and DAP-G
To compare MARS with the standard approach of the set-based association test, we defined a univariate test that uses a maximum summary statistic among the SNPs in the analysis locus. In addition, the widely used set-based association tests fastBAT [25] and DAP-G [26] were used for the comparison. A Genome-wide Complex Trait Analysis (GCTA) [85] program was downloaded from the GCTA website (http://gcta.freeforums. net/thread/309/gcta-fastbat-based-association-analysis) and the "fastBAT" option was used to run GCTA-fastBAT. The DAP-G program was downloaded from the appropriate website (https://github.com/xqwen/dap/tree/master/dap_src), and summary statistics were used from the run option.

GTEx data
The summary statistics and genotypes of 44 tissues of GTEx data version 6 were downloaded from dbGap (https://www.ncbi.nlm.nih.gov/gap); these were used to generate all results throughout this paper. The eGene list of GTEx data version 7 was downloaded from dbGap and only used to validate eGenes that had been identified by MARS applied on GTEx data version 6. In total, 23,163 gene loci selected from the Whole Blood data were used for the analysis; these contain at least 50 SNPs in their ± Mb of TSS. We generated the null panel of LRT stats using Whole Blood data that contains the most samples, 338. The numbers of genes differ between tissues due to factors such as sample size differences; therefore, for eGene detection in 44 tissues, we used common gene regions in each tissue and the Whole Blood data.

Northern Finland Birth Cohort dataset
The genotypes and 10 phenotype values of triglycerides (TG), high-density lipoproteins (HDL), low-density lipoproteins (LDL), glucose (GLU), insulin (INS), body mass index (BMI), C-reactive protein (CRP) as a measure of inflammation, systolic blood pressure (SBP), diastolic blood pressure (DBP), and height of 5326 samples from the Northern Finland Birth Cohort (NFBC) dataset were downloaded from dbGap. PLINK, a wholegenome association analysis toolset (http://zzz.bwh.harvard.edu/plink/index.shtml), was used to compute the statistics. For the set-based association test, the gene map of the GTEx data that contains 56,319 gene positions was used to define the loci for analysis. SNPs ±1 Mb around the transcription start site (TSS) of the genes were searched in the NFBC genotype data and 51,762 regions with >50 SNPs were used for the analysis. 10 4 importance samplings were performed to generate the null panel to estimate the p-values of MARS and the univariate test.

GIANT consortium height dataset
To evaluate the identifications of MARS on the NFBC height data, we performed univariate tests on the GIANT consortium height dataset, which contains 131,547 samples. [67].
Additional file 1: Fig. S1-Power of MARS compared to SKAT, SKAT-O, and Meta-SKAT in simulated studies. Fig. S2-MARS detects more eGenes in GTEx Whole Blood data. Fig. S3-Venn diagrams comparing eGenes identified by MARS using GTEx v6, eGenes reported by GTEx v6, and eGenes reported by GTEx v7 for all the tissues. Fig. S4-Compare p-values estimated from the standard sampling and those estimated from the importance sampling. Fig. S5-Venn diagrams comparing set-based association identified by MARS and the univariate test for traits in NFBC data. Additional file 2: Extra eGenes identified by MARS but not by GTEx v6. have been reported by other studies. We used the Whole Blood data of GTEx v6 and identified 2,043 extra eGenes that were not reported by the GTEx consortium (GTEx v6). We compared those extra eGenes with a newer version of data recently published by the GTEx consortium, GTEx v7, as the newer version of the data is expected to contain more eGenes with a bigger sample size. In addition, we compared those extra eGenes with the eGenes reported by Framingham Heart Study (FHS) data that has a large number of samples, 5257 samples, using the Whole Blood data. In each column, '0' indicates the gene is not identified as an eGene and '1' indicates the gene is identified as an eGene.
Additional file 3: List of eGenes identified by MARS, the univariate test, and GTEx 23,163 genes from the Whole Blood data were used for the analysis, which contain at least 50 SNPs in their +/-1Mb of TSS. We generate null distribution of LRTstats using the Whole Blood data as which contains the maximum number of samples,338. The number of genes is different between tissues due to the sample size differences and etc, thus, for the eGene detection in 44 tissues, we use genes common in each tissue and the Whole Blood data. The first column shows the gene ID, the second column shows eGenes identified by MARS, the third column shows eGenes identified by the univariate test and last column shows the eGenes reported by GTEx consortium. In each column '0 'indicates the gene is not identified as an eGene and '1' indicates the gene is identified as an eGene.

Additional file 4:
List of genes significantly associated with traits in NFBC data detected by MARS and the univariate test. The first column shows the geneID. Only genes at least one of the method, MARS or the univariate test, is significantly associated with the trait is shown. The second column and the third column show whether SNPs +/-1Mb of TSS of the gene is significantly associated with the trait or not for MARS and the univariate test. Here, '1' indicates the gene is significantly associated with the trait and '0' indicates the gene is not significantly associated with the trait. Additional file 5: List of loci found by previous studies among the loci identified by MARS but not by the univariate test Additional file 6: Review history Peer review information Barbara Cheifet was the primary editor of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Review history
The review history is available as Additional file 6.
Authors' contributions JWJJ, FH, and EE designed this study. JWJJ and JJ performed the experiments. JWJJ, JJ, and FH wrote the paper in consultation with the other authors. All authors read and approved the final manuscript.