GLiMMPS: robust statistical model for regulatory variation of alternative splicing using RNA-seq data
© Zhao et al.; licensee BioMed Central Ltd. 2013
Received: 14 March 2013
Accepted: 22 July 2013
Published: 22 July 2013
To characterize the genetic variation of alternative splicing, we develop GLiMMPS, a robust statistical method for detecting splicing quantitative trait loci (sQTLs) from RNA-seq data. GLiMMPS takes into account the individual variation in sequencing coverage and the noise prevalent in RNA-seq data. Analyses of simulated and real RNA-seq datasets demonstrate that GLiMMPS outperforms competing statistical models. Quantitative RT-PCR tests of 26 randomly selected GLiMMPS sQTLs yielded a validation rate of 100%. As population-scale RNA-seq studies become increasingly affordable and popular, GLiMMPS provides a useful tool for elucidating the genetic variation of alternative splicing in humans and model organisms.
KeywordsRNA-seq alternative splicing sQTL exon generalized linear mixed model
Alternative splicing (AS) is the process by which exons from precursor mRNA transcripts are differentially included during splicing, resulting in different mature mRNA isoforms from a single gene locus . AS is a major contributor to the control of gene expression and protein diversity. More than 90% of human genes are alternatively spliced . Changes in the relative ratio of alternatively spliced isoforms of a single gene can have significant phenotypic consequences and cause various diseases [3, 4].
The control of AS is mediated through extensive protein-RNA interactions involving cis regulatory elements and trans acting factors . Genetic polymorphisms that alter cis splicing regulatory elements can result in difference of alternative splicing among human individuals and subsequently affect gene expression or protein activity. Increasing evidence suggests that such natural variation of alternative splicing can influence complex traits or modify disease risks . For example, genetic variation of alternative splicing in the sodium channel gene SCN1A can influence the response to antiepileptic drugs . To date, most genome-wide surveys of alternative splicing variation in human populations were carried out on the HapMap lymphoblastoid B cell lines (LCLs), whose genomic variants have been extensively characterized by the HapMap  and 1000 Genomes projects . The first few studies utilized the Affymetrix exon array with approximately 6 million exon-targeted probes [10–12]. In these studies, the microarray probe intensities of individual exons were compared to those of whole genes to quantify exon inclusion levels and then associations with single-nucleotide polymorphisms (SNPs) were tested to identify splicing Quantitative Trait Loci (sQTLs). Another study used the same exon array platform to characterize tissue-specific control of alternative splicing in brain and peripheral blood mononuclear cell samples . These studies have shed light on the prevalence and functional importance of alternative splicing variation in human populations. The development of the high-throughput RNA sequencing (RNA-seq) technology has provided a powerful alternative to splicing sensitive microarray for exon level expression quantification. RNA-seq has several advantages compared to microarray, including a greater dynamic range of exon expression levels, the ability to detect novel transcripts not probed on the array, the ability to better quantify exon inclusion levels, single nucleotide level resolution, and less confounding effects from polymorphisms on the target exons [14, 15]. Several studies have used the RNA-seq technology to characterize transcriptome variation in HapMap LCLs at the whole-gene and/or individual exon level. Pickrell et al. and Montgomery et al. used low-coverage (4-25 million short reads per individual) single-end and paired-end RNA-seq to characterize gene expression and splicing in LCLs derived from 69 Nigerian  and 60 CEU (Utah residents of European descent from CEPH-Centre d'Etude du Polymporphisme Humain)  individuals. Cheung et al. independently generated an RNA-seq dataset on 41 CEU individuals at a deeper coverage of 28.4-66 million single-end reads per individual, although the authors restricted their data analysis to expression QTLs .
AS in the human population measured from RNA-seq data
We obtained the RNA-seq data from two published studies on the CEU population (of European ancestry) by Cheung et al.  and Montgomery et al. . Cheung et al. generated 28.4-66 million 50 bp single-end reads per individual on 41 CEU samples, while Montgomery et al. generated 3.5-17.1 million 37 bp paired-end reads per individual on 60 CEU samples. Twenty-nine individuals were shared between the two datasets. Because of the higher sequencing depth in the Cheung et al. dataset, the analysis in this manuscript was primarily conducted on the Cheung et al. data (referred to hereafter as the CEU dataset). We also used the low-coverage Montgomery et al. data (referred to hereafter as the CEU2 dataset) to evaluate the concordance of results between the two CEU sample datasets.
The RNA-seq reads were mapped to the human genome (hg19) and transcriptome (Ensembl gene annotation r65) using the software Tophat . To estimate the exon inclusion level (denoted as ψ for PSI, that is Percent Spliced In) from RNA-seq data, we used sequence reads mapped to splice junctions compiled from both the splice junctions in Ensembl gene annotations as well as the novel junctions found by Tophat. Based on the AS patterns, we classified the AS events into four categories (Figure S1 in Additional file 1): skipped exon (SE), alternative 5' splice site (A5SS), alternative 3' splice site (A3SS), and mutually exclusive exons (MXE). Using all splice junction reads, we can obtain a point estimate of the exon inclusion level (). We illustrate the estimate of ψ in our model using the SE event as the example (Figure 1a). Suppose IJ and SJ represent read counts of inclusion and skipping splice junctions, respectively, because IJ can come from both the upstream junction and the downstream junction, we treat the effective read count from the exon inclusion isoform and the effective read count from the exon skipping isoform as SJ. Given an observed total junction read count of n = IJ/2+SJ, the point estimate of . The median and coefficient of variation (CV) of of skipped exons from CEU and CEU2 (with within each of the two populations, see Materials and methods) are highly correlated with a Pearson correlation coefficient of 0.99 and 0.90, respectively, suggesting that the point estimate of provides a reasonable approximation to the exon inclusion level. However, we also noted that the total counts of splice junction reads for the same alternatively spliced exon typically vary substantially across different individuals (Figure S2 in Additional file 1), possibly due to the intrinsic randomness of RNA-seq technology as well as individual variation in gene expression levels. Such variability of read depth is expected to differentially affect the reliability of estimates across individuals. This motivated us to develop an improved statistical model that explicitly considers the variation of RNA-seq read depth across individuals.
Statistical model and simulation study of GLiMMPS
We first attempted to handle the individual variation of RNA-seq read depth by extending the previously used linear model (lm)  to a generalized linear model (glm) with a logit link function, which assumes the read count from the exon inclusion isoform (y) follows a binomial distribution , and logit(ψ) is linearly modeled by the SNP effect. This simple logistic regression model assumes that ψ is correctly modeled and thus: , and . However, we found that overdispersion (inflation of variance) is widespread in the experimental data (Supplementary Methods in Additional file 1). For the top sQTLs (Type I error <1% based on permutation) identified from glm in the CEU dataset, >90% sQTLs have significant overdispersion (Figure S3 in Additional file 1).
To model the overdispersion, we developed GLiMMPS, a generalized linear mixed model for detecting sQTLs. To deal with the overdispersion in the generalized linear model, we model the extra variance of ψ as a random effect for each individual i in the regression model with random effects, . Let , where , β j denoting the fixed effect for SNP j, the second level of the model can be written as: . GLiMMPS is essentially a hierarchical model that considers both the read depth variation and the exon inclusion level variation within the same genotype groups (Figure 1b). Details of the lm, glm, and GLiMMPS models were described in Materials and methods and Supplementary Methods in Additional file 1.
Performance of GLiMMPS in real human RNA-seq data
To experimentally assess the robustness of GLiMMPS predictions, we randomly selected 24 SE (skipped exon) type and 2 A5SS (alternative 5' splice site) type of sQTLs out of the 140 significant sQTLs detected in the CEU dataset and performed RT-PCR validation using quantitative fluorescent RT-PCR (Materials and methods). For the validation experiments, we used an independent panel of 86 HapMap LCLs covering diverse worldwide populations (Additional file 3). All 26 sQTLs were validated, yielding a validation rate of 100% (Additional file 4; Figure S6 in Additional file 1). In eight individuals analyzed by both RNA-seq and RT-PCR, the exon inclusion levels estimated by RNA-seq were highly correlated with RT-PCR measurements (Pearson correlation coefficient r = 0.87). It is noteworthy to mention that these 26 selected sQTLs have a wide range of P value rankings among the 140 significant sQTLs, as opposed to being selected from the top of the significant sQTL list. The interquartile range of their rankings is 38 to 95. This suggests that the vast majority of the sQTLs identified by GLiMMPS represent true signals of splicing variation in human populations.
GLiMMPS reveals positional features of sQTLs
To definitively identify causal SNPs underlying significant sQTLs, we tested the effects of individual SNPs on splicing using minigene reporter assays. It should be noted that since multiple SNPs can be in high linkage disequilibrium (LD) with each other, an sQTL signal SNP with high association to exon splicing may not necessarily be the causal SNP that affects splicing regulation. In fact, the 140 significant sQTLs (FDR ≤0.1) have on average 63 significant SNPs. Of the 26 RT-PCR validated sQTLs, the causal SNPs in four genes (CAST, DHRS1, HMSD, and ATP5SL) were confirmed previously in work by us  and others . The causal SNPs in CAST, DHRS1, and HMSD are located in the 5' SS , while the causal SNP in ATP5SL is located in the exon and disrupts two putative exonic splicing enhancers . For the remaining RT-PCR confirmed sQTLs, we randomly selected 14 for minigene experiments. Briefly, the target exon and 350-500 bp of surrounding intronic sequences on each side of the exon were sub-cloned into the minigene expression vector and site-directed mutagenesis was carried out to generate the alternative alleles. After transiently transfecting these plasmids into HEK293 cells, we performed quantitative RT-PCR analysis of wild-type and mutant minigene reporters to determine the effect of the SNPs on exon inclusion levels (see details in Materials and methods). In 10 of the 14 exons analyzed (NTPCR, KIAA1841, SP140, ITM2C, PARP15, PTK2B, BCL2A1, SHMT1, ITPA, and ARFGAP3), the minigene experiments identified at least one SNP that caused >10% change of the minigene exon inclusion levels, with the direction of change matching the RNA-seq/RT-PCR data (Figure S7 in Additional file 1). These include two exons where we found multiple SNPs with additive effects on splicing within one LD block (KIAA1841) or multiple LD blocks (ITPA). In another two exons (PPIL3 and NCAPG2), the minigene experiments failed to identify any SNP with strong effect on splicing. For PPIL3, the SNP rs111292412 in the 3' SS affected splicing of the minigene reporter in the same direction as in the RNA-seq data, but the change was minor (from 9% in AA to 5% in GG). In NCAPG2, the closest sQTL SNP was an intronic SNP 347 bp away from the splice sites, and it did not have any measurable impact on the splicing of the minigene reporter. It is possible that another proximal or distal SNP or indel not genotyped yet is responsible for this sQTL signal. Finally, in the last two exons analyzed (CLEC2D and MX1), the minigene exon inclusion levels were close to 100% for all alleles, suggesting that the cloned minigene reporters transfected to the HEK293 cells did not faithfully recapitulate endogenous exon splicing activities in the LCLs. Taken together, despite the inherent limitations of minigene reporter systems , we were able to use minigene experiments to identify the causal SNPs underlying 10 of the 14 sQTL signals analyzed. In all 10 exons, the causal SNPs confirmed by minigene analysis were located proximal to the alternatively spliced exons (that is, within 300 bp of the splice sites).
sQTLs explain GWAS signals of human traits and diseases
The list of sQTL signals linked to GWAS signals.
Target exonb (hg19)
GWAS trait (SNP)
< = 300 bp
Metabolic traits (rs211718)
Liver enzyme levels (gamma-glutamyl transferase) (rs1335645)
Chronic lymphocytic leukemia (rs13397985)
Multiple sclerosis (rs10201872)
Crohn's disease (rs7423615)
Alcohol dependence (rs13160562)
Crohn's disease (rs2549794)
Ankylosing spondylitis (rs30187)
Bipolar disorder (rs2242663)
Platelet counts (rs7296418, rs1727307)
Coffee consumption (rs6495122)
Coronary heart disease (rs2472299)
Response to hepatitis C treatment (rs11697186, rs6139030)
Ribavirin-induced anemia (rs1127354)
In a previous GWAS study, the SNP rs13160562 near CAST was discovered to be significantly associated with alcohol dependence . However, no functional implication of this SNP was discussed in the original study. Here, GLiMMPS identified this SNP as an sQTL signal SNP in CAST. It is significantly associated with the splicing of CAST exon 13 located 45 kb upstream of the SNP position. It is also in an LD block (r2 = 0.53) with another SNP rs7724759 located in the 5' SS of exon 13, which has been confirmed experimentally to alter the splicing of this exon [11, 24]. Thus, genetic variation of alternative splicing is the likely causal mechanism underlying the reported association of CAST and alcohol dependence. In ERAP2, GLiMMPS identified SNP rs2248374 as an sQTL signal SNP for exon 10. This SNP disrupts the activity of the 5' SS . This sQTL SNP is in high LD (r2 = 0.83) with a GWAS SNP rs2549794, previously identified as significantly associated with Crohn's disease . The skipping of this alternatively spliced exon from ERAP2 introduces a premature stop codon, resulting in nonsense-mediated decay of the exon skipping isoform and a dramatic reduction of overall transcript levels, which subsequently impacts antigen representation . Haplotype analysis of the sQTL SNP and its linked SNPs revealed evidence of strong balancing selection during human evolution , suggesting the functional and evolutionary importance of this sQTL. A third example is ATP5SL, identified as a GWAS locus associated with height in multiple populations . The peak signal SNP reported by GWAS is rs17318596 but the mechanism of this SNP was unclear in the original study. GLiMMPS identified a significant sQTL for exon 5 of ATP5SL. The sQTL SNP rs1043413 is strongly linked to the GWAS signal SNP rs17318596 (r2 = 0.84) (Figure S8 in Additional file 1). This sQTL SNP rs1043413 is located in exon 5 and disrupts two exonic splicing enhancers (ESEs) . Together, these data indicate that even at a very modest sequencing depth (28.4-66 million 50 bp single-end reads per individual), GLiMMPS recovered previously reported associations between SNP and splicing that may contribute to phenotypic variation in humans.
The identification of sQTLs can also help resolve apparent confusions about the causal mechanisms of GWAS signals. For example, the SNP rs11697186 located in gene DDRGK1 near the ITPA gene (Inosine Triphosphate Pyrophosphohydrolase) was significantly associated with response to hepatitis C treatment in a GWAS study, and later was found to be in high LD with SNP rs1127354 on ITPA exon 2 by fine mapping . Of note, this C-to-A SNP (rs1127354) on exon 2 has been well established in the pharmacogenetics field to be associated with ITPA enzyme deficiency or low-activity [51, 52], but the molecular mechanism was unclear. This non-synonymous SNP causes a proline to threonine change (P32T) in the IPTA protein product. However, based on the crystal structure of the human ITPA protein, the proline residue was far away from the active site of the enzyme . Moreover, recent biochemical studies of ITPA showed that the purified mutant protein with the P32T change has the same activity as the wild-type protein . Others have proposed the alternative mechanism that this exon 2 SNP causes mis-splicing of ITPA , but the properties of the gene product resulting from mis-splicing have not been examined. Our analysis of the CEU RNA-seq data identified the same ITPA exon 2 SNP as a significant sQTL signal SNP (P value = 5.80E-09) associated with the combined skipping of exons 2 and 3. This prediction is robustly validated by RT-PCR (Figure S9 in Additional file 1). Minigene experiments further confirmed that this exonic SNP as well as an adjacent intronic SNP (rs7270101) both reduced the inclusion levels of exons 2 and 3 (Figure S7 in Additional file 1). These results reinforce the proposed effect of this ITPA SNP at the RNA level , and suggest that future studies on the causal mechanism of this ITPA gene variant should compare the activities of the full-length protein isoform to the truncated isoform that lacks exons 2 and 3.
We have developed GLiMMPS, a generalized linear mixed model to detect genotype-splicing associations from RNA-seq data. The key advantage of GLiMMPS over previously used methods is that it models: (1) variation in exon-specific read coverage across individuals; and (2) overdispersion in RNA-seq read counts. Both issues are important for accurate exon-level expression quantitation. The coverage of RNA-seq reads for any given alternative exon is a critical factor for the precision of the exon inclusion level estimate [14, 56]. The importance of accounting for overdispersion in RNA-seq data analysis has also been well recognized . Methods based on the negative binomial model [58, 59] or the generalized linear model with Cox-Reid dispersion estimators [19, 20] have been developed for modeling dispersion in detecting differential gene or exon expression between biological states. Here in the sQTLs analysis, by modeling these two levels of variation in RNA-seq read counts, GLiMMPS achieves superior performance over competing statistical models, as demonstrated by analyses of simulated and real RNA-seq data. Importantly, even at a low coverage we observed a high level of concordance in the GLiMMPS results between the two human datasets (CEU and CEU2). Additionally, RT-PCR tests of 26 randomly selected significant sQTLs yielded a validation rate of 100%. Together, these results demonstrate that GLiMMPS is a robust and improved method to detect sQTLs from RNA-seq data.
Fine-scale analysis of sQTLs reveals positional features of SNPs that alter exon splicing. We found that the location of the SNPs is strongly correlated with potential impact on splicing (Figure 4b). Specifically, SNPs located within the 5' and 3' splice sites have the smallest (most significant) overall GLiMMPS P values, consistent with the importance of the splice sites in exon recognition during pre-mRNA splicing. Interestingly, the significance level of sQTLs is positively correlated with the proximity of the sQTL signal SNPs to target exons. As we increased the significance level cutoff for sQTLs, we observed a progressive increase of the proportion of sQTLs with at least one significant signal SNP within 300 bp of the splice sites (Figure 4a). The causal roles of these proximal sQTL SNPs on exon splicing were further confirmed by minigene splicing reporter assays. Collectively, these results support the hypothesis that the majority of cis regulatory information controlling alternative splicing is encoded in close proximity (for example, within 300 bp) of the target exons, consistent with a recent analysis of the mammalian splicing code . Nonetheless, it should also be noted that 20% of the significant sQTLs (FDR ≤0.1) lack any significant signal SNP within 300 bp of the splice sites, including sQTLs confirmed experimentally by RT-PCR (in NCAPG2 and PIGQ, see Figure S6 in Additional file 1). For such sQTLs, it is possible that the causal SNPs are indeed proximal, but are missing from current SNP annotations or fail to reach the significance level cutoff due to small sample size. Alternatively, we cannot rule out the possibility that a small fraction of sQTLs are indeed due to SNPs disrupting distal splicing regulatory elements, given that the physical binding sites of splicing factors on the pre-mRNA can be located deep into the introns . In the future, it would be interesting to confirm the identity and elucidate the regulatory mechanisms of causal sQTL SNPs acting in introns distal to target exons.
The detection of sQTLs is useful for interpreting signals from GWAS studies. Despite the success of GWAS in revealing the genetic basis of complex traits and diseases, elucidating the mechanistic implications of GWAS findings remains a major challenge . As many functional SNPs may affect gene expression and regulation instead of the final protein sequence, integrating transcriptome information with GWAS signals has proven to be an effective approach for pinpointing the functional causal variants underlying GWAS signals [62–64]. Here, from the CEU RNA-seq dataset we identified 140 unique sQTLs, including 10 significantly linked to previously identified GWAS signals (Table 1). This is probably only scratching the surface of trait-associated sQTLs, due to the low sequencing depth (28.4-66 million single-end reads per individual) and the small sample size (41 individuals). We anticipate that with more and deeper RNA-seq data generated for diverse human tissues and cell types, the catalog of sQTLs linked to phenotypic traits and diseases will rapidly expand in the near future.
The GLiMMPS framework provides the basis for several aspects of future extensions. Currently, GLiMMPS uses reads mapped to splice junctions to estimate exon inclusion levels. This is a commonly used approach in alternative splicing quantitation from RNA-seq data [56, 65, 66]. However, with proper normalization for lengths of isoform-specific segments, it is feasible to also incorporate reads mapped within the exons, which may further improve the power in detecting sQTLs. This could be particularly useful for strand-specific RNA-seq, where the origins of exon body reads can be unambiguously assigned to sense or antisense transcripts. Additionally, in paired-end RNA-seq data with tight distribution of insert size, reads that map to flanking constitutive exons can also provide useful information about the exon inclusion level . Furthermore, RNA-seq reads often display non-uniform distribution along mRNA transcripts due to sequence-specific bias in RNA sequencing, and several methods have been developed to model and correct for such biases [67–70]. In principle, we can use a suitable bias correction method to adjust the raw RNA-seq read counts, prior to analysis by GLiMMPS. However, we tested two well-known bias correction methods [67, 68] using a deep RNA-seq dataset with matching quantitative RT-PCR data for over 100 exons in two cell lines [66, 71], but did not observe improvement in the RNA-seq estimates of exon inclusion level as judged by the correlation of RNA-seq estimates with the RT-PCR measurements. Another area of improvement is to consider the potential impact of specific SNPs on exon splicing as the prior in the statistical model, an idea previously used for detecting expression QTLs [72–74]. For example, our results show a significant association between the SNP position and the potential impact on splicing (Figure 4), with SNPs located in the 5' and 3' splice sites most likely to influence exon splicing. It is possible to incorporate such positional information or more advanced predictive models of exon splicing  as the prior information to guide the detection of sQTLs.
RNA-seq has become a powerful and increasingly affordable technology for population-scale analysis of transcriptome variation. Here we report GLiMMPS, a robust statistical method for detecting splicing quantitative trait loci (sQTLs) from RNA-seq data. GLiMMPS is applicable to all major patterns of alternative splicing events. The GLiMMPS source code and user manual are freely available for download at . As the cost of high-throughput sequencing continues to decline, we anticipate that combined sequencing of genomes and transcriptomes will become a popular design in large-scale studies of traits and diseases. GLiMMPS provides a useful tool for genome-wide identification of sQTLs from population-scale RNA-seq datasets.
Materials and methods
We downloaded the RNA-seq datasets produced by  and . Both datasets came from the lymphoblastoid B cell lines from the Caucasian (CEU) population in the HapMap project . There were 28.4-66 million 50 bp single end reads sequenced for 41 individuals by Cheung et al., while there were only 3.5-17.1 million 37 bp paired end reads for 60 individuals by Montgomery et al. We denote the datasets from Cheung et al. and Montgomery et al. as CEU and CEU2, respectively. Because the sequencing depth in CEU is much higher than in CEU2, we focused our analysis on the CEU dataset, but also carried out comparison between CEU and CEU2. RNA-seq sequence reads were mapped to the reference human genome (hg19) using Tophat  with Ensembl gene annotations (Ensembl genes r65). The CEU and CEU2 datasets were mapped with the single end or the paired end mode respectively. Only uniquely mapped reads were retained for downstream analysis.
To search for sQTLs, we first identified all alternative splicing events and RNA-seq reads mapped to splice junctions using the MATS pipeline as described previously . We then focused our analysis on four types of alternative splicing events: skipped exon (SE), alternative 5' splice site (A5SS), alternative 3' splice site (A3SS), and mutually exclusive exons (MXE). Using splice junction reads, we can obtain a point estimate of the exon inclusion level (ψ). Given that we have an observed number of splice junction reads for one isoform (y) and total splice junction read counts (n), then (see Figure S1 in Additional file 1). We then filtered out exons with no or little change in exon inclusion level () or few total junction read counts (median n <5) in the population, and obtained 18,267 AS events from CEU and 7,747 AS events from CEU2 for the downstream sQTL analysis.
The genotype data for the 41 individuals in the CEU dataset were taken from the latest HapMap3 release (#28). Of these 41 individuals, 23 were also genotyped in the 1000 Genomes project . For SNPs uniquely reported by the 1000 Genomes project, we imputed the genotypes for individuals not in the 1000 Genomes project using Beagle . We filtered out low frequency SNPs with MAF (minor allele frequency) <0.05. For each alternatively spliced exon, we tested cis SNPs within 200 kb upstream or downstream of the target exon splice sites when searching for sQTLs. For the CEU2 dataset, the 60 individuals were all included in the 1000 Genomes project. Fifty-eight of them were sequenced in low coverage and two were in high coverage. To avoid genotype calling bias, we only included the 58 low-coverage individuals with genotypes taken directly from the 1000 Genomes project data (10/2010 release). The same MAF filtering was used as in the CEU dataset.
Statistical models for sQTL analysis
All statistical analyses were done in the R statistical environment . We evaluated three different models for sQTL analysis: linear model (lm), generalized linear model (glm), and our proposed generalized linear mixed model (GLiMMPS). The model details were provided in Supplementary Methods in Additional file 1. Here we only briefly describe the GLIMMPS model. GLiMMPS is a hierarchical model that uses the reads information from both exon inclusion and skipping isoforms instead of only a point estimate of exon inclusion level (as in the lm model used in [16, 17]) in sQTL analysis. Given the observed junction read counts as in Figure S1 in Additional file 1 we assume that these junction reads supporting two alternative isoforms follow a binomial distribution: . To deal with the overdispersion in the generalized linear model, we model the extra variance of ψ as a random effect for each individual i in the regression model with random effects, . Let , where , β j denoting the fixed effect for SNP j, the second level of the model can be written as: . Thus the joint likelihood for β, is given by:
where is the standard normal density. Function glmer() from R package lme4 was used to fit the model, where Laplace approximation is used for the parameter estimations and a likelihood ratio test was used to obtain the P values for the fixed effect β j for each SNP j.
For both the CEU and CEU2 datasets, using each of the statistical models (lm, glm, and GLiMMPS) mentioned above, we carried out the analysis for each exon with SNPs within 200 kb of the exon. To estimate the false discovery rate, we used the same permutation approach as in  to obtain the null distribution of the P values. The details are in Supplementary Methods in Additional file 1.
To validate the sQTLs found in the CEU datasets, we randomly selected 26 significant sQTL exons (FDR ≤0.1) for RT-PCR validation. We performed the validation experiments on an independent panel of 86 lymphoblastoid cell lines from the HapMap3 project (Additional file 3), which were purchased from the Coriell Institute for Medical Research, Camden, NJ, USA. Total RNA was extracted using TRIzol (Invitrogen, Carlsbad, CA, USA) and reverse transcribed by the High-Capacity cDNA Reverse Transcription Kit (Applied Biosystems, Foster City, CA, USA). Fluorescently labeled RT-PCR was carried out as described before . Capillary electrophoresis (Georgia Genomics Facility, Athens, GA, USA) and 5% Urea TBE-PAGE were used for resolving PCR products. In capillary electrophoresis, band peak area was generated by GeneMapper 4.0 software (Applied Biosystems, Carlsbad, CA, USA). In 5% Urea PAGE, the signal was captured by Fujifilm FLA-7000 (Fuji Photo Film Co. Ltd., Tokyo, Japan) and quantified using the ImageQuant TL7.0 software (General Electric Company, Waukesha, WI, USA). Final exon inclusion level was calculated as the peak area or band intensity of the exon inclusion band(s) divided by the total peak areas or band intensities of all bands. To test the association of genotypes with the RT-PCR estimated exon inclusion levels, we used the most significant HapMap3 sQTL SNP for each target exon. A linear regression on the estimated exon inclusion levels with the SNP genotypes of the SNP was used to calculate P values and those with P value <0.05 were called as validated. All RT-PCR primer sequences are listed in Additional file 4 and individual exon inclusion levels are listed in Additional file 5.
We used the hybrid construct pI-11-H3 (provided by Dr. Russ P. Carstens, University of Pennsylvania, Philadelphia, PA, USA) for our minigene splicing reporter assays. Genomic DNAs were extracted from LCLs using UltraClean™ Tissue&Cells DNA Isolation kit (MO BIO Laboratories, Carlsbad, CA, USA). The target exon and its flanking 350-500 bp intronic regions were amplified by PCR (see Additional file 6 for the primer sequences). In-Fusion™ Advantage PCR Cloning Kit (Clontech, Mountain View, CA, USA) or restriction enzyme digestion and ligation strategy were used to clone PCR products into the vector. Site-directed mutagenesis was carried out following the manufacturer's instructions. The integrity of all constructs was confirmed by sequencing. To test minigene splicing, plasmids were transiently transfected into HEK293 cells. Fluorescently labeled RT-PCR was performed to evaluate the splicing impact of specific polymorphisms as described before .
We obtained 7,523 GWAS SNPs at genome-wide significance level of P value <10-5 from the Catalog of Published Genome-Wide Association Studies (accessed 03/30/2012) . Using all the 1000 Genomes SNPs from the CEU population, we obtained all SNPs that are in high linkage disequilibrium with the GWAS SNPs (r2>0.8 in the CEU population and within 200 kb window of the GWAS SNP). Because of the high SNP density and high recombination rate around the MHC region, we excluded genes from this region in this part of the analysis. We then identified sQTL signal SNPs overlapped with this expanded list of GWAS linked SNPs.
Data access and source code availability
The GLiMMPS model has been implemented and released in an easy to use package. The splice junction read counts, genotypes, and validation datasets, as well as the source code used for sQTL processing and analysis are provided at the companion website of this article .
alternative 3' splice site
alternative 5' splice site
Utah residents of European descent from CEPH-Centre d'Etude du Polymporphisme Humain
coefficient of variation
Generalized Linear Mixed Model Prediction of sQTL
generalized linear model
lymphoblastoid B cell line
minor allele frequency
mutually exclusive exons
Quantitative Trait Loci
receiver operating characteristic
splicing Quantitative Trait Loci
We thank Peng Jiang for assistance with primer design, Collin Tokheim for setting up the website, Sara Miller and Mallory Stroik for technical assistance with exon splicing validation and minigene mutagenesis. We thank Shihao Shen and Ying Nian Wu for discussions and comments on this work. This work was supported by NIH grant R01GM088342 (YX), Burroughs Wellcome Fund grant 1008841.01 (YX), and a March of Dimes Foundation Basil O'Connor Starter Scholar Research Award #5-FY10-60 (YX). JWP was supported by an NIH T32 postdoctoral fellow training grant T32HL007638.
- Nilsen TW, Graveley BR: Expansion of the eukaryotic proteome by alternative splicing. Nature. 2010, 463: 457-463. 10.1038/nature08909.PubMedPubMed CentralView ArticleGoogle Scholar
- Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, Kingsmore SF, Schroth GP, Burge CB: Alternative isoform regulation in human tissue transcriptomes. Nature. 2008, 456: 470-476. 10.1038/nature07509.PubMedPubMed CentralView ArticleGoogle Scholar
- Faustino NA, Cooper TA: Pre-mRNA splicing and human disease. Genes Dev. 2003, 17: 419-437. 10.1101/gad.1048803.PubMedView ArticleGoogle Scholar
- Wang GS, Cooper TA: Splicing in disease: disruption of the splicing code and the decoding machinery. Nat Rev Genet. 2007, 8: 749-761. 10.1038/nrg2164.PubMedView ArticleGoogle Scholar
- Wang Z, Burge CB: Splicing regulation: from a parts list of regulatory elements to an integrated splicing code. RNA. 2008, 14: 802-813. 10.1261/rna.876308.PubMedPubMed CentralView ArticleGoogle Scholar
- Lu Z-X, Jiang P, Xing Y: Genetic variation of pre-mRNA alternative splicing in human populations. Wiley Interdisciplinary Reviews RNA. 2012, 3: 581-592. 10.1002/wrna.120.PubMedPubMed CentralView ArticleGoogle Scholar
- Heinzen EL, Yoon W, Tate SK, Sen A, Wood NW, Sisodiya SM, Goldstein DB: Nova2 interacts with a cis-acting polymorphism to influence the proportions of drug-responsive splice variants of SCN1A. Am J Hum Genet. 2007, 80: 876-883. 10.1086/516650.PubMedPubMed CentralView ArticleGoogle Scholar
- Altshuler DM, Gibbs RA, Peltonen L, Altshuler DM, Gibbs RA, Peltonen L, Dermitzakis E, Schaffner SF, Yu F, Peltonen L, Dermitzakis E, Bonnen PE, Altshuler DM, Gibbs RA, de Bakker PI, Deloukas P, Gabriel SB, Gwilliam R, Hunt S, Inouye M, Jia X, Palotie A, Parkin M, Whittaker P, Yu F, Chang K, Hawes A, Lewis LR, Ren Y, Wheeler D, et al: Integrating common and rare genetic variation in diverse human populations. Nature. 2010, 467: 52-58. 10.1038/nature09298.PubMedView ArticleGoogle Scholar
- Consortium TGP: A map of human genome variation from population-scale sequencing. Nature. 2010, 467: 1061-1073. 10.1038/nature09534.View ArticleGoogle Scholar
- Kwan T, Benovoy D, Dias C, Gurd S, Provencher C, Beaulieu P, Hudson TJ, Sladek R, Majewski J: Genome-wide analysis of transcript isoform variation in humans. Nat Genet. 2008, 40: 225-231. 10.1038/ng.2007.57.PubMedView ArticleGoogle Scholar
- Coulombe-Huntington J, Lam KCL, Dias C, Majewski J: Fine-scale variation and genetic determinants of alternative splicing across individuals. PLoS Genet. 2009, 5: e1000766-10.1371/journal.pgen.1000766.PubMedPubMed CentralView ArticleGoogle Scholar
- Fraser HB, Xie X: Common polymorphic transcript variation in human disease. Genome Res. 2009, 19: 567-575. 10.1101/gr.083477.108.PubMedView ArticleGoogle Scholar
- Heinzen EL, Ge D, Cronin KD, Maia JM, Shianna KV, Gabriel WN, Welsh-Bohmer KA, Hulette CM, Denny TN, Goldstein DB: Tissue-specific genetic control of splicing: implications for the study of complex traits. PLoS Biol. 2008, 6: e1-PubMedView ArticleGoogle Scholar
- Katz Y, Wang ET, Airoldi EM, Burge CB: Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat Methods. 2010, 7: 1009-1015. 10.1038/nmeth.1528.PubMedPubMed CentralView ArticleGoogle Scholar
- Benovoy D, Kwan T, Majewski J: Effect of polymorphisms within probe-target sequences on olignonucleotide microarray experiments. Nucleic Acids Res. 2008, 36: 4417-4423. 10.1093/nar/gkn409.PubMedPubMed CentralView ArticleGoogle Scholar
- Pickrell J, Marioni J, Pai A, Degner J, Engelhardt B, Nkadori E, Veyrieras J-B, Stephens M, Gilad Y, Pritchard J: Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature. 2010, 464: 768-772. 10.1038/nature08872.PubMedPubMed CentralView ArticleGoogle Scholar
- Montgomery S, Sammeth M, Gutierrez-Arcelus M, Lach R, Ingle C, Nisbett J, Guigo R, Dermitzakis E: Transcriptome genetics using second generation sequencing in a Caucasian population. Nature. 2010, 464: 773-777. 10.1038/nature08903.PubMedView ArticleGoogle Scholar
- Cheung VG, Nayak RR, Wang IX, Elwyn S, Cousins SM, Morley M, Spielman RS: Polymorphic cis- and trans-regulation of human gene expression. PLoS Biol. 2010, 8: e1000480-10.1371/journal.pbio.1000480.PubMedPubMed CentralView ArticleGoogle Scholar
- Anders S, Reyes A, Huber W: Detecting differential usage of exons from RNA-seq data. Genome Res. 2012, 22: 2008-2017. 10.1101/gr.133744.111.PubMedPubMed CentralView ArticleGoogle Scholar
- McCarthy DJ, Chen Y, Smyth GK: Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Res. 2012, 40: 4288-4297. 10.1093/nar/gks042.PubMedPubMed CentralView ArticleGoogle Scholar
- Trapnell C, Pachter L, Salzberg SL: TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009, 25: 1105-1111. 10.1093/bioinformatics/btp120.PubMedPubMed CentralView ArticleGoogle Scholar
- Browne WJ, Subramanian SV, Jones K, Goldstein H: Variance partitioning in multilevel logistic models that exhibit overdispersion. J Roy Stat Soc a Sta. 2005, 168: 599-613. 10.1111/j.1467-985X.2004.00365.x.View ArticleGoogle Scholar
- Yeo G, Burge CB: Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J Comput Biol. 2004, 11: 377-394. 10.1089/1066527041410418.PubMedView ArticleGoogle Scholar
- Lu ZX, Jiang P, Cai JJ, Xing Y: Context-dependent robustness to 5' splice site polymorphisms in human populations. Hum Mol Genet. 2011, 20: 1084-1096. 10.1093/hmg/ddq553.PubMedPubMed CentralView ArticleGoogle Scholar
- Singh G, Cooper TA: Minigene reporter for identification and analysis of cis elements and trans factors affecting pre-mRNA splicing. Biotechniques. 2006, 41: 177-181. 10.2144/000112208.PubMedView ArticleGoogle Scholar
- Schadt EE, Molony C, Chudin E, Hao K, Yang X, Lum PY, Kasarskis A, Zhang B, Wang S, Suver C, Zhu J, Millstein J, Sieberts S, Lamb J, Guhathakurta D, Derry J, Storey JD, Avila-Campillo I, Kruger MJ, Johnson JM, Rohl CA, van Nas A, Mehrabian M, Drake TA, Lusis AJ, Smith RC, Guengerich FP, Strom SC, Schuetz E, Rushmore TH, et al: Mapping the genetic architecture of gene expression in human liver. PLoS Biol. 2008, 6: e107-10.1371/journal.pbio.0060107.PubMedPubMed CentralView ArticleGoogle Scholar
- Nicolae DL, Gamazon E, Zhang W, Duan SW, Dolan ME, Cox NJ: Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS Genet. 2010, 6: e1000888-10.1371/journal.pgen.1000888.PubMedPubMed CentralView ArticleGoogle Scholar
- Boyle AP, Hong EL, Hariharan M, Cheng Y, Schaub MA, Kasowski M, Karczewski KJ, Park J, Hitz BC, Weng S, Cherry JM, Snyder M: Annotation of functional variation in personal genomes using RegulomeDB. Genome Res. 2012, 22: 1790-1797. 10.1101/gr.137323.112.PubMedPubMed CentralView ArticleGoogle Scholar
- Ioannidis JP, Thomas G, Daly MJ: Validating, augmenting and refining genome-wide association signals. Nat Rev Genet. 2009, 10: 318-329.PubMedView ArticleGoogle Scholar
- Saccone SF, Bolze R, Thomas P, Quan JX, Mehta G, Deelman E, Tischfield JA, Rice JP: SPOT: a web-based tool for using biological databases to prioritize SNPs after a genome-wide association study. Nucleic Acids Res. 2010, 38: W201-W209. 10.1093/nar/gkq513.PubMedPubMed CentralView ArticleGoogle Scholar
- Schaub MA, Boyle AP, Kundaje A, Batzoglou S, Snyder M: Linking disease associations with regulatory information in the human genome. Genome Res. 2012, 22: 1748-1759. 10.1101/gr.136127.111.PubMedPubMed CentralView ArticleGoogle Scholar
- Xu ZL, Taylor JA: SNPinfo: integrating GWAS and candidate gene information into functional SNP selection for genetic association studies. Nucleic Acids Res. 2009, 37: W600-W605. 10.1093/nar/gkp290.PubMedPubMed CentralView ArticleGoogle Scholar
- Need AC, Ge DL, Weale ME, Maia J, Feng S, Heinzen EL, Shianna KV, Yoon W, Kasperaviciute D, Gennarelli M, Strittmatter WJ, Bonvicini C, Rossi G, Jayathilake K, Cola PA, McEvoy JP, Keefe RSE, Fisher EMC, St Jean PL, Giegling I, Hartmann AM, Moller HJ, Ruppert A, Fraser G, Crombie C, Middleton LT, St Clair D, Roses AD, Muglia P, Francks C, et al: A genome-wide investigation of SNPs and CNVs in schizophrenia. PLoS Genet. 2009, 5: e1000373-10.1371/journal.pgen.1000373.PubMedPubMed CentralView ArticleGoogle Scholar
- Li G, Bahn JH, Lee JH, Peng GD, Chen ZG, Nelson SF, Xiao XS: Identification of allele-specific alternative mRNA processing via transcriptome sequencing. Nucleic Acids Res. 2012, 40: e104-10.1093/nar/gks280.PubMedPubMed CentralView ArticleGoogle Scholar
- Hull J, Campino S, Rowlands K, Chan M-S, Copley RR, Taylor MS, Rockett K, Elvidge G, Keating B, Knight J, Kwiatkowski D: Identification of common genetic variation that modulates alternative splicing. PLoS Genet. 2007, 3: e99-10.1371/journal.pgen.0030099.PubMedPubMed CentralView ArticleGoogle Scholar
- Lee Y, Gamazon ER, Rebman E, Lee Y, Lee S, Dolan ME, Cox NJ, Lussier YA: Variants affecting exon skipping contribute to complex traits. PLoS Genet. 2012, 8: e1002998-10.1371/journal.pgen.1002998.PubMedPubMed CentralView ArticleGoogle Scholar
- Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, Manolio TA: Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci USA. 2009, 106: 9362-9367. 10.1073/pnas.0903103106.PubMedPubMed CentralView ArticleGoogle Scholar
- Treutlein J, Cichon S, Ridinger M, Wodarz N, Soyka M, Zill P, Maier W, Moessner R, Gaebel W, Dahmen N, Fehr C, Scherbaum N, Steffens M, Ludwig KU, Frank J, Wichmann HE, Schreiber S, Dragano N, Sommer WH, Leonardi-Essmann F, Lourdusamy A, Gebicke-Haerter P, Wienker TF, Sullivan PF, Nothen MM, Kiefer F, Spanagel R, Mann K, Rietschel M: Genome-wide association study of alcohol dependence. Arch Gen Psychiatry. 2009, 66: 773-784. 10.1001/archgenpsychiatry.2009.83.PubMedPubMed CentralView ArticleGoogle Scholar
- Franke A, McGovern DP, Barrett JC, Wang K, Radford-Smith GL, Ahmad T, Lees CW, Balschun T, Lee J, Roberts R, Anderson CA, Bis JC, Bumpstead S, Ellinghaus D, Festen EM, Georges M, Green T, Haritunians T, Jostins L, Latiano A, Mathew CG, Montgomery GW, Prescott NJ, Raychaudhuri S, Rotter JI, Schumm P, Sharma Y, Simms LA, Taylor KD, Whiteman D, et al: Genome-wide meta-analysis increases to 71 the number of confirmed Crohn's disease susceptibility loci. Nat Genet. 2010, 42: 1118-1125. 10.1038/ng.717.PubMedPubMed CentralView ArticleGoogle Scholar
- Andres AM, Dennis MY, Kretzschmar WW, Cannons JL, Lee-Lin SQ, Hurle B, Schwartzberg PL, Williamson SH, Bustamante CD, Nielsen R, Clark AG, Green ED: Balancing selection maintains a form of ERAP2 that undergoes nonsense-mediated decay and affects antigen presentation. PLoS Genet. 2010, 6: e1001157-10.1371/journal.pgen.1001157.PubMedPubMed CentralView ArticleGoogle Scholar
- Lango Allen H, Estrada K, Lettre G, Berndt SI, Weedon MN, Rivadeneira F, Willer CJ, Jackson AU, Vedantam S, Raychaudhuri S, Ferreira T, Wood AR, Weyant RJ, Segre AV, Speliotes EK, Wheeler E, Soranzo N, Park JH, Yang J, Gudbjartsson D, Heard-Costa NL, Randall JC, Qi L, Vernon Smith A, Magi R, Pastinen T, Liang L, Heid IM, Luan J, Thorleifsson G, et al: Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature. 2010, 467: 832-838. 10.1038/nature09410.PubMedPubMed CentralView ArticleGoogle Scholar
- Di Bernardo MC, Crowther-Swanepoel D, Broderick P, Webb E, Sellick G, Wild R, Sullivan K, Vijayakrishnan J, Wang Y, Pittman AM, Sunter NJ, Hall AG, Dyer MJ, Matutes E, Dearden C, Mainou-Fowler T, Jackson GH, Summerfield G, Harris RJ, Pettitt AR, Hillmen P, Allsup DJ, Bailey JR, Pratt G, Pepper C, Fegan C, Allan JM, Catovsky D, Houlston RS: A genome-wide association study identifies six susceptibility loci for chronic lymphocytic leukemia. Nat Genet. 2008, 40: 1204-1210. 10.1038/ng.219.PubMedView ArticleGoogle Scholar
- Sawcer S, Hellenthal G, Pirinen M, Spencer CC, Patsopoulos NA, Moutsianas L, Dilthey A, Su Z, Freeman C, Hunt SE, Edkins S, Gray E, Booth DR, Potter SC, Goris A, Band G, Oturai AB, Strange A, Saarela J, Bellenguez C, Fontaine B, Gillman M, Hemmer B, Gwilliam R, Zipp F, Jayakumar A, Martin R, Leslie S, Hawkins S, Giannoulatou E, et al: Genetic risk and a primary role for cell-mediated immune mechanisms in multiple sclerosis. Nature. 2011, 476: 214-219. 10.1038/nature10251.PubMedPubMed CentralView ArticleGoogle Scholar
- Bloch DB, de la Monte SM, Guigaouri P, Filippov A, Bloch KD: Identification and characterization of a leukocyte-specific component of the nuclear body. J Biol Chem. 1996, 271: 29198-29204. 10.1074/jbc.271.46.29198.PubMedView ArticleGoogle Scholar
- Dent AL, Yewdell J, Puvion-Dutilleul F, Koken MH, de The H, Staudt LM: LYSP100-associated nuclear domains (LANDs): description of a new class of subnuclear structures and their relationship to PML nuclear bodies. Blood. 1996, 88: 1423-1426.PubMedGoogle Scholar
- Sille FC, Thomas R, Smith MT, Conde L, Skibola CF: Post-GWAS functional characterization of susceptibility variants for chronic lymphocytic leukemia. PLoS ONE. 2012, 7: e29632-10.1371/journal.pone.0029632.PubMedPubMed CentralView ArticleGoogle Scholar
- Dosztanyi Z, Csizmok V, Tompa P, Simon I: IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics. 2005, 21: 3433-3434. 10.1093/bioinformatics/bti541.PubMedView ArticleGoogle Scholar
- Ellis JD, Barrios-Rodiles M, Colak R, Irimia M, Kim T, Calarco JA, Wang X, Pan Q, O'Hanlon D, Kim PM, Wrana JL, Blencowe BJ: Tissue-specific alternative splicing remodels protein-protein interaction networks. Mol Cell. 2012, 46: 884-892. 10.1016/j.molcel.2012.05.037.PubMedView ArticleGoogle Scholar
- Buljan M, Chalancon G, Eustermann S, Wagner GP, Fuxreiter M, Bateman A, Babu MM: Tissue-specific splicing of disordered segments that embed binding motifs rewires protein interaction networks. Mol Cell. 2012, 46: 871-883. 10.1016/j.molcel.2012.05.039.PubMedPubMed CentralView ArticleGoogle Scholar
- Tanaka Y, Kurosaki M, Nishida N, Sugiyama M, Matsuura K, Sakamoto N, Enomoto N, Yatsuhashi H, Nishiguchi S, Hino K, Hige S, Itoh Y, Tanaka E, Mochida S, Honda M, Hiasa Y, Koike A, Sugauchi F, Kaneko S, Izumi N, Tokunaga K, Mizokami M: Genome-wide association study identified ITPA/DDRGK1 variants reflecting thrombocytopenia in pegylated interferon and ribavirin therapy for chronic hepatitis C. Hum Mol Genet. 2011, 20: 3507-3516. 10.1093/hmg/ddr249.PubMedView ArticleGoogle Scholar
- Sumi S, Marinaki AM, Arenas M, Fairbanks L, Shobowale-Bakre M, Rees DC, Thein SL, Ansari A, Sanderson J, De Abreu RA, Simmonds HA, Duley JA: Genetic basis of inosine triphosphate pyrophosphohydrolase deficiency. Hum Genet. 2002, 111: 360-367. 10.1007/s00439-002-0798-z.PubMedView ArticleGoogle Scholar
- Maeda T, Sumi S, Ueta A, Ohkubo Y, Ito T, Marinaki AM, Kurono Y, Hasegawa S, Togari H: Genetic basis of inosine triphosphate pyrophosphohydrolase deficiency in the Japanese population. Mol Genet Metab. 2005, 85: 271-279. 10.1016/j.ymgme.2005.03.011.PubMedView ArticleGoogle Scholar
- Stenmark P, Kursula P, Flodin S, Graslund S, Landry R, Nordlund P, Schuler H: Crystal structure of human inosine triphosphatase. Substrate binding and implication of the inosine triphosphatase deficiency mutation P32T. J Biol Chem. 2007, 282: 3182-3187.PubMedView ArticleGoogle Scholar
- Stepchenkova EI, Tarakhovskaya ER, Spitler K, Frahm C, Menezes MR, Simone PD, Kolar C, Marky LA, Borgstahl GE, Pavlov YI: Functional study of the P32T ITPA variant associated with drug sensitivity in humans. J Mol Biol. 2009, 392: 602-613. 10.1016/j.jmb.2009.07.051.PubMedPubMed CentralView ArticleGoogle Scholar
- Arenas M, Duley J, Sumi S, Sanderson J, Marinaki A: The ITPA c.94C>A and g.IVS2+21A>C sequence variants contribute to missplicing of the ITPA gene. Biochim Biophys Acta. 2007, 1772: 96-102. 10.1016/j.bbadis.2006.10.006.PubMedView ArticleGoogle Scholar
- Pan Q, Shai O, Lee LJ, Frey BJ, Blencowe BJ: Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat Genet. 2008, 40: 1413-1415. 10.1038/ng.259.PubMedView ArticleGoogle Scholar
- Hansen KD, Wu Z, Irizarry RA, Leek JT: Sequencing technology does not eliminate biological variability. Nat Biotechnol. 2011, 29: 572-573. 10.1038/nbt.1910.PubMedPubMed CentralView ArticleGoogle Scholar
- Robinson MD, Smyth GK: Small-sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics. 2008, 9: 321-332.PubMedView ArticleGoogle Scholar
- Robinson MD, McCarthy DJ, Smyth GK: edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010, 26: 139-140. 10.1093/bioinformatics/btp616.PubMedPubMed CentralView ArticleGoogle Scholar
- Barash Y, Calarco JA, Gao W, Pan Q, Wang X, Shai O, Blencowe BJ, Frey BJ: Deciphering the splicing code. Nature. 2010, 465: 53-59. 10.1038/nature09000.PubMedView ArticleGoogle Scholar
- Yeo GW, Coufal NG, Liang TY, Peng GE, Fu X-D, Gage FH: An RNA code for the FOX2 splicing regulator revealed by mapping RNA-protein interactions in stem cells. Nat Struct Mol Biol. 2009, 16: 130-137. 10.1038/nsmb.1545.PubMedPubMed CentralView ArticleGoogle Scholar
- Emilsson V, Thorleifsson G, Zhang B, Leonardson AS, Zink F, Zhu J, Carlson S, Helgason A, Bragi Walters G, Gunnarsdottir S, Mouy M, Steinthorsdottir V, Eiriksdottir GH, Bjornsdottir G, Reynisdottir I, Gudbjartsson D, Helgadottir A, Jonasdottir A, Jonasdottir A, Styrkarsdottir U, Gretarsdottir S, Magnusson KP, Stefansson H, Fossdal R, Kristjansson K, Gislason HG, Stefansson T, Leifsson BG, Thorsteinsdottir U, Lamb JR, et al: Genetics of gene expression and its effect on disease. Nature. 2008, 452: 423-428. 10.1038/nature06758.PubMedView ArticleGoogle Scholar
- Hsu Y-H, Zillikens MC, Wilson SG, Farber CR, Demissie S, Soranzo N, Bianchi EN, Grundberg E, Liang L, Richards JB, Estrada K, Zhou Y, van Nas A, Moffatt MF, Zhai G, Hofman A, van Meurs JB, Pols HAP, Price RI, Nilsson O, Pastinen T, Cupples LA, Lusis AJ, Schadt EE, Ferrari S, Uitterlinden AG, Rivadeneira F, Spector TD, Karasik D, Kiel DP: An integration of genome-wide association study and gene expression profiling to prioritize the discovery of novel susceptibility Loci for osteoporosis-related traits. PLoS Genet. 2010, 6: e1000977-10.1371/journal.pgen.1000977.PubMedPubMed CentralView ArticleGoogle Scholar
- Hernandez DG, Nalls MA, Moore M, Chong S, Dillman A, Trabzuni D, Gibbs JR, Ryten M, Arepalli S, Weale ME, Zonderman AB, Troncoso J, O'Brien R, Walker R, Smith C, Bandinelli S, Traynor BJ, Hardy J, Singleton AB, Cookson MR: Integration of GWAS SNPs and tissue specific expression profiling reveal discrete eQTLs for human traits in blood and brain. Neurobiol Dis. 2012, 47: 20-28. 10.1016/j.nbd.2012.03.020.PubMedPubMed CentralView ArticleGoogle Scholar
- Brooks AN, Yang L, Duff MO, Hansen KD, Park JW, Dudoit S, Brenner SE, Graveley BR: Conservation of an RNA regulatory map between Drosophila and mammals. Genome Res. 2011, 21: 193-202. 10.1101/gr.108662.110.PubMedPubMed CentralView ArticleGoogle Scholar
- Shen S, Park JW, Huang J, Dittmar KA, Lu ZX, Zhou Q, Carstens RP, Xing Y: MATS: a Bayesian framework for flexible detection of differential alternative splicing from RNA-Seq data. Nucleic Acids Res. 2012, 40: e61-10.1093/nar/gkr1291.PubMedPubMed CentralView ArticleGoogle Scholar
- Hansen KD, Brenner SE, Dudoit S: Biases in Illumina transcriptome sequencing caused by random hexamer priming. Nucleic Acids Res. 2010, 38: e131-10.1093/nar/gkq224.PubMedPubMed CentralView ArticleGoogle Scholar
- Li J, Jiang H, Wong WH: Modeling non-uniformity in short-read rates in RNA-Seq data. Genome Biology. 2010, 11: R50-10.1186/gb-2010-11-5-r50.PubMedPubMed CentralView ArticleGoogle Scholar
- Roberts A, Trapnell C, Donaghey J, Rinn JL, Pachter L: Improving RNA-Seq expression estimates by correcting for fragment bias. Genome Biology. 2011, 12: R22-10.1186/gb-2011-12-3-r22.PubMedPubMed CentralView ArticleGoogle Scholar
- Schwartz S, Oren R, Ast G: Detection and removal of biases in the analysis of next-generation sequencing reads. PLoS ONE. 2011, 6: e16685-10.1371/journal.pone.0016685.PubMedPubMed CentralView ArticleGoogle Scholar
- Warzecha CC, Jiang P, Amirikian K, Dittmar KA, Lu H, Shen S, Guo W, Xing Y, Carstens RP: An ESRP-regulated splicing programme is abrogated during the epithelial-mesenchymal transition. EMBO J. 2010, 29: 3286-3300. 10.1038/emboj.2010.195.PubMedPubMed CentralView ArticleGoogle Scholar
- Stephens M, Balding DJ: Bayesian statistical methods for genetic association studies. Nat Rev Genet. 2009, 10: 681-690.PubMedView ArticleGoogle Scholar
- Veyrieras J-B, Kudaravalli S, Kim SY, Dermitzakis ET, Gilad Y, Stephens M, Pritchard JK: High-resolution mapping of expression-QTLs yields insight into human gene regulation. PLoS Genet. 2008, 4: e1000214-10.1371/journal.pgen.1000214.PubMedPubMed CentralView ArticleGoogle Scholar
- Gaffney D, Veyrieras J-B, Degner J, Roger P-R, Pai A, Crawford G, Stephens M, Gilad Y, Pritchard J: Dissecting the regulatory architecture of gene expression QTLs. Genome Biol. 2012, 13: R7-10.1186/gb-2012-13-1-r7.PubMedPubMed CentralView ArticleGoogle Scholar
- GLiMMPS. [http://www.mimg.ucla.edu/faculty/xing/glimmps]
- Browning BL, Browning SR: A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. Am J Hum Genet. 2009, 84: 210-223. 10.1016/j.ajhg.2009.01.005.PubMedPubMed CentralView ArticleGoogle Scholar
- R Core Development Team: R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. 2010Google Scholar
- Suhre K, Shin SY, Petersen AK, Mohney RP, Meredith D, Wagele B, Altmaier E, Deloukas P, Erdmann J, Grundberg E, Hammond CJ, de Angelis MH, Kastenmuller G, Kottgen A, Kronenberg F, Mangino M, Meisinger C, Meitinger T, Mewes HW, Milburn MV, Prehn C, Raffler J, Ried JS, Romisch-Margl W, Samani NJ, Small KS, Wichmann HE, Zhai G, Illig T, Spector TD, et al: Human metabolic individuality in biomedical and pharmaceutical research. Nature. 2011, 477: 54-60. 10.1038/nature10354.PubMedView ArticleGoogle Scholar
- Chambers JC, Zhang W, Sehmi J, Li X, Wass MN, Van der Harst P, Holm H, Sanna S, Kavousi M, Baumeister SE, Coin LJ, Deng G, Gieger C, Heard-Costa NL, Hottenga JJ, Kuhnel B, Kumar V, Lagou V, Liang L, Luan J, Vidal PM, Mateo Leach I, O'Reilly PF, Peden JF, Rahmioglu N, Soininen P, Speliotes EK, Yuan X, Thorleifsson G, Alizadeh BZ, et al: Genome-wide association study identifies loci influencing concentrations of liver enzymes in plasma. Nat Genet. 2011, 43: 1131-1138. 10.1038/ng.970.PubMedPubMed CentralView ArticleGoogle Scholar
- Evans DM, Spencer CC, Pointon JJ, Su Z, Harvey D, Kochan G, Oppermann U, Dilthey A, Pirinen M, Stone MA, Appleton L, Moutsianas L, Leslie S, Wordsworth T, Kenna TJ, Karaderi T, Thomas GP, Ward MM, Weisman MH, Farrar C, Bradbury LA, Danoy P, Inman RD, Maksymowych W, Gladman D, Rahman P, Morgan A, Marzo-Ortega H, Bowness P, Gaffney K, et al: Interaction between ERAP1 and HLA-B27 in ankylosing spondylitis implicates peptide handling in the mechanism for HLA-B27 in disease susceptibility. Nat Genet. 2011, 43: 761-767. 10.1038/ng.873.PubMedPubMed CentralView ArticleGoogle Scholar
- Scott LJ, Muglia P, Kong XQ, Guan W, Flickinger M, Upmanyu R, Tozzi F, Li JZ, Burmeister M, Absher D, Thompson RC, Francks C, Meng F, Antoniades A, Southwick AM, Schatzberg AF, Bunney WE, Barchas JD, Jones EG, Day R, Matthews K, McGuffin P, Strauss JS, Kennedy JL, Middleton L, Roses AD, Watson SJ, Vincent JB, Myers RM, Farmer AE, et al: Genome-wide association and meta-analysis of bipolar disorder in individuals of European ancestry. Proc Natl Acad Sci USA. 2009, 106: 7501-7506. 10.1073/pnas.0813386106.PubMedPubMed CentralView ArticleGoogle Scholar
- Ramsuran V, Kulkarni H, He W, Mlisana K, Wright EJ, Werner L, Castiblanco J, Dhanda R, Le T, Dolan MJ, Guan W, Weiss RA, Clark RA, Karim SS, Ahuja SK, Ndung'u T: Duffy-null-associated low neutrophil counts influence HIV-1 susceptibility in high-risk South African black women. Clin Infect Dis. 2011, 52: 1248-1256. 10.1093/cid/cir119.PubMedPubMed CentralView ArticleGoogle Scholar
- Amin N, Byrne E, Johnson J, Chenevix-Trench G, Walter S, Nolte IM, Vink JM, Rawal R, Mangino M, Teumer A, Keers JC, Verwoert G, Baumeister S, Biffar R, Petersmann A, Dahmen N, Doering A, Isaacs A, Broer L, Wray NR, Montgomery GW, Levy D, Psaty BM, Gudnason V, Chakravarti A, Sulem P, Gudbjartsson DF, Kiemeney LA, Thorsteinsdottir U, Stefansson K, et al: Genome-wide association analysis of coffee drinking suggests association with CYP1A1/CYP1A2 and NRCAM. Mol Psychiatry. 2012, 17: 1116-1129. 10.1038/mp.2011.101.PubMedPubMed CentralView ArticleGoogle Scholar
- Peden JF, Hopewell JC, Saleheen D, Chambers JC, Hager J, Soranzo N, Collins R, Danesh J, Elliott P, Farrall M: A genome-wide association study in Europeans and South Asians identifies five new loci for coronary artery disease. Nat Genet. 2011, 43: 339-344. 10.1038/ng.782.View ArticleGoogle Scholar
- Ochi H, Maekawa T, Abe H, Hayashida Y, Nakano R, Kubo M, Tsunoda T, Hayes CN, Kumada H, Nakamura Y, Chayama K: ITPA polymorphism affects ribavirin-induced anemia and outcomes of therapy--a genome-wide study of Japanese HCV virus patients. Gastroenterology. 2010, 139: 1190-1197. 10.1053/j.gastro.2010.06.071.PubMedView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.