BANDITS: Bayesian differential splicing accounting for sample-to-sample variability and mapping uncertainty

Alternative splicing is a biological process during gene expression that allows a single gene to code for multiple proteins. However, splicing patterns can be altered in some conditions or diseases. Here, we present BANDITS, a R/Bioconductor package to perform differential splicing, at both gene and transcript level, based on RNA-seq data. BANDITS uses a Bayesian hierarchical structure to explicitly model the variability between samples and treats the transcript allocation of reads as latent variables. We perform an extensive benchmark across both simulated and experimental RNA-seq datasets, where BANDITS has extremely favourable performance with respect to the competitors considered.

script level counts, are sampled via Gibbs sampler from their conditional distribution X|π, D, with D = D (1) , . . . , D (N ) , where D (i) denotes the input data for the i-th sample (i.e., the set of equivalence classes counts).
We add a pre-subscript to all parameters, to indicate the value at the current iteration of the MCMC. We initialize the hyper and hierachical parameters as follows: 0 δ k = 1 and 0 π (i) k = 1/K, for k = 1, . . . , K and i = 1, . . . , N . Note that the latent variables are not initialized because they are sampled from a Gibbs step, which does not require the value of the previous iteration. After initialising parameters, we update them according to the following scheme for R iterations.

S1.2 EC with multiple genes
If an equivalence class has transcripts from multiple genes, we apply a minor change to the algorithm described in Section S1.1.
Updates of δ and π are still performed separately for every gene as shown in Section S1.1.
In the sampling of X, however, we modify r−1 π T (i) .j , in formula (S1), to include all transcripts from the genes in the j-th EC, with transcript level probabilities being weighted by the number of reads associated to each gene.
Assume the j-th EC has transcripts from two genes, g 1 and g 2 , with K g1 and K g2 transcripts, respectively. At the r-th iteration of the MCMC, the probability vector r−1 π where the third subscript, g 1 or g 2 , indicates the gene, and k g representing the total number of reads attributed to gene g at the r-th iteration of the MCMC. The case with 3 or more genes is an EC is a natural extension of the one presented above.

S1.3 DTU test between 3 or more groups
When comparing 3 or more groups, parameters inference, which is performed separetely for each group, is identical to the case with 2 conditions, while DTU testing differs.
For simplicity consider the case with 3 groups, denoted by letters A, B and C, with average transcript relative expressionπ T A k ,π T B k andπ T C k , respectively. For gene level testing, we consider the following system of hypothesis: with (G 1 , G 2 , G 3 ) being a permutation of the three groups (A, B, C). In other words, to test if the average transcript proportions vary between groups, we choose a baseline group and compare the other two groups against it. The posterior distribution ofω = (ω 1 , . . . ,ω 2K ) can be approximated by a normal density [7], with meanω and variance matrixΣω, both inferred from the posterior chains. A multivariate Wald test [8]   DTU testing between more than 3 groups is a natural extension of the scenario illustrated above.

S1.4 Results details
The 3 vs. 3 simulated data is taken from Soneson et al. [9] were reads were simulated using Ensembl transcriptome version GRCh37.71; when simulating reads for the 6 vs. 6 simulation study we modified the pipeline in Soneson et al. [9] and kept the same Ensembl transcriptome version (GRCh37.71). Therefore, for the simulation studies, reads were then aligned using a filtered version of the GRCh37.71 genome and transcriptome: only transcripts with gene biotype equal to "protein coding" and from "canonical" chromosomes (1 to 22, X and Y) were kept; furthermore duplicated transcripts (i.e., transcripts with exactly the same sequence) were removed.
Instead, for the experimental data analyses, we used the (unfiltered) Ensembl genome and transcriptome GRCh38.92, which was the latest version available when we ran the analyses.
BayesDRIMSeq and cjBitSeq scores represent decision rule d 3 in Papastamoulis and Rattray (2017) [10], and correspond to field FDRraw from the output files. The conservative scores BayesDRIMSeq inv and cjBitSeq inv indicate decision rule d 4 and refers to fields f drT rust and F DR, respectively, from the output files.
In the simulations study with transcript pre-filtering, we filtered isoforms based on Salmon TECs: we kept transcripts with least 10 counts (across all samples) and an average relative abundance of at least 0.01. The filtering was computed via BANDITS filter transcripts function, with parameters min transcript proportion = 0.01, min transcript counts = 10 and min gene counts = 20.
In gene-level plots, we excluded genes with less than 20 estimated counts across all samples.
In transcript-level plots, we excluded transcripts with less than 10 estimated counts across all samples, and those belonging to a gene with less than 20 counts.
When stratifying results by gene expression, we computed the overall estimated abundance of each gene, across all samples, ranked them and split them into 3 equally sized groups. For the stratification in the 6 vs. 6 simulated data, we excluded genes with less than 1,200 estimated counts (i.e., 100 per sample on average), because no genes with less than 1,200 TECs are simulated to be differentially used.
In Figure 5 of the main manuscript, the blue component of panels A and C refers to the computational cost of STAR and Salmon (for BANDITS, BayesDRIMSeq, DEXSeq ECCs, DEXSeq TECs and DRIMSeq), or STAR and Salmon with 100 bootstrap replicates (for rats).

S1.5 Visual inspection of two genes
To add biological perspective, we performed an in depth visual inspection of two genes from the "Best et al." experimental data analysis. We selected two genes with adjusted p-value of 0.00 from BANDITS, one belonging to the set of 82 validated genes (ENSG00000147679) and one not previously validated (ENSG00000184432). Tables S9 and S10 report transcript-level adjusted p-values: BANDITS identifies two differentially used transcripts for gene ENSG00000184432 (ENST00000503326 and ENST00000507777) and three for gene ENSG00000147679 (ENST00000521071, ENST00000517820 and ENST00000521703).
Figures S13 and S16 illustrate the mean transcript-level proportions estimated from BAN-DITS, with 0.95 level profile Wald-type confidence intervals, while Figures S14, S15, S17 and S18 show sample-specific coverage and junction tracks obtained via IGV software [20]. BAN-DITS proportion plots show clear differences between groups in differentially used transcripts; some of these differences can also be easily visualized on the IGV plots, particularly those involving transcripts that are almost only expressed in one group.
These examples show how BANDITS can be effectively used to identify genes with differential transcript usage, as well as the individual transcripts that are affected.  Table S1 Main features of some of the most popular methods for DS, based on RNA-seq data. In the second column: DM = dirichlet-multinomial, MN = multinomial, NB = negative-binomial and LM = linear model. In the third column: ECCs = equivalence classes counts, TECs = transcript estimated counts, EBCs = exon bin counts. Note that "mapping uncertainty modelled" is missing in "DEXSeq", "limma" and "DEXSeq on ECs" rows, because inference is performed on EBCs and ECCs. Similarly, column "ECs with > 1 gene" is only applicable to methods working with equivalence classes (ECs). Note that "¿2 group comparison" excludes models, such as SUPPA2 and limma, that perform pairwise tests between all pairs of groups.  Genes were separated in three equally sized groups according to their expression: "Low", "Mid" and "High".  Table S6 Computational cost, expressed in minutes, of each individual step. Columns "Unfiltered" and "Filtered" refer to the analyses run on the original transcriptome and on the filtered one, respectively. "Salmon" and "Salmon boot" refer to running Salmon on the transcript alignments computed from STAR; "Salmon boot" additionally computes 100 bootstrap replicates (used by rats). "DEXSeq python" indicates the python functions dexseq prepare annotation.py and dexseq count.py, while "DEXSeq R" refers to the pipeline of DEXSeq computed in R.  Table S8 Maximum RAM, expressed in gigabytes, used in each individual step. Columns "Unfiltered" and "Filtered" refer to the analyses run on the original transcriptome and on the filtered one, respectively. "Salmon" and "Salmon boot" refer to running Salmon on the transcript alignments computed from STAR; "Salmon boot" additionally computes 100 bootstrap replicates (used by rats). "DEXSeq python" indicates the python functions dexseq prepare annotation.py and dexseq count.py, while "DEXSeq R" refers to the pipeline of DEXSeq computed in R.    FDR Note that, for cjBitSeq, in both panels, we considered the probability that a transcript is not differentially used, which does not guarantee FDR control.