
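Table 3. Summary of features investigated in this study. ESR, exonic splicing regulatory element; ESE, exonic splicing enhancer; ESS, exonic splicing silencer.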

From: MutPred Splice: machine learning-based prediction of exonic variants that disrupt splicing

| Feature | Type | Description |
|---|---|---|
| Distance to nearest splice site | SNP-based | Distance between the variant and the nearest 5′ or 3′ splice site of the target exon |
| ESR change | SNP-based | Change in the frequency of ESR elements following a single base substitution. This includes the following transitions: ESE to neutral (ESE loss); ESE to ESE (no change); neutral to ESE (ESE gain); ESE to ESS (ESE loss and ESS gain); neutral to neutral (no change); ESS to ESS (no change); neutral to ESS (ESS gain); ESS to neutral (ESS loss); ESS to ESE (ESS loss and ESE gain) |
| In ESE | SNP-based | Frequency of ESE binding sites in the wild-type sequence that overlap the variant position |
| In ESS | SNP-based | Frequency of ESS binding sites in the wild-type sequence that overlap the variant position |
| ESR hexamer score (ESR-HS) | SNP-based | Hexamer scoring function capturing how disease and neutral variants differ in their distributions with respect to loss or gain of an ESE or ESS |
| Spectrum kernel | SNP-based | Frequencies of 3-mers and 4-mers within an 11 bp window, computed for both the wild-type and the mutant allele |
| Change in natural splice site strength | SNP-based | MaxEnt score of the natural splice site on the mutant allele minus its MaxEnt score on the wild-type allele |
| Maximum cryptic splice site | SNP-based | Highest cryptic splice site score (5′ or 3′), excluding the natural splice site, among sites overlapping the variant on the mutant allele |
| Evolutionarily conserved element | SNP-based | PhastCons conserved-element probability at the substitution site, based on multiple alignments of 46 placental mammals |
| Base-wise evolutionary conservation | SNP-based | PhyloP base-wise conservation score at the substitution site, based on a multiple sequence alignment of 46 placental mammals |
| Natural wild-type splice site strength | Exon-based | MaxEntScan scores of the natural 5′ and 3′ splice sites of the wild-type target exon |
| Flanking intron size | Exon-based | Length in base pairs of the upstream and downstream introns flanking the target exon |
| Intronic ESS density | Exon-based | ESS density in the 100 bp upstream and the 100 bp downstream of the target exon |
| Exonic ESS density | Exon-based | ESS density across the first 50 bp and the last 50 bp of the target exon; if the exon is shorter than 100 bp, the full exon is used |
| Exonic ESE density | Exon-based | As for exonic ESS density, but for ESEs |
| Internal coding exon | Exon-based | {true, false}: whether the target exon is an internal coding exon, that is, neither the first nor the last coding exon |
| Exonic GC content | Exon-based | Percentage of nucleotides in the target exon that are guanine or cytosine |
| Exon size | Exon-based | Length of the target exon (bp) |
| Constitutive exon | Exon-based | Whether the target exon is constitutively spliced |
| Exon number | Gene-based | Number of exons in the transcript |
| Transcript number | Gene-based | Number of reported isoforms encoded by the target gene |
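The ESR change, In ESE and In ESS rows of the table can be illustrated with a minimal sketch. It assumes that ESR elements are hexamers, that every hexamer overlapping the variant is labelled ESE, ESS or neutral on each allele, and that the wild-type/mutant label pairs are tallied into the transition categories. The motif sets passed in are toy placeholders, not the published ESE/ESS hexamer lists, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterator, Set, Tuple

K = 6  # assumed ESR element length: hexamers

def overlapping_kmers(seq: str, pos: int, k: int = K) -> Iterator[str]:
    """Yield every k-mer of seq that contains the 0-based position pos."""
    first = max(0, pos - k + 1)
    last = min(pos, len(seq) - k)
    for start in range(first, last + 1):
        yield seq[start:start + k]

def esr_label(kmer: str, ese: Set[str], ess: Set[str]) -> str:
    """Classify a k-mer; the ESE set is checked first if the sets overlap."""
    if kmer in ese:
        return "ESE"
    if kmer in ess:
        return "ESS"
    return "neutral"

def esr_features(wt: str, pos: int, alt: str,
                 ese: Set[str], ess: Set[str]) -> Tuple[Counter, int, int]:
    """Return (ESR transition counts, In ESE, In ESS) for a substitution at pos."""
    mut = wt[:pos] + alt + wt[pos + 1:]
    wt_labels = [esr_label(km, ese, ess) for km in overlapping_kmers(wt, pos)]
    mut_labels = [esr_label(km, ese, ess) for km in overlapping_kmers(mut, pos)]
    # Hexamers are paired by start position, so each pair is one wild-type -> mutant transition.
    transitions = Counter(f"{a} to {b}" for a, b in zip(wt_labels, mut_labels))
    return transitions, wt_labels.count("ESE"), wt_labels.count("ESS")

# Toy motif sets, purely illustrative (not the published ESE/ESS hexamer lists).
transitions, in_ese, in_ess = esr_features("TTGAAGAATT", 5, "C", {"GAAGAA"}, {"GAACAA"})
print(transitions["ESE to ESS"], transitions["neutral to neutral"], in_ese, in_ess)  # 1 4 1 0
```

Pairing hexamers by start position keeps the wild-type and mutant windows aligned, so a single overlapping hexamer that switches from enhancer to silencer is counted once as "ESE to ESS".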
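The spectrum kernel row describes 3-mer and 4-mer frequencies over an 11 bp window for both alleles. A minimal sketch, assuming the window is centred on the variant (5 bp on each side) and that counts are laid out as a fixed-order vector over the A, C, G, T alphabet; the function names are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import List, Sequence

def kmer_counts(window: str, k: int) -> Counter:
    """Count every overlapping k-mer in window."""
    return Counter(window[i:i + k] for i in range(len(window) - k + 1))

def spectrum_features(seq: str, pos: int, alt: str, flank: int = 5,
                      ks: Sequence[int] = (3, 4)) -> List[int]:
    """k-mer count vector for the window pos-flank..pos+flank (11 bp by default),
    wild-type allele first, then mutant; k-mers in lexicographic ACGT order."""
    if pos - flank < 0 or pos + flank >= len(seq):
        raise ValueError("variant is too close to the sequence end for the window")
    mut = seq[:pos] + alt + seq[pos + 1:]
    features: List[int] = []
    for allele in (seq, mut):
        window = allele[pos - flank:pos + flank + 1]
        for k in ks:
            counts = kmer_counts(window, k)
            features.extend(counts["".join(p)] for p in product("ACGT", repeat=k))
    return features

vec = spectrum_features("ACGTACGTACGTACGT", 7, "A")
print(len(vec), sum(vec[:320]))  # 640 17
```

Each allele contributes 64 + 256 = 320 entries, and an 11 bp window holds 9 overlapping 3-mers and 8 overlapping 4-mers, which is why the wild-type half sums to 17.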
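The two splice-site-strength rows (change in natural splice site strength, maximum cryptic splice site) reduce to scoring fixed-length windows on each allele. The sketch below assumes a scoring function is supplied from outside; `Scorer` and the toy scorer are hypothetical stand-ins, not MaxEntScan itself. The 9 bp and 23 bp window lengths follow the MaxEntScan 5′ and 3′ site conventions. A fuller implementation might restrict cryptic candidates to windows carrying the canonical GT or AG dinucleotide; this sketch does not.

```python
from typing import Callable, Optional

# Hypothetical stand-in for a MaxEntScan-style model: maps a window to a score.
Scorer = Callable[[str], float]

DONOR_LEN = 9      # MaxEntScan 5' site window: 3 exonic + 6 intronic bases
ACCEPTOR_LEN = 23  # MaxEntScan 3' site window: 20 intronic + 3 exonic bases

def mutate(seq: str, pos: int, alt: str) -> str:
    """Apply a single base substitution at the 0-based position pos."""
    return seq[:pos] + alt + seq[pos + 1:]

def natural_site_delta(seq: str, pos: int, alt: str, site_start: int,
                       site_len: int, score: Scorer) -> float:
    """Mutant minus wild-type score of the natural site window starting at site_start.
    Zero whenever the variant lies outside that window."""
    end = site_start + site_len
    return score(mutate(seq, pos, alt)[site_start:end]) - score(seq[site_start:end])

def max_cryptic_site(seq: str, pos: int, alt: str, site_len: int,
                     natural_start: int, score: Scorer) -> Optional[float]:
    """Highest score among mutant windows of site_len that cover pos,
    skipping the natural site; None if no candidate window exists."""
    mut = mutate(seq, pos, alt)
    first = max(0, pos - site_len + 1)
    last = min(pos, len(seq) - site_len)
    scores = [score(mut[s:s + site_len])
              for s in range(first, last + 1) if s != natural_start]
    return max(scores) if scores else None

def toy_score(window: str) -> float:
    """Toy scorer for demonstration only: counts G bases."""
    return float(window.count("G"))

seq = "A" * 20
print(natural_site_delta(seq, 5, "G", 2, DONOR_LEN, toy_score))  # 1.0
print(max_cryptic_site(seq, 5, "G", DONOR_LEN, 2, toy_score))    # 1.0
```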
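Several rows of the table (distance to nearest splice site, internal coding exon, exonic GC content) are simple functions of exon coordinates and sequence. A minimal sketch, assuming 0-based inclusive exon coordinates, a 0-based index among the transcript's coding exons, and a distance of 0 for a variant on the first or last exonic base; these conventions and the function names are assumptions, not taken from the study.

```python
def gc_content(exon: str) -> float:
    """Percentage of G or C nucleotides in the exon (case-insensitive)."""
    exon = exon.upper()
    return 100.0 * sum(b in "GC" for b in exon) / len(exon) if exon else 0.0

def is_internal_coding_exon(exon_index: int, n_coding_exons: int) -> bool:
    """True unless the exon is the first or last coding exon (0-based index)."""
    return 0 < exon_index < n_coding_exons - 1

def distance_to_nearest_splice_site(variant_pos: int, exon_start: int,
                                    exon_end: int) -> int:
    """Distance in bp from the variant to the nearer exon boundary.
    Coordinates are assumed 0-based and inclusive; strand does not matter
    because only the smaller of the two distances is returned."""
    if not exon_start <= variant_pos <= exon_end:
        raise ValueError("variant lies outside the exon")
    return min(variant_pos - exon_start, exon_end - variant_pos)

print(gc_content("GGCATA"), is_internal_coding_exon(1, 3),
      distance_to_nearest_splice_site(102, 100, 150))  # 50.0 True 2
```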