
Table 1 Overview of surveyed methods

From: Comparative assessment of methods for the computational inference of transcript isoform abundance from RNA-seq data

| Name | Reference sequence (a) | Principle | Released |
|---|---|---|---|
| BitSeq | Transcripts | Bayesian estimation of the parameters of a model that explains the read-to-transcript alignment data. Reads are assumed to be sampled independently and without positional bias from transcripts, so the probability of an alignment starting at a given position of a transcript is inversely proportional to the transcript length. Sub-optimal alignments are used to estimate the ‘background’ of spurious alignments. | 2012 [67, 68] |
| CEM | Genome | Component elimination expectation-maximization approach to estimating isoform abundance parameters. For each gene, it aims to find a ‘sparse’ solution with few expressed isoforms. Read sampling from isoforms is assumed to follow a quasi-multinomial distribution, in which positional and other biases are modeled as an effective distribution that can be, for example, uniform (no positional bias) or exponential (modeling RNA degradation). | 2012 [69] |
| Cufflinks | Genome | Bayesian approach to estimating transcript abundances that explicitly models the lengths of the fragments expected from RNA-seq. It assumes that, for a given gene, reads are sampled independently, uniformly along each transcript, and across transcripts in proportion to transcript abundance. Thus, if a read can be assigned to two transcripts of different lengths, the transcript with the shorter effective length has a higher probability of giving rise to the read. | 2010 [70] |
| eXpress | Transcripts | Similar to Cufflinks, but it additionally models sequencing errors and indels and uses a different model of fragment length selection. Unlike Cufflinks and most other methods, eXpress processes read alignments ‘on-line’, so it can be integrated into real-time analysis pipelines. | 2012 [32] |
| IsoEM | Genome | Expectation-maximization approach to inferring isoform abundances that are consistent with the read coverage of the isoforms. Coverage is assumed to be uniform along an isoform. Base quality scores are taken into account when computing alignment probabilities. In the E-step, the expected number of reads derived from each isoform is computed; in the M-step, the relative isoform frequencies are estimated. | 2011 [71] |
| MMSeq | Transcripts | Models the read data as Poisson-distributed variables with rates that depend on the abundance of the transcript regions with which the reads are compatible and on the sequence-dependent bias in capturing those sequences. Priors on transcript abundances are Gamma-distributed. Sequencing errors are not modeled; alignments are only filtered by a minimum quality. | 2011 [73] |
| RSEM | Transcripts | Models the probability of observing a read as a sum, over the transcripts to which the read maps, of each transcript's relative abundance times the probability of the read given that transcript, and infers transcript abundances by expectation maximization. | 2009 [34, 35] |
| rSeq | Transcripts | Models the read data as Poisson-distributed variables with rates that depend on the abundance of the transcript regions with which the reads are compatible. | 2009 [75] |
| Sailfish (b) | Transcripts | Expectation-maximization method that explains the abundances of k-mers counted in the reads in terms of the abundances of the transcripts containing those k-mers. | 2014 [76] |
| Scripture | Genome | Transcript abundance is calculated as reads per kilobase of exonic sequence per million aligned reads (RPKM), given the alignments of the reads to the genome and the annotated or reconstructed transcript. | 2010 [77] |
| TIGAR2 | Transcripts | Models the read data with a large number of parameters that include, beyond the relative transcript abundances, the read length distribution and the nucleotides, alignment states, and qualities at the first and second positions of the read. | 2013 [78, 85] |

  1. The columns are: method name, sequence to which reads are compared (transcripts or genome), principle of the method, and year of release with the associated reference(s).
  2. (a) For methods operating on the genome sequence, genome annotation files (GTF or BED format) were also provided.
  3. (b) In contrast to the other methods operating on transcripts, Sailfish uses k-mer statistics rather than aligning reads to transcripts.
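Several methods in Table 1 share one core idea. RSEM and IsoEM infer abundances by expectation maximization, and BitSeq and Cufflinks assume that a read starts uniformly along a transcript, so a read compatible with several transcripts is more likely to come from the one with the shorter effective length. The Python sketch below illustrates that shared idea under simplifying assumptions: each read has already been reduced to the non-empty set of transcripts it is compatible with, P(read | transcript) is 1/(effective length), and sequencing errors, quality scores, fragment lengths, and biases are not modeled. The function names `em_abundance` and `transcript_fractions` are hypothetical and do not correspond to any of the listed tools.

```python
from typing import List, Sequence


def em_abundance(
    compat: Sequence[Sequence[int]],
    eff_len: Sequence[float],
    max_iter: int = 1000,
    tol: float = 1e-10,
) -> List[float]:
    """Estimate the fraction of reads originating from each transcript by EM.

    compat[r]: indices of the transcripts read r is compatible with (non-empty).
    eff_len[t]: effective length of transcript t (positive).
    Uniform positional sampling gives P(read | t) = 1 / eff_len[t].
    """
    n_tx = len(eff_len)
    theta = [1.0 / n_tx] * n_tx  # uniform starting point
    for _ in range(max_iter):
        expected = [0.0] * n_tx
        # E-step: share each read among its compatible transcripts in
        # proportion to theta[t] * P(read | t).
        for txs in compat:
            weights = [theta[t] / eff_len[t] for t in txs]
            total = sum(weights)
            for t, w in zip(txs, weights):
                expected[t] += w / total
        # M-step: re-estimate read fractions from the expected read counts.
        new_theta = [e / len(compat) for e in expected]
        delta = max(abs(a - b) for a, b in zip(new_theta, theta))
        theta = new_theta
        if delta < tol:
            break
    return theta


def transcript_fractions(theta: Sequence[float], eff_len: Sequence[float]) -> List[float]:
    """Convert read fractions into transcript (molecule) fractions.

    Longer transcripts yield more reads per molecule, so divide by length.
    """
    per_length = [th / length for th, length in zip(theta, eff_len)]
    total = sum(per_length)
    return [p / total for p in per_length]


# Three reads unique to transcript 0, one unique to transcript 1, four shared.
reads = [[0]] * 3 + [[1]] + [[0, 1]] * 4
theta = em_abundance(reads, [100.0, 100.0])
print([round(x, 3) for x in theta])  # [0.75, 0.25]
```

With equal lengths, the four shared reads end up split 3:1, in line with the unique reads, giving read fractions of 0.75 and 0.25. `transcript_fractions` reflects the length effect described for Cufflinks: at equal read fractions, a transcript half as long must be twice as abundant.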
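The RPKM measure listed for Scripture divides the number of reads assigned to a transcript by its exonic length in kilobases and by the library size in millions of aligned reads. The following minimal sketch shows only that arithmetic; the function name and the example counts are hypothetical, and it does not model how reads are assigned to transcripts in the first place.

```python
def rpkm(reads_in_transcript: float, exonic_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of exonic sequence per million aligned reads."""
    if exonic_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exonic length and total read count must be positive")
    kilobases = exonic_length_bp / 1_000
    million_reads = total_mapped_reads / 1_000_000
    return reads_in_transcript / (kilobases * million_reads)


# 500 reads on a transcript with 2,000 nt of exonic sequence,
# in a library of 10 million aligned reads.
print(rpkm(500, 2_000, 10_000_000))  # 25.0
```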
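rSeq and MMSeq model the read count in each transcript region as a Poisson variable whose rate depends on the abundances of the transcripts that contain that region. The Python sketch below assumes that regions are segments shared by a fixed set of transcripts and that the rate is the region length times the summed per-base expression of those transcripts. It includes no sequence-dependent bias and no Gamma prior, so it is neither MMSeq's Bayesian estimator nor necessarily rSeq's own procedure. For illustration, it fits the model by maximum likelihood using the standard multiplicative ML-EM update for Poisson linear models, a technique chosen here rather than taken from either tool. The function name `poisson_region_mle` is hypothetical.

```python
from typing import List, Sequence, Tuple


def poisson_region_mle(
    regions: Sequence[Tuple[float, Sequence[int]]],
    counts: Sequence[int],
    n_tx: int,
    max_iter: int = 1000,
    tol: float = 1e-12,
) -> List[float]:
    """Maximum-likelihood per-base expression rates for a Poisson region model.

    regions[r] = (length of region r, indices of the transcripts containing it)
    counts[r]  = number of reads observed in region r
    Model: counts[r] ~ Poisson(length_r * sum of x[t] over transcripts in region r).
    Every transcript is assumed to contain at least one region.
    """
    # Total length of the regions covered by each transcript (its "exposure").
    exposure = [0.0] * n_tx
    for length, txs in regions:
        for t in txs:
            exposure[t] += length
    x = [1.0] * n_tx  # strictly positive starting rates
    for _ in range(max_iter):
        ratio_sum = [0.0] * n_tx
        for (length, txs), n in zip(regions, counts):
            if n == 0:
                continue  # zero-count regions add nothing to the numerator
            mu = length * sum(x[t] for t in txs)  # expected reads in region r
            for t in txs:
                ratio_sum[t] += length * n / mu
        # Multiplicative ML-EM update; keeps every rate non-negative.
        new_x = [x[t] * ratio_sum[t] / exposure[t] for t in range(n_tx)]
        delta = max(abs(a - b) for a, b in zip(new_x, x))
        x = new_x
        if delta < tol:
            break
    return x


regions = [(100.0, [0]), (100.0, [1]), (100.0, [0, 1])]
rates = poisson_region_mle(regions, [30, 10, 40], n_tx=2)
print([round(v, 4) for v in rates])  # [0.3, 0.1]
```

In this three-region example, the fitted rates reproduce the observed counts exactly (30, 10, and 40 expected reads), so the maximum-likelihood solution is 0.3 and 0.1 reads per base.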
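What mainly sets eXpress apart from the batch methods is that it processes alignments ‘on-line’. The sketch below shows a generic online EM (stochastic approximation) variant of batch EM abundance estimation. After each read, the current read fractions are blended with that read's responsibilities using a decreasing step size 1/(n + offset). This update rule is an illustrative assumption and not eXpress's actual algorithm, which also models errors, indels, and fragment lengths. The function name `online_em_abundance` and its `offset` parameter are hypothetical.

```python
from typing import Iterable, List, Sequence


def online_em_abundance(
    read_stream: Iterable[Sequence[int]],
    eff_len: Sequence[float],
    offset: float = 2.0,
) -> List[float]:
    """Update read fractions after every read instead of after a full pass.

    Each item in read_stream lists the transcripts a read is compatible with.
    offset (hypothetical, > 0) keeps every step size below 1, so no
    transcript fraction is ever driven exactly to zero.
    """
    n_tx = len(eff_len)
    theta = [1.0 / n_tx] * n_tx
    for n, txs in enumerate(read_stream, start=1):
        step = 1.0 / (n + offset)  # decreasing step size
        weights = [theta[t] / eff_len[t] for t in txs]
        total = sum(weights)
        # Responsibilities of this read; zero for incompatible transcripts.
        resp = [0.0] * n_tx
        for t, w in zip(txs, weights):
            resp[t] = w / total
        # Blend the current estimate with the new read's responsibilities.
        theta = [(1.0 - step) * th + step * r for th, r in zip(theta, resp)]
    return theta


# 5,000 repetitions of: three reads unique to transcript 0, one unique to
# transcript 1, and four shared reads (equal effective lengths).
stream = ([[0]] * 3 + [[1]] + [[0, 1]] * 4) * 5000
estimate = online_em_abundance(stream, [100.0, 100.0])
print([round(x, 3) for x in estimate])
```

For this stream, the estimate should end close to the batch maximum-likelihood read fractions of 0.75 and 0.25, and the remaining gap shrinks as more reads arrive. The trade-off is that each read is touched only once, which suits streaming pipelines but converges more slowly than repeated full passes.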