TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions
© Kim et al.; licensee BioMed Central Ltd. 2013
Received: 15 November 2012
Accepted: 25 April 2013
Published: 25 April 2013
TopHat is a popular spliced aligner for RNA-sequence (RNA-seq) experiments. In this paper, we describe TopHat2, which incorporates many significant enhancements to TopHat. TopHat2 can align reads of various lengths produced by the latest sequencing technologies, while allowing for variable-length indels with respect to the reference genome. In addition to de novo spliced alignment, TopHat2 can align reads across fusion breaks, which can occur after genomic translocations. TopHat2 combines the ability to identify novel splice sites with direct mapping to known transcripts, producing sensitive and accurate alignments, even for highly repetitive genomes or in the presence of pseudogenes. TopHat2 is available at http://ccb.jhu.edu/software/tophat.
RNA-sequencing technologies , which sequence the RNA molecules being transcribed in cells, allow exploration of the process of transcription in exquisite detail. One of the primary goals of RNA-sequencing analysis software is to reconstruct the full set of transcripts (isoforms) of genes that were present in the original cells. In addition to the transcript structures, experimenters need to estimate the expression levels for all transcripts. The first step in the analysis process is to map the RNA-sequence (RNA-seq) reads against the reference genome, which provides the location from which the reads originated. In contrast to DNA-sequence alignment, RNA-seq mapping algorithms have two additional challenges. First, because genes in eukaryotic genomes contain introns, and because reads sequenced from mature mRNA transcripts do not include these introns, any RNA-seq alignment program must be able to handle gapped (or spliced) alignment with very large gaps. In mammalian genomes, introns span a very wide range of lengths, typically from 50 to 100,000 bases, which the alignment algorithm must accommodate. Second, the presence of processed pseudogenes, from which some or all introns have been removed, may cause many exon-spanning reads to map incorrectly. This is particularly acute for the human genome, which contains over 14,000 pseudogenes .
In the most recent Ensembl GRCh37 gene annotations, the average length of a mature mRNA transcript in the human genome is 2,227 bp long, and the average exon length is 235 bp. The average number of exons per transcript is 9.5. Assuming that sequencing reads are uniformly distributed along a transcript , we would expect 33 to 38% of 100 bp reads from an RNA-seq experiment to span two or more exons. Note that this proportion increases significantly as read length increases from 50 to 150 bp (see Additional file 1 for more details).
In order to handle this problem, the software TopHat2 uses a two-step procedure. First, similar to TopHat1 , it detects potential splice sites for introns (detailed further in Methods). It uses these candidate splice sites in a subsequent step to correctly align multiexon-spanning reads. Some RNA-seq aligners, including GSNAP , RUM , and STAR , map reads independently of the alignments of other reads, which may explain their lower sensitivity for these spliced reads (see Results and discussion). MapSplice  uses a two-step approach similar to TopHat2.
RNA-seq read alignment is further complicated by the presence of processed pseudogenes in the reference genome. Pseudogenes often have highly similar sequences to functional, intron-containing genes. In most cases, the pseudogene versions are not transcribed , although this suggestion has recently been disputed . The crucial problem for alignment is that reads spanning multiple exons can be mapped perfectly or near-perfectly to the pseudogene version of a functional gene. For example, suppose a read r spans two exons of a given gene. If the aligner tries to align the read globally (end to end), then it will find an alignment to the pseudogene copy (Figure 1). If the spliced alignment phase, which usually occurs later, does not attempt to realign r, then the pseudogene copy will 'absorb' all reads spanning the splice sites for that gene. TopHat2 can feed r into the spliced alignment phase even when r has been aligned end to end, allowing it to circumvent this problem (see Methods and Results and discussion sections).
We also note that, in our analysis of RNA-seq reads from multiple human samples [11, 12], genes with processed pseudogenes seem to be expressed at higher levels compared with other genes (see Results and discussion). Although this observation has not been explored thoroughly, a plausible explanation is that genes with higher levels of expression may, over the course of evolution, have had an increased chance of being picked up by transposons and re-integrated into the genome, creating pseudogene copies.
Concerning the human genome, for which there are relatively comprehensive annotations of protein-coding genes, the annotations can be used to map reads more accurately by aligning the reads preferentially to real genes rather than pseudogenes. GSNAP  and STAR  also make use of annotation, although they use it in a more limited fashion in order to detect splice sites. TopHat2 can use the full-length transcripts defined by annotations during its initial mapping phase, which produces significant gains in sensitivity and accuracy (see Results and discussion).
Transcripts from a target genome may differ substantially from the reference genome, possibly containing insertions, deletions, and other structural variations [13, 14]. For such regions, previous spliced alignment programs (including the original TopHat) sometimes fail to find a proper alignment. In TopHat2, we implemented new procedures to ensure that reads are aligned with true insertions and deletions (indels). Indels due to sequencing errors will be discovered by Bowtie2 , the underlying mapping engine of TopHat2, which can detect short indels very efficiently. In addition, very large deletions, inversions on the same chromosome, and translocations involving different chromosomes are detected by the TopHat-Fusion algorithms , which are now incorporated into TopHat2 and available by a simple command-line switch.
TopHat2 also includes new algorithms to handle more diverse types of sequencing data. This includes the ability to handle reads generated by ABI SOLiD technology (Life Technologies, Carlsbad, CA, USA) using its 'color space' representation. To accomplish this, TopHat2 uses a reference genome translated entirely into color space in order to take advantage of the error-correction capability of that format. TopHat2 also handles datasets in which the reads have variable lengths, allowing the experimenter to merge datasets from multiple sequencing runs with different lengths.
Results and discussion
TopHat2 can use either Bowtie  or Bowtie2  as its core read-alignment engine. TopHat2 has its own indel-finding algorithm, which enhances indel-finding ability of Bowtie2 in the context of spliced alignments. In order to evaluate TopHat2 and compare it with other methods, we ran multiple computational experiments using both real and simulated RNA-seq data.
For the simulations, we created multiple sets of 40,000,000 paired-end reads, 100 bp in length, from the entire human genome (release GRCh37). Instead of trying to precisely mimic real RNA-seq experiments, which may not be possible in any practical sense, we generated data with relatively simple settings and expression levels, calculated using a model from the Flux Simulator system , as follows. For the first test set, we generated reads from the known transcripts on the entire human genome without introducing any mismatches or indels. We then generated additional datasets, in which we included 1) insertions and deletions into the known transcripts at random locations, and 2) insertions and deletions in the reads themselves to mimic sequencing errors (see Additional file 1). Each of these experimental errors was introduced to test different capabilities of TopHat2 and other RNA-seq aligners. Following the simulations, we evaluated the programs using a recent, real RNA-seq dataset.
Alignments of simulated reads (error-free)
Performance of TopHat2 and other spliced aligners on a set of 20 million 100-bp, single-end reads, simulated based on transcripts from the entire human genome.
No. of mapped reads
Correctly mapped reads, %
Incorrectly mapped reads, %
Unmapped reads, %
Correct junction reads, %a
Correct short-anchored reads, %b
TopHat2 + Bowtie1
TopHat2 + Bowtie2
Performance of TopHat2 and other spliced aligners on a set of 20 million pairs of 100-bp reads, simulated based on transcripts from the entire human genome.
No. of mapped pairs
Correctly mapped pairs, %
Incorrectly mapped pairs, %
Unmapped pairs, %
Correct junction pairs, %a
Correct short-anchored pairs, %b
TopHat2 + Bowtie1
TopHat2 + Bowtie2
We also tested 20,000,000 read pairs (40,000,000 reads), of which 9,491,394 (47.5%) had at least one read spanning multiple exons; 2,702,624 of these pairs (13.5%) had at least one short-anchored read that extended by10 bp or less into one of its exons. Table 2 shows the results of mapping these reads with TopHat2 and other programs.
As shown in Table 1, TopHat2 correctly aligned more than 98% of the reads, which was higher than with any of the other methods, whose accuracy ranged from 88 to 97%. The difference was more pronounced for junction reads, with TopHat2 being able to align more than 94% of the reads, whereas the other methods ranged in accuracy from 65 to 92%. GSNAP, RUM, and STAR had particular difficulty aligning short-anchored reads, aligning only 26%, 8.6%, and 3.5%, respectively, while MapSplice performed considerably better, aligning 75.6% of these reads. By contrast, using Bowtie1 as its main aligner, TopHat2 aligned 93.7% of the short-anchored reads (Table 1). Both TopHat2 and MapSplice use a two-step algorithm, first detecting potential splice sites, and then using these sites to map reads. This two-step method may explain their superior performance at mapping reads with short anchors.
The results for the paired reads (Table 2) were similar to those for the unpaired reads. TopHat2 aligned the highest percentage of reads (96.7%), followed by MapSplice (92%), with the other methods ranging from 79 to 88%. The difference widened again for junction reads, with TopHat2 aligning 93%, followed by MapSplice (86%), GSNAP (76%), STAR (69%), and RUM (56%). Most striking of all was the performance on short-anchored reads, which most of the methods had great difficulty aligning correctly; TopHat2 aligned 90% of these, MapSplice aligned 72%, and the other methods aligned only 3 to 22%.
We assessed alignment rates for reads, junction reads, and junction reads with small anchors for a variety of read lengths (50, 100, 150, and 200 bp) (see Additional file 1, Figure S1). TopHat2 consistently outperformed all the other aligners for each read length. Comparing the alignment performance for junction reads with one to three mismatches, TopHat2 and MapSplice showed the highest recall rates (see Additional file 1, Table S2).
Alignments of simulated reads with short indels
We tested the spliced alignment programs using reads with small indels (1 to 3 bp), using two sets of simulated reads: (1) true indels, in which the transcripts were modified by inserting or deleting one to three bases at random locations; and (2) indels caused by sequencing errors, in which indels are randomly inserted into the reads. As before, all transcripts were simulated from known genes from the entire human genome. We intentionally used a relatively high rate of indels to test the mapping capabilities of the programs in the presence of these types of mutations.
Performance of TopHat2 and other spliced aligners on single-end reads containing insertions and deletions (indels) of 1 to 3 bp.
Reads with true indels (1,428,499)a
Reads with sequencing-error indels (1,525,657)a
Accuracy on 351,465 reads with boundary indels, %b
Accuracy on 357,334 reads with boundary indels, %b
Performance of TopHat2 and other spliced aligners on paired reads in which at least one of the reads contained insertions and deletions (indels) of 1 to 3 bp.
Pairs with true indels (2,754,313)a
Pairs with sequencing-error indels (2,934,043)a
Accuracy on 685,937 pairs with boundary indels, %b
Accuracy on 695,771 pairs with boundary indels, %b
TopHat2 + Bowtie2
For paired-end reads with indels, GSNAP had the highest rate of correct alignments (77%), followed by TopHat2 (60 to 69%), RUM (60 to 61%), and STAR (53 to 54%). MapSplice showed the lowest accuracy for both single-end and paired-end reads.
Alignment of a large set of real RNA-seq reads
Any test of alignment algorithms should use real data to provide a measure of the likely performance in practice. For these experiments, we used a recently released set of RNA-seq reads gathered across a time-course experiment reported by Chen et al. (; GEO accession number: GSM818582). These data comprise 130,705,578 million paired-end reads in 65,352,789 pairs. All reads are 101 bp in length.
Because we did not know the true alignments for this RNA-seq dataset, we used the following objective criteria to evaluate each program: 1) the cumulative number of alignments with edit distances of 0, 1, 2, and 3 for each read; and 2) the cumulative number of spliced alignments that agree with the annotation for the corresponding human genes, taken from the Ensembl GRCh37 release of the human genome.
For each program, we aligned the paired-end reads with and without the known gene annotations, where possible. RUM is designed to run with these annotations, whereas MapSplice maps strictly without them. We then evaluated the mapping results in terms of the number of read or paired-read mappings.
Figure 2 shows the alignment performance for each program both with and without using annotations, where all the programs were configured to report alignments with edit distances of up to 3 (and more in some programs). We compared the de novo alignments of reads for edit distances of 0, 1, 2, and 3. As expected, all programs found more alignments as the maximum permissible edit distance increased. For an edit distance of 0, which allows only perfect matches, TopHat2 mapped noticeably fewer reads without its new realignment function than it did with the function. This occurs because TopHat2 first aligns reads end to end with Bowtie2 before trying spliced alignments. Thus if a read is aligned end to end with, for example, one to three mismatches, then without the realignment function, TopHat2 accepts that alignment and may miss a spliced alignment with fewer mismatches.
By contrast, TopHat2 with t = 0 mapped the largest number of reads for all edit distances, followed in most cases by GSNAP. Note that for alignments with an edit distance of up to 3, TopHat2 without realignment discovered almost as many alignments as GSNAP.
When alignment methods were run with the assistance of gene annotations (Figure 2, right panel), the results were somewhat better than the de novo alignments. TopHat2 with or without realignment produced the highest number of mappings, followed by GSNAP, RUM, and STAR. The realignment procedure gave a much smaller advantage to TopHat2 in these experiments.
One way to estimate the accuracy of mappings is to compare alignments to known splice sites. We compared all aligners on only those reads that required splitting, counting how many known (Figure 3, left) and known plus novel (Figure 3, right) splice sites they identified. For de novo alignment, TopHat2 with realignment had the highest sensitivity, followed by MapSplice. Consistent with our tests on simulated reads, GSNAP and STAR showed relatively lower alignment rates. When using annotation, TopHat2 without realignment showed the highest mapping rate, slightly outperforming TopHat2 with realignment. GSNAP and STAR, which performed less well, map reads against substrings containing splice sites rather than whole transcripts. Direct mapping against whole transcripts, as TopHat2 does, worked well, especially when mapping reads spanning small exons, where a single read might span more than two exons.
Based on these results, we suggest two alternative strategies for alignment with TopHat2. First, if gene annotations are available, as they are for the human genome and some model organisms, then these annotations should be used with TopHat2, even without realignment. Alternatively, if annotations are unavailable or incomplete, then we recommend using TopHat2 with its realignment algorithm to produce the most complete set of alignments.
The run time and the peak memory usage of the programs used in this study varied greatly. We compared performance on all programs using the Chen et al. data  of 130 million reads (see Additional file 1, Table S8). Overall, STAR is much faster (32 minutes) than the other programs, which required from 8 to 55 hours. However, STAR requires a large amount of real memory, at least 28 GB, whereas most of the other programs required less than 8 GB.
Effects of pseudogenes on RNA-seq mapping
Expression levels of genes with pseudogene copies from Chen et al. .a
Number of pseudogene copiesb
Gene with pseudogene copies (%)c
Normalized count, %f
Total (≥ 1)
The completeness of human gene annotations
Discovery of new genes and transcripts is a major objective in many RNA-seq experiments. Deep RNA-seq experiments continue to uncover previously unseen elements of the transcriptome, even in well-studied organisms. Mapping reads to the genome is a core step in such screens, and the accuracy of mapping software can determine the accuracy of downstream steps such as gene and transcript discovery or expression quantification.
We have described TopHat2, which provides major accuracy improvements over previous versions and over other RNA-seq mapping tools. Because TopHat2 is built around Bowtie2, it can now align reads across small indels with high accuracy, a feature crucial for studies assessing the effects of genetic mutations on gene and transcript expression. TopHat2 is engineered to work well with a wide range of RNA-seq experimental designs, and it is optimized for the widely available long paired-end reads. These reads pose new challenges because they can span multiple splice sites rather than just one or two; we estimate that nearly half of reads 150 bp long will span two or more human exons. The algorithmic improvements in TopHat2 address this challenge, maintaining both accuracy and speed. Other refinements to the algorithm increase accuracy for reads that span a junction with only a small (≤ 10 bp) overhang, reducing errors in downstream transcript assembly using tools such as Cufflinks. TopHat2 also makes powerful use of available gene annotations, which allow it to avoid erroneously mapping reads to pseudogenes, and generally improve its overall alignment accuracy. Annotation also allows TopHat2 to better align reads that cover microexons, non-canonical splice sites, and other 'unusual' features of eukaryotic transcriptomes.
We have shown that TopHat2 performs well over a wide range of read lengths, making it a good fit for most RNA-seq experimental designs. This scalability suggests that as read lengths grow, TopHat2 will continue to report accurate, sensitive alignment results and allow for robust downstream analysis. We believe that TopHat2 reports more accurate alignments than competing tools, using fewer computational resources. RNA-seq experiments are becoming increasingly common and are now routinely used by many biologists. We expect that TopHat2 will provide these scientists with accurate results for use with expression analysis, gene discovery, and many other applications.
After the transcriptome-mapping step, some reads remain unmapped because they are derived from unknown transcripts not present in the annotation, or because they contain many miscalled bases. In addition, there may be poorly aligned reads that have been mapped to the wrong location. TopHat2 aligns these unmapped or potentially misaligned reads against the genome (Figure 6, step 2). Any reads contained entirely within exons will be mapped, whereas other spanning introns may not be.
Using unmapped reads from step 2, TopHat2 tries to find novel splice sites that are based on known junction signals (GT-AG, GC-AG, and AT-AC). TopHat2 also provides an option to allow users to remap some of the mapped reads, depending on the edit distance values of these reads; that is, those reads whose edit distance is greater than or equal to a user-provided threshold will be treated as unmapped reads. To accomplish this, the unmapped reads (and previously mapped reads with low alignment scores) are split into smaller non-overlapping segments (25 bp each by default) which are then aligned against the genome (Figure 6, step 3). Tophat2 examines any cases in which the left and right segments of the same read are mapped within a user-defined maximum intron size (usually between 50 and 100,000 bp). When this pattern is detected, TopHat2 re-aligns the entire read sequence to that genomic region in order to identify the most likely locations of the splice sites (Figure 6). Using a similar approach, indels and fusion breakpoints are also detected in this step.
The genomic sequences flanking these splice sites are concatenated, and the resulting spliced sequences are collected as a set of potential transcript fragments. Any reads not mapped in the previous stages (or mapped very poorly) are then re-aligned with Bowtie2  against this novel transcriptome.
After these steps, some of the reads may have been aligned incorrectly by extending an exonic alignment a few bases into the adjacent intron (Figure 1; Figure 6, steps 3 to 5). TopHat2 checks if such alignments extend into the introns identified in the split-alignment phase; if so, it can realign these reads to the adjacent exons instead.
In the final stage, TopHat2 divides the reads into those with unique alignments and those with multiple alignments. For the multi-mapped reads, TopHat2 gathers statistical information (for example, the number of supporting reads) about the relevant splice junctions, insertions, and deletions, which it uses to recalculate the alignment score for each read. Based on these new alignment scores, TopHat2 reports the most likely alignment locations for such multi-mapped reads.
For paired-end reads, TopHat2 processes the two reads separately through the same mapping stages described above. In the final stage, the independently aligned reads are analyzed together to produce paired alignments, taking into consideration additional factors including fragment length and orientation.
For the experiments described in this study, the program version numbers were: TopHat2 2.0.8, TopHat1 1.1.4, GSNAP 2013-01-23, RUM 1.12_01, MapSplice 1.15.2, and STAR 2.3.0e. For the specific parameters for each program, see Additional file 1, Table S9, and for the source code of TopHat 2.0.8, see Additional file 2.
Insertion or deletion
We thank Lior Patcher and Adam Roberts for their invaluable contributions to our discussions on the TopHat2 pipeline. This work is supported in part by the National Human Genome Research Institute (NIH) under grants R01-HG006102 and R01-HG006677.
- Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008, 5: 621-628. 10.1038/nmeth.1226.PubMedView ArticleGoogle Scholar
- Pei B, Sisu C, Frankish A, Howald C, Habegger L, Mu XJ, Harte R, Balasubramanian S, Tanzer A, Diekhans M, Reymond A, Hubbard TJ, Harrow J, Gerstein MB: The GENCODE pseudogene resource. Genome Biol. 2012, 13: R51-10.1186/gb-2012-13-9-r51.PubMedPubMed CentralView ArticleGoogle Scholar
- Roberts A, Trapnell C, Donaghey J, Rinn JL, Pachter L: Improving RNA-Seq expression estimates by correcting for fragment bias. Genome Biol. 2011, 12: R22-10.1186/gb-2011-12-3-r22.PubMedPubMed CentralView ArticleGoogle Scholar
- Trapnell C, Pachter L, Salzberg SL: TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009, 25: 1105-1111. 10.1093/bioinformatics/btp120.PubMedPubMed CentralView ArticleGoogle Scholar
- Wu TD, Nacu S: Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics. 2010, 26: 873-881. 10.1093/bioinformatics/btq057.PubMedPubMed CentralView ArticleGoogle Scholar
- Grant GR, Farkas MH, Pizarro AD, Lahens NF, Schug J, Brunk BP, Stoeckert CJ, Hogenesch JB, Pierce EA: Comparative analysis of RNA-Seq alignment algorithms and the RNA-Seq unified mapper (RUM). Bioinformatics. 2011, 27: 2518-2528.PubMedPubMed CentralGoogle Scholar
- Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR: STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013, 29: 15-21. 10.1093/bioinformatics/bts635.PubMedPubMed CentralView ArticleGoogle Scholar
- Wang K, Singh D, Zeng Z, Coleman SJ, Huang Y, Savich GL, He X, Mieczkowski P, Grimm SA, Perou CM, MacLeod JN, Chiang DY, Prins JF, Liu J: MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res. 2010, 38: e178-10.1093/nar/gkq622.PubMedPubMed CentralView ArticleGoogle Scholar
- Zhang Z, Harrison PM, Liu Y, Gerstein M: Millions of years of evolution preserved: a comprehensive catalog of the processed pseudogenes in the human genome. Genome Res. 2003, 13: 2541-2558. 10.1101/gr.1429003.PubMedPubMed CentralView ArticleGoogle Scholar
- Kalyana-Sundaram S, Kumar-Sinha C, Shankar S, Robinson DR, Wu YM, Cao X, Asangani IA, Kothari V, Prensner JR, Lonigro RJ, Iyer MK, Barrette T, Shanmugam A, Dhanasekaran SM, Palanisamy N, Chinnaiyan AM: Expressed pseudogenes in the transcriptional landscape of human cancers. Cell. 2012, 149: 1622-1634. 10.1016/j.cell.2012.04.041.PubMedPubMed CentralView ArticleGoogle Scholar
- Chen R, Mias GI, Li-Pook-Than J, Jiang L, Lam HY, Miriami E, Karczewski KJ, Hariharan M, Dewey FE, Cheng Y, Clark MJ, Im H, Habegger L, Balasubramanian S, O'Huallachain M, Dudley JT, Hillenmeyer S, Haraksingh R, Sharon D, Euskirchen G, Lacroute P, Bettinger K, Boyle AP, Kasowski M, Grubert F, Seki S, Garcia M, Whirl-Carrillo M, Gallardo M, Blasco MA, et al: Personal omics profiling reveals dynamic molecular and medical phenotypes. Cell. 2012, 148: 1293-1307. 10.1016/j.cell.2012.02.009.PubMedPubMed CentralView ArticleGoogle Scholar
- The Illumina Body Map 2.0 data. [http://www.ebi.ac.uk/arrayexpress/browse.html?keywords=E-MTAB-513&expandefo=on]
- Xing J, Zhang Y, Han K, Salem AH, Sen SK, Huff CD, Zhou Q, Kirkness EF, Levy S, Batzer MA, Jorde LB: Mobile elements create structural variation: analysis of a complete human genome. Genome Res. 2009, 19: 1516-1526. 10.1101/gr.091827.109.PubMedPubMed CentralView ArticleGoogle Scholar
- Levy S, Sutton G, Ng PC, Feuk L, Halpern AL, Walenz BP, Axelrod N, Huang J, Kirkness EF, Denisov G, Lin Y, MacDonald JR, Pang AW, Shago M, Stockwell TB, Tsiamouri A, Bafna V, Bansal V, Kravitz SA, Busam DA, Beeson KY, McIntosh TC, Remington KA, Abril JF, Gill J, Borman J, Rogers YH, Frazier ME, Scherer SW, Strausberg RL, et al: The diploid genome sequence of an individual human. PLoS Biol. 2007, 5: e254-10.1371/journal.pbio.0050254.PubMedPubMed CentralView ArticleGoogle Scholar
- Langmead B, Salzberg SL: Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012, 9: 357-359. 10.1038/nmeth.1923.PubMedPubMed CentralView ArticleGoogle Scholar
- Kim D, Salzberg SL: TopHat-Fusion: an algorithm for discovery of novel fusion transcripts. Genome Biol. 2011, 12: R72-10.1186/gb-2011-12-8-r72.PubMedPubMed CentralView ArticleGoogle Scholar
- Langmead B, Trapnell C, Pop M, Salzberg SL: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009, 10: R25-10.1186/gb-2009-10-3-r25.PubMedPubMed CentralView ArticleGoogle Scholar
- Griebel T, Zacher B, Ribeca P, Raineri E, Lacroix V, Guigo R, Sammeth M: Modelling and simulating generic RNA-Seq experiments with the flux simulator. Nucleic acids Res. 2012Google Scholar
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License