Fig. 4 | Genome Biology

Fig. 4

From: SeqOthello: querying RNA-seq experiments at scale


An illustration of fusion calling criteria using SeqOthello’s query results against TCGA RNA-seq data. a, b Examples of k-mer hit distributions resulting from fusion junction sequence queries using SeqOthello. The presence of a small set of k-mers in a large fraction of samples indicates background noise caused by these k-mers being repetitive. For each fusion, we use δ98th, the number of k-mer hits at the 98th percentile, as an estimate of background noise. a Histogram of k-mer hits for a query of the junction sequence spanning chr21:42880008-chr21:39956869, which connects the gene pair TMPRSS2-ERG. The background noise is estimated at δ98th = 2. b Histogram of k-mer hits for a query of the junction sequence spanning chr5:134688636-chr5:179991489, which connects the gene pair H2AFY-CNOT6. The background noise is estimated at δ98th = 6. c Comparison of the noise-aware approach and the SBT-like approach using a θ-based containment query, in terms of recovering database-known fusion occurrences and detecting novel occurrences. Here μ is the minimum number of k-mer hits required beyond the fusion-specific noise level in the noise-aware approach; the change in μ between two adjacent points is 1. θ is the minimum fraction of k-mer hits required to call the presence of a query, as used in the SBT containment query; the change in θ between two adjacent points is 0.05. d The distribution of the actual k-mer hits of all fusion occurrences called with the noise-aware approach.
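The two calling criteria compared in panel c can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SeqOthello's implementation: it assumes the query result is a mapping from sample to number of k-mer hits, that δ98th is taken across samples, and that a nearest-rank percentile is used (the caption does not specify the percentile convention). All function names, sample names, and hit counts below are hypothetical.

```python
def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile for an integer pct in 1..100.

    Assumption: the caption does not say which percentile convention is used.
    """
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def noise_aware_calls(hits_per_sample, mu, pct=98):
    """Call a fusion in samples whose hits exceed the noise level by at least mu."""
    delta = nearest_rank_percentile(hits_per_sample.values(), pct)
    return {s for s, h in hits_per_sample.items() if h - delta >= mu}


def theta_calls(hits_per_sample, n_query_kmers, theta):
    """SBT-like containment: call when the fraction of query k-mers hit is >= theta."""
    return {s for s, h in hits_per_sample.items() if h / n_query_kmers >= theta}


# Hypothetical data: 197 samples hit only by 2 repetitive k-mers (background),
# plus three samples with substantially more hits, for a 20-k-mer query.
hits = {f"bg{i}": 2 for i in range(197)}
hits.update({"pos_a": 20, "pos_b": 18, "pos_c": 5})

delta = nearest_rank_percentile(hits.values(), 98)
noise_aware = noise_aware_calls(hits, mu=3)
loose_theta = theta_calls(hits, n_query_kmers=20, theta=0.05)
strict_theta = theta_calls(hits, n_query_kmers=20, theta=0.5)
```

With this toy data, a low θ admits every background sample because the repetitive k-mers alone satisfy the fraction, while a high θ drops the weakly supported sample; the noise-aware rule subtracts the per-fusion background before applying μ.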
