From: DE-kupl: exhaustive capture of biological variation in RNA-seq data through k-mer decomposition

The diversity of 31-mers in RNA-seq libraries exceeds that of reference sequences. a Intersection of k-mers present in GENCODE transcripts and in RNA-seq data from three tissues: bone marrow, skin, and colon. The k-mer set for each tissue was defined as the k-mers shared by all six individuals. b Intersection of k-mers present in GENCODE transcripts, the human reference genome (GRCh38), and RNA-seq data (same as in a). b1 Distribution of k-mer abundances for each tissue represented in a and b. k-mers shared with GENCODE are labeled as GENCODE. Among the remaining k-mers, those shared with the human genome are labeled as GRCh38, and all others are labeled as tissue-specific. The same labeling was applied in b2 and b3. b2 Breakdown of k-mer diversity for each tissue. b3 Mapping statistics of k-mers labeled as tissue-specific in b2. These k-mers were first mapped to GENCODE transcripts, and unmapped k-mers were then mapped to the GRCh38 reference using Bowtie1, allowing up to two mismatches per 31-mer.
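The labeling scheme described in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`kmers`, `kmer_set`, `shared_kmers`, `label_kmers`) are hypothetical, the sequences are toy data, and k is set to 4 instead of 31 so the result can be checked by hand. The sketch also assumes forward-strand k-mers only and exact set membership, so it does not reproduce the mismatch-tolerant Bowtie1 mapping used for b3.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def kmers(seq: str, k: int = 31) -> Set[str]:
    """Return all k-length substrings of seq (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_set(seqs: Iterable[str], k: int = 31) -> Set[str]:
    """Union of the k-mers found in a collection of sequences."""
    out: Set[str] = set()
    for s in seqs:
        out |= kmers(s, k)
    return out


def shared_kmers(per_individual: Iterable[Set[str]]) -> Set[str]:
    """k-mers present in every individual of a tissue."""
    sets = list(per_individual)
    return set.intersection(*sets) if sets else set()


def label_kmers(tissue: Set[str], gencode: Set[str], genome: Set[str]) -> Dict[str, str]:
    """Assign each tissue k-mer to GENCODE, then GRCh38, else tissue-specific."""
    labels: Dict[str, str] = {}
    for km in tissue:
        if km in gencode:
            labels[km] = "GENCODE"
        elif km in genome:
            labels[km] = "GRCh38"
        else:
            labels[km] = "tissue-specific"
    return labels


# Toy data (assumed for illustration only), with k = 4 instead of 31.
K = 4
genome_kmers = kmer_set(["ACGTACGGTTCA"], K)   # stands in for GRCh38
gencode_kmers = kmer_set(["ACGTACGG"], K)      # stands in for GENCODE transcripts
individuals = [
    kmer_set(["ACGTA", "GGTTC", "TTTTT"], K),
    kmer_set(["ACGTA", "GGTTC", "TTTTT", "CCCCC"], K),
]

tissue_kmers = shared_kmers(individuals)       # CCCC is dropped: seen in one individual only
labels = label_kmers(tissue_kmers, gencode_kmers, genome_kmers)
summary = Counter(labels.values())
print(dict(summary))
```

The order of the checks in `label_kmers` matters: a k-mer found in both the transcript set and the genome set is counted once, under GENCODE, which matches the precedence stated in the caption.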

