Schematic overview of the read-processing steps used for our analyses. Shown is a schematic representation of a gene containing two exons and one intron. Each black line indicates a read, and asterisks indicate the positions of T-C substitutions. (A, B) In the first step, identical sequences in the raw reads were collapsed (using pyFastqDuplicateRemover) and the remaining cDNA sequences were aligned to the genome. (C) pyCalculateFDRs was used to calculate the minimum read coverage height required to obtain an FDR ≤0.01. (D) Contigs were generated from significantly enriched regions, and T-C mutation frequencies were calculated (using pyCalculateMutationFrequencies). (E, F) We then used pyMotif to identify Nrd1-Nab3 motifs in contigs (E) and selected only those motifs for which at least one T-C mutation was found in overlapping reads (F). These are referred to as ‘cross-linked motifs’ throughout the manuscript.
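The logic of steps (A) and (E, F) can be illustrated with a minimal Python sketch. It is not the pyCRAC implementation and does not call any of the tools named above. The `Read` type, the function names, the example motif `TGTA` and the toy contig are all hypothetical, and the sketch assumes that a motif counts as cross-linked when at least one read overlapping it carries a T-C substitution anywhere along the read.

```python
import re
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical aligned read, in 0-based contig coordinates."""
    start: int            # inclusive start
    end: int              # exclusive end
    tc_positions: tuple   # positions of T-C substitutions in this read


def collapse_identical(seqs):
    """Step (A): keep one copy of each identical sequence, in first-seen order."""
    return list(dict.fromkeys(seqs))


def crosslinked_motifs(contig, motif, reads):
    """Steps (E, F): return (start, end) of motif occurrences that are
    overlapped by at least one read carrying a T-C substitution.

    Assumption: the substitution may lie anywhere in the overlapping read.
    """
    hits = []
    # A zero-width lookahead lets overlapping motif occurrences be found.
    pattern = re.compile(f"(?={re.escape(motif)})")
    for match in pattern.finditer(contig):
        m_start = match.start()
        m_end = m_start + len(motif)
        for read in reads:
            overlaps = read.start < m_end and m_start < read.end
            if overlaps and read.tc_positions:
                hits.append((m_start, m_end))
                break  # one supporting read is enough
    return hits


# Toy data (hypothetical): two TGTA occurrences, only the first is
# overlapped by a read with a T-C substitution.
unique_seqs = collapse_identical(["ACGT", "ACGT", "TTGA"])
contig = "CCTGTAGGGGGGTGTACC"
reads = [Read(0, 8, (4,)), Read(10, 18, ())]
selected = crosslinked_motifs(contig, "TGTA", reads)
```

In this toy example only the first motif occurrence is retained, because the read covering the second occurrence carries no T-C substitution.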