Natural encoding of RBP-bound sites and graph-kernel features. (A) The region identified in the CLIP-seq experiment (yellow) is symmetrically extended by 150 nucleotides to compute representative secondary structure information. (B) The RNA secondary structure of each RBP-bound context is represented as a graph. Additional information on the type of substructure (that is, whether a group of nucleotides is located within a stem or within one of the loop types) is annotated via a hypergraph formalism. (C) A very large number of features is extracted from the graphs using a combinatorial approach. A valid feature is a pair of small subgraphs (parametrized by a radius R) a small distance apart (parametrized by a distance D). The feature highlighted in orange is an example of a feature that can account for simultaneous interdependencies between sequence and structure information at different locations. CDS, coding sequence; CLIP-seq, cross-linking and immunoprecipitation sequencing; nt, nucleotide; RBP, RNA-binding protein.
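
The steps in panels (A) and (C) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`extend_region`, `structure_graph`, `pair_features`) are hypothetical, the 150-nt extension is assumed to apply to each side of the site, the secondary structure is assumed to be given in dot-bracket notation, and each radius-R neighborhood is summarized by a crude signature (sorted distance and nucleotide pairs) rather than a true canonical subgraph encoding. The hypergraph annotation of panel (B) is not modeled.

```python
from collections import Counter, deque


def extend_region(start, end, flank=150, seq_len=None):
    """Extend the half-open interval [start, end) by `flank` nt on each side
    (assumption: 'symmetrically' means the same flank on both sides)."""
    new_start = max(0, start - flank)
    new_end = end + flank
    if seq_len is not None:
        new_end = min(seq_len, new_end)
    return new_start, new_end


def structure_graph(structure):
    """Build adjacency sets from dot-bracket notation:
    backbone edges between neighbors plus one edge per base pair."""
    adj = {i: set() for i in range(len(structure))}
    for i in range(len(structure) - 1):
        adj[i].add(i + 1)
        adj[i + 1].add(i)
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            adj[i].add(j)
            adj[j].add(i)
    return adj


def bfs_distances(adj, root, limit):
    """Shortest-path distances from `root` to all nodes within `limit` edges."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if dist[u] == limit:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def neighborhood_code(adj, labels, root, radius):
    """Simplified signature of the radius-R neighborhood around `root`."""
    ball = bfs_distances(adj, root, radius)
    return tuple(sorted((d, labels[v]) for v, d in ball.items()))


def pair_features(adj, labels, max_radius, max_distance):
    """Count pairs of neighborhoods (radius r <= R) whose roots are
    1..D edges apart; each key is (r, d, sorted pair of signatures)."""
    features = Counter()
    for u in adj:
        reachable = bfs_distances(adj, u, max_distance)
        for v, d in reachable.items():
            if v <= u:  # count each unordered pair of distinct roots once
                continue
            for r in range(max_radius + 1):
                codes = sorted([neighborhood_code(adj, labels, u, r),
                                neighborhood_code(adj, labels, v, r)])
                features[(r, d, tuple(codes))] += 1
    return features


# Toy example: a 6-nt hairpin with two base pairs.
seq = "GCAUGC"
adj = structure_graph("((..))")
features = pair_features(adj, seq, max_radius=1, max_distance=2)
window = extend_region(200, 230)
```

Even on this toy hairpin, each feature key combines nucleotide identity with graph distance, so a pair of roots joined by a base-pair edge contributes in the same way as a pair joined by a backbone edge. This is one way a single feature can couple sequence and structure information at different locations.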