Fig. 6 | Genome Biology

Fig. 6

From: Modeling double strand break susceptibility to interrogate structural variation in cancer

Inference of positively and negatively selected SV regions. a Predicted DSB frequencies for regions overlapping RefSeq genes, two sets of cancer consensus genes, and common fragile sites (CFS), shown as violin plots. Stars mark region subsets with significantly higher values than genomic regions that do not overlap the given annotation set (Wilcoxon rank-sum test). b The same regions as in a, but showing d-score values, a measure of the deviation of the observed breakpoint frequencies from the predicted (expected) DSB frequencies. c Observed SV breakpoint frequencies for ICGC carcinomas (excluding breast cancer) plotted against predicted DSB frequencies from the NHEK DSBCapture model. Each point represents a 50-kb region and is colored by its d-score. Regions were split into high (cancHpredL) and low (cancLpredH) d-score categories (d-score p value < 0.01), a cancHpredH category representing regions with d-scores near zero, and a cancHpredL2 category representing low-mappability regions (see the “Methods” section). d Each category was tested for enrichment of various annotations using circular permutation (see the “Methods” section). The yellow dotted line marks the p < 0.01 significance threshold, and the numbers in parentheses indicate the number of 50-kb regions in each category, out of 61,903 in total.
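The circular permutation test named in panel d can be illustrated with a minimal, generic sketch. This is not the authors' procedure, which is described in the article's Methods section and is not reproduced here. The sketch assumes the genome is reduced to one ordered list of bins, that both the region category and the annotation are 0/1 flags per bin, and that enrichment is measured as the count of overlapping bins. The function name `circular_permutation_pvalue` and its parameters are hypothetical.

```python
import random


def circular_permutation_pvalue(category, annotation, n_perm=999, seed=0):
    """Empirical one-sided p-value for enrichment of an annotation in a category.

    Illustrative assumption: `category` and `annotation` are equal-length
    lists of 0/1 flags over consecutive genomic bins. The annotation track
    is rotated by a random non-zero offset in each permutation, which keeps
    its local clustering intact while breaking its alignment to the category.
    """
    n = len(category)
    if n != len(annotation) or n < 2:
        raise ValueError("need two equal-length tracks with at least 2 bins")

    rng = random.Random(seed)
    category_bins = [i for i, flag in enumerate(category) if flag]

    # Observed statistic: number of category bins that carry the annotation.
    observed = sum(annotation[i] for i in category_bins)

    as_extreme = 0
    for _ in range(n_perm):
        shift = rng.randrange(1, n)  # non-zero circular offset
        overlap = sum(annotation[(i + shift) % n] for i in category_bins)
        if overlap >= observed:
            as_extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (as_extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy tracks: 200 bins, with category and annotation both covering bins 10-29.
    toy_category = [1 if 10 <= i < 30 else 0 for i in range(200)]
    toy_annotation = list(toy_category)
    print(circular_permutation_pvalue(toy_category, toy_annotation))
```

Rotating the whole track, rather than shuffling bins independently, preserves the spatial autocorrelation of genomic annotations, which is the usual motivation for choosing a circular scheme.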
