Fig. 3 | Genome Biology

Fig. 3

From: Modeling double strand break susceptibility to interrogate structural variation in cancer


Modeling accuracy and the polarity of genomic features. a NHEK DSBCapture data, binned into 50 kb regions, are split into three distinct groups with differing modeling accuracies. b, c Values of the model features for the two boxed groups, A and B, and for group C, which contains randomly chosen points spanning the range of DSB frequency values for the majority of the genome. Columns are ordered by observed DSB frequency, shown in the top row; the rows for the features used to build the model (from the third row to the second-to-last row) are ordered by average variable importance. The number of 50 kb regions in each group is shown in parentheses above each heatmap. Each feature was normalized by mapping the 1st to 99th percentiles onto values between 0 and 1, with high outliers (in the top percentile) set to 1.1. b Group A has high H3K9me3 and low mappability scores, indicative of heterochromatin and repetitive sequence, whereas group B has feature patterns that closely match those of the low-DSB regions in group C. c For most of the genome, high H3K9me3 corresponds to low-DSB regions, while high (that is, early) replication timing values and high open-chromatin values signify high-DSB regions.
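The per-feature scaling used for the heatmaps can be sketched as follows. This is a minimal illustration, not the authors' code: the linear-interpolation percentile (matching NumPy's default behavior) and the choice to clip values below the 1st percentile to 0 are assumptions, and `percentile` and `normalize_feature` are hypothetical helper names.

```python
from typing import List, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Linearly interpolated q-th percentile (0 <= q <= 100)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty sequence")
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)                      # floor, since pos >= 0
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac


def normalize_feature(values: Sequence[float], outlier_value: float = 1.1) -> List[float]:
    """Map the 1st..99th percentile range of one feature onto [0, 1].

    Values above the 99th percentile are set to outlier_value (1.1, as in
    the caption). Assumption: values at or below the 1st percentile become 0.
    """
    p1 = percentile(values, 1)
    p99 = percentile(values, 99)
    span = p99 - p1
    scaled = []
    for v in values:
        if v > p99:
            scaled.append(outlier_value)       # top-percentile outlier
        elif v <= p1 or span == 0:
            scaled.append(0.0)                 # assumed floor for low values
        else:
            scaled.append((v - p1) / span)     # linear rescale into (0, 1]
    return scaled
```

For example, on the integers 0 to 100 the 1st and 99th percentiles are 1 and 99, so 50 maps to 0.5, 99 maps to 1.0, and 100 (the top-percentile outlier) maps to 1.1.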
