Fig. 2 | Genome Biology

Fig. 2

From: The somatic mutation landscape of the human body

Cross-tissue analysis of somatic mutations.

a The total number of mutations observed in a tissue depends on the sequencing depth of that tissue. Sequencing depth is defined as the cumulative number of uniquely mapped reads across all samples of a tissue. A linear regression line is shown in blue; tissues above it exhibit more mutations than expected from sequencing depth, while tissues below it show fewer mutations than expected. Rho is the Spearman correlation coefficient.

b Examples of significant mutation associations with age and biological sex (see Additional file 1: Fig. S4 and Additional file 6: Table S4 for all tissue data). Age ranges represent the youngest and oldest quartiles for each tissue. To control for sequencing depth and other technical artifacts, mutation values were obtained as the residuals from a linear regression (see the “Methods” section). p values are from a two-sided Mann-Whitney test.

c Caucasian sun-exposed skin shows a higher percentage of C>T mutations than sun-protected skin, while no such difference was seen for African-American skin. p values are from two-sided Mann-Whitney tests.

d Median variant allele frequency (VAF) for each mutation type, grouped by impact on the amino acid sequence; error bars represent the 95% confidence interval after bootstrapping 1000 times; p values are from two-sided Mann-Whitney tests.

e tSNE plot constructed from a normalized pentanucleotide mutation profile (the mutated base plus two nucleotides in each direction; see the “Methods” section for normalization details) and all samples in this study.

f Average silhouette scores representing the coherence of selected groups of samples from the tSNE space in panel e; a score of 1 represents maximal clustering, whereas 0 represents no clustering (see the “Methods” section). Grouping was performed by tissue-of-origin, or by multiple tissues combined (red labels). “Grouped by people” (green label) is an average silhouette score after grouping samples by their person-of-origin from 20 randomly selected people. The blue dashed line represents the average random score expectation after permuting tissue labels (see the “Methods” section), and the blue stripes are ± two standard deviations. Error bars on points represent the 95% confidence interval based on bootstrapping 10,000 times.

g Mutation load is positively associated with H3K9me3 and/or negatively associated with H3K36me3 across most tissues analyzed. p values were obtained from a linear regression using all histone modifications as explanatory variables (see the “Methods” section). Gray range denotes non-significant p values after Bonferroni correction.
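
Two of the procedures named in the caption can be illustrated with a minimal Python sketch: taking residuals from a linear regression to control for sequencing depth (panel b), and bootstrapping a 95% confidence interval for a median (panel d). This is not the authors' code. The function names `depth_residuals` and `bootstrap_median_ci` are hypothetical, the regression uses sequencing depth as its only covariate (the caption says other technical artifacts were also controlled for, with details in the “Methods” section), and the percentile bootstrap is an assumed variant, since the caption does not say which bootstrap interval was used.

```python
import random
import statistics


def depth_residuals(depth, mutations):
    """Residuals of a one-covariate least-squares fit of mutations on depth.

    Assumption: depth is the only covariate; the paper's regression
    may include further technical covariates.
    """
    mean_x = statistics.fmean(depth)
    mean_y = statistics.fmean(mutations)
    sxx = sum((x - mean_x) ** 2 for x in depth)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(depth, mutations))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Positive residual: more mutations than expected at this depth.
    return [y - (intercept + slope * x) for x, y in zip(depth, mutations)]


def bootstrap_median_ci(values, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the median.

    Assumption: a plain percentile interval over resampled medians.
    """
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement n_boot times and record each median.
    medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lower = medians[int((alpha / 2) * n_boot)]
    upper = medians[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

In this sketch the residuals, not the raw mutation counts, would be the values compared between groups (for example, youngest versus oldest quartile) with a two-sided Mann-Whitney test, and `bootstrap_median_ci` would be called once per mutation type to obtain the error bars.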