
Table 2 Summary of the statistical approaches for clonality analysis from bulk DNA sequencing

From: Evaluating statistical approaches to define clonal origin of tumours using bulk DNA sequencing: context is everything

Author/publication

Technology, type of genomic profile

Tissues (types of tumour), Samples, sample types, treatment types

Method 1

Method 2

Method 3

Preferred method by the author(s)

Significance

Waldman et al. 2000 [4]

CGH arrays, SCNA

FFPE (DCIS), n=18, primary DCIS and recurrent/new primary DCIS, BCS with (n=3)/without radiotherapy (n=15, n=1 unknown); 1 mm surgical margin (n=9), positive margin (n=6, n=3 unknown)

Percentage concordance between pairs, based on SCNA only (see text for details)

Unsupervised hierarchical clustering based on pairwise similarities of overall SCNA

In-house similarity score based on the CNA of each chromosome arm, with greater weight given to agreement on a rare gain or loss than on a common one (see text for details)

None

17/18 were classified as pairs based on Methods 1 and 2; 16/18 were classified as pairs based on Method 3; 1 discordant case was common to all methods

Teixeira et al. 2004 [13]

CGH arrays, SCNA

FF (IBC), n=12, ipsilateral/bilateral DCIS/LCIS and/or IBC/ILC, N/Av

Probabilistic model derived by Kuukasjarvi et al. 1997 [40]

Probability of a genetic alteration = number of occurrences of that alteration / number of tumours analysed

Unsupervised hierarchical clustering with complete linkage and Pearson correlation; the dendrogram was drawn from overall SCNA or from chromosome-arm changes

N/A

None

All cases were concordant between these two methods except one (patient 12 presented with both ipsilateral and bilateral tumours i.e. two in each breast)

Bollet et al. 2008 [6]

SNP arrays, SCNA

FF (IBC), cases n=22, control n=44, ipsilateral recurrent/new primary IBC, BCS w/wo radiotherapy

Hierarchical clustering of the overall SCNA profile; Pearson correlation was used to derive the dendrogram

Shared breakpoints; M score, partial identity score with a high cut off value (see text for details)

Clinical definition with matched histopathological subtypes, location of the recurrent tumour, grades and hormonal status (see text for details)

Shared breakpoints/partial identity score outperforms the clinical definition/overall SCNA

Method 2 gave a significant difference in metastasis-free survival compared with Method 3 (p=0.01)

Clonality R Package 2011 [27]

CGH arrays (SCNA, LOH); mutation analysis (NGS)

Publicly available data and the authors' own cohort; FF and FFPE; IBC, lobular carcinoma in situ [5, 26]

SCNA (chromosome arm as a unit of analysis), LOH analysis, shared mutations

N/A

N/A

N/A

Statistical approach that derives a P value for each tumour pair, separately for CN and mutations.

Updated Clonality Package 2019-2020 [42]

Mutation analysis (NGS)

As above

Estimated marginal probability of occurrence of a shared mutation in TCGA; test developed only for metastasised tumours at a different site [43]

N/A

N/A

N/A

Updated R package that estimates an individual probability of clonal relatedness.

Newburger et al. 2013 [44]

WGS (median 53.4x), somatic SNVs, aneuploidy

FFPE (matched normal, early neoplasia w/wo atypia, carcinoma), n=6, synchronous breast early neoplasia with IBC w/wo DCIS

Lineage trees were built from shared somatic mutations. Raw reads were aligned to the UCSC hg19 build and SNVs were called using GATK

N/A

N/A

N/A

IDC had 2.5x as many private somatic SNVs as the early neoplasia and 10x as many as normal tissue; in 4/6 cases the early neoplasia and IDC shared a common ancestor (they shared a significant number of SNVs); the genome of the shared ancestor was already aneuploid.

Weng et al. 2015 [45]

TSP, SNV

FFPE (synchronous early breast neoplasia with IBC and/or DCIS); n=6

Highly accurate VAFs of phylogenetically informative SNVs were used to build high-resolution lineage trees.

N/A

N/A

N/A

Atypical hyperplasia (AH) and DCIS/IBC shared a common ancestor, while DCIS/IDC had more private SNVs than AH. AH also had a greater mutation burden than typical ductal hyperplasia lesions.

Schultheis et al. 2016 [7]

WES (105x; n=5), TSP (453x; n=18); method validation between WES and TSP (n=5); mutational landscape

FFPE (synchronous endometrioid endometrial and endometrioid ovarian carcinoma) n=23, metastasis/synchronous/independent primary tumour

CI: based only on nonsynonymous and synonymous mutations, compared with the frequency of each mutation in the TCGA EEC dataset; a shared mutation with ≤20% frequency in TCGA is sufficient to call the case clonal (CI ≥ 0.8)

CI2: based on somatic mutations and their frequency in TCGA or a given cohort

N/A

None; CI and CI2 give concordant results. CI has been used by others in multiple studies [8, 46, 47]

CI was discordant with the clinical definitions (CI: 22/23 clonal; clinical definition: 15/23 independent and 8/23 metastasis).

Biermann et al. 2018 [12]*

aCGH, SNP arrays (SCNA), DNA methylation, Whole RNA sequencing

FFPE (IBC), n=37, independent/new primary tumour (metachronous/synchronous)

-Similarity Index (SI) (SCNA) = shared changes / (shared + unique + opposite changes)

-Unsupervised hierarchical clustering (SCNA, LOH)

-Distance measure: Euclidean distances

-Shared Segment analysis: The breakpoint and CN of each segment was compared between pairs

-Shared mutation

Clonality R Package for SCNA, LOH, mutational data

SI

-The methods were discordant with the clinical definition

-Clonality analysis by SI agreed with the other approaches in all but 5 patients

Roth et al. 2014 [48]

WGS, WES (>100×)

1000 Genomes Project samples

Hierarchical Bayes statistical model: PyClone estimates the clonal architecture and composition using somatic mutant allelic fractions adjusted for sequencing errors, tumour cell content, ploidy and local CN profile

N/A

N/A

N/A

Subclonal reconstruction utilising somatic mutations

Deshwar et al. 2015 [49]

WGS

Simulated data

PhyloWGS: subclonal reconstruction using both somatic mutations and SCNA

N/A

N/A

N/A

Incorporates critical contribution of SCNA for subclonal reconstruction

Kaufmann et al. 2021 [11]

High depth WGS

Pan-cancer Analysis of Whole Genomes, n=2778

MEDICC2: defines minimum event distance between pairs of SCNA profiles and uses neighbour-joining to infer relatedness

N/A

N/A

N/A

Incorporates whole genome doubling events

N/A not applicable, WES whole exome sequencing, TSP targeted sequencing panel, WGS whole genome sequencing, SCNA somatic copy number alterations, SNV single nucleotide variants, VAF variant allele frequencies, BCS breast conserving surgery, CI clonality index, NGS next-generation sequencing, FFPE formalin-fixed paraffin-embedded, DCIS ductal carcinoma in situ, IBC invasive breast cancer, MEDICC minimum event distance for intra-tumour copy-number comparisons. *Methods are described only for SCNA, LOH and mutational data.
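
The Similarity Index listed for Biermann et al. in the table (shared changes divided by the sum of shared, unique and opposite changes) can be illustrated with a minimal sketch. It assumes each tumour's SCNA profile is encoded per chromosome arm as +1 (gain), -1 (loss) or 0 (no change). This encoding, the function name `similarity_index` and the example profiles are hypothetical, chosen only for illustration; they are not taken from the cited study, which may define the unit of comparison differently.

```python
from typing import Mapping


def similarity_index(a: Mapping[str, int], b: Mapping[str, int]) -> float:
    """SI = shared / (shared + unique + opposite) for two SCNA profiles.

    Profiles map a chromosome arm to +1 (gain), -1 (loss) or 0 (no change).
    This per-arm encoding is an assumption made for illustration.
    """
    shared = unique = opposite = 0
    for arm in set(a) | set(b):
        x, y = a.get(arm, 0), b.get(arm, 0)
        if x == 0 and y == 0:
            continue            # no change in either tumour: not counted
        if x == y:
            shared += 1         # same alteration in both tumours
        elif x == 0 or y == 0:
            unique += 1         # alteration present in only one tumour
        else:
            opposite += 1       # gain in one tumour, loss in the other
    total = shared + unique + opposite
    return shared / total if total else 0.0


# Hypothetical tumour pair: two shared changes (1q, 8p), two unique (16q, 17p).
primary = {"1q": 1, "8p": -1, "16q": -1, "17p": 0}
recurrence = {"1q": 1, "8p": -1, "16q": 0, "17p": 1}
print(similarity_index(primary, recurrence))  # 2 / (2 + 2 + 0) = 0.5
```

Counting opposite changes in the denominator penalises pairs that gain and lose the same arm, which is why the parentheses in the formula matter. The SI value at which a pair is called clonal is not given in the table, so no threshold is applied here.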