SubcloneSeeker: a computational framework for reconstructing tumor clone structure for cancer variant interpretation and prioritization
© Marth et al.; licensee BioMed Central Ltd. 2014
Received: 8 May 2014
Accepted: 31 July 2014
Published: 26 August 2014
Many tumors are composed of genetically divergent cell subpopulations. We report SubcloneSeeker, a package capable of exhaustive identification of subclone structures and evolutionary histories with bulk somatic variant allele frequency measurements from tumor biopsies. We present a statistical framework to elucidate whether specific sets of mutations are present within the same subclones, and the order in which they occur. We demonstrate how subclone reconstruction provides crucial information about tumorigenesis and relapse mechanisms, guides functional study by variant prioritization, and has potential as a rational basis for informed therapeutic strategies for the patient. SubcloneSeeker is available at: https://github.com/yiq/SubcloneSeeker.
Identifying the few genetic changes that drive chemo-resistance or metastasis from hundreds or thousands of somatic variants found in whole-exome or whole-genome sequencing of matched tumor-normal patient tissue samples is a daunting task. Current variant prioritization approaches examine predicted variant impact in candidate genes, or deploy pathway analysis to narrow down the long list of candidate mutations to a manageable number. Here we report an alternative approach to variant prioritization, exploiting the patterns of genetic heterogeneity often observed in diverse types of cancers.
The presence of such genetically divergent subpopulations of cells within a single tumor mass has been reported in various tumor types. In contrast to normal tissue, in which the same germline mutation is present in every cell, a somatic mutation may be present in some, but not all, cancer cells within a tumor biopsy as a result of rapid mitotic growth and continuous selection. With multiple groups of somatic mutations present at different cellular frequencies, the tumor mass consists of distinct populations of cells, or tumor subclones, with each subclone harboring a specific subset of the mutations. The ability to delineate each such clonal subpopulation, determine its frequency within the tumor mass, and infer the evolutionary relationships among subclones allows one to determine the order in which the mutation events occurred, and permits the identification of those mutations that are most likely to play a part in tumorigenesis, drug response, relapse, and metastasis.
Earlier studies have attempted to reconstruct subclonal structure with many different methods, typically tailored to their specific study designs. These methods fall into distinct classes including: (1) cell genotype profiling using in situ hybridization; (2) identifying distinct allele frequency (AF) modes by clustering, followed by subclone structure reconstruction via visual inspection of the data and manual reasoning; (3) phylogenetic reconstruction based on single-cell PCR or sequencing-based profiling; and (4) phylogenetic reconstruction using biopsies gathered from multiple metastases. While each method adequately addressed the dataset to which it was applied, none provided a sufficiently general framework for subclone reconstruction from somatic variation data. The work we are presenting is focused on automating the ‘reasoning’ step that starts with somatic variants from matched tumor/normal tissues of a single cancer patient, as well as additional tissues (for example, relapse, metastasis) if available, and ends in the enumeration of possibly multiple subclone structures consistent with the input data, and additional derived information that may be useful for variant prioritization or guiding treatment. The main difficulty of subclone reconstruction is the fact that the AFs measured in a large population of tumor cells, as is the case in ‘bulk tissue’ tumor sequencing or microarray genotyping experiments, do not retain the underlying linkage information that exists between individual somatic events, that is, whether or not two or more mutation events are present within the same cell. Unfortunately, given n mutation events, there are in total n! possible subclone structures, and often a large number of these can account for the AF measurements equally well. This makes it very difficult or impossible to unambiguously reconstruct subclone evolution from per-locus AF observations.
To address these challenges, computational methods have been recently developed for tumor tissue purity estimation (that is, partitioning tumor cell populations into a mixture of normal and tumor subpopulations), using microarray or sequencing data. Even more recently, multiple algorithms to reconstruct clonal structures were developed. These algorithms either exploit specific biological assumptions to choose between many mathematically equivalent structures, or use statistical sampling procedures to explore the solution space of all possible subclone structures. Both of these methods require high-precision AF measurements of one specific variant type, somatic single nucleotide variants (SNVs), and (presumably because of the computational complexity involved) only produce results for up to a few input sites (see Supplemental Result 1 in Additional file 1, and the datasets used in Additional files 2 and 3). Other approaches utilize maximum likelihood mixture decomposition on CNV data input; jointly estimate subclone genotypes with only SNV or with both CNV and SNV data, but without requiring that the subclones they infer fit within a consistent phylogeny; or model the possibly multi-furcating tumor phylogeny with a bifurcating tree, without the ability to consider multiple tumors from a single patient (such as primary/relapse pairs). There have also been several methods developed in the context of transcriptome data, which are summarized in a recent review article.
Here we present a more general approach based on a strategy that is able to accept many types of somatic variation data as input (for example, SNVs or copy number variations from sequencing or microarray datasets; see Figures S2 and S3 in Additional file 1, and Additional file 4 for sample datasets and scripts). Our method enumerates all possible subclone structures that are consistent with the bulk AF measurements from the input data. It is capable of reducing this solution space significantly, often to a single, unique solution, when data from multiple tumor biopsies such as primary and relapse from the same patient are available. In the event that more than a single alternative subclone structure still remains after such trimming, it is often possible to derive high-confidence linkage information between subsets of loci based on the consensus of all remaining structures. In such cases, we focus not on efforts to disambiguate mathematically equivalent solutions, but rather on using the complete set after our pruning procedure in a statistical framework to determine, for example, the probability that two given mutations are present within the same subclone (mutation co-localization), or that a given mutation pre-dates another one (mutation order). Such co-localization information may reveal, for example, that two distinct mutations that each sensitize the cancer cells to specific drugs are, in fact, present in a single subclone. Given the high incidence and therapeutic challenges posed by chemoresistant tumors, knowledge of mutation co-localization may allow for more accurate and potentially more efficacious targeted therapeutics aimed at countering or preventing chemoresistance. Moreover, if a novel mutation in a chemo-resistant tumor is present in every cell of the relapse sample, it may be a top candidate in the search for a mutation driving chemo-resistance.
Results and discussion
Our computational procedure for subclone structure analysis
A unified framework for subclone structure reconstruction that incorporates all types of genomic variants
We define a subclone as a collection of cells in the tumor sample that harbor the same set of genomic variants, including SNVs, structural variations (SV), copy number variations (CNV), loss of heterozygosity (LOH), and so on. The only requirement for a data type to be included in the analysis is the ability to derive the fraction of the cells within the tumor sample in which this mutation is present, a quantity that has also been referred to as ‘cell prevalence’ (CP). In a simplified example, a heterozygous SNV in a copy number neutral region with an AF of 30% would correspond to a CP of 60% (Figure 1A). The estimation of CP is no trivial task, especially for SNVs falling into regions of CNV, because the same measured allele frequency results in different CP values depending on the absolute copy number state in the region. A number of tools have been developed to facilitate CP calculation, including ASCAT and ABSOLUTE, which estimate the absolute copy number states of CNV regions, and PyClone, which estimates CP from SNV allele frequency while taking into account copy number. Our method requires CP measurements as input, regardless of whether these measurements represent SNVs, CNVs, or some other type of genetic variation, allowing it to consider each such variant type, or any combination of variant types from a given sample. We note that, as a preprocessing step, our method clusters together variants with the same (or similar) CP values to minimize measurement uncertainties, and assumes a priori that all variants in each such cluster are co-localized in the same cells. The input to our downstream methods is an ordered list of CP values, corresponding to those clusters.
Subclone structure reconstruction
Trimming the space of viable subclone structures
Often more than one viable subclone structure remains in the resulting solution set, corresponding to multiple alternative subclone evolutions. However, if additional ‘linkage’ data are available, further trimming is usually possible. Such linkage information may be either directly observed, such as in the case of spectral karyotype images, single cell colony assays, or single cell sequencing; or indirectly inferred from, for example, primary and relapse tumors from the same patient. Because the relapse tumor is typically derived from the primary tumor, they share mutations originating from common ancestor subclones, and through such shared evolutionary history the primary and relapse subclones can be merged into one unified subclone structure (or multiple alternative unified subclone structures). Figure 1C shows examples of two compatible primary/relapse structures (left) as well as two incompatible ones (right). In the latter example, the relapse subclone R2 contains two mutations that are found in different branches on the primary tree (P1 and P3), violating the assumptions above. Any structure in the primary that has no compatible structure in the relapse, or vice versa, is discarded from consideration, reducing the solution space.
Mutation localization prediction
The SubcloneSeeker software
SubcloneSeeker is implemented in C++, and its source code is available under the MIT license. The package provides a complete set of APIs and data structures to represent subclone and genomic mutation data types, along with well-documented source code and examples, so that anyone can easily extend the core functions we provide to incorporate domain-specific knowledge, such as placing different prior probabilities over tree structures.
Our subclone structure reconstruction method always includes the correct structure among the solution set it reports
We generated simulated tumor samples (Supplemental Method 1 in Additional file 1) comprising 3, 4, …, 8 mutation events with distinct CP values. For each of these ‘tumor samples’, we produced a random subclone structure serving as a ‘true’ structure. We repeated this procedure 1,000 times. In every case, SubcloneSeeker was able to reproduce the ‘true’ subclone structure as one of the solutions in the complete solution set of viable subclone structures. This ‘sanity check’ was necessary to ensure that our software worked appropriately for simulated datasets.
The number of biologically plausible subclone structures is low
Our normal cell component estimation procedure is accurate
As described above, our subclone structure reconstruction method provides, for each structure, each subclone present together with a subclone fraction, that is, the fraction of that subclone within the tumor biopsy. The structure includes a subclone without any of the mutations: this is the normal cell component of the tumor biopsy, and its fraction is the normal cell fraction. We investigated the accuracy with which our method estimates the normal cell fraction in experimental data. We applied our method to a dataset created by mixing 10%, 20%, …, 90%, 95%, and 100% sequencing reads from a SNUC (Sinonasal Undifferentiated Carcinoma) cell line sample with reads sequenced from paired normal tissue (Figure 2). In this dataset, the non-branching, stepwise mutation accumulation model (red-cross), a parsimonious solution that always exists (section ‘Method’), produced a very accurate estimate of normal cell content among all alternative structures (R² = 0.9705395 to the line y = x).
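To make the non-branching model concrete, the following is a minimal sketch (not the SubcloneSeeker implementation) of how subclone fractions fall out of a list of cluster CP values when every cluster is assumed to arise inside the previous one. The function name `linear_model_fractions` is hypothetical, and the CP values in the usage note are made-up illustration numbers, not data from the study.

```python
def linear_model_fractions(cps):
    """Subclone fractions under a non-branching (stepwise) model.

    cps: cell prevalence values of the mutation clusters, each in [0, 1].
    Returns [normal_fraction, f_1, ..., f_n], where f_k is the fraction of
    the subclone that carries the k largest-CP clusters and nothing else.
    """
    ordered = sorted(cps, reverse=True)
    # In a chain, each subclone keeps whatever its single child does not claim,
    # so fractions are differences between consecutive CP values.
    bounds = [1.0] + ordered + [0.0]
    return [bounds[i] - bounds[i + 1] for i in range(len(bounds) - 1)]
```

For hypothetical CP values of 0.6, 0.5, and 0.3, the sketch yields a normal cell fraction of 0.4, that is, one minus the largest CP. Because consecutive differences of a sorted list are never negative, this also illustrates why the stepwise solution exists for any valid input.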
Our algorithmic procedure for subclone structure comparison improves on interpretation in previously published data
In a recent study, Ding et al. investigated clonal evolution in eight acute myeloid leukemia (AML) patients. To ensure easy comparison with the published results, we started with the somatic mutation clusters and AF values provided in the study (Table S5c and Table S10 in Ding et al. Additional file 1), rather than re-computing them ourselves. With two exceptions, SubcloneSeeker produced the same subclone structures, and with one exception, came to the same biological conclusions (Table S1 in Additional file 1).
Analysis of TCGA primary-relapse ovarian tumor samples reveals two distinct patterns for tumor recurrence in the dataset
Analysis of whole-exome sequencing data from chemo-resistant versus primary ovarian tumors demonstrates that our subclone structure analysis can be used to prioritize somatic mutations for further follow-up
In the case of sample ‘S17’, the primary sample yielded two viable subclone structures, both compatible with the sole structure in the relapse (Figure 7F). Similarly to sample ‘S15’, mutation cluster ‘C4’ is likely to contain the initial driver mutation(s), and mutation cluster ‘C3’, which is present in all relapse subclones, is likely to contain the mutation leading to chemoresistance. In both samples, the use of subclone analysis resulted in information that one can use for variant prioritization, in order to narrow down the set of somatic events in the search for the causative mutation, both for initial tumor expansion, and for chemoresistance.
Simulation studies demonstrate that our statistical framework is able to accurately predict whether two somatic mutations (or mutation clusters) are localized in a subclone together
Re-analysis of bulk versus single cell colony assay data demonstrates that we are able to accurately identify mutations that are present in the same subclone
In this paper we present a novel algorithm to elucidate tumor subclonal structure using as input cell prevalence values of individual, unlinked somatic mutations. In contrast to other methods that require SNV allele frequencies, our method is able to analyze many different types of genomic variant data, as long as allele frequency measurements can be converted into cell prevalence values. Because bulk mutation frequency measurements from fragmentary sequence data or per-site microarray measurements do not retain ‘linkage’ across such somatic variant sites, often there are many alternative subclone structures that can account for the input measurements. Our method exhaustively enumerates all such viable subclone structures. We were able to show that the number of solutions is usually much smaller than the theoretical upper limit. Often tumor tissues from multiple phases of tumor development (for example, primary and relapse biopsies) are available. In such cases, the number of subclone structures that are not only consistent with the respective input frequency data but also across, for example, the primary and the relapse is lower, further trimming the ‘solution space’, often to a single, unique structure. Using both simulations and experimental data, we have extensively characterized and validated our methods. We have illustrated with a number of datasets that this approach is often able to identify key patterns underlying tumor progression and relapse, including information to guide mutation prioritization.
In the case that the solution space cannot be further trimmed, we provide methods to derive useful knowledge, in terms of mutation cluster co-localization and timing. Our subclone structure enumeration procedure is exhaustive, and is free from the biases introduced by the choice of parameters or prior distributions often required for statistical sampling of the subclone structure solution space. We demonstrated that the co-localization and timing of mutations predicted from the HSC bulk targeted sequencing (Jan et al.) correlate well with their function, and can be used in a similar fashion to prioritize functional study.
Our analysis of previously published datasets and our own datasets suggests that SubcloneSeeker will be applicable to a number of clinical and biological problems. Using serous ovarian cancer as an illustrative example, we have demonstrated that chemoresistance and relapse in this disease are clonally driven, and that such clones can be either present in the primary tumor or ‘arise’ during progression or relapse. The patterns of temporal mutational order and cellular co-localization provide clinically relevant insight into the genomic basis for chemoresistance. In ovarian cancer, 80% of tumors are classified as chemosensitive while 20% of cancers progress during or recur shortly after platinum-based adjuvant chemotherapy. Unfortunately, there are no known genetic markers at present that can reliably predict inherent or acquired chemoresistance. This is likely the result of the complex and multifactorial biological basis for this phenotype. However, whereas one or a small number of resistant clones may not be informative, analysis of many resistant clones and identification of the corresponding mutational order and cellular co-localization may lead to a better understanding of chemoresistance, and form a rational basis for targeting the chemoresistant clones.
We envision similar utility for this type of analysis in advancing the current understanding of genomic alterations involved in the pre-malignant phases of cancer. Once again using ovarian cancer as a prototypical case, it has been established that TP53 mutations are ubiquitous and early events in serous ovarian carcinogenesis. However, the prevalence of other recurrent somatic mutations is about 10% or less, suggesting that the additional requirements for transformation may be met through a combination of more diverse co-localized or temporally related somatic mutations (plus possible contributions from epigenetics and other molecular alterations, and so on). Thus genomic investigation of putative precursor lesions for serous carcinoma using the approaches presented here is likely to identify subclonal hierarchies whose constituent mutations define cooperative classes of oncogenic events whose sum total results in malignant transformation.
Depending on the type of input data, mutation events and their associated allele frequencies are called by detection methods.
The allele frequencies of events are converted into cell prevalence, and then subjected to clustering. If more than one sample is available, the clustering will be done in a multidimensional space, in which the number of dimensions is equal to the number of samples.
The resulting somatic event groups (by CP) serve as the input to the SubcloneSeeker core algorithm. This will result in a set of solutions that are biologically meaningful, and mathematically consistent with the input.
Further trimming can be performed on the solution set, such as trying to merge multiple samples into a unified evolutionary tree.
Mutation (cluster) co-localization can be inferred from the solution set.
Segmentation: the RCN derived from CNV or mBAF measurement is then subjected to segmentation algorithms, such as DNAcopy or HMMSeg, to identify continuous regions with the same copy number or LOH state, and to delineate the boundaries of the corresponding events. SNV AF estimation: deep sequencing SNV data do not need to be segmented; however, their allele frequencies need to be accurately estimated, for example, using PyClone, which also performs CP estimation.
Cell prevalence calculation
in which u is the segmental mean and n is the ACN of the segment (which can be estimated by applying the CNV data processing technique described above to the LRR track of SNP6 microarray). SNVs: with accurate allele frequency estimation made available by ultra-deep sequencing and software advancements, CP can also be derived from SNVs along with allele-specific copy number quantifications. For example, in diploid regions, CP = 2 ∙ AF for heterozygous SNVs, and CP = AF for homozygous SNVs.
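The diploid SNV rule stated above can be sketched in a few lines. This is an illustrative helper only: the function name `cell_prevalence`, the `zygosity` argument, and the clamping of noisy values to 1.0 are assumptions, and regions with copy number changes require a copy-number-aware tool rather than this shortcut.

```python
def cell_prevalence(af, zygosity="het"):
    """Convert an SNV allele frequency (AF) into cell prevalence (CP).

    Valid only for diploid, copy-number-neutral regions:
    CP = 2 * AF for heterozygous SNVs, CP = AF for homozygous SNVs.
    """
    if not 0.0 <= af <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    if zygosity == "het":
        cp = 2.0 * af
    elif zygosity == "hom":
        cp = af
    else:
        raise ValueError("zygosity must be 'het' or 'hom'")
    # Assumption: measurement noise can push 2 * AF slightly above 1,
    # so the result is capped at 1.0 (every cell carries the variant).
    return min(cp, 1.0)
```

For instance, a heterozygous SNV with an AF of 0.30 maps to a CP of 0.60, matching the simplified example given earlier in the text.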
Because the measurement of AF, and consequently CP, is potentially noisy, we attempt to mitigate its effect through clustering on CP to identify its modes. Examples shown in this paper are clustered with the kernel density function in R, with its bandwidth calculated by the Pilot Estimation of Derivatives. Users can choose to substitute more advanced techniques, such as MCLUST. When multiple samples are available, it is important to perform clustering in a multidimensional space, in which the number of dimensions equals the number of samples, to identify separately inherited clusters.
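As a rough illustration of mode-based clustering for a single sample, the sketch below evaluates a Gaussian kernel density on a grid over [0, 1], takes local maxima as cluster centers, and assigns each variant to the nearest center. It is a simplified stand-in, not the R procedure used in the paper: the fixed `bandwidth`, the grid resolution, and the function names are all assumptions, and no data-driven bandwidth selection is attempted.

```python
import math


def kde_modes(cp_values, bandwidth=0.03, grid_size=501):
    """Return the CP values at which a Gaussian kernel density peaks."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    # Unnormalized density: the normalizing constant does not move the peaks.
    density = [
        sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in cp_values)
        for x in grid
    ]
    return [
        grid[i]
        for i in range(1, grid_size - 1)
        if density[i] > density[i - 1] and density[i] >= density[i + 1]
    ]


def assign_clusters(cp_values, modes):
    """Label each CP value with the index of its nearest mode."""
    return [
        min(range(len(modes)), key=lambda k: abs(v - modes[k]))
        for v in cp_values
    ]
```

The modes, sorted in descending order, would then play the role of the ordered list of cluster CP values that the reconstruction step consumes. The choice of bandwidth strongly affects how many clusters are found, which is why a principled bandwidth estimate matters in practice.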
Subclone structure reconstruction
Subclone evolution tree enumeration
Due to the unique biology of tumorigenesis, we make the following assumptions:
Cells in a tumor mass are derived from germline cells or parental, existing tumor cells through mitosis, in which recombination is unlikely to occur.
The same event (with respect to the boundary resolution) would not spontaneously occur in two subclones without a descendent relationship, nor would pre-existing events revert back to the normal state in a descendent subclone.
The function ‘Evaluate(T)’ will, through a post-order tree traversal, try to assign a subclone frequency (f, or SF) value to each of the tree nodes so that at the end the subclone structure will result in the observed data (E). If the function visits a leaf node, it will assign the CP of the event clusters uniquely contained in the node; if the function visits an internal node, it will assign the CP of the event clusters uniquely contained in the node, minus the sum of the SF of all its descendent nodes. If it can do so without assigning any node a less-than-zero SF, that specific tree structure is recorded as a feasible solution.
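The evaluation step described above can be sketched as a short recursive function. This is an illustration rather than the package's C++ code: the `Node` class, the `tol` tolerance, and the convention that the root is a mutation-free normal-cell node with CP 1.0 are assumptions made for the example, as are the CP values in the two small trees.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    name: str
    cp: float  # CP of the event cluster(s) that first appear in this node
    children: List["Node"] = field(default_factory=list)
    sf: float = 0.0  # subclone frequency, filled in by evaluate()


def evaluate(node: Node, tol: float = 1e-9) -> Tuple[bool, float]:
    """Post-order traversal assigning SF = CP - (sum of SF of all descendants).

    Returns (feasible, total SF within this subtree).
    """
    feasible = True
    descendant_sf = 0.0
    for child in node.children:  # visit children before the node itself
        child_ok, child_total = evaluate(child, tol)
        feasible = feasible and child_ok
        descendant_sf += child_total
    node.sf = node.cp - descendant_sf  # a leaf simply receives its CP
    if node.sf < -tol:
        feasible = False  # a negative fraction rules this tree out
    return feasible, node.sf + descendant_sf


def chain_example() -> Node:
    """normal -> A(0.6) -> C(0.5) -> B(0.3): a stepwise structure."""
    b = Node("B", 0.3)
    c = Node("C", 0.5, [b])
    a = Node("A", 0.6, [c])
    return Node("normal", 1.0, [a])


def branching_example() -> Node:
    """normal -> A(0.6) -> {C(0.5), B(0.3)}: children exceed their parent."""
    a = Node("A", 0.6, [Node("C", 0.5), Node("B", 0.3)])
    return Node("normal", 1.0, [a])
```

In the chain, every node keeps a non-negative fraction (normal 0.4, A 0.1, C 0.2, B 0.3), so the structure is feasible. In the branching tree, the children of A claim 0.8 of the cells while A itself is present in only 0.6, which forces a negative SF and rejects the structure.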
This method will result in a tree-set, which contains all the possible ways to partition the observed event clusters into subclones, and the phylogeny between the subclones. One can choose to further trim the set by external or internal linkage information, or perform co-existence prediction.
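One way to picture the exhaustive enumeration is sketched below. It assumes that clusters are sorted by descending CP and that each cluster may attach either to the normal-cell root or to any cluster with a larger CP, which yields 1 × 2 × … × n = n! candidate trees; each candidate is kept only if no node ends up with a negative fraction. The tuple encoding of a tree and the function name are assumptions for illustration, and the package's actual traversal order and data structures may differ.

```python
from itertools import product


def enumerate_structures(cps, tol=1e-9):
    """Enumerate feasible subclone trees for cluster CPs sorted descending.

    Node 0 is the mutation-free normal-cell root (CP 1.0); cluster k
    (1-based) is node k. A tree is a tuple whose entry k-1 is the parent
    of node k.
    """
    all_cp = [1.0] + list(cps)
    n = len(cps)
    solutions = []
    # Node k may hang under any node with a smaller index: n! candidates.
    for parents in product(*[range(k) for k in range(1, n + 1)]):
        child_cp_sum = [0.0] * (n + 1)
        for k, p in enumerate(parents, start=1):
            child_cp_sum[p] += all_cp[k]
        # A node's fraction is its CP minus the CP claimed by its children.
        if all(all_cp[i] - child_cp_sum[i] >= -tol for i in range(n + 1)):
            solutions.append(parents)
    return solutions
```

With hypothetical CP values of 0.6, 0.5, and 0.3, only two of the six candidates survive: `(0, 1, 0)`, in which the third cluster arises directly from normal cells, and `(0, 1, 2)`, the fully stepwise chain. Small CP values constrain little; for 0.03, 0.02, and 0.01 all six candidates remain feasible.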
After merging, for any given non-leaf node, each of its child nodes must contain all the mutations present in the node itself (extra mutations are allowed).
No two branches shall contain the same mutation without sharing a common parent node that has that mutation.
These two conditions ensure that the fundamental assumptions concerning tumorigenesis stated above are met. Through this process, if a specific primary (or relapse) tree cannot be merged with any relapse (or primary) tree, that specific tree is an invalid solution, and can be discarded.
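The two merge conditions can be checked on a candidate merged tree with a short routine. The sketch below is a hedged illustration, not the package's merging code: it assumes the merged tree is given as two dictionaries (the full mutation set of each node, and each node's parent), and the node and mutation names in the usage note are hypothetical.

```python
def is_valid_merged_tree(mutations, parent):
    """Check the two merge conditions on a candidate merged tree.

    mutations: dict mapping node name -> set of mutation clusters it harbors
    parent:    dict mapping node name -> parent node name (None for the root)
    """
    origin = {}  # mutation -> the single node in which it first appears
    for node, muts in mutations.items():
        p = parent[node]
        inherited = mutations[p] if p is not None else set()
        # Condition 1: a child must carry every mutation of its parent.
        if not inherited <= muts:
            return False
        # Condition 2: each mutation may arise only once in the whole tree,
        # so two branches can share it only through a common ancestor.
        for m in muts - inherited:
            if m in origin:
                return False
            origin[m] = node
    return True
```

As a usage example, a relapse subclone carrying mutations A, B, and C merges cleanly under a primary subclone carrying A and B. By contrast, if primary subclones P1 and P3 sit on separate branches with A and B respectively, a relapse subclone carrying both A and B fails the check wherever it is attached, because one of the two mutations would have to arise a second time.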
CL is a binary variable that describes whether the given pair co-localizes in solution i: it is 1 if the event clusters co-localize in at least one subclone, and 0 if they co-localize in none of the subclones. This framework allows us to estimate co-localization either by giving all structures an equal probability of being true, or by weighting towards or against specific structures. (For example, one can reasonably argue that it is generally unlikely for a patient to develop two separate tumor subclones that are not related by a common ancestor, and thus place a lower prior on those structures in which multiple subclones are derived directly from the normal tissue.)
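A weighted average of the CL indicator over the solution set can be sketched as follows. This is an assumed formulation for illustration (the sum of weight × CL divided by the sum of weights), with equal weights by default. Each structure is encoded as a tuple whose entry k-1 gives the parent of cluster k, with 0 denoting the mutation-free normal-cell root; this encoding and the function names are hypothetical, and subclones with a zero fraction are not treated specially.

```python
def colocalized(parents, a, b):
    """CL indicator: True if clusters a and b share at least one subclone.

    A subclone carries its own cluster plus those of all its ancestors, so
    two clusters co-localize exactly when one lies on the other's lineage.
    """
    def lineage(k):
        chain = set()
        while k != 0:  # walk up until the normal-cell root
            chain.add(k)
            k = parents[k - 1]
        return chain

    return a in lineage(b) or b in lineage(a)


def colocalization_probability(solutions, a, b, weights=None):
    """Weighted fraction of structures in which clusters a and b co-localize."""
    if weights is None:
        weights = [1.0] * len(solutions)  # every structure equally likely
    total = sum(weights)
    hits = sum(w for s, w in zip(solutions, weights) if colocalized(s, a, b))
    return hits / total
```

For two hypothetical structures `(0, 1, 0)` and `(0, 1, 2)`, clusters 1 and 2 co-localize in both (probability 1.0), whereas clusters 2 and 3 co-localize only in the second (probability 0.5 with equal weights). Supplying `weights=[1, 3]` to favor the stepwise structure raises the latter estimate to 0.75, which is how a prior over tree shapes would enter the calculation.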
aCGH: Array Comparative Genome Hybridization
ACN: Absolute copy number
AML: Acute myeloid leukemia
CNV: Copy number variation
HSC: Hematopoietic stem cell
LOH: Loss of heterozygosity
PPV: Positive predictive value
RCN: Relative copy number
SNUC: Sinonasal undifferentiated carcinoma
SNV: Single nucleotide variant
WGS: Whole genome sequencing
This work was supported by the Fund for Excellence in Science and Technology award from the University of Virginia (ARQ), by the University of Virginia’s Cancer Center (Marty Whitlow Fund), the Department of Obstetrics & Gynecology (AJ), and by the National Human Genome Research Institute / National Institutes of Health (grants R01HG004719 and U01HG006513) to GM. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
- Pleasance ED, Cheetham RK, Stephens PJ, McBride DJ, Humphray SJ, Greenman CD, Varela I, Lin ML, Ordonez GR, Bignell GR, Ye K, Alipaz J, Bauer MJ, Beare D, Butler A, Carter RJ, Chen L, Cox AJ, Edkins S, Kokko-Gonzales PI, Gormley NA, Grocock RJ, Haudenschild CD, Hims MM, James T, Jia M, Kingsbury Z, Leroy C, Marshall J, Menzies A, et al: A comprehensive catalogue of somatic mutations from a human cancer genome. Nature. 2010, 463: 191-196. 10.1038/nature08658.PubMedPubMed CentralView ArticleGoogle Scholar
- Lawrence MS, Stojanov P, Polak P, Kryukov GV, Cibulskis K, Sivachenko A, Carter SL, Stewart C, Mermel CH, Roberts SA, Kiezun A, Hammerman PS, McKenna A, Drier Y, Zou L, Ramos AH, Pugh TJ, Stransky N, Helman E, Kim J, Sougnez C, Ambrogio L, Nickerson E, Shefler E, Cortes ML, Auclair D, Saksena G, Voet D, Noble M, DiCara D, et al: Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature. 2013, 499: 214-218. 10.1038/nature12213.PubMedPubMed CentralView ArticleGoogle Scholar
- Gonzalez-Perez A, Mustonen V, Reva B, Ritchie GR, Creixell P, Karchin R, Vazquez M, Fink JL, Kassahn KS, Pearson JV, Bader GD, Boutros PC, Muthuswamy L, Ouellette BF, Reimand J, Linding R, Shibata T, Valencia A, Butler A, Dronov S, Flicek P, Shannon NB, Carter H, Ding L, Sander C, Stuart JM, Stein LD, Lopez-Bigas N: Computational approaches to identify functional genetic variants in cancer genomes. Nat Methods. 2013, 10: 723-729. 10.1038/nmeth.2562.PubMedPubMed CentralView ArticleGoogle Scholar
- Anderson K, Lutz C, van Delft FW, Bateman CM, Guo Y, Colman SM, Kempski H, Moorman AV, Titley I, Swansbury J, Kearney L, Enver T, Greaves M: Genetic variegation of clonal architecture and propagating cells in leukaemia. Nature. 2011, 469: 356-361. 10.1038/nature09650.PubMedView ArticleGoogle Scholar
- Keats JJ, Chesi M, Egan JB, Garbitt VM, Palmer SE, Braggio E, Van Wier S, Blackburn PR, Baker AS, Dispenzieri A, Kumar S, Rajkumar SV, Carpten JD, Barrett M, Fonseca R, Stewart AK, Bergsagel PL: Clonal competition with alternating dominance in multiple myeloma. Blood. 2012, 120: 1067-1076. 10.1182/blood-2012-01-405985.PubMedPubMed CentralView ArticleGoogle Scholar
- Shah SP, Roth A, Goya R, Oloumi A, Ha G, Zhao Y, Turashvili G, Ding J, Tse K, Haffari G, Bashashati A, Prentice LM, Khattra J, Burleigh A, Yap D, Bernard V, McPherson A, Shumansky K, Crisan A, Giuliany R, Heravi-Moussavi A, Rosner J, Lai D, Birol I, Varhol R, Tam A, Dhalla N, Zeng T, Ma K, Chan SK, et al: The clonal and mutational evolution spectrum of primary triple-negative breast cancers. Nature. 2012, 486: 395-399.PubMedGoogle Scholar
- Ding L, Ley TJ, Larson DE, Miller CA, Koboldt DC, Welch JS, Ritchey JK, Young MA, Lamprecht T, McLellan MD, McMichael JF, Wallis JW, Lu C, Shen D, Harris CC, Dooling DJ, Fulton RS, Fulton LL, Chen K, Schmidt H, Kalicki-Veizer J, Magrini VJ, Cook L, McGrath SD, Vickery TL, Wendl MC, Heath S, Watson MA, Link DC, Tomasson MH, et al: Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature. 2012, 481: 506-510. 10.1038/nature10738.PubMedPubMed CentralView ArticleGoogle Scholar
- Landau DA, Carter SL, Stojanov P, McKenna A, Stevenson K, Lawrence MS, Sougnez C, Stewart C, Sivachenko A, Wang L, Wan Y, Zhang W, Shukla SA, Vartanov A, Fernandes SM, Saksena G, Cibulskis K, Tesar B, Gabriel S, Hacohen N, Meyerson M, Lander ES, Neuberg D, Brown JR, Getz G, Wu CJ: Evolution and impact of subclonal mutations in chronic lymphocytic leukemia. Cell. 2013, 152: 714-726. 10.1016/j.cell.2013.01.019.PubMedPubMed CentralView ArticleGoogle Scholar
- Bolli N, Avet-Loiseau H, Wedge DC, Van Loo P, Alexandrov LB, Martincorena I, Dawson KJ, Iorio F, Nik-Zainal S, Bignell GR, Hinton JW, Li Y, Tubio JM, McLaren S, O’Meara S, Butler AP, Teague JW, Mudie L, Anderson E, Rashid N, Tai YT, Shammas MA, Sperling AS, Fulciniti M, Richardson PG, Parmigiani G, Magrangeas F, Minvielle S, Moreau P, Attal M, et al: Heterogeneity of genomic evolution and mutational profiles in multiple myeloma.Nat Commun 2014, 5:2997.,
- Bea S, Valdes-Mas R, Navarro A, Salaverria I, Martin-Garcia D, Jares P, Gine E, Pinyol M, Royo C, Nadeu F, Conde L, Juan M, Clot G, Vizan P, Di Croce L, Puente DA, Lopez-Guerra M, Moros A, Roue G, Aymerich M, Villamor N, Colomo L, Martinez A, Valera A, Martin-Subero JI, Amador V, Hernandez L, Rozman M, Enjuanes A, Forcada P, et al: Landscape of somatic mutations and clonal evolution in mantle cell lymphoma. Proc Natl Acad Sci U S A. 2013, 110: 18250-18255. 10.1073/pnas.1314608110.PubMedPubMed CentralView ArticleGoogle Scholar
- Nik-Zainal S, Van Loo P, Wedge DC, Alexandrov LB, Greenman CD, Lau KW, Raine K, Jones D, Marshall J, Ramakrishna M, Shlien A, Cooke SL, Hinton J, Menzies A, Stebbings LA, Leroy C, Jia M, Rance R, Mudie LJ, Gamble SJ, Stephens PJ, McLaren S, Tarpey PS, Papaemmanuil E, Davies HR, Varela I, McBride DJ, Bignell GR, Leung K, Butler AP, et al: The life history of 21 breast cancers. Cell. 2012, 149: 994-1007. 10.1016/j.cell.2012.04.023.PubMedPubMed CentralView ArticleGoogle Scholar
- Schuh A, Becq J, Humphray S, Alexa A, Burns A, Clifford R, Feller SM, Grocock R, Henderson S, Khrebtukova I, Kingsbury Z, Luo S, McBride D, Murray L, Menju T, Timbs A, Ross M, Taylor J, Bentley D: Monitoring chronic lymphocytic leukemia progression by whole genome sequencing reveals heterogeneous clonal evolution patterns. Blood. 2012, 120: 4191-4196. 10.1182/blood-2012-05-433540.PubMedView ArticleGoogle Scholar
- Egan JB, Shi CX, Tembe W, Christoforides A, Kurdoglu A, Sinari S, Middha S, Asmann Y, Schmidt J, Braggio E, Keats JJ, Fonseca R, Bergsagel PL, Craig DW, Carpten JD, Stewart AK: Whole-genome sequencing of multiple myeloma from diagnosis to plasma cell leukemia reveals genomic initiating events, evolution, and clonal tides. Blood. 2012, 120: 1060-1066. 10.1182/blood-2012-01-405977.PubMedPubMed CentralView ArticleGoogle Scholar
- Lundberg P, Karow A, Nienhold R, Looser R, Hao-Shen H, Nissen I, Girsberger S, Lehmann T, Passweg J, Stern M, Beisel C, Kralovics R, Skoda RC: Clonal evolution and clinical correlates of somatic mutations in myeloproliferative neoplasms. Blood. 2014, 123: 2220-2228. 10.1182/blood-2013-11-537167.PubMedView ArticleGoogle Scholar
- Jan M, Snyder TM, Corces-Zimmerman MR, Vyas P, Weissman IL, Quake SR, Majeti R: Clonal evolution of preleukemic hematopoietic stem cells precedes human acute myeloid leukemia.Sci Transl Med 2012, 4:149ra118.,
- Hou Y, Song L, Zhu P, Zhang B, Tao Y, Xu X, Li F, Wu K, Liang J, Shao D, Wu H, Ye X, Ye C, Wu R, Jian M, Chen Y, Xie W, Zhang R, Chen L, Liu X, Yao X, Zheng H, Yu C, Li Q, Gong Z, Mao M, Yang X, Yang L, Li J, Wang W, et al: Single-cell exome sequencing and monoclonal evolution of a JAK2-negative myeloproliferative neoplasm. Cell. 2012, 148: 873-885. 10.1016/j.cell.2012.02.028.
- Xu X, Hou Y, Yin X, Bao L, Tang A, Song L, Li F, Tsang S, Wu K, Wu H, He W, Zeng L, Xing M, Wu R, Jiang H, Liu X, Cao D, Guo G, Hu X, Gui Y, Li Z, Xie W, Sun X, Shi M, Cai Z, Wang B, Zhong M, Li J, Lu Z, Gu N, et al: Single-cell exome sequencing reveals single-nucleotide mutation characteristics of a kidney tumor. Cell. 2012, 148: 886-895. 10.1016/j.cell.2012.02.025.
- Melchor L, Brioli A, Wardell CP, Murison A, Potter NE, Kaiser MF, Fryer RA, Johnson DC, Begum DB, Hulkki Wilson S, Vijayaraghavan G, Titley I, Cavo M, Davies FE, Walker BA, Morgan GJ: Single-cell genetic analysis reveals the composition of initiating clones and phylogenetic patterns of branching and parallel evolution in myeloma. Leukemia. 2014, 28: 1705-1715. 10.1038/leu.2014.13.
- Navin N, Kendall J, Troge J, Andrews P, Rodgers L, McIndoo J, Cook K, Stepansky A, Levy D, Esposito D, Muthuswamy L, Krasnitz A, McCombie WR, Hicks J, Wigler M: Tumour evolution inferred by single-cell sequencing. Nature. 2011, 472: 90-94. 10.1038/nature09807.
- Walker BA, Wardell CP, Melchor L, Hulkki S, Potter NE, Johnson DC, Fenwick K, Kozarewa I, Gonzalez D, Lord CJ, Ashworth A, Davies FE, Morgan GJ: Intraclonal heterogeneity and distinct molecular mechanisms characterize the development of t(4;14) and t(11;14) myeloma. Blood. 2012, 120: 1077-1086. 10.1182/blood-2012-03-412981.
- Yachida S, Jones S, Bozic I, Antal T, Leary R, Fu B, Kamiyama M, Hruban RH, Eshleman JR, Nowak MA, Velculescu VE, Kinzler KW, Vogelstein B, Iacobuzio-Donahue CA: Distant metastasis occurs late during the genetic evolution of pancreatic cancer. Nature. 2010, 467: 1114-1117. 10.1038/nature09515.
- Gerlinger M, Rowan AJ, Horswell S, Larkin J, Endesfelder D, Gronroos E, Martinez P, Matthews N, Stewart A, Tarpey P, Varela I, Phillimore B, Begum S, McDonald NQ, Butler A, Jones D, Raine K, Latimer C, Santos CR, Nohadani M, Eklund AC, Spencer-Dene B, Clark G, Pickering L, Stamp G, Gore M, Szallasi Z, Downward J, Futreal PA, Swanton C: Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N Engl J Med. 2012, 366: 883-892. 10.1056/NEJMoa1113205.
- Campbell PJ, Yachida S, Mudie LJ, Stephens PJ, Pleasance ED, Stebbings LA, Morsberger LA, Latimer C, McLaren S, Lin ML, McBride DJ, Varela I, Nik-Zainal SA, Leroy C, Jia M, Menzies A, Butler AP, Teague JW, Griffin CA, Burton J, Swerdlow H, Quail MA, Stratton MR, Iacobuzio-Donahue C, Futreal PA: The patterns and dynamics of genomic instability in metastatic pancreatic cancer. Nature. 2010, 467: 1109-1113. 10.1038/nature09460.
- Li C, Beroukhim R, Weir BA, Winckler W, Garraway LA, Sellers WR, Meyerson M: Major copy proportion analysis of tumor samples using SNP arrays. BMC Bioinformatics. 2008, 9: 204.
- Van Loo P, Nordgard SH, Lingjaerde OC, Russnes HG, Rye IH, Sun W, Weigman VJ, Marynen P, Zetterberg A, Naume B, Perou CM, Borresen-Dale AL, Kristensen VN: Allele-specific copy number analysis of tumors. Proc Natl Acad Sci U S A. 2010, 107: 16910-16915. 10.1073/pnas.1009843107.
- Carter SL, Cibulskis K, Helman E, McKenna A, Shen H, Zack T, Laird PW, Onofrio RC, Winckler W, Weir BA, Beroukhim R, Pellman D, Levine DA, Lander ES, Meyerson M, Getz G: Absolute quantification of somatic DNA alterations in human cancer. Nat Biotechnol. 2012, 30: 413-421. 10.1038/nbt.2203.
- Su X, Zhang L, Zhang J, Meric-Bernstam F, Weinstein JN: PurityEst: estimating purity of human tumor samples using next-generation sequencing data. Bioinformatics. 2012, 28: 2265-2266. 10.1093/bioinformatics/bts365.
- Roth A, Ding J, Morin R, Crisan A, Ha G, Giuliany R, Bashashati A, Hirst M, Turashvili G, Oloumi A, Marra MA, Aparicio S, Shah SP: JointSNVMix: a probabilistic model for accurate detection of somatic mutations in normal/tumour paired next-generation sequencing data. Bioinformatics. 2012, 28: 907-913. 10.1093/bioinformatics/bts053.
- Larson NB, Fridley BL: PurBayes: estimating tumor cellularity and subclonality in next-generation sequencing data. Bioinformatics. 2013, 29: 1888-1889. 10.1093/bioinformatics/btt293.
- Strino F, Parisi F, Micsinai M, Kluger Y: TrAp: a tree approach for fingerprinting subclonal tumor composition. Nucleic Acids Res. 2013, 41: e165.
- Jiao W, Vembu S, Deshwar AG, Stein L, Morris Q: Inferring clonal evolution of tumors from single nucleotide somatic mutations. BMC Bioinformatics. 2014, 15: 35.
- Oesper L, Mahmoody A, Raphael BJ: THetA: inferring intra-tumor heterogeneity from high-throughput DNA sequencing data. Genome Biol. 2013, 14: R80.
- Zare H, Wang J, Hu A, Weber K, Smith J, Nickerson D, Song C, Witten D, Blau CA, Noble WS: Inferring clonal composition from multiple sections of a breast cancer. PLoS Comput Biol. 2014, 10: e1003703.
- Fischer A, Vazquez-Garcia I, Illingworth CJ, Mustonen V: High-definition reconstruction of clonal composition in cancer. Cell Rep. 2014, 7: 1740-1752. 10.1016/j.celrep.2014.04.055.
- Hajirasouliha I, Mahmoody A, Raphael BJ: A combinatorial approach for analyzing intra-tumor heterogeneity from high-throughput sequencing data. Bioinformatics. 2014, 30: i78-i86. 10.1093/bioinformatics/btu284.
- Yadav VK, De S: An assessment of computational methods for estimating purity and clonality using genomic data derived from heterogeneous tumor tissue samples. Brief Bioinform. 2014, Epub ahead of print.
- Roth A, Khattra J, Yap D, Wan A, Laks E, Biele J, Ha G, Aparicio S, Bouchard-Cote A, Shah SP: PyClone: statistical inference of clonal population structure in cancer. Nat Methods. 2014, 11: 396-398. 10.1038/nmeth.2883.
- Schrock E, du Manoir S, Veldman T, Schoell B, Wienberg J, Ferguson-Smith MA, Ning Y, Ledbetter DH, Bar-Am I, Soenksen D, Garini Y, Ried T: Multicolor spectral karyotyping of human chromosomes. Science. 1996, 273: 494-497. 10.1126/science.273.5274.494.
- Liyanage M, Coleman A, du Manoir S, Veldman T, McCormack S, Dickson RB, Barlow C, Wynshaw-Boris A, Janz S, Wienberg J, Ferguson-Smith MA, Schrock E, Ried T: Multicolour spectral karyotyping of mouse chromosomes. Nat Genet. 1996, 14: 312-315. 10.1038/ng1196-312.
- Purdue PE, Zhang JW, Skoneczny M, Lazarow PB: Rhizomelic chondrodysplasia punctata is caused by deficiency of human PEX7, a homologue of the yeast PTS2 receptor. Nat Genet. 1997, 15: 381-384. 10.1038/ng0497-381.
- Takahashi Y, Pickering C, Gelbard A, Drummond J, Wheeler DA, Kupferman ME, Myers JN, Hanna EY: Genomic characterization of sinonasal undifferentiated carcinoma. J Neurol Surg B. 2014, 75: A084.
- Integrated genomic analyses of ovarian carcinoma. Nature. 2011, 474: 609-615. 10.1038/nature10166.
- Hu L, Li Z, Cheng J, Rao Q, Gong W, Liu M, Shi YG, Zhu J, Wang P, Xu Y: Crystal structure of TET2-DNA complex: insight into TET-mediated 5mC oxidation. Cell. 2013, 155: 1545-1555. 10.1016/j.cell.2013.11.020.
- Schmiesing JA, Ball AR, Gregson HC, Alderton JM, Zhou S, Yokomori K: Identification of two distinct human SMC protein complexes involved in mitotic chromosome dynamics. Proc Natl Acad Sci U S A. 1998, 95: 12906-12911. 10.1073/pnas.95.22.12906.
- Abdel-Wahab O, Mullally A, Hedvat C, Garcia-Manero G, Patel J, Wadleigh M, Malinge S, Yao J, Kilpivaara O, Bhat R, Huberman K, Thomas S, Dolgalev I, Heguy A, Paietta E, Le Beau MM, Beran M, Tallman MS, Ebert BL, Kantarjian HM, Stone RM, Gilliland DG, Crispino JD, Levine RL: Genetic characterization of TET1, TET2, and TET3 alterations in myeloid malignancies. Blood. 2009, 114: 144-147. 10.1182/blood-2009-03-210039.
- Quivoron C, Couronne L, Della Valle V, Lopez CK, Plo I, Wagner-Ballon O, Do Cruzeiro M, Delhommeau F, Arnulf B, Stern MH, Godley L, Opolon P, Tilly H, Solary E, Duffourd Y, Dessen P, Merle-Beral H, Nguyen-Khac F, Fontenay M, Vainchenker W, Bastard C, Mercher T, Bernard OA: TET2 inactivation results in pleiotropic hematopoietic abnormalities in mouse and is a recurrent event during human lymphomagenesis. Cancer Cell. 2011, 20: 25-38. 10.1016/j.ccr.2011.06.003.
- Homme C, Krug U, Tidow N, Schulte B, Kuhler G, Serve H, Burger H, Berdel WE, Dugas M, Heinecke A, Buchner T, Koschmieder S, Muller-Tidow C: Low SMC1A protein expression predicts poor survival in acute myeloid leukemia. Oncol Rep. 2010, 24: 47-56.
- Staaf J, Lindgren D, Vallon-Christersson J, Isaksson A, Goransson H, Juliusson G, Rosenquist R, Hoglund M, Borg A, Ringner M: Segmentation-based detection of allelic imbalance and loss-of-heterozygosity in cancer cells using whole genome SNP arrays. Genome Biol. 2008, 9: R136.
- Olshen AB, Venkatraman ES, Lucito R, Wigler M: Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics. 2004, 5: 557-572. 10.1093/biostatistics/kxh008.
- Venkatraman ES, Olshen AB: A faster circular binary segmentation algorithm for the analysis of array CGH data. Bioinformatics. 2007, 23: 657-663. 10.1093/bioinformatics/btl646.
- Day N, Hemmaplardh A, Thurman RE, Stamatoyannopoulos JA, Noble WS: Unsupervised segmentation of continuous genomic data. Bioinformatics. 2007, 23: 1424-1426. 10.1093/bioinformatics/btm096.
- Gusnanto A, Wood HM, Pawitan Y, Rabbitts P, Berri S: Correcting for cancer genome size and tumour cell content enables better estimation of copy number alterations from next-generation sequence data. Bioinformatics. 2012, 28: 40-47. 10.1093/bioinformatics/btr593.
- Sheather SJ, Jones MC: A reliable data-based bandwidth selection method for kernel density estimation. J R Stat Soc Ser B. 1991, 53: 683-690.
- Yeung KY, Fraley C, Murua A, Raftery AE, Ruzzo WL: Model-based clustering and data transformations for gene expression data. Bioinformatics. 2001, 17: 977-987. 10.1093/bioinformatics/17.10.977.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.