There is increasing interest in integrating genomic and epigenomic data from the same DNA specimen to provide greater insight into disease processes. Integrating genomic CN and DNA methylation data is particularly attractive, as it may allow the identification of synergistic mechanisms for the inactivation of tumor suppressor genes or the activation of oncogenic pathways. However, the integration and ultimately the interpretation of these datasets are both costly and challenging when the genomic and epigenomic assays are carried out separately.
Here we sought to evaluate whether the Infinium HumanMethylation450 BeadChip could be used to determine CNAs as well as epigenetic alterations. Initially, we sought to confirm that the methylation state inferred by the Infinium HumanMethylation450 BeadChip was not biased by altered CN state. We show there is little bias when comparing normal (two copies) to heterozygous loss (one copy) or single copy gain (three copies). However, there does appear to be a correlation at loci of complete genomic loss, that is, potential homozygous deletion (loss of more than one copy), and at loci of amplification (more than four copies). An association between methylation and CNA state at loci of homozygous loss is unsurprising and has little impact on methylation analysis per se, as these loci are generally removed from methylation analysis because their signal intensities are indistinguishable from background (a non-significant detection P-value). However, it may represent a confounding factor when comparing methylation in samples with and without CNA. For example, a tumor suppressor deleted in a proportion of samples may be hypermethylated in others, but in many Infinium methylation array analysis pipelines this information will be lost because probes with missing data are removed. This highlights the importance of integrated analysis using both CNA and methylation data. The strong negative association between methylation state and regions of high-level amplification was less anticipated, and appears to result from the genomic distribution of probes rather than from inherent biases of the arrays. Most probes in regions of amplification fall within CpG islands, which are predominantly unmethylated, and these probes therefore contribute to the apparent loss of methylation in regions of amplification.
Our primary objective was to assess whether the Infinium HumanMethylation450 BeadChip could be used to detect CNAs with the same reliability and sensitivity as standard SNP array platforms, such as the Affymetrix SNP 6.0 or Illumina CytoSNP arrays. Specifically, we compared Infinium CNA profiles from samples with matched SNP array data. Using the same algorithm for all array types, we show that approximately 85% of all alterations were identified in both SNP and Infinium arrays (when regions contain sufficient overlapping probes). Interestingly, we see reduced concordance when assessing smaller alterations, with a high number of false positive alterations identified by the Infinium arrays compared to SNP platforms. The majority of these appear to result from differences in array design, in particular the gene-centric bias of the Infinium arrays. Unlike the standard SNP array design, in which probes are roughly evenly distributed throughout the genome, the Infinium arrays are very much gene-centric, with 95% of probes within 2 kb of 95% of the known genes and, on average, >9 probes per gene. Therefore, although the Infinium arrays may lack the resolution of SNP arrays to detect alterations in large intergenic or gene desert regions, they provide high-resolution coverage of the majority of coding loci. This allows the identification of discrete alterations of individual genes, which would not be detected by standard SNP arrays. Similarly, with over 94% of CpG islands represented, these arrays may also allow the identification of small alterations within regulatory regions, potentially revealing novel mechanisms of gene dysregulation. The gene-centric design of the Infinium array therefore has a greater potential to identify driver CNAs involved in tumorigenic processes.
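To illustrate how a concordance figure of this kind can be computed, the following is a minimal sketch in Python. It is not the procedure used in this study: the `Segment` type, the 50% overlap threshold and the example coordinates are all hypothetical, chosen only to show the idea of matching same-state calls between a reference (SNP array) and a test (Infinium) segment list.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    """A called copy number alteration (hypothetical representation)."""
    chrom: str
    start: int
    end: int
    state: str  # "gain" or "loss"


def overlap_fraction(a: Segment, b: Segment) -> float:
    """Fraction of segment a that is covered by segment b."""
    if a.chrom != b.chrom:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    return max(shared, 0) / (a.end - a.start)


def concordance(reference: List[Segment], test: List[Segment],
                min_overlap: float = 0.5) -> float:
    """Share of reference segments matched by a same-state test segment.

    min_overlap is an assumed threshold, not a value from the study.
    """
    if not reference:
        return 0.0
    matched = sum(
        any(t.state == r.state and overlap_fraction(r, t) >= min_overlap
            for t in test)
        for r in reference
    )
    return matched / len(reference)


# Invented example calls, for illustration only.
snp_calls = [
    Segment("chr1", 1000, 5000, "loss"),
    Segment("chr8", 2000, 9000, "gain"),
    Segment("chr17", 100, 400, "loss"),
]
infinium_calls = [
    Segment("chr1", 1200, 5200, "loss"),
    Segment("chr8", 2500, 8000, "gain"),
    Segment("chr17", 100, 400, "gain"),  # same region, discordant state
]

result = concordance(snp_calls, infinium_calls)
print(f"Concordance: {result:.2f}")
```

In this toy example two of the three reference segments are matched. A real comparison would also need to restrict the analysis to regions with sufficient probes on both platforms, as noted above.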
Furthermore, as the same loci can be interrogated for both methylation and CN in the same DNA sample, the analysis potentially allows easier integration of epigenetic and genomic data. The integration of methylation and CN data can provide valuable insights into the underlying biology of malignant processes, where the challenge is to distinguish driver from passenger alterations. For instance, a change in genomic content (that is, single copy gain or loss) does not have to correlate with a linear change in methylation; in fact, it is those genes that show an inverse correlation between CNA and methylation that may be most important. For example, tumor suppressor genes that undergo a ‘double hit’ - that is, heterozygous loss and hypermethylation - or oncogenes in a region of gain that are hypomethylated compared with neighboring genes may represent those genes most likely to be differentially expressed and consequently drivers of tumorigenic processes. Hence, by using the Infinium arrays for both epigenetic and CN analysis, it may be possible to more accurately distinguish genes that drive the selection of a malignant phenotype from those that are passengers within an amplified or deleted region.
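The integration logic described above can be sketched as a simple per-gene rule. This is an illustrative assumption rather than the analysis performed here: the function name, the labels, the placeholder gene names and the methylation difference threshold of 0.2 are all hypothetical, and `delta_beta` is taken to mean the tumor minus normal difference in mean beta value for a gene.

```python
def classify_gene(cn_state: str, delta_beta: float,
                  threshold: float = 0.2) -> str:
    """Flag genes whose CN state and methylation change point the same way.

    cn_state: "loss", "gain" or "normal" (hypothetical encoding).
    delta_beta: tumor minus normal mean beta value for the gene.
    threshold: assumed minimum absolute methylation change.
    """
    if cn_state == "loss" and delta_beta >= threshold:
        # Heterozygous loss plus hypermethylation: candidate 'double hit'.
        return "candidate double hit"
    if cn_state == "gain" and delta_beta <= -threshold:
        # Copy gain plus hypomethylation: candidate activated oncogene.
        return "candidate activated oncogene"
    return "no combined signal"


# Invented example values, for illustration only.
genes = {
    "GENE_A": ("loss", 0.35),
    "GENE_B": ("gain", -0.30),
    "GENE_C": ("gain", 0.05),
    "GENE_D": ("normal", 0.40),
}

labels = {name: classify_gene(cn, db) for name, (cn, db) in genes.items()}
for name, label in labels.items():
    print(name, label)
```

A fuller treatment would compare each gene with its neighbors in the same amplified or deleted region, since the argument above rests on genes that differ from the surrounding passengers.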
Finally, it can be difficult to compare CNA data across different high-density array platforms, particularly given their differing designs, and even analysis of the same data with differing algorithms can lead to varying results [37–39]. Even given these caveats, these data show the utility of using the Infinium HumanMethylation450 BeadChips to define CNAs in human cancers. We show that the Infinium arrays are as robust and sensitive as current high-density SNP arrays for the detection of CNAs and appear highly applicable for providing estimates of CN as well as a measure of methylation state. Furthermore, we highlight that the gene-centric design of the arrays may be beneficial in allowing the identification of alterations containing single genes or just regulatory regions, which may aid our understanding of the complex genomic and epigenomic interactions driving the development and progression of a malignant phenotype.