  • Poster presentation

Onco-proteogenomics: a novel approach to identify cancer-specific mutations combining proteomics and transcriptome deep sequencing


The accumulation of somatic mutations is a common property of all cancer genomes. These mutations arise through several patterns of mutagenesis, such as small insertions, chromosomal rearrangements and nucleotide substitutions. Consequently, the mutated genomes produce mutant transcripts and, therefore, mutant proteins that give the cancer cell its oncogenic properties [1]. Identifying such mutated proteins by mass spectrometry-based shotgun proteomics is generally difficult, however, because identification depends on databases containing normal proteins, or on hybrid databases containing both normal and mutated proteins. Here, we present 'onco-proteogenomics', a novel proteogenomics approach to identify cancer-related peptides (phospho- and non-phospho peptides) and proteins.


We analyzed 15 MS/MS runs of HeLa S3 cells, used as a test sample, by shotgun proteomics and phosphoproteomics. The data were analyzed with an extended version of MSSS (MS Spectra Sequential Subtraction), the proteogenomic approach that we previously used to identify novel genomic features in rice [2]. In our onco-proteogenomic approach, we used four databases containing normal sequences (human protein, cDNA, mRNA and genome databases) for Mascot peptide identification and removed all MS/MS spectra that correspond to identified peptides. The remaining MS/MS spectra were searched against a cancer-derived database, obtained through deep sequencing of HeLa S3 cells, to identify cancer-specific peptides.
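The sequential subtraction described above can be sketched in a few lines of Python. This is a minimal illustration, not the MSSS implementation: the `search` callable is a hypothetical stand-in for a Mascot search, spectra are represented only by ID strings, and the toy hits are invented for the example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical search interface (an assumption, not the Mascot API): takes
# spectrum IDs and a database name, returns {spectrum_id: peptide} for the
# spectra that could be identified in that database.
SearchFn = Callable[[List[str], str], Dict[str, str]]


def sequential_subtraction(
    spectra: Sequence[str],
    databases: Sequence[str],
    search: SearchFn,
) -> Tuple[Dict[str, Tuple[str, str]], List[str]]:
    """Search each database in order, removing identified spectra after each pass."""
    remaining = list(spectra)
    identified: Dict[str, Tuple[str, str]] = {}
    for db in databases:
        hits = search(remaining, db)
        for spectrum_id, peptide in hits.items():
            identified[spectrum_id] = (db, peptide)
        # Only unidentified spectra are carried over to the next database.
        remaining = [s for s in remaining if s not in hits]
    return identified, remaining


# Toy data, invented for illustration only.
TOY_HITS = {
    "protein": {"s1": "PEPTIDEK"},
    "cdna": {"s2": "ELVISLIVESK"},
    "mrna": {},
    "genome": {"s3": "SAMPLER"},
}


def toy_search(spectrum_ids: List[str], db: str) -> Dict[str, str]:
    return {s: p for s, p in TOY_HITS[db].items() if s in spectrum_ids}


identified, leftover = sequential_subtraction(
    ["s1", "s2", "s3", "s4"],
    ["protein", "cdna", "mrna", "genome"],
    toy_search,
)
```

In this toy run, `leftover` holds the one spectrum that no normal database explains; in the approach described here, such spectra would go on to the search against the HeLa S3 transcriptome database.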


The four databases that contain normal sequences were used sequentially to identify all potential peptide sequences and phosphorylation sites that can be generated from the normal genome. This includes the potential protein sequences, junction peptides and exon-skipping peptides (protein and cDNA databases), exonic peptides (mRNA database) and extragenic peptides (genome database). Following each Mascot search, we removed all MS/MS spectra corresponding to the identified peptide sequences and created new files containing the remaining MS/MS spectra. Next, we constructed a HeLa S3 transcriptome database from deep sequencing data of HeLa S3 cells (obtained from the NCBI UniGene database). The constructed database contains over 60,000 entries. We then performed a Mascot search of the remaining unidentified MS/MS spectra against this transcriptome database. Consequently, we were able to identify 25 cancer-specific peptides, including phosphorylated sites. As a further check, the identified peptides were aligned to the normal databases using NCBI BLAST. The alignment did not show any significant matches, indicating that these peptides are specifically expressed in the HeLa S3 cancer cell line. In future studies, we will apply the same approach to different cancers, aiming to identify global cancer biomarkers and drug targets (Figure 1).
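The final check, confirming that candidate peptides do not occur in the normal databases, can be illustrated with a simplified filter. The study used NCBI BLAST alignment for this step; the sketch below swaps in exact substring matching as a much cruder stand-in, and the function name and sequences are hypothetical.

```python
from typing import Iterable, List


def cancer_specific(candidates: Iterable[str], normal_sequences: Iterable[str]) -> List[str]:
    """Keep candidate peptides that occur in none of the normal sequences.

    Assumption: exact substring matching replaces the BLAST alignment
    used in the study, so near-matches are not detected here.
    """
    normal = list(normal_sequences)
    return [p for p in candidates if not any(p in seq for seq in normal)]


# Toy sequences, invented for illustration only.
normal_db = ["MAAAPEPTIDEKGGG", "MSSSSAMPLERTTT"]
candidate_peptides = ["PEPTIDEK", "PEPTLDEK", "SAMPLER"]

specific = cancer_specific(candidate_peptides, normal_db)
```

Here only `PEPTLDEK`, which differs from the normal sequence by one residue, survives the filter. An alignment-based check such as BLAST would additionally report how close each surviving peptide is to its nearest normal sequence.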

Figure 1

Analysis flowchart and future work.


  1. International network of cancer genome projects. Nature. 464: 993–998. 10.1038/nature08987.

  2. Helmy M, Tomita M, Ishihama Y: Novel features for rice genome revealed using proteogenomic analysis. The 10th international conference on Systems Biology (ICSB2009). 2009, Stanford University, CA, USA


Helmy, M., Sugiyama, N., Tomita, M. et al. Onco-proteogenomics: a novel approach to identify cancer-specific mutations combining proteomics and transcriptome deep sequencing. Genome Biol 11 (Suppl 1), P17 (2010).
