A computational approach to identify transposable element insertions in cancer cells


Transposable elements (TEs) in the human genome may contribute to molecular evolution, hereditary diseases and cancer [1–3]. Analyzing the impact of TEs on the genome is therefore necessary to better characterize genetic events related to tumorigenesis. Here, we used a computational approach to identify TE insertions in publicly available exome sequence data from lymphoblastoid and breast tumor cell lines derived from the same patient.


A total of 29,340,378 sequences from the cell lines HCC1954 (18,365,271) and HCC1954BL (10,975,107) were used to investigate gene fusion with TEs (gfTEs) [4, 5]. The RepeatMasker and Burrows-Wheeler Alignment (BWA) tools were used to identify and to map gfTEs, respectively. We also used BEDTools to find overlaps between gfTEs and genome annotations. Human mRNA and RepeatMasker tracks for the GRCh37/hg19 assembly were downloaded in BED format. Repbase was used to filter the eukaryotic TEs.
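The overlap step can be illustrated with a minimal sketch, assuming BED-style half-open coordinates (0-based start, exclusive end). It is not the BEDTools implementation and was not part of the study; `Interval`, `overlaps` and `intersect` are hypothetical names introduced here only to show what "finding overlaps between gfTEs and genome annotations" means for a pair of intervals.

```python
from collections import defaultdict
from typing import NamedTuple


class Interval(NamedTuple):
    """A BED-style feature (hypothetical helper type, not a BEDTools object)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def intersect(reads, annotations):
    """Return (read name, annotation name) pairs for every overlap.

    Annotations are grouped by chromosome so each read is compared only
    with features on its own chromosome. This is a simple quadratic scan
    per chromosome, adequate for illustration only.
    """
    by_chrom = defaultdict(list)
    for ann in annotations:
        by_chrom[ann.chrom].append(ann)

    hits = []
    for read in reads:
        for ann in by_chrom.get(read.chrom, []):
            if overlaps(read, ann):
                hits.append((read.name, ann.name))
    return hits
```

Under the half-open convention, a read ending at position 200 does not overlap an annotation starting at position 200, because they share no base.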


RepeatMasker was used to identify gfTEs in the exome reads. Next, the repeat-masked reads were aligned against the reference genome using BWA. Finally, we filtered the aligned reads to exclude those without TEs (fewer than 15 Ns, where Ns denote a block of masked nucleotides), those with alignments showing low sequence identity (<95%) and those with a small hit length (<50 nucleotides). The study focused on the detection of TEs in the coding regions of genes. A total of 3,307,608 reads were excluded, and 23,841 reads were predicted as cancer-specific gfTEs. Table 1 shows the number of gfTEs distributed among the TE families and highlights the most frequent members in both cell lines. Insertions of LINE/L1 and SINE/Alu were the most frequent. The Gene Ontology analysis for the biological process and molecular function terms showed a bias toward membrane receptor and cell adhesion proteins.
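The three exclusion criteria above can be sketched as a simple predicate. This is an illustration under stated assumptions, not the authors' code: the `AlignedRead` record and its field names are hypothetical, the identity is assumed to be available as a percentage, and "length of Ns" is interpreted here as the longest contiguous run of N characters in the masked read, which the text does not specify.

```python
import re
from dataclasses import dataclass

# Thresholds taken from the text; reads below any of them are excluded.
MIN_MASKED_LEN = 15   # minimum length of the masked (N) block
MIN_IDENTITY = 95.0   # minimum alignment identity, in percent
MIN_HIT_LEN = 50      # minimum aligned (hit) length, in nucleotides


@dataclass
class AlignedRead:
    """Hypothetical record combining masking and alignment results."""
    name: str
    masked_seq: str    # read sequence with repeats replaced by N
    identity: float    # percent identity of the alignment
    hit_length: int    # number of aligned nucleotides


def longest_masked_block(seq: str) -> int:
    """Length of the longest contiguous run of N (assumed interpretation)."""
    return max((len(m.group()) for m in re.finditer(r"N+", seq)), default=0)


def passes_filters(read: AlignedRead) -> bool:
    """Keep a read only if it carries a TE and aligns well to the genome."""
    return (
        longest_masked_block(read.masked_seq) >= MIN_MASKED_LEN
        and read.identity >= MIN_IDENTITY
        and read.hit_length >= MIN_HIT_LEN
    )


def filter_reads(reads):
    """Return the reads that satisfy all three criteria."""
    return [r for r in reads if passes_filters(r)]
```

If the intended criterion were the total number of masked nucleotides rather than the longest block, `longest_masked_block` could be replaced by `seq.count("N")` without changing the rest of the sketch.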

Table 1 Number of genes containing insertion of TEs from different families


We used a computational approach to identify putative cancer-specific gfTEs using human exome capture sequences. Interestingly, the total number of gfTEs was similar in normal and tumor cell lines, but the Gene Ontology analysis revealed an enrichment of insertions in genes encoding protein receptors and cell adhesion molecules. These results suggest that TEs could be contributing to cancer development.


  1. Cordaux R, Batzer MA: The impact of retrotransposons on human genome evolution. Nat Rev Genet 2009, 10:691–703.

  2. Callinan PA, Batzer MA: Retrotransposable elements and human disease. Genome Dyn 2006, 1:104–115.

  3. Zhang W, Edwards A, Fan W, Deininger P, Zhang K: Alu distribution and mutation types of cancer genes. BMC Genomics 2011, 12:157.

  4. Zhao Q, Kirkness EF, Caballero OL, Galante PA, Parmigiani RB, Edsall L, Kuan S, Ye Z, Levy S, Vasconcelos AT, Ren B, de Souza SJ, Camargo AA, Simpson AJ, Strausberg RL: Systematic detection of putative tumor suppressor genes through the combined use of exome and transcriptome sequencing. Genome Biol 2010, 11:R114.

  5. Galante PA, Parmigiani RB, Zhao Q, Caballero OL, de Souza JE, Navarro FC, Gerber AL, Nicolás MF, Salim AC, Silva AP, Edsall L, Devalle S, Almeida LG, Ye Z, Kuan S, Pinheiro DG, Tojal I, Pedigoni RG, de Sousa RG, Oliveira TY, de Paula MG, Ohno-Machado L, Kirkness EF, Levy S, da Silva WA Jr, Vasconcelos AT, Ren B, Zago MA, Strausberg RL, Simpson AJ, de Souza SJ, Camargo AA: Distinct patterns of somatic alterations in a lymphoblastoid and a tumor genome derived from the same individual. Nucleic Acids Res 2011, 39:6056–6068. doi: 10.1093/nar/gkr221

Author information



Corresponding author

Correspondence to Wilson A Silva Jr.


Cite this article

Silva, I.T., Pinheiro, D.G. & Silva, W.A. A computational approach to identify transposable element insertions in cancer cells. Genome Biol 12, P28 (2011).



  • Transposable Element
  • Cell Line HCC1954
  • Gene Ontology Analysis
  • Hg19 Assembly
  • Cell Adhesion Protein