Table 2 Open reading frame prediction^a

From: Separating homeologs by phasing in the tetraploid wheat transcriptome

  T. turgidum T. urartu
Contigs (n) 140,118 86,247
Non-wheat sequences^b (eliminated) (n) 558 518
Wheat protein coding sequences   
BLASTX, E-value cutoff 1e-3 96,244 59,439
Contigs with a Pfam domain (1e-3) 59,917 39,965
Contig sequences without BLASTX (1e-3) or Pfam (1e-3) 42,999 26,070
Predicted open reading frames   
Predicted ORFs (non-redundant, >30 amino acids) 76,570 43,014
Full length 32,548 22,868
Missing 5' end 26,723 12,225
Missing 3' end 12,792 5,376
Missing 5' and 3' end 4,507 2,545
Putative pseudogenes (frameshift and/or premature stop codon) 9,937 5,208
Putative fused transcripts   
Contigs with BLASTX on inconsistent strand 4,376 3,628
Contigs with >1 predicted ORFs (>30 amino acids, no repetitive elements, not a pseudogene) 2,164 1,349
Putative fused transcripts (excluding overlaps) (n) 6,409 4,866
  1. ^a Open reading frames were predicted with a comparative genomics approach using the findorf program and BLASTX alignments (E-value cutoff 1e-5) between contigs and proteomes of barley, Brachypodium, rice, maize, sorghum, and Arabidopsis.
  2. ^b Non-wheat sequences were identified based on the taxonomic distribution of the top 10 BLASTX hits against nr.
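
The counts in the table can be cross-checked with a few lines of arithmetic. The sketch below is not part of the original analysis. It copies the ORF and fused-transcript counts from Table 2, confirms that the four completeness classes add up to the predicted ORF totals, and reports each class as a percentage. The helper names (class_totals, class_shares, flagged_by_both) are hypothetical. The overlap calculation assumes that "excluding overlaps" means contigs flagged by both criteria are counted once.

```python
# Counts copied from Table 2; each tuple is (T. turgidum, T. urartu).
species = ("T. turgidum", "T. urartu")
predicted_orfs = (76_570, 43_014)
orf_classes = {
    "Full length": (32_548, 22_868),
    "Missing 5' end": (26_723, 12_225),
    "Missing 3' end": (12_792, 5_376),
    "Missing 5' and 3' end": (4_507, 2_545),
}
inconsistent_strand = (4_376, 3_628)   # BLASTX on inconsistent strand
multiple_orfs = (2_164, 1_349)         # contigs with >1 predicted ORF
fused_total = (6_409, 4_866)           # putative fused transcripts


def class_totals(classes):
    """Sum the per-class counts column by column (one sum per species)."""
    return tuple(sum(column) for column in zip(*classes.values()))


def class_shares(classes, totals):
    """Percentage of predicted ORFs in each class, per species."""
    return {
        name: tuple(round(100 * c / t, 1) for c, t in zip(counts, totals))
        for name, counts in classes.items()
    }


def flagged_by_both(a, b, union):
    """Assumption: union = a + b - overlap, so overlap = a + b - union."""
    return tuple(x + y - u for x, y, u in zip(a, b, union))


totals = class_totals(orf_classes)
shares = class_shares(orf_classes, predicted_orfs)
overlap = flagged_by_both(inconsistent_strand, multiple_orfs, fused_total)

print("Class totals match predicted ORFs:", totals == predicted_orfs)
for name, pct in shares.items():
    print(name, dict(zip(species, pct)))
print("Contigs flagged by both criteria:", dict(zip(species, overlap)))
```

Under that assumption, the overlap is the number of contigs that show both an inconsistent-strand BLASTX hit and more than one predicted ORF.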