Table 2. Top features in CoreBoost

From: Boosting with stumps for predicting transcription start sites

| Classifier type | Comparison | Features |
|---|---|---|
| CpG | P versus U | Log-likelihood ratios from third order Markov chain, log-likelihood ratios from TSS weight matrix |
| | | GC-box score, weighted score of transcription factor NFY, weighted energy score at position +1 |
| | | Weighted score of transcription factor YY1, TATA score, weighted score of transcription factor ELK1 |
| | | MTE score, weighted score of transcription factor CREB |
| | P versus D | Log-likelihood ratios from third order Markov chain, GC-box score |
| | | Weighted score of transcription factor NFY |
| | | Log-likelihood ratios from TSS weight matrix |
| | | Difference between the energy score around positions -25 and +1 and the average from surroundings |
| | | Log-likelihood ratios from transcription factor ELK1, frequency of G+C |
| | | Log-likelihood ratios from transcription factor YY1, TATA score, frequency of G |
| Non-CpG | P versus U | Correlation between vector of energy scores and empirical average energy profile |
| | | Log-likelihood ratios from third order Markov chain, TATA score |
| | | Difference between the energy score around positions -25 and +1 and the average from surroundings |
| | | Weighted energy at position +1 |
| | | Proportion of Inr and GC-box pair within 10 bp of observed distance, Inr score |
| | P versus D | Correlation between vector of energy scores and empirical average energy profile, TATA score |
| | | Log-likelihood ratios from third order Markov chain |
| | | Weighted energy at position +1 |
| | | Correlation between vector of flexibility scores and empirical average flexibility profile, Inr score |
| | | Difference between the flexibility score around position +1 and the average from surroundings, GC-box score |

bp, base pairs; D, immediate downstream sequence; P, promoter; TSS, transcription start site; U, immediate upstream sequence.
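Several of the top features above are log-likelihood ratios, either from a third-order Markov chain or from a position weight matrix, plus simple base-composition frequencies. The sketch below illustrates the standard textbook form of these scores. It is not CoreBoost's implementation. The function names (`train_markov`, `markov_llr`, `pwm_llr`, `gc_frequency`), the pseudocount of 1, and the uniform background default are all hypothetical choices made for illustration. Sequences are assumed to be uppercase A/C/G/T only.

```python
import math
from collections import defaultdict
from itertools import product

ALPHABET = "ACGT"

def train_markov(seqs, order=3, pseudocount=1.0):
    """Estimate P(x_i | previous `order` bases) with additive smoothing (assumed pseudocount)."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(order, len(s)):
            window = s[i - order:i + 1]
            if set(window) <= set(ALPHABET):  # skip windows containing N or other symbols
                counts[window[:-1]][window[-1]] += 1.0
    model = {}
    for ctx in map("".join, product(ALPHABET, repeat=order)):
        total = sum(counts[ctx].values()) + pseudocount * len(ALPHABET)
        model[ctx] = {b: (counts[ctx][b] + pseudocount) / total for b in ALPHABET}
    return model

def markov_log_prob(seq, model, order=3):
    """Log-probability of seq under the chain, conditioning on the first `order` bases."""
    return sum(math.log(model[seq[i - order:i]][seq[i]]) for i in range(order, len(seq)))

def markov_llr(seq, fg_model, bg_model, order=3):
    """Log-likelihood ratio: foreground (e.g. promoter) chain versus background chain."""
    return markov_log_prob(seq, fg_model, order) - markov_log_prob(seq, bg_model, order)

def pwm_llr(site, pwm, background=None):
    """Sum over positions j of log(p_j(base) / q(base)); q defaults to uniform (assumption)."""
    background = background or {b: 0.25 for b in ALPHABET}
    return sum(math.log(pwm[j][b] / background[b]) for j, b in enumerate(site))

def gc_frequency(seq):
    """Fraction of G and C bases, as in the 'frequency of G+C' feature."""
    return (seq.count("G") + seq.count("C")) / len(seq)
```

In a real classifier, the foreground chain would be trained on promoter sequences and the background chain on the U or D sequences, matching each comparison in the table. A positive ratio then favors the promoter class.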