From: Decoding enhancer complexity with machine learning and high-throughput discovery
Method | Core algorithm/architecture | Goal | Training data | Reference |
---|---|---|---|---|
gkm-SVM | Support Vector Machine | To find distinguishing features within regulatory elements | Class 1: CTCF ChIP-seq signal-enriched regions in the GM12878 cell line; Class 2: Random sequences (matched for length, GC content and repeat fraction) | [196] |
EnhancerFinder | (Multiple Kernel Learning) Support Vector Machine | Enhancer prediction (developmental enhancers) | Class 1: Enhancers from VISTA Enhancer Browser; Class 2: Random regions from genomic background | [197] |
RFECS | Random Forest | Enhancer prediction | Class 1: p300-binding sites (H1 and IMR90 datasets from NIH Roadmap Epigenome Project); Class 2: TSS overlapping DNase-I, and random regions distal to known TSS and p300 sites (H1 and IMR90 datasets from NIH Roadmap Epigenome Project) | [198] |
DeepEnhancer | Convolutional Neural Network | Enhancer prediction | Class 1: Enhancers from FANTOM5; Class 2: Sequences from human reference genome | [199] |
DeepSEA | Convolutional Neural Network | To prioritize functional variants at regulatory regions | Multi-label: Open chromatin, TF binding and histone mark profiles from ENCODE and Roadmap Epigenomics datasets across multiple human cell types | [192] |
DeepBind | Convolutional Neural Network | TF binding prediction | Class 1: Protein binding microarrays, ENCODE ChIP-seq peaks, HT-SELEX; Class 2: Shuffled class 1 sequences (maintaining dinucleotide composition) | [200] |
Basset | Convolutional Neural Network | To find distinguishing features within regulatory elements | Multi-label: Chromatin accessibility in 164 cell types (ENCODE and Roadmap Epigenomics Consortium) | [20] |
DeepSTARR | Convolutional Neural Network | To find distinguishing features within regulatory elements | Class 1: Enhancers with developmental activities; Class 2: Enhancers with housekeeping activities | [193] |
BiRen | Convolutional Neural Network + (Gated Recurrent Unit) Bidirectional Recurrent Neural Network | Enhancer prediction | Class 1: Human and mouse enhancers from VISTA Enhancer Browser with reproducible expression patterns; Class 2: Human and mouse enhancers from VISTA Enhancer Browser without reproducible expression patterns | [22] |
DeepMEL | Convolutional Neural Network + (Long Short-Term Memory) Bidirectional Recurrent Neural Network | To find distinguishing features within regulatory elements | Multi-label: Melanoma human open chromatin regulatory regions | [26] |
DanQ | Convolutional Neural Network + (Long Short-Term Memory) Bidirectional Recurrent Neural Network | To find distinguishing features within regulatory elements; to prioritize functional variants at regulatory regions | Multi-label: 919 ChIP-seq and DNase-seq peak sets from ENCODE and Roadmap | [201] |
AgentBind | Convolutional Neural Network | TF binding site prediction | Class 1: ENCODE TF binding ChIP-seq data from multiple cell types; Class 2: Genome-wide regions excluding Class 1, matched for GC content | [202] |
ResNets | (Residual Network) Convolutional Neural Network | To find distinguishing features within regulatory elements | Multi-label: Enhancer sequences with distinct regulatory architectures (homotypic clusters, heterotypic clusters, enhanceosomes) | [203] |
CSI-ANN | Time-Delay Neural Network | Enhancer prediction | Class 1: HeLa cell ENCODE data, human CD4+ T cell data; Class 2: Random genomic loci | [204] |
EnhancerDBN | Restricted Boltzmann Machine + Deep Belief Network | Enhancer prediction | Class 1: Human “positive” enhancers (VISTA Enhancer Browser), DNA methylation, histone marks, GC content; Class 2: Genomic background matched for length and chromosome distribution to Class 1 | [205] |
BPNet | (Residual Network) Convolutional Neural Network | To predict TF binding profiles at single-base resolution | Multi-label: ChIP profiles for TFs | [206] |
DNABERT | Bidirectional Encoder Representations from Transformers | To find distinguishing features within regulatory elements | Multi-label: k-mers | [207] |
Sei | Convolutional Neural Network; linear and non-linear layers with residual connections | To classify sequences by regulatory activity based on > 21,000 human chromatin profiles | Multi-label: > 21,000 publicly available human chromatin profiles (TF binding, histone marks and DNA accessibility) across > 1,300 human cell lines and tissues | [18] |
Enformer | Convolutional Neural Network + Transformer | To predict gene expression and chromatin state profiles across multiple cell types in human and mouse genomes | Multi-label: 5,313 human and 1,643 mouse gene expression and chromatin state tracks, predicted at 128 bp resolution from 200 kb of input sequence | [194] |
ChromBPNet | Convolutional Neural Network | To predict chromatin accessibility profiles at single-base resolution across the genome after removing biases from the enzymes used in DNase-seq and ATAC-seq assays | Multi-label: snATAC-seq of human developing cortex | [208] |
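
The table's last data column separates two training-set designs: binary sets (Class 1 versus Class 2, often with a background matched for GC content) and multi-label sets with one binary target per chromatin profile. The Python sketch below shows one way such inputs and targets could be represented. It does not reproduce the preprocessing of any listed method. The function names, the greedy matching strategy, and the `tolerance` parameter are all assumptions introduced here for illustration.

```python
from typing import List, Sequence

BASES = "ACGT"  # column order of the one-hot encoding (an arbitrary but common choice)


def one_hot(seq: str) -> List[List[int]]:
    """Encode a DNA sequence as a len(seq) x 4 matrix with columns A, C, G, T.

    Any other symbol (for example N) becomes an all-zero row.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def pick_gc_matched(
    positives: Sequence[str],
    candidates: Sequence[str],
    tolerance: float = 0.05,  # hypothetical threshold, not taken from any paper
) -> List[str]:
    """Greedily pick one unused background sequence per positive sequence.

    For each Class 1 sequence, the unused candidate with the closest GC
    fraction is taken as a Class 2 example if it lies within `tolerance`.
    """
    unused = list(candidates)
    chosen: List[str] = []
    for pos in positives:
        target = gc_fraction(pos)
        best = min(unused, key=lambda c: abs(gc_fraction(c) - target), default=None)
        if best is not None and abs(gc_fraction(best) - target) <= tolerance:
            chosen.append(best)
            unused.remove(best)
    return chosen


def multi_label_vector(active: Sequence[str], tasks: Sequence[str]) -> List[int]:
    """Binary target vector with one entry per task (for example one per chromatin profile)."""
    return [1 if task in active else 0 for task in tasks]
```

In a binary design, each Class 1 sequence would receive label 1 and each sequence returned by `pick_gc_matched` label 0. In a multi-label design, every sequence carries a vector from `multi_label_vector` with one position per profile.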