Table 1 Machine learning models used in the prediction of cis-regulatory elements

From: Decoding enhancer complexity with machine learning and high-throughput discovery

| Method | Core algorithm/architecture | Goal | Trained model | Reference |
| --- | --- | --- | --- | --- |
| gkm-SVM | Support Vector Machine | To find distinguishing features within regulatory elements | Class 1: CTCF ChIP-seq signal-enriched regions in the GM12878 cell line; Class 2: Random sequences (matched for length, GC content and repeat fraction) | [196] |
| EnhancerFinder | (Multiple Kernel Learning) Support Vector Machine | Enhancer prediction (developmental enhancers) | Class 1: Enhancers from VISTA Enhancer Browser; Class 2: Random regions from genomic background | [197] |
| RFECS | Random Forest | Enhancer prediction | Class 1: p300-binding sites (H1 and IMR90 datasets from NIH Roadmap Epigenome Project); Class 2: TSS overlapping DNase-I, and random regions distal to known TSS and p300 sites (H1 and IMR90 datasets from NIH Roadmap Epigenome Project) | [198] |
| DeepEnhancer | Convolutional Neural Network | Enhancer prediction | Class 1: Enhancers from FANTOM5; Class 2: Sequences from human reference genome | [199] |
| DeepSEA | Convolutional Neural Network | To prioritize functional variants at regulatory regions | Multi-label: Open chromatin, TF binding and histone mark profiles from ENCODE and Roadmap Epigenomics datasets across multiple human cell types | [192] |
| DeepBind | Convolutional Neural Network | TF binding prediction | Class 1: Protein binding microarrays, ENCODE ChIP-seq peaks, HT-SELEX; Class 2: Shuffled Class 1 sequences (maintaining dinucleotide composition) | [200] |
| Basset | Convolutional Neural Network | To find distinguishing features within regulatory elements | Multi-label: Chromatin accessibility in 164 cell types (ENCODE and Roadmap Epigenomics Consortium) | [20] |
| DeepSTARR | Convolutional Neural Network | To find distinguishing features within regulatory elements | Class 1: Enhancers with developmental activities; Class 2: Enhancers with housekeeping activities | [193] |
| BiRen | Convolutional Neural Network + (Gated Recurrent Unit) Bidirectional Recurrent Neural Network | Enhancer prediction | Class 1: Human and mouse enhancers from VISTA Enhancer Browser with reproducible expression patterns; Class 2: Human and mouse enhancers from VISTA Enhancer Browser without reproducible expression patterns | [22] |
| DeepMEL | Convolutional Neural Network + (Long Short-Term Memory) Bidirectional Recurrent Neural Network | To find distinguishing features within regulatory elements | Multi-label: Melanoma human open chromatin regulatory regions | [26] |
| DanQ | Convolutional Neural Network + (Long Short-Term Memory) Bidirectional Recurrent Neural Network | To find distinguishing features within regulatory elements; to prioritize functional variants at regulatory regions | Multi-label: 919 ChIP-seq and DNase-seq peaks from ENCODE and Roadmap | [201] |
| AgentBind | Convolutional Neural Network | Predicting TF binding sites | Class 1: ENCODE TF binding ChIP-seq data from multiple cell types; Class 2: Genome-wide regions excluding Class 1, matched for GC content | [202] |
| ResNets | (Residual Network) Convolutional Neural Network | To find distinguishing features within regulatory elements | Multi-label: Enhancer sequences with distinct regulatory architectures (homotypic clusters, heterotypic clusters, enhanceosomes) | [203] |
| CSI-ANN | Time-Delay Neural Network | Enhancer prediction | Class 1: HeLa cell ENCODE data, human CD4+ T cell data; Class 2: Random genomic loci | [204] |
| EnhancerDBN | Restricted Boltzmann Machine + Deep Belief Network | Enhancer prediction | Class 1: Human “positive” enhancers (VISTA Enhancer Browser), DNA methylation, histone marks, GC content; Class 2: Genomic background matched for length and chromosome distribution to Class 1 | [205] |
| BPNet | (Residual Network) Convolutional Neural Network | To predict TF binding profiles at single-base resolution | Multi-label: ChIP profiles for TFs | [206] |
| DNABERT | Bidirectional Encoder Representations from Transformers | To find distinguishing features within regulatory elements | Multi-label: k-mers | [207] |
| Sei | Convolutional Neural Network; linear and non-linear layers with residual connections | Classifies sequences based on >21,000 types of human chromatin profiles | Multi-label: >21,000 types of publicly available human chromatin profiles (TF binding, histone marks and DNA accessibility) across >1,300 human cell lines and tissues | [18] |
| Enformer | Convolutional Neural Network + Transformer | To predict gene expression and chromatin state profiles across multiple cell types in human and mouse genomes | Multi-label: 5,313 human and 1,643 mouse gene expression and chromatin states at 128 bp resolution from 200 kb of input sequence | [194] |
| ChromBPNet | Convolutional Neural Network | To predict chromatin accessibility profiles at single-base resolution across the genome after removing biases from enzymes used in DNase-seq and ATAC-seq assays | Multi-label: snATAC-seq of human developing cortex | [208] |
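
Several entries in the table define Class 2 as background sequences matched to the positives for length and GC content (for example gkm-SVM and AgentBind). The following is a minimal Python sketch of that idea, not the procedure used by any of the cited tools. The function names `gc_fraction` and `pick_matched_negatives`, the `gc_tolerance` parameter, and the greedy matching strategy are all assumptions made for illustration, and repeat-fraction matching is left out.

```python
import random


def gc_fraction(seq: str) -> float:
    """Return the fraction of G/C bases in a DNA sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def pick_matched_negatives(positives, candidates, gc_tolerance=0.05, seed=0):
    """Greedily pick one background sequence per positive sequence.

    A candidate matches when it has the same length as the positive and its
    GC fraction differs by at most `gc_tolerance` (hypothetical parameter).
    Each candidate is used at most once. Positives with no match get None.
    """
    rng = random.Random(seed)
    pool = list(candidates)
    rng.shuffle(pool)  # avoid always favouring the first candidates in the input

    used = set()
    negatives = []
    for pos in positives:
        target_gc = gc_fraction(pos)
        match = None
        for i, cand in enumerate(pool):
            if i in used or len(cand) != len(pos):
                continue
            if abs(gc_fraction(cand) - target_gc) <= gc_tolerance:
                used.add(i)
                match = cand
                break
        negatives.append(match)
    return negatives


if __name__ == "__main__":
    positives = ["GGCCAATT", "ATATATAT"]
    candidates = ["CCGGTTAA", "TTTTAAAA", "GGGGGGGG", "ACGT"]
    print(pick_matched_negatives(positives, candidates))
```

Greedy one-to-one matching is the simplest choice here. Real pipelines typically sample backgrounds from genomic intervals and may also match repeat content or chromosome distribution, as some rows of the table indicate.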