
Table 2 Common architectures for cis-regulatory classification

From: Decoding enhancer complexity with machine learning and high-throughput discovery

Each entry below lists the machine learning algorithm, its mechanism, its advantages, and approaches to interpretation.

Support vector machine

Mechanism: Finds a maximal-margin hyperplane that best divides the data into the required classes

Advantages: Relatively memory efficient and well suited to high-dimensional inputs (e.g., k-mer features)

Interpretation, with respect to gkmSVM:

- Calculation of importance scores at nucleotide resolution using Shapley values (GkmExplain [212])

- Introduction of variants into the input sequence and estimation of their impact on the SVM score (deltaSVM [24]); a toy version of this scoring is sketched in code after the table

Random forest

Mechanism: Predictions are aggregated from a set of decision trees trained in parallel, where each node splits on a particular feature

Advantages: Features are used as explicit classifiers, providing an easy way to interpret the model

Interpretation:

- Estimation of feature importance scores, such as Gini importance, permutation importance, and Shapley values, is standard practice for dissecting tree ensembles [191, 213]; a minimal example is sketched in code after the table

- Partial dependence plots are useful for interpreting a random forest; they show the relationship between a given feature and the response variable while the other predictor features are held constant [214]

Gradient boosting machine

Mechanism: Uses a series of decision trees trained sequentially, allowing the systematic decrease of a loss function as each new tree improves on the errors of the ones before it

Advantages: Yields many of the benefits of random forests, with added predictive power from the sequential, error-correcting training

Interpretation: Similar to random forest (the sketch after the table also fits a gradient boosting model)

Convolutional neural network (CNN)

Mechanism: Filters of varying sizes slide across the sequence/input, capturing patterns and integrating information via cross-correlation to produce a feature map of the sequence

Advantages: Can learn complex patterns while reducing dimensionality compared with non-convolutional neural networks

Interpretation (reviewed in [215]):

- Search for subsequences that activate a convolutional filter and construct PWMs from them

- Attention weights for visualizing feature importance

- Propagation of perturbed data through the model to observe effects on predictions. This can be done by forward propagation (in silico mutagenesis (ISM); sketched in code after the table) or backward propagation (e.g., Grad-CAM, DeepLIFT [216])

- Aggregation of attribution maps to identify globally important sequence motifs (e.g., TF-MoDISco [217])

- Initializing filters to known TF motifs (e.g., DanQ [201])

Bidirectional recurrent neural network (RNN)

Related: Time-delay neural network

Mechanism: Hidden states preserve information from previous positions in the sequence, forming a context that contributes to deciding the next output; the bidirectional variant processes the sequence in both the forward and reverse directions

Advantages: Captures interdependencies between positions along the sequence through the chain of hidden states

Interpretation: Similar to CNN (a minimal bidirectional LSTM is sketched in code after the table)

Bidirectional Encoder Representations from Transformers (BERT)

Mechanism: An attention-based transformer model originally developed for natural language processing (NLP) tasks

Advantages: Uses self-attention to capture interactions between important regions

Interpretation:

- Shapley values can be computed to dissect BERT models [218]

- DNABERT-viz was developed to visualize importance scores at nucleotide resolution by leveraging self-attention values [207]; the underlying self-attention computation is sketched in code after the table
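
The code sketches below illustrate, in simplified form, several of the interpretation ideas referenced in the table. First, the deltaSVM idea of scoring a variant by the change in the SVM decision value. This is a minimal sketch assuming scikit-learn's LinearSVC and plain k-mer counts as features; the real deltaSVM operates on gapped k-mer weights learned by gkm-SVM, and the planted GGGCGG motif, sequence lengths, and toy training data are illustrative assumptions.

```python
# Toy deltaSVM-style scoring: change in SVM decision value when a reference
# base is swapped for an alternative base. Illustrative only; the published
# deltaSVM uses gapped k-mer weights from gkm-SVM, not these toy k-mer counts.
from itertools import product
import numpy as np
from sklearn.svm import LinearSVC

K = 4
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}

def kmer_counts(seq):
    """Count all overlapping k-mers of length K in a DNA sequence."""
    x = np.zeros(len(KMERS))
    for i in range(len(seq) - K + 1):
        x[INDEX[seq[i:i + K]]] += 1
    return x

rng = np.random.default_rng(0)

def random_seq(n=200):
    return "".join(rng.choice(list("ACGT"), size=n))

# Toy training set: "enhancer-like" sequences carry a planted GGGCGG motif.
pos = [random_seq()[:100] + "GGGCGG" + random_seq()[:94] for _ in range(200)]
neg = [random_seq() for _ in range(200)]
X = np.array([kmer_counts(s) for s in pos + neg])
y = np.array([1] * len(pos) + [0] * len(neg))
svm = LinearSVC(C=1.0, max_iter=10000).fit(X, y)

def delta_score(seq, position, alt_base):
    """deltaSVM-style score: change in SVM decision value after one substitution."""
    alt = seq[:position] + alt_base + seq[position + 1:]
    return (svm.decision_function([kmer_counts(alt)])[0]
            - svm.decision_function([kmer_counts(seq)])[0])

# Disrupting the planted motif should lower the predicted score.
print(delta_score(pos[0], 102, "T"))
```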
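Next, tree-ensemble interpretation for the random forest and gradient boosting rows: impurity-based (Gini) importances, permutation importances, and a partial dependence computation. This is a minimal sketch assuming scikit-learn; the synthetic features stand in for, e.g., k-mer counts or chromatin signals.

```python
# Feature importance and partial dependence for tree ensembles (sketch).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.inspection import permutation_importance, partial_dependence

X, y = make_classification(n_samples=500, n_features=10, n_informative=3,
                           random_state=0)

# Random forest: trees trained in parallel on bootstrap samples.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
# Gradient boosting: trees trained sequentially, each reducing the remaining loss.
gb = GradientBoostingClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based (Gini) importances come for free with the fitted ensemble.
print("Gini importances:", rf.feature_importances_.round(3))

# Permutation importance: drop in accuracy when one feature is shuffled.
perm = permutation_importance(rf, X, y, n_repeats=10, random_state=0)
print("Permutation importances:", perm.importances_mean.round(3))

# Partial dependence: model response as feature 0 varies, others held fixed.
pdp = partial_dependence(gb, X, features=[0])
print("Partial dependence grid shape:", pdp["average"].shape)
```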
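For the CNN row, a sketch of forward-propagation-based interpretation by in silico mutagenesis: every single-base substitution is pushed through the model and the change in the predicted score is recorded. PyTorch is assumed; the tiny one-layer architecture and untrained weights are illustrative only, and in practice a trained enhancer model would be loaded.

```python
# One-layer 1-D CNN over one-hot DNA plus in silico mutagenesis (ISM).
import torch
import torch.nn as nn

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a 4 x L one-hot tensor."""
    x = torch.zeros(4, len(seq))
    for i, b in enumerate(seq):
        x[BASES.index(b), i] = 1.0
    return x

class TinyCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(4, 16, kernel_size=8)  # 16 motif-like filters
        self.pool = nn.AdaptiveMaxPool1d(1)          # max activation per filter
        self.fc = nn.Linear(16, 1)                   # scalar enhancer score

    def forward(self, x):
        return self.fc(self.pool(torch.relu(self.conv(x))).squeeze(-1))

model = TinyCNN().eval()  # untrained weights; a real analysis loads a trained model
seq = "ACGT" * 50         # toy 200-bp sequence

def ism(seq):
    """Score change for every possible single-base substitution (4 x L map)."""
    with torch.no_grad():
        ref = model(one_hot(seq).unsqueeze(0)).item()
        effects = torch.zeros(4, len(seq))
        for i in range(len(seq)):
            for j, b in enumerate(BASES):
                if b == seq[i]:
                    continue
                mut = seq[:i] + b + seq[i + 1:]
                effects[j, i] = model(one_hot(mut).unsqueeze(0)).item() - ref
    return effects

print(ism(seq).abs().max())
```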
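For the bidirectional RNN row, a minimal bidirectional LSTM over one-hot DNA, showing how each position's hidden state combines upstream and downstream context before pooling into a single score. PyTorch is assumed; the layer sizes and random input are illustrative.

```python
# Minimal bidirectional LSTM classifier over one-hot DNA (sketch).
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.rnn = nn.LSTM(input_size=4, hidden_size=hidden,
                           batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, 1)  # forward + backward hidden states

    def forward(self, x):                 # x: (batch, length, 4) one-hot DNA
        out, _ = self.rnn(x)              # out: (batch, length, 2 * hidden)
        return self.fc(out.mean(dim=1))   # pool over positions -> one score

model = BiLSTMClassifier()
x = torch.randn(2, 200, 4)  # stand-in for two one-hot encoded 200-bp sequences
print(model(x).shape)       # torch.Size([2, 1])
```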
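Finally, for the BERT row, the scaled dot-product self-attention computation at the core of transformer models such as DNABERT. The attention matrix it produces is the quantity that tools like DNABERT-viz visualize; the random token embeddings, random projection matrices, and single attention head here are illustrative simplifications of a trained multi-head model.

```python
# Single-head scaled dot-product self-attention over a sequence of tokens.
import torch
import torch.nn.functional as F

def self_attention(tokens, d_k=16):
    """tokens: (length, d_model); returns outputs and the attention matrix."""
    d_model = tokens.shape[-1]
    Wq = torch.randn(d_model, d_k)  # in a trained model these projections are learned
    Wk = torch.randn(d_model, d_k)
    Wv = torch.randn(d_model, d_k)
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = F.softmax(Q @ K.T / d_k ** 0.5, dim=-1)  # (length, length), rows sum to 1
    return attn @ V, attn

tokens = torch.randn(50, 32)  # stand-in for 50 embedded k-mer tokens
out, attn = self_attention(tokens)
print(attn.shape)  # torch.Size([50, 50]): which positions attend to which
```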