
Table 2 Common architectures for cis-regulatory classification

From: Decoding enhancer complexity with machine learning and high-throughput discovery

Each entry below lists the machine learning algorithm, its mechanism, its advantages, and approaches to interpretation.

Support vector machine

Mechanism: Finds a maximal-margin hyperplane that best divides the data into the required classes

Advantages: Relatively memory efficient and well suited to high-dimensional inputs (e.g., k-mer features)

Interpretation, with respect to gkmSVM:

- Calculation of importance scores at nucleotide resolution using Shapley values (GkmExplain [212])

- Introduction of variants into the input sequence and estimation of their impact on the SVM score (deltaSVM [24]); a toy version of this scoring is sketched in code after the table

Random forest

Mechanism: Predictions are aggregated from a set of decision trees trained in parallel, where each node splits on a particular feature

Advantages: Features are used as explicit classifiers, providing an easy way to interpret the model

Interpretation:

- Estimation of feature importance scores, such as Gini importance, permutation importance, and Shapley values, is standard practice for dissecting tree ensembles [191, 213]; a minimal example is sketched in code after the table

- Partial dependence plots are useful for interpreting a random forest; they show the relationship between a given feature and the response variable while the other predictor features are held constant [214]

Gradient boosting machine

Mechanism: Uses a series of decision trees trained sequentially, allowing the systematic decrease of a loss function as each new tree improves on the errors of the ones before it

Advantages: Yields many of the benefits of random forests, with added predictive power from the sequential, error-correcting training

Interpretation: Similar to random forest (the sketch after the table also fits a gradient boosting model)

Convolutional neural network (CNN)

Mechanism: Filters of varying sizes slide across the sequence/input, capturing patterns and integrating information via cross-correlation to produce a feature map of the sequence

Advantages: Can learn complex patterns while reducing dimensionality compared with non-convolutional neural networks

Interpretation (reviewed in [215]):

- Search for subsequences that activate a convolutional filter and construct PWMs from them

- Attention weights for visualizing feature importance

- Propagation of perturbed data through the model to observe effects on predictions. This can be done by forward propagation (in silico mutagenesis (ISM); sketched in code after the table) or backward propagation (e.g., Grad-CAM, DeepLIFT [216])

- Aggregation of attribution maps to identify globally important sequence motifs (e.g., TF-MoDISco [217])

- Initializing filters to known TF motifs (e.g., DanQ [201])

Bidirectional recurrent neural network (RNN)

Related: Time-delay neural network

Mechanism: Hidden states preserve information from previous positions in the sequence, forming a context that contributes to deciding the next output; the bidirectional variant processes the sequence in both the forward and reverse directions

Advantages: Captures interdependencies between positions along the sequence through the chain of hidden states

Interpretation: Similar to CNN (a minimal bidirectional LSTM is sketched in code after the table)

Bidirectional Encoder Representations from Transformers (BERT)

Mechanism: An attention-based transformer model originally developed for natural language processing (NLP) tasks

Advantages: Uses self-attention to capture interactions between important regions

Interpretation:

- Shapley values can be computed to dissect BERT models [218]

- DNABERT-viz was developed to visualize importance scores at nucleotide resolution by leveraging self-attention values [207]; the underlying self-attention computation is sketched in code after the table
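
The code sketches below illustrate, in simplified form, several of the interpretation ideas referenced in the table. First, the deltaSVM idea of scoring a variant by the change in the SVM decision value. This is a minimal sketch assuming scikit-learn's LinearSVC and plain k-mer counts as features; the real deltaSVM operates on gapped k-mer weights learned by gkm-SVM, and the planted GGGCGG motif, sequence lengths, and toy training data are illustrative assumptions.

```python
# Toy deltaSVM-style scoring: change in SVM decision value when a reference
# base is swapped for an alternative base. Illustrative only; the published
# deltaSVM uses gapped k-mer weights from gkm-SVM, not these toy k-mer counts.
from itertools import product
import numpy as np
from sklearn.svm import LinearSVC

K = 4
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}

def kmer_counts(seq):
    """Count all overlapping k-mers of length K in a DNA sequence."""
    x = np.zeros(len(KMERS))
    for i in range(len(seq) - K + 1):
        x[INDEX[seq[i:i + K]]] += 1
    return x

rng = np.random.default_rng(0)

def random_seq(n=200):
    return "".join(rng.choice(list("ACGT"), size=n))

# Toy training set: "enhancer-like" sequences carry a planted GGGCGG motif.
pos = [random_seq()[:100] + "GGGCGG" + random_seq()[:94] for _ in range(200)]
neg = [random_seq() for _ in range(200)]
X = np.array([kmer_counts(s) for s in pos + neg])
y = np.array([1] * len(pos) + [0] * len(neg))
svm = LinearSVC(C=1.0, max_iter=10000).fit(X, y)

def delta_score(seq, position, alt_base):
    """deltaSVM-style score: change in SVM decision value after one substitution."""
    alt = seq[:position] + alt_base + seq[position + 1:]
    return (svm.decision_function([kmer_counts(alt)])[0]
            - svm.decision_function([kmer_counts(seq)])[0])

# Disrupting the planted motif should lower the predicted score.
print(delta_score(pos[0], 102, "T"))
```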
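Next, tree-ensemble interpretation for the random forest and gradient boosting rows: impurity-based (Gini) importances, permutation importances, and a partial dependence computation. This is a minimal sketch assuming scikit-learn; the synthetic features stand in for, e.g., k-mer counts or chromatin signals.

```python
# Feature importance and partial dependence for tree ensembles (sketch).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.inspection import permutation_importance, partial_dependence

X, y = make_classification(n_samples=500, n_features=10, n_informative=3,
                           random_state=0)

# Random forest: trees trained in parallel on bootstrap samples.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
# Gradient boosting: trees trained sequentially, each reducing the remaining loss.
gb = GradientBoostingClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based (Gini) importances come for free with the fitted ensemble.
print("Gini importances:", rf.feature_importances_.round(3))

# Permutation importance: drop in accuracy when one feature is shuffled.
perm = permutation_importance(rf, X, y, n_repeats=10, random_state=0)
print("Permutation importances:", perm.importances_mean.round(3))

# Partial dependence: model response as feature 0 varies, others held fixed.
pdp = partial_dependence(gb, X, features=[0])
print("Partial dependence grid shape:", pdp["average"].shape)
```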
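For the CNN row, a sketch of forward-propagation-based interpretation by in silico mutagenesis: every single-base substitution is pushed through the model and the change in the predicted score is recorded. PyTorch is assumed; the tiny one-layer architecture and untrained weights are illustrative only, and in practice a trained enhancer model would be loaded.

```python
# One-layer 1-D CNN over one-hot DNA plus in silico mutagenesis (ISM).
import torch
import torch.nn as nn

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a 4 x L one-hot tensor."""
    x = torch.zeros(4, len(seq))
    for i, b in enumerate(seq):
        x[BASES.index(b), i] = 1.0
    return x

class TinyCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(4, 16, kernel_size=8)  # 16 motif-like filters
        self.pool = nn.AdaptiveMaxPool1d(1)          # max activation per filter
        self.fc = nn.Linear(16, 1)                   # scalar enhancer score

    def forward(self, x):
        return self.fc(self.pool(torch.relu(self.conv(x))).squeeze(-1))

model = TinyCNN().eval()  # untrained weights; a real analysis loads a trained model
seq = "ACGT" * 50         # toy 200-bp sequence

def ism(seq):
    """Score change for every possible single-base substitution (4 x L map)."""
    with torch.no_grad():
        ref = model(one_hot(seq).unsqueeze(0)).item()
        effects = torch.zeros(4, len(seq))
        for i in range(len(seq)):
            for j, b in enumerate(BASES):
                if b == seq[i]:
                    continue
                mut = seq[:i] + b + seq[i + 1:]
                effects[j, i] = model(one_hot(mut).unsqueeze(0)).item() - ref
    return effects

print(ism(seq).abs().max())
```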
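For the bidirectional RNN row, a minimal bidirectional LSTM over one-hot DNA, showing how each position's hidden state combines upstream and downstream context before pooling into a single score. PyTorch is assumed; the layer sizes and random input are illustrative.

```python
# Minimal bidirectional LSTM classifier over one-hot DNA (sketch).
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.rnn = nn.LSTM(input_size=4, hidden_size=hidden,
                           batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, 1)  # forward + backward hidden states

    def forward(self, x):                 # x: (batch, length, 4) one-hot DNA
        out, _ = self.rnn(x)              # out: (batch, length, 2 * hidden)
        return self.fc(out.mean(dim=1))   # pool over positions -> one score

model = BiLSTMClassifier()
x = torch.randn(2, 200, 4)  # stand-in for two one-hot encoded 200-bp sequences
print(model(x).shape)       # torch.Size([2, 1])
```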
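Finally, for the BERT row, the scaled dot-product self-attention computation at the core of transformer models such as DNABERT. The attention matrix it produces is the quantity that tools like DNABERT-viz visualize; the random token embeddings, random projection matrices, and single attention head here are illustrative simplifications of a trained multi-head model.

```python
# Single-head scaled dot-product self-attention over a sequence of tokens.
import torch
import torch.nn.functional as F

def self_attention(tokens, d_k=16):
    """tokens: (length, d_model); returns outputs and the attention matrix."""
    d_model = tokens.shape[-1]
    Wq = torch.randn(d_model, d_k)  # in a trained model these projections are learned
    Wk = torch.randn(d_model, d_k)
    Wv = torch.randn(d_model, d_k)
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = F.softmax(Q @ K.T / d_k ** 0.5, dim=-1)  # (length, length), rows sum to 1
    return attn @ V, attn

tokens = torch.randn(50, 32)  # stand-in for 50 embedded k-mer tokens
out, attn = self_attention(tokens)
print(attn.shape)  # torch.Size([50, 50]): which positions attend to which
```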