Source: Decoding enhancer complexity with machine learning and high-throughput discovery
Machine learning algorithm | Mechanism | Advantages | Interpretation |
---|---|---|---|
Support vector machine | Finds a maximum-margin hyperplane that best separates the data into the required classes | Relatively memory efficient and well suited to high-dimensional inputs (e.g., k-mer counts) | With respect to gkm-SVM: - Calculation of importance scores at nucleotide resolution using Shapley values, GkmExplain [212] - Introduction of variants into the input sequence and estimation of their impact on the SVM score, deltaSVM [24] (see the variant-scoring sketch after this table) |
Random forest | Predictions are aggregated from a set of decision trees trained in parallel, where each node splits on a particular feature | Features are used as explicit classifiers, providing an easy way to interpret the model | - Estimation of feature importance scores, such as the Gini score, permutation score, and Shapley values, is standard practice for dissecting tree ensembles [191, 213] (see the feature-importance sketch after this table) - Partial dependence plots are useful for interpreting a random forest; they show the relationship between a given feature and the response variable while the other predictor features are held constant [214] |
Gradient boosting machine | Trains a series of decision trees sequentially, with each new tree fitted to reduce the loss left by the trees before it, allowing systematic decrease of a loss function | Yields the benefits of tree ensembles with added accuracy from the sequential, loss-driven training | Similar to random forest |
Convolutional neural network (CNN) | Filters of varying sizes slide across the sequence/input unit, capturing patterns and integrating information using cross-correlation to produce a feature map of the sequence | Can learn complex patterns while reducing dimensionality compared to non-convolutional neural networks | Reviewed here [215] - Search for subsequences that activate a convolutional filter and construct PWMs - Attention weights for visualizing feature importance - Propagation of perturbed data through the model to observe effects on predictions; this can be done by forward propagation (in silico mutagenesis, ISM; see the ISM sketch after this table) or backward propagation (e.g., Grad-CAM, DeepLIFT [216]) - Aggregation of attribution maps to identify globally important sequence motifs (e.g., TF-MoDISco [217]) - Initializing filters to known TF motifs (e.g., DanQ [201]) |
Bidirectional recurrent neural network (RNN); related: time-delay neural network | Hidden states carry information forward from previous positions in the sequence, forming a context that contributes to the next prediction; the bidirectional variant processes the sequence in both directions | Captures interdependencies between positions in the sequence | Similar to CNN |
Bidirectional Encoder Representations from Transformers (BERT) | An attention-based architecture originally developed for natural language processing (NLP) tasks | Uses self-attention to capture interactions between important regions | - Shapley values can be computed to dissect BERT models [218] - DNABERT-viz was developed to visualize importance scores at nucleotide resolution by leveraging self-attention values [207] (see the self-attention sketch after this table) |
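The variant-scoring sketch below illustrates the deltaSVM-style idea referenced for SVMs: score a sequence with a trained SVM, introduce a single-base substitution, and report the change in the decision score. It is a minimal illustration only, using a plain scikit-learn SVM on k-mer counts rather than the gapped k-mer kernel used by gkm-SVM/deltaSVM; the sequences, the planted motif, and the helper names (`kmer_counts`, `delta_score`) are hypothetical.

```python
# Illustrative deltaSVM-style variant scoring: train an SVM on k-mer counts,
# then score a variant as the change in decision score after a point mutation.
# Training data and motif are hypothetical placeholders.
import numpy as np
from itertools import product
from sklearn.svm import SVC

K = 4
KMERS = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=K))}

def kmer_counts(seq):
    """Count all overlapping k-mers in a sequence (spectrum-kernel-like features)."""
    x = np.zeros(len(KMERS))
    for i in range(len(seq) - K + 1):
        kmer = seq[i:i + K]
        if kmer in KMERS:               # skip k-mers containing ambiguous bases
            x[KMERS[kmer]] += 1
    return x

# Hypothetical training data: positives carry a planted AP-1-like motif.
rng = np.random.default_rng(0)
def random_seq(n=200):
    return "".join(rng.choice(list("ACGT"), size=n))

pos = [random_seq() + "TGACTCA" + random_seq() for _ in range(50)]
neg = [random_seq(407) for _ in range(50)]
X = np.array([kmer_counts(s) for s in pos + neg])
y = np.array([1] * len(pos) + [0] * len(neg))

svm = SVC(kernel="linear").fit(X, y)

def delta_score(seq, position, alt_base):
    """deltaSVM-style effect size: change in SVM decision score after one substitution."""
    ref = svm.decision_function([kmer_counts(seq)])[0]
    mutated = seq[:position] + alt_base + seq[position + 1:]
    alt = svm.decision_function([kmer_counts(mutated)])[0]
    return alt - ref

print(delta_score(pos[0], 203, "G"))    # mutate inside the planted motif
```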
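The feature-importance sketch below shows the two standard tree-ensemble interpretation routes named in the random forest row: impurity-based (Gini) importances and permutation importances. It uses scikit-learn on a synthetic dataset; the `motif_*` feature names are stand-ins for whatever sequence features (e.g., TF motif counts) a real model would use.

```python
# Illustrative interpretation of a random forest on hypothetical enhancer features:
# Gini (impurity-based) importances vs. permutation importances on held-out data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Stand-in feature matrix: 500 regions x 8 motif-count-like features, 3 informative.
X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)
feature_names = [f"motif_{i}" for i in range(X.shape[1])]   # hypothetical names

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Impurity-based (Gini) importances are computed during training.
for name, imp in sorted(zip(feature_names, rf.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name}\tGini importance {imp:.3f}")

# Permutation importance: drop in held-out accuracy when one feature is shuffled.
perm = permutation_importance(rf, X_test, y_test, n_repeats=20, random_state=0)
for name, mean_imp in sorted(zip(feature_names, perm.importances_mean),
                             key=lambda t: -t[1]):
    print(f"{name}\tpermutation importance {mean_imp:.3f}")
```

The same two calls work unchanged on a `GradientBoostingClassifier`, which is why the gradient boosting row simply points back to the random forest interpretation methods.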
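The ISM sketch below illustrates the forward-propagation interpretation route listed for CNNs: propagate every possible single-base substitution through the model and record the change in the predicted score. The tiny PyTorch architecture (`TinyEnhancerCNN`) and the input sequence are placeholders, not any published enhancer model.

```python
# Minimal sketch of in silico mutagenesis (ISM) for a small 1D CNN on one-hot DNA.
# Architecture and data are illustrative only; a real model would be trained first.
import torch
import torch.nn as nn

BASES = "ACGT"

def one_hot(seq):
    """One-hot encode a DNA string to shape (4, length)."""
    x = torch.zeros(4, len(seq))
    for i, b in enumerate(seq):
        if b in BASES:
            x[BASES.index(b), i] = 1.0
    return x

class TinyEnhancerCNN(nn.Module):
    """Toy CNN: convolutional filters act as learned motif scanners, pooled and scored."""
    def __init__(self, n_filters=16, filter_len=8):
        super().__init__()
        self.conv = nn.Conv1d(4, n_filters, kernel_size=filter_len)
        self.head = nn.Linear(n_filters, 1)

    def forward(self, x):                       # x: (batch, 4, length)
        h = torch.relu(self.conv(x))            # feature map over positions
        h = h.max(dim=2).values                 # global max pooling per filter
        return self.head(h).squeeze(-1)         # predicted enhancer activity score

def ism_scores(model, seq):
    """ISM: score every single-base substitution relative to the reference sequence."""
    model.eval()
    with torch.no_grad():
        ref = model(one_hot(seq).unsqueeze(0)).item()
        scores = torch.zeros(4, len(seq))
        for pos in range(len(seq)):
            for bi, base in enumerate(BASES):
                if base == seq[pos]:
                    continue
                mut = seq[:pos] + base + seq[pos + 1:]
                scores[bi, pos] = model(one_hot(mut).unsqueeze(0)).item() - ref
    return scores                               # (4, length) matrix of effect sizes

model = TinyEnhancerCNN()                       # untrained here, for illustration only
effects = ism_scores(model, "ACGTTGACTCAAGGT" * 4)
print(effects.abs().max(dim=0).values)          # per-position importance profile
```

Attribution maps of this kind (or their backward-propagation counterparts) are what tools such as TF-MoDISco then aggregate across many sequences to recover globally important motifs.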
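Finally, the self-attention sketch below shows, in the spirit of DNABERT-viz, how attention weights themselves can be read out as an importance signal: the attention matrix softmax(QKᵀ/√d) says how much each token attends to every other token, and averaging the attention received by a token gives a crude importance score. The single-head layer, the k-mer tokens, and the random embeddings are illustrative assumptions; a real BERT model would aggregate attention over many heads and layers.

```python
# Minimal sketch of reading self-attention weights as a per-token importance signal.
# Single-head attention and k-mer token embeddings are illustrative only.
import torch
import torch.nn as nn

class SingleHeadSelfAttention(nn.Module):
    """One attention head: attention = softmax(Q K^T / sqrt(d)); weights are inspectable."""
    def __init__(self, d_model=32):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.scale = d_model ** 0.5

    def forward(self, x):                               # x: (seq_len, d_model)
        attn = torch.softmax(self.q(x) @ self.k(x).T / self.scale, dim=-1)
        return attn @ self.v(x), attn                   # outputs and attention weights

# Hypothetical k-mer tokens from a tokenized DNA sequence, with random embeddings.
tokens = ["ACGTTG", "CGTTGA", "GTTGAC", "TTGACT", "TGACTC", "GACTCA"]
emb = nn.Embedding(len(tokens), 32)
x = emb(torch.arange(len(tokens)))

layer = SingleHeadSelfAttention(32)
_, attn = layer(x)

# Attention received by each token, averaged over query positions, as a crude
# importance score (analogous to aggregating attention maps at nucleotide resolution).
importance = attn.mean(dim=0)
for tok, w in zip(tokens, importance.tolist()):
    print(f"{tok}\t{w:.3f}")
```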