Table 1 Current DNA methylation calling tools for Nanopore sequencing

From: DNA methylation-calling tools for Oxford Nanopore sequencing: a survey and human epigenome-wide evaluation

| Tool | DNA modification: 4mC | DNA modification: 5mC | DNA modification: 5hmC | DNA modification: 6mA | Input required | Supports multi-read FAST5 files? | Flow cell compatibility | Model trained on | Algorithm | Reported performance |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Nanopolish [9] | | ✓ | | | Basecalled FAST5 | ✓ | R7.3, R9, R9.4, R9.4.1, R9.5, R10 | E. coli | Hidden Markov model (HMM) | Accuracy = 0.94 (5mC, Homo sapiens) |
| Tombo/Nanoraw [20] | ✓ | ✓ | | ✓ | Raw FAST5 | | R9.4, R9.4.1, R9.5 | no model | Mann-Whitney and Fisher’s exact test | Accuracy = 0.839, AUC = 0.78 |
| SignalAlign [39] | | ✓ | ✓ | ✓ | Basecalled FAST5 | | R7.3, R9.4 [a] | Synthetic nucleotides | Hidden Markov model with a hierarchical Dirichlet process (HMM-HDP) | Accuracy = 0.76 (for 5hmC, 5mC) |
| | | | | | | | | E. coli | | Accuracy = 0.96 (for 5mC), Precision = 0.92 |
| Guppy [32] | | ✓ | | ✓ | Raw FAST5 | ✓ | R7.3, R9, R9.4, R9.4.1, R9.5, R10, R10.3 | Homo sapiens and E. coli | Recurrent neural network | N/A |
| NanoMod [31] | | ✓ | | | Basecalled FAST5, requires control sequence | | R7.3, R9 | no model | Kolmogorov-Smirnov test | Precision = 0.9 |
| mCaller [33] | | | | ✓ | Basecalled FAST5 | | R9, R9.4, R9.5 | E. coli | Neural network | Accuracy = 0.954, AUC = 0.99 |
| DeepSignal [35] | | ✓ | | ✓ | Basecalled FAST5 processed by Tombo re-squiggle module | | R9, R9.4, R9.4.1 | E. coli | Bidirectional RNN with LSTM+Inception structure | Accuracy = 0.92 (5mC, Homo sapiens), 0.90 (6mA), Precision = 0.97 |
| DeepMod [34] | | ✓ | | ✓ | FAST5 with raw signals and base calls | | R9, R9.4, R9.4.1 | E. coli | Bidirectional recurrent neural network (RNN) with long short-term memory (LSTM) | Precision = 0.99, AUC > 0.97 |
| Megalodon [36] | | ✓ | | ✓ | Raw FAST5 [b] | ✓ | R9.4.1, R10.3 | Homo sapiens and E. coli [c] | Recurrent neural network [d] | N/A [e] |
| methBERT [23] | | ✓ | | ✓ | Raw FAST5 | | R9 | Homo sapiens and E. coli | Bidirectional encoder representations from transformers (BERT) | Precision = 0.9147 (5mC, Homo sapiens) [f] |
| METEORE [38] | | ✓ | | ✓ | Methylation calling per-read results [g] | | R9.4.1 | E. coli | Random forest (RF), multiple linear regression (REG) | Root mean square error (RMSE) = 0.0687 (5mC, E. coli) [h] |
| DeepMP [37] | | ✓ | | ✓ | Basecalled FAST5 processed by Tombo re-squiggle module | | R9, R9.4 | Homo sapiens, E. coli, pUC19 | Convolutional neural network (CNN) | F-score = 0.9324 (5mC, Homo sapiens) [i] |

  1. [a] SignalAlign’s compatibility with R9.4 is only validated on 5mC and 6mA, not 5hmC
  2. [b] Megalodon must obtain the intermediate output from the basecall neural network, and Guppy is the recommended backend to obtain this output from FAST5
  3. [c] The model is trained in biological contexts only on Homo sapiens and E. coli. Users have to specify the model from the modified base models included in the basecaller Guppy or the research models in the ONT Rerio repository
  4. [d] Megalodon’s functionality centers on the anchoring of the high-information neural network basecalling output to a reference sequence
  5. [e] The performance for Megalodon is not available because it is still under active development and no paper has been published yet
  6. [f] Only the 5mC precision on Homo sapiens at the per-site level is listed here. More performance parameters (AUC, recall) for 5mC at the per-site and per-read levels, and 5mC/6mA performance on E. coli, are available in the original paper
  7. [g] METEORE combines the outputs from two or more methylation calling tools
  8. [h] RMSE is for the METEORE RF model combining Megalodon and DeepSignal at the per-site level on selected sites of E. coli
  9. [i] Only 5mC overall accuracy on Homo sapiens at the per-read level is listed here. More performance parameters on Homo sapiens, 5mC performance on E. coli, and 6mA performance on the pUC19 plasmid are available in the original paper
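
The "Reported performance" column mixes per-read classification metrics (accuracy, precision, F-score) with a per-site regression metric (RMSE on methylation fractions). As a rough illustration of how these quantities differ, the sketch below computes them on toy data. It assumes binary per-read labels (1 = methylated) and per-site methylation fractions in [0, 1]; the function names `per_read_metrics` and `per_site_rmse` are hypothetical and are not taken from any of the tools listed in the table.

```python
from math import sqrt


def per_read_metrics(truth, calls):
    """Accuracy, precision, recall and F-score for binary per-read calls.

    `truth` and `calls` are equal-length sequences of 0/1 labels
    (1 = methylated). This is an illustrative helper, not a tool's API.
    """
    tp = sum(1 for t, c in zip(truth, calls) if t == 1 and c == 1)
    tn = sum(1 for t, c in zip(truth, calls) if t == 0 and c == 0)
    fp = sum(1 for t, c in zip(truth, calls) if t == 0 and c == 1)
    fn = sum(1 for t, c in zip(truth, calls) if t == 1 and c == 0)

    accuracy = (tp + tn) / len(truth)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


def per_site_rmse(expected_freq, predicted_freq):
    """Root mean square error between expected and predicted
    methylation fractions at the same genomic sites."""
    squared = [(e - p) ** 2 for e, p in zip(expected_freq, predicted_freq)]
    return sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Toy per-read labels: 3 true positives, 3 true negatives,
    # 1 false positive, 1 false negative.
    truth = [1, 1, 1, 0, 0, 0, 1, 0]
    calls = [1, 1, 0, 0, 0, 1, 1, 0]
    print(per_read_metrics(truth, calls))

    # Toy per-site methylation fractions for three sites.
    print(per_site_rmse([0.0, 0.5, 1.0], [0.1, 0.5, 0.9]))
```

Because the metrics measure different things at different levels (per-read versus per-site), the values in the table are not directly comparable across tools.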