Fig. 2 | Genome Biology

Fig. 2

From: pycoMeth: a toolbox for differential methylation testing from Nanopore methylation calls

The MetH5 file format. A Structure of the HDF5 container, including dataset types and shapes. \(N_x\) is the number of methylation calls on chromosome x, and R is the total number of reads in the entire container. Each methylation call is stored together with its genomic coordinates on the chromosome (range), the log-likelihood ratio (LLR) of methylation, and a numeric read ID that is unique within this container. Read names can optionally be stored, mapping each MetH5 numeric read ID to the original read name. An arbitrary number of read groupings can be stored, each assigning every read to exactly one read group. B Schematic representation of random access in the MetH5 format. A per-chromosome index allows direct access to the required chunk. The range dataset is then searched for the start and end indices. Once these indices have been found, LLRs and read IDs can be read directly. Optionally, if globally unique read names are required, they can be looked up directly using the read ID; the same holds for read groups such as haplotype assignments. C Performance comparison between MetH5 and the BAM/CRAM format with MM tag (see Materials and methods). In the file size comparison, bars represent only the extra space occupied by the MM and ML tags; the native BAM size is annotated next to each bar.
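The random-access scheme described in panel B can be illustrated with a minimal sketch. This is not the MetH5 implementation or its API: plain NumPy arrays stand in for the HDF5 datasets of one chromosome, and every name here (`ranges`, `llr`, `read_id`, `read_names`, `haplotype`, `chunk_index`, `fetch_calls`) as well as the toy values are hypothetical. It also assumes that calls are sorted by start coordinate and that their end coordinates are sorted as well, so that binary search is valid.

```python
import numpy as np

# Hypothetical in-memory stand-ins for one chromosome's datasets.
ranges = np.array([[100, 101], [100, 101], [250, 251],
                   [400, 401], [400, 401], [900, 901]])  # (N_x, 2) start/end
llr = np.array([2.5, -1.0, 3.1, -4.2, 0.3, 1.8])         # (N_x,) LLR per call
read_id = np.array([0, 1, 0, 2, 1, 2])                   # (N_x,) numeric read ID

# Container-wide lookups, indexed by numeric read ID (R entries each).
read_names = np.array(["read-a", "read-b", "read-c"])
haplotype = np.array([1, 2, 1])  # one read grouping: one group per read

CHUNK_SIZE = 2  # toy value; an assumption for this sketch


def build_chunk_index(ranges, chunk_size):
    """Per chunk, store the smallest start and largest end coordinate."""
    rows = []
    for lo in range(0, len(ranges), chunk_size):
        chunk = ranges[lo:lo + chunk_size]
        rows.append((chunk[:, 0].min(), chunk[:, 1].max()))
    return np.array(rows)


chunk_index = build_chunk_index(ranges, CHUNK_SIZE)


def fetch_calls(start, end):
    """Return the calls overlapping the half-open interval [start, end)."""
    # Step 1: use the chunk index to find the chunks that can overlap.
    hits = np.nonzero((chunk_index[:, 0] < end) & (chunk_index[:, 1] > start))[0]
    if hits.size == 0:
        first = last = 0
    else:
        lo = hits[0] * CHUNK_SIZE
        hi = min((hits[-1] + 1) * CHUNK_SIZE, len(ranges))
        # Step 2: binary-search the range data of those chunks only.
        first = lo + np.searchsorted(ranges[lo:hi, 1], start, side="right")
        last = lo + np.searchsorted(ranges[lo:hi, 0], end, side="left")
    # Step 3: read LLRs and read IDs directly with the two indices.
    ids = read_id[first:last]
    return {
        "llr": llr[first:last],
        "read_id": ids,
        # Step 4 (optional): look up names and read groups by read ID.
        "read_name": read_names[ids],
        "haplotype": haplotype[ids],
    }


result = fetch_calls(200, 500)
```

In this sketch only the chunks selected through the index are searched, which mirrors the idea that a query does not need to scan the full range dataset of a chromosome.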
