epiG: statistical inference and profiling of DNA methylation from whole-genome bisulfite sequencing data

Table 1 Observed reads and the true epigenetic states

A	Read 1	T	C	G	G	A	T
	Read 2	C	C	G	A	A	T
	5′→3′	C	\(\overset{\text{me}}{\mathsf{C}}\)	G	G	A	T
	3′→5′	G	G	\(\overset{\text{me}}{\mathsf{C}}\)	C	T	A
B	Read		T	C	G	A
	5′→3′		C/T	\(\overset{\text{me}}{\mathsf{C}}/\mathsf{C}\)	G	G/A
	3′→5′		G/A	G	\(\overset{\text{me}}{\mathsf{C}}/\mathsf{C}\)	C/T

The conditions for epigenetic inference are optimal when there are no sequencing or PCR errors. The table shows how the true epigenetic states are inferred from reads originating from a single DNA fragment.
A. If error-free reads are available from both strands, comparing the reads yields one of six possible pairs of nucleotides at each site, as shown in the top two rows of the table (see also Fig. 1). In each case, the epigenetic state and the strand direction can be inferred unambiguously. For example, if G and A are observed, then the 5′→3′ strand has G, while the 3′→5′ strand has an unmethylated C.
B. If reads from only one strand are observed, then inference is in general inconclusive. For each of the four possible nucleotides that might be observed, the true epigenetic state and the strand direction cannot be inferred unambiguously. For example, if T is observed in a read, then the true epigenetic state of the fragment might be C-G or T-A, depending on whether the observed T comes from the 5′→3′ strand or the 3′→5′ strand. For correct epigenetic inference, it is therefore important that reads originating from both strands are observed.
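The lookup logic described in A and B can be sketched in a few lines of Python. This is only an illustration of the error-free case in Table 1, not the epiG model itself. The function and variable names are hypothetical, and the sketch assumes that both reads are written in the 5′→3′ reference orientation, as in the table. Here `"mC"` denotes a methylated C and `"C"` an unmethylated C.

```python
# Illustrative sketch only: error-free reads, hypothetical names, not the epiG API.
# States are (5'->3' strand, 3'->5' strand); "mC" = methylated C, "C" = unmethylated C.

# Panel A: the six unordered nucleotide pairs observable from two error-free
# reads, one per strand, and the state each pair implies.
PAIR_TO_STATE = {
    frozenset("TC"): ("C", "G"),   # unmethylated C on the 5'->3' strand
    frozenset("C"): ("mC", "G"),   # methylated C on the 5'->3' strand
    frozenset("G"): ("G", "mC"),   # methylated C on the 3'->5' strand
    frozenset("GA"): ("G", "C"),   # unmethylated C on the 3'->5' strand
    frozenset("A"): ("A", "T"),
    frozenset("T"): ("T", "A"),
}

# Panel B: with a read from one strand only, each nucleotide is compatible
# with two states, so the inference stays ambiguous.
SINGLE_READ_CANDIDATES = {
    "T": [("C", "G"), ("T", "A")],
    "C": [("mC", "G"), ("C", "G")],
    "G": [("G", "mC"), ("G", "C")],
    "A": [("G", "C"), ("A", "T")],
}


def infer_from_both_strands(read_a, read_b):
    """Return the (5'->3', 3'->5') states implied by two error-free reads."""
    if len(read_a) != len(read_b):
        raise ValueError("reads must cover the same sites")
    forward, reverse = [], []
    for x, y in zip(read_a, read_b):
        pair = frozenset((x, y))
        if pair not in PAIR_TO_STATE:
            # Cannot arise from error-free reads of a single fragment.
            raise ValueError(f"inconsistent pair {x}/{y}")
        fwd, rev = PAIR_TO_STATE[pair]
        forward.append(fwd)
        reverse.append(rev)
    return forward, reverse


def candidates_from_single_read(nucleotide):
    """Return every state compatible with one observed nucleotide."""
    return SINGLE_READ_CANDIDATES[nucleotide]
```

Calling `infer_from_both_strands("TCGGAT", "CCGAAT")` on the two reads of panel A reproduces the two strand rows of that panel, whereas `candidates_from_single_read("T")` returns two candidate states, matching the ambiguity shown in panel B.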

ISSN: 1474-760X