Commercial reagents were used as received unless otherwise noted. ARP (O-(biotinylcarbazoylmethyl) hydroxylamine) was purchased from Cayman Chemical (Ann Arbor, Michigan, USA), Alexa Fluor® 488 hydroxylamine from Invitrogen (Carlsbad, California, USA), ammonium acetate from Sigma (Dorset, UK), anisidine from Aldrich (Dorset, UK) and potassium perruthenate from Alfa Aesar (Ward Hill, Massachusetts, USA). Acetonitrile for high performance liquid chromatography (HPLC)-electrospray ionization (ESI)-mass spectrometry (MS) analysis (HPLC gradient grade) was purchased from VWR (Radnor, Pennsylvania, USA). NHEt3OAc for HPLC buffers was purchased from Glen Research (Sterling, Virginia, USA).
HPLC-ESI-MS analysis of oligonucleotides
The samples were analyzed by HPLC-ESI-MS on a Bruker (Fremont, USA) amaZon X Ion Trap MS following chromatography on a Dionex (Sunnyvale, CA, USA) UltiMate 3000 UHPLC system equipped with a diode array detector and a column oven. HPLC analysis of oligonucleotides was performed on a Nucleosil C18 column (250 × 4.6 mm, 5 μm; Macherey Nagel, Dueren, Germany) using the following solvent system: solvent A, 50 mM NHEt3OAc pH 7.4; solvent B, CH3CN; flow rate of 1 ml/min; a linear gradient of 0 to 30% solvent B was applied over 20 min. The column temperature was maintained at 30°C. The elution was monitored at 260 and 280 nm (Dionex UltiMate 3000 Diode Array Detector). Oligonucleotide ions were scanned in negative polarity mode.
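The gradient described above can be written as a simple linear interpolation. The following is a minimal sketch only: the function name is hypothetical, and it assumes that the stated percentage refers to solvent B and that the composition is held constant after the end of the gradient.

```python
def percent_b(t_min, start=0.0, end=30.0, duration=20.0):
    """Percentage of solvent B at time t_min (minutes) for a linear gradient.

    Defaults follow the text: 0 to 30% over 20 min. Holding the final
    composition after the gradient ends is an assumption of this sketch.
    """
    if t_min <= 0:
        return start
    if t_min >= duration:
        return end
    return start + (end - start) * t_min / duration


# Composition halfway through the gradient
print(percent_b(10))
```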
Mass spectrometry of nucleosides
Genomic DNA was digested using DNA Degradase Plus (Zymo Research, Irvine, CA, USA) according to the manufacturer's instructions and analyzed by liquid chromatography-tandem mass spectrometry on an LTQ Orbitrap Velos mass spectrometer (Thermo Scientific, Waltham, Massachusetts, USA) fitted with a nanoelectrospray ion source (Proxeon, Odense, Denmark). Mass spectral data for 5fC and 5mC were acquired in high resolution full scan mode (R >40,000 for the protonated pseudomolecular ions and >50,000 for the accompanying protonated base fragment ions). Data for 5mC were also acquired in selected reaction monitoring (SRM) mode monitoring the transition 242 → 126.0662, with HCD fragmentation using a 4 mass unit parent ion isolation window, a relative collision energy of 20% and R >14,000 for the fragment ions. Peak areas for the 5fC and 5mC fragment ions were obtained from extracted ion chromatograms of the relevant scans.
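The SRM transition quoted above can be cross-checked from monoisotopic masses. This is an illustrative sketch, not part of the reported workflow: the elemental compositions of 5-methyl-2'-deoxycytidine (C10H15N3O4) and of the 5-methylcytosine base (C5H7N3O) are assumed here rather than taken from the text, and the function name is hypothetical.

```python
# Monoisotopic masses (Da) of the elements involved, and the proton mass.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
PROTON = 1.00727647


def mz_protonated(formula):
    """m/z of the singly protonated ion [M+H]+ for an element-count dict."""
    neutral = sum(MONO[element] * count for element, count in formula.items())
    return neutral + PROTON


# Assumed elemental compositions (not stated in the text)
five_mdc = {"C": 10, "H": 15, "N": 3, "O": 4}    # 5-methyl-2'-deoxycytidine
five_mc_base = {"C": 5, "H": 7, "N": 3, "O": 1}  # 5-methylcytosine base

print(round(mz_protonated(five_mdc), 4))      # parent ion, nominal m/z 242
print(round(mz_protonated(five_mc_base), 4))  # base fragment ion
```

Under these assumptions the parent ion has a nominal m/z of 242 and the protonated base fragment agrees with the quoted value of 126.0662 to four decimal places.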
Synthesis of ODN2 (oxidation of 5hmC containing oligonucleotide ODN1)
Oxidation of 5hmC containing oligonucleotide ODN1 was performed by adapting a procedure reported by Booth et al. The oxidation was performed on a 5'-phosphate protected oligonucleotide to avoid any side reactions due to the oxidation of the 5' primary hydroxyl group. 5hmC containing oligonucleotide ODN1 (1.2 nmol, 8 μM, 1 eq) and KRuO4 (90 nmol, 600 μM, 75 eq) were mixed in a 50 mM NaOH aqueous solution and placed on ice for 15 min. The reaction was stopped by standard ethanol precipitation. The calculated and found masses of ODN2 are reported in Table s2 and Figure s1A in Additional file 1.
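The amounts and concentrations given for the oxidation are mutually consistent, which can be checked with a short calculation. The sketch below uses hypothetical helper names and only the values stated above; the reaction volume itself is not given in the text and is inferred here.

```python
def volume_ul(amount_nmol, conc_um):
    """Volume in µl implied by an amount (nmol) at a concentration (µM)."""
    # 1 µM = 1 nmol/ml = 0.001 nmol/µl
    return amount_nmol / (conc_um * 1e-3)


def equivalents(reagent_nmol, substrate_nmol):
    """Molar equivalents of reagent relative to substrate."""
    return reagent_nmol / substrate_nmol


odn1_nmol, odn1_um = 1.2, 8.0        # ODN1: 1.2 nmol at 8 µM
kruo4_nmol, kruo4_um = 90.0, 600.0   # KRuO4: 90 nmol at 600 µM

print(volume_ul(odn1_nmol, odn1_um))       # implied volume from ODN1
print(volume_ul(kruo4_nmol, kruo4_um))     # implied volume from KRuO4
print(equivalents(kruo4_nmol, odn1_nmol))  # KRuO4 equivalents
```

Both components imply a reaction volume of about 150 µl, and the ratio of amounts corresponds to the stated 75 equivalents.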
Synthesis of ODN3 (biotin-labeling of fC containing oligonucleotide ODN2)
The biotin conjugate was obtained by adapting a procedure reported by Pfaffeneder et al. 5fC containing oligonucleotide ODN2 (0.27 nmol, 8 μM, 1 eq) in a 40 mM aqueous NH4OAc buffer pH 5.0 supplemented with 100 mM anisidine was incubated with N-(aminooxyacetyl)-N'-(D-biotinoyl) hydrazide (13.6 nmol, 400 μM, 50 eq) for 24 h at 25°C. The reaction was stopped by standard ethanol precipitation. Figure s1A in Additional file 1 shows the LC-MS analysis of the product ODN3 and the calculated and found masses are reported in Table s2 in Additional file 1.
To verify the specificity of the reaction, the same reaction was carried out on ODN1; in the absence of the oxidation step, only the starting material was recovered (Figure s1B in Additional file 1).
Biotin-labelling of fC in genomic DNA samples
Genomic DNA extracted from J1 mouse embryonic stem cells was sonicated for 4 × 15 cycles (30 s on, 30 s off pulse) using a Diagenode Bioruptor sonicator in order to obtain 200 to 500 bp fragments. The incubation of genomic DNA samples with ARP was carried out by analogy with the procedure for synthetic oligonucleotides. The final DNA concentration was adjusted to 5 ng/μl in a 40 mM aqueous NH4OAc buffer pH 5.0 supplemented with 100 mM anisidine and 2 mM ARP. After the reaction, the DNA was purified using the GeneJet PCR purification kit (Fermentas, Waltham, Massachusetts, USA) and eluted in 50 μl elution buffer (10 mM Tris-HCl, pH 8.5).
Labeling of deoxyribose and oligonucleotide
The biotin conjugates were obtained by adapting a procedure reported by Ide et al. Briefly, 2-deoxyribose 5-phosphate or the abasic site-containing 103-mer (100 μM) was incubated with ARP (2 mM) in phosphate-buffered saline (pH 7) for 1 h at 37°C.
Synthesis of abasic site-containing 103-mer
The abasic site-containing 103-mer was obtained in two steps, starting with the incorporation of uracil into 103-2 DNA using DreamTaq polymerase (Fermentas) (Table s1 in Additional file 1). The DNA was then purified using the GeneJet PCR purification kit. Subsequent treatment of the uracil-containing oligonucleotide with Uracil-DNA-Glycosylase (NEB, Ipswich, Massachusetts, USA) afforded the abasic site-containing 103-mer.
Pulldown experiment and Illumina library preparation
The antibody pulldown experiment was done following a procedure reported by Ficz et al. For the chemical pulldown, the ends of the DNA fragments were repaired and paired-end sequencing specific adaptors (Illumina, San Diego, California, USA) were ligated using the NEBNext DNA Sample Prep Reagent Set 1 (NEB). Following adaptor ligation, DNA and 2 μg poly-dI:dC were incubated with 50 μg streptavidin coated magnetic beads (MagneSphere, Promega, Fitchburg, Wisconsin, USA) in 50 μl 2× binding buffer (10 mM Tris pH 7.5, 1 mM EDTA, 2 M NaCl, 0.1% TWEEN) for 15 minutes at room temperature. Beads were washed five times with 500 μl binding buffer and transferred into a new Eppendorf tube. For elution, beads were incubated with 100 μl elution solution (95% formamide, 10 mM EDTA and 400 nM biotin) at 90°C for 5 minutes and the eluate was collected and immediately placed on ice. This step was repeated to elute any residual DNA. Eluted DNA was then precipitated in ethanol and the DNA pellet was resuspended in 15 μl ddH2O. Fragments were amplified with 16 cycles using adaptor specific primers (Illumina); fragments ranging between 200 and 500 bp in size were gel purified before cluster generation and sequencing. Sequencing was done on an Illumina Genome Analyzer GAIIX using Cluster Generation v4 and v5 chemistries as well as Sequencing by Synthesis Kit v5. Data collection was performed using Sequencing Control Software v2.6 and v2.9. Real-time Analysis (RTA) 1.6 and 1.9 were used for base calling.
Enrichment test by quantitative PCR prior to sequencing
Before each pulldown, genomic DNA was spiked with two synthetic oligomers: 1 pg of 103 bp DNA containing a single biotin-fCpG and 10 pg of 103 bp DNA containing a single CpG. The C-103mer was added in 10-fold excess over the fC-103mer in order to obtain similar Ct values for the amplification of both strands. Details on the sequence and primers used are given in Table s1 in Additional file 1. After pulldown, enrichment was validated by quantitative RT-PCR.
Culturing of ES cells and RNA interference knockdown of Tet1 and Tdg
Cell culturing was done on the J1 ES cell line (129S4/SvJae), purchased from ATCC (catalogue number SCRC-1010) and grown on a γ-irradiated pMEF feeder layer at 37°C and 5% CO2 in complete ES medium (DMEM with 4,500 mg/l glucose, 4 mM L-glutamine and 110 mg/l sodium pyruvate, 15% fetal bovine serum, 100 U of penicillin/100 mg of streptomycin in 100 ml medium, 0.1 mM non-essential amino acids, 50 mM β-mercaptoethanol, 10³ U LIF ESGRO). RNA interference experiments were performed in J1 ES cells without feeders with three rounds of siRNA transfection, one every second day. On the first day, 1 × 10⁵ cells were seeded per well (3.8 cm²) of a 12-well plate and adherent cells were transfected the next day with 50 pmol siRNA: 3 μl Lipofectamine™ 2000 complexes in media without antibiotics according to the manufacturer's instructions. After 8 h, fresh media with antibiotics was added to the cells. This procedure was repeated 48 and 96 h after the first transfection, both times on cells in suspension. Cells were passaged and transfections were scaled up when necessary. Transfections were done with Dharmacon (Lafayette, Colorado, USA) siGENOME siRNA duplexes (Thermo Fisher Scientific, Waltham, Massachusetts, USA) against mouse Tet1 (catalogue number D-062861-01; caacuugcauccacgauua), siGENOME SMARTpool against mouse Tdg (catalogue number M-040666-01; gaagugcaguauacauuug, gaguaaagguuaagaacuu, caaagaagauggcuguuaa, gcaaggaucugucuaguaa) and siGENOME non-targeting siRNA#2 (catalogue number D-001210-02; sequence not available). Cells were harvested after three rounds of transfection for DNA/RNA isolation.
Bioinformatics and data analysis
Reads in fastq format obtained from the Illumina sequencing pipeline were aligned against the mouse genome (NCBI version mm9) using bwa with default settings. Before further analyses, only reads unequivocally assigned to a single genomic position (that is, reads mapped with a mapq quality of 15 or greater) were retained (see Table s7 in Additional file 1 for details of the total number of reads and mapped reads). The libraries enriched for 5hmC and 5mC were downloaded from the Short Read Archive (accession ID ERP000570; run IDs ERR031631 and ERR031628 for 5hmC; run IDs ERR031630 and ERR031627 for 5mC). These libraries were processed as above, although only the first mate of each pair and only the first 40 bases of each read were analyzed in order to match the 5fC sequencing protocol (single-end, 40 cycles).
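The two read-level steps described above (retaining alignments with a mapping quality of at least 15, and trimming reads to their first 40 bases) can be sketched in plain Python. This is an illustration under stated assumptions, not the pipeline actually used: the function names are hypothetical, input is assumed to be uncompressed SAM text and 4-line FASTQ records, and in practice a comparable mapping quality filter is available as `samtools view -q 15`.

```python
def filter_sam_lines(lines, min_mapq=15):
    """Yield SAM header lines and mapped alignments with MAPQ >= min_mapq."""
    for line in lines:
        if line.startswith("@"):       # header lines are passed through
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])          # column 2: bitwise FLAG
        mapq = int(fields[4])          # column 5: mapping quality
        if flag & 0x4:                 # 0x4 set means the read is unmapped
            continue
        if mapq >= min_mapq:
            yield line


def trim_fastq_record(record, length=40):
    """Keep the first `length` bases and qualities of a 4-line FASTQ record."""
    name, seq, plus, qual = record
    return (name, seq[:length], plus, qual[:length])


# Toy example: one header, one confident, one ambiguous and one unmapped read
example = [
    "@HD\tVN:1.0\n",
    "r1\t0\tchr1\t100\t37\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r2\t0\tchr1\t200\t3\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
]
for kept in filter_sam_lines(example):
    print(kept, end="")
```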
Genomic regions enriched in 5fC, 5hmC or 5mC were identified with the program rseg in mode 2 using the input library as control and setting the bin size to 100 bp. Each replicate was analyzed separately and a consensus enrichment was compiled by intersecting the enriched regions from the two replicates of each treatment. The consensus regions are provided as supplementary files (Additional file 2). The regions affected by the Tet1 or Tdg KD (that is, the regions enriched in 5fC in the KD relative to the control KD) were identified by running rseg in mode 3 with a bin size of 100 bp (Additional file 2).
The positions of the functional regions (CGIs, introns, exons, and so on) as well as the positions of CTCF, p300 and Pol II were extracted from the UCSC genome browser.
The identification of CGIs relatively more enriched in one modification over another was performed by assuming that the number of reads overlapping each CGI follows a negative binomial distribution. The difference between conditions was tested by means of the exact test described in Robinson et al. These analyses were performed using the R/Bioconductor package edgeR. All the analyses not mentioned above were performed by means of samtools, bedtools and custom python and R scripts. MeDIP-seq data for the TDG KO analysis from the HEROIC Consortium (accession ID GSE27468) were aligned with Bowtie in paired-end mode using options -m 1 --best --strata --maxins 700 --chunkmbs 512. Gene ontology annotation was executed via the web service DAVID v6.7 by means of the Functional Annotation tool.
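The consensus step, intersecting the enriched regions of two replicates, can be illustrated as follows. This is a minimal sketch and not the code used for the analysis: the function name is hypothetical, regions are assumed to be BED-like half-open (chrom, start, end) tuples, and the regions within each replicate are assumed not to overlap one another.

```python
def intersect_regions(rep1, rep2):
    """Return the overlaps between two lists of (chrom, start, end) regions.

    Coordinates are treated as half-open, as in BED files. Regions within
    each input list are assumed to be non-overlapping.
    """
    a, b = sorted(rep1), sorted(rep2)
    consensus = []
    i = j = 0
    while i < len(a) and j < len(b):
        chrom_a, start_a, end_a = a[i]
        chrom_b, start_b, end_b = b[j]
        if chrom_a != chrom_b:
            # Advance the list whose chromosome sorts first
            if chrom_a < chrom_b:
                i += 1
            else:
                j += 1
            continue
        start, end = max(start_a, start_b), min(end_a, end_b)
        if start < end:
            consensus.append((chrom_a, start, end))
        # Advance whichever region ends first
        if end_a <= end_b:
            i += 1
        else:
            j += 1
    return consensus


replicate1 = [("chr1", 100, 500), ("chr1", 800, 1000), ("chr2", 0, 200)]
replicate2 = [("chr1", 300, 900), ("chr2", 150, 400)]
print(intersect_regions(replicate1, replicate2))
```

With the toy input above, only the stretches covered in both replicates are kept, so a region called in a single replicate contributes nothing to the consensus.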
The raw sequence data used to map 5fC and the position of the 5fC enriched regions for each library have been deposited at NCBI Gene Expression Omnibus under accession number GSE40148.