The cell line EBRTcH3 (EB3) was obtained from the laboratory of Dr Hitoshi Niwa and has been described previously.
mES cells were grown in mES media + leukemia inhibitory factor (LIF) (DMEM high glucose (Invitrogen Ltd, Paisley, UK, catalog no. 11995-065) supplemented with 15% fetal bovine serum defined (HyClone, Thermo Scientific, Logan, UT, USA, catalog no. SH30070.03), 0.1 mM nonessential amino acids (Gibco-Brl, Invitrogen Ltd, Paisley, UK, catalog no. 11140-050), 0.1 mM 2-mercaptoethanol (Sigma-Aldrich, St. Louis, MO, USA, catalog no. M6250), and 1,000 U/ml ESGRO-LIF (Millipore, Billerica, MA, USA, catalog no. ESG1107)) at 37°C in an atmosphere of 5% CO2. All stable cell lines derived from EB3 were grown in mES media + LIF supplemented with 1 μg/ml Tc (Sigma, catalog no. T7660). For antibiotic selection of RMCE lines, mES + LIF + Tc supplemented with 1.5 μg/ml of puromycin (Sigma, catalog no. P9620) was used. In the case of two of the mES inducible clones (ZFP295, Hunk), these were grown in mES + LIF + Tc supplemented with 7.5 μg/ml puromycin to decrease the variation among the biological replicates of clones.
mES cells were trypsinized (in Trypsin-EDTA solution 10×, Sigma, catalog no. T4174) and plated 1 day before the nucleofection on 0.1% gelatin (Gelatin Type I from porcine skin, Sigma) coated 100-mm dishes (Nunc GmbH & Co., Langenselbold, Germany, catalog no. 150350) in mES media + LIF supplemented with Tc. For nucleofection, 2 × 10⁶ cells were counted for each sample. Plasmids were prepared using the Qiagen Plasmid Midi kit (Qiagen spa, Milano, Italy, catalog no. 12145): 5 to 6 μg of pPthC vector containing each ORF were incubated with 3 μg of pCAGGS-Cre vector, and 100 μl of Mouse ES Cell Nucleofector Kit (Amaxa, Lonza Cologne, Germany, catalog no. VPH-1001) was added to the plasmid mix. The nucleofection program used was A30. Cells were then incubated for 10 to 15 minutes at room temperature in the presence of complete medium and plated. The day after the nucleofection, cells were washed twice with PBS (Dulbecco Phosphate buffered Saline 1×, Gibco, catalog no. 14190) and switched to selection media (mES + LIF + Tc + 1.5 μg/ml puromycin). The colonies were grown for approximately 7 to 8 days before they were individually trypsinized and transferred to 96-well U-bottom plates (Nunc, catalog no. 163320). Trypsinized cells were neutralized with mES media + LIF and vigorously pipetted, and each clone was then equally distributed between two gelatin-coated 48-well plates (Nunc, catalog no. 150687), the former with selection media and the latter with mES + LIF + 150 μg/ml hygromycin (Hygromycin B in PBS, Invitrogen, catalog no. 10687-010). When confluent, clones that were resistant to the selection media and whose parallel cultures in mES media + LIF + hygromycin had died completely were isolated, replicated in 12-well plates (Nunc, catalog no. 150628) and, when confluent, replicated in 6-well plates (Nunc, catalog no. 140675) to extract the genomic DNA using standard conditions.
The positive clones were identified by PCR under standard conditions with the following primer pair: 5'-GCATCAAGTCGCTAAAGAAGAAAG-3' and 5'-GAGTGCTGGGGCGTCGGTTTCC-3'. All positive clones analyzed were frozen at -135°C using standard conditions.
In compliance with our policy of distribution of published reagents, all the mES clones generated within this project are available for distribution to academic research centers upon request.
The exchange vector pPthC-Oct-3/4 was obtained from the laboratory of Dr Hitoshi Niwa and has been described previously.
For the cloning of each gene we decided to use only the coding sequence, from the ATG to the stop codon, without the 5' and 3' untranslated regions. For 29 ORFs, we cloned the murine coding sequence, while for 1 transcription factor (ZFP295) and 2 protein kinases (DYRK1A; SNF1LK) we used the human coding sequence (see Additional file 2 for more general information about these genes). For a subset of the selected genes there is evidence for the presence of different alternatively spliced isoforms that may differ in their coding sequence. In this case we decided to clone the longest annotated coding sequence.
The exchange vector was modified, in the region between the XhoI and NotI restriction sites, by adding a multiple-cloning site that contains sequences recognized by three restriction enzymes (I-SceI, AscI and PacI) and by adding the 3 × FLAG epitope. Two double-stranded oligonucleotides containing the 3 × FLAG sequence, with the sequences recognized by PacI and NotI at the 5' and 3' ends, respectively, were designed. These oligonucleotides were then inserted into the exchange vector that had been digested with PacI and NotI. The 3 × FLAG epitope was designed to be in frame with the stop codon of each ORF.
The plasmids containing the cDNAs of Gabpa, Olig1 and Dscr1 were obtained from Biotech Custom Services Primm srl (Milano, Italy); the plasmid containing the cDNA of Olig2 was obtained from the laboratory of Dr Yaspo; the plasmid containing the cDNA of Runx1 was obtained from the laboratory of Dr Groner; the plasmid containing the cDNA of Sim2 was obtained from the laboratory of Dr Whitelaw. The cDNAs of Aire, 1810007M14Rik, Erg and Hunk were obtained by reverse transcription with SuperScript III Reverse Transcriptase (Invitrogen, catalog no. 18080-044) from total RNA extracted from embryonic stem cells. All other plasmids were purchased from ImaGENES (formerly RZPD, Berlin, Germany).
The cDNAs were amplified using the plasmids as templates by PCR in standard conditions. The forward and reverse primers used to amplify the cDNAs were designed to include in the sequence the restriction sites recognized by the enzymes AscI and PacI at the 5' and 3' ends, respectively.
Primer pair sequences used for the cloning are available in Additional file 21. In the case of Cstb, the primers introduce the sequence recognized by PacI at both ends of the amplified product while, in the case of Runx1, the primers introduce the restriction sites of XhoI and NotI at the 5' and 3' ends, respectively. After digestion with the specific restriction enzymes, the cDNA fragments were cloned into pTOPO-bluntII (Invitrogen, catalog no. K2875J10). The pTOPO-bluntII containing the cDNAs was then cleaved by AscI-PacI, by PacI only (for Cstb) or by XhoI-NotI (for Runx1). The fragments obtained by digestion were separated from pTOPO-bluntII in a 1% agarose gel in TAE buffer and finally purified with the QIAquick Gel Extraction kit (Qiagen, catalog no. 28706) using standard conditions. The purified cDNA fragments were then inserted into the appropriately digested and purified pPthC vector. We screened the Escherichia coli clones for vectors containing the cDNA fragments by enzymatic digestion and then sequenced the positive clones using the universal M13Fw primer and, for longer sequences, internal forward primers specific to the gene of interest.
Induction of transgene expression
Three positive clones coming from the six-well copy were thawed, amplified and tested for inducibility of the introduced gene upon removal of Tc. The complete removal of Tc results in sufficient induction of the Tet-off system. Cells to be induced were washed twice with PBS, cultured for more than 3 hours in DMEM without Tc, trypsinized and re-plated onto new dishes. Clones were grown in medium deprived of Tc to perform a time course of induction (17, 24, 39 and 48 hours). In the presence of Tc (0 hours), the expression of each mRNA was indicative of the basal expression level in mES cells. Total RNA samples at various times of induction were purified by QIAshredder (catalog no. 79656) and extracted with the RNeasy Protect Mini Kit (catalog no. 74126) using standard conditions. Total RNA (1 μg) was reverse-transcribed with the QuantiTect Reverse Transcription Kit (Qiagen, catalog no. 205313) according to the manufacturer's instructions. q-PCR experiments were performed using LightCycler 480 SYBR Green I Master mix (Roche spa, Monza, Italy, catalog no. 04887352001) for cDNA amplification and a LightCycler 480 II (Roche) for signal detection. q-PCR results were analyzed using the comparative Ct method, normalized against the housekeeping gene Actin B.
All primer pair sequences used for q-PCR are available in Additional file 4. Luciferase assays on mES cells overexpressing the firefly luciferase (Luc) gene were performed using the Dual Luciferase Reporter Assay System (Promega Italia, Milano, Italy). The YFP fluorescence assay to detect the expression of the YFP reporter was performed using a Leica DM6000 microscope.
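The comparative Ct calculation used for the q-PCR data can be illustrated with a minimal sketch. It assumes an amplification efficiency of about 100% (a factor of 2 per cycle), Actin B as the reference gene, and the 0 hour (+Tc) sample as the calibrator; the function name and the Ct values in the usage note are hypothetical and are not taken from the study.

```python
def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Fold change by the comparative Ct (2^-ddCt) method.

    ct_target, ct_reference: Ct of the gene of interest and of the
        reference gene in the induced sample.
    ct_target_cal, ct_reference_cal: the same two Ct values in the
        calibrator sample (assumed here to be the 0 hour, +Tc sample).
    Assumes a doubling of product per cycle (efficiency ~100%).
    """
    delta_ct_sample = ct_target - ct_reference          # normalize the sample
    delta_ct_cal = ct_target_cal - ct_reference_cal     # normalize the calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_cal     # compare the two
    return 2.0 ** (-delta_delta_ct)
```

For example, hypothetical Ct values of 22 (target) and 17 (reference) after induction, against 25 and 17 at 0 hours, give a ddCt of -3 and therefore an 8-fold induction.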
The analysis was performed on 20 different inducible clones of our mES cell bank (7 effective and 13 silent genes) and on parental ES cells (EB3) at the beginning of this study on the cell line received from Dr Hitoshi Niwa and again 2 years later. A single inducible clone was chosen randomly within the biological triplicate for this analysis. Cells at 70% confluence were treated with colcemid (Invitrogen) for 2 hours and harvested. Cell pellets were resuspended in pre-warmed hypotonic solution (0.56% KCl) and incubated at 37°C. Cells were then fixed with freshly prepared, ice-cold methanol-acetic acid solution (3:1 in volume) and mounted by dropping onto slides from a height of 1 meter. Metaphase spreads were stained with 5% Giemsa solution (Invitrogen). Approximately 20 images were taken, and 25 spreads were analyzed to assess the percentage of euploid cells.
Embryonic stem cell differentiation
The EB3 cells and the parental line E14 cells  were allowed to differentiate using the 'hanging drop' method [44, 45]. The differentiation medium consists of the mES cell medium depleted of LIF. The primer pair of Oct3/4 used in q-PCR is reported in Additional file 4.
Whole cell lysates were extracted after 24 or 48 hours of induction using lysis buffer (50 mM Tris-HCl (pH 8.0), 200 mM NaCl, 1% Triton, 1 mM EDTA, 50 mM Hepes) containing 1% (v/v) protease inhibitor cocktail (Sigma, catalog no. P8340). Thirty micrograms of protein extract from 4 out of 7 clones overexpressing effective genes (Erg, Nrip1, Runx1, Pdxk) and 11 out of 13 overexpressing silent genes (Bach1, Ets2, Gabpa, Olig1, Pknox1, 1810007M14Rik, Dscr1-Rcan1, DYRK1A, Hunk, Pfkl, Ripk4) were fractionated on 10% SDS-PAGE gels and electroblotted onto Trans-Blot transfer membrane (Biorad Italy, Segrate, Milano, Italy, catalog no. 162-0112). After incubation in blocking buffer under standard conditions, the membranes were incubated with anti-Flag antibody produced in rabbit (Sigma, catalog no. F7425) and then with anti-rabbit IgG horseradish peroxidase-linked whole antibody (Amersham Biosciences, GE Healthcare Europe GmbH, Milano, Italy, catalog no. NA934V). Chemiluminescent detection was performed using Super Signal West Pico Chemiluminescent substrate (Pierce, Euroclone, Pero, Milano, Italy, catalog no. 34080).
Total RNA (3 μg) was reverse transcribed to single-stranded cDNA with a special oligo (dT)24 primer containing a T7 RNA promoter site, added 3' to the poly-T tract, prior to second strand synthesis (One Cycle cDNA Synthesis Kit by Affymetrix, Fremont, CA, USA). Biotinylated cRNAs were then generated, using the GeneChip IVT Labeling Kit (Affymetrix). Twenty micrograms of biotinylated cRNA was fragmented and 10 μg hybridized to the Affymetrix GeneChip Mouse Genome 430_2 array for 16 hours at 45°C using an Affymetrix GeneChip Fluidics Station 450 according to the manufacturer's standard protocols.
Microarray data processing
Low-level analysis to convert probe-level data to gene-level expression data was done using the robust multiarray average (RMA) method, implemented in the rma function of the affy package of the Bioconductor project [46, 47] in the R programming language. The low-level analysis for the BAMarray tool was performed using the MAS5 method, implemented in the corresponding function of the same Bioconductor package.
Statistical analysis of differential gene expression
For each gene, a t-test was used on RMA normalized data to determine whether there was a significant difference in expression between the two groups of microarrays (induced versus uninduced). P-values were adjusted for multiple comparisons using the FDR procedure of Benjamini-Hochberg; the thresholds used in the different cases are reported in the main text. The BAM analysis was performed with BAMarray v3.0. The analysis was performed on MAS5 normalized array data using the default settings except for the following parameters: accuracy was set to high, clustering was set to manual with a value of 25, and variance was set to unequal.
t-Tests were also carried out to assess the significance of the variation in the relative expression values of each of the 20 genes analyzed in the parental cell line (EB3) versus the corresponding transgenic inducible clones (in the biological replicates) grown in the presence of Tc (0 hours of induction). In this statistical analysis the threshold chosen for statistical significance was an FDR < 0.05. The apparent increase of expression levels between EB3 cells and the non-induced state (in the cases of Bach1 and Gabpa, for example) was not statistically significant and can therefore be explained by the biological variability of expression levels of these genes in mES cells. In Additional file 5, we report the comparison of relative expression of 20 genes in the EB3 cell line with the corresponding transgenic inducible clones (in the biological replicates) grown in the presence of Tc (0 hours of induction).
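The Benjamini-Hochberg adjustment applied to the per-gene t-test P-values can be sketched as follows. This is a plain-Python illustration of the standard step-up procedure, not the code used in the study; the function name and the example P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order.

    For the P-value of rank r (1 = smallest) among m tests, the raw
    adjusted value is p * m / r; a running minimum taken from the largest
    P-value downwards keeps the adjusted values monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):                         # largest p first
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P-values for four genes; a gene is called significant
# when its adjusted value falls below the chosen FDR threshold.
example = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [q < 0.05 for q in example]
```

The running minimum is what distinguishes the adjusted values from a simple rescaling: without it, a gene with a smaller raw P-value could end up with a larger adjusted value than a gene ranked after it.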
Microarray data analysis
In the cases of Runx1 and Erg overexpression, a large number of genes were differentially expressed with FDR < 5% (4,585 genes for Runx1 and 5,820 for Erg). This means that the expected numbers of false positives in the Runx1 and Erg experiments are approximately 229 and 291, respectively. In order to reduce the number of false positives, we decided to perform the GO analysis on the gene set obtained by filtering the array data with a more stringent criterion (FDR < 1%). The differential expression of genes obtained with the microarray was validated by q-PCR of the most up- and down-regulated genes, as ranked by the differential expression ratio. In Additional file 4 we report the primer pairs used in q-PCR.
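The false-positive counts quoted above follow directly from the definition of the FDR: the expected number of false discoveries is the number of genes called significant multiplied by the FDR threshold. A minimal sketch of that arithmetic is given below; the function name is hypothetical.

```python
def expected_false_positives(n_significant, fdr):
    """Expected number of false discoveries among n_significant calls."""
    return n_significant * fdr


# Gene counts reported in the text for FDR < 5%.
runx1_fp = expected_false_positives(4585, 0.05)
erg_fp = expected_false_positives(5820, 0.05)
print(int(runx1_fp), int(erg_fp))
```

Truncating the two products to whole genes gives the figures of 229 and 291 stated in the text.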
Gene set enrichment analysis
GSEA [24, 50] was performed to determine if the set of silent genes was characterized by above average wild-type expression levels. The analysis was performed on the whole list of 45,102 probesets using the online GSEA server  with the default values for all the tool parameters and produced an enrichment score of 0.402 (FDR q-value = 0).
Protein disorder measurement
The protein disorder was measured using the GlobPlot online tool v2.3 [52, 53]. The disorder value for a protein was determined by a summation of the lengths of the disordered regions determined by the tool.
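The summation described above can be sketched in a few lines. The sketch assumes that each disordered region is reported as an inclusive pair of residue positions (start, end); this coordinate convention, the function name and the example regions are assumptions made for illustration.

```python
def disorder_value(regions):
    """Sum the lengths of disordered regions.

    regions: iterable of (start, end) residue positions, assumed to be
    inclusive at both ends, so a region 1-10 has length 10.
    """
    return sum(end - start + 1 for start, end in regions)


# Hypothetical protein with two predicted disordered regions.
total_disordered_residues = disorder_value([(1, 10), (50, 59)])
```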
Comparison with Tc1 cell line
The results of our overexpression experiments were collectively and individually compared with the Tc1 expression data. The MAS4 pre-processed Tc1 data were retrieved from Array Express [ArrayExpress:E-MEXP-654] and subsequently processed according to the same canonical statistical analysis (Cyber-t plus FDR correction; FDR < 5%) as our expression data, yielding a total of 284 significant genes (FDR < 0.05). Since the Tc1 dataset was obtained with a different chipset from ours (MG_U74Av2), we first converted the probesets into their 430_2 equivalents using the Affymetrix 'best match' conversion table; the result of the conversion yielded 241 genes. The probesets selected for each comparison were those that were found to be significant in both the Tc1 and the specific overexpression experiment; the composition of the individual lists is reported in Additional file 16. The total list used for Figure 4 was obtained by merging the individual lists and removing duplicate genes by keeping the maximum in absolute value and discarding the others, yielding 168 genes. The scatter plots were obtained by plotting the logarithm of the Tc1 fold change (ratio of treated versus untreated cell line) on the x axis, and the logarithm of the overexpressed gene on the y axis. The regression line coefficients were obtained using an algorithm computing a non-centered version of the correlation coefficient (the xcorr Matlab function) for the individual plots, and a standard A = YX⁻¹ algorithm for the collective plot (the two algorithms are interchangeable). The P-value for the regression coefficients was computed using a Student's t distribution for a transformation of the correlation. A P-value indicating the probability of obtaining the shown ratio of same-sign over total dots purely by chance was computed as follows.
A set of n (x, y) pairs was created by randomly extracting x from the list of Tc1 log ratio values and y from the list of current gene values, where n is the number of dots in the graph; 100,000 such sets were created (1 million in the case of Aire), and the percentage of sets for which x × y > 0 was true for at least k out of the n pairs was noted and taken as P-value, where k is the number of dots in the graph having same-sign coordinates.
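The random-sampling procedure for the same-sign P-value can be sketched as follows. The sketch assumes that x and y are drawn independently and with replacement from the two lists of log ratios, which the text does not specify; the function name, the seed argument and the example values are hypothetical.

```python
import random


def same_sign_p_value(tc1_log_ratios, gene_log_ratios, n, k,
                      n_sets=100_000, seed=0):
    """Estimate how often chance alone gives at least k same-sign pairs.

    n: number of dots in the scatter plot.
    k: number of dots observed with same-sign coordinates.
    Each simulated set holds n (x, y) pairs, with x drawn from the Tc1
    log ratios and y from the overexpression log ratios (sampling with
    replacement is an assumption of this sketch).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sets):
        same_sign = 0
        for _ in range(n):
            x = rng.choice(tc1_log_ratios)
            y = rng.choice(gene_log_ratios)
            if x * y > 0:              # both positive or both negative
                same_sign += 1
        if same_sign >= k:
            hits += 1
    return hits / n_sets               # fraction of sets reaching k


# Hypothetical log ratios, with a small number of sets to keep the run short.
p = same_sign_p_value([0.5, -0.3, 1.2, -0.8], [0.4, -0.6, 0.9, -0.2],
                      n=10, k=9, n_sets=2000)
```

The resolution of the estimate is 1/n_sets, which is consistent with the use of a larger number of sets (1 million) where a smaller P-value had to be resolved.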
Large-gel two-dimensional protein electrophoresis
The total protein extraction from mES cells was carried out using our standard protocol. Protein (70 μg) was separated in each 2DGE run. Transgenic and parental cell lines were always run in parallel. The proteomic analysis was carried out on two Runx1 overexpressing clones (E6 and E7) out of the three clones (E6, E7 and F3) used for the transcriptome analysis (Additional file 3). Three technical repeats were performed for each clone. Overall, 12 two-dimensional gels were run for each Runx1 overexpressing clone: 6 replicates for the non-induced state and 6 replicates for the induced state (48 hours). All of the above samples were always run simultaneously in the same electrophoresis chamber to ensure gel pattern comparability. The protein expression alterations upon Runx1 overexpression were calculated as the ratio of the 48 hour mean to the 0 hour mean, using the averaged values across six gels (three technical replicates of each biological replicate). Statistical significance was assessed by Student's t-test (P < 0.05); in addition, only expression alterations greater than 20% were considered, as described previously. A silver staining protocol was employed to visualize protein spots. Computer-assisted gel evaluation was performed (Delta2D v3.4, Decodon, Greifswald, Germany). Briefly, 2DGE gels were scanned at high resolution (600 dpi; TMA 1600, Microtek, Willich, Germany). Corresponding gel images were first warped using 'exact mode' (manual vector setting combined with automatic warping). A fusion gel image was subsequently generated using 'union mode', which is a weighted arithmetic mean across the entire gel series. Spot detection was carried out on this fusion image automatically, followed by manual spot editing. Subsequently, spots were transferred from the fusion image to all gels. The signal intensities (volume of each spot) were computed as a weighted sum of all pixel intensities of each protein spot.
Percent volume of spot intensities, calculated as a fraction of the total spot volume of the parent gel, was used for quantitative analysis of protein expression level. Normalized values after local background extraction were subsequently exported from Delta2D in spreadsheet format for statistical analysis. Student's t-test was carried out for control versus induced cell lines to assess the statistical significance of the expression differences (pair-wise, two-sided). P < 0.05 was used as the statistical significance threshold. To reduce the influence of data noise, only protein expression changes over 20% compared to control were retained for further analysis. Additional file 22 shows the raw data of the proteomic analysis by 2DGE following the overexpression of Runx1. The detailed spot quantification data, in the form of relative volume data of each spot on each individual 2DGE gel, are also provided in this table. 2DGE gel image data have now been submitted to the World-2DPAGE Repository of the ExPASy Proteomics Server [2DPAGE:0021] for public access.
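The spot quantification and filtering steps can be illustrated with a short sketch. It takes the t-test P-value as an input rather than computing it, and it reads "changes over 20% compared to control" as the induced-to-control ratio of means differing from 1 by more than 0.2 in either direction; that reading, the function names and the example values are assumptions, not the Delta2D implementation.

```python
def percent_volumes(spot_volumes):
    """Express each spot volume as a percentage of the gel's total spot volume."""
    total = sum(spot_volumes)
    return [100.0 * v / total for v in spot_volumes]


def passes_filter(control, induced, p_value, alpha=0.05, min_change=0.20):
    """Keep a spot only if it is significant and changes by more than 20%.

    control, induced: percent volumes of one spot across replicate gels
        (0 hours and 48 hours of induction, respectively).
    p_value: result of the Student's t-test, computed elsewhere.
    """
    mean_control = sum(control) / len(control)
    mean_induced = sum(induced) / len(induced)
    ratio = mean_induced / mean_control          # 48 hour mean / 0 hour mean
    return p_value < alpha and abs(ratio - 1.0) > min_change


# Hypothetical spot measured on six control and six induced gels.
keep = passes_filter([1.0] * 6, [1.5] * 6, p_value=0.01)
```

Applying both criteria together means that a small but highly reproducible change is discarded just as a large but noisy one is.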
Mass spectrometric protein identification
For protein identification by mass spectrometry, high resolution 2DGE gels were stained using a mass spectrometry compatible silver staining protocol. Protein spots of interest were excised and subjected to in-gel trypsin digestion without reduction and alkylation. Tryptic fragments were analyzed using an LCQ Deca XP nano HPLC/ESI ion trap mass spectrometer (Thermo Fisher Scientific, Waltham, MA, USA) as described previously. For database-assisted protein identification, monoisotopic mass values of peptides were searched against NCBInr (version 20061206, taxonomy Mus musculus), allowing one missed cleavage. Peptide mass tolerance and fragment mass tolerance were set at 0.8 Dalton. Oxidation of methionine and acrylamide adducts on cysteine (propionamide) were considered as variable peptide modifications. Criteria for positive identification of proteins were set according to the scoring algorithm delineated in Mascot (Matrix Science, London, UK), with an individual ion score cut-off threshold corresponding to P < 0.05.