The roles of the reprogramming factors Oct4, Sox2 and Klf4 in resetting the somatic cell epigenome during induced pluripotent stem cell generation
© BioMed Central Ltd 2012
Published: 22 October 2012
Somatic cell reprogramming to induced pluripotent stem (iPS) cells by defined factors is a form of engineered reverse development carried out in vitro. Recent investigation has begun to elucidate the molecular mechanisms whereby these factors function to reset the epigenome.
Current reprogramming technology, pioneered by Takahashi and Yamanaka, was built on several seminal advances in the field of developmental biology. First, nuclear transfer experiments demonstrated that a somatic cell nucleus could be epigenetically reset to an early developmental state. Second, cell culture conditions were developed that allowed for the isolation and culture of pluripotent cells, termed embryonic stem (ES) cells, from the inner cell mass of the human and mouse blastocyst [3, 4]. Finally, study of these cells and of early embryonic development led to the identification of factors that were ultimately able to reprogram mouse embryonic fibroblasts (MEFs) to the iPS cell state when ectopically expressed, albeit at low frequency.
Several groups rapidly followed up on the initial generation of iPS cells and demonstrated that these cells, in their ideal state, are functionally equivalent to ES cells in their ability to contribute to healthy adult mice and their offspring, in addition to forming teratomas when injected into athymic mice [5–10]. In accordance with these results, the gene expression and chromatin states of iPS cells were also found to be strikingly similar to their ES cell counterparts, although subtle differences remain [10–12]. Tremendous innovation has occurred in the method of factor delivery and the type of somatic cells being reprogrammed. Initially, reprogramming factors were expressed from retroviral transgenes integrated into the genome. Subsequent advances have eliminated the requirement for genomic insertion and viral infection altogether (reviewed in ). Additionally, iPS cells have been generated from individuals with specific genetic lesions that can be used to model human diseases (reviewed in ). However, despite all of these advances, much remains to be learned about the reprogramming process itself. We believe that the MEF reprogramming paradigm still holds the most promise for future studies due to the ease of obtaining primary cells that are genetically tractable and easy to expand and reprogram, even though we acknowledge that additional lessons may be learned from the use of non-mesenchymal cells, such as hepatocytes or neural cells. The next frontier for the reprogramming field will be a complete mechanistic understanding of how the factors cooperate to reshape the epigenome and gene expression profile of the somatic cell.
The frequency with which somatic cells convert to iPS cells is typically below 1%. Therefore, much effort has gone into improving reprogramming efficiency. Several transcription factors normally expressed in the early stages of embryonic development can enhance reprogramming when added ectopically to MEFs treated with Oct4, Sox2 and Klf4 (O, S and K). These include Glis1, Sall4 and Nanog [19–22]. This class of enhancer factors likely acts late in the reprogramming process to establish and stabilize the pluripotency transcription network. In contrast to c-Myc, Glis1 added to O, S and K enhances the generation of iPS cell colonies without producing Nanog-negative, partially reprogrammed colonies. Remarkably, adding Glis1 and c-Myc together with O, S and K further enhances iPS cell colony formation without the presence of Nanog-negative colonies, suggesting that Glis1 is able to coerce partially reprogrammed cells to the fully reprogrammed state. Forcing Nanog overexpression in partially reprogrammed cells leads to their conversion to iPS cells, demonstrating its late-stage reprogramming activity [22, 23].
The ability of cells to pass through the cell cycle has also been shown to be an important determinant of reprogramming efficiency. Knockdown or gene deletion of p53, p21 or proteins expressed from the Ink4/Arf locus allows cells undergoing reprogramming to avoid the activation of cell cycle checkpoints and cellular senescence, leading to greater iPS cell formation [21, 24–27]. Consequently, it is likely that any manipulation that accelerates the cell cycle would enhance reprogramming. Thus, reprogramming cultures should be monitored for alterations in their proliferation rate to determine whether the action of an enhancer factor can be attributed to changes in the cell cycle (Figure 1a).
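One simple way to monitor proliferation is to compare population doubling times between control and enhancer-treated cultures. The sketch below assumes exponential growth between two cell counts; the function name and the cell counts are hypothetical and serve only as an illustration.

```python
import math

def doubling_time(n_start, n_end, hours):
    """Population doubling time in hours, assuming exponential growth."""
    if n_start <= 0 or n_end <= n_start:
        raise ValueError("counts must be positive and the culture must have grown")
    return hours * math.log(2) / math.log(n_end / n_start)

# Made-up counts: both cultures start at 1e5 cells and are counted after 72 h.
control_td = doubling_time(1e5, 8e5, 72)    # three doublings in 72 h
enhancer_td = doubling_time(1e5, 64e5, 72)  # six doublings in 72 h
```

If the treated culture doubles markedly faster than the control, as in this toy case, an increase in iPS cell colony number may partly reflect accelerated cycling rather than a direct effect on the reprogramming mechanism.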
In summary, the induction of pluripotency by O, S and K is a multistep progression whose efficiency can be boosted by enhancer factors. Even though additional factors can positively influence reprogramming, efficiency typically remains very low. The list of factors discussed above is a brief overview and is by no means exhaustive. Enhancer factors are not exclusively proteins and may consist of any manipulation, including small molecules, long non-coding RNAs and microRNAs, that improves reprogramming [28, 29]. Their addition at different stages of the reprogramming process, the generation of partially reprogrammed cells, and the conversion of these cells to the fully reprogrammed state allow one to assay for enhancers of the early and late stages of reprogramming. It will be important to identify the subset of genes whose expression is changed by the introduction of each enhancer factor. Do these genes work alongside the core gene expression changes conferred by O, S and K, or do they simply amplify the magnitude and kinetics of these changes? Also, do known enhancer factors share common mechanisms of action?
Replacement factors possess the unique ability to substitute for O, S or K in reprogramming (Figure 1b). Esrrb, an orphan nuclear receptor that is expressed highly in ES cells, has been reported to replace Klf4. Additionally, p53 knockdown has been shown to permit reprogramming in the absence of Klf4. High-throughput screens have been used successfully to identify small molecule replacement factors. Treatment of cells with kenpaullone allows reprogramming to occur without Klf4, albeit with slightly lower efficiency, and several distinct classes of small molecules contribute to iPS cell generation in the absence of Sox2 [33–35]. Reprogramming enhancer and replacement factors are not necessarily mutually exclusive. Nr5a2, for instance, is capable of both enhancing reprogramming and replacing Oct4. In the human reprogramming system, Lin28 and Nanog, mentioned above as enhancer factors, combine to replace Klf4.
Replacement factors, despite their substantial molecular and functional divergence, may provide important insights into the mechanism whereby O, S and K function in reprogramming. Future work will demonstrate whether these factors regulate the same key genes and pathways as the reprogramming factors that they replace or whether they help achieve the iPS cell state via different means.
The first of these phases includes downregulation of lineage-specific genes and activation of a genetic program that radically alters cell morphology. This change, known as mesenchymal-to-epithelial transition (MET), is activated by BMP/Smad signaling and inhibited by activation of the TGF-β pathway [34, 38, 40]. The difference in morphology that results from MET is not simply cosmetic. For example, knockdown of Cdh1, which encodes the epithelial cell adhesion protein E-cadherin, significantly reduces reprogramming efficiency. Additionally, reduction in cell size has been shown to be an important early event that occurs in cells that go on to reach the pluripotent state.
The intermediates generated in a reprogramming culture do not appear to be stable when factor expression is turned off before pluripotency is achieved [38, 42, 43]. In this instance, cells revert back to a MEF-like gene expression pattern. In agreement with this notion, stable reprogramming intermediates isolated in the form of pre-iPS cells with an ES-cell-like morphology retain high levels of ectopic O, S, K and c-Myc [11, 12]. These cells have successfully downregulated fibroblast genes and initiated MET, but have not activated the self-reinforcing network of transcription that characterizes the ES/iPS state [11, 12, 44, 45].
Fully reprogrammed cells arise with low frequency in reprogramming cultures. These cells exhibit indefinite self-renewal and possess the capacity to differentiate into any of the cell types that make up the developing organism. These unique properties are governed by a complex transcriptional program involving many transcription factors, including the reprogramming factors O, S and K, now expressed from their endogenous loci, and additional genes such as Nanog, Esrrb, Smad family members and Stat family members [44, 45]. Transcription factors within the pluripotency network appear to work cooperatively to regulate genes. Genome-wide chromatin immunoprecipitation (ChIP) experiments demonstrate co-binding among these factors at levels well beyond what would be expected by chance [12, 44, 45]. Additionally, the presence of multiple factors at a given locus is associated with increased levels of ES/iPS cell-specific gene expression [12, 44, 45].
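To make "beyond what would be expected by chance" concrete, the observed number of co-bound loci can be compared with the number expected if two factors bound independently. The following is a minimal sketch under that independence assumption, not the analysis used in the cited studies; the function name, locus identifiers and locus count are hypothetical.

```python
def cobinding_enrichment(sites_a, sites_b, n_loci):
    """Compare observed co-bound loci with the count expected under independence.

    sites_a, sites_b: collections of locus identifiers bound by each factor.
    n_loci: total number of loci surveyed (the background).
    Returns (observed, expected, observed / expected).
    """
    a, b = set(sites_a), set(sites_b)
    if not a or not b or n_loci <= 0:
        raise ValueError("need non-empty binding sets and a positive background")
    observed = len(a & b)
    expected = len(a) * len(b) / n_loci  # independent binding
    return observed, expected, observed / expected

# Made-up locus identifiers, for illustration only.
factor_a_sites = {"locus1", "locus2", "locus3", "locus4"}
factor_b_sites = {"locus2", "locus3", "locus4", "locus5"}
observed, expected, ratio = cobinding_enrichment(factor_a_sites, factor_b_sites, n_loci=20)
```

A ratio well above 1, as in this toy case, indicates more co-occupancy than independent binding would predict; a real analysis would also attach a significance test and account for differences in chromatin accessibility across loci.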
In ES cells, which are viewed as a proxy for iPS cells due to their high level of functional similarity, knockdown of any one of a number of transcription factors leads to loss of the pluripotent state, indicating the interconnected nature of the transcriptional network. However, one factor, Nanog, seems to be of special importance. Overproduction of Nanog was able to rescue several of the aforementioned loss-of-function effects and allow ES cells to maintain pluripotency in the absence of the growth factor LIF [46–48]. Furthermore, reprogramming of Nanog-deficient cells proceeds to a partially reprogrammed state that cannot transition to the iPS cell state due to impaired upregulation of the pluripotency network [22, 23]. These data illustrate the central role of Nanog in the establishment and maintenance of pluripotency and are consistent with its role as a late-stage enhancer of reprogramming.
Now that transcription factors within the pluripotency network have been largely identified, future research can determine their relative importance by performing similar gain-of-function and loss-of-function assays to those described above involving Nanog. Are all pluripotency-associated factors capable of acting as enhancers of reprogramming? Does their abrogation block reprogramming? Why or why not?
In addition to the changes in specific gene programs, reprogramming fundamentally alters the cell in several important ways. For instance, mouse ES/iPS cells have an altered cell cycle with a shortened G1 phase. Thus, reprogrammed cells have a reduced doubling time, and a greater fraction of these cells reside in the later phases of the cell cycle. In order to protect genomic integrity during early development, ES/iPS cells have an enhanced capacity for DNA repair [50, 51]. Pluripotent cells also have an increased nuclear to cytoplasmic ratio when compared with differentiated cells, as shown by electron microscopy.
In accordance with the reduction in membrane surface area and secretory function relative to MEFs, iPS cells generally express genes whose products function outside of the nucleus at comparatively lower levels. Significantly enriched gene ontology (GO) terms within the list of genes whose expression is reduced at least twofold from MEFs to iPS cells include: Golgi apparatus, endoplasmic reticulum and extracellular matrix (Figure 2a). Conversely, genes whose expression is up at least twofold in iPS cells relative to MEFs act primarily within the nucleus and are enriched for GO terms such as nuclear lumen, chromosome and chromatin (Figure 2a).
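The twofold filter and GO term enrichment described above can be sketched as follows. This is an illustration only, assuming a one-sided hypergeometric test for enrichment; the helper names, expression values and the term annotation are hypothetical and are not taken from the data behind Figure 2a.

```python
import math

def twofold_sets(expr_mef, expr_ips):
    """Split genes into those up or down at least twofold in iPS cells vs MEFs."""
    up, down = set(), set()
    for gene in expr_mef.keys() & expr_ips.keys():
        log2_fc = math.log2(expr_ips[gene] / expr_mef[gene])
        if log2_fc >= 1:
            up.add(gene)
        elif log2_fc <= -1:
            down.add(gene)
    return up, down

def hypergeom_pvalue(k, n_selected, n_annotated, n_total):
    """P(X >= k): chance of at least k annotated genes among n_selected drawn
    without replacement from n_total genes, n_annotated of which carry the term."""
    upper = min(n_selected, n_annotated)
    tail = sum(
        math.comb(n_annotated, i) * math.comb(n_total - n_annotated, n_selected - i)
        for i in range(k, upper + 1)
    )
    return tail / math.comb(n_total, n_selected)

# Made-up expression values (arbitrary units), for illustration only.
mef = {"Col1a1": 80.0, "Fn1": 40.0, "Actb": 50.0, "Nanog": 1.0}
ips = {"Col1a1": 10.0, "Fn1": 20.0, "Actb": 55.0, "Nanog": 32.0}
up, down = twofold_sets(mef, ips)

# Hypothetical annotation: genes carrying an "extracellular matrix" term.
ecm_genes = {"Col1a1", "Fn1"}
p_down = hypergeom_pvalue(len(down & ecm_genes), len(down), len(ecm_genes), len(mef))
```

With only four genes the p-value is not meaningful; the point is the order of operations: filter by fold change first, then test each term against the full gene list as background. Real analyses also correct for testing many terms.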
One important class of nuclear proteins whose gene expression is increased dramatically in ES/iPS cells relative to MEFs is chromatin-modifying complexes (Figure 2b). These molecular machines modulate gene expression partly by covalent and non-covalent modification of nucleosomes. The expression levels of physically associated subunits within these complexes are largely coordinately regulated during reprogramming. For example, transcripts encoding the components of the PRC2 polycomb complex, responsible for H3K27me3, are highly upregulated as cells progress to the pluripotent state (Figure 2b). The DNA methyltransferases, which are not stably associated, also experience similar increases in their expression as reprogramming proceeds (Figure 2b). On the other hand, the transcription factor IID (TFIID) and mixed-lineage leukemia (MLL)/Set complexes are more moderately upregulated as a whole, yet they contain highly upregulated individual subunits, which play important roles in pluripotency and reprogramming (Figure 2b; Taf7, Taf7l and Taf5 of TFIID; Dpy30 and Wdr5 of MLL/Set) [54–56]. Expression switches within chromatin-modifying complexes may affect the induction of pluripotency. In agreement with this notion, Smarcc1 (BAF155) replaces Smarcc2 (BAF170) in the specific form of the BAF complex expressed in pluripotent cells and is critical for their self-renewal (Figure 2b).
The presence of increased levels of chromatin-modifying complexes in ES/iPS cells may serve one of two purposes. First, these proteins may contribute to the maintenance of the self-renewing, undifferentiated state. Examples of this class, where loss-of-function disrupts self-renewal, include Smarca4 (Brg1), Chd1 and Wdr5 [54, 57, 58]. Second, while a given protein may not be required for normal growth of ES/iPS cells, its presence may be required for the proper execution of subsequent developmental events. Thus, a loss-of-function phenotype will only be detected upon differentiation, as is seen for PRC2, G9a and TAF3, and the DNA methyltransferases Dnmt1, Dnmt3a and Dnmt3b [59–63].
Epigenetic changes during reprogramming, most frequently seen in the posttranslational modification status of histone tails, are likely to be both cause and consequence of the previously mentioned changes in gene expression. Differences in H3K4me2 and H3K27me3 are detected rapidly upon reprogramming factor induction and often precede transcriptional upregulation of the underlying loci. Shifts in the balance of active and inactive chromatin marks at proximal gene regulatory elements are highly correlated with transcriptional changes during reprogramming. ChIP experiments in MEFs and iPS cells demonstrate that the promoter regions of many genes with the greatest expression increase in the transition from MEFs to iPS cells lose H3K27me3 and gain H3K4me3 [10, 12]. The low efficiency of reprogramming makes it difficult to study the chromatin state of reprogramming intermediates with population studies such as ChIP, particularly towards the end of the process where the majority of cells have not progressed down the reprogramming path. Pre-iPS cells, which are a clonal population of cells expanded from Nanog-negative colonies with an ES-cell-like morphology, are thought to represent a relatively homogeneous late reprogramming state amenable to ChIP [11, 12, 22, 33]. Similar to what has been observed regarding changes in gene expression, the resetting of chromatin marks does not appear to occur all at once because pre-iPS cells display an intermediate pattern of a subset of chromatin modifications that lies between the MEF and iPS states, both globally and near transcription start sites [12, 64].
High-throughput sequencing coupled with ChIP has allowed for the identification of putative distal regulatory elements based on combinations of chromatin marks. These 'enhancer' regions have been mainly defined by the presence of H3K4me1 and H3K4me2 at sites that lie at a distance from transcription start sites, which are frequently marked by H3K4me3 [39, 65, 66]. Chromatin at these distal sites is reset to an ES-cell-like state over the course of reprogramming [39, 65]. In addition to promoting the proper expression of pluripotency-related genes, these sites may contribute to the developmental potential of pluripotent cells by maintaining a poised state that allows for the upregulation of lineage-specific genes in response to the appropriate signals [65, 66]. Future studies that analyze more histone marks and incorporate machine learning techniques will help to better characterize these regions as well as other important chromatin states in cells at different stages of reprogramming, which will require the isolation or at least enrichment of cells that will undergo faithful reprogramming.
Over the course of reprogramming, cells experience dramatic global increases in a variety of active histone acetylation and methylation marks, while H3K27me3 levels remain unchanged. The majority of these changes occur during the late stages of reprogramming, between the pre-iPS and fully reprogrammed states. Additionally, the number of heterochromatin foci per cell, as marked by HP1α (heterochromatin protein 1α), is reduced in iPS cells when compared with MEFs. In accordance with this observation, electron spectroscopic imaging demonstrates that lineage-committed cells have compacted blocks of chromatin near the nuclear envelope that are not seen in the pluripotent state [67, 68]. The specific increase in active chromatin is somewhat surprising given that the expression levels of chromatin-modifying complexes associated with both the deposition of active and inactive marks increase as reprogramming proceeds. Overall, changes in chromatin structure and histone marks coupled with increased transcription of repeat regions indicate that the pluripotent state may possess a unique, open chromatin architecture.
Another epigenetic modification, DNA methylation, plays an important role in silencing key pluripotency genes, including Oct4 and Nanog, as cells undergo differentiation. The promoter regions of pluripotency genes are demethylated in ES cells but strongly methylated in fibroblasts. The lack of DNA methylation within these promoters in faithfully reprogrammed iPS cells strongly suggests that during reprogramming, this repressive mark must be erased in order to allow for the establishment of induced pluripotency [5, 9–11]. Bisulfite sequencing suggests that removal of DNA methylation from pluripotency loci is a late event that can be placed between the pre-iPS and iPS cell states in the reprogramming continuum. Furthermore, reprogramming efficiency is increased in response to the DNA methyltransferase inhibitor 5-aza-cytidine. This enhancement is greatest when it is added in a brief window towards the end of the reprogramming process, thus reinforcing the importance of the late-stage removal of DNA methylation.
Several other components of the chromatin-modifying machinery have also been shown to affect reprogramming efficiency. Knockdown of LSD1, as well as chemical inhibition of histone deacetylases, leads to enhanced reprogramming. Also, overproduction of the histone demethylases Jhdm1a and Jhdm1b/Kdm2b, and the SWI/SNF complex components Brg1 and Baf155, increases the efficiency of iPS cell generation [71, 72]. In contrast, knockdown of Chd1 and Wdr5 inhibits reprogramming in a cell-proliferation-independent manner [54, 58]. Knockdown of candidate chromatin-modifying proteins during human reprogramming identified the histone methyltransferases DOT1L and SUV39H1, and members of the PRC1 and PRC2 polycomb complexes as modulators of reprogramming activity. Reducing the levels of DOT1L and SUV39H1 led to enhanced reprogramming, while reductions in Polycomb complex subunits (BMI1, RING1, SUZ12, EZH2 and EED) resulted in decreased reprogramming efficiency. Recently, Utx/Kdm6a was also shown to be critical for several types of reprogramming, including iPS cell generation from MEFs. The action of this protein is important to remove H3K27me3 from repressed genes in MEFs and prevent the acquisition of H3K27me3 by pluripotency genes as reprogramming proceeds. Finally, Parp1 and Tet2, which both contribute to chromatin modification of the silenced Nanog locus early in reprogramming, are each required for iPS cell formation.
Through the results mentioned above, several general themes have emerged. First, heterochromatin-associated marks, namely histone deacetylation, H3K9me3 and DNA methylation, represent a barrier whose removal leads to increased reprogramming efficiency. Second, proteins that contribute to an active chromatin environment by writing or reading the H3K4me3 mark are important for achieving pluripotency. Finally, removal of marks associated with transcriptional elongation (H3K36me2/3 and H3K79me2) surprisingly enhances reprogramming. Mechanistically, removal of H3K36me2/3 by Jhdm1b, which is stimulated by ascorbic acid, has been shown to overcome cell senescence by repressing the Ink4/Arf locus. Inhibition of DOT1L leads to reduced H3K79me2 at mesenchymal genes, thereby facilitating their downregulation.
Comparison of their binding profiles in pre-iPS cells and iPS cells suggests that O, S and K vary considerably in their DNA-binding patterns over the course of reprogramming. Eventually, however, they adopt an ES-cell-like binding configuration upon reaching the iPS cell state. Genes that exhibit the largest expression changes during reprogramming are frequently bound by all three reprogramming factors in ES and iPS cells. Increased factor binding at gene promoters in iPS cells is associated with higher levels of transcription, indicating that O, S and K work together to regulate genes primarily as transcriptional activators as described for ES cells [11, 12, 44, 45].
Reprogramming factors must navigate a dynamic chromatin landscape at the various stages of iPS cell generation. While it is plausible that DNA binding differences may be due in part to changes in local chromatin accessibility, O, S and K do not appear to be blocked by the presence of the repressive mark H3K27me3, as promoters enriched for this chromatin mark also can be bound by O, S and K [12, 45, 77]. In contrast, binding of overproduced OCT4 to the enhancers of silenced genes is associated with nucleosome depletion and the absence of DNA methylation, suggesting that nucleosomes and DNA methylation may comprise a physical barrier that inhibits factor binding [78, 79]. Future work may identify additional chromatin signatures that enable or inhibit reprogramming factor binding. Mapping of O, S and K binding in the early stages of reprogramming should reveal chromatin states and nucleosome positions that allow the factors to access target genes.
While there is considerable overlap between the ChIP profiles of all three factors in ES and iPS cells, Oct4 and Sox2 are found together most frequently, whereas Klf4 binds to approximately twice as many sites genome-wide as either of the other factors [12, 44, 45]. Oct4 and Sox2 can bind cooperatively to composite sox-oct motifs that are frequently found within the regulatory elements of important pluripotency genes [80–82]. These genes include those that encode Oct4 and Sox2 themselves, indicating that these two factors act within autoregulatory positive feedback loops that help to reinforce the pluripotent state [80, 81].
Reprogramming factors can sometimes be functionally replaced by paralogs within their respective families (Figure 3c). Comparison of O, S and K with their paralogs grouped in terms of functional redundancy may provide insight into their mechanisms of action during reprogramming. The binding pattern in ES cells and DNA-binding specificity in vitro measured for Klf4 overlaps substantially with Klf2 and Klf5. Only triple knockdown of all three of these proteins together is sufficient to induce the loss of pluripotency. However, each of these factors may also play more nuanced roles in maintaining self-renewal of pluripotent cells. During reprogramming, Klf2, Klf5 and another close family member, Klf1, have been reported to replace Klf4 with varying degrees of efficiency (Figure 3c). Sox2, on the other hand, can be replaced by several diverse family members from across its phylogenetic tree, but not others (Figure 3c). Interestingly, reprogramming activity can be activated in Sox17, a reprogramming-incompetent paralog, by point mutation of a single glutamate within helix 3 of its HMG domain to the corresponding lysine residue present in Sox2. This change enables cooperative binding with Oct4 at the canonical subset of sox-oct motifs. Thus, the physical association between Sox2 and Oct4 when bound to DNA is likely to be critical for the induction of pluripotency. Oct4 cannot be replaced by Oct1 or Oct6 in reprogramming, suggesting that it may possess divergent activity not seen in other family members (Figure 3c). This difference in reprogramming activity among the different Oct factors may not be simply due to differences in DNA-binding preference. Oct1 and Oct4 both bind cooperatively to sox-oct elements in the Fgf4 enhancer, but only Oct4 promotes transcriptional activation of the gene due to its ability to form an active ternary complex with Sox2 [82, 90].
Residues that lie outside of the highly conserved DNA-binding domains in O, S and K are also important for their ability to activate transcription and mediate reprogramming (Figure 3a). Klf4 possesses an acidic transactivation domain (TAD) that interacts non-covalently with SUMO-1. Oct4 contains TADs both amino-terminal and carboxy-terminal of its DNA-binding domains, while Sox2 contains several regions with transactivation activity carboxy-terminal of its HMG box (Figure 3a). Since these regions were characterized using assays from different developmental contexts, future work is needed to determine which of these TADs function in reprogramming and to identify the co-activators that act through these domains.
Reprogramming efficiency can be enhanced by fusing TADs from other proteins to the reprogramming factors. Addition of a TAD from VP16 to Oct4 or Sox2 increases reprogramming efficiency [93, 94]. Fusion of the MyoD TAD to either terminus of Oct4 accelerates and enhances the induction of pluripotency. This enhancement activity is highly specific, since a variety of other known TADs were unable to accomplish the same feat. Additionally, the MyoD TAD was unable to replace the transactivation regions within the Oct4 protein, indicating that these TADs are functionally distinct. Collectively, these results imply that the Oct4 TADs make contact with reprogramming-specific cofactors that cannot be recruited by other well-studied TADs. However, the presence of these TADs fused to the full-length protein likely brings in additional co-activators that enhance the induction of pluripotency. Further investigation is needed to elucidate the exact mechanisms through which these TADs cooperate with the reprogramming factors to enhance reprogramming.
The reprogramming factors are likely to effect changes in transcription through interactions between their TADs and protein cofactors that recruit the RNA polymerase machinery or modify the local chromatin structure. Several of these cofactors have been identified thus far. For instance, Sox2 and Oct4 have been reported to bind to a complex of XPC, RAD23B and CETN2 to mediate the transactivation of Nanog. Loss-of-function experiments demonstrated that these proteins are important for ES cell pluripotency and somatic cell reprogramming. Additionally, several proteomic studies have identified a multitude of candidate O, S and K-interacting proteins that warrant further study [97–100].
Reprogramming factor activity can also be modulated by posttranslational modifications (PTMs). Oct4 phosphorylation at S229 within the POU homeodomain reduces its transactivation activity, possibly by impairing DNA binding as a result of the disruption of a hydrogen bond with the DNA backbone [84, 101]. Reprogramming activity is completely abolished in a phosphomimetic mutant (S229D) protein. Additionally, Oct4 can be O-GlcNAcylated at T228. Mutation of this residue to alanine substantially reduces reprogramming activity, indicating that this PTM may be important for the induction of pluripotency. Given these results, it will be important to examine the effects of other known PTMs within O, S and K during reprogramming.
Incredibly, somatic cells can revert to the pluripotent state through the forced expression of defined reprogramming factors. The identification and study of these factors has helped to provide insight into the mechanism of induced pluripotency. Conversely, the reprogramming process serves as a robust functional assay that allows us to advance our understanding of Oct4, Sox2, Klf4 and other essential regulators. Much remains to be learned regarding the logic of where these factors bind in the genome and the transcriptional changes that they then induce at these sites. This is not a trivial task given the heterogeneity and inefficiency of the reprogramming process. In a broad sense, knowledge gained through the study of somatic cell reprogramming may be applicable to other gene regulatory events that transform the epigenome and drive embryonic development.
iPS: induced pluripotent stem
MEF: mouse embryonic fibroblast
TGF: transforming growth factor
TFIID: transcription factor IID.
KP is supported by grants from the NIH and CIRM, and the Broad Stem Cell Center at UCLA. RS is supported by the Medical Scientist Training Program training grant from the NIH.