Single-cell gene expression analysis reveals heterogeneity within phenotypically defined human MEP
To isolate MEP, we adapted a previously validated immunophenotyping strategy in which MEP are distinguished from the other lineage-negative (Lin-) CD34+ CD38+ hematopoietic progenitors, CMP and GMP, by the absence of CD123 and CD45RA (Fig. 2a, Additional file 1: Figure S1A) [24]. To test the hypothesis that cellular heterogeneity exists within the MEP compartment, including cells primed for megakaryocyte versus erythroid differentiation, we analyzed 489 Lin- CD34+ CD38+ CD123- CD45RA- human MEP cells from three healthy donors. Individual cells were isolated by index FACS sorting using a panel of nine cell surface markers (Additional file 1: Figure S1A). Single-cell gene expression profiling was performed by multiplex RT-PCR using a customized panel of 87 genes, enabling correlation of individual cell surface immunophenotype and gene expression profiles. This gene set included genes predicted to be differentially regulated during erythroid and megakaryocyte differentiation according to published RNA-Seq datasets from bulk-sorted human MEP and mature erythroid and megakaryocyte populations [25]; cell surface antigens known to be expressed during erythroid and megakaryocytic differentiation [8, 25]; and three housekeeping genes. Principal component analysis (PCA) revealed that MEP were clearly segregated into two distinct subpopulations by principal component (PC) 1 (Fig. 2b), which accounted for 10.72% of the variance in gene expression between cells (Fig. 2c and Additional file 1: Figure S1B). No important plate or sample effect was observed (Additional file 1: Figure S1C–F).
CD71 and CD41 are early identifiers of erythroid and megakaryocyte progenitors, respectively [17, 18, 26]. CD42 (glycoprotein 1b) is expressed later during megakaryocyte differentiation and has been associated with unipotent megakaryopoietic activity in mouse models [26]. These antigens were therefore included in the immunophenotyping panel used to isolate the original cells for gene expression profiling, and the intensity of surface expression (mean fluorescence intensity [MFI]) was superimposed on the PCA. This indicated that the two cellular subsets identified by PCA (Populations 1 and 2) were distinguishable by their surface expression of CD34, CD38, and CD71 (Fig. 2d). Population 1 (left) contained cells with higher CD34 and lower CD38 expression, suggesting a more immature phenotype (Fig. 2d), while Population 2 (right) contained cells with higher CD71 expression (Fig. 2d). Infrequent cells with distinctly higher expression of CD41 and CD42 were also notable. These cells did not clearly cluster with either population by PC1 (Fig. 2e), although the CD41-high cells separated more distinctly in PCs 3 and 4 (Fig. 2e). We reasoned that these cells might represent megakaryocyte-primed MEP that do not form a separate cluster on the PCA by PC1 due to their relatively low frequency.
We next directly analyzed the cell surface expression of CD71, CD41, and CD42 within Lin- CD34+ CD38+ CD123- CD45RA- MEP of peripheral blood CD34+ cells from 14 healthy, G-CSF-treated donors (Fig. 2f, g). In keeping with the PCA, two subpopulations could be distinguished by their differential expression of CD71 and a third by the expression of CD41: (1) CD71-41- (43.6 ± 4.8% of total MEP); (2) CD71+41- (37.4 ± 3.6%); and (3) CD71+41+, which was significantly less frequent than the other two populations (5.1 ± 0.6%, Fig. 2f, P < 0.0001). CD42 expression was restricted to ~1/5 of CD71+41+ MEP cells, or ~1% of total MEP (Fig. 2g).
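The quadrant gating and the nested-gate arithmetic described above can be illustrated with a minimal sketch. The function names, the positivity cut-offs, and the toy intensities are all hypothetical and are not taken from the study; only the 5.1% and ~1/5 figures come from the text.

```python
from collections import Counter


def gate(cd71_pos: bool, cd41_pos: bool) -> str:
    """Return a CD71/CD41 quadrant label such as 'CD71+41-'."""
    return f"CD71{'+' if cd71_pos else '-'}41{'+' if cd41_pos else '-'}"


def gate_frequencies(cells, cd71_cut, cd41_cut):
    """Percentage of cells per quadrant, given hypothetical positivity cut-offs."""
    counts = Counter(
        gate(c["CD71"] >= cd71_cut, c["CD41"] >= cd41_cut) for c in cells
    )
    return {label: 100.0 * n / len(cells) for label, n in counts.items()}


# Toy fluorescence intensities (arbitrary units); NOT data from the study.
toy_cells = [
    {"CD71": 50, "CD41": 10},
    {"CD71": 900, "CD41": 20},
    {"CD71": 1200, "CD41": 15},
    {"CD71": 1100, "CD41": 800},
]
freqs = gate_frequencies(toy_cells, cd71_cut=500, cd41_cut=300)

# Nested-gate arithmetic from the text: ~1/5 of the 5.1% CD71+41+ fraction
# corresponds to roughly 1% of total MEP.
cd42_pos_of_total = 5.1 * (1 / 5)
```

In practice, cut-offs would be set from appropriate staining controls rather than fixed numbers as assumed here.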
We then explored the possibility that the CD71+41- and CD71+41+ MEP subfractions might represent erythroid-primed and megakaryocyte-primed populations, respectively. Due to the rarity of the CD71+41+ MEP cells, we selectively analyzed an additional 192 CD71+41+ MEP cells from the same three donors by index-FACS sorting for gene expression profiling. When all 681 analyzable cells (489 unselected MEP plus 192 CD71+41+ MEP) were studied, PCA demonstrated that CD71+41+ MEP constituted a distinct third population (Fig. 3a), allowing us to identify three distinct populations on the basis of PCs 1 and 2 for each individual cell (Fig. 3b). Cells expressing the highest levels of surface CD42 by FACS appeared at the apex of Population 3 in the PCA (Additional file 1: Figure S2A).
Depicting the data by non-linear dimensionality reduction (t-distributed stochastic neighbor embedding, t-SNE) [27–30] also demonstrated three subpopulations, supporting the PCA (Additional file 1: Figure S2B). To determine whether gene choice was a strong determinant of the three-subpopulation substructure apparent on PCA and t-SNE, random subsets of genes were selected to perform PCA and the proportion of cells that were congruently assigned to each original population was ascertained (Additional file 1: Figure S2C). This demonstrated that, on average, 75% of the cells were assigned equivalently with as few as 25 genes. Furthermore, to confirm that the PCA was not substantially biased by drop-out events, Zero Inflated Factor Analysis (ZIFA) was performed (Additional file 1: Figure S2D) [31]. In accordance with PCA and t-SNE, ZIFA also segregated the MEP cells into three populations (Additional file 1: Figure S2D).
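The idea behind the gene-subset robustness check can be sketched as follows. The text does not state how cells were reassigned after PCA on each random subset, so this sketch swaps in a simpler stand-in: cells are reassigned to the nearest population centroid computed on the sampled genes only, and no PCA is performed. All names and the synthetic data are hypothetical.

```python
import random


def centroid(rows, genes):
    """Mean expression over the given gene indices."""
    return [sum(r[g] for r in rows) / len(rows) for g in genes]


def nearest(cell, centroids, genes):
    """Label of the centroid closest to the cell (squared Euclidean distance)."""
    def dist2(label):
        return sum((cell[g] - m) ** 2 for g, m in zip(genes, centroids[label]))
    return min(centroids, key=dist2)


def congruence(cells, labels, subset_size, n_repeats, seed=0):
    """Mean fraction of cells that keep their original label on random gene subsets."""
    rng = random.Random(seed)
    n_genes = len(cells[0])
    scores = []
    for _ in range(n_repeats):
        genes = rng.sample(range(n_genes), subset_size)
        centroids = {
            lab: centroid([c for c, l in zip(cells, labels) if l == lab], genes)
            for lab in set(labels)
        }
        hits = sum(nearest(c, centroids, genes) == l for c, l in zip(cells, labels))
        scores.append(hits / len(cells))
    return sum(scores) / len(scores)


# Synthetic data: 3 populations x 30 cells, 30 genes; population k is high on
# genes whose index is congruent to k modulo 3. NOT data from the study.
data_rng = random.Random(1)
cells, labels = [], []
for pop in range(3):
    for _ in range(30):
        cells.append([data_rng.gauss(5.0 if g % 3 == pop else 0.0, 1.0) for g in range(30)])
        labels.append(pop)

score = congruence(cells, labels, subset_size=10, n_repeats=20)
```

With real data, the original labels would come from the full-panel analysis, and the subset size would be varied to trace how congruence falls as fewer genes are used.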
Identification of the 18 most highly weighted genes in PC1 and PC2 (Fig. 3c), together with the heatmap of gene expression (Fig. 3d, Additional file 1: Figure S2E), revealed that the segregation of the three populations was driven by differential expression of megakaryocyte-associated and erythroid-associated genes. Hierarchical clustering of the gene expression profiles also supported the division of Lin- CD34+ CD38+ CD123- CD45RA- MEP into three subpopulations (Additional file 1: Figure S2F).
Three MEP subpopulations can be prospectively identified immunophenotypically by their differential expression of CD44, CD71, and CD41
To determine whether FACS could be used to identify the three MEP subpopulations that emerged from PCA of gene expression, we next determined the mean fluorescence intensity of antigens in our FACS panel for the original cells index-sorted for gene expression profiling (Fig. 4a). The three subpopulations of MEP identified by PCA could be distinguished with high sensitivity and specificity (specificity range of 0.81–0.91; sensitivity 0.67–0.90; Additional file 1: Figure S3A) using a combination of CD71 and CD41: (1) CD71-41-; (2) CD71+41-; and (3) CD71+41+. Further, although all of the single MEP cells had been sorted from the Lin- CD34+ CD38+ CD123- MEP gate (Fig. 2a, Additional file 1: Figure S1A), CD71-41- MEP (Population 1) had relatively higher CD34, lower CD38, and higher CD123 and CD45RA surface antigen expression (Fig. 4a), suggesting they might be positioned earlier in the HSC/progenitor hierarchy. Expression of the early erythroid/megakaryocyte marker CD36 was lowest in Populations 1 and 3 but did not discriminate clearly between the MEP populations, and CD42 expression was highest in Population 3 (Fig. 4a). The cell surface phenotypes showed highly significant correlation with mRNA levels of the same surface proteins (Additional file 1: Figure S3B). Taken together, these data indicate that Lin- CD34+ CD38+ CD123- CD45RA- MEP constitute a heterogeneous population of cells with at least three distinct subpopulations that can be distinguished by unique surface marker and transcript profiles.
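For readers unfamiliar with how sensitivity and specificity apply to a three-way classification, the following sketch computes both in a one-vs-rest fashion for a single population. It assumes the PCA-derived population is the reference and the CD71/CD41 gate is the prediction; the function name and the toy labels are hypothetical, and the exact procedure used for Figure S3A is not specified in the text.

```python
def sensitivity_specificity(reference, predicted, target):
    """One-vs-rest sensitivity and specificity for one population label.

    Assumes at least one reference cell inside and outside the target
    population; otherwise a division by zero occurs.
    """
    pairs = list(zip(reference, predicted))
    tp = sum(r == target and p == target for r, p in pairs)
    fn = sum(r == target and p != target for r, p in pairs)
    tn = sum(r != target and p != target for r, p in pairs)
    fp = sum(r != target and p == target for r, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels (population numbers 1-3); NOT data from the study.
pca_population = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3]
facs_gate = [1, 1, 1, 2, 2, 2, 2, 1, 3, 2]

sens_p1, spec_p1 = sensitivity_specificity(pca_population, facs_gate, target=1)
```

Repeating the call with `target=2` and `target=3` gives one sensitivity and one specificity per population, which is how a range across populations would arise.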
Because Population 1 remained negatively defined among CD34+38+ hematopoietic progenitors (Lin- CD34+ CD38+ CD123- CD45RA- CD71- CD41-), we sought to determine whether our immunophenotyping strategy for this population could be further refined by including additional surface antigens from our gene expression profiling panel that were not part of the original FACS panel. CD44, an adhesion molecule expressed by MEP and early erythroid and megakaryocytic progenitors that is downregulated during their differentiation and maturation [32, 33], emerged as the most prominent positive identifier of Population 1 by gene expression, with a mean expression level fivefold higher than the other two populations (P < 0.0001, Fig. 4b). Other erythroid/megakaryocytic surface antigen genes were either barely expressed in Population 1 (CD61) or were expressed at similar levels in Populations 1 and 3 (CD9) or in all three populations (CD105, CD47) (Fig. 4b). MPL expression was detectable in all three MEP subpopulations, in keeping with previous reports [14], indicating that MPL is unlikely to be a good candidate marker to differentiate between the three populations by immunophenotyping (Fig. 4b).
To confirm the utility of CD44 as a positive identifier of this population by immunophenotyping, CD44 was incorporated into our 10-fluorochrome panel. This allowed us to separate the MEP population immunophenotypically into CD44hiCD71-CD41- MEP (Fig. 4c), which had similar surface CD44 expression to CMP and GMP (Additional file 1: Figure S3C), and CD44modCD71+ MEP, which contained all of the CD71+41- and CD41+ MEP cells (Fig. 4c). These data confirmed that the differential expression patterns of CD44, CD71, and CD41 enable positive identification and prospective isolation of all three MEP subpopulations. To confirm that the addition of CD44 to the immunophenotyping panel defined the transcriptome-identified subpopulations, 100 cells were sorted from each of the three MEP populations as defined by CD44, CD71, and CD41 co-expression as shown in Fig. 4c, in triplicate from each of four donors. Multiplex RT-PCR analysis performed using the same panel of gene expression assays used for the single-cell transcriptional profiling confirmed that the cells purified according to this novel surface phenotype strategy also showed transcriptional profiles as seen in the original single-cell analyses (Additional file 1: Figure S3D, E).
Differential expression of key megakaryocyte and erythroid genes between the MEP subpopulations indicates a “Pre-MEP,” “E-MEP,” and “MK-MEP” transcriptional profile
Significant differences were observed between these three populations in the expression of key erythroid and megakaryocyte genes (Fig. 5a–c). A higher proportion of cells in Population 1 expressed CSF3R (the granulocyte-colony stimulating factor [G-CSF] receptor), FLT3/CD135, and SOCS3 than in Populations 2 and 3, and expression of the key erythroid-megakaryocytic transcription factors GATA1 and GATA2 was significantly lower in this population (Fig. 5a), consistent with a less differentiated state. Expression of myeloperoxidase (MPO), a gene abundantly expressed by granulocytes, CMP and GMP [34], was undetectable in all but five of 681 cells in all three populations (Fig. 5a), confirming that contamination of the sorted populations with CMP or other myeloid cells/progenitors was negligible. Expression of genes encoding erythroid transcription factors and membrane proteins, e.g. KLF1, CD71, TMOD1, ANK1, and LEF1, was significantly higher in Population 2 (Figs. 3d and 5b), while Population 3 showed the highest expression of genes encoding megakaryocyte-associated proteins, including VWF, FLI1, NFIB, TGFβ, and LOX (Figs. 3d and 5c). Correlations of megakaryocytic (CD9, LOX, MPL, VWF, NFIB, CD61, TGFβ, FLI1) and erythroid (CD36, KLF1, LEF1, CNRIP1, TMOD1, MYB) gene sets and megakaryocyte-erythroid transcription factors (GATA1, GATA2, FOG1) in all cells suggested co-regulation of same-lineage and repression of alternate-lineage pathways (Fig. 5d). Moreover, we also found distinct erythroid and megakaryocytic gene co-expression patterns (within the same single cells) in Populations 2 and 3, respectively (Fig. 5e, f). On the basis of these data, we defined Population 1 as “pre-MEP/CMP-like” (“Pre-MEP”), Population 2 as erythroid-primed MEP (“E-MEP”), and Population 3 as megakaryocyte-primed MEP (“MK-MEP”).
Single-cell differentiation assays demonstrate that the lineage bias suggested by transcriptional and cell surface profiles corresponds to functional differences in lineage differentiation
To validate that the lineage bias suggested by transcriptional and cell surface profiles corresponded to functional differences in the ability of the cells to differentiate, we analyzed Pre-MEP, E-MEP, and MK-MEP in single-cell differentiation assays. Single cells from the three MEP populations were seeded by FACS according to the strategy shown in Fig. 4c into conventional colony-forming assays in semi-solid medium. Erythroid and myeloid colony-forming capacity was tested in methylcellulose assays, which support the growth of erythroid, myeloid, and to a lesser extent megakaryocytic colonies. Colonies were classified as myeloid or erythroid by visual inspection (Fig. 6a); indeterminate colonies were plucked for analysis of lineage-associated surface antigens by flow cytometry. There was a marked difference in colony output from the three MEP populations that matched their transcriptional profile (Fig. 6a). Over 90% of colonies arising from single CD71+41- E-MEP cells were erythroid (BFU-E/CFU-E), compared to ~60% of colonies arising from single CD71+41+ cells and ~30% of CD44hi71-41- MEP colonies (P < 0.001, Fig. 6a). Wells seeded with CD71+41- E-MEP also had the highest colony-forming efficiency overall, with colonies detected in almost 60% of wells, as compared to ~40% of wells seeded with CD44hiCD71-41- Pre-MEP and ~20% of wells seeded with CD71+41+ MK-MEP (Additional file 1: Figure S4A). Myeloid colonies were very rarely observed in wells seeded with E-MEP and MK-MEP cells, while mixed granulocyte-erythroid-macrophage-megakaryocyte colonies (CFU-GEMM) and pure myeloid (granulocyte-macrophage, CFU-GM) colonies each constituted 25–30% of total colonies derived from Pre-MEP (Fig. 6a). This demonstrated that E-MEP and MK-MEP were almost exclusively committed to erythroid-megakaryocytic differentiation. In contrast, Pre-MEP retained the potential to generate myeloid colonies.
Further, Pre-MEP are functionally distinct from CMP, being markedly enriched for erythro/megakaryopoietic efficiency by comparison (Additional file 1: Figure S4B), and almost all of the myeloid clonogenic output observed in unfractionated MEP is contained within this fraction. Surface CD44 expression of cells giving rise to myeloid colonies was significantly higher than that of cells giving rise to erythroid colonies, confirming the utility of CD44 as a positive identifier of cells with a Pre-MEP phenotype (Additional file 1: Figure S4C). In contrast, there was no difference in CD123 expression between cells that gave rise to myeloid colonies and those that gave rise to pure erythroid or erythroid/MK colonies (Additional file 1: Figure S4C).
To test megakaryocyte colony-forming potential, cells from the three MEP populations were sorted into a collagen-based medium that supports megakaryocyte and myeloid colonies (Megacult™). We observed that MK-MEP cells gave rise to fourfold more megakaryocyte colonies than the other subpopulations, in keeping with a megakaryocytic differentiation bias (P <0.001, Fig. 6b).
In semi-solid assays, the growth of either myeloid and erythroid colonies (methylcellulose) or myeloid and megakaryocyte colonies (Megacult™) is efficient, but mixed megakaryocyte-erythroid colonies are infrequent and single-cell megakaryocyte colony-forming assays are not possible due to low clonogenic efficiency. Therefore, to identify bipotent cells with the potential to differentiate into both erythroid and megakaryocytic cells, we utilized a specifically designed single-cell liquid culture system optimized to support differentiation of erythroid cells and megakaryocytes. Cells from each MEP fraction were individually seeded into each well of 96-well plates containing medium supplemented with the cytokines required for both erythroid and megakaryocytic differentiation (EPO, TPO, IL3, IL6, SCF) [35, 36]. Wells were inspected by light microscopy 6 days after seeding for the presence of characteristic erythroblasts and proplatelet-forming megakaryocytes (Fig. 6c). The cellular phenotype of the progeny derived from the single cells was identified by morphology and by the expression of lineage markers as assessed by fluorescence microscopy and flow cytometry, allowing us to identify megakaryocyte-only, erythroid-only, and mixed progeny (Fig. 6d, Additional file 1: Figure S4D).
We used this approach to analyze the three MEP subpopulations. In this liquid culture system, single E-MEP cells were significantly more proliferative than cells from the other two MEP fractions, generating higher numbers of cells 6 days after seeding (Fig. 6e), and had the highest frequency of cells giving rise to populations of exclusively erythroid progeny (Fig. 6f). The highest frequency of pure populations of megakaryocytic cells occurred in wells seeded with MK-MEP (Fig. 6f). Only a minority of single E-MEP and MK-MEP cells gave rise to “mixed” colonies containing both erythroid and megakaryocytic cells (Fig. 6f). By contrast, mixed colonies occurred in almost 50% of wells seeded with cells from the Pre-MEP fraction (P < 0.02, Fig. 6f), which was also able to give rise to both unipotent erythroid and unipotent megakaryocytic cells. Together, these functional data are consistent with our conclusions from the transcriptional profiles and support a definition of the cellular subfractions as: Pre-MEP (CD44hi71-41-); E-MEP (CD71+41-); and MK-MEP (CD71+41+).
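The per-well scoring described above (erythroid-only, megakaryocyte-only, or mixed progeny) amounts to a simple classification followed by a frequency count. The sketch below illustrates that logic only; the function names, the use of raw cell counts per lineage, and the toy wells are hypothetical, and the study itself scored wells by morphology and lineage-marker expression.

```python
from collections import Counter


def classify_well(n_erythroid: int, n_megakaryocytic: int) -> str:
    """Classify the progeny of one single-cell-seeded well by lineage content."""
    if n_erythroid and n_megakaryocytic:
        return "mixed"
    if n_erythroid:
        return "erythroid-only"
    if n_megakaryocytic:
        return "megakaryocyte-only"
    return "no growth"


def output_frequencies(wells):
    """Fraction of growing wells in each output class (empty dict if none grew)."""
    classes = [classify_well(e, m) for e, m in wells]
    growing = [c for c in classes if c != "no growth"]
    if not growing:
        return {}
    return {c: n / len(growing) for c, n in Counter(growing).items()}


# Toy wells as (erythroid cells, megakaryocytic cells); NOT data from the study.
toy_wells = [(120, 0), (0, 8), (40, 5), (0, 0), (60, 3)]
well_freqs = output_frequencies(toy_wells)
```

Computing these frequencies separately for wells seeded from each sorted fraction yields the kind of per-population comparison summarized in the text.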
Monocle trajectory analysis and sequential cultures identify a novel megakaryocyte-committed progenitor population
Finally, we performed a Monocle trajectory analysis [37] to obtain a pseudo-temporal ordering of single cells along their differentiation trajectory on the basis of their transcriptional profiles (Fig. 7a, Additional file 1: Figure S5A, B). Two separate trajectories were investigated, from Pre-MEP to E-MEP (Fig. 7a, left plot) and Pre-MEP to MK-MEP (Fig. 7a, right plot). Additional file 1: Figure S5A shows heatmaps illustrating how expression of selected genes changed with the pseudotime trajectories. Additional file 1: Figure S5B shows selected genes along the Pre-MEP to E-MEP and MK-MEP trajectories. This analysis illustrates downregulation of CD44 and CD34 together with upregulation of GATA1 and CD71 along both trajectories, in keeping with the more primitive phenotype of the Pre-MEP population, which retains myeloid potential. In contrast, a number of genes showed distinct erythroid or megakaryocyte-specific expression with progressive separation along each trajectory. For example, upregulation of CNRIP1, KLF1, and LEF1 occurred along the E-MEP trajectory and of CD41, CD61, CD42, NFIB, and VWF along the MK-MEP trajectory (Additional file 1: Figure S5A, B). Notably, CD42 and VWF expression increased markedly along the MK-MEP trajectory and correlated with loss of expression of erythroid genes such as KLF1 and CNRIP1 (Additional file 1: Figure S5B). As the CD42-positive cells also clustered at the apex of Population 3 in the PCA (Additional file 1: Figure S2A), we reasoned that CD42 surface expression might represent a marker of full commitment to the megakaryocyte lineage with associated loss of erythroid potential. To explore whether the expression of CD42, restricted to ~20% of MK-MEP cells and <1% of total MEP overall (Fig. 2g), was associated with definitive commitment to the megakaryocyte lineage, in vitro megakaryocyte liquid cultures were established from healthy donor Lin- CD34+ cells and defined megakaryocyte progenitor populations were isolated from day 4 cultures for secondary subcultures according to their surface CD71, CD41, and CD42 expression (Fig. 7b, Populations A, B, and C). In secondary cultures in TPO-based liquid culture, cell fractions A (CD71+ CD41- CD42-), B (CD71+ CD41+ CD42-), and C (CD71+ CD41+ CD42+) showed progressive megakaryocyte maturity by morphology and by CD41 and CD42 co-expression after 3 and 7 further days of TPO stimulation (Fig. 7c). If switched to EPO-based medium and methylcellulose (without TPO) for secondary culture, Populations A and B gave rise to mature CD71hi GlyA+ erythroblasts and erythroid colonies, while Population C had no erythroid potential (Fig. 7c, right panel). This confirmed that both CD71+41-42- and CD71+41+42- populations (Populations A and B, Fig. 7b) contained cells with both megakaryocytic and erythroid potential, while CD71mid41+42+ co-expression marked the first identifiable lineage-committed MKP with complete loss of erythroid potential (Population C, Fig. 7b). In keeping with this, CD71+41+CD42+ cells, compared to CD71+41+CD42- cells, demonstrated significantly higher expression of megakaryocyte genes (e.g. CD41, CD61, VWF, CLU, NFIB) and significantly lower expression of erythroid-associated genes (e.g. ANK1, CD71, MYB). MYB is a transcription factor that enhances erythroid differentiation at the expense of megakaryopoiesis [38], so its lower expression is in keeping with commitment to the megakaryocyte lineage.