Dynamics of gene silencing during X inactivation using allele-specific RNA-seq

Background During early embryonic development, one of the two X chromosomes in mammalian female cells is inactivated to compensate for a potential imbalance in transcript levels with male cells, which contain a single X chromosome. Here, we use mouse female embryonic stem cells (ESCs) with non-random X chromosome inactivation (XCI) and polymorphic X chromosomes to study the dynamics of gene silencing over the inactive X chromosome by high-resolution allele-specific RNA-seq.

Results Induction of XCI by differentiation of female ESCs shows that genes proximal to the X-inactivation center are silenced earlier than distal genes, while lowly expressed genes show faster XCI dynamics than highly expressed genes. The active X chromosome shows a minor but significant increase in gene activity during differentiation, resulting in complete dosage compensation in differentiated cell types. Genes escaping XCI show little or no silencing during early propagation of XCI. Allele-specific RNA-seq of neural progenitor cells generated from the female ESCs identifies three regions distal to the X-inactivation center that escape XCI. These regions, which stably escape during propagation and maintenance of XCI, coincide with topologically associating domains (TADs) as present in the female ESCs. Also, the previously characterized gene clusters escaping XCI in human fibroblasts correlate with TADs.

Conclusions The gene silencing observed during XCI provides further insight into the establishment of the repressive complex formed by the inactive X chromosome. The association of escape regions with TADs, in mouse and human, suggests that TADs are the primary targets during propagation of XCI over the X chromosome.

Electronic supplementary material The online version of this article (doi:10.1186/s13059-015-0698-x) contains supplementary material, which is available to authorized users.

For comparative purposes, we included male 2i or serum ESCs and epiblast stem cells (EpiSCs) in this analysis. Male ESCs are included as a control for gene expression in the presence of a single active X chromosome. Female EpiSCs have undergone random XCI and stably maintain their Xi [82]. Male 2i ESCs include two 2i-adapted E14 ESC lines [44], as well as one 2i-adapted Rex-GFP ESC line [65]; male serum ESCs include E14 and two Rex-GFP ESC lines, all derived and maintained in serum-containing medium [44,65]. Derivation of male and female pluripotent EpiSCs from post-implantation epiblast cells (E6.5) has been described previously [66,83,84]. See "Materials and methods" for more information about these cell lines.

(b) Distribution of gene expression in the corresponding cell lines as shown in (a). To be included in this analysis, genes were required to show expression levels of RPKM > 0.5 in any of the cell lines (637 and 16073 genes on the X chromosome and on autosomes, respectively). In line with the presence of two active X chromosomes, female Tsix-stop ESCs (2i and serum) show higher expression of X-linked genes relative to male ESCs (2i and serum) (p < 0.05 for 2i female versus male ESCs [47]). Gene expression in male and female EpiSCs is similar, as both contain only a single active X chromosome.

[22]). Since the nucleotide composition of the B6 reference genome is much closer to the 129 genome than to the Cast genome, this results in preferential mapping of 129-derived reads and therefore a considerable bias towards expression from the 129-derived genome. The GSNAP-based pipeline includes the alternative alleles of polymorphic sites between the 129 and Cast genomes during mapping. This results in an unbiased assignment of reads, with equal contribution of reads derived from 129 and Cast, respectively. (b) Standard deviation (SD) of allelic expression for autosomal genes (for which the allelic ratio is expected to be stable) over the five individual time points, as a function of the total counts over the polymorphic sites over the time course. Per time point, the relative contribution of 129 and Cast to the expression of each autosomal gene (in percent) was calculated, after which the SD over the five time points was determined for each gene. This analysis shows that lower coverage of genes results in higher SDs and thus in less accurate measurements of allelic ratios. For the current time course, we included genes that showed an SD of <15 %, corresponding to a total count of 160 reads (at least 80 from each allele) over the polymorphic sites of a gene.

Fig. 4d).
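The per-gene SD calculation and the coverage cutoff described in the caption above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the input layout (one `(reads_129, reads_cast)` pair per time point) and the use of the sample SD are assumptions made for the example.

```python
from statistics import stdev


def allelic_percentages(counts):
    """Return the 129 share (in percent) for each time point.

    counts: list of (reads_129, reads_cast) tuples, one per time point
    (hypothetical input layout).
    """
    percentages = []
    for reads_129, reads_cast in counts:
        total = reads_129 + reads_cast
        if total == 0:
            raise ValueError("time point without reads over polymorphic sites")
        percentages.append(100.0 * reads_129 / total)
    return percentages


def allelic_sd(counts):
    """SD of the 129 percentage across time points (sample SD is an assumption)."""
    return stdev(allelic_percentages(counts))


def has_enough_coverage(counts, min_total=160, min_per_allele=80):
    """Check the coverage cutoff: 160 reads in total, at least 80 per allele."""
    total_129 = sum(reads_129 for reads_129, _ in counts)
    total_cast = sum(reads_cast for _, reads_cast in counts)
    return (
        total_129 >= min_per_allele
        and total_cast >= min_per_allele
        and total_129 + total_cast >= min_total
    )


# Toy data: a well-covered, stable gene and a poorly covered, noisy gene.
stable_gene = [(50, 50)] * 5
noisy_gene = [(9, 1), (1, 9), (5, 5), (5, 5), (5, 5)]

for name, gene in (("stable", stable_gene), ("noisy", noisy_gene)):
    print(name, round(allelic_sd(gene), 2), has_enough_coverage(gene))
```

In this toy data the poorly covered gene shows a much larger SD than the well-covered one, which is the relationship the caption describes.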

Supplementary Figure 9. Running-sum statistics for the genes present within each of the four clusters (panels: Cluster 1, Cluster 2, Cluster 3, Cluster 4) versus all 259 genes included in the analysis based on the distance to the XIC. We ranked the genes according to their distance from the XIC. Then the entire ranked list is used to assess how the genes of each of the four clusters are distributed across the ranked list. To do this, GSEA walks down the ranked list of genes, increasing a running-sum statistic when a gene belongs to the set and decreasing it when the gene does not. P values of this analysis are documented in Fig. 4c.

[Figure panel: axis ticks -2 to 6; gene labels Plp2, 1810030O07Rik, E3c2, 6720401G13Rik, 3830417A13Rik, AU015836, Xist, Tsx, Tmsb15b2, S100g, Ap1s2, Mid1]
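The running-sum walk described in the caption of Supplementary Figure 9 can be illustrated with a short sketch. It assumes the unweighted (Kolmogorov-Smirnov-like) form of the statistic, in which every hit adds 1/(number of hits) and every miss subtracts 1/(number of misses); the step sizes used by the GSEA software in the original analysis may differ. The function names and gene identifiers are hypothetical.

```python
def running_sum(ranked_genes, gene_set):
    """Walk down a ranked gene list and return the running-sum profile.

    ranked_genes: genes ordered by distance to the XIC (closest first).
    gene_set: genes of one cluster.
    Assumption: unweighted steps, so the profile ends at zero.
    """
    members = set(gene_set)
    n_hit = sum(1 for gene in ranked_genes if gene in members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    step_up = 1.0 / n_hit
    step_down = 1.0 / n_miss
    total = 0.0
    profile = []
    for gene in ranked_genes:
        total += step_up if gene in members else -step_down
        profile.append(total)
    return profile


def enrichment_score(profile):
    """Return the running-sum value with the largest absolute deviation from zero."""
    return max(profile, key=abs)


# Toy example: the cluster genes sit at the top of the ranking (close to the XIC).
ranked = ["g1", "g2", "g3", "g4"]
cluster = {"g1", "g2"}
profile = running_sum(ranked, cluster)
print(profile, enrichment_score(profile))
```

A cluster whose genes are concentrated near the XIC produces an early positive peak, whereas a cluster of distal genes produces a negative trough before the profile returns to zero.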

[Figure panel: y-axis, ratio 129/Cast (log2); legend: ES_Tsix-stop T=0d, ES_Tsix-stop replica, ES_Xist-del T=0d, ES_Xist-del replica; * = coverage at polymorphic sites too low to reliably call allelic bias; 75% bias; ratio future Xi/Xa (log2)]

Supplementary Figure 11. X-linked genes with allelic bias in undifferentiated female 2i ESCs are randomly distributed over the X chromosome. (a) Validation of the method for identification of genes showing allelic preference. To validate our method, we determined the number of X-linked genes showing allelic preference in any of the ESCs (pink), after 8-day EB formation (purple), or in any of the NPCs (red) (Table S4) using criteria as described in the "Materials and methods". The pipeline robustly detects biased (XCI) genes after 8-day EB formation and in the NPC lines. However, very few biased genes are present in the undifferentiated ESCs. (b) Ratio (log2) of the genes that show an allelic bias of 75 % or more towards 129 or Cast in at least one of four undifferentiated 2i ESC RNA-seq profiles (two biological replicates of undifferentiated 2i ES_Tsix-stop and ES_Xist-del, respectively). Similar to the ES_Tsix-stop ESCs, the ES_Xist-del ESCs have a full bias towards one of the two X chromosomes being silenced during XCI (being the Cast-derived X chromosome) due to a deletion in the Xist gene on the 129 allele [64]. (c) Localization of the genes as identified in (b) over the linear X chromosome.

Figure 12. Genes within region 1 are expressed at significantly higher levels in the NPCs in which they escape XCI compared with the NPCs in which they are silenced on the Xi. (a) Xi/Xa ratio of genes that escape XCI in region 1 in at least one NPC line. A star indicates inactivation of the gene on the Xi. (b) Gene expression of *NPC_129-Xi and NPC_Cast-Xi relative to NPC_129-Xi over all X-linked genes (first two boxplots) and over the genes in escape region 1 (last two boxplots; all genes within region 1 of NPC_129-Xi are robustly silenced). The difference in expression level between *NPC_129-Xi and NPC_Cast-Xi in region 1 is caused by the fact that fewer genes escape XCI in region 1 in *NPC_129-Xi.

Figure 13. Validation of genes escaping XCI in the three escape regions using cDNA Sanger sequencing. In this analysis we included the known escape gene Kdm6a, which escapes XCI in all three NPC lines profiled in this study (Table 1), and at least three genes per escape region identified in this study. For each gene, the polymorphic site between 129 and Cast is indicated with an arrowhead. Genes that show escape from XCI are indicated with a circle around the polymorphic nucleotide. For these nucleotides, there are clear cDNA signals from both 129 and Cast alleles. Except for 1810030O07Rik, the observed patterns of escape using cDNA Sanger sequencing are in line with the results obtained by RNA-seq (Table 1).

As a large number of sequence tags in Hi-C originate from a non-specific background and proximity ligation, this analysis provides an estimate of the karyotype of the ES_Tsix-stop ESCs. As expected, the ES_Tsix-stop ESC line has twice as many sequence tags originating from chromosome X compared with the male J1 ESCs. Furthermore, the higher number of tags on chromosome 12 suggests a trisomy 12 present in the ES_Tsix-stop ESCs, as also observed in the RNA-seq analysis (Fig. S3a). (b) Allele-specific distribution of sequence tags originating from the Hi-C of ES_Tsix-stop ESCs over all chromosomes, plotted as Cast/129 (%). The double number of sequence tags originating from Cast compared with 129 on chromosome 12 shows that the trisomy 12 is caused by an additional chromosome 12 of Cast origin, in line with the RNA-seq analysis (Fig. S3b). The equal distribution of Cast- versus 129-derived tags, together with the tag distribution observed in (a) and the RNA-seq tag distribution (Fig. S3), shows that the ES_Tsix-stop ESCs contain no major genomic abnormalities besides trisomy 12.

ESCs.
This figure is the same as Fig. 6, but includes genes for which no information on allelic ratios was obtained (mainly due to low expression or the absence of polymorphic sites), as well as the interaction matrix in male J1 ESCs obtained from Dixon et al. [51]. The reason for the absence of data in the male J1 ESCs in the 8-8.5 Mb region in (a) is that this is a region of low mappability, in which 32-nucleotide paired-end reads are not long enough for unique mapping. Since we sequenced 75-nucleotide paired-end reads, we were able to retrieve information for this region for the ES_Tsix-stop ESCs that we profiled.

Supplementary Figure 19. In human, clusters of genes escaping XCI on the short arm of chromosome X [29] colocalize with topologically associating domains (TADs) [51].

(a) Percentage of escape (magenta) and silenced (black) genes within the TADs present on the short arm of chromosome X in human fibroblasts. Only the 17 TADs that contain escape information for more than two of the associated genes are included. The TADs are ranked on the x-axis according to their position on chromosome X, with 1 being very distal from the centromere and 17 being proximal. For intersection of the genes with TADs, 100 kb was subtracted from both sides of all TADs to avoid including genes that have their regulatory sequences in a neighbouring TAD. (b) Overview of the chromatin structure (Hi-C) between positions 4 and 20 Mb of chromosome X (including TADs 2 to 9 as present in (a)), showing a clear alternating pattern of TADs containing escape genes and TADs containing silenced genes. The legend for genes that escape XCI or genes that are silenced is indicated below.
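The gene-to-TAD intersection described in (a) can be sketched as follows. This is an illustrative reconstruction only: the function name, the tuple layout, the toy coordinates and the rule that a gene must lie entirely within the trimmed TAD are assumptions, not details taken from the original analysis.

```python
def escape_percentage_per_tad(tads, genes, margin=100_000, min_genes=3):
    """Percentage of escape genes per TAD after trimming the TAD borders.

    tads: list of (start, end) coordinates, ordered along the chromosome.
    genes: list of (name, start, end, status) with status "escape",
           "silenced" or None (no allelic information).
    margin: bases subtracted from both sides of every TAD (100 kb in the text).
    min_genes: 3 encodes "more than two genes with escape information".
    Assumption: a gene counts only if it lies entirely inside the trimmed TAD.
    """
    results = {}
    for index, (tad_start, tad_end) in enumerate(tads, start=1):
        lower, upper = tad_start + margin, tad_end - margin
        statuses = [
            status
            for _, gene_start, gene_end, status in genes
            if status is not None and gene_start >= lower and gene_end <= upper
        ]
        if len(statuses) < min_genes:
            continue  # too little escape information for this TAD
        results[index] = 100.0 * statuses.count("escape") / len(statuses)
    return results


# Toy coordinates, purely hypothetical.
toy_tads = [(0, 1_000_000), (1_000_000, 2_000_000)]
toy_genes = [
    ("A", 150_000, 200_000, "escape"),
    ("B", 300_000, 350_000, "escape"),
    ("C", 500_000, 600_000, "silenced"),
    ("D", 950_000, 990_000, "escape"),      # inside the trimmed border: ignored
    ("E", 1_200_000, 1_300_000, "silenced"),
    ("F", 1_500_000, 1_600_000, "silenced"),
]
print(escape_percentage_per_tad(toy_tads, toy_genes))
```

Trimming the borders discards genes such as D that sit close to a TAD boundary, at the cost of leaving some TADs (here the second one) with too few informative genes to be reported.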

Annotation for the data from Carrel and Willard [29]
Black = X inactivated; Xi expression in 5/9 hybrid clones or fewer
Magenta = escape; Xi expression in more than 5/9 hybrid clones
Gray = no information on allelic ratios
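The colour annotation above amounts to a simple threshold rule, sketched here for illustration. The function name and the `None` convention for missing data are assumptions; applying the 5/9 threshold as a fraction when fewer or more than nine clones were tested is also an assumption, not something stated in the legend.

```python
def annotate_gene(clones_expressing, clones_tested=9):
    """Classify a gene from Xi expression in hybrid clones.

    clones_expressing: number of hybrid clones with expression from the Xi,
                       or None when no allelic information is available.
    clones_tested: number of hybrid clones assayed (nine in the legend).
    Returns "escape" (magenta), "inactivated" (black) or "no information" (gray).
    """
    if clones_expressing is None or clones_tested <= 0:
        return "no information"
    # Integer comparison of clones_expressing / clones_tested > 5 / 9
    if clones_expressing * 9 > 5 * clones_tested:
        return "escape"
    return "inactivated"


for expressing in (None, 0, 5, 6, 9):
    print(expressing, annotate_gene(expressing))
```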