Molecular networks involved in mouse cerebral corticogenesis and spatio-temporal regulation of Sox4 and Sox11 novel antisense transcripts revealed by transcriptome profiling

SAGE analysis reveals spatiotemporally regulated transcripts and overlapping sense and antisense transcripts that are important for mouse cerebral cortex development


Background
Complex behavioral tasks, from perception of sensory input and the control of motor output to cognitive functions such as learning and memory, are dependent on the precise development of innumerable interconnections of neuronal networks in the cerebral cortex. The development of the cerebral cortex (also known as cerebral corticogenesis) involves the specific influence of intrinsic and extrinsic mechanisms, which are triggered spatio-temporally [1-4]. Between embryonic day 11 (E11) and 18 (E18), the mouse cerebral cortex develops from a relatively homogeneous band of mitotic multipotent progenitor cells into a complex laminated structure containing various classes of neuronal cells [2,4-6]. Cerebral corticogenesis involves: proliferation of multipotent progenitors (E11 to E16.5); migration of postmitotic cells (E11 to E17); cell morphogenesis (E13 to E18); gliogenesis and synaptogenesis (E16 until early postnatal period); and reorganization, elimination and stabilization of neuronal networks (up to adulthood).
The mouse cerebral cortex develops along the latero-medial and rostro-caudal axes [7,8]. At E11, the primordial plexiform layer begins to form in the most lateral part of the neural wall. It continues to grow along the latero-medial axis and reaches the medial part of the telencephalon by E13. The primordial plexiform layer also expands along the rostro-caudal axis, although growth along this axis is always less than along the latero-medial axis. The first wave of migrating neuronal cells forms the cortical plate 2 days after the primordial plexiform layer develops. The cortical plate then develops into the six distinct layers of the adult cerebral cortex. In general, the rostral-most regions of the adult cerebral cortex contain areas involved in executive functions and motor coordination, whereas the caudal-most regions contain areas involved in sensory perception, such as visual function. Although distinct functional arealization of the cerebral cortex does not fully apply to rodents, mounting evidence indicates that arealization in mice is regulated by transcription factors such as empty spiracles homolog 2 (Emx2; Drosophila) [9,10], paired box gene 6 (Pax6) [9], COUP transcription factor 1 (Coup-tf1) [11], Sp8 transcription factor (Sp8) [12,13], distal-less homeobox 1/2 (Dlx1/2) and gastrulation brain homeobox 2 (Gbx2) [14]. The extensive cyto-architectural and anatomical changes that occur in a spatio-temporal manner during the peak (E15) and at the end (E17) of embryonic cerebral neurogenesis, and during early postnatal (P1) corticogenesis through to adulthood, involve complex underlying molecular regulatory networks.
Complex molecular regulatory elements are important determinants in both spatial and temporal cerebral corticogenesis. These elements regulate gene expression at the chromatin, DNA or RNA, and protein levels through chromatin packaging or remodeling, histone acetylation and deacetylation, chromatin insulation, DNA methylation, post-transcriptional regulation and post-translational protein modification or degradation signaling pathways [15][16][17]. Other processes involved in such regulation include pre-mRNA processing and nuclear mRNA retention by nuclear-specific paraspeckle complexes [18,19], microRNAs (miRNAs) that interfere with mRNA translation [20][21][22], and a new class of under-characterized non-coding RNA transcripts known as natural antisense transcripts (NATs) [23,24]. These regulatory networks play a pivotal role in establishing when, where and how a multipotent progenitor cell should proliferate, migrate and settle at a designated position in the cortex. The information regarding regulatory networks during cerebral corticogenesis, however, remains incomplete and does not provide a comprehensive view of the underlying regulatory elements throughout this complex event.
In this study, we employed both short and long 3' serial analysis of gene expression (SAGE) technologies [25,26] to identify differentially expressed regulatory elements by comparing transcriptomes of cerebral cortices generated from four selected developmental stages: E15.5, E17.5, P1.5 and 4 to 6 months old. We also compared rostral to caudal regions of E15.5 and left to right regions of adult cerebral cortices. We report temporally co-regulated gene clusters, novel molecular networks and associated pathways, novel candidates in regionalized development and genomic clustering of SRY-box containing gene 4 (Sox4) and SRY-box containing gene 11 (Sox11) sense and antisense transcripts. The genomic clustering analysis led us to the discovery of spatio-temporal regulation of novel Sox4 and Sox11 antisense transcripts as well as differential regulation of these transcripts in proliferating and differentiating neural stem/progenitor cells (NSPCs) and P19 (embryonal carcinoma) cells.
We constructed 10 SAGE libraries from the cerebral cortex of E15.5, E17.5 and 4- to 6-month-old adult C57BL/6 mice (N = 10; Table 1). Data from two additional SAGE libraries generated from E15.5 and P1.5 cerebral cortices by Gunnersen et al. [27] were also incorporated into our analysis. These SAGE libraries represent key stages of cerebral corticogenesis and are accessible from the Gene Expression Omnibus (GEO) website [GEO: GSE15031] [28]. The libraries contain a total of 531,266 SAGE tags (approximately 44,000 tags per SAGE library), 223,471 of which are unique (nonredundant) after screening for artifacts and mapping of short SAGE tags to long SAGE tags (Table 1). The number of unique tags per library ranges from approximately 7,200 to 32,000, reflecting the variation in library size (approximately 13,500 to 70,000 tags). The distribution of tag abundance, however, is similar in all libraries (Figures S1 and S2 in Additional data file 1): the majority of tags were detected only once (58 to 76%, or approximately 5,500 to 24,000 tags), a trend comparable with previously reported SAGE analyses of mouse neocortices [27,29]. Of all unique tags, only 5,199 (approximately 2.4%) are common to all developmental stages. This low number most probably reflects the high proportion of tags that occur only once (singletons).
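The per-library statistics above (unique tags, singletons, tags shared by all libraries, and counts normalized per 100,000 tags) can be computed along the following lines. This is a minimal sketch, assuming each library is available as a plain list of tag sequences; the function names and toy tags are hypothetical, and the artifact screening and short-to-long tag mapping described in the text are not reproduced.

```python
from collections import Counter


def library_stats(libraries):
    """Summarize SAGE libraries.

    libraries: dict mapping a (hypothetical) library name to a list of tag
    sequences, one entry per sequenced tag.
    Returns per-library statistics and the set of tags seen in every library.
    """
    stats = {}
    for name, tags in libraries.items():
        counts = Counter(tags)
        singletons = sum(1 for c in counts.values() if c == 1)
        stats[name] = {
            "total": len(tags),
            "unique": len(counts),
            # Fraction of unique tags that were detected only once
            "singleton_fraction": singletons / len(counts) if counts else 0.0,
        }
    # Tags detected at least once in every library
    shared = set.intersection(*(set(tags) for tags in libraries.values()))
    return stats, shared


def per_100k(count, library_size):
    """Normalize a raw tag count to tags per 100,000 sequenced tags."""
    return count * 100_000 / library_size


# Toy libraries; the 10-bp tags are illustrative only
libs = {
    "E15.5": ["GTGGCTCACA", "GTGGCTCACA", "TTGGTGAAGG", "CCTGTAATCC"],
    "Ad": ["GTGGCTCACA", "AGCCCTACAA"],
}
stats, shared = library_stats(libs)
```

Normalizing to a common scale such as tags per 100,000 is what makes counts from libraries of very different sizes (about 13,500 to 70,000 tags here) directly comparable.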

Analysis of differentially expressed transcripts/tags
To identify differentially expressed tags/transcripts (DETs), we considered only the 25,165 unique tags with a count >2 across all libraries. Under stringent analyses (Table S1 in Additional data file 1), we identified a total of 561 DETs across the various comparisons between developmental stages (Table 2; Figure S3 in Additional data file 1). A full list of DETs with assigned IDs is provided in Additional data file 2. In general, more DETs are observed as the interval between the two developmental stages compared becomes wider. The largest number of DETs (approximately 59%, or 328 DETs) is found in the embryonic versus adult (E versus Ad) comparison, followed by P1.5 versus Ad (approximately 34%, or 192 DETs), E15.5 versus E17.5 (approximately 7%, or 38 DETs) and E15.5 versus P1.5 (approximately 6%, or 36 DETs). Because a DET can be detected in more than one comparison, these percentages sum to more than 100%. These results indicate distinctive transcript signatures during cerebral cortex development. Comparisons between the rostral and caudal regions of the E15.5 cerebral cortex (R versus C) and between the left and right adult cerebral cortices (L versus Ri) are described in separate sections below.
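The filtering step and a pairwise comparison between two libraries can be sketched as follows. The statistical criteria actually used are given in Table S1 and are not reproduced here; as a stand-in, this sketch uses a two-proportion z-test, one common choice for comparing SAGE tag counts, and assumes that "count >2 across all libraries" means a summed count of at least 3. All names and numbers are hypothetical.

```python
import math


def filter_tags(counts_by_tag, min_total=3):
    """Keep tags whose summed count across all libraries is > 2 (>= min_total)."""
    return {tag: c for tag, c in counts_by_tag.items() if sum(c) >= min_total}


def two_proportion_z(x1, n1, x2, n2):
    """Two-sided P-value comparing tag proportions x1/n1 and x2/n2.

    x1, x2: tag counts; n1, n2: library sizes (total tags).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # tag absent from both libraries: no evidence of a difference
    z = (p1 - p2) / se
    # P(|Z| >= |z|) for a standard normal variable
    return math.erfc(abs(z) / math.sqrt(2))


kept = filter_tags({"tagA": [1, 1, 0], "tagB": [60, 5, 2]})
p = two_proportion_z(60, 50_000, 5, 50_000)
```

With roughly 25,000 tags tested, raw P-values from any such test would also need a multiple-testing correction (for example, Bonferroni) before tags are called DETs.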
Approximately 69% of DETs have been assigned to known genes and 6% to expressed sequence tags (ESTs), while 10% of DETs have multiple matches (the tag matches multiple gene identifiers) and 8% have ambiguous matches (the tag matches the same gene identifier at multiple chromosomal loci). Approximately 8% of DETs have no matches; these DETs most likely derive from transcripts in less well-defined regions, such as centromeric and telomeric regions of chromosomes or assembly gaps in the mouse genome.
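The annotation categories above follow a simple decision rule. The sketch below assumes that each tag's genome hits are available as (gene identifier, locus, is-EST) triples; this data layout, the locus strings and the function name are hypothetical.

```python
def classify_tag(matches):
    """Assign a DET to one of the annotation categories used in the text.

    matches: list of (gene_id, locus, is_est) tuples, one per genome hit.
    """
    if not matches:
        return "no match"
    gene_ids = {gene_id for gene_id, _, _ in matches}
    if len(gene_ids) > 1:
        return "multiple matches"  # tag hits several gene identifiers
    loci = {locus for _, locus, _ in matches}
    if len(loci) > 1:
        return "ambiguous match"  # same gene identifier at several loci
    _, _, is_est = matches[0]
    return "EST" if is_est else "known gene"
```

The order of the checks matters: a tag hitting two different genes is counted as a multiple match even if one of those genes also occurs at several loci.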

Hierarchical clustering of DETs
To identify co-regulated genes, all 561 DETs were hierarchically grouped into 12 clusters based on the Euclidean distance between their logged normalized counts (Figure 1). Clusters 1, 5 and 6 consist of embryonic-specific DETs, which show their highest expression during embryonic development of the cerebral cortex. DETs in cluster 1 are expressed at all stages of development but show the lowest expression in the adult cortex. Expression of DETs in cluster 5 ceases before birth, whereas DETs in cluster 6 are expressed until the early postnatal stage. In contrast, clusters 4, 8 and 10 consist of adult-specific DETs with very similar temporal expression profiles but different magnitudes (for example, expression is highest in cluster 10). Clusters 2 and 7 are termed 'gene-switching' clusters because their expression switches between stages: cluster 2 switches between the P1.5 and adult stages, whereas cluster 7 switches between the embryonic and adult stages. Clusters 3 and 9 consist of DETs showing region-specific (caudal region of the E15.5 cerebral cortex) and stage-specific (P1.5 only) expression. Clusters 11 and 12 were excluded from subsequent analyses because they contained very few annotated tags. DETs within the same cluster may be co-regulated and/or involved in similar biological functions during cerebral corticogenesis.
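Hierarchical clustering on Euclidean distances between log-transformed normalized counts can be sketched in plain Python as below. The text does not state the linkage method or the exact log transform, so average linkage and log2(tags per 100,000 + 1) are assumptions here; the function names, tag names and profiles are toy examples.

```python
import math
from itertools import combinations


def log_profile(counts, library_sizes):
    """log2(tags per 100,000 + 1) for one tag across libraries (assumed transform)."""
    return [math.log2(c * 100_000 / n + 1) for c, n in zip(counts, library_sizes)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles, k):
    """Merge clusters (average linkage) until k clusters remain.

    profiles: dict mapping tag -> list of log-normalized values.
    Returns a list of sets of tags.
    """
    clusters = [{tag} for tag in profiles]

    def distance(c1, c2):
        d = [euclidean(profiles[a], profiles[b]) for a in c1 for b in c2]
        return sum(d) / len(d)

    while len(clusters) > k:
        # Closest pair of clusters; i < j because combinations preserves order
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


toy = {
    "emb1": [5.0, 5.0, 0.0],
    "emb2": [5.0, 4.0, 0.0],
    "ad1": [0.0, 1.0, 5.0],
    "ad2": [0.0, 0.0, 5.0],
}
groups = average_linkage(toy, 2)
```

This naive version recomputes distances at every merge and is only practical for small sets; for hundreds of DETs, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` followed by `fcluster` would be the usual choice.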
We performed systematic gene ontology (GO) functional clustering using the standardized GO term analysis tool DAVID (Database for Annotation, Visualization and Integrated Discovery) [30] (Additional data file 3). Functional analysis reveals that these gene clusters have distinctive roles during cerebral corticogenesis. Embryonic-specific gene clusters (1, 5 and 6) are dominated by genes associated with cellular protein and macromolecule metabolic processes or biosynthesis, and with nervous system and neuron development. These results match the functional ontologies expected during embryonic cerebral cortex development, when neuronal migration, differentiation and axonogenesis are at their peak. In contrast, adult-specific gene clusters (4, 8 and 10) consist of genes related to biological processes in the mature cerebral cortex, such as ion homeostasis, synaptic transmission and regulation of neurotransmitter levels. These gene clusters are also enriched for ribonucleotide/nucleotide binding activity and components of cytoplasmic membrane-bound vesicles. These functional ontologies are consistent with adult cerebral cortex function, which features synaptogenesis and nerve impulse transmission at synapses. Gene-switching clusters 2 and 7 are enriched for gene ontologies similar to those of both the embryonic- and adult-specific gene clusters, as well as for microtubule cytoskeleton organization and biogenesis, nucleotide biosynthesis and regulation of mRNA translation.

Quantitative RT-PCR validation of DETs and gene clusters
To ascertain the robustness of the SAGE datasets, we selected 136 candidate DETs and two additional genes of interest (ATPase, Cu++ transporting, alpha polypeptide, Atp7a; and cullin-associated and neddylation-dissociated 2, Cand2) for validation, taking into account both the stage-to-stage and the hierarchical clustering analyses (Table S2 in Additional data file 1). The selected DETs are transcription regulators, chromatin modifiers or post-translational regulators, such as molecules related to the ubiquitination pathway. Seventeen of the selected DETs are ESTs that have been identified in brain-related mouse cDNA libraries or transcriptomes. Independent quantitative RT-PCR (RT-qPCR) validation was carried out using three biological replicates.

Figure 2. High-throughput RT-qPCR validation of gene clusters. All validations were based on DETs for canonical mRNA. Failed validations of DETs according to hierarchical clustering expression profiles are indicated by arrows. N = 3; data are presented as mean ± standard error of the mean.

Table note: All RT-qPCR data are statistically significant at P < 0.001 unless specified: * P < 0.01; † P < 0.05. ‡ Disagreement between RT-qPCR and SAGE data. As, adult-specific expression; Es, embryonic-specific expression; NS, no statistically significant difference between two developmental stages. Fold change values <1.0 are presented in negative fold change format.

Table note: All RT-qPCR data are statistically significant at P < 0.001 unless specified: * P < 0.01. † Disagreement between RT-qPCR and SAGE data. As, adult-specific expression; Nil, SAGE data not available; Ps, P1.5-specific expression. Fold change values <1.0 are presented in negative fold change format.

Table note: All RT-qPCR data are statistically significant at P < 0.001 unless specified: * P < 0.05. Fold change values <1.0 are presented in negative fold change format.
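The negative fold change convention in the table notes amounts to a simple conversion. The text does not state how relative expression was quantified from RT-qPCR data, so the 2^-ΔΔCt calculation below is an assumption, and both function names are hypothetical.

```python
def ddct_fold_change(ct_target_a, ct_ref_a, ct_target_b, ct_ref_b):
    """Relative expression of condition A versus B by the 2^-ddCt method (assumed).

    ct_*: threshold cycles for the target gene and a reference gene.
    """
    ddct = (ct_target_a - ct_ref_a) - (ct_target_b - ct_ref_b)
    return 2.0 ** (-ddct)


def signed_fold_change(fc):
    """Report fold changes below 1.0 as negative reciprocals (0.25 -> -4.0)."""
    if fc <= 0:
        raise ValueError("fold change must be positive")
    return fc if fc >= 1.0 else -1.0 / fc


# After reference normalization, the target crosses threshold 2 cycles later in A,
# i.e. it is less abundant in A than in B
fc = ddct_fold_change(22.0, 15.0, 20.0, 15.0)
```

Reporting 0.25 as -4.0 makes up- and downregulation of the same magnitude read symmetrically (4-fold either way).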

Table: RT-qPCR validation of SAGE profile for E versus adult comparison.
DETs enriched in the embryonic cerebral cortex are (in descending order of enrichment) Zfp57, Csrp2, AA122503, Cdk4, Sox4, Marcks, Actb, BQ177889, Hmgb3, Rps4x, Actl6b, Zswim4 and Dr1, with expression 33- to 1.4-fold greater than in the adult. In contrast, Plp1 is expressed at a level 40 times greater in the adult cerebral cortex than at P1.5. Other validated genes enriched in the adult cerebral cortex include (in descending order) Hprt1, Calm1 and Mbp, with 2.3- to 1.8-fold greater expression than in the P1.5 cerebral cortex. Comparison between E15.5 and P1.5 shows that Mapt has 1.5-fold greater expression in the P1.5 cerebral cortex, whereas Sox11 expression is 3.3-fold lower (Table 5).
We were unable to validate any of the 17 DETs from the L versus Ri comparison, suggesting that the left and right hemispheres of the adult mouse cerebral cortex are highly similar and indistinguishable at the molecular level. SAGE and RT-qPCR analyses for R versus C regions of E15.5 are discussed in a separate section below.

Functional analysis of validated gene clusters using Ingenuity Pathway Analysis
The validated DETs of embryonic, adult and gene-switching clusters were functionally characterized using proprietary software, Ingenuity Pathway Analysis (IPA) from Ingenuity Systems®, to identify enriched molecular networks and canonical pathways. Given a list of input genes (known as focus genes), IPA mapped these genes to a global molecular network developed from information contained in the Ingenuity knowledge base (a manually curated database of experimentally proven molecular interactions from published literature). Networks of these focus genes were then algorithmically generated based on their connectivity. IPA determined the most significantly enriched biological functions and/or related diseases by calculating the P-value using Fisher's exact test. Using similar methods, significantly represented canonical pathways in a set of focus genes were also determined using IPA (Section C in Additional data file 1).
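The enrichment P-value described above corresponds to a one-sided (right-tailed) Fisher's exact test on a 2 × 2 table of focus genes versus reference genes, which is equivalent to a hypergeometric tail. A minimal sketch, assuming the sizes of the reference set and its annotated subset are known; IPA's own reference set and implementation are not reproduced, and the function name is hypothetical.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided (right-tailed) Fisher's exact P-value for over-representation.

    k: focus genes annotated to the function or pathway
    n: total focus genes
    K: reference genes annotated to the function or pathway
    N: total reference genes
    """
    # Probability of drawing k or more annotated genes among n genes drawn from N
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Hypothetical example: 3 of 18 focus genes fall in a 40-gene pathway out of 20,000
p = fisher_enrichment_p(3, 18, 40, 20_000)
```

Exact integer arithmetic via `math.comb` (Python 3.8+) avoids underflow for the moderate set sizes involved; very large reference sets would call for a log-space implementation.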
In adult-specific gene clusters, two molecular networks (18 focus genes and 50 associated nodes; networks 3 and 4 in Figure 3) were identified, interconnected via a single gene, Mbp (Figures S6 and S7 in Additional data file 1). These molecular networks are enriched for nine statistically significant canonical pathways (P < 0.05): synaptic long-term potentiation and depression, calcium signaling, B cell receptor signaling, cAMP-mediated signaling, GM-CSF signaling, amyotrophic lateral sclerosis signaling, G-protein-coupled receptor signaling and xenobiotic metabolism signaling. Validated DETs such as Camk2a, Gria3, Itpr1, Egr1 [...] Alzheimer's disease (3 DETs; P = 1.52E-2) (Figure S10 in Additional data file 1). Thirteen DETs associated with these neurological disorders are also implicated in processes related to cell death. Of these, eight DETs are expressed only in neurons, two only in glia and three in both neurons and glia.

Table 6. RT-qPCR validation of SAGE profile for rostral E15.5 [...]

Regionalized expression of DETs in the E15.5 cerebral cortex
Early regionalized development is an important event that could lay the foundations for the adult arealization of cerebral functions. To identify genes with regionalized expression profiles, we compared SAGE libraries generated from the rostral and caudal regions (equivalent to the anterior and posterior regions of the human cerebral cortex) of the E15.5 cerebral cortex. We identified 44 DETs and selected 25 of them (22 known genes, 1 EST and 2 ambiguous genes; Additional data file 2), plus 2 genes of interest (bladder cancer associated protein, Blcap, and ankyrin repeat and zinc finger domain containing 1, Ankzf1), for RT-qPCR validation and further detailed region-based analysis. This was done using independent biological triplicates of clearly defined regions/quadrants, namely the rostro-lateral (RL), rostro-medial (RM), caudo-lateral (CL) and caudo-medial (CM) quadrants of the E15.5 cerebral cortex (see Materials and methods). Two positive controls with known regionalized expression were included in the RT-qPCR: RAR-related orphan receptor beta (Rorb) and nuclear receptor subfamily 2, group F, member 1 (Nr2f1). Rorb is highly expressed in the rostral region, whereas Nr2f1 is highly expressed in the caudal region of the cerebral cortex [1,62].
An initial RT-qPCR analysis comparing combined RL and RM (rostral) with combined CL and CM (caudal) regions shows upregulation of Rorb and Nr2f1 in the rostral and caudal regions, respectively (based on fold change direction and a magnitude of 1.3-fold). The same analysis also confirmed the SAGE data for 3 of the 25 DETs (Actb, Tmsb4x and BC025816) and for Blcap (Table 6; Additional data file 6). Both Actb and Tmsb4x are more highly expressed in the rostral region, whereas Blcap and BC025816 are more highly expressed in the caudal region. To identify expression profiles in more refined areas, and to prevent differences between quadrants from being averaged out when quadrants are combined, we performed quadrant versus quadrant comparisons. The largest number of DETs was found in the RL versus CM comparison, consistent with these being the two most developmentally distinct regions of the cerebral cortex at E15.5; the remaining comparisons ranked RL versus CL > RM versus CL > RL versus RM = RM versus CM > CL versus CM. The region-specific expression profiles were plotted for each of the DETs (Figure 4) and grouped into two categories: RL-specific DETs, such as Actb, Tmsb4x and cytochrome b-245, beta polypeptide (Cybb); and CM-specific DETs, including Blcap, EST BC025816, Ankzf1 and cytochrome c oxidase I, mitochondrial (Cox1) (Additional data file 6).
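The reason for quadrant versus quadrant comparisons can be illustrated with toy numbers: a gene that is high in RL but low in RM can look unchanged when the rostral and caudal quadrants are pooled. The sketch below is hypothetical (function names and values are illustrative); it compares mean relative expression as log2 ratios and does not apply the statistics used in the study.

```python
import math
from itertools import combinations
from statistics import mean

QUADRANTS = ("RL", "RM", "CL", "CM")


def pairwise_log2_ratios(expr):
    """log2(mean A / mean B) for each of the six quadrant pairs.

    expr: dict mapping quadrant -> list of replicate relative expression values.
    """
    means = {q: mean(expr[q]) for q in QUADRANTS}
    return {(a, b): math.log2(means[a] / means[b])
            for a, b in combinations(QUADRANTS, 2)}


def pooled_rostral_vs_caudal(expr):
    """Ratio of pooled rostral (RL + RM) to pooled caudal (CL + CM) expression."""
    return mean(expr["RL"] + expr["RM"]) / mean(expr["CL"] + expr["CM"])


# Toy triplicates: strong RL versus RM difference that pooling hides
toy = {"RL": [4, 4, 4], "RM": [1, 1, 1], "CL": [2, 2, 2], "CM": [3, 3, 3]}
ratios = pairwise_log2_ratios(toy)
pooled = pooled_rostral_vs_caudal(toy)
```

Here the pooled rostral/caudal ratio is exactly 1.0, while the RL versus RM log2 ratio is 2.0 (a 4-fold difference).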
To visualize the regionalized expression profiles, we performed in situ RNA hybridization (ISH) for all DETs validated by quadrant versus quadrant RT-qPCR analysis, for Blcap and Ankzf1, and for the positive controls Rorb and Nr2f1. ISH was performed on sagittal and coronal sections (from rostral to caudal regions) of the E15.5 mouse brain (Figure 5). Under dark-field microscopy, we confirmed the regionalized expression of Rorb (in the rostral cortical plate; Figure 5a) and Nr2f1 (in the caudal ventricular zone; Figure 5f). Actb is highly expressed in the cortical plate and the subplate (Figure 5b). Tmsb4x is highly expressed in the cortical plate and the intermediate zone. ISH analysis showed that both Blcap and Ankzf1 are caudal-specific. Serial coronal sections from rostral to caudal regions of the brain (Figure 5g-i) show that Blcap is weakly expressed in the rostral cerebral cortex, particularly in the intermediate zone, subplate and cortical plate, but is highly expressed in the hippocampus and thalamus (Figure 5i). Ankzf1 is expressed specifically in the ventricular zone as well as in the cortical plate towards the caudal region of the cerebral cortex (Figure 5j-m). The distinctive expression of Ankzf1 in both the ventricular zone and the cerebral cortex prompted us to extend our ISH analysis to developmental stages from E11.5 to adulthood (Figure 5n-t). Ankzf1 is expressed in the primordial plexiform and ventricular zone layers of the telencephalon at E11.5 (Figure 5n). By E13.5, Ankzf1 is weakly expressed in the ventricular zone but highly expressed in the cortical plate (Figure 5o). From E17.5 to P1.5, Ankzf1 expression is maintained in the cerebral cortex (Figure 5p, q). In the adult, Ankzf1 expression is prominent in the piriform cortex, hippocampus and cerebellum (Figure 5r-t).

Genomic clustering of sense-antisense SAGE tags at the Sox4 and Sox11 loci
We performed genomic clustering analysis of the SAGE tags to identify chromosomal loci that are actively transcribed throughout cerebral corticogenesis. We calculated the probabilities that two, three, four or five DETs would occur by chance within a window of ten adjacent tags at each chromosomal location, irrespective of genetic distance. This analysis was based on the DET lists described above (Additional data file 2). It revealed two overrepresented chromosomal loci, Sox4 and Sox11, both derived from embryonic-specific gene clusters.
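The text does not give the exact probability model used for this clustering analysis. One plausible formulation, assumed here, treats the DETs on a chromosome as a random draw from its position-ordered tags, so that the chance of a given window of ten adjacent tags holding at least k DETs is a hypergeometric tail; a second helper scans the ordered tags for such windows. Both functions are hypothetical sketches.

```python
from math import comb


def window_probability(k, T, D, w=10):
    """P(a given window of w adjacent tags holds >= k DETs by chance).

    T: tags mapped to the chromosome; D: DETs among them (hypergeometric model).
    """
    tail = sum(comb(D, i) * comb(T - D, w - i) for i in range(k, min(w, D) + 1))
    return tail / comb(T, w)


def dense_windows(is_det, k, w=10):
    """Start indices of windows of w adjacent tags containing >= k DETs.

    is_det: list of booleans for tags ordered by chromosomal position,
    so windows are defined by tag rank rather than genetic distance.
    """
    return [start for start in range(max(len(is_det) - w + 1, 0))
            if sum(is_det[start:start + w]) >= k]


hits = dense_windows([True] * 3 + [False] * 17, k=3)
```

Because many windows are scanned per chromosome, a per-window probability would need to be adjusted for the number of windows tested before a locus is declared overrepresented.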
At both loci, we observed multiple SAGE tags in both sense and antisense orientations, which indicate alternative polyadenylation sites, differential splicing and overlapping antisense transcription. As an initial validation of the antisense transcripts, we performed strand-specific RT-PCR (Figure 6) using cDNA synthesized from equally pooled total RNAs (three [...]). To further validate the expression profiles of the multiple Sox4 DETs, we performed 3' rapid amplification of cDNA ends (RACE)-Southern analysis using pooled adaptor-oligo-d(T)15-primed cDNAs synthesized from three mice at each developmental stage. This method allowed us to semiquantitatively measure the expression levels of individual SAGE tags at the locus. To show that the amplification was cDNA-specific, we performed PCR using the same primer sets on mouse genomic DNA under the same conditions; in all cases, no amplification was observed (data not shown). This analysis confirmed the presence of four of the seven alternative transcripts for Sox4 (Figure 7d). Corresponding tags were determined by estimating the amplicon sizes from the strand-specific primers used, the next downstream AAUAAA/AUUAAA polyadenylation signal (if any) and the succeeding CATG sequence or SAGE tags. Figure 7d(1)-d(3) confirms the existence of sox4_tag10, sox4_tag12 and sox4_tag15. Of these tags, the SAGE expression profiles of sox4_tag10 and sox4_tag15 were validated (embryonic-specific, with reduced expression after P1.5), but that of sox4_tag12 (specific to the E15.5 caudal region) was not. 3' RACE-Southern analysis using a sense probe detected bands in the rostral and caudal regions of the E15.5 cerebral cortex and in the E15.5 and E17.5 cerebral cortices, and therefore confirmed the existence of the Sox4 antisense transcripts (Figure 7d(4)).
Even though none of these tags were differentially expressed between these regions in the SAGE analysis, our findings show a distinctive regionalization of sox4_tag14 expression in the E15.5 rostral cerebral cortex. Proteasome (prosome, macropain) subunit, beta type 2 (Psmb2) and Hmbs were used as controls, and no antisense or alternative transcripts were identified at these gene loci (Figure 7d(5)-d(8)).
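The matching of SAGE tags to 3' RACE products described above can be approximated in code: find the next polyadenylation signal downstream of the primer, assume a cleavage site a fixed distance beyond it, and take the tag adjacent to the 3'-most CATG (the NlaIII anchoring site) before that end. The 20-nt cleavage offset, the toy sequence and all function names are assumptions, and the oligo-d(T) adaptor length is ignored.

```python
POLYA_SIGNALS = ("AATAAA", "ATTAAA")  # DNA equivalents of AAUAAA/AUUAAA


def next_polya_signal(seq, start):
    """Position of the first polyadenylation signal at or after start, or None."""
    hits = [seq.find(sig, start) for sig in POLYA_SIGNALS]
    hits = [h for h in hits if h != -1]
    return min(hits) if hits else None


def sage_tag(seq, end, tag_len=10):
    """Tag adjacent to the 3'-most CATG before end (10 nt short SAGE, 17 nt LongSAGE)."""
    pos = seq.rfind("CATG", 0, end)
    if pos == -1 or pos + 4 + tag_len > end:
        return None
    return seq[pos + 4:pos + 4 + tag_len]


def estimated_amplicon(seq, primer_start, cleavage_offset=20):
    """Approximate amplicon length from the primer to the assumed cleavage site,
    plus the SAGE tag expected for that transcript end."""
    sig = next_polya_signal(seq, primer_start)
    end = len(seq) if sig is None else min(sig + 6 + cleavage_offset, len(seq))
    return end - primer_start, sage_tag(seq, end)


# Toy 3' end: CATG at 4, tag at 8-17, AATAAA at 22
seq = "GGGG" + "CATG" + "ACGTACGTAC" + "TTTT" + "AATAAA" + "C" * 30
size, tag = estimated_amplicon(seq, 0)
```

Because each alternative polyadenylation site gives a different transcript end, each predicts its own amplicon size and its own 3'-most CATG, which is how band sizes can be related back to individual SAGE tags.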

Figure 4. RT-qPCR analysis of all R versus C DETs based on quadrant versus quadrant analysis.
Because 3' RACE-Southern analysis depended on oligo-d(T)15 priming, we could not rule out the possibility that some amplicons were generated by false priming on homopolymer-A stretches. Therefore, Northern analyses were performed on equally pooled total RNA extracted from the cerebral cortices of seven mice at E15.5, E17.5 and P150 (negative control). Using a double-stranded DNA probe against the 3' untranslated region (UTR) of Sox4 (Additional data file 8), we identified six bands ranging from approximately 2 kb to approximately 4.7 kb (Figure 7e). Sox4 sense transcripts are weakly expressed in [...] (Table S6 in Additional data file 1) [64]. Taken together, the analysis confirmed the existence of multiple overlapping variants of Sox4 sense and antisense transcripts at this gene locus.
To confirm the rostro-caudal expression of Sox4 sense transcripts, we performed ISH on sagittal sections of mouse brains using a Sox4 antisense riboprobe spanning the sox4_tag10, sox4_tag12, sox4_tag15 and sox4_tag16 SAGE tags. Sox4 showed regionalized expression at E15.5 and E17.5 (Figure 7f). At E15.5, Sox4 sense transcripts are more highly expressed at both the rostral and caudal ends of the cortical plate than in the intermediate region between them (Figure 7f(1)). By E17.5, expression of Sox4 sense transcripts is evident in the rostral cortical plate (Figure 7f(2)). At both stages, Sox4 sense transcripts are uniformly expressed in the intermediate zone of the cerebral cortex. These findings correspond to the SAGE tag counts for the E15.5 rostral and caudal regions of the cerebral cortex (Figure 7f), and they are consistent with the averaged total tag counts (per 100,000 tags) for the different Sox4 sense transcripts, which are predominantly expressed in both the rostral and caudal regions. The regionalized expression of Sox4 in the cortical plate is evident only at E15.5 and E17.5, and not at other developmental stages (Figure S11 in Additional data file 1).

In situ RNA hybridization of Sox4 sense and antisense transcripts
To further ascertain the antisense expression of Sox4 in a spatio-temporal manner, we performed ISH on coronal sections obtained from E11.5, E13.5, E15.5, E17.5, P1.5 and P150 mouse brains. Sense and antisense RNA probes were generated from the same clone used in the Northern analysis. At E11.5, Sox4 sense transcripts are confined to the primordial plexiform layer (Figure 8a). From E13.5 to P1.5, the sense transcripts are expressed throughout the cortical plate (Figure 8b-e). Expression of sense transcripts in the subventricular zones is observed at E17.5 and P1.5 only (Figure 8d, e). There is no observable sense expression in the adult stage (Figure 8f). Sox4 antisense transcripts are expressed throughout the telencephalon at E11.5 (Figure 8g). From E13.5 to P1.5, Sox4 antisense expression is confined to the cortical plate only (Figure 8h-k). There is no obvious antisense expression in the cerebral cortex in the adult stage (Figure 8l). A microscopic examination at high magnification showed that Sox4 antisense transcripts are predominantly localized in the nucleus whereas Sox4 sense transcripts are found in both the nucleus and cytoplasm (Figures S12, S13 and S14 in Additional data file 1). We used hemoglobin alpha, adult chain 1 (Hba-a1) of the corresponding brain region and time-point as a control in the analysis (Figure 8m-r).
Furthermore, Sox4 antisense expression occurs in the piriform cortex layer II (Figure 9a-c) and dentate gyrus (Figure 9g-i) in the adult brain; however, no sense expression is observed in these regions. At P1.5, we identified complementary expression between Sox4 sense and antisense transcripts in the olfactory bulb (Figure 9d-f). Sox4 sense expression was confined to the granular and glomerular layers of the olfactory bulb whereas antisense expression was found only in the outer plexiform layer. We used either Sox11 or Hba-a1 of the corresponding brain region and time-point as a control in the analysis (Figure 9c, f, i).

Analysis of the Sox11 genomic cluster
SAGE tags representing multiple overlapping sense and antisense transcript variants at the Sox11 genomic cluster were validated using 3' RACE-Southern analysis as described above (see Section F in Additional data file 1 for a full description of the Sox11 results). ISH analysis did not confirm the expression of Sox11 antisense transcripts, but the presence of paired-end ditags (PETs) spanning three of the five antisense tags confirmed the existence of Sox11 antisense transcripts (Table S8 and Figures S16, S17 and S18 in Additional data file 1). The discrepancy between ISH and RT-qPCR or 3' RACE-Southern analysis suggests that Sox11 antisense transcripts might be expressed at low levels or only at specific locations in the cerebral cortex, and hence may be detectable only in serial sections or by whole-mount ISH.

Screening of Sox4 and Sox11 antisense transcripts in the adult mouse brain, organs, P19 cell line and neurospheres
We screened various adult brain regions (olfactory bulb, cerebellum, medulla, hippocampus, thalamus and cerebral cortex) and selected mouse organs (E15.5 whole brain, heart, kidney, liver, skeletal muscle, skin, spleen, stomach, testis and thymus) for the expression of Sox4 and Sox11 antisense transcripts by strand-specific RT-qPCR. Within the adult brain, Sox4 sense and antisense transcripts are expressed in all regions, with the highest level in the olfactory bulb, approximately four- to nine-fold greater than in other brain regions (Figure 10a). Sox4 antisense transcripts are expressed in all mouse organs, with the highest level in the thymus, followed by E15.5 whole brain, testis and skin (Figure 10b). Sox4 sense and antisense expression profiles are similar throughout the entire series of samples screened, with sense transcripts consistently expressed at higher levels than antisense transcripts (approximately 1.7-fold in the various brain regions and approximately 2- to 14-fold in the various organs).
Sox11 sense transcripts are expressed at the highest level in the olfactory bulb, approximately two- to seven-fold greater than in other brain regions (Figure 10a). Sox11 antisense transcripts, on the other hand, are expressed in all brain regions screened, at comparable levels in the olfactory bulb, hippocampus, thalamus and cerebral cortex. Compared with the adult mouse organs, Sox11 sense and antisense transcripts are highly expressed in the E15.5 whole brain, with Sox11 sense transcript levels at least 100-fold greater than in the other mouse organs (Figure 10b). Sox11 antisense expression, however, is observed only in the E15.5 whole brain, skin and stomach. Notably, Sox11 sense transcripts are more highly expressed than antisense transcripts in the E15.5 whole brain and skin (23- and 4-fold, respectively).

Figure 8. ISH analysis of Sox4 transcripts in E11.5 to P150 mouse brains.
Since both Sox4 and Sox11 are implicated in neuronal differentiation and glial maturation [65,66], we examined Sox4 and Sox11 sense and antisense transcript expression in proliferating and differentiating P19 embryonal carcinoma cells and in embryonic NSPCs grown as neurospheres. Both Sox4 sense and antisense transcripts are upregulated during P19 cell differentiation (approximately 5.7- and 1.6-fold, respectively; Figure 11a) and neurosphere differentiation (approximately 1.9- and 1.8-fold, respectively; Figure 11b). For Sox11, both sense and antisense transcripts are upregulated in differentiating compared with proliferating P19 cells, by approximately 2.3- and 4.2-fold, respectively (Figure 11c). Both Sox11 sense and antisense transcripts are, however, downregulated in differentiating neurospheres (approximately 2.6- and 1.5-fold, respectively; Figure 11d).

Discussion
In this study, SAGE was used to analyze global gene expression in the normal mouse cerebral cortex at various developmental stages. We report validated spatio-temporal regulation of genes involved in mouse cerebral cortex development from embryo to adulthood. The study highlights four main findings: DETs from different gene clusters are associated with known functional processes, signaling pathways and disease-causative genes involved in cerebral corticogenesis; Ankzf1 and Sox4 sense transcripts are regionally expressed in the E15.5 cerebral cortex; multiple overlapping Sox4 and Sox11 sense and antisense transcripts are spatio-temporally regulated during cerebral corticogenesis; and Sox4 and Sox11 antisense transcripts are differentially regulated in both proliferating and differentiating embryonic-derived neurospheres and P19 cells.

Figure 9. Sox4 antisense expression in other brain regions.
We have shown that most tags generated in all libraries were singletons. The number of singletons could be reduced by increasing the number of tags sequenced. In mammalian cells, the number of additional unique transcripts identified approached zero when the number of SAGE tags sequenced reached approximately 600,000 [67]. Increasing the number of tags sequenced could improve the sensitivity of the technique to identify weakly expressed or novel transcripts, and the application of massively parallel signature sequencing [68] using a next-generation sequencer would be an ideal solution to accomplish this. In addition, one of the benefits of SAGE is that datasets generated from different groups or in public repositories such as SAGE Genie [69] and GEO [28] are readily comparable and, hence, can increase the tag count and sensitivity of the technique in discovering DETs between SAGE libraries. However, any meta-analyses involving various SAGE datasets are affected by experimental and biological variation; thus, a careful selection of matching libraries is crucial to limit systematic error or biases.
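The saturation argument above can be illustrated with a simple rarefaction sketch: shuffle the sequenced tags once, take nested prefixes of increasing depth, and count how many distinct tags each depth reveals. The functions and the toy tag list below are hypothetical and invented for illustration; they are not part of the study's pipeline.

```python
import random
from collections import Counter


def saturation_curve(tags, depths, seed=0):
    """Distinct tags observed at each sequencing depth (nested random subsamples)."""
    shuffled = list(tags)
    # A single shuffle means each deeper subsample extends the shallower one.
    random.Random(seed).shuffle(shuffled)
    return [(d, len(set(shuffled[:d]))) for d in depths]


def singleton_fraction(tags):
    """Fraction of distinct tags that were seen exactly once."""
    counts = Counter(tags)
    return sum(1 for n in counts.values() if n == 1) / len(counts)


# Hypothetical toy library: two abundant transcripts plus 20 rare ones.
toy = ["TAG_A"] * 50 + ["TAG_B"] * 30 + [f"RARE_{i}" for i in range(20)]
print(saturation_curve(toy, [10, 25, 50, 100]))
print(round(singleton_fraction(toy), 3))
```

When the curve flattens, further sequencing adds few new tags. A library dominated by singletons is still on the rising part of the curve, which is why deeper sequencing is expected to reduce their proportion.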
Our SAGE analysis robustly detected DETs with a low false positive rate (for example, <0.001% for comparison between left and right hemispheres of the adult cerebral cortex). Of all the identified DETs, approximately 8% were not mapped to either a single locus in the mouse genome or any unique annotation. This problem could be overcome by generating additional information from the 5' end of the transcript through alternative techniques such as PET sequencing [70], cap analysis gene expression (CAGE) [71] and 5' LongSAGE [72].
We have identified functional ontologies, molecular interactions and enriched canonical pathways that are characteristic of each stage-specific gene cluster of validated DETs. The IPA network analysis generated connections between validated DETs across various developmental stages in relation to well-established proteins or molecules and neurological disorders. In this study, members of the Wnt/β-catenin signaling pathway were enriched in networks 1 and 2 (embryonic-specific gene clusters). In neural development, Wnt/β-catenin signaling plays an important role in regulating regional specification of the cortex along the rostro-caudal and dorso-ventral axes, and in the proliferation of cortical progenitors [73]. IPA highlighted three genes (Sox4, Sox11 and Sfrp1) associated with this pathway. In humans, the SFRP1 protein (secreted frizzled-related protein 1) stabilizes β-catenin and increases transcription from β-catenin-responsive promoters [74]. In β-catenin-deficient mouse mutants, expression of both Sox4 and Sox11 is downregulated [75,76]. Sox4 and Sox11 proteins play an important role in establishing neuronal properties, pan-neuronal gene expression and proper myelination of the central nervous system [65,66]. This suggests that the role of the Wnt/β-catenin signaling pathway in regulating neuronal development could be mediated, at least in part, by the Sox4 and Sox11 proteins.
The role of Wnt/-catenin signaling in regulating DETs (Btg1, Cdk4, Cdkn1c, Csrp2, Ezh2, Neurod1, Pcna, and Rps4x) involved in cell cycle and proliferation remains unclear. Gainand loss-of-function studies have established that Wnt/-catenin signaling is essential to maintain the pool of precursors for proper development of the cerebral cortex [77,78]. To date, there is no direct evidence to show that Ezh2, Pcna, Rps4x and Btg1 are involved in cell cycle regulation during early embryonic neurogenesis. But their expression in the ventricular/subventricular zone of the E15.5 developing mouse cerebral cortex [33,35,36] suggests that they may be involved in regulating cell proliferation during neurogenesis. The role of these DETs and their association with the Wnt/catenin signaling pathway remains unclear and requires further experimentation.
Networks 3 and 4 from the adult gene cluster were associated with various canonical pathways, in particular synaptic long-term potentiation (LTP) and calcium signaling. LTP is a process of synaptic enhancement thought to underlie some forms of learning and memory [79]. It depends on Ca²⁺ and calmodulin, which are major components of the calcium signaling pathway. We identified four DETs, Gria3, Itpr1, Ppp3ca and Camk2a, in both canonical pathways. In particular, the Camk2a enzyme is highly expressed in the brain and mainly regulates glutamatergic synapses during LTP [79]. DET products such as Nrgn and Camk2n1 can directly or indirectly regulate Camk2a or Ca²⁺/calmodulin and thereby alter the outcome of the LTP pathway [80,81]. They may therefore serve as important candidate genes in future analyses of the synaptic LTP pathway in neurodegenerative diseases that cause loss of cognitive function and memory.
Networks 5 and 6 are enriched for genes in the amyloid processing signaling pathway. App and Mapt are associated with this pathway. Under normal circumstances, the App protein is required for proper migration of neuronal precursors into the cortical plate in early embryonic corticogenesis [82]. The Mapt protein, on the other hand, plays an important role in maintaining the architecture of the neuronal cytoskeleton and intracellular trafficking. Overexpression of App protein and hyperphosphorylation of the Mapt protein have been implicated in the pathologies of Alzheimer's disease [83,84]. Interestingly, Ctsd, Atp7a, Clcn2 and Hprt1, the genes responsible for other human neurological disorders such as neuronal ceroid lipofuscinosis (Ctsd), Menkes disease (Atp7a), epilepsy (Clcn2), and Lesch-Nyhan syndrome (Hprt1), are associated with App and Mapt. These candidate genes are also involved in cell morphogenesis, assembly and organization and could be linked to deterioration of neurons during the pathologic progression of these disorders.
Pathway analysis of DETs classified into the N, G and B groups showed that neuronal DETs (N and B groups) are associated with Huntington's disease and schizophrenia, associations not previously identified in networks 3 to 6. Both disorders share three DETs, Rgs4, Ppp1r1b and Chgb, whose expression is downregulated in humans with Huntington's disease or schizophrenia [85][86][87][88]. A proportion of patients with Huntington's disease also develop schizophrenia [89,90]. Taken together, downregulation of Rgs4, Ppp1r1b and Chgb in neurons may contribute to the symptoms common to these disorders. Many DETs associated with both Huntington's disease and schizophrenia (including App, Hprt1 and Sncb) are also involved in neuronal/cell death processes [91][92][93]. Other DETs in the N group, not previously implicated in neuronal cell death, may serve as novel candidates in the pathologic development of these disorders.
Regionalized development of the cerebral cortex involves the differential regulation of cell cycle exit, early migration and attainment of positional identity in neuronal-fated cells. To date, only a few genes have been associated with regionalized development of the cerebral cortex [3,94]. In the regionalization analysis, the comparison of the RL and CM libraries yielded the highest number of DETs, indicating that these two regions of the cerebral cortex differ the most. This finding supports the notion that the cerebral cortex develops along a latero-medial axis followed by a rostro-caudal axis [7,8]. At E15.5, both Actb and Tmsb4x were expressed more highly in the rostral cerebral cortex than in the caudal region. Both Actb and Tmsb4x proteins are involved in the actin cytoskeleton signaling pathway [95]. In particular, the Tmsb4x protein has been shown to promote cardiomyocyte migration [96] and axonal tract growth in zebrafish [97]. Co-expression of Actb and Tmsb4x in the E15.5 mouse cortical plate therefore suggests that they may act synergistically in early cortical cell development. Conversely, Blcap and Ankzf1 were expressed more highly in the caudal than in the rostral region of the E15.5 cerebral cortex. To date, the functions of Blcap and Ankzf1 in the cerebral cortex remain uncharacterized. This study provides the first comprehensive expression profile of Ankzf1 and suggests that it could be an important transcription factor in cerebral corticogenesis.
At E15.5, Sox4 sense transcripts were expressed in a high-rostral and high-caudal pattern, with lower expression in the intermediate region. By E17.5, Sox4 expression becomes prominent in the rostral cortical plate, similar to Rorb. However, the regionalized expression of Sox4 sense transcripts did not resemble the restricted Rorb expression at E15.5 or in the postnatal brain [1,62]. This may reflect the combined expression profiles of different Sox4 sense transcripts present across the rostro-caudal axis of the cortical plate. The regionalized expression of Sox4 sense transcripts occurs only between E15.5 and E17.5. Because thalamic axons innervate the cortical plate after E17.5 [98], the regionalization of Sox4 sense transcripts in early cortical development could result from an intrinsic rather than an extrinsic mechanism regulating early patterning of the cerebral cortex.
Genomic clustering of DETs identified the differentially regulated Sox4 and Sox11 gene loci. These genomic clusters imply that multiple overlapping sense and antisense transcripts arising from the same gene locus are co-transcribed during cerebral cortex development. Both Sox4 and Sox11 are single-exon genes, so these transcript variants are likely generated by alternative polyadenylation. The 3' UTRs of both Sox4 and Sox11 have tandem terminal polyadenylation signals on both the sense and antisense strands (data not shown), which supports the occurrence of multiple transcript forms or SAGE tags. Multiple mRNA forms with different 3' UTRs can lead to cell-specific regulation and to differences in nuclear or cytoplasmic mRNA stability and translation rates [99,100]. The 3' UTRs of Sox4 and Sox11 may contain AU-rich elements that play an important role in determining mRNA stability through deadenylation, decapping or 3'→5' decay [101]. In addition, 3' UTRs of different lengths may be targeted by different miRNAs, thus interfering with translation. Both Sox4 and Sox11 transcripts may be targeted by various miRNAs at different predicted positions across the 3' UTR (Tables S7 and S9 and Figures S15 and S18 in Additional data file 1). Therefore, 3' UTR length may be an important feature in the regulation of Sox4 and Sox11 protein expression during cerebral corticogenesis.
In this study, NATs overlapping the sense transcripts were found at both the Sox4 and Sox11 gene loci. Overlapping NATs may function as templates for the generation of pre-miRNAs and mature miRNAs with exceptionally high sequence conservation that are complementary to the overlapping sense protein-coding transcripts [102]. To date, no mature or pre-miRNAs have been predicted on the Sox4 and Sox11 sense and antisense strands (data not shown) or reported in miRBase [103]. In addition, NATs can self-complement to form double-stranded RNA, or pair with sense transcripts, and thereby function as templates for the generation of endogenous small interfering RNAs, which could subsequently interfere with the translation or transcription of multiple protein-coding transcripts [104]. Because both Sox4 and Sox11 proteins are highly expressed in the cerebral cortex, the overlapping NATs do not seem to regulate Sox4 and Sox11 through miRNA- or small interfering RNA-mediated translational repression, but rather through antisense-regulated sense transcription within the nucleus.
Our ISH analysis showed complementary cellular expression profiles of Sox4 sense and antisense transcripts in the piriform cortex, olfactory bulb and dentate gyrus. This finding implies that Sox4 antisense transcripts may be essential in intracellular and interlocus negative feedback regulation of the Sox4 sense transcripts. However, the similar expression profiles of Sox4 sense and antisense transcripts in multiple mouse organs and brain regions suggest that these transcripts may be co-expressed. This observation is also supported by the temporal co-expression of Sox4 sense and antisense transcripts in the cortical plate or layers I to III of the cerebral cortex. Taken together, Sox4 sense and antisense transcripts are co-expressed in some cells and expressed in a complementary pattern in others, suggesting crucial cell-type-specific regulation.
Sox4 and Sox11 have been shown to have redundant roles during mouse development [105], and Sox11 may compensate for the absence of Sox4 during brain development [106]. We demonstrated that Sox4 and Sox11 sense and antisense transcripts have similar expression patterns in the brain, but not in other organs, suggesting a compensatory role for Sox11 only in the brain. Sox11 antisense transcripts were expressed only in the brain, skin and stomach, suggesting organ-specific regulation.
Our data show upregulation of Sox4 and Sox11 sense transcripts in differentiating P19 cells, consistent with the findings of others [107,108], and demonstrate upregulation of antisense transcripts as well. We also find both Sox4 and Sox11 sense transcripts expressed in the NSPCs cultured as neurospheres, which is in agreement with Dy et al. [109]. Furthermore, we identify upregulation of both Sox4 sense and antisense transcripts but downregulation of Sox11 sense transcripts in differentiating neurospheres. Taken together, our findings show that there are potentially common and distinct roles for Sox4 and Sox11 sense and antisense transcripts during neuronal and non-neuronal cell proliferation and differentiation. The underlying regulatory mechanism of these transcripts, particularly the antisense ones, remains unknown and requires further investigation.

Conclusions
This study opens avenues for future research into the fundamental processes of cerebral cortex development and into neurological disorders related to the cerebral cortex. We confirm the regionalized expression of new candidate genes in the E15.5 cerebral cortex, as well as the differential regulation of multiple overlapping sense and novel antisense transcripts within the Sox4 and Sox11 gene loci during cerebral corticogenesis. We also report, for the first time, the spatio-temporal regulation of Sox4 antisense transcripts in the brain, as well as the differential regulation of novel Sox4 and Sox11 antisense transcripts in various mouse organs and in proliferating and differentiating NSPCs and P19 cells. These findings provide a basis for future investigations into the role of antisense transcripts during cerebral corticogenesis and neuronal differentiation.

Handling of animals and dissection of the cerebral cortex
All experiments that involved animal breeding and handling were performed according to protocols approved by the Melbourne Health Animal Ethics Committee (Project numbers 2001.045 and 2004.041). All animals involved in the study were C57BL/6 mice unless specified otherwise. All mice were kept under conditions of a 12-h light/12-h dark cycle with unlimited access to food and water. All mice were culled by cervical dislocation prior to dissection. Cortical tissue was procured in the following fashion. For adult samples, after removal of the meninges, coronal cuts were used to excise the olfactory bulb from the rostral region, and the superior colliculus from the caudal region. A sagittal cut to separate the two cortical hemispheres was performed. The cortical pallium was dissected from the subpallial striatum and the septum. The neocortex was then dissected away from the cingulate cortex and the entorhinal cortex. For embryonic samples, the cortical tissue was dissected free from the underlying ganglionic eminences at the pallial-subpallial border. An orthogonal cut was made to remove the presumptive striatum and the overlying piriform cortex. On the medial aspect, the medial limbic cortex was included for analysis, but the adjacent hippocampal primordium, including the cortical hem, was excluded. For the E15.5 cerebral cortex, the resulting hemispheres containing cortical tissue only were placed on the bottom of the Petri dish and, using a fine scalpel, divided into four equal quadrants per hemisphere, namely RL, RM, CL and CM. Rostral and caudal quadrants from both hemispheres were pooled for SAGE library construction but separately tracked for RT-qPCR analysis. Procurement of other adult brain tissues and related mouse organs for Sox4 and Sox11 antisense transcript screening was carried out according to the standard mouse necropsy protocol accessible at the National Institute of Allergy and Infectious Diseases (NIAID) website [110].

SAGE libraries and analysis of tags
Ten SAGE libraries were constructed from the cerebral cortex of E15.5, E17.5 and 4- to 6-month-old adult (Ad) mice according to one of the two methods described previously [25,26], using I-SAGE™ or I-SAGE™ Long Kits (Invitrogen, Mulgrave, Victoria, Australia). Additional E15.5 and P1.5 cerebral cortex libraries described previously [27] were also included in the analysis. These libraries contain a total of 26,436 sequencing traces. SAGE tags were preprocessed using the 'sagenhaft' package, available from the Bioconductor website [111,112]: tags were extracted and corrected for sequencing errors, and artifacts such as SAGE linkers, ribosomal RNA and duplicated ditags were removed. To compare libraries containing long tags with those containing short tags, all short tags were mapped to the existing long tags from the other libraries. A table of all libraries containing the unique long or short tags was generated and redundant tags were removed. Only tags with a total count >2 (across all libraries) were considered for subsequent comparisons. Each unique tag was mapped to the mouse genome using ESTgraph, which uses ESTs and their genomic position information. ESTgraph was created by Tim Beissbarth (unpublished) [113]. Identities were assigned to these tags, which were further grouped into the following categories: matching a gene, a genomic sequence or an EST; ambiguous matches; or no alignment. All annotations were based on the latest mouse assembly (mm9, released in July 2007), accessible from the UCSC Genome Bioinformatics website [63].
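The short-to-long tag merging and the count filter can be sketched as follows. The sketch assumes, as in standard SAGE and LongSAGE, that a 10-nt short tag corresponds to the first 10 nt of a 17-nt long tag; it reassigns a short tag only when exactly one long tag shares its prefix, which is our assumption rather than a documented rule of the study. Function names, library names and tag sequences are hypothetical.

```python
from collections import defaultdict


def merge_short_into_long(libraries, short_len=10):
    """Pool counts per tag, folding short tags into a unique matching long tag.

    libraries: dict mapping library name -> dict of tag sequence -> count.
    """
    long_tags = {t for lib in libraries.values() for t in lib if len(t) > short_len}
    by_prefix = defaultdict(set)
    for tag in long_tags:
        by_prefix[tag[:short_len]].add(tag)

    table = defaultdict(lambda: defaultdict(int))
    for name, counts in libraries.items():
        for tag, n in counts.items():
            matches = by_prefix.get(tag, set()) if len(tag) == short_len else set()
            if len(matches) == 1:  # unambiguous prefix match: use the long tag
                tag = next(iter(matches))
            table[tag][name] += n
    return table


def filter_by_total(table, min_exclusive=2):
    """Keep tags whose total count across all libraries is > min_exclusive."""
    return {tag: dict(c) for tag, c in table.items() if sum(c.values()) > min_exclusive}


# Hypothetical libraries: one LongSAGE (17-nt tags), one short-tag SAGE (10-nt tags).
libs = {
    "E15_long": {"AAAAACCCCCGGGGGTT": 2, "TTTTTGGGGGCCCCCAA": 1},
    "E15_short": {"AAAAACCCCC": 3, "GGGGGAAAAA": 1},
}
print(filter_by_total(merge_short_into_long(libs)))
```

The short tag "AAAAACCCCC" is folded into its long counterpart, bringing the total to 5, while the two tags seen only once are dropped by the >2 filter.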

Identification of differentially expressed tags
Library comparisons were performed using two methods.
Fisher's exact test was used to compare two individual SAGE libraries. Multiple testing correction [114] was applied to control the false discovery rate, and adjusted P-value cutoffs (Q-values) were used to select DETs. Where several libraries were combined to address a specific biological comparison (for example, different stages of development), a Bayesian model, as described previously [115], was used to integrate multiple libraries in pairwise comparisons involving biological replicates. The model accounts for within-class variability by means of mixture distributions. The resulting E-values were used to select DETs. A table of all relevant comparisons, the comparison method and the Q- or E-value cutoffs is provided in Table S1 in Additional data file 1.
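For the two-library case, a minimal sketch is shown below. It assumes the usual 2×2 layout (a tag's count versus all other tags, in each library), a two-sided test that sums every table at most as probable as the observed one, and the Benjamini-Hochberg step-up procedure [114] for Q-values. It is written in pure Python for self-containment; the Bayesian mixture model [115] used for replicated comparisons is not reproduced here, and all names and counts are hypothetical.

```python
import math


def _log_comb(n, k):
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def log_p(x):  # hypergeometric log-probability of x in the top-left cell
        return _log_comb(r1, x) + _log_comb(r2, c1 - x) - _log_comb(n, c1)

    log_p_obs = log_p(a)
    total = 0.0
    for x in range(max(0, c1 - r2), min(r1, c1) + 1):
        lp = log_p(x)
        if lp <= log_p_obs + 1e-7:  # small tolerance for floating-point ties
            total += math.exp(lp)
    return min(1.0, total)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P-values (Q-values), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # step up from the largest P-value
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        q[i] = running
    return q


def compare_libraries(lib1, lib2):
    """Q-value per tag for two libraries given as dicts of tag -> count."""
    n1, n2 = sum(lib1.values()), sum(lib2.values())
    tags = sorted(set(lib1) | set(lib2))
    pvals = [fisher_exact_two_sided(lib1.get(t, 0), n1 - lib1.get(t, 0),
                                    lib2.get(t, 0), n2 - lib2.get(t, 0)) for t in tags]
    return dict(zip(tags, benjamini_hochberg(pvals)))


print(compare_libraries({"tagA": 10, "tagB": 90}, {"tagA": 50, "tagB": 50}))
```

Tags would then be called DETs when their Q-value falls below the cutoff listed for that comparison in Table S1.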

Hierarchical clustering of SAGE tags
To identify co-regulated genes, the clustering of DETs was performed based on the log2 of normalized counts. Each library was normalized to 100,000 tags per library to account for size differences. A pseudocount of 0.5 was added before taking the log2 of the normalized tag counts. The tag-wise mean was subtracted from the log2 tag intensities before computing the Euclidean distance of the individual tag profiles. Hierarchical clustering was performed on the tags using the 'hclust' function and complete linkage, which was implemented using the statistical computing environment of R [116].
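The transformation described above (scaling each library to 100,000 tags, adding a 0.5 pseudocount, taking log2 and subtracting each tag's mean) can be sketched in Python. The clustering step is a naive complete-linkage agglomeration on Euclidean distances, standing in for R's `hclust(..., method = "complete")`; it runs in cubic time and is only meant for small illustrations. The tag names and counts are invented.

```python
import math
from itertools import combinations


def centred_log_profiles(counts, lib_sizes, scale=100_000, pseudo=0.5):
    """counts: dict tag -> raw counts (one per library, same order as lib_sizes)."""
    profiles = {}
    for tag, row in counts.items():
        logs = [math.log2(c / size * scale + pseudo) for c, size in zip(row, lib_sizes)]
        mean = sum(logs) / len(logs)
        profiles[tag] = [v - mean for v in logs]  # subtract the tag-wise mean
    return profiles


def complete_linkage(profiles):
    """Naive agglomerative clustering; returns merges as (members_a, members_b, height)."""
    dist = {frozenset((a, b)): math.dist(profiles[a], profiles[b])
            for a, b in combinations(profiles, 2)}
    clusters = [frozenset([t]) for t in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Complete linkage: cluster distance is the largest member-pair distance.
            h = max(dist[frozenset((a, b))] for a in clusters[i] for b in clusters[j])
            if best is None or h < best[0]:
                best = (h, i, j)
        h, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), h))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical counts in three equally sized libraries: x and y rise together, z falls.
counts = {"x": [10, 100, 1000], "y": [20, 200, 2000], "z": [1000, 100, 10]}
profiles = centred_log_profiles(counts, [100_000, 100_000, 100_000])
for merge in complete_linkage(profiles):
    print(merge)
```

Because the profiles are mean-centred on the log scale, x and y (which differ only by a constant factor) end up almost on top of each other and merge first, whatever their absolute abundance.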

Genomic clustering of SAGE tags
To assess whether there was any genomic clustering of tags, a previously described method [117] was adopted. In brief, gene lists were first selected (based on all DETs in both pairwise and multiple library comparisons, as well as gene lists from the hierarchical clustering analysis). The genomic clustering of each of these selections was compared with the total unique tag list (all 25,165 unique tags). The tags were mapped to the mouse genome, and the number of selected tags within each window of ten consecutive tag positions along each chromosome was calculated. One thousand permutations were used to compute the null distribution of maximum tag counts per window. The method was implemented using the statistical computing environment R [116].
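One way to read the windowing and permutation steps is sketched below for a single chromosome. Tags are ordered by genomic position and flagged 1 if they belong to the selected list. The selected count in every window of ten consecutive tags is compared with the maximum window count obtained after shuffling the flags 1,000 times. Shuffling within one chromosome and the 95th-percentile cutoff are assumptions for illustration; the published method [117] may differ in these details, and the flag vector is invented.

```python
import random


def window_counts(flags, width=10):
    """Selected-tag count in every run of `width` consecutive tags (index = window start)."""
    s = sum(flags[:width])
    counts = [s]
    for i in range(width, len(flags)):
        s += flags[i] - flags[i - width]  # slide the window by one tag
        counts.append(s)
    return counts


def max_window_null(flags, width=10, n_perm=1000, seed=1):
    """Null distribution of the maximum window count under random tag selection."""
    rng = random.Random(seed)
    perm = list(flags)
    null = []
    for _ in range(n_perm):
        rng.shuffle(perm)
        null.append(max(window_counts(perm, width)))
    return null


def significant_windows(flags, width=10, n_perm=1000, alpha=0.05, seed=1):
    """Window starts whose count exceeds the (1 - alpha) quantile of the null maxima."""
    null = sorted(max_window_null(flags, width, n_perm, seed))
    threshold = null[round((1 - alpha) * n_perm) - 1]
    observed = window_counts(flags, width)
    return [i for i, c in enumerate(observed) if c > threshold], threshold


# Hypothetical chromosome with 200 ordered tags: a dense block of 8 selected tags
# at positions 100-107 plus four scattered selected tags.
flags = [0] * 200
for i in list(range(100, 108)) + [10, 50, 150, 190]:
    flags[i] = 1
hits, threshold = significant_windows(flags)
print(threshold, hits)
```

Using the maximum over all windows as the null statistic controls for the many windows tested along a chromosome, so only genuinely dense blocks, such as the one starting near position 100, pass the cutoff.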

Functional classification and characterization of DETs
Gene Ontology enrichment analysis
The DET lists generated from various comparisons were subjected to systematic functional annotation using the standardized Gene Ontology term analysis tools of DAVID [30]. Functional clustering was performed at high stringency, with a kappa similarity threshold of 0.85 and a minimum term overlap of 3. Classification was carried out using a multiple linkage threshold of 0.5, with both the initial and final group membership set to 3. A term was considered statistically significant when the computed P-value was <0.05. All queries were performed in September 2009.
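DAVID groups annotation terms according to how similarly they are distributed over genes, using a kappa statistic, and the 0.85 threshold above applies to that similarity. A minimal sketch of Cohen's kappa between two terms is shown below; the gene and term sets are invented, and DAVID's own implementation may differ in detail.

```python
def term_kappa(term_a, term_b, genes):
    """Cohen's kappa for agreement of two annotation terms over a gene list.

    term_a, term_b: sets of genes carrying each term; genes: the genes considered.
    """
    n = len(genes)
    both = sum(1 for g in genes if g in term_a and g in term_b)
    only_a = sum(1 for g in genes if g in term_a and g not in term_b)
    only_b = sum(1 for g in genes if g in term_b and g not in term_a)
    neither = n - both - only_a - only_b
    observed = (both + neither) / n  # fraction of genes on which the terms agree
    p_a = (both + only_a) / n
    p_b = (both + only_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    if expected == 1.0:  # both terms cover all genes or none
        return 1.0
    return (observed - expected) / (1 - expected)


genes = [f"g{i}" for i in range(10)]
k = term_kappa({"g0", "g1", "g2", "g3"}, {"g0", "g1", "g2", "g4"}, genes)
print(round(k, 3), k >= 0.85)
```

Here the two terms share three of their four genes, yet kappa is only about 0.58, well below the high-stringency cutoff of 0.85, so the two terms would not be linked in the same cluster.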

Molecular interactions and pathway analysis
Molecular network interactions and pathways of validated DETs or co-regulated genes were identified using the IPA [118] tools from Ingenuity Systems® (Redwood City, California, USA). Accession numbers for all genes, with their corresponding fold changes or normalized counts, were imported into the IPA software. No focus genes were set at the beginning of the analysis. To build networks, the application queries the list of input genes and all other gene objects stored in the Ingenuity knowledge base. Networks with a maximum of 30 genes or proteins were constructed, and scores were computed based on the likelihood that the genes are connected by random chance. A score of 2 indicates a 1/100 chance that these genes are connected in a network by random chance. Therefore, networks with a score of 2 or above were considered statistically significant (with at least 99% confidence). The most significant novel networks and their interactions with existing canonical pathways were investigated further.
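The scoring rule above is a negative base-10 logarithm of the chance probability, so a score s corresponds to p = 10^-s (score 2 gives p = 0.01, score 3 gives p = 0.001). The helpers below only restate that conversion; the network names and scores are hypothetical, and the underlying probabilities come from IPA itself.

```python
import math


def score_to_p(score):
    """A network score s corresponds to a chance-connection probability of 10**(-s)."""
    return 10.0 ** (-score)


def p_to_score(p):
    """Inverse conversion: score = -log10(p)."""
    return -math.log10(p)


def significant_networks(scores, min_score=2.0):
    """Return (name, score, p) for networks at or above the score cutoff."""
    return [(name, s, score_to_p(s)) for name, s in scores.items() if s >= min_score]


# Hypothetical network scores, for illustration only.
scores = {"network_A": 35.0, "network_B": 2.0, "network_C": 1.5}
for name, s, p in significant_networks(scores):
    print(name, s, p)
```

A score of exactly 2 sits on the boundary (p = 0.01, 99% confidence); every higher score corresponds to a smaller chance probability.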

Relative quantification using a standard curve method
The crossing point (Cp) of each signal was calculated using the Second Derivative Maximum method [121]. A serially diluted cDNA set was used to construct a four-point standard curve for every PCR system in each run. Three reference genes, selected from Hprt1, Psmb2, Pgk1 (phosphoglycerate kinase 1) and Hmbs, were used as endogenous controls. The starting amount of each target gene was estimated, and intra-sample normalization against multiple reference genes was performed (Section G in Additional data file 1). A linear model was fitted to the time course of expression values for each gene. Genes differentially expressed between developmental stages or regions were selected using empirical Bayesian moderated t-statistics, which borrow information across genes [122]. Standard errors for the mean expression at each developmental stage were obtained from the linear model. For each comparison, P-values were adjusted using the Benjamini and Hochberg method [114] to control the false discovery rate. See Section H in Additional data file 1 for the R code used in the analysis.
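The standard-curve arithmetic can be sketched as follows: fit Cp against log10 of the input amount for the four-point dilution series, invert the fitted line to estimate a sample's starting amount, and divide by a combined reference-gene value. Using the geometric mean of the reference genes is an assumption (a common geNorm-style choice); the exact normalization is given in Section G of Additional data file 1, and the moderated t-statistics are left to the R code. The dilution values are invented.

```python
import math
from statistics import geometric_mean, mean


def fit_standard_curve(log10_amounts, cps):
    """Least-squares line Cp = slope * log10(amount) + intercept, plus PCR efficiency."""
    mx, my = mean(log10_amounts), mean(cps)
    sxx = sum((x - mx) ** 2 for x in log10_amounts)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_amounts, cps))
    slope = sxy / sxx
    intercept = my - slope * mx
    efficiency = 10 ** (-1 / slope) - 1  # 1.0 means perfect doubling per cycle
    return slope, intercept, efficiency


def starting_amount(cp, slope, intercept):
    """Invert the standard curve to estimate the input amount of a sample."""
    return 10 ** ((cp - intercept) / slope)


def normalise(target_amount, reference_amounts):
    """Target amount relative to the geometric mean of the reference-gene amounts."""
    return target_amount / geometric_mean(reference_amounts)


# Hypothetical ten-fold dilution series with ideal efficiency (slope = -log2(10)).
log_inputs = [0.0, -1.0, -2.0, -3.0]
cps = [20.0 - math.log2(10) * x for x in log_inputs]
slope, intercept, eff = fit_standard_curve(log_inputs, cps)
print(round(slope, 3), round(intercept, 3), round(eff, 3))
print(starting_amount(cps[1], slope, intercept))
```

The geometric mean keeps one highly expressed reference gene from dominating the normalization factor.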

Strand specific RT-PCR
All RNA was prepared as described above. Total RNA from all developmental stages (N = 3 per developmental stage) was pooled in equal amounts prior to cDNA synthesis. Four first-strand cDNA synthesis reactions were prepared for each cluster: with a primer complementary to the sense strand only; with a primer complementary to the antisense strand only; with oligo-d(T)15 as a positive control; and without any primer as a negative control. Both primers were then added to the subsequent PCRs for all four reactions (Section G in Additional data file 1). PCR amplifications were carried out using the FastStart PCR High Fidelity System (Roche Diagnostics, Castle Hill, New South Wales, Australia) according to the manufacturer's protocol. More than one primer set was used in the sense-antisense strand-specific RT-PCR (Additional data file 8).
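The primer logic can be made concrete. Reverse transcriptase extends a primer annealed to an RNA, so a primer that detects the sense transcript must be the reverse complement of the sense sequence, while a primer carrying the sense sequence itself detects the antisense transcript. The helper names and the region below are hypothetical, with sequences written 5'→3' on the sense strand; the actual primers are listed in Additional data file 8.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def strand_specific_rt_primers(sense_region):
    """Return (primer that detects sense RNA, primer that detects antisense RNA).

    sense_region: hypothetical 20-25 nt stretch of the locus, 5'->3' on the sense strand.
    """
    detects_sense = reverse_complement(sense_region)  # anneals to the sense transcript
    detects_antisense = sense_region  # anneals to the antisense transcript
    return detects_sense, detects_antisense


print(strand_specific_rt_primers("GATTACAGGCTTCAGCATGCC"))
```

Because each primer anneals to only one RNA strand during reverse transcription, a PCR product from the sense-primed reaction (and none from the no-primer control) indicates that the sense transcript is present, and likewise for the antisense transcript.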

RACE
First-strand cDNA synthesis was carried out using pooled total RNA extracted from three biological replicates of rostral and caudal E15.5 and whole E15.5, E17.5, P1.5 and adult (5- to 6-month-old) cerebral cortices. Oligo-d(T)15 with an adaptor sequence (5'-TACGACGTCTGCTAGGACTG-3') was used to prime first-strand cDNA synthesis. Second-strand synthesis or PCR was then carried out using a strand-specific primer and the adaptor primer (Additional data file 8). All specific primers were designed to be complementary to the SAGE tags or their upstream sequences. PCR amplifications (Section G in Additional data file 1) were carried out using the FastStart PCR High Fidelity System (Roche Diagnostics, Castle Hill, New South Wales, Australia) according to the manufacturer's protocol.
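The adaptor-anchored oligo-d(T) primer leaves a predictable signature on each 3' RACE product. Read in the sense orientation, the product ends in a poly(A) stretch followed by the reverse complement of the adaptor. The sketch below strips that signature to approximate the polyadenylation site, then reads off a LongSAGE-style tag as the 17 nt immediately 3' of the most 3' CATG site. It assumes the NlaIII (CATG) anchoring enzyme of standard SAGE kits. The product sequence is invented, and `rstrip` also removes any genomic A residues just upstream of the tail, so the site is approximate.

```python
ADAPTOR = "TACGACGTCTGCTAGGACTG"  # adaptor sequence given in the text
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def polya_site(product, adaptor=ADAPTOR, min_polya=15):
    """Strip adaptor and poly(A) from a sense-oriented 3' RACE product."""
    tail = reverse_complement(adaptor)
    if product.endswith(tail):
        product = product[: -len(tail)]
    body = product.rstrip("A")
    if len(product) - len(body) < min_polya:
        raise ValueError("no oligo-d(T)15-derived poly(A) stretch found")
    return body  # sequence up to (approximately) the polyadenylation site


def long_sage_tag(body, anchor="CATG", tag_len=17):
    """Tag immediately 3' of the most 3' anchoring-enzyme site, or None."""
    pos = body.rfind(anchor)
    if pos < 0:
        return None
    start = pos + len(anchor)
    tag = body[start: start + tag_len]
    return tag if len(tag) == tag_len else None


# Hypothetical RACE product: upstream sequence, CATG, 17-nt tag, poly(A), adaptor.
product = "GGGTT" + "CATG" + "ACGTACGTACGTACGTC" + "CCGT" + "A" * 20 + reverse_complement(ADAPTOR)
print(long_sage_tag(polya_site(product)))
```

Products from alternative polyadenylation sites of the same single-exon gene would end at different positions and could therefore yield different 3'-most tags, consistent with multiple SAGE tags per locus.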

Southern blotting analysis
Amplified 3' RACE products were transferred to Hybond-N+™ (GE Healthcare, Rydalmere, New South Wales, Australia) nylon membrane using the neutral transfer method. Prehybridization and hybridization were performed in Rapid-Hyb buffer (GE Healthcare, Rydalmere, New South Wales, Australia) according to the manufacturer's protocol. All oligonucleotides were designed to be complementary to the sequence between the specific primer-priming site and the tag of interest. Synthetic oligonucleotides were 5' end-labeled using T4 polynucleotide kinase (Promega, Alexandria, New South Wales, Australia) and [γ-32P]ATP (GE Healthcare, Rydalmere, New South Wales, Australia) with modifications to the manufacturer's protocol. After hybridization, the membrane was washed with 5× saline-sodium citrate (SSC) solution containing 0.1% w/v sodium dodecyl sulphate (SDS) and then with 1× SSC containing 0.1% w/v SDS (Section G in Additional data file 1). See Additional data file 8 for the primer sequences and oligonucleotides used for detection.

Northern blotting analysis
Independent preparations of total RNA from the cerebral cortex of seven mice at E15.5 and E17.5, and of three adult mice, were pooled in equal amounts to a total of 20 µg per developmental stage. These pooled total RNAs were electrophoresed overnight and capillary-transferred onto Hybond-N+™ (GE Healthcare, Rydalmere, New South Wales, Australia) nylon membrane. Double-stranded DNA probes were radiolabeled using the Amersham Megaprime DNA Labeling System (GE Healthcare, Rydalmere, New South Wales, Australia) and [α-32P]CTP, according to the manufacturer's protocol. Hybridization was carried out overnight (approximately 18 h) at 65°C in aqueous buffer (7% w/v SDS with 0.5 M phosphate). After hybridization, blots were washed five to six times with 1% w/v SDS at 65°C until the background signal was low.
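Pooling in equal amounts means that each preparation contributes the same mass of RNA, so the volume taken from each depends on its concentration. The helper below is a hypothetical illustration with invented concentrations for seven preparations; it is not part of the published protocol.

```python
def equal_pool_volumes(concentrations_ug_per_ul, total_ug=20.0):
    """Volume (microlitres) of each RNA prep contributing an equal share of total_ug."""
    share = total_ug / len(concentrations_ug_per_ul)  # micrograms from each prep
    return [share / c for c in concentrations_ug_per_ul]


# Hypothetical concentrations (micrograms per microlitre) of seven E15.5 preparations.
concs = [1.2, 0.8, 1.5, 1.0, 0.9, 1.1, 1.3]
volumes = equal_pool_volumes(concs)
print([round(v, 2) for v in volumes])
print(round(sum(v * c for v, c in zip(volumes, concs)), 6))
```

Each preparation then supplies 20/7 µg, roughly 2.86 µg, and the pooled mass adds back up to 20 µg.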

Strand-specific RT-qPCR
Total RNA was extracted from harvested organs using TRIzol® reagent (Invitrogen, Mulgrave, Victoria, Australia) according to the manufacturer's protocol. To avoid genomic DNA contamination, all isolated total RNA was treated with the recombinant DNase I enzyme provided in the DNA-free™ kit (Applied Biosystems, Scoresby, Victoria, Australia) according to the manufacturer's protocol. First-strand cDNA synthesis was carried out using strand-specific primers, followed by qPCR analysis as described above.

Embryonic neural stem cells grown as neurospheres
Mice used to generate neurospheres had a mixed genetic background including MF1, 129SvEv, C57BL/6 and CBA. Cerebral cortices from E14 embryos were dissected into calcium- and magnesium-free phosphate-buffered saline. The tissue was mechanically dissociated and centrifuged. The cells were plated in complete neuroculture medium (Section G in Additional data file 1) for 4 days, followed by induction of neuronal differentiation. For differentiation, the cells were plated on culture dishes coated with poly-D-lysine (catalogue number P6407, Sigma Aldrich, Castle Hill, New South Wales, Australia) and laminin (catalogue number 23017-015, Invitrogen, Mulgrave, Victoria, Australia) in neuroculture medium containing 2% (v/v) fetal bovine serum but lacking epidermal growth factor and basic fibroblast growth factor. Differentiation was allowed to proceed for 5 days. Total RNA was extracted from both proliferating and differentiating cells using TRIzol® reagent as described above.

P19 embryonal carcinoma cells
P19 mouse embryonal carcinoma cells were cultured and differentiated into neurons as described previously [124].
Briefly, P19 cells were maintained in P19GM complete medium (Section G in Additional data file 1). To induce neuronal differentiation, 1 × 10⁶ P19 cells were cultured in suspension in bacteriological Petri dishes in P19GM medium supplemented with 5 × 10⁻⁷ M all-trans retinoic acid (catalogue number R-2625; Sigma Aldrich, Castle Hill, New South Wales, Australia). After 4 days, the P19 cells had formed embryoid bodies. Embryoid bodies were collected from the suspension cultures and re-plated in adherent culture flasks in P19GM medium containing only 5% (v/v) fetal bovine serum and no retinoic acid. The cells were allowed to differentiate for 5 days. Total RNA was extracted from both proliferating and differentiating cells using TRIzol® reagent as described above.
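The retinoic acid concentration translates into a stock volume through C1V1 = C2V2. The 1 mM stock and the 10 ml culture volume below are hypothetical, since the text does not state them. The molar mass of all-trans retinoic acid (C20H28O2) is about 300.44 g/mol.

```python
RA_MOLAR_MASS_G_PER_MOL = 300.44  # all-trans retinoic acid, C20H28O2


def stock_volume_ul(final_molar, culture_ml, stock_molar):
    """C1*V1 = C2*V2 solved for the stock volume V1, in microlitres."""
    return final_molar * culture_ml * 1000.0 / stock_molar


def stock_mass_mg(stock_molar, stock_ml):
    """Mass of retinoic acid (mg) needed for stock_ml of a stock_molar solution."""
    return stock_molar * (stock_ml / 1000.0) * RA_MOLAR_MASS_G_PER_MOL * 1000.0


# Hypothetical: 1 mM stock, 10 ml of P19GM medium, 5 x 10^-7 M final concentration.
print(round(stock_volume_ul(5e-7, 10.0, 1e-3), 6))
print(round(stock_mass_mg(1e-3, 10.0), 4))
```

Under these assumptions, about 5 µl of a 1 mM stock per 10 ml of medium (a 1:2,000 dilution) gives 5 × 10⁻⁷ M, and 10 ml of such a stock needs roughly 3 mg of retinoic acid.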

Authors' contributions
KHL performed all the SAGE validation experiments. CAH, PZC and SST procured the mouse cerebral cortex and constructed the SAGE libraries. KHL, TB, LH and GKS designed, performed and supervised the SAGE, RT-qPCR and IPA analyses. KHL and TT performed all the ISH studies. KHL, KB, PSC, CNH and PQT carried out the expression studies on Sox4 and Sox11 transcripts. KHL, CAH and CNH drafted the manuscript. CAH, GKS, TT and HSS conceived of the study, and participated in its design and coordination. All authors read and approved the final manuscript.

Additional data files
The following additional data are available with the online version of this paper: analysis of SAGE, DETs, IPA, Sox4 and Sox11 genomic cluster analysis, and R script for implementing empirical Bayesian moderated t-test on multiple groups (Additional data file 1); SAGE tag information for 561 DETs (Additional data file 2); functional annotations clustering analysis using DAVID (Additional data file 3); RT-qPCR validation of DETs based on multiple comparisons between two developmental stages (E versus Ad, PN1.5 versus Ad and E15.5 versus PN1.5) (Additional data file 4); RT-qPCR validation of gene clusters based on hierarchical clustering analysis (Additional data file 5); RT-qPCR validation of DETs based on the rostral versus caudal E15.5 cerebral cortex comparison (Additional data file 6); statistically significant over-represented genomic loci based on genomic clustering of tags (Additional data file 7); list of primers, probes, clones and assays designed for RT-qPCR, RACE, Southern, Northern and ISH analysis (Additional data file 8).