Overview of the Giotto toolbox
Giotto provides a comprehensive spatial analysis toolbox that contains two independent yet fully integrated modules (Fig. 1a, b). The first module (Giotto Analyzer) provides step-by-step instructions for analyzing spatial single-cell expression data, whereas the second module (Giotto Viewer) provides a responsive and interactive viewer of such data on the user’s local computer. These two modules can be used either independently or iteratively.
Giotto Analyzer requires as minimal input a gene-by-cell count matrix and the spatial coordinates for the centroid position of each cell (Fig. 1a). At the basic level, Giotto Analyzer can be used to perform common steps often similar to scRNAseq analysis, such as pre-processing, feature selection, dimension reduction, and unsupervised clustering; on the other hand, the main strength comes from its ability to integrate gene expression and spatial information in order to gain insights into the structural and functional organization of a tissue and its expression patterns. To this end, Giotto Analyzer creates a spatial grid and neighborhood network connecting cells that are physically close to each other. These objects function as the basis to perform analyses that are associated with cell neighborhoods.
Giotto Analyzer is written in the popular language R. The core data structure is the Giotto object, which is specifically designed for spatial expression data analysis based on the flexible S4 object system in R. The Giotto object stores all necessary (spatial) information and is sufficient to perform all calculations and analyses (Additional file 1: Fig. S1A). This allows the user to quickly evaluate and create their own flexible pipeline for both spatial visualization and data analysis. The Giotto Viewer module is designed to both interactively explore the outputs of Giotto Analyzer and to visualize additional information such as cell morphology and transcript locations (Fig. 1b). Giotto Viewer provides an interactive workspace allowing users to easily explore the data in both physical and expression space and identify relationships between different data modalities. Taken together, these two modules provide an integrated toolbox for spatial expression data analysis and visualization.
The spatial omics field is diverse and rapidly expanding; each technology has its strengths and weaknesses. In order to demonstrate the general applicability of Giotto, we selected and analyzed 10 public datasets obtained from 9 state-of-the-art technologies (Fig. 1c, Additional file 2: Table S1), which differ in terms of resolution (single-cell vs multiple cells), physical dimension (2D vs 3D), molecular modality (protein vs RNA), number of cells and genes, and tissue of origin. Throughout this paper, we use these datasets to highlight the rich set of analysis tools that are supported by Giotto.
Cell type identification and data visualization
Giotto Analyzer starts by identifying different cell types that are present in a spatial transcriptomic or proteomic dataset. As an illustrating example for the first common steps, we considered the seqFISH+ mouse somatosensory cortex dataset, which profiled 10,000 genes in hundreds of cells at single-cell resolution using super-resolved imaging [9]. The input gene-by-cell count matrix was first pre-processed through a sequence of steps including normalization, quality control of raw counts, and adjustment for batch effects or technical variations. Then downstream analyses were carried out for highly variable gene selection (Additional file 1: Fig. S1B), dimensionality reduction (such as PCA, tSNE [19], and UMAP [20]), and clustering (such as Louvain [21] and Leiden clustering algorithms [22]) (Additional file 1: Fig. S1C). Cluster-specific marker genes (Additional file 1: Fig. S1D-E) were identified through a number of algorithms (such as Scran [23] and MAST [24]) and a new algorithm based on Gini coefficients [25]. Whereas the strongest marker genes are typically identified by all three methods, each method has its own strength in detecting specific types of genes (Additional file 1: Fig. S2A-G, see Methods and Additional file 3: Supplementary Notes for details). In total, we identified 12 distinct cell types, including layer-specific excitatory neurons (eNeurons) (Syt17 in layer 2/3, Grm2 in layer 4, Islr2 in layer 5/6), two types of inhibitory neurons (iNeurons) (Lhx6 vs Adarb2), astrocytes (Gli3), oligodendrocytes (Plekhh), oligodendrocyte precursors (OPCs) (Sox10), endothelial cells (Cldn5), mural (Vtn), and microglia (Itgam) cells. The distribution of these cell types can then be visualized in both expression and physical space (Additional file 1: Fig. S1F).
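The Gini coefficient underlying the third marker detection approach can be illustrated with a minimal sketch. It is not Giotto's implementation: it only computes the standard Gini coefficient of a gene's mean expression across clusters, and the gene names and values are hypothetical. A gene concentrated in one cluster scores close to 1, whereas an evenly expressed gene scores 0.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = even, near 1 = concentrated)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula on ascending data: G = sum((2i - n - 1) * x_i) / (n * sum(x))
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


# Hypothetical mean expression of two genes across four clusters.
cluster_means = {
    "marker_like": [0.0, 0.0, 0.0, 8.0],   # expressed in a single cluster
    "housekeeping": [2.0, 2.0, 2.0, 2.0],  # expressed evenly in all clusters
}
scores = {gene: gini(means) for gene, means in cluster_means.items()}
```

Ranking genes by such a score favors genes restricted to few clusters, which may explain why this approach complements methods based on differential expression tests.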
Next, we analyzed additional complex imaging-based spatial transcriptomic datasets generated by merFISH [14], STARmap [7], and osmFISH [11]. In the merFISH dataset, 12 selected thin slices from a 3D mouse pre-optic cortex sample were imaged, resulting in a total of roughly 75,000 cells and 155 genes. Here Giotto was used to identify 8 distinct clusters. Based on known marker genes, we were able to annotate these clusters as microglia (Selplg), ependymal cells (Cd24a), astrocytes (Aqp4), endothelial cells (Fn1), mature (Mbp), and immature (Pdgfra) oligodendrocytes, excitatory (Slc17a6) and inhibitory (Gad1) neurons, respectively, which is in agreement with the original paper [14] (Fig. 2a, b, Additional file 3: Supplementary Notes). Cells that did not fall into these clusters were collectively assigned to an “ambiguous” group, as done in the original paper. Next, to visualize the results, Giotto can create an interactive 3D plot for the whole dataset or specifically highlight one or more selected 2D slices (Fig. 2c). Together with overlaying gene expression information (Fig. 2d, e), such visualization enables the user to explore tissue structure and concomitant gene expression variation in a detailed manner.
In a similar manner, we analyzed the mouse visual cortex STARmap (Additional file 1: Fig. S3 A-D) and mouse somatosensory cortex osmFISH (Additional file 1: Fig. S4 A-D) datasets (see Additional file 3: Supplementary Notes for details). Both datasets show the typical anatomical multi-layered structure of the cortex. In the 3D STARmap analysis, we present an additional functionality that allows the user to create 2D virtual sections of a 3D sample (Additional file 1: Fig. S3 A, C and E), which could be useful for more refined structural analysis, as demonstrated in our analysis of the STARmap dataset (see section “Giotto identifies distinct cellular neighborhoods and interactions” and Additional file 3: Supplementary Notes for details).
Due to the similarity of data structure, it is straightforward to apply Giotto to analyze large-scale spatial proteomic datasets such as CyCIF, CODEX, and MIBI (see Additional file 3: Supplementary Notes for details). As an illustrating example, we analyzed a public dataset obtained by CyCIF [10]. The dataset profiled the spatial distribution of 21 proteins and 3 cellular compartment or organelle markers at single-cell resolution in a human pancreatic ductal adenocarcinoma (PDAC) sample that spanned across three distinct tissues: the pancreas, small intestine, and tumor. In total, 160,000 cells were profiled. Giotto identified 13 coarse clusters which include mesenchymal, epithelial, immune, and cancer cells (Fig. 2f, g). Next, we zoomed into each tissue to refine the cell type structure in the pancreas and small intestine separately (Fig. 2h). For example, we can now see clearly that the pancreas is structured in distinct zones enriched with epithelial (E-cadherin) and mesenchymal or stromal (Vimentin) cells, respectively. On the other hand, the small intestine shows a clear proliferating zone (PCNA) and the activation of Wnt signaling (β-catenin) in intestinal epithelial cells (Fig. 2i, j). Both observations are consistent with the original paper [10]. Applying the same approach to analyze a mouse spleen dataset from CODEX [12] allowed us to identify zones enriched with CD8(+) T cells (Zone 1) and enriched with erythroblasts and F4/80 macrophages (Zone 2) (Additional file 1: Fig. S4 E-I). As such, the employment of Giotto to quickly zoom in to different regions is useful for uncovering the organization of spatial tissues or expression levels in a hierarchical manner.
Analysis of data with lower spatial resolution
Recently, a number of lower-resolution spatial transcriptomic technologies have been developed, such as 10X Genomics Visium [26], Slide-seq [8], and DBiT-seq [27]. Despite their lower spatial resolution, these technologies are useful because they are currently more accessible. To overcome the challenge of lower resolution, Giotto implements a number of algorithms for estimating the enrichment of a cell type in different regions (Fig. 3a). In this approach, a continuous value representing the likelihood of the presence of a cell type of interest is assigned to a spatial location which contains multiple cells. To this end, Giotto requires additional input representing the gene signatures of known cell types. Currently, the input gene signatures for the known cell types can either be provided by the user directly as cell type marker gene lists, or be automatically inferred by Giotto based on an additional scRNAseq data matrix input. Giotto then evaluates the match between each cell type’s gene signatures and the expression pattern at each spatial location and reports an enrichment score by using one of the three algorithms: PAGE [28], RANK, and Hypergeometric testing (Fig. 3a, see “Methods” for details). PAGE calculates an enrichment score based on the fold change of cell type marker genes for each spot. RANK does not require predefined marker genes but instead creates a full ranking of genes ordered by the cell-type specificity score in the scRNAseq data matrix, and computes a ranking-based statistic. Hypergeometric test computes a p value based on the overlap between each cell-type-specific marker gene set and the set of spot-specific genes, i.e., those that are expressed at significantly higher levels at certain spots than others (see “Methods”). As negative controls, enrichment scores are also calculated for scrambled spatial transcriptomic data. This allows us to evaluate the statistical significance of an observed enrichment score.
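To make the PAGE-style score concrete, the following minimal sketch computes a z-score for one spot from per-gene fold changes. It assumes the commonly used PAGE formulation, Z = (S_m − μ) · √m / δ, where S_m is the mean fold change of the m marker genes and μ and δ are the mean and standard deviation of the fold changes of all genes. The function name, the use of the population standard deviation, and the toy values are assumptions for illustration and are not taken from Giotto.

```python
import math
from statistics import mean, pstdev


def page_score(fold_changes, marker_genes):
    """PAGE-style z-score of one marker gene set at one spot.

    fold_changes: dict mapping gene -> fold change of this spot relative to
    the average over all spots (hypothetical input layout).
    """
    markers = [g for g in marker_genes if g in fold_changes]
    all_fc = list(fold_changes.values())
    if not markers:
        return 0.0
    mu = mean(all_fc)        # mean fold change over all genes
    delta = pstdev(all_fc)   # spread of fold changes over all genes
    if delta == 0:
        return 0.0
    s_m = mean(fold_changes[g] for g in markers)  # mean over marker genes
    return (s_m - mu) * math.sqrt(len(markers)) / delta


# Toy spot: genes A and B are up, E and F are down.
spot_fc = {"A": 2.0, "B": 2.0, "C": 0.0, "D": 0.0, "E": -2.0, "F": -2.0}
z_up = page_score(spot_fc, ["A", "B"])
z_down = page_score(spot_fc, ["E", "F"])
```

Repeating the same calculation on scrambled data would give the kind of background distribution described above for judging whether an observed score is significant.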
To rigorously evaluate the performance of these cell-type enrichment algorithms, we created a simulated dataset based on the aforementioned seqFISH+ dataset, for which the cell type annotation has been established at the single-cell resolution. To mimic the effect of spatial barcoding, such as that being used in Visium, the merged fields of view were divided into spot-like squares from a regular spatial grid (500 × 500 pixels, ~ 51.5 μm) (Fig. 3b). For cells located in each square, their gene expression profiles were averaged, thereby creating a new dataset with lower spatial resolution. To apply cell-type enrichment analysis, we obtained scRNAseq data and derived marker gene lists for somatosensory cortex associated cell types from a previous study [29]. To facilitate cross-platform comparison, we focused on the six major cell types that were annotated by both studies: astrocytes, microglia, endothelial mural, excitatory neurons, inhibitory neurons, and oligodendrocytes (Fig. 3b). For each cell type, we assigned an enrichment score and p value for each spot by using one of the three enrichment analysis methods mentioned above (Fig. 3c). To quantify the performance of each method, we evaluated the area under curve (AUC) score, which was obtained by using the ranking of enrichment score values to predict the presence of a cell type at each spot. Both PAGE and RANK provide high accuracy (median AUC = 0.95 and 0.96, respectively, Fig. 3c, Additional file 1: Fig. S5 and S6A). Even if a spot contains only one cell from a given type, it can often be identified (47 out of 67, ~ 70%). The only cell type that cannot be well predicted by this approach is inhibitory neurons, whose gene signatures are less distinct than others.
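The simulation and evaluation steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the study: cells are given as hypothetical (x, y, expression) tuples, spots are square bins of a chosen size, and the AUC is computed as the probability that a spot containing the cell type receives a higher enrichment score than a spot that does not (ties count as one half).

```python
from collections import defaultdict


def aggregate_to_grid(cells, spot_size):
    """Average expression vectors of cells that fall in the same square spot.

    cells: list of (x, y, expression) with equal-length expression lists.
    Returns {(col, row): mean expression list}.
    """
    members = defaultdict(list)
    for x, y, expression in cells:
        members[(int(x // spot_size), int(y // spot_size))].append(expression)
    return {
        spot: [sum(vals) / len(vals) for vals in zip(*profiles)]
        for spot, profiles in members.items()
    }


def auc_from_scores(scores, present):
    """AUC of enrichment scores for predicting presence of a cell type."""
    pos = [s for s, p in zip(scores, present) if p]
    neg = [s for s, p in zip(scores, present) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative spot")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Three toy cells with two genes each, binned into 500-pixel squares.
toy_cells = [(10, 10, [2.0, 0.0]), (400, 20, [4.0, 2.0]), (600, 10, [10.0, 0.0])]
spots = aggregate_to_grid(toy_cells, spot_size=500)

# Toy enrichment scores and ground-truth presence for four spots.
toy_auc = auc_from_scores([0.9, 0.8, 0.3, 0.1], [True, False, True, False])
```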
In comparison, the hypergeometric test and Spearman correlation methods are less accurate (median AUC = 0.86 and 0.72, respectively, Additional file 1: Fig. S5 and S6A, Additional file 3: Supplementary Notes). We also compared these methods with RCTD [30], which is a newly developed method for deconvolution. RCTD also performed well (median AUC value of 0.95, Additional file 1: Fig. S5 and S6A), but it was considerably slower than the other methods (Additional file 1: Fig. S6C). The four methods that performed well were also robust to changes in number of transcripts (UMIs) (Additional file 1: Fig. S6B).
Next, we applied cell-type enrichment analysis to a publicly available mouse brain Visium dataset (downloaded from https://www.10xgenomics.com/). Spatial transcriptomic information was obtained by using 2698 spatially barcoded array spots, each covering a circled area with 55 μm in diameter. To comprehensively perform enrichment analysis, cell type annotations and corresponding gene signatures were obtained from a public scRNAseq dataset [31]. Here, we applied PAGE to identify the spatial patterns of the major cell taxonomies identified previously [31]. We found that a number of cell types are spatially restricted to distinct anatomical regions (Fig. 3d, Additional file 1: Fig. S6D). The spatial patterns of the enrichment scores are consistent with the literature for a number of cell types, such as peptidergic cells, granule neurons, ependyma astrocytes, and medium spiny neurons (Fig. 3d) [32,33,34]. Similar but less obvious trends can be observed by inspecting the expression pattern of specific marker genes (Fig. 3d), which is consistent with the fact that cell types are typically defined by the concerted activities of multiple genes. Of note, the enrichment analysis also correctly predicted the absence of cell types that should not be present in the sample, such as cerebellum, olfactory bulb, and spinal cord cells (Additional file 1: Fig. S6D).
To test the general applicability of the enrichment analysis algorithms, we re-analyzed a Slide-seq dataset [8] (see Additional file 3: Supplementary Notes for details), where the read coverage is lower than Visium. This dataset profiles the mouse cerebellum, containing 21,000 beads and 10,500 genes at a coverage of 80 UMIs per bead. Cell-type gene signature information was obtained from a public scRNAseq dataset for a similar region [31]. Analysis of this scRNAseq dataset identified 15 different cell types. We applied the RANK method for enrichment analysis (Additional file 1: Fig. S7A) and noticed distinct spatial enrichment of cell types in the Slide-seq data that are consistent with prior knowledge. For example, the Purkinje cells were correctly mapped to the Purkinje layer, granule cells were correctly mapped to the nuclear layer, and GABAergic interneurons were mapped to the molecular layer (Additional file 1: Fig. S7B). For comparison, we also applied RCTD to analyze the same dataset and obtained similar results (Additional file 1: Fig. S7B, C).
Giotto uncovers different layers of spatial expression variability
A key component of Giotto Analyzer is the implementation of a wide range of computational methods to identify spatial patterns based on gene expression. On a fundamental level, Giotto Analyzer represents the spatial relationship among different cells as a spatial grid or network (Fig. 4a). To create a spatial grid, each image field is partitioned into regular squares and the gene expression patterns associated with cells within each square are averaged. As such, the spatial grid is a coarse-resolution representation of the data. A spatial network preserves single-cell resolution, and it is created by connecting neighboring cells through a Delaunay triangulation network (see “Methods”). As an alternative approach to create a spatial network, the user can create a spatial network by selecting the k-nearest neighbors or using a fixed distance cut-off, which allows the user to fine-tune the influence of neighboring cells in more downstream applications (Additional file 1: Fig. S8A). However, as shown in Additional file 3: Supplementary Notes, our analysis results are typically insensitive to the specific choice of parameter values.
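As a minimal sketch of the alternative network construction mentioned above, the following code connects each cell to its k nearest neighbors and optionally drops edges longer than a distance cut-off. The function and parameter names are hypothetical, and the brute-force distance search is chosen for readability, not speed. The Delaunay triangulation mentioned above is not shown.

```python
import math


def knn_network(coords, k, max_distance=None):
    """Return undirected edges (i, j) with i < j between neighboring cells.

    coords: list of (x, y) centroids. Each cell proposes its k nearest cells;
    an edge is kept if either endpoint proposes it and, when max_distance is
    given, if it is no longer than that cut-off.
    """
    edges = set()
    for i, p in enumerate(coords):
        by_distance = sorted(
            (math.dist(p, q), j) for j, q in enumerate(coords) if j != i
        )
        for d, j in by_distance[:k]:
            if max_distance is None or d <= max_distance:
                edges.add((min(i, j), max(i, j)))
    return edges


# Four toy cells on a line; the last one is far from the others.
toy_coords = [(0, 0), (1, 0), (2, 0), (10, 0)]
edges_knn = knn_network(toy_coords, k=1)
edges_cut = knn_network(toy_coords, k=1, max_distance=2)
```

In this toy case the distant cell is still linked to its nearest neighbor under the pure k-nearest-neighbor rule, but becomes isolated once the distance cut-off is applied, which illustrates how the two parameters control the influence of neighboring cells.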
A basic and often first important task in spatial transcriptomic or proteomic analysis is to identify genes whose expression displays a coherent spatial pattern. To this end, Giotto implements a number of methods, including SpatialDE [35], Trendsceek [36], SPARK [37], and two novel methods that are based on spatial network calculations. More specifically, the latter two methods are based on statistical enrichment of spatial network neighbors in the high gene expression state after binarization and are therefore named BinSpect (Binary Spatial Extraction). The two methods, called BinSpect-kmeans and BinSpect-rank, respectively, differ in the way the expression values are binarized (Additional file 1: Fig. S8B, Methods). To evaluate the performance of these methods, we applied each to the seqFISH+ dataset, where many genes are expected to display layer-specific patterns that are consistent with the anatomical structure of the somatosensory cortex (Fig. 4b, Additional file 1: Fig. S8C). For each method, we selected the top 1000 genes as candidates for spatially coherent genes. Of note, a large subset of these genes was identified by at least four of the methods (Fig. 4c). These include previously established layer-specific genes such as Cux2, Grm2, and Rprm, indicating that genes with a known and strong spatially coherent expression pattern can be found in a robust manner. On the other hand, subsets of genes were detected by only one or a combination of specific method(s) (Additional file 1: Fig. S8D-G, see Additional file 3: Supplementary Notes), suggesting it may be beneficial to combine results from all methods for comprehensiveness. The main advantage of the BinSpect methods introduced here is that they are significantly faster than SpatialDE (~ 6–8×), SPARK (~ 29–45×), and Trendsceek (~ 816–3300×) (Additional file 1: Fig. S8H-I, see Additional file 3: Supplementary Notes).
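The idea behind a neighbor-enrichment test on binarized expression can be sketched as follows. This is a simplified illustration, not the BinSpect implementation: a gene is binarized by marking a chosen top fraction of cells as high (a rank-style rule assumed here), each network edge is counted in both directions to fill a 2 × 2 table of source state versus neighbor state, and a one-sided Fisher's exact test asks whether high cells neighbor other high cells more often than expected. Because edges sharing a cell are not independent, the resulting p value should be read as a ranking statistic only. All names and toy values are hypothetical.

```python
from math import comb


def binarize_top_fraction(values, fraction=0.3):
    """Mark roughly the top `fraction` of cells as high (True)."""
    n_high = max(1, round(len(values) * fraction))
    threshold = sorted(values, reverse=True)[n_high - 1]
    return [v >= threshold for v in values]


def edge_contingency(is_high, edges):
    """2 x 2 table indexed as table[source_is_high][neighbor_is_high]."""
    table = [[0, 0], [0, 0]]
    for i, j in edges:
        # Count each undirected edge in both directions.
        for src, nbr in ((i, j), (j, i)):
            table[int(is_high[src])][int(is_high[nbr])] += 1
    return table


def fisher_greater(table):
    """One-sided Fisher's exact p value for an excess of high-high pairs."""
    a, b = table[1][1], table[1][0]
    c, d = table[0][1], table[0][0]
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return tail / comb(n, col1)


# Six toy cells on a line; the gene is high on the left half only.
toy_expr = [9.0, 8.0, 7.0, 1.0, 0.0, 0.5]
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
toy_high = binarize_top_fraction(toy_expr, fraction=0.5)
toy_table = edge_contingency(toy_high, toy_edges)
toy_p = fisher_greater(toy_table)
```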
Next, to assess how effectively each method could retrieve known spatial patterns, we performed a quantitative evaluation based on simulated patterns. In this evaluation, we excluded Trendsceek since its long run time prevented its use in large-scale simulation studies. Since each method was based on different assumptions or statistical models, we established an unbiased simulation strategy based on real high-quality data (seqFISH+), which did not rely on a simplified and arbitrary representation of spatial expression data, but instead incorporated all known and unknown variability factors observed in a large subset of genes (Additional file 1: Fig. S9, see Additional file 3: Supplementary Notes for details). These results show that BinSpect (kmeans and rank) perform systematically better at retrieving known spatial genes, especially when additional noise is introduced. This observation is consistent irrespective of the spatial expression pattern that is evaluated (Additional file 1: Fig. S8J).
Giotto implements two approaches to systematically summarize the spatial patterns of a large number of spatial genes (Fig. 4a). First, Giotto identifies spatial domains with coherent gene expression patterns by implementing our recently developed hidden Markov random field (HMRF) model [38]. An HMRF model detects spatial domains by comparing the gene expression patterns of each cell with its neighborhood to search for coherent patterns (see “Methods” for details). The inference is based on the joint probability of the intrinsic factor (expression pattern of each cell) and the extrinsic factor (domain state distribution of the surrounding cells) [38]. The analysis starts with the identification of spatial genes using one of the previously described methods. Then we apply our HMRF model to infer the spatial domain state for each cell or spot. In applying HMRF to the seqFISH+ dataset, our analysis identified 9 distinct spatial domains that were consistent with the anatomic layer structure (Fig. 4d). For example, Domain D7 is similar to Layer L1, and Domain D2 is similar to Layer L2/3. Of note, such layered structure is not completely reflected by the distribution of different cell types (Additional file 1: Fig. S1F), as numerous cell types (such as inhibitory neurons and endothelial cells) are distributed across multiple layers.
In addition, Giotto also implements a summary view of spatial gene expression patterns based on co-expression analysis. As an illustrating example, we analyzed a Visium dataset obtained from the kidney coronal section, which has known and distinguishable anatomic structures [39, 40]. Using the BinSpect-kmeans algorithm in Giotto, we selected the top 500 spatially coherent genes. To identify spatial patterns, we created a co-expression matrix as follows. First, we spatially smoothed the gene expression data through spatial neighbor averaging, and then created co-expression modules by clustering the spatially smoothed data (Fig. 4e). Next, the spatial pattern of each module was summarized by a metagene defined by averaging the expression of all associated genes, which were stored and visualized (Fig. 4f). These spatial metagene profiles resemble the known anatomical structures of the mouse kidney and its surrounding environment, which is further corroborated by the spatial co-expressed genes in each module (see Additional file 3: Supplementary Notes). Moreover, individual genes representing the co-expression patterns were easily extracted and displayed (Fig. 4g), providing researchers the opportunity to explore these spatial co-expression patterns in an unbiased manner on a transcriptome wide level. In addition, Giotto also provides a co-expression network based on single-cell expression data so that users can further filter or distinguish spatial co-expression within a local neighborhood from co-expression within the same cell. Finally, these global co-expression patterns are largely insensitive to the characteristics of the underlying spatial network (Additional file 1: Fig. S10A-D, see Additional file 3: Supplementary Notes for details).
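The smoothing and metagene steps described above can be illustrated with a short sketch. It assumes that smoothing averages each spot with its direct network neighbors (including the spot itself, which is an assumption) and that a metagene is the per-spot mean over the genes of one module. The clustering step that defines the modules is not shown, and all names and values are hypothetical.

```python
def smooth_by_neighbors(expr, edges):
    """Average each spot's value with the values of its network neighbors.

    expr: list of per-spot values for one gene; edges: undirected (i, j) pairs.
    """
    neighborhood = {i: [i] for i in range(len(expr))}  # each spot includes itself
    for i, j in edges:
        neighborhood[i].append(j)
        neighborhood[j].append(i)
    return [
        sum(expr[j] for j in neighborhood[i]) / len(neighborhood[i])
        for i in range(len(expr))
    ]


def metagene(gene_profiles):
    """Per-spot mean over the (smoothed) profiles of all genes in a module."""
    n_genes = len(gene_profiles)
    return [sum(vals) / n_genes for vals in zip(*gene_profiles)]


# Three toy spots on a line and one toy co-expression module of two genes.
toy_edges = [(0, 1), (1, 2)]
smoothed = smooth_by_neighbors([0.0, 3.0, 6.0], toy_edges)
module_profile = metagene([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]])
```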
Giotto identifies distinct cellular neighborhoods and interactions
Most cells reside within complex tissue structures, where they can communicate with their neighboring cells through specific molecules and signaling pathways. Hence, gene expression within each cell is likely driven by the combination of an intrinsic (cell-type-specific) component and an extrinsic component mediated by cell-cell communications (Fig. 5a). Giotto Analyzer provides a number of tools to explore and extract information related to the cell neighborhood organization, cell-cell communication, and the effect of neighboring cell types on gene expression. To identify distinct cell-type/cell-type interacting patterns, Giotto evaluates the enrichment of the frequency that each pair of cell types is proximal to each other. When analyzing the seqFISH+ somatosensory cortex data, we observed that layer-specific neurons usually interact with each other, which agrees with the known multi-layered organization of the cortex (Fig. 5b). Such homo-typic (same cell types) relationships are in agreement with what has been observed by others, including in other tissues [12, 41]. Here we also notice that astrocytes and oligodendrocytes, L2/3 and L4 excitatory neurons, and L5 and L6 excitatory neurons form frequent hetero-typic (two different cell types) interactions. This is again in line with the expected anatomical structure of the cortex, due to positioning of the cortex layers and the increased presence of astrocytes and oligodendrocytes close to where they originate in the subventricular zone (Additional file 1: Fig. S1F and Additional file 3: Fig. 3F). These observations are robust to changes in number of spatial neighbors (k) (Additional file 1: Fig. S11A, see Additional file 3: Supplementary Notes) and are furthermore observed in both the seqFISH+ and osmFISH somatosensory cortex datasets (Additional file 1: Fig. S11B, see Additional file 3: Supplementary Notes).
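The proximity enrichment described above can be sketched as a comparison between observed and expected edge counts per cell-type pair. In this illustration, which is not Giotto's implementation, the expectation is estimated by shuffling cell-type labels over the fixed spatial network, and the enrichment is reported as a ratio with a pseudocount of 1. The function names, the number of permutations, and the toy data are assumptions.

```python
import random
from collections import Counter


def pair_counts(labels, edges):
    """Count network edges per unordered pair of cell types."""
    counts = Counter()
    for i, j in edges:
        counts[tuple(sorted((labels[i], labels[j])))] += 1
    return counts


def proximity_enrichment(labels, edges, n_perm=200, seed=0):
    """Observed / expected edge counts, with labels shuffled over the network."""
    rng = random.Random(seed)
    observed = pair_counts(labels, edges)
    totals = Counter()
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        totals.update(pair_counts(shuffled, edges))
    # A pseudocount of 1 keeps the ratio finite for rare pairs.
    return {
        pair: (obs + 1) / (totals[pair] / n_perm + 1)
        for pair, obs in observed.items()
    }


# Eight toy cells on a line: four of type A followed by four of type B.
toy_labels = ["A"] * 4 + ["B"] * 4
toy_edges = [(i, i + 1) for i in range(7)]
toy_observed = pair_counts(toy_labels, toy_edges)
toy_enrichment = proximity_enrichment(toy_labels, toy_edges)
```

In this segregated toy layout, the homo-typic pairs come out enriched (ratio above 1) and the hetero-typic pair depleted (ratio below 1), mirroring the layer-wise pattern described for the cortex.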
To extend this type of analysis to less defined tissues, we also analyzed a public MIBI dataset profiling the spatial proteomic patterns in triple negative breast cancer (TNBC) patients [13]. Over 200,000 cells from 41 patients were analyzed together to generate over 20 cell populations (Additional file 1: Fig. S12A). Of note, the preferred mode of hetero-typic cell-type interactions is highly patient specific (Additional file 1: Fig. S12B, C). For example, in patients 4 and 5, the Keratin-marked epithelial cells and immune cells are well segregated from each other, whereas patients 10 and 17 feature a rather mixed environment between T cells, Keratin, and Ki67 cancer cells. (Additional file 1: Fig. S12B, C). These observations are consistent with prior findings [13].
Giotto builds further on the concept of interacting cell types and aims to identify which known ligand-receptor pairs show increased or decreased co-expression, as a reasonable proxy for activity, in two cell types that spatially interact with each other (Fig. 5c). By creating a background distribution through spatially aware permutations (see “Methods”), Giotto can identify which ligand-receptor pairs are potentially more or less active when cells from two cell types are spatially adjacent to each other. By comparing with a spatially unaware permutation method, similar to what was done previously [42], we can see that the predictive power is limited without spatial information (AUC = 0.43) (Fig. 5c,d). This analysis is relatively stable to different numbers of spatial neighbors (k) within the spatial network (Additional file 1: Fig. S11C, see Additional file 3: Supplementary Notes) and is observed for multiple ligand-receptor pairs spread out over multiple cell type pairs (Fig. 5e). Two potential examples of increased co-expression of a ligand-receptor pair are seen in spatially interacting astrocytes and Lhx6+ inhibitory neurons displaying increased expression of Ddr2-Col1a1 and Bmp6-Bmpr1b in corresponding endothelial cells and oligodendrocytes, respectively (Fig. 5f).
More generally, Giotto implements a number of statistical tests (t-test, limma, Wilcoxon, and a spatial permutation test) to identify genes whose expression level variation within a cell type is significantly associated with an interacting cell type (see “Methods”). After correcting for multiple hypothesis testing, we identified 73 such genes (|log2 FC| > 2 and FDR < 0.1), which we refer to as the interaction changed genes (ICGs). These ICGs are distributed among different interacting cell type pairs (Additional file 1: Fig. S11D). For example, we noticed that endothelial cells interacting with Lhx6 iNeuron were associated with increased expression of Jakmip1 and Golgb1, whereas both Dact2 and Ddx27 expression levels were increased in cells from the same cell type but interacting with L4 eNeurons (Fig. 5g). In the opposite direction, interaction with astrocytes was associated with decreased expression of Abl1 and Zswim8. Of note, all these subsets of endothelial cells do not show any difference in expression of their known marker genes, such as Pltp, Cldn5, and Apcdd1 (Fig. 5g, Additional file 1: Fig. S2D).
Giotto Viewer: interactive visualization and exploration of spatial transcriptomic data
Giotto Viewer is designed for interactive visualization and exploration of spatial transcriptomic data. Compared to the figure outputs from Giotto Analyzer, the objective of Giotto Viewer is to provide an interactive and user-friendly workspace where the user can easily explore the data and integrate the results from various analyses from Giotto Analyzer, and further incorporate additional information that cannot be easily quantified, such as cell staining images.
Giotto Viewer is a web-based application running in a local environment. It supports a multi-panel view of the spatial expression data. Each panel can be configured to display the cells in either physical or expression space and to overlay gene expression information on top. Complex geometries such as the 2D cell morphology and the associated large antibody staining images of the cells can be toggled easily within each panel. We use a Google Map-like algorithm to facilitate efficient navigation of large data (in terms of either images or cell numbers, see “Methods”). Importantly, panels are interlinked and interactive through sharing of cell ID and annotation information (see “Methods”). This allows seamless integration of different views and facilitates synchronous updates across all panels. We were able to apply Giotto Viewer to display over 500,000 data points (or mRNA transcripts) within a group of cells on one screen without any problem, indicating that Giotto Viewer is capable of handling large datasets.
As an illustrating example, we used Giotto Viewer to visualize the Visium brain dataset. By default, Giotto Viewer creates two panels, representing the data in physical and gene expression space, respectively (Fig. 6a, left). Any property that is contained in a Giotto object, such as gene expression levels, spatial cell-type enrichment values, cell-type or spatial domain annotations, can be selected for visualization. Additional imaging-related information, such as cell staining and segmentation, can also be overlaid. The size and location of field of view can be easily adjusted via the zoom and pan functions. At one end of the spectrum, the image content at each single spot can be visualized (Fig. 6a, right), revealing the underlying H&E staining pattern. An animated video is provided to illustrate how the user can interactively explore the data and high-level annotations (Additional file 4: Supplementary Video).
Additional file 4: Supplementary Video. Supplementary video demonstrating the usage and options of Giotto Viewer.
To demonstrate the utility of Giotto Viewer for exploring and integrating a large amount of information generated by Giotto Analyzer, we used the aforementioned seqFISH+ dataset again. Through the analysis described above, we identified various annotations such as cell types, spatially coherent genes, and spatial domains. Therefore, it is of interest to compare the cell type and spatial domain annotation to investigate their relationship. To this end, we created four interlinked panels corresponding to cell type and spatial domain annotations represented in the physical and expression space, respectively (Fig. 6b). The view of these panels can be synchronously updated through zoom and pan operations, enabling the user to easily explore the data and inspect any area of interest as desired. For example, as the user zooms in to the L1–L2/3 region (Fig. 6c), it becomes apparent that domain D7 consists of a mixture of cell types including astrocytes, microglia, and interneurons. Giotto Viewer also provides a lasso tool that allows users to select cells of interest for further analyses. The borders of the selected cells are highlighted and can be easily traced across different panels. As an example, cells from domain D7 are selected and highlighted (Fig. 6c). By inspecting the pattern in the interlinked panels, it becomes obvious that this domain contains cells from multiple cell types. As such, both cell type and spatial domain differences contribute to cellular heterogeneity.
To gain further insights into the difference between cell type and spatial domain annotations, we saved the selected cells to an output file. The corresponding information was directly loaded into Giotto Analyzer for further analysis. This allowed us to identify a number of additional marker genes, such as Cacng3 and Scg3 (Fig. 6d). The seamless iteration between data analysis and visualization is a unique strength of Giotto.
In addition, Giotto Viewer also provides the functionality to explore subcellular transcript or protein localization patterns. As an example, we used Giotto Viewer to visualize the exact locations of individual transcripts in selected cells from the seqFISH+ dataset (Fig. 6e, Additional file 1: Fig. S12). To facilitate real-time exploration of the transcript localization data, which is much larger than other data components, we adopted a position-based caching of transcriptomic data (see “Methods”). From the original staining image (Additional file 1: Fig. S13A), the users can zoom in on any specific region or select specific cells and visualize the locations of either all detected transcripts (Additional file 1: Fig. S13B) or selected genes of interest (Additional file 1: Fig. S13C). The spatial extent of all transcripts is useful for cell morphology analysis (Additional file 1: Fig. S13B), whereas the localization pattern of individual genes may provide functional insights into the corresponding genes (Additional file 1: Fig. S13C). For example, transcripts of Snrnp70 and Car10 are preferentially localized to the cell nucleus (delineated by DAPI background), while Agap2 and Kif5a transcripts are distributed closer to the cell periphery (Additional file 1: Fig. S13C).