Anatomical terms
GXD has extensive experience with the Mouse Embryo Anatomy Nomenclature Database, which covers development through Theiler Stage (TS) 26 and is used by GXD and EMAGE to describe developmental gene expression patterns. Drawing on our annotation work, we continue to contribute to this ontology through extensions, revisions and added synonyms. Consequently, an early objective was to ensure that the anatomy ontology for the postnatal (TS 28) mouse corresponds as closely as possible, in both content and structure, to the developmental ontology. We did this for three reasons: consistency of nomenclature, our familiarity with and confidence in the utility of this format, and easier future integration of the two ontologies. The long-term goal is to combine them into a single anatomy ontology covering the entire lifespan of the laboratory mouse.
With the developmental ontology as its framework, the effort then focused on compiling an extensive list of anatomical terms for the postnatal mouse. The list drew on a number of major sources, including mouse atlases and anatomy and histology texts [7–22]. We preferred mouse-specific sources where possible, but more general references were nevertheless extremely valuable. References not organized as atlases were especially useful for refining anatomical and histological details.
Once the basic list of terms had been generated, we verified that each term represented an actual mouse structure. These determinations were usually clear but at times ambiguous; for numerous structures described in anatomy and histology textbooks, for example, we found no clear documented evidence of their existence in the mouse, and these have therefore not been included in the ontology. Each term was validated carefully, requiring two or more reliable sources whenever possible, and further work to ensure accuracy is ongoing. In parallel with this textbook-based identification of terms, we expanded the vocabulary using a research data-driven approach. This involved extensive evaluation of the published biomedical research literature, as well as of data with anatomical attributes collected in scientific databases. For example, several mouse-specific datasets [23–26] were used to find pertinent anatomical terms. The MGI list of all mouse tissues from which major publicly available cDNA libraries have been generated [24] includes cell types and tumors as well as gross anatomical concepts; the relevant anatomical structures in it will eventually be mapped to terms from the Adult Mouse Anatomical Dictionary. The data-driven approach was especially useful in determining the level of granularity (that is, the level of detail of spatial resolution) that users of the ontology are likely to require.
An additional consideration in determining the content of the vocabulary was whether to include cell types. Although cell type information is an important component of anatomical descriptions, it also introduces a level of complexity that is difficult to address adequately. We judged it infeasible to extend the representation to the cellular level because of the large number of hierarchical levels and leaf nodes this would require. We therefore concluded that the adult mouse anatomy ontology would not contain cell types, and that cell type terms would eventually be provided by the orthogonal controlled vocabulary for cell types currently being developed as part of the Open Biological Ontologies (OBO) effort [27]. However, to conform to the Edinburgh developmental ontology, we have included tissue type terms such as epithelium and mesenchyme, as well as structures defined by their cell types, such as purkinje cell layer. We have also elected to include the term unfertilized egg and its synonyms.
Hierarchical organization
The anatomy ontology for mouse development is currently structured as a strict hierarchy (a tree), in which an anatomical term can have only one parent and therefore only one place in the hierarchy. Each term must thus be placed either by spatial location or by organ system, and the choice is not always made the same way. For example, the femur is placed according to its spatial location, as a substructure of the upper leg, rather than as part of the skeleton, whereas the brain is placed as part of the central nervous system rather than as part of the head. Based on our experience with the developmental ontology, and anticipating planned revisions to it, we decided to represent the adult mouse anatomy ontology as a directed acyclic graph (DAG), in which an anatomical term can have more than one parent. This gave us flexibility in organizing the hierarchies and provided a mechanism for a more comprehensive view of the relationships between anatomical terms.
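The difference between the two structures can be illustrated with a minimal Python sketch, assuming each term is stored as a mapping to a list of parent terms. The edges below are illustrative only; in particular, the second parent skeleton for femur is a hypothetical addition and is not taken from either ontology. The helpers is_strict_tree and has_cycle are likewise hypothetical: the first checks the single-parent property of a tree, and the second confirms that a graph whose terms have several parents is still acyclic, as a DAG must be.

```python
from typing import Dict, List

# Hypothetical representation: term -> list of parent terms.
Parents = Dict[str, List[str]]


def is_strict_tree(parents: Parents) -> bool:
    """True if every term has at most one parent (a strict hierarchy)."""
    return all(len(ps) <= 1 for ps in parents.values())


def has_cycle(parents: Parents) -> bool:
    """Depth-first search along child -> parent edges, looking for a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    state: Dict[str, int] = {}

    def visit(term: str) -> bool:
        state[term] = GREY
        for parent in parents.get(term, []):
            s = state.get(parent, WHITE)
            if s == GREY:
                return True  # reached a term already on the current path
            if s == WHITE and visit(parent):
                return True
        state[term] = BLACK
        return False

    return any(state.get(t, WHITE) == WHITE and visit(t) for t in list(parents))


# Illustrative fragments; edges are examples, not the published ontology.
tree = {
    "femur": ["upper leg"],
    "upper leg": ["limb"],
    "brain": ["central nervous system"],
}
dag = dict(tree)
dag["femur"] = ["upper leg", "skeleton"]  # hypothetical second parent

print(is_strict_tree(tree))  # True
print(is_strict_tree(dag))   # False
print(has_cycle(dag))        # False
```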
For each anatomical term being evaluated, several different paths to that term could be conceived. It soon became apparent, however, that two fundamental characteristics could be determined for most terms: their spatial location within the animal and their functional contribution as part of a particular organ system. We therefore adopted the distinction between spatial and organ system representation as an organizational principle. Because 'spatial part' does not itself represent a unique anatomical entity, it was not included as an independent node in the ontology. Nevertheless, the division of the hierarchy into spatial and organ system components is immediately apparent in the first level of substructures below the root node, TS28. As shown in Figure 1, this level consists predominantly of spatial parts, such as body, body cavity/lining, head/neck, limb and tail, and terms below these superstructures are organized primarily by spatial location. In contrast, in the branch headed by the superstructure organ system, anatomical terms are organized, as far as possible, by their contribution to a specific functional system.
Currently, the distinction between spatial and functional relationships is represented only implicitly. However, biologists will be able to discern both types of relationship intuitively from the parentage of an anatomical structure, and should be able to perform most of the currently envisioned queries on expression and phenotype data. Explicit representation of both relationship types might be desirable for advanced knowledge representation and computational analysis. On the other hand, it might also introduce unnecessary complexity for biologists because, for example, many anatomical structures would be linked by both spatial and functional relationships, and shielding users from that complexity would require additional software development. A careful evaluation of the advantages and disadvantages of both approaches will guide our future work in this area.
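As an illustration of how both kinds of relationship can be read off a term's parentage, the following minimal Python sketch enumerates every path from a term up to the root TS28 and labels each path by whether it passes through the organ system node. The heart's two parents follow the description of Figure 2; the intermediate edges (thoracic cavity organ under thoracic cavity, and so on) are assumptions made for the example, and the helper names are hypothetical.

```python
from typing import Dict, List

# Illustrative fragment: term -> parents. Intermediate edges are assumed.
PARENTS: Dict[str, List[str]] = {
    "heart": ["thoracic cavity organ", "cardiovascular system"],
    "thoracic cavity organ": ["thoracic cavity"],
    "thoracic cavity": ["body cavity/lining"],
    "body cavity/lining": ["TS28"],
    "cardiovascular system": ["organ system"],
    "organ system": ["TS28"],
}


def paths_to_root(term: str, parents: Dict[str, List[str]]) -> List[List[str]]:
    """All paths from term up to a parentless node (the root)."""
    ps = parents.get(term, [])
    if not ps:
        return [[term]]
    return [[term] + path for p in ps for path in paths_to_root(p, parents)]


def branch(path: List[str]) -> str:
    """Implicit classification: a path through 'organ system' is functional."""
    return "organ system" if "organ system" in path else "spatial"


for path in paths_to_root("heart", PARENTS):
    print(branch(path), ":", " -> ".join(path))
```

Nothing in the data marks an edge as spatial or functional; the label is inferred from the branch a path runs through, which mirrors the implicit representation described above.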
During the construction of the adult mouse anatomy DAG, we had to take into account that some tissues occur throughout the body, so terms representing them would logically be located in many spatial parts of the ontology. Groups of tissues that meet this criterion include blood vessel, bone, connective tissue, muscle, nerve, organ and skin, each of which is represented as a term in the organ system part of the hierarchy. To represent these tissues in specific body regions, we devised modules (outlined as blocks in Figure 1) corresponding to these generic groups, and included them as subterms, where appropriate, within each spatial region. For nomenclature standardization (see Nomenclature considerations below), the subgroup terms are preceded by the superstructure name in noun form (for example, abdomen) rather than adjectival form (for example, abdominal) whenever possible.
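The module idea can be sketched in Python as follows, assuming each generated subterm is named by prefixing the generic group with the region's noun form and is linked to two parents: the spatial region and the corresponding generic group in the organ system branch. That two-parent linkage and the function name module_terms are assumptions for illustration; the text states only that the modules are included as subterms within each spatial region. Passing a subset of groups reflects the "where appropriate" qualification.

```python
from typing import Dict, List

# Generic tissue groups named in the text.
GENERIC_GROUPS: List[str] = [
    "blood vessel", "bone", "connective tissue",
    "muscle", "nerve", "organ", "skin",
]


def module_terms(region: str, groups: List[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: build region-specific subterms.

    Each name is '<region noun> <group>' (e.g. 'abdomen muscle', not
    'abdominal muscle'); each term gets the region and the generic group
    as parents (an assumed linkage).
    """
    return {f"{region} {group}": [region, group] for group in groups}


# Only the groups appropriate to a region are passed in.
for name, parents in module_terms("abdomen", ["blood vessel", "muscle"]).items():
    print(name, "->", parents)
```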
Using the DAG format, we have therefore been able to describe adult mouse anatomy from a variety of spatial and organ system perspectives. For example, the heart (Figure 2) is represented both as a type of thoracic cavity organ and as a substructure of the cardiovascular system. As discussed below, some of these distinctions are conceptual and may by their nature be somewhat arbitrary. However, our annotation work shows that these different breakdowns of the anatomy are indeed required to annotate, for example, different types of expression and phenotype data. Refinements to the hierarchical organization of the ontology will continue to be made, but these changes will not affect the identity of the terms themselves.
Another issue in constructing the DAG was the use of is-a and part-of relationships between terms. Most relationships could be classified intuitively as part-of, indicating that the term is a component of the more general term above it in the graph. For example, the upper body is part-of the body, and the heart is part-of the cardiovascular system. In contrast, is-a relationships indicate that an anatomical term denotes a subtype of the concept denoted by its parent term. For instance, the cardiovascular system is-a organ system, and cardiac muscle is-a muscle. Note that the choice between is-a and part-of does not correlate with the spatial versus organ system organization of the ontology, as shown in Figure 2. Further refinement of the relationships will undoubtedly be required, as will additional relationship types. For example, it may be useful to distinguish between 'regional' parts (for instance, head, neck, limb) and 'systemic' parts (for instance, body muscle, body organ, body skin). Such modifications can be made easily using the DAG-Edit tool (see Software issues below).
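Keeping the relationship type on each edge lets a query follow only part-of or only is-a links. The following minimal Python sketch stores the relationships mentioned in the text as (child, relation, parent) triples and collects a term's ancestors, optionally restricted to given relation types. The triple layout and the function name ancestors are assumptions for illustration, not the format used by DAG-Edit or MGI. The unrestricted call simply reports reachability; it does not infer which relation holds across a mixed chain such as heart part-of cardiovascular system is-a organ system.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

# (child, relation, parent) triples drawn from the examples in the text.
EDGES: List[Tuple[str, str, str]] = [
    ("upper body", "part_of", "body"),
    ("heart", "part_of", "cardiovascular system"),
    ("heart", "is_a", "thoracic cavity organ"),
    ("cardiovascular system", "is_a", "organ system"),
    ("cardiac muscle", "is_a", "muscle"),
]


def ancestors(term: str,
              edges: List[Tuple[str, str, str]],
              relations: Optional[FrozenSet[str]] = None) -> Set[str]:
    """Transitive ancestors of term, following only the given relation
    types (all types if relations is None)."""
    found: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for child, rel, parent in edges:
            if child != current or parent in found:
                continue
            if relations is None or rel in relations:
                found.add(parent)
                stack.append(parent)
    return found


print(sorted(ancestors("heart", EDGES)))
# ['cardiovascular system', 'organ system', 'thoracic cavity organ']
print(sorted(ancestors("heart", EDGES, frozenset({"part_of"}))))
# ['cardiovascular system']
```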
Nomenclature considerations
Our experience with the mouse developmental ontology, together with an extensive literature review, provided the primary basis for our naming conventions. Early in building the ontology, we realized that consistent nomenclature, not only for a given term but also for related terms and groups of terms, would be a critical requirement. Whenever possible, therefore, the same name was used for a given anatomical structure or concept throughout the ontology; for instance, we used lung rather than 'pulmonary' as the prefix for each term representing a lung substructure. Another consideration was the need to distinguish clearly between terms. In principle, an anatomical term could be defined precisely by combining its name with its hierarchical lineage. The term epithelium, for example, appears as a subterm of many anatomical structures, and the precise identity of each occurrence could be defined by its parental lineage. In practice, this convention proved problematic: multiple structures with the same name would be impossible to distinguish in the absence of their hierarchical context, and any additional path to a given term would complicate matters further. For instance, the epithelium of the lung alveoli is represented both as part of the alveolus and as a type of lung epithelium. To address this issue, we have tried to include sufficient information in each term name (for example, alveolus epithelium) that the term is easy to interpret and use unambiguously.
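The naming problem described above can be made concrete with a small Python check that groups term IDs by name and reports any name shared by more than one term; with bare names such as epithelium, those terms can be told apart only by their lineage. The IDs and the function name ambiguous_names are hypothetical, chosen only for this sketch.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical layout: term ID -> (term name, parent names).
Terms = Dict[str, Tuple[str, List[str]]]


def ambiguous_names(terms: Terms) -> Dict[str, List[str]]:
    """Map each name used by more than one term ID to those IDs."""
    by_name: Dict[str, List[str]] = defaultdict(list)
    for term_id, (name, _parents) in terms.items():
        by_name[name].append(term_id)
    return {name: ids for name, ids in by_name.items() if len(ids) > 1}


# Bare names: identity depends on lineage alone.
bare: Terms = {
    "ID:0001": ("epithelium", ["alveolus"]),
    "ID:0002": ("epithelium", ["lung"]),
}
# Self-describing names in the style described above.
qualified: Terms = {
    "ID:0001": ("alveolus epithelium", ["alveolus", "lung epithelium"]),
    "ID:0002": ("lung epithelium", ["lung"]),
}

print(ambiguous_names(bare))       # {'epithelium': ['ID:0001', 'ID:0002']}
print(ambiguous_names(qualified))  # {}
```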
We also took into account the requirements of the DAG-Edit software (see below) and features promoting unambiguous identification of terms. Additional naming conventions are: structure names are preceded by superstructure names in noun form; terms are used in singular form whenever possible; term names at the same level in the hierarchy are ordered alphanumerically; and all characters are lower case. Consistent nomenclature will also make it easier to query for specific anatomical terms within the ontology.
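Several of these conventions are mechanical enough to check automatically. The minimal Python sketch below flags names that are not lower case or do not begin with the superstructure name, and tests whether sibling names are in order. It is an illustration under stated assumptions: "alphanumeric order" is approximated by Python's default code-point sort, the prefix rule is reported as a warning because it applies only "whenever possible", the singular-form rule is not checked, and the function names are hypothetical.

```python
from typing import List


def convention_warnings(name: str, superstructure: str) -> List[str]:
    """Hypothetical lint for a single term name."""
    warnings: List[str] = []
    if name != name.lower():
        warnings.append("not lower case")
    if not name.startswith(superstructure + " "):
        warnings.append(f"not prefixed with superstructure name '{superstructure}'")
    return warnings


def siblings_ordered(names: List[str]) -> bool:
    """True if sibling names are already sorted (code-point order assumed)."""
    return names == sorted(names)


print(convention_warnings("lung alveolus", "lung"))  # []
print(convention_warnings("Pulmonary alveolus", "lung"))
print(siblings_ordered(["body", "body cavity/lining", "head/neck",
                        "limb", "organ system", "tail"]))  # True
```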
Software issues
An ontology should contain a level of detail appropriate to the data being classified and to the level at which queries are likely to be performed, while remaining flexible enough to be updated regularly without significant modification of its hierarchies. We therefore recognized that the adult mouse anatomy ontology would require a format that was both robust and flexible, together with tools for maintenance and updating. The DAG-Edit tool developed by the Gene Ontology (GO) Consortium provides a graphical interface for any vocabulary with a DAG data structure, and has been used by other groups to build ontologies for a wide range of biological subjects, including the GO [28] and the Mammalian Phenotype ontology [29]. We have used DAG-Edit both to construct the adult mouse anatomy ontology and to maintain and edit it. In addition, the MGI software group has developed a range of tools for handling DAG-formatted ontologies, enabling navigation through the ontology and querying for terms (see below), as well as integration of the ontology with other information stored in the MGI database.
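One common kind of query on a DAG-formatted ontology collects every term below a given term, so that, for example, a query on cardiovascular system can also return data annotated to its substructures. The minimal Python sketch below inverts a child-to-parent map and walks it. It does not describe MGI's actual tools; the fragment and the function names invert and descendants are hypothetical, and the placements of heart valve, cardiac muscle and blood vessel are assumed for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Illustrative fragment: term -> parents (placements partly assumed).
PARENTS: Dict[str, List[str]] = {
    "heart": ["thoracic cavity organ", "cardiovascular system"],
    "heart valve": ["heart"],
    "cardiac muscle": ["heart", "muscle"],
    "blood vessel": ["cardiovascular system"],
}


def invert(parents: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Build a parent -> children map from a child -> parents map."""
    children: Dict[str, List[str]] = defaultdict(list)
    for child, ps in parents.items():
        for parent in ps:
            children[parent].append(child)
    return dict(children)


def descendants(term: str, children: Dict[str, List[str]]) -> Set[str]:
    """All terms below term; each is visited once even if reachable by several paths."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


CHILDREN = invert(PARENTS)
print(sorted(descendants("cardiovascular system", CHILDREN)))
# ['blood vessel', 'cardiac muscle', 'heart', 'heart valve']
```

The seen set matters in a DAG: a term with several parents could otherwise be reported and expanded more than once.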