Large-scale approaches for glycobiology
© BioMed Central Ltd 2005
Published: 3 November 2005
Glycosylation, the attachment of carbohydrates to proteins and lipids, influences many biological processes. Despite detailed characterization of the cellular components that carry out glycosylation, a complete picture of a cell's glycoconjugates remains elusive because of the challenges inherent in characterizing complex carbohydrates. This article reviews large-scale techniques for accelerating progress in glycobiology.
Glycobiology - the study of carbohydrates in biology - combines expertise in synthetic and analytical chemistry and carbohydrate biochemistry, as well as molecular and cellular biology, to unravel the structural complexity, chemistry, biosynthesis, and biological functions of sugar-bearing biomolecules. Over the past three decades, complex carbohydrates have become widely recognized as more than just an energy source. Indeed, glycosylation has been established as a ubiquitous post-translational modification in higher organisms that enables one protein (or lipid) to function as many, and provides structural diversity that offers an explanation for the unexpectedly low number of genes in the human genome. Complex sugars are major players in numerous biological processes, including development, the immune response and inflammatory disease, cell proliferation and apoptosis, the pathogenesis of infectious agents including prions, viruses, and bacteria, and diseases ranging from rare congenital disorders to diabetes and cancer.
Although many recent developments in 'glycomics' focus on structural and functional analysis of surface-displayed sugars, the biosynthetic machinery that builds these complex molecules also greatly interests the glycobiologist. We briefly discuss carbohydrate biosynthesis here, both to acknowledge the heroic researchers who laid an impressive foundation without benefit of large-scale technologies and to illustrate the need for high-throughput strategies to accelerate progress. We use the term glycosylation machinery to describe biochemical pathways that convert monosaccharides (for example, dietary glucosamine) into nine different high-energy sugar-nucleotide building blocks (for example, UDP-N-acetylglucosamine (UDP-GlcNAc)) and assemble them into the complex oligosaccharides found on proteins and lipids (Figure 1). Basic components of this metabolic factory were discovered in a painstakingly slow, one-at-a-time process over many decades (for a detailed perspective, see the fascinating historical overview by Saul Roseman). Traditional biochemical studies from the 1950s to the 1970s identified many small-molecule metabolites and characterized the enzymatic activities that link them into metabolic pathways. Once metabolites were arranged into putative pathways, the next requirement was to match genes with enzymatic activities; this formidable task was tackled, primarily one gene at a time, by elegant but time-consuming methods such as the forward genetic screens developed in the 1970s, and by the DNA cloning and recombinant gene expression strategies that became routine in the 1980s. More recently, RNA-inhibition techniques have begun to yield insights into glycosylation by downregulating individual genes.
Around 2% of human genes are involved in glycosylation, as judged from the most recent developments in large-scale biology, primarily the sequencing of the human genome coupled with predictive algorithms for gene function. This information, along with 'metabolomic' methods for large-scale characterization of small-molecule metabolites, has helped to put the finishing touches to the framework of the glycosylation machinery. Almost all its metabolic components are known and have been assembled into well-defined pathways, as can be seen by following the links for 'Carbohydrate metabolism' and 'Glycan biosynthesis and metabolism' in the Kyoto Encyclopedia of Genes and Genomes (KEGG). A static picture of glycosylation does not, however, reflect dynamic moment-by-moment, developmental, and disease-related metabolic fluctuations, nor does it provide much insight into subcellular organization and organelle topography, which are critical factors in shaping final oligosaccharide structures. In the future, computational 'systems biology' promises to bring the glycosylation machinery to life and thereby offers insights into repairing glycosylation abnormalities associated with widespread diseases, including diabetes and cancer.
Structures of sugars have long fascinated chemists and biologists, beginning with Emil Fischer's landmark efforts to decipher the isoforms of hexoses more than a century ago. Since then, even with modern techniques, biologists have been outpaced by the difficulty of obtaining a glycosylation profile - the specific complement of glycoconjugates present - of even a single cell. To illustrate that no task in carbohydrate analysis is simple, Figure 1 shows a few biologically significant glycoconjugates. Even the addition of a single N-acetylglucosamine moiety to a protein to give the O-GlcNAc modification, which regulates numerous biochemical pathways by acting in a yin-yang manner with phosphorylation (Figure 1c), is complicated by its occurrence on hundreds of different cytosolic and nuclear proteins, and on multiple sites within a single protein. The various biological activities of glycosphingolipids, relatively simple sugar-bearing biomolecules exemplified by the ganglioside GM3 (Figure 1d), demonstrate that very subtle changes to sialic acid (N-acetylneuraminic acid or Sia), an unusual nine-carbon sugar found in more than 50 chemically distinct forms, can regulate apoptosis, senescence, and proliferation, thereby highlighting the need for careful analysis of fine structural details.
Moving to larger glycoconjugates, prions are glycosylated proteins that possess only two sites where oligosaccharides attach (Figure 1a). Even so, any one of several dozen different sugar chains can reside at either site; consequently, prions exist as hundreds of distinct entities. The discovery of the influence of carbohydrates on prion infectivity and on the development of spongiform encephalopathies [14, 15] underlines the importance of fully defining structural heterogeneity of this kind. As a final example, the heavily glycosylated cell-surface glycoprotein CD34 (Figure 1a), found on hematopoietic cells and epithelial cells, serves as a developmental marker for hematopoietic cells, mediates leukocyte homing, and contributes to cancer metastasis. It bears 20 or more separate oligosaccharide chains, implying that, if ten different oligosaccharide structures randomly occur at each site (a conservative estimate), 10^20 different forms of CD34 can exist and each of the approximately 10^4 to 10^5 copies of this protein found in a typical cell has a reasonable probability of being unique.
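The arithmetic behind these estimates can be sketched in a few lines of Python. This is an illustrative calculation rather than part of the original analysis: it assumes that each site is occupied independently and uniformly at random, takes 30 as a stand-in for "several dozen" prion sugar chains, and uses the standard birthday-problem approximation. The function names are hypothetical.

```python
import math


def glycoform_count(structures_per_site: int, sites: int) -> int:
    """Number of distinct glycoforms if every site varies independently."""
    return structures_per_site ** sites


def prob_all_copies_distinct(copies: int, forms: int) -> float:
    """Birthday-problem approximation exp(-n(n-1)/(2N)): the chance that
    n copies drawn at random from N equally likely forms all differ."""
    return math.exp(-copies * (copies - 1) / (2 * forms))


# Prion protein: two sites; 30 chains per site is an assumed value
# standing in for "several dozen".
prion_forms = glycoform_count(30, 2)

# CD34: ten structures at each of 20 sites, as stated in the text.
cd34_forms = glycoform_count(10, 20)

# Upper copy number quoted in the text: 10**5 copies per cell.
p_unique = prob_all_copies_distinct(10**5, cd34_forms)

print(prion_forms, cd34_forms, p_unique)
```

Under these idealized assumptions the probability that every CD34 copy in a cell is distinct is very close to 1. Real glycoform distributions are unlikely to be uniform, so the true figure would be lower, but the calculation shows why cell-to-cell and molecule-to-molecule heterogeneity is plausible.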
While the isolation and characterization of highly complex glycoproteins are impressive feats, the sobering reality is that only a handful of the thousands of different glycoconjugates in the human body have been analyzed so far, which leaves the enormous carbohydrate diversity of even a single cell unknown in molecular detail. To further complicate matters, glycosylation profiles are not static, but rapidly change as cells differentiate, undergo apoptosis, or become diseased. Today's technologies are inadequate for determining the dynamic glycosylation profile of a cell and fall well short of the ultimate goal of glycomics - the evaluation of an entire organism. To dispel the gloom, however, underlying technologies for innovative, large-scale glycomic techniques are developing rapidly - both by bringing new techniques to carbohydrate analysis and by refining established methods to increase throughput. These two approaches, exemplified by array-based technologies and the automation of mass spectrometry, respectively, are discussed below.
Conventional methods, including chromatography or two-dimensional gel electrophoresis, used in proteomics to separate proteins isolated from a cell or tissue (Figure 2), are rapidly and effectively being adapted for oligosaccharide characterization. In contrast to microarrays, identification is not inherent in these techniques, so glycoconjugates must be identified by mass spectrometry after separation; mass spectrometry is extremely sensitive, allowing minute amounts of material isolated from biological samples or purified by capillary electrophoresis or two-dimensional gels to be identified successfully. Unfortunately, the need to isolate individual oligosaccharides by chromatography or electrophoresis prior to mass spectrometry, and the lack of automated identification algorithms, limit the throughput of these methods. This has led to techniques such as fluorescence differential gel electrophoresis (DIGE), which do not characterize all products and settle for the less ambitious goal of identifying a limited number of molecules that differ between two samples (for example, healthy versus diseased tissue). To overcome the bottleneck of identification, much effort is being put into developing automated, high-throughput computational tools for the interpretation of glycoconjugate mass spectra [23, 32].
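As a rough illustration of what the simplest step in such automated interpretation might look like, the Python sketch below enumerates monosaccharide compositions whose calculated mass matches an observed value. It is not drawn from the tools cited here. It assumes a neutral, underivatized glycan, considers only four residue types with their standard monoisotopic residue masses, and uses hypothetical names and limits (`candidate_compositions`, `tol`, `max_count`).

```python
from itertools import product

# Monoisotopic masses (Da) of monosaccharide residues, i.e. the sugar minus
# one water lost on forming a glycosidic bond.
RESIDUE_MASS = {
    "Hex": 162.0528,     # hexose, e.g. mannose or galactose
    "HexNAc": 203.0794,  # N-acetylhexosamine, e.g. GlcNAc
    "dHex": 146.0579,    # deoxyhexose, e.g. fucose
    "NeuAc": 291.0954,   # N-acetylneuraminic acid (sialic acid)
}
WATER = 18.0106  # one water is added back for the free reducing end


def composition_mass(comp: dict) -> float:
    """Neutral monoisotopic mass of a glycan with the given residue counts."""
    return WATER + sum(RESIDUE_MASS[name] * n for name, n in comp.items())


def candidate_compositions(observed_mass: float, tol: float = 0.02,
                           max_count: int = 10) -> list:
    """Brute-force search over residue counts (0..max_count each) for
    compositions within `tol` Da of the observed neutral mass."""
    names = list(RESIDUE_MASS)
    hits = []
    for counts in product(range(max_count + 1), repeat=len(names)):
        comp = dict(zip(names, counts))
        if abs(composition_mass(comp) - observed_mass) <= tol:
            # Report only the residues that are actually present.
            hits.append({k: v for k, v in comp.items() if v})
    return hits


# Example query: the mass calculated for three hexoses plus two HexNAc.
print(candidate_compositions(910.3278))
```

A matching composition is only a first step. Isomers that differ in sequence, linkage, or branching share the same mass, so fragmentation data or other evidence is still needed to assign a structure.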
Chemical tools have been vitally important for the development of large-scale glycomics. These range from automated synthesis to the development of chemoselective coupling reactions that facilitate attachment of oligosaccharides to arrays [35, 36] and underlie high-sensitivity methods for isolating sugars from biological extracts [29, 37]. Another increasingly important contribution of chemists is the synthesis of abiotic monosaccharide analogs that are used in oligosaccharide-engineering strategies based on metabolic substrates. This approach exploits the unusual permissiveness of certain biochemical pathways involved in carbohydrate biosynthesis, which can accommodate non-natural metabolic intermediates. By intercepting a targeted pathway with an analog, it is possible to install abiotic, chemically distinct sugars into mature glycoconjugates. The incorporation of azide-modified analogs of sialic acid into the B-lymphocyte surface glycoprotein CD22, an important modulator of B-lymphocyte activity, provided a recent example of this technique's ability to yield new insights into the biological roles of glycosylation: photoaffinity cross-linking of the azide-modified sialic acid allowed in situ identification of a potentially important modulator of B-cell activity - previously unappreciated homomeric binding among neighboring CD22 molecules.
An adaptation of the tagging-via-substrate (TAS) proteomics approach is now transforming metabolic oligosaccharide engineering into a high-throughput technology. TAS technology involves the biosynthetic incorporation of an azide functional group into the design of a basic building block such as an amino acid or monosaccharide, followed by isolation of labeled biomolecules via this chemical tag. In a pioneering study, N-azidoacetylglucosamine, an analog of GlcNAc, was used to tag O-GlcNAc-labeled proteins. The subsequent identification of around 25 O-GlcNAc-modified proteins in the brain established a biochemical link between O-GlcNAc modification and neuronal signaling, synaptic plasticity, and gene expression. Of equal importance, this study provides a precedent for expanding the TAS strategy to other tissues and for applying it to uncover subtle metabolic differences between healthy and diseased cells.
In conclusion, the hope for an increased pace of discovery in glycobiology, where progress has lagged because "carbohydrates are complex", lies in several large-scale technologies now in the early stages of development. Continued progress is not without its problems. For example, the current versions of arrays contain only a very small fraction of all the carbohydrates found in nature. A second issue is that the exact presentation of oligosaccharides is often important to achieve the 'cluster glycoside effect', whereby carbohydrate-binding interactions are specified by multiple simultaneous interactions that achieve both specificity and avidity [44, 45]. Today's methods of attaching carbohydrates to an array, whereby they are spotted onto inflexible flat surfaces that have very different biophysical properties from the flexible peptide backbone of, say, CD34 (Figure 1a) or the spherical geometry of highly branched dendrimers, are unlikely to faithfully reproduce physiological binding.
Other nascent high-throughput methods, such as the automation of mass spectrometry, must also overcome significant barriers. The use of mass spectrometry in glycomics, for instance, is hampered in various ways: glycan databases are incomplete (that is, many of the oligosaccharides found in nature have not yet been isolated and characterized by mass spectrometry); the structural complexity of oligosaccharides limits current identification algorithms to structures of fewer than ten monosaccharides; and the identification of the correct oligosaccharide from many isomeric options remains a challenge. Mass spectrometry must also overcome its aversion to sialic acids. In the past, this structurally diverse, negatively charged sugar has typically been removed to simplify analysis; the critical role of sialic acid in modulating the bioactivity of GM3 (Figure 1e) is but one of numerous examples showing that this sugar cannot continue to be ignored. To end optimistically, these challenges, although daunting today, will be overcome in the near future - within two to three years in one prediction - if scientific curiosity and the potential multibillion-dollar market for therapeutic glycoproteins continue to accelerate the current pace of technological development.