Large-scale approaches for glycobiology
Genome Biology volume 6, Article number: 236 (2005)
Glycosylation, the attachment of carbohydrates to proteins and lipids, influences many biological processes. Despite detailed characterization of the cellular components that carry out glycosylation, a complete picture of a cell's glycoconjugates remains elusive because of the challenges inherent in characterizing complex carbohydrates. This article reviews large-scale techniques for accelerating progress in glycobiology.
The problem with sugars
Glycobiology - the study of carbohydrates in biology - combines expertise in synthetic and analytical chemistry and carbohydrate biochemistry, as well as molecular and cellular biology, to unravel the structural complexity, chemistry, biosynthesis, and biological functions of sugar-bearing biomolecules. Over the past three decades, complex carbohydrates have become widely recognized as more than just an energy source. Indeed, glycosylation has been established as a ubiquitous post-translational modification in higher organisms that enables one protein (or lipid) to function as many, and the structural diversity it provides offers one explanation for the unexpectedly low number of genes in the human genome. Complex sugars are major players in numerous biological processes, including development, the immune response and inflammatory disease, cell proliferation and apoptosis, and the pathogenesis of infectious agents including prions, viruses, and bacteria, as well as in diseases ranging from rare congenital disorders to diabetes and cancer.
The incredible complexity of a cell's glycosylation machinery and its final products, a vast array of oligosaccharides (Figure 1), poses a research challenge in urgent need of high-throughput, large-scale technologies. Unfortunately, methods for studying and manipulating complex carbohydrates lag behind the tremendous advances made for nucleic acids and proteins. Progress has been sluggish, in part because many biologists were slow to recognize the importance of sugars. But even when prescient researchers sought to uncover the role of glycosylation, they were often frustrated by the difficulty of characterizing carbohydrates and the near impossibility of manipulating them with precision in living cells. In this article, we give a brief overview of the overriding factor hindering glycobiology - the incredible complexity of carbohydrates - and then describe current technologies available for studying glycosylation. We conclude with a guarded but optimistic prediction that glycobiology will catch up with other areas of biochemistry and molecular biology, largely by virtue of promising large-scale technologies that are now on the horizon.
Unraveling the biosynthetic glycosylation machinery
Although many recent developments in 'glycomics' focus on structural and functional analysis of surface-displayed sugars, the biosynthetic machinery that builds these complex molecules also greatly interests the glycobiologist. We briefly discuss carbohydrate biosynthesis here, both to acknowledge the heroic researchers who laid an impressive foundation without the benefit of large-scale technologies and to illustrate the need for high-throughput strategies to accelerate progress. We use the term glycosylation machinery to describe the biochemical pathways that convert monosaccharides (for example, dietary glucosamine) into nine different high-energy sugar-nucleotide building blocks (for example, UDP-N-acetylglucosamine (UDP-GlcNAc)) and assemble them into the complex oligosaccharides found on proteins and lipids (Figure 1). The basic components of this metabolic factory were discovered in a painstakingly slow, one-at-a-time process over many decades (for a detailed perspective, see the fascinating historical overview by Saul Roseman). Traditional biochemical studies from the 1950s to the 1970s identified many small-molecule metabolites and characterized the enzymatic activities that link them into metabolic pathways. Once metabolites were arranged into putative pathways, the next requirement was to match genes with enzymatic activities. This formidable task was tackled, primarily one gene at a time, by elegant but time-consuming methods such as the forward genetic screens developed in the 1970s, and by the DNA cloning and recombinant gene expression strategies that became routine in the 1980s. More recently, RNA interference (RNAi) has begun to yield insights into glycosylation by downregulating individual genes.
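Arranging metabolites into pathways amounts to building a graph in which metabolites are nodes and enzyme-catalyzed reactions are edges. The sketch below hand-codes a few steps of the hexosamine pathway that converts glucosamine (or glucose) into UDP-GlcNAc; this small edge list is an illustrative assumption, not data downloaded from KEGG, and the function name `shortest_route` is hypothetical. A breadth-first search then lists the enzymatic steps linking two metabolites.

```python
from collections import deque

# Illustrative, hand-coded subset of the hexosamine pathway (not a KEGG export).
# Each entry maps a substrate to a list of (product, enzyme) pairs.
PATHWAY = {
    "glucose": [("glucose-6-P", "hexokinase")],
    "glucose-6-P": [("fructose-6-P", "phosphoglucose isomerase")],
    "fructose-6-P": [("glucosamine-6-P", "GFAT")],
    "glucosamine": [("glucosamine-6-P", "hexokinase")],
    "glucosamine-6-P": [("GlcNAc-6-P", "GNA1")],
    "GlcNAc": [("GlcNAc-6-P", "NAGK")],
    "GlcNAc-6-P": [("GlcNAc-1-P", "PGM3")],
    "GlcNAc-1-P": [("UDP-GlcNAc", "UAP1")],
    "UDP-GlcNAc": [("UDP-GalNAc", "GALE"), ("ManNAc", "GNE")],
}


def shortest_route(start, goal, graph=PATHWAY):
    """Breadth-first search over the pathway graph.

    Returns a list of (substrate, enzyme, product) steps along a shortest
    route from start to goal, or None if goal cannot be reached.
    """
    queue = deque([start])
    came_from = {start: None}  # product -> (substrate, enzyme) that produced it
    while queue:
        node = queue.popleft()
        if node == goal:
            steps = []
            while came_from[node] is not None:
                prev, enzyme = came_from[node]
                steps.append((prev, enzyme, node))
                node = prev
            return steps[::-1]  # reverse so steps run from start to goal
        for product, enzyme in graph.get(node, []):
            if product not in came_from:
                came_from[product] = (node, enzyme)
                queue.append(product)
    return None


for substrate, enzyme, product in shortest_route("glucosamine", "UDP-GlcNAc"):
    print(f"{substrate} --[{enzyme}]--> {product}")
```

Because the graph is directed, asking for a route backward (for example, from ManNAc to glucose) returns None, which mirrors the fact that these reactions are drawn in one metabolic direction here.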
The most recent developments in large-scale biology, primarily the sequencing of the human genome coupled with algorithms that predict gene function, indicate that around 2% of human genes are involved in glycosylation. This information, along with 'metabolomic' methods for large-scale characterization of small-molecule metabolites, has helped to put the finishing touches on the framework of the glycosylation machinery. Almost all of its metabolic components are known and have been assembled into well-defined pathways, as can be seen by following the links for 'Carbohydrate metabolism' and 'Glycan biosynthesis and metabolism' in the Kyoto Encyclopedia of Genes and Genomes (KEGG). A static picture of glycosylation does not, however, reflect dynamic moment-by-moment, developmental, and disease-related metabolic fluctuations, nor does it provide much insight into subcellular organization and organelle topography, which are critical factors in shaping final oligosaccharide structures. In the future, computational 'systems biology' promises to bring the glycosylation machinery to life and thereby to offer insights into repairing glycosylation abnormalities associated with widespread diseases, including diabetes and cancer.
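As a toy illustration of what a static pathway map leaves out, the sketch below integrates a two-pool model with forward Euler: a precursor pool (A) fed by a nutrient supply and converted into a product pool (B), which is consumed downstream. The pools, first-order rate constants, and the step change in supply are all hypothetical; real systems-biology models of glycosylation involve many more species, compartments, and saturable enzyme kinetics.

```python
def simulate(v_in, k1, k2, t_end, dt=0.01):
    """Forward-Euler integration of a hypothetical two-pool linear pathway:

        dA/dt = v_in(t) - k1 * A   (precursor pool)
        dB/dt = k1 * A - k2 * B    (product pool, e.g. a sugar nucleotide)

    Returns a list of (t, A, B) tuples recorded after every step.
    """
    a = b = 0.0
    trajectory = []
    for i in range(int(round(t_end / dt))):
        t = i * dt
        da = v_in(t) - k1 * a
        db = k1 * a - k2 * b
        a += da * dt
        b += db * dt
        trajectory.append((t + dt, a, b))
    return trajectory


def supply(t):
    """Hypothetical nutrient supply that doubles at t = 100 (arbitrary units)."""
    return 1.0 if t < 100 else 2.0


traj = simulate(supply, k1=0.5, k2=0.1, t_end=200)

# At steady state A = v_in / k1 and B = v_in / k2, so doubling the supply
# eventually doubles both pools.
before, after = traj[9999], traj[-1]
print(f"t={before[0]:.0f}: A={before[1]:.2f}, B={before[2]:.2f}")
print(f"t={after[0]:.0f}: A={after[1]:.2f}, B={after[2]:.2f}")
```

The same pathway diagram describes both time points; only a dynamic model shows that the product pool lags the precursor pool (its relaxation time is roughly 1/k2) after the supply changes.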
Structures of sugars have long fascinated chemists and biologists, beginning with Emil Fischer's landmark efforts to decipher the isoforms of hexoses more than a century ago. Since then, even with modern techniques, biologists have struggled to obtain a glycosylation profile - the specific complement of glycoconjugates present - of even a single cell. To illustrate that no task in carbohydrate analysis is simple, Figure 1 shows a few biologically significant glycoconjugates. Even the addition of a single N-acetylglucosamine moiety to a protein to give the O-GlcNAc modification, which regulates numerous biochemical pathways by acting in a yin-yang manner with phosphorylation (Figure 1c), is complicated by its occurrence on hundreds of different cytosolic and nuclear proteins, and at multiple sites within a single protein. The various biological activities of glycosphingolipids, relatively simple sugar-bearing biomolecules exemplified by the ganglioside GM3 (Figure 1d), demonstrate that very subtle changes to sialic acid (N-acetylneuraminic acid or Sia), an unusual nine-carbon sugar found in more than 50 chemically distinct forms, can regulate apoptosis, senescence, and proliferation, highlighting the need for careful analysis of fine structural details.
Moving to larger glycoconjugates, prions are glycosylated proteins that possess only two sites where oligosaccharides attach (Figure 1a). Even so, any one of several dozen different sugar chains can reside at either site; consequently, prions exist as hundreds of distinct glycoforms. The discovery that carbohydrates influence prion infectivity and the development of spongiform encephalopathies [14, 15] underlines the importance of fully defining structural heterogeneity of this kind. As a final example, the heavily glycosylated cell-surface glycoprotein CD34 (Figure 1a), found on hematopoietic and endothelial cells, serves as a developmental marker for hematopoietic cells, mediates leukocyte homing, and contributes to cancer metastasis. It bears 20 or more separate oligosaccharide chains. If any one of ten different oligosaccharide structures (a conservative estimate) can occur independently at each site, 10^20 different forms of CD34 can exist, and each of the approximately 10^4 to 10^5 copies of this protein found in a typical cell has a reasonable probability of being unique.
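The glycoform arithmetic above can be checked in a few lines. The sketch assumes, as the text does, that every site is occupied and that each site carries one of a fixed number of structures chosen independently and with equal probability; the uniqueness estimate uses the standard birthday-problem approximation. The figure of 30 structures per prion site is an illustrative stand-in for "several dozen", and the function names are hypothetical.

```python
import math


def glycoform_count(structures_per_site: int, n_sites: int) -> int:
    """Distinct glycoforms when every site is occupied and each site
    independently carries one of `structures_per_site` oligosaccharides."""
    return structures_per_site ** n_sites


def prob_any_duplicate(n_copies: int, n_forms: int) -> float:
    """Birthday-problem approximation of the probability that at least two of
    n_copies molecules share a glycoform, assuming all n_forms are equally
    likely: P ~= 1 - exp(-n(n-1) / (2N))."""
    return -math.expm1(-n_copies * (n_copies - 1) / (2 * n_forms))


cd34_forms = glycoform_count(10, 20)   # 20 sites, 10 structures per site
prion_forms = glycoform_count(30, 2)   # 2 sites, ~30 structures per site (assumed)

print(f"CD34-like: {cd34_forms:.0e} glycoforms")
print(f"prion-like: {prion_forms} glycoforms")
print(f"P(any duplicate among 1e5 copies): {prob_any_duplicate(10**5, cd34_forms):.1e}")
```

Under these idealized assumptions, the chance that any two of 10^5 CD34 copies are identical is on the order of 10^-10. Real glycoform distributions are strongly skewed toward a few dominant structures, so duplicates are considerably more likely in practice than this estimate suggests.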
Conventional glycosylation profiling
Only recently has methodology advanced sufficiently to obtain complete glycosylation profiles of glycoconjugates such as prions or CD34 (Figure 2). To briefly summarize today's technology, a plethora of mass spectrometry (MS) methods are becoming affordable and user-friendly [17, 18], pulsed-amperometric detection methodology is making the separation of carbohydrates by high-pressure liquid chromatography (HPLC) attractive, increasingly sensitive nuclear magnetic resonance (NMR) technology is allowing this powerful technique of structure determination and identification to be applied to glycoconjugates isolated from natural sources, and lectins are finding new uses as detection agents for carbohydrates in chromatography and protein arrays [19–21]. Excellent reviews provide a detailed picture of how different methodologies are coalescing into a powerful set of tools for sophisticated and highly sensitive investigation of glycoconjugates [22, 23].
While the isolation and characterization of highly complex glycoproteins are impressive feats, the sobering reality is that only a handful of the thousands of different glycoconjugates in the human body have been analyzed so far, which leaves the enormous carbohydrate diversity of even a single cell unknown in molecular detail. To further complicate matters, glycosylation profiles are not static, but rapidly change as cells differentiate, undergo apoptosis, or become diseased. Today's technologies are inadequate for determining the dynamic glycosylation profile of a cell and fall well short of the ultimate goal of glycomics - the evaluation of an entire organism. To dispel the gloom, however, underlying technologies for innovative, large-scale glycomic techniques are developing rapidly - both by bringing new techniques to carbohydrate analysis and by refining established methods to increase throughput. These two approaches, exemplified by array-based technologies and the automation of mass spectrometry, respectively, are discussed below.
Development of high-throughput technologies for glycomics
The success of DNA microarrays, on which thousands of discrete interactions are observed at once, has spawned array-based methods for tackling almost every kind of biological problem. Carbohydrate analysis is no exception, and two array-based strategies are now being pursued. The more mature approach - which has reached the point of using robotic microspotting - involves attaching hundreds of different oligosaccharides of known composition to a surface, and is used to identify binding partners (Figure 3) [24–26]. This approach reproduces the 'glycocode' found on the cell surface and helps determine how biological systems decode the vast information-carrying capacity of carbohydrates. In a second type of array, carbohydrate-binding proteins such as lectins are arrayed on the surface. This technique, made possible by protein-array printing methods that preserve the recognition capacity of proteins, has recently been demonstrated as a proof of concept with a modestly sized lectin array. In the future, when the hundreds of lectins now available, as well as the growing number of antibodies that bind specific glycan structures, are incorporated, such arrays will facilitate the rapid profiling of cellular glycosylation states.
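A minimal sketch of how binding partners might be called on an oligosaccharide array probed with a fluorescently labeled carbohydrate-binding protein. The glycan names, intensities, number of replicate spots, and the rule "replicate mean above the negative-control mean plus three negative-control standard deviations" are all hypothetical assumptions chosen for illustration; published array platforms apply their own normalization and statistics.

```python
from statistics import mean, stdev


def call_binders(spots, negative_controls, n_sd=3.0):
    """Return {glycan: background-subtracted signal} for glycans whose mean
    replicate signal exceeds the negative-control mean by more than n_sd
    negative-control standard deviations, strongest binders first."""
    bg_mean = mean(negative_controls)
    threshold = bg_mean + n_sd * stdev(negative_controls)
    hits = {}
    for glycan, replicates in spots.items():
        signal = mean(replicates)
        if signal > threshold:
            hits[glycan] = signal - bg_mean
    return dict(sorted(hits.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical fluorescence readings (arbitrary units), four replicate spots each.
# Linkage notation: a = alpha, b = beta.
spots = {
    "Neu5Ac(a2-6)Gal(b1-4)GlcNAc": [5200, 4900, 5100, 5050],
    "Neu5Ac(a2-3)Gal(b1-4)GlcNAc": [900, 1100, 950, 1000],
    "Gal(b1-4)GlcNAc": [260, 240, 300, 250],
    "Man(a1-3)Man": [210, 230, 190, 220],
}
negative_controls = [200, 240, 220, 180, 260, 210]  # buffer-only spots

hits = call_binders(spots, negative_controls)
for glycan, signal in hits.items():
    print(f"{glycan}: {signal:.0f}")
```

In this made-up data set the probe binds both sialylated structures, preferring the α2-6 linkage, while the non-sialylated glycans stay within background; a lectin array inverts the layout, reading a sample's glycans against many immobilized lectins.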
Conventional methods used in proteomics to separate proteins isolated from a cell or tissue, including chromatography and two-dimensional gel electrophoresis (Figure 2), are rapidly and effectively being adapted for oligosaccharide characterization. Unlike microarrays, these techniques do not inherently identify the molecules they separate, so glycoconjugates must be identified by mass spectrometry after separation. Mass spectrometry is extremely sensitive, allowing minute amounts of material isolated from biological sources, or purified by capillary electrophoresis or two-dimensional gels, to be identified successfully. Unfortunately, the need to isolate individual oligosaccharides by chromatography or electrophoresis before mass spectrometry, together with the lack of automated identification algorithms, limits the throughput of these methods. This has led to techniques such as fluorescence two-dimensional differential gel electrophoresis (DIGE), which do not characterize all products but settle for the less ambitious goal of identifying a limited number of molecules that differ between two samples (for example, healthy versus diseased tissue). To overcome the identification bottleneck, much effort is being put into developing automated, high-throughput computational tools for interpreting glycoconjugate mass spectra [23, 32].
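The simplest form of automated spectrum interpretation matches an observed mass to possible monosaccharide compositions. The residue masses below are standard monoisotopic values for the Hex, HexNAc, dHex, and NeuAc residue classes; the search ranges, the 10 ppm tolerance, the assumption of a singly protonated, underivatized glycan, and the function names are illustrative choices rather than any published tool's settings.

```python
from itertools import product

# Monoisotopic residue masses (Da) of common monosaccharide classes.
RESIDUE_MASS = {
    "Hex": 162.05282,     # e.g. Glc, Gal, Man (identical mass)
    "HexNAc": 203.07937,  # e.g. GlcNAc, GalNAc
    "dHex": 146.05791,    # e.g. Fuc
    "NeuAc": 291.09542,   # N-acetylneuraminic acid (sialic acid)
}
WATER = 18.01056   # added once for the free reducing-end glycan
PROTON = 1.00728   # assumed [M+H]+ ion


def composition_mass(comp):
    """Neutral monoisotopic mass of a free glycan with the given composition."""
    return sum(RESIDUE_MASS[r] * n for r, n in comp.items()) + WATER


def match_compositions(observed_mz, tol_ppm=10.0, max_counts=None):
    """Brute-force search for compositions whose [M+H]+ m/z lies within
    tol_ppm of observed_mz. Returns a list of {residue: count} dicts."""
    max_counts = max_counts or {"Hex": 10, "HexNAc": 7, "dHex": 3, "NeuAc": 4}
    residues = list(max_counts)
    matches = []
    for counts in product(*(range(max_counts[r] + 1) for r in residues)):
        if sum(counts) == 0:
            continue
        comp = {r: n for r, n in zip(residues, counts) if n}
        mz = composition_mass(comp) + PROTON
        if abs(mz - observed_mz) / observed_mz * 1e6 <= tol_ppm:
            matches.append(comp)
    return matches


print(match_compositions(1235.4407))
```

An observed m/z of 1235.4407 matches only Hex5HexNAc2, a composition consistent with, for example, the high-mannose glycan Man5GlcNAc2. The output is a composition, not a structure: because Glc, Gal, and Man share one mass, choosing among the possible Hex5HexNAc2 structures typically requires further evidence such as tandem MS fragmentation.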
Chemistry and glycomics
Chemical tools have been vitally important for the development of large-scale glycomics. These range from automated synthesis to the development of chemoselective coupling reactions that facilitate attachment of oligosaccharides to arrays [35, 36] and underlie high-sensitivity methods for isolating sugars from biological extracts [29, 37]. Another increasingly important contribution of chemists is the synthesis of abiotic monosaccharide analogs for use in metabolic oligosaccharide engineering. This approach exploits the unusual permissiveness of certain biochemical pathways involved in carbohydrate biosynthesis, which can accommodate non-natural metabolic intermediates. By intercepting a targeted pathway with an analog, it is possible to install abiotic, chemically distinct sugars into mature glycoconjugates. A recent example of this technique's ability to yield new insights into the biological roles of glycosylation came from incorporating azide-modified analogs of sialic acid into the B-lymphocyte surface glycoprotein CD22, an important modulator of B-lymphocyte activity: photoaffinity cross-linking of the azide-modified sialic acid allowed in situ identification of previously unappreciated homomeric binding among neighboring CD22 molecules, a potentially important modulator of B-cell activity.
An adaptation of the tagging-via-substrate (TAS) proteomics approach is now transforming metabolic oligosaccharide engineering into a high-throughput technology. TAS technology involves the biosynthetic incorporation of an azide-bearing version of a basic building block, such as an amino acid or monosaccharide, followed by isolation of the labeled biomolecules via this chemical tag. In a pioneering study, N-azidoacetylglucosamine, an analog of GlcNAc, was used to tag O-GlcNAc-modified proteins. The subsequent identification of around 25 O-GlcNAc-modified proteins in the brain established a biochemical link between O-GlcNAc modification and neuronal signaling, synaptic plasticity, and gene expression. Of equal importance, this study provides a precedent for expanding the TAS strategy to other tissues and for applying it to uncover subtle metabolic differences between healthy and diseased cells.
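One hypothetical way to turn a TAS pulldown into a list of candidate modified proteins is label-free spectral counting: compare how often each protein is identified after enrichment from cells fed the azido sugar with a mock enrichment from untreated cells. The protein names, counts, pseudocount, and thresholds below are invented for illustration and are not the analysis used in the studies described above.

```python
def tagged_candidates(labeled, control, min_count=5, min_ratio=4.0, pseudocount=1.0):
    """Return (protein, enrichment) pairs, most enriched first, for proteins
    seen at least `min_count` times in the labeled pulldown and at least
    `min_ratio`-fold more often than in the mock control. The pseudocount
    avoids division by zero for proteins absent from the control."""
    hits = []
    for protein, count in labeled.items():
        ratio = (count + pseudocount) / (control.get(protein, 0) + pseudocount)
        if count >= min_count and ratio >= min_ratio:
            hits.append((protein, ratio))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical spectral counts after azide-based enrichment.
labeled = {"protein_A": 24, "protein_B": 11, "protein_C": 9, "protein_D": 3}
control = {"protein_A": 1, "protein_C": 8, "protein_D": 0}

for protein, ratio in tagged_candidates(labeled, control):
    print(f"{protein}: {ratio:.1f}-fold enriched")
```

Here protein_C is rejected as a nonspecific background binder (nearly as abundant in the control), and protein_D is rejected because three spectra are too few to trust; the mock control is what distinguishes proteins carrying the chemical tag from proteins that simply stick to the enrichment matrix.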
Towards high-throughput glycobiology
In conclusion, the hope for an increased pace of discovery in glycobiology, where progress has lagged because "carbohydrates are complex", lies in several large-scale technologies now in the early stages of development. Continued progress will not be without problems. For example, current arrays contain only a very small fraction of all the carbohydrates found in nature. A second issue is that the exact presentation of oligosaccharides is often important for achieving the 'cluster glycoside effect', whereby carbohydrate binding depends on multiple simultaneous interactions that together confer both specificity and avidity [44, 45]. Today's methods of attaching carbohydrates to an array, in which they are spotted onto rigid flat surfaces with biophysical properties very different from those of the flexible peptide backbone of, say, CD34 (Figure 1a) or the spherical geometry of highly branched dendrimers, are unlikely to reproduce physiological binding faithfully.
Other nascent high-throughput methods, such as the automation of mass spectrometry, must also overcome significant barriers. The use of mass spectrometry in glycomics, for instance, is hampered in several ways: glycan databases are incomplete, because many of the oligosaccharides found in nature have not yet been isolated and characterized by mass spectrometry; the structural complexity of oligosaccharides limits current identification algorithms to structures of fewer than ten monosaccharides; and identifying the correct oligosaccharide from many isomeric options remains a challenge. Mass spectrometry must also overcome its aversion to sialic acids. In the past, this structurally diverse, negatively charged sugar has typically been removed to simplify analysis; the critical role of sialic acid in modulating the bioactivity of GM3 (Figure 1e) is but one of numerous examples showing that this sugar can no longer be ignored. To end optimistically, these challenges, although daunting today, will be overcome in the near future - within two to three years, by one prediction - if scientific curiosity and the potential multibillion-dollar market for therapeutic glycoproteins continue to accelerate the current pace of technological development.
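A rough count shows why isomers defeat identification by mass alone and why algorithms struggle as chains grow. The sketch counts only linear chains built from three isobaric hexoses (for example Glc, Gal, and Man), allowing each glycosidic bond either anomeric configuration and any of four acceptor positions (2, 3, 4, or 6). These simplifying assumptions ignore branching, ring form, and other monosaccharides, all of which would raise the count further, and `linear_isomers` is a hypothetical helper. Every structure counted has exactly the same mass.

```python
def linear_isomers(n: int, stereo_variants: int = 3,
                   linkage_positions: int = 4, anomers: int = 2) -> int:
    """Count linear chains of n isobaric residues.

    Each residue may be any of `stereo_variants` sugars of equal mass, and each
    of the n - 1 glycosidic bonds may use any of `linkage_positions` acceptor
    hydroxyls with either anomeric configuration. Branching is ignored.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return stereo_variants ** n * (linkage_positions * anomers) ** (n - 1)


for n in (2, 5, 10):
    print(f"{n} residues: {linear_isomers(n):,} same-mass linear structures")
```

Even this restricted count grows from 72 structures for a disaccharide to several trillion for a linear decasaccharide, so a single measured mass cannot by itself single out one oligosaccharide.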
Rees DA: Shapely polysaccharides. The eighth Colworth Medal Lecture. Biochem J. 1972, 126: 257-273.
Bertozzi CR, Kiessling LL: Chemical glycobiology. Science. 2001, 291: 2357-2364. 10.1126/science.1059820.
Roseman S: Reflections on glycobiology. J Biol Chem. 2001, 276: 41527-41542. 10.1074/jbc.R100053200.
Stanley P, Raju TS, Bhaumik M: CHO cells provide access to novel N-glycans and developmentally regulated glycosyltransferases. Glycobiology. 1996, 6: 696-699.
Masson E, Troncy L, Ruggiero D, Wiernsperger N, Lagarde M, El Bawab S: a-Series gangliosides mediate the effects of advanced glycation end products on pericyte and mesangial cell proliferation: a common mediator for retinal and renal microangiopathy? Diabetes. 2005, 54: 220-227.
Buchholz A, Hurlebaus J, Wandrey C, Takors R: Metabolomics: quantification of intracellular metabolic dynamics. Biomol Eng. 2002, 19: 5-15. 10.1016/S1389-0344(02)00003-5.
KEGG: Kyoto Encyclopedia of Genes and Genomes. [http://www.genome.jp/kegg/pathway.html]
Roth J: Protein N-glycosylation along the secretory pathway: relationship to organelle topography and function, protein quality control, and cell interactions. Chem Rev. 2002, 102: 285-303. 10.1021/cr000423j.
Murrell MP, Yarema KJ, Levchenko A: The systems biology of glycosylation. Chembiochem. 2004, 5: 1334-1347. 10.1002/cbic.200400143.
Dennis JW, Granovsky M, Warren CE: Protein glycosylation in development and disease. BioEssays. 1999, 21: 412-421. 10.1002/(SICI)1521-1878(199905)21:5<412::AID-BIES8>3.0.CO;2-5.
Kunz H: Emil Fischer - unequalled classicist, master of organic chemistry research, and inspired trailblazer of biological chemistry. Angew Chem Int Ed Engl. 2002, 41: 4439-4451. 10.1002/1521-3773(20021202)41:23<4439::AID-ANIE4439>3.0.CO;2-6.
Zachara NE, Hart GW: The emerging significance of O-GlcNAc in cellular regulation. Chem Rev. 2002, 102: 431-438. 10.1021/cr000406u.
Angata T, Varki A: Chemical diversity in the sialic acids and related α-keto acids: an evolutionary perspective. Chem Rev. 2002, 102: 439-469. 10.1021/cr000407m.
Rudd PM, Endo T, Colominas C, Groth D, Wheeler SF, Harvey DJ, Wormald MR, Serban H, Prusiner SB, Kobata A, Dwek RA: Glycosylation differences between the normal and pathogenic prion protein isoforms. Proc Natl Acad Sci USA. 1999, 96: 13044-13049. 10.1073/pnas.96.23.13044.
Lawson VA, Collins SJ, Masters CL, Hill AF: Prion protein glycosylation. J Neurochem. 2005, 93: 793-801. 10.1111/j.1471-4159.2005.03104.x.
Lanza F, Healy L, Sutherland DR: Structural and functional features of the CD34 antigen: an update. J Biol Regul Homeost Agents. 2001, 15: 1-13.
Zaia J: Mass spectrometry of oligosaccharides. Mass Spectrom Rev. 2004, 23: 161-227. 10.1002/mas.10073.
Sagi D, Kienz P, Denecke J, Marquardt T, Peter-Katalinic J: Glycoproteomics of N-glycosylation by in-gel deglycosylation and matrix-assisted laser desorption/ionisation-time of flight mass spectrometry mapping: application to congenital disorders of glycosylation. Proteomics. 2005, 5: 2689-2701. 10.1002/pmic.200401312.
Qiu R, Regnier FE: Use of multidimensional lectin affinity chromatography in differential glycoproteomics. Anal Chem. 2005, 77: 2802-2809. 10.1021/ac048751x.
Pilobello KT, Krishnamoorthy L, Slawek D, Mahal LK: Development of a lectin microarray for the rapid analysis of protein glycopatterns. ChemBioChem. 2005, 6: 985-989. 10.1002/cbic.200400403.
Hirabayashi J: Lectin-based structural glycomics: glycoproteomics and glycan profiling. Glycoconjug J. 2004, 21: 35-40. 10.1023/B:GLYC.0000043745.18988.a1.
Mechref Y, Novotny MV: Structural investigations of glycoconjugates at high sensitivity. Chem Rev. 2002, 102: 321-369. 10.1021/cr0103017.
Morelle W, Michalski J-C: Glycomics and mass spectrometry. Curr Pharm Des. 2005, 11: 2615-2645. 10.2174/1381612054546897.
Shin I, Park S, Lee M-r: Carbohydrate microarrays: an advanced technology for functional studies of glycans. Chemistry. 2005, 11: 2894-2901. 10.1002/chem.200401030.
Blixt O, Head S, Mondala T, Scanlan C, Huflejt ME, Alvarez R, Bryan MC, Fazio F, Calarese D, Stevens J, et al: Printed covalent glycan array for ligand profiling of diverse glycan binding proteins. Proc Natl Acad Sci USA. 2004, 101: 17033-17038. 10.1073/pnas.0407902101.
Feizi T, Fazio F, Chai W, Wong C-H: Carbohydrate microarrays - a new set of technologies at the frontiers of glycomics. Curr Opin Struct Biol. 2003, 13: 637-645. 10.1016/j.sbi.2003.09.002.
Gabius H-J, Siebert H-C, André S, Jiménez-Barbero J, Rüdiger H: Chemical biology of the sugar code. ChemBioChem. 2004, 5: 740-764. 10.1002/cbic.200300753.
Zhou Q, Park SH, Boucher S, Higgins E, Lee K, Edmunds T: N-Linked oligosaccharide analysis of glycoprotein bands from isoelectric focusing gels. Anal Biochem. 2004, 335: 10-16. 10.1016/j.ab.2004.07.028.
Nishimura S-I, Niikura K, Kurogochi M, Matsushita T, Fumoto M, Hinou H, Kamitani R, Nakagawa H, Deguchi K, Miura N, et al: High-throughput protein glycomics: combined use of chemoselective glycoblotting and MALDI-TOF/TOF mass spectrometry. Angew Chem Int Ed Engl. 2004, 44: 91-96. 10.1002/anie.200461685.
Tonge R, Shaw J, Middleton B, Rowlinson R, Rayner S, Young J, Pognan F, Hawkins E, Currie I, Davison M: Validation and development of fluorescence two-dimensional differential gel electrophoresis proteomics technology. Proteomics. 2001, 1: 377-396. 10.1002/1615-9861(200103)1:3<377::AID-PROT377>3.3.CO;2-Y.
Block TM, Comunale MA, Lowman M, Steel LF, Romano PR, Fimmel C, Tennant BC, London WT, Evans AA, Blumberg BS, et al: Use of targeted glycoproteomics to identify serum glycoproteins that correlate with liver cancer in woodchucks and humans. Proc Natl Acad Sci USA. 2005, 102: 779-784. 10.1073/pnas.0408928102.
Tang H, Mechref Y, Novotny MV: Automated interpretation of MS/MS spectra of oligosaccharides. Bioinformatics. 2005, 21 (Suppl 1): i431-i439. 10.1093/bioinformatics/bti1038.
Borman S: Carbohydrate advances. Chem Eng News. 2005, 83: 41-50.
Peri F, Nicotra F: Chemoselective ligation in glycochemistry. Chem Commun (Camb). 2004, 21: 623-627. 10.1039/b308907j.
Ratner DM, Adams EW, Disney MD, Seeberger PH: Tools for glycomics: Mapping interactions of carbohydrates in biological systems. ChemBioChem. 2004, 5: 1375-1383. 10.1002/cbic.200400106.
Love KR, Seeberger PH: Carbohydrate arrays as tools for glycomics. Angew Chem Int Ed Engl. 2002, 41: 3583-3586. 10.1002/1521-3773(20021004)41:19<3583::AID-ANIE3583>3.0.CO;2-P.
Niikura K, Kamitani R, Kurogochi M, Uematsu R, Shinohara Y, Nakagawa H, Deguchi K, Monde K, Kondo H, Nishimura S-I: Versatile glycoblotting nanoparticles for high-throughput protein glycomics. Chemistry. 2005, 11: 3825-3834. 10.1002/chem.200401289.
Kayser H, Zeitler R, Kannicht C, Grunow D, Nuck R, Reutter W: Biosynthesis of a nonphysiological sialic acid in different rat organs, using N-propanoyl-D-hexosamines as precursors. J Biol Chem. 1992, 267: 16934-16938.
Han S, Collins BE, Bengtson P, Paulson JC: Homomultimeric complexes of CD22 revealed by in situ photoaffinity protein-glycan crosslinking. Nat Chem Biol. 2005, 1: 93-97. 10.1038/nchembio713.
Kho Y, Kim SC, Jiang C, Barma D, Kwon SW, Cheng J, Jaunbergs J, Weinbaum C, Tamanoi F, Falck J, Zhao Y: A tagging-via-substrate technology for detection and proteomics of farnesylated proteins. Proc Natl Acad Sci USA. 2004, 101: 12479-12484. 10.1073/pnas.0403413101.
Saxon E, Bertozzi CR: Cell surface engineering by a modified Staudinger reaction. Science. 2000, 287: 2007-2010. 10.1126/science.287.5460.2007.
Vocadlo DJ, Hang HC, Kim E-J, Hanover JA, Bertozzi CR: A chemical approach for identifying O-GlcNAc-modified proteins in cells. Proc Natl Acad Sci USA. 2003, 100: 9116-9121. 10.1073/pnas.1632821100.
Khidekel N, Ficarro SB, Peters EC, Hsieh-Wilson LC: Exploring the O-GlcNAc proteome: direct identification of O-GlcNAc-modified proteins from the brain. Proc Natl Acad Sci USA. 2004, 101: 13132-13137. 10.1073/pnas.0403471101.
Lundquist JJ, Toone EJ: The cluster glycoside effect. Chem Rev. 2002, 102: 555-578. 10.1021/cr000418f.
Kiessling LL, Pohl S: Strength in numbers: non-natural polyvalent carbohydrate derivatives. Chem Biol. 1996, 3: 71-77. 10.1016/S1074-5521(96)90280-X.
Hanover JA: Glycan-dependent signaling: O-linked N-acetylglucosamine. FASEB J. 2001, 15: 1865-1876. 10.1096/fj.01-0094rev.