Metabolite profiling in plant biology: platforms and destinations
© BioMed Central Ltd 2004
Published: 18 May 2004
Optimal use of genome sequences and gene-expression resources requires powerful phenotyping platforms, including those for systematic analysis of metabolite composition. The most used technologies for metabolite profiling, including mass spectral, nuclear magnetic resonance and enzyme-based approaches, have various advantages and disadvantages, and problems can arise with reliability and the interpretation of the huge datasets produced. These techniques will be useful for answering important biological questions in the future.
Genes and genomes can be routinely sequenced, the resulting information stored, accessed and analyzed, and organisms with altered gene expression produced. Use of these resources requires powerful phenotyping platforms, including approaches for the systematic analysis of metabolite composition. Whereas the chemistry of nucleic acids is relatively simple and uniform, there are tens of thousands of metabolites, with an immense range of types of structure. This has led to a plethora of different extraction, separation and detection systems for different groups of metabolically important compounds. Researchers have typically measured a handful of metabolites, chosen on the basis of assumptions about what was relevant and the technical capacity of their laboratory. But now, in parallel with the development of genome-wide gene-expression arrays, there has been a shift to an 'unbiased' approach to metabolite analysis.
It is helpful to distinguish between metabolite fingerprinting, metabolite profiling and metabolomics. Metabolite fingerprinting applies a broad analytical technology to detect major differences between two samples, for example two different genotypes. It provides information that helps to orientate a research project. Metabolite profiling is the measurement of hundreds or potentially thousands of metabolites. It requires a streamlined pipeline for extraction, separation and analysis, so that large numbers of metabolites can be measured robustly and quantitatively in the presence of the extraordinarily complex mixture of chemicals ('matrix') found in cellular extracts. Metabolomics, in the strict sense, is the measurement of all metabolites in a given system. It is not yet technically possible, and will probably require a platform of complementary technologies, because no single technique is comprehensive, selective and sensitive enough to measure them all. This article provides an overview of technologies for metabolite profiling, discusses problems relating to the reliability and interpretation of the huge datasets these technologies produce, and outlines how they can be used to answer important questions in plant biology in the future.
In gas chromatography coupled to mass spectrometry (GC-MS), compounds are separated by GC and then transferred online to the mass spectrometer for further separation and detection. This combines two strongly complementary technologies: GC can separate metabolites that have almost identical mass spectra (such as isomers), while MS provides fragmentation patterns that differentiate between co-eluting, but chemically diverse, metabolites. GC-MS provides quantitative information and is widely used for clinical diagnostics and for large-scale profiling of complex biological samples [3–5]. The procedure involves six important steps.
Preparation of an extract should be as non-selective and comprehensive as possible. But treatments that stabilize one set of metabolites often lead to degradation or modifications of others. Furthermore, it may be necessary to separate fractions so as to profile trace metabolites when the sample is dominated by a small number of highly concentrated metabolites [6, 7].
Derivatization is necessary to render metabolites volatile, and so amenable to GC-MS. There is an extensive toolbox of chemical reagents for GC-MS derivatization, including alkylating, acylating and silylating reagents. At present, trimethylsilylation is the favored choice. Whereas other reagents are in part highly specific for the chemical groups of certain metabolite classes, trimethylsilylation reacts with the broadest range of compounds and therefore best meets the requirements of unbiased metabolite profiling.
Highly standardized conditions are needed for separation of metabolites by GC, because slight changes in gas-flow conditions, temperature programming and the type of capillary column affect chromatographic retention, and can even alter the order in which compounds are eluted.
Three types of mass analyzer are used in GC-MS couplings: single quadrupoles (QUAD), ion traps (TRAP) and time-of-flight analyzers (TOF; see Box 1 for further details). The throughput of GC-QUAD-MS systems (10-20 samples per day) resembles that of typical high-performance liquid chromatography (HPLC) applications. GC-TRAP-MS technology has a similar throughput, but adds multistage fragmentation (MSn) capability, in which a predefined fragment mass is selected (parent ion trapping) and subjected to secondary fragmentation to generate daughter fragments. This increases selectivity and suppresses chemical 'noise', an advantage for the analysis of trace compounds in complex samples [6, 7]. It also aids in the identification of compounds. GC-TOF-MS systems scan much faster (10-50 scans per second) and allow higher throughput (30-40 samples per day).
Compounds are identified by matching their chromatographic retention times and mass-spectral fragmentation patterns to known and predicted information available in databases. Typical GC-QUAD-MS software requires expert knowledge about the characteristic fragment masses and retention-time windows of each metabolite, and of the pitfalls that can lead to misidentification. Accumulating this experience is time consuming, but is aided by creating mass-spectral and retention-time index reference libraries for all routinely occurring metabolites. This manually supported process requires around 2 minutes per metabolite, allowing about 20 chromatogram files to be evaluated per day; as the number of metabolites increases, it becomes the rate-limiting step of sample analysis. A major advantage of GC-TOF-MS systems (such as the GC-TOF-Pegasus II MS from Leco Corp Inc., St. Joseph, USA) is their enhanced software capability [9, 10], which supports automated and comprehensive extraction of all mass spectra from a chromatogram, built-in mass-spectral correction for co-eluting metabolites, calculation of retention-time indices, and automated selection of a suitable fragment mass for selective quantification. GC-TOF-MS has the potential to be truly unbiased and fully automated with respect to metabolite identification, but at present it still requires expert input to correct inappropriate assignments.
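Retention-time indices make retention comparable across runs by expressing it relative to co-injected reference compounds. The minimal Python sketch below assumes a ladder of n-alkanes is run under the same temperature program and uses linear (van den Dool and Kratz) interpolation between the bracketing alkanes. The function name and the retention times in the example are hypothetical.

```python
from bisect import bisect_right


def linear_retention_index(rt: float, alkanes: dict[int, float]) -> float:
    """Linear (van den Dool-Kratz) retention index for temperature-programmed GC.

    alkanes maps n-alkane carbon number -> retention time in the same run.
    The ladder may skip carbon numbers; rt must lie from the first alkane up to,
    but not including, the last one.
    """
    carbons = sorted(alkanes)
    times = [alkanes[c] for c in carbons]
    i = bisect_right(times, rt) - 1
    if i < 0 or i >= len(times) - 1:
        raise ValueError("retention time outside the alkane ladder")
    n, n_next = carbons[i], carbons[i + 1]
    t_n, t_next = times[i], times[i + 1]
    # Each carbon number spans 100 index units; interpolate linearly in time.
    return 100 * (n + (n_next - n) * (rt - t_n) / (t_next - t_n))


ladder = {10: 5.0, 12: 8.0, 15: 12.5}  # hypothetical retention times (min)
print(linear_retention_index(6.5, ladder))    # 1100.0
print(linear_retention_index(10.25, ladder))  # 1350.0
```

Because the index is anchored to the alkanes, a uniform drift in retention time shifts the alkanes and the analyte together and leaves the index largely unchanged.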
Overall, about two days are needed to carry a batch of 50 samples through the extraction and derivatization steps. Analysis and evaluation of one sample (derivatization, separation by GC and ionization) requires 60-75 minutes by GC-QUAD-MS or 35-45 minutes by GC-TOF-MS. This throughput is exceeded only by fingerprinting technologies or by targeted analyses of single metabolites. The major bottleneck is evaluation, including the manual check for misidentified metabolites. Provided that appropriate external and internal standardization has been carried out, GC-MS provides accurate absolute quantification of a given metabolite over a concentration range of up to four orders of magnitude. However, each step during extraction, preparation and analysis can introduce general and substance-specific losses, and these can vary with the biological material. Ideally, each compound should be standardized using a stable-isotope-labeled isotopomer that is differentially detectable by mass spectrometry, or a xenobiotic stereoisomer that is distinguishable by its chromatographic properties [4, 11].
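With an internal standard added at the start of extraction, losses that affect analyte and standard alike cancel in the ratio of their peak areas, and a calibration run with an authentic standard supplies the relative response factor. A minimal Python sketch; the function names, the single-point calibration and all numbers are illustrative assumptions, and real protocols typically use multi-point calibration curves.

```python
def relative_response_factor(area_std: float, amount_std: float,
                             area_is: float, amount_is: float) -> float:
    """Detector response of the analyte relative to the internal standard,
    from a calibration run containing known amounts of both."""
    return (area_std / area_is) / (amount_std / amount_is)


def quantify(area_x: float, area_is: float, amount_is: float,
             rrf: float, fresh_weight_g: float) -> float:
    """Analyte amount per gram fresh weight (units follow amount_is).

    The internal standard is assumed to be added at the start of extraction,
    so losses shared by analyte and standard cancel in area_x / area_is.
    """
    amount_x = (area_x / area_is) * amount_is / rrf
    return amount_x / fresh_weight_g


# Calibration: 10 nmol standard + 5 nmol internal standard (hypothetical areas).
rrf = relative_response_factor(area_std=60000, amount_std=10, area_is=25000, amount_is=5)
# Sample: 50 mg fresh weight spiked with 5 nmol internal standard.
level = quantify(area_x=36000, area_is=20000, amount_is=5, rrf=rrf, fresh_weight_g=0.05)
print(round(rrf, 2), round(level, 1))  # 1.2 150.0 (nmol per g fresh weight)
```

Here the internal standard gives a smaller area in the sample than in the calibration run, and that apparent loss is corrected automatically because the analyte is expressed relative to it.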
The other major limitation of GC-MS is that most peaks are still unidentified. This is a tribute to the high sensitivity and resolution of capillary GC-MS, but a frustration for the biologist. Some 'unknowns' may be analytes generated during extraction and sample preparation, or by fragmentation in the MS step, but others may be important and even novel metabolites. Their identification is therefore an important activity, which cumulatively increases the power of GC-MS platforms. The straightforward approach is to add authenticated standards of metabolites that interest the biologist to the reference library. Elucidating the identity of an unknown peak of interest is more difficult, however, because GC-MS is destructive and usually does not yield enough pure substance for structural elucidation (for example, the upper microgram to milligram amounts required for offline nuclear magnetic resonance, NMR).
Liquid chromatography coupled to MS (LC-MS) exploits the high separation power of HPLC, including its ability to separate compounds of high molecular weight that cannot be analyzed by GC. An enormous range of columns and elution procedures is available. Traditionally, HPLC has been coupled to ultraviolet and visible light (UV/VIS) or diode-array detectors. Coupling it to MS instead provides further selectivity, detection that does not depend on a chromophore, and information about the structures of the separated compounds. Metabolites are usually introduced into the mass spectrometer by electrospray ionization (ESI), an atmospheric-pressure process that transfers analyte molecules eluting from the HPLC column into the gas phase for mass analysis. The analytes enter the mass spectrometer as charged molecules, transported in an electrical field between the end of the column and the entrance of the mass spectrometer. Ionization can occur via protonation (ESI+) or deprotonation (ESI-) [13, 14] and can produce singly and multiply charged ions. Multiple charging shifts the mass-to-charge ratio (m/z) of even high-mass analytes into the scanning range of a typical mass analyzer. When ESI is combined with high-end mass spectrometers there is effectively no mass-range restriction, allowing complete proteins to be analyzed. ESI has a bias against less polar compounds, such as terpenes, carotenoids and aliphatics, for which alternative procedures such as atmospheric pressure photoionization (APPI) or GC-MS are better suited.
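The effect of multiple charging follows from m/z = (M + z × mp)/z for an [M+zH]z+ ion, where M is the neutral monoisotopic mass and mp the proton mass. A short Python sketch; the 20,000 Da analyte and the upper scan limit of m/z 2,000 are assumptions chosen for illustration.

```python
PROTON = 1.007276467  # proton mass (Da)


def mz(neutral_mass: float, z: int) -> float:
    """m/z of [M+zH]z+ for z > 0, or of [M-|z|H]|z|- for z < 0."""
    if z == 0:
        raise ValueError("charge must be non-zero")
    return (neutral_mass + z * PROTON) / abs(z)


def charge_states_in_range(neutral_mass: float, upper_mz: float = 2000.0,
                           max_z: int = 50) -> list[int]:
    """Positive charge states whose m/z falls at or below the scan limit."""
    return [z for z in range(1, max_z + 1) if mz(neutral_mass, z) <= upper_mz]


print(round(mz(180.0634, -1), 4))         # deprotonated hexose [M-H]-: 179.0561
states = charge_states_in_range(20000.0)  # hypothetical 20 kDa protein
print(states[0], round(mz(20000.0, states[0]), 2))  # 11 1819.19
```

A singly charged 20 kDa ion would lie far outside such a scan range, whereas every charge state from 11+ upwards falls inside it.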
Structural information is obtained by collision-induced decomposition (also known as collision-induced dissociation, CID). Parent ions produced by ESI are isolated and accelerated inside the mass spectrometer using quadrupole mass filters (see Box 1), forcing them to collide with molecules of the bath gas (usually helium or argon). The resulting fragment spectrum can be compared with fragmentation libraries for known chemical structures. Depending on the mass analyzer used, several fragment spectra per second can be acquired 'on the fly'. Using quadrupole ion traps, it is further possible to generate multiple fragment spectra of selected fragments of a parent ion (see Box 1 for further information).
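A common way to compare a measured fragment spectrum with library spectra is a cosine (normalized dot-product) score. The Python sketch below bins peaks to nominal (integer) m/z, which suits unit-resolution instruments. The library entries, compound names and intensities are hypothetical, and real search software typically adds intensity weighting and m/z tolerances.

```python
import math
from collections import defaultdict

Spectrum = list[tuple[float, float]]  # (m/z, intensity) pairs


def binned(spectrum: Spectrum) -> dict[int, float]:
    """Sum intensities into nominal-mass (integer m/z) bins."""
    bins: dict[int, float] = defaultdict(float)
    for mz_value, intensity in spectrum:
        bins[round(mz_value)] += intensity
    return bins


def cosine_score(query: Spectrum, reference: Spectrum) -> float:
    """Cosine similarity (0..1 for non-negative intensities) of two binned spectra."""
    q, r = binned(query), binned(reference)
    dot = sum(q[m] * r[m] for m in q.keys() & r.keys())
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in r.values()))
    return dot / norm if norm else 0.0


def best_match(query: Spectrum, library: dict[str, Spectrum]) -> tuple[str, float]:
    """Library entry with the highest cosine score."""
    return max(((name, cosine_score(query, spec)) for name, spec in library.items()),
               key=lambda item: item[1])


library = {  # hypothetical reference spectra
    "compound A": [(163.0, 100.0), (145.0, 40.0), (85.0, 20.0)],
    "compound B": [(179.0, 100.0), (161.0, 30.0)],
}
query = [(163.04, 90.0), (145.03, 45.0), (85.1, 15.0)]
name, score = best_match(query, library)
print(name, round(score, 3))  # compound A 0.996
```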
Evaluation requires expert knowledge. Complications arise from chromatographic interference, from the enhancement or suppression of ionization in complex matrices, and from the formation of multiple ion species from a single compound. This makes it vital to develop robust and standardized protocols, and to include routine checks on whether changes in the complex mix of compounds within a biological extract are affecting separation and analysis.
Triple quadrupole instruments allow quantification by single-reaction monitoring (SRM). A specific mass ion (that of the metabolite of interest) is selected 'on the fly' by the first quadrupole mass filter, fragmented in the second quadrupole, and a characteristic fragment is then selected in the third quadrupole. SRM provides highly specific mass-ion traces for preselected metabolites, which can then be quantified by peak integration. It offers high selectivity and sensitivity, but it can only be applied to metabolites with known fragmentation pathways, and only a limited number of metabolites can be monitored per run.
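Once an SRM trace has been recorded for a transition (precursor m/z selected in the first quadrupole, product m/z in the third), quantification reduces to integrating the peak. A minimal Python sketch using the trapezoidal rule over a user-chosen retention window; the transition, window, constant baseline and trace values are illustrative assumptions, and instrument software uses more sophisticated peak detection and baseline modelling.

```python
from typing import Sequence


def integrate_peak(times: Sequence[float], intensities: Sequence[float],
                   start: float, end: float, baseline: float = 0.0) -> float:
    """Trapezoidal peak area between start and end, above a constant baseline."""
    points = [(t, y - baseline) for t, y in zip(times, intensities) if start <= t <= end]
    area = 0.0
    for (t0, y0), (t1, y1) in zip(points, points[1:]):
        area += (t1 - t0) * (y0 + y1) / 2
    return area


# Illustrative transition for one metabolite: precursor m/z -> product m/z.
transition = (191.0, 111.0)
rt_min = [4.0, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6]
counts = [50, 60, 400, 1200, 420, 70, 55]
area = integrate_peak(rt_min, counts, start=4.1, end=4.5, baseline=50)
print(transition, round(area, 1))  # (191.0, 111.0) 188.5
```

The resulting area is then converted into an amount with a calibration, as for any other peak-integration method.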
LC-MS has mainly been used to analyze selected metabolites, but it has enormous potential for metabolite profiling as a complement to GC-MS. High-resolution mass spectrometry (detecting 11,000 mass ions in a single spectrum) and high-resolution chromatography [21, 22] will further increase the number of metabolites detected. The biggest challenges are to develop an automated procedure for evaluation and metabolite quantification from raw chromatograms, similar to those already available for GC-MS [23, 24], and to identify the huge numbers of unknown metabolites and analytes detected by these powerful analytical platforms. This can be achieved using structural information from the MSn capacity of LC-MS systems [18, 25], and by combining LC-MS with Fourier-transform ion cyclotron resonance mass spectrometry (FTICRMS) and NMR.
In Fourier-transform ion cyclotron resonance mass spectrometry (FTICRMS), extracts are directly infused into the MS instrument using soft ionization techniques, to obtain fingerprints of the molecular ions present. This technique requires a mass analyzer accurate enough to assign definitive empirical formulae to several hundred ions. For profiling, it currently has two major limitations. Firstly, without chromatography it cannot distinguish between isomers, which have identical molecular masses, so many metabolites cannot be discriminated unambiguously. Secondly, there is no documentation of the rigorous method validation required to support its use for metabolite analyses. These caveats aside, coupling such an instrument to high-quality separation of analytes would clearly allow far more accurate identification, albeit at a heavy cost in sample throughput.
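Assigning an empirical formula from an accurate mass amounts to searching for elemental compositions whose calculated mass lies within a given error, expressed in parts per million (ppm). The brute-force Python sketch below is restricted to C, H, N and O, uses an assumed 1 ppm tolerance and assumed element limits, and keeps only compositions with a non-negative, integer ring-plus-double-bond count. It also shows the isomer limitation: glucose and fructose share the formula C6H12O6, so mass alone cannot tell them apart.

```python
from itertools import product

MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def ppm_error(measured: float, theoretical: float) -> float:
    return (measured - theoretical) / theoretical * 1e6


def formula_string(counts: dict[str, int]) -> str:
    return "".join(f"{el}{n if n > 1 else ''}" for el, n in counts.items() if n)


def candidate_formulae(mass: float, tol_ppm: float = 1.0, max_c: int = 30,
                       max_n: int = 5, max_o: int = 15) -> list[tuple[str, float]]:
    """Neutral CHNO compositions whose monoisotopic mass is within tol_ppm of mass."""
    hits = []
    for c, n, o in product(range(1, max_c + 1), range(max_n + 1), range(max_o + 1)):
        remainder = mass - c * MONO["C"] - n * MONO["N"] - o * MONO["O"]
        h = round(remainder / MONO["H"])  # only the nearest H count can fit at ppm tolerance
        if h < 0:
            continue
        theoretical = c * MONO["C"] + h * MONO["H"] + n * MONO["N"] + o * MONO["O"]
        rdbe = c - h / 2 + n / 2 + 1      # rings plus double bonds
        error = ppm_error(mass, theoretical)
        if abs(error) <= tol_ppm and rdbe >= 0 and rdbe.is_integer():
            hits.append((formula_string({"C": c, "H": h, "N": n, "O": o}), round(error, 2)))
    return hits


hexose = 6 * MONO["C"] + 12 * MONO["H"] + 6 * MONO["O"]  # glucose or fructose
print(candidate_formulae(hexose))
```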
A radically alternative approach is to use NMR to detect and quantify metabolites, via the magnetic properties of isotopes of the constituent atoms. In principle, this approach will detect an exceedingly broad range of metabolites, because hydrogen, carbon, nitrogen, phosphorus and oxygen have magnetic isotopes that are detectable by NMR [27, 28]. The computational analysis and chemometric software are highly developed, enabling rapid processing of acquired spectral data and identification of metabolites from the signals.
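Quantification in 1H NMR rests on the signal integral being proportional to the number of nuclei that give rise to it, so a metabolite can be quantified against a reference compound of known concentration. A minimal Python sketch, assuming fully relaxed spectra, well-resolved signals and an added reference such as TSP (whose trimethylsilyl signal comes from nine equivalent protons); the function name and integrals are hypothetical.

```python
def qnmr_concentration(integral_x: float, protons_x: int,
                       integral_ref: float, protons_ref: int,
                       conc_ref_mM: float) -> float:
    """Concentration of metabolite x from its signal integral relative to a reference.

    The integral per contributing proton is proportional to molar concentration,
    provided the spectrum is fully relaxed and the signals do not overlap.
    """
    return conc_ref_mM * (integral_x / protons_x) / (integral_ref / protons_ref)


# Reference: 1.0 mM TSP, 9 protons, integral set to 9.0.
# Metabolite: a methyl doublet (3 protons) with integral 6.0 (hypothetical).
print(qnmr_concentration(6.0, 3, 9.0, 9, 1.0))  # 2.0
```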
In practice, however, NMR detects fewer metabolites and has a smaller dynamic range than MS-based technologies. Although around 2,700 analytes were detected in a study of plant extracts using LC-NMR, fewer than 50 could be quantified and unambiguously identified. The lower sensitivity of NMR-based techniques restricts them to quantification of the most abundant compounds [30, 31] or of single classes of compounds [32, 33]. In wide surveys, they provide mainly non-quantitative information.
Although NMR spectroscopy is of limited utility for metabolite profiling, it is important for unequivocal determination of metabolite structure, which is one of the major bottlenecks of metabolite profiling. Two other features also make it invaluable for specific applications: it can be used to study metabolite levels in vivo, albeit for only a few major metabolites [27, 28], and it can be used to unravel complex metabolic fluxes by following labeled atoms through metabolic intermediates at the atomic level.
There are innumerable methods for detecting specific compounds or groups of compounds, using UV/visible light absorbance, fluorescence or luminescence. Metabolites are detected directly, after chromatographic separation, or after specific chemical or enzymatic reactions have converted a given metabolite into an analyte that can be detected spectroscopically. There are often diverse methods available for any one metabolite, differing in sensitivity, specificity, throughput or the type of equipment needed. It is difficult to base a high-throughput profiling platform around such diverse procedures and instrumentation. They nevertheless provide an important component of any profiling platform because, for example, they allow sensitive quantification of low-level phosphorylated intermediates and coenzymes. Spectrophotometry usually provides sensitivity in the nanomole range, and this can be increased 100- to 1,000-fold by using enzyme-activation, enzyme-inhibition or cycling assays, fluorimetry [40, 41] and luminometry.
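Many enzyme-based assays convert the metabolite stoichiometrically into NADH or NADPH, which is then measured by its absorbance at 340 nm via the Beer-Lambert law (A = ε × c × l, with ε about 6.22 per mM per cm for both reduced cofactors). A minimal Python sketch; the absorbance change, volumes and the 1:1 stoichiometry (as in the hexokinase/glucose-6-phosphate dehydrogenase assay for glucose) are illustrative assumptions, and in microplates the path length depends on the fill volume.

```python
EPSILON_340 = 6.22  # mM^-1 cm^-1, NADH and NADPH at 340 nm


def metabolite_nmol(delta_a340: float, path_cm: float, assay_volume_ml: float,
                    nadph_per_metabolite: float = 1.0) -> float:
    """Amount of metabolite in the assay (nmol) from the absorbance change at 340 nm."""
    conc_mM = delta_a340 / (EPSILON_340 * path_cm)   # Beer-Lambert: c = A / (e * l)
    nmol_nadph = conc_mM * assay_volume_ml * 1000.0  # mM x mL = umol; x1000 -> nmol
    return nmol_nadph / nadph_per_metabolite


def extract_conc_mM(nmol_in_assay: float, extract_volume_ul: float) -> float:
    """Concentration in the extract (mM) from the volume of extract added to the assay."""
    return nmol_in_assay / extract_volume_ul         # nmol / uL = mM


n = metabolite_nmol(0.311, path_cm=1.0, assay_volume_ml=1.0)
print(round(n, 1), round(extract_conc_mM(n, 20.0), 2))  # 50.0 2.5
```

With these numbers, an absorbance change of 0.01 corresponds to about 1.6 nmol in a 1 mL cuvette, in line with the nanomole sensitivity quoted above.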
These dedicated assays allow high-throughput analysis, which is crucial for diagnostic purposes and for the design of profiling experiments, an aspect of functional genomics whose importance is frequently underestimated. A laboratory equipped with a simple microplate reader can assay and calculate the results for several hundred extracts in a day. Throughput can be increased to over 100,000 analyses per day by combining microplate technology with robotics. The bottleneck for such approaches then becomes the preparation and extraction of samples.
The evolution of high-throughput assays is linked to miniaturization to save material, costs and time. While microplate technology is approaching saturation in terms of scaling down, new microchip-based techniques such as microfluidic chips that operate in the nanoliter range will dramatically increase throughput and also lower costs. One use will be in combination with enzymatic reactions that generate fluorogenic products, allowing sensitive detection. This will probably be restricted by the availability of enzymes and substrates, however. Another approach will be to use apoenzymes (proteins that bind their substrate without further catalysis) fused to fluorophores, allowing substrate binding and the resulting conformational changes to be detected by fluorescence. The creation of vast libraries of apoenzymes covering the different parts of the metabolome could lead to a new generation of ultra-high-throughput profiling methods.
The chemical complexity of metabolites and extracts makes checks of the extraction and analytical procedures vital. Without them, the levels reported for a particular tissue may be incorrect, and differences reported between tissues may be artifacts due to differential losses in contrasting matrices. Compared with transcripts and proteins, metabolites have high turnover rates. It is imperative to harvest the tissue of interest without subjecting it to transients, for example of changing light intensity, which could alter the levels of metabolites (see ). Disruption of cellular structure mixes metabolites with enzymes that are normally sequestered in a different subcellular compartment or cell, and these enzymes can degrade the metabolites rapidly. It is essential to quench metabolism fast enough to prevent changes in metabolite levels after sampling (by freezing in liquid nitrogen, for example, or by squashing tissue between large metal blocks precooled in liquid nitrogen). Given the chemical diversity of metabolites, it is impossible to devise a method that allows all of them to be extracted quantitatively and completely. The procedures for extraction and extract handling must therefore be tuned to the biological question of interest: which metabolites is it essential to measure with the highest possible precision? For example, to measure metabolites that are susceptible to enzymatic breakdown, it is essential to inhibit all enzyme activity completely (see ). This requires treatments (such as extraction in trichloroacetic acid) that will degrade or derivatize other metabolites.
Extraction and analysis should be optimized, and routinely checked for specific metabolites of interest, by spiking tissue samples with small amounts of authenticated chemical standards just prior to extraction. Such 'recovery' experiments should be repeated with each new tissue, or with each treatment that strongly affects the spectrum of enzymes or metabolites (for example, fruit ripening or pathogen attack). This becomes impractical when hundreds or thousands of metabolites are being analyzed, and a generic quality-control strategy is then required. The simplest is to mix the tissue 1:1 with a standard tissue for which the procedures have already been validated, and to check for each individual analyte that its level in the mixture is the arithmetic mean of its levels in the individual tissues. This approach was used, for example, when adapting GC-MS protocols to the analysis of tomato fruits. A bonus is that it provides additional information on retention-time shifts between the standard and novel tissues, and that it rapidly reveals differences in the dynamic range of metabolite levels between the tissues.
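The 1:1 mixing test can be automated as a comparison between the measured level in the mixture and the mean of the two single-tissue levels. A minimal Python sketch; the function name, the 20 % tolerance, the metabolite names and all values are hypothetical, and a real check would also take replicate variability into account.

```python
def mixing_check(standard: dict[str, float], novel: dict[str, float],
                 mixture: dict[str, float], tol: float = 0.2) -> dict[str, float]:
    """Analytes whose level in a 1:1 mixture deviates from the mean of the two tissues.

    Inputs map analyte name -> level in the same units (e.g. per g fresh weight).
    Returns the relative deviation for every analyte outside the tolerance.
    """
    flagged = {}
    for name in standard.keys() & novel.keys() & mixture.keys():
        expected = (standard[name] + novel[name]) / 2
        if expected == 0:
            continue
        deviation = (mixture[name] - expected) / expected
        if abs(deviation) > tol:
            flagged[name] = round(deviation, 3)
    return flagged


standard = {"sucrose": 10.0, "malate": 4.0, "citrate": 2.0}  # validated tissue
novel = {"sucrose": 6.0, "malate": 8.0, "citrate": 6.0}      # new tissue
mixture = {"sucrose": 8.1, "malate": 6.0, "citrate": 2.4}    # 1:1 mix
print(mixing_check(standard, novel, mixture))                # {'citrate': -0.4}
```

A mixture value well below the expected mean, as for citrate here, points to matrix-dependent losses that would also bias measurements made in the novel tissue alone.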
Because metabolite profiling is cheap once the hardware platforms have been established, it can be applied to large numbers of samples to generate huge amounts of data. The next bottleneck is finding ways to combine and interpret the data. A first, apparently trivial, step is to develop a clearly defined nomenclature for all the metabolites being measured. This is analogous to establishing a complete list of all the genes in an organism, but is more complex because there is no genome sequence to act as a point of reference, and because of the large numbers of synonyms and the baffling length of many chemical names. The first steps towards this goal are being taken, but it is an arduous journey that requires a combination of expert knowledge of metabolites and text-mining algorithms.
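At its simplest, a clearly defined set of metabolite names means mapping every name variant found in datasets or the literature onto one canonical entry. A minimal Python sketch with a tiny hand-made synonym table; the table, the canonical names and the function name are hypothetical, and a real resource would need curated identifiers and text mining to populate it.

```python
import re
from typing import Optional

# Hypothetical synonym table: each lower-case name variant -> canonical name.
SYNONYMS = {
    "citrate": "citrate",
    "citric acid": "citrate",
    "2-hydroxypropane-1,2,3-tricarboxylic acid": "citrate",
    "sucrose": "sucrose",
    "saccharose": "sucrose",
}


def canonical_name(name: str) -> Optional[str]:
    """Canonical name for a variant, or None if it needs manual curation."""
    key = re.sub(r"\s+", " ", name.strip().lower())
    return SYNONYMS.get(key)


for raw in ["Citric  acid", "Saccharose", "unknown_1234"]:
    print(raw, "->", canonical_name(raw))
```

Returning None rather than guessing keeps unmatched names visible for expert review.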
Databases of metabolite profiles contain a large number of experimental data points for each parameter and are well suited to data mining. Statistical tools that detect correlations and clusters drive unbiased knowledge acquisition by revealing previously unknown relationships. Principal component analysis, independent component analysis or machine-learning approaches can be used to uncover important patterns or differences in metabolite levels [4, 12]. This will generate leads for further experimentation and define 'diagnostic' metabolites that can then be selectively measured with very high-throughput methods.
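Principal component analysis of a samples-by-metabolites matrix can be sketched in a few lines with NumPy's singular value decomposition. The preprocessing (log transformation followed by autoscaling each metabolite to unit variance) is a common but not universal choice, and the small two-genotype dataset is invented. The code assumes all levels are positive and that no metabolite is constant across samples.

```python
import numpy as np


def pca(levels, n_components: int = 2):
    """PCA of a samples x metabolites matrix of positive levels via SVD.

    Returns sample scores, metabolite loadings and the fraction of total
    variance explained by each retained component. Component signs are arbitrary.
    """
    x = np.log(np.asarray(levels, dtype=float))       # log-transform (levels > 0)
    x = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)  # autoscale each metabolite
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    loadings = vt[:n_components].T
    explained = (s**2 / np.sum(s**2))[:n_components]
    return scores, loadings, explained


# Hypothetical levels: rows are samples (3 per genotype), columns are metabolites.
levels = [
    [10.0, 5.0, 3.00],
    [11.0, 5.2, 3.10],
    [10.5, 4.9, 2.90],
    [20.0, 2.5, 3.00],
    [21.0, 2.4, 3.05],
    [19.0, 2.6, 2.95],
]
scores, loadings, explained = pca(levels)
print(np.round(explained, 2))
print(np.round(scores[:, 0], 2))  # genotypes separate along PC1
```

The loadings then indicate which metabolites drive the separation, which is one way to nominate candidate 'diagnostic' metabolites.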
Metabolism is notoriously incomprehensible to the non-specialist. In parallel with informatics-driven approaches, tools are needed that place metabolite-profiling data in a biological context. Metabolic databases (such as KEGG) provide exhaustive information about the possible roles of a given metabolite, based on information compiled from numerous prokaryotes and eukaryotes. They are useful for reference, but daunting for non-specialists. In such databases, pathways are defined as large networks, the information is inclusive rather than specific, and it is often unclear which pathways, or which sections of them, operate in which organism. Recently a first build of the AraCyc resource was published [49, 50]. This database will cumulatively assemble information about different plant metabolic pathways, providing diagrams that show the metabolites and the genes that encode the enzymes in each pathway. The next step is to develop tools that display metabolite data on pathway diagrams. Thimm et al. [51, 52] have approached this with a tool called MapMan, which allows users to paint metabolite-profiling datasets onto existing templates or onto diagrams they design themselves. MapMan also places metabolite data in a wider context, in combination with expression-profiling datasets and, potentially, with information about proteins and enzyme activities.
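Pathway-painting tools display each metabolite as a colored element whose shade encodes its change relative to a control. The Python sketch below maps a ratio to a blue-white-red hex color via its log2 value, saturating at a chosen limit. The function name, the color scheme and the saturation at four-fold (2 log2 units) are arbitrary assumptions for illustration, not MapMan's own implementation.

```python
import math


def ratio_to_hex(ratio: float, saturation: float = 2.0) -> str:
    """Hex color for a treatment/control ratio on a blue-white-red log2 scale.

    A ratio of 1 gives white; |log2(ratio)| >= saturation gives full red
    (increase) or full blue (decrease).
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    x = max(-1.0, min(1.0, math.log2(ratio) / saturation))
    fade = round(255 * (1 - abs(x)))  # 255 = white, 0 = fully saturated
    if x >= 0:
        return f"#ff{fade:02x}{fade:02x}"  # towards red
    return f"#{fade:02x}{fade:02x}ff"      # towards blue


for r in [0.25, 1.0, 2.0, 4.0]:
    print(r, ratio_to_hex(r))
```

Working on log2 ratios makes a doubling and a halving equally intense, which is the usual reason for this transformation.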
Measurements of metabolites provide basic information about biological responses to physiological or environmental changes. Metabolite profiling allows a shift from hypothesis-driven research to the analysis of system-wide responses, especially when it is integrated with other profiling technologies (see, for example, [53, 54]). After characterizing the response, the next task is to elucidate the regulatory mechanisms. Systematic investigation of all of the metabolites within a part, or segment, of a metabolic network provides a powerful and unbiased strategy for identifying the site or sites at which key mechanisms act to alter fluxes. The regulated enzyme is revealed because the level of its substrate(s) changes reciprocally to the flux through the pathway (see, for example, ). Metabolite-profiling datasets can also be chemometrically analyzed to uncover correlations between individual metabolites and the expression levels of specific genes or proteins. In the post-genomic era, metabolite profiling will be increasingly used to phenotype mutants and transgenic organisms, so as to define the role of a gene. One of the major challenges in functional genomics is to assign functions to the many poorly annotated or unannotated genes; metabolite profiling will provide a key for those that encode proteins involved in metabolism.
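This reasoning underlies crossover analysis: in an unbranched pathway, a step at which flux is regulated shows its substrate changing in the opposite direction to the flux and its product changing in the same direction. A minimal Python sketch, assuming metabolite levels in control and treated samples and the direction of the flux change are known. The function name, pathway and data are hypothetical, and branched or cyclic pathways need more careful treatment.

```python
def crossover_sites(pathway: list[str], control: dict[str, float],
                    treated: dict[str, float], flux_ratio: float) -> list[tuple[str, str]]:
    """Steps (substrate, product) of a linear pathway that show a crossover.

    pathway lists metabolites in order, so each step converts pathway[i] into
    pathway[i + 1]. flux_ratio is treated flux / control flux and must differ from 1.
    """
    if flux_ratio == 1:
        raise ValueError("flux must change for crossover analysis")
    flux_up = flux_ratio > 1
    sites = []
    for substrate, product in zip(pathway, pathway[1:]):
        substrate_up = treated[substrate] > control[substrate]
        product_up = treated[product] > control[product]
        # Regulated step: substrate moves against the flux, product moves with it.
        if substrate_up != flux_up and product_up == flux_up:
            sites.append((substrate, product))
    return sites


pathway = ["A", "B", "C", "D"]                      # hypothetical linear pathway
control = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0}
treated = {"A": 1.5, "B": 1.8, "C": 0.6, "D": 0.5}
print(crossover_sites(pathway, control, treated, flux_ratio=0.7))  # [('B', 'C')]
```

Here flux falls while B accumulates and C is depleted, which identifies the B to C step as the candidate site of inhibition.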
In plant breeding, questions related to molecular composition and its implications for nutrition and health are moving to the fore. Advances in technology are speeding up the introduction of new diversity into breeding programs, either via transgenic technology or by using molecular markers in combination with wide crosses. Metabolite profiling can be used to characterize this diversity phenotypically with respect to its metabolite composition, providing a powerful resource to guide breeding programs and to alert researchers at an early stage to positive or detrimental traits. The power of this approach will be vastly increased if it can be combined with a systematic survey of the metabolite composition of the plant produce that is already on the market. As well as providing a baseline, this will also provide a rational framework for risk assessment via 'substantial equivalence': metabolite profiling will be used to determine the metabolite composition of novel products, which can be compared with the range of metabolite levels in produce already available. Furthermore, metabolite profiling will provide important input into nutritional research, and into the public debate about the acceptability of changes in food-production chains.
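In its simplest form, a profiling-based comparison under 'substantial equivalence' asks whether each metabolite level in a novel line lies within the range already observed across marketed varieties. A minimal Python sketch; the function name, the min-max criterion, metabolite names and values are illustrative assumptions, and a real assessment would rely on replicated data and statistical tolerance intervals.

```python
def outside_reference_range(novel: dict[str, float],
                            references: list[dict[str, float]]) -> dict[str, str]:
    """Metabolites in a novel line that fall outside the range of reference lines."""
    report = {}
    for metabolite, level in novel.items():
        values = [ref[metabolite] for ref in references if metabolite in ref]
        if not values:
            report[metabolite] = "no reference data"
        elif not min(values) <= level <= max(values):
            report[metabolite] = f"{level} outside {min(values)}-{max(values)}"
    return report


marketed = [  # hypothetical levels (umol per g fresh weight) in three varieties
    {"sucrose": 5.0, "malate": 2.0, "ascorbate": 1.1},
    {"sucrose": 7.5, "malate": 2.6, "ascorbate": 0.9},
    {"sucrose": 6.2, "malate": 1.8, "ascorbate": 1.4},
]
novel_line = {"sucrose": 6.8, "malate": 3.4, "ascorbate": 1.0, "raffinose": 0.2}
print(outside_reference_range(novel_line, marketed))
```

Metabolites with no reference data are reported separately, since they cannot be judged against the existing baseline at all.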
One of the major challenges for the life sciences in the coming decades is to move beyond detailed knowledge about organisms and their responses in the laboratory or controlled environments to understand complex interactions in natural ecosystems and during evolution. The combination of high throughput and breadth will allow metabolite profiling to be integrated with ecological studies, which require large sampling strategies but have often suffered from the limitations of preconception-driven research. Metabolite profiling can also be used to assess genotypic variation, without requiring the prior development of molecular tools for a particular species [4, 57]. We can expect metabolite profiling to become a key tool in a variety of fields in the years to come.