Biofuel and energy crops: high-yield Saccharinae take center stage in the post-genomics era

The Saccharinae, especially sugarcane, Miscanthus and sorghum, present remarkable characteristics for bioenergy production. Biotechnology of these plants will be important for a sustainable feedstock supply. Herein, we review knowledge useful for their improvement and synergies gained by their parallel study.

The replacement of fossil fuels by biofuels is an ongoing effort in many countries. With decreasing oil reserves and increasing fossil fuel prices, bioenergy is a promising alternative. Advantages of biofuels can include a positive energy balance, reduction of greenhouse gas emissions and indirect effects, such as rural development. Studies based on life-cycle analysis conclude that when ethanol from sugarcane is used to replace fossil fuels in transportation, a substantial reduction in net greenhouse gas emissions may result (from 80% to greater than 100% savings [1]). Biomass can also be used to generate electricity, with electric vehicles presenting several advantages over combustion engines. Generation of electricity from wood, cellulose and biofuels, and stationary generation of energy, can be very efficient and are also being implemented as options. In the last 5 years we have seen a 109% increase in global biofuel production. World projections provided by the Organisation for Economic Co-operation and Development (OECD)/Food and Agriculture Organization (FAO) indicate further increases in bioethanol and biodiesel production from the present 140 billion liters to 221 billion liters in 2021, corresponding to an additional increase of roughly 60%.
Increased biofuel production, and the associated increase in production of energy feedstocks, raises sustainability concerns over issues such as changes in land use, competition between energy crops and food and feed crops, and impacts on ecosystem services, including soil and water resources. Mandates in several countries to substitute bioethanol for gasoline require a substantial contribution from advanced fuels (sugar-derived and/or lignocellulosic bioethanol) to guarantee a reduction of greenhouse gas emissions.
Which plants are best suited to the requirements of future bioenergy feedstocks? To produce energy from plant-fixed C-bonds, crops should be high yielding, fast growing, with C-bonds that are easy to convert to useful forms, and require relatively small energy inputs for growth and harvest. To achieve sustainability, energy crops should not require extensive use of prime agricultural lands and they should have a low cost of energy production from biomass. Both the realities of agriculture in environments that are always heterogeneous and energy security require that feedstocks include a portfolio of diverse crops rather than merely a single crop.
A strong case can be made that members of the Saccharinae subtribe, particularly Saccharum (sugarcane and energy cane), Miscanthus and Sorghum species (Figure 1), best encompass these requirements. For commercial markets to develop, these crops are being evaluated with respect to their productivity as perennial crops (ratoon) in short growing seasons under different conditions, such as periodic drought, low temperatures and low nutrient inputs [2]. A recent development includes breeding efforts to produce an 'energy cane' (Saccharum species or interspecific hybrid) more amenable for hydrolysis of the bagasse and straw lignocellulosic fibers. The high yield of Saccharum (sugarcane) in tropical climates is particularly well documented, and Miscanthus and sorghum show similar promise in temperate climates. Herein, we review the merits of these grasses as a complementary package of bioenergy feedstock crops, the state of knowledge useful for their study and improvement, and synergies that might be gained by their parallel study.
Sorghum has been considered a member of the Sorghinae subtribe, although more recently a good case has been made (that we will accept herein) for expanding the Saccharinae to include the Sorghinae [3].
Sugarcane is a common name of a group of predominantly tropical species that originated in Southeast Asia (Table 1). Modern varieties result from crosses of the sucrose-accumulating relative Saccharum officinarum and the wild relative Saccharum spontaneum, with contributions from Saccharum robustum, Saccharum sinense, Saccharum barberi, Erianthus and Miscanthus [4,5]. Commercial varieties have the remarkable capability of storing high sucrose levels in the stem that can reach 40% of dry weight [6]. In a study of sugarcane yields across the world, commercial maximum cane yield averaged 69 t ha⁻¹ year⁻¹ and the experimental maximum averaged 98 t ha⁻¹ year⁻¹ in the countries with the highest sunlight [7]. Today, commercial yields closer to the experimental maximum are frequently reported. Sugarcane average annual production per hectare (39 t ha⁻¹ of dry stalks and trash) compares favorably with other high-yield bioenergy crops such as Miscanthus (29.6 t ha⁻¹) and switchgrass (10.4 t ha⁻¹) [8] (Table 2). Estimates from field trials show an average yield of 22.8 and 12.2 t ha⁻¹ for sugarcane ancestral species S. spontaneum and S. officinarum, respectively [9].
Miscanthus, with most species native to eastern or southeastern Asia [10], complements the tropical adaptation of Saccharum: its adaptability to continental Europe [11][12][13] shows the feasibility of producing Miscanthus in temperate latitudes (Table 1). Miscanthus × giganteus, a sterile, vegetatively propagated hybrid (2n = 3x = 57) believed to originate from crosses between tetraploid Miscanthus sacchariflorus and diploid Miscanthus sinensis [14], generally produces high yields, similar to (and in some cases better than) other biomass crops [8,15]. Considerable leveraging of breeding, production and processing infrastructure might be gained from the close relationship of Miscanthus to Saccharum, the two genera being thought to be the closest relatives of one another, and polyphyletic [16]. Saccharum × Miscanthus hybrids ('Miscanes') have been used for sugarcane improvement [17][18][19], and also show promise as a highly productive cellulosic biomass crop.
Increased demand for limited fresh water, along with rising global temperatures and aridity, suggests that sustainable future biomass production will have to occur using little or no irrigation, highlighting an important role of sorghum in a portfolio of bioenergy crops. Sorghum is one of the most drought tolerant of cereal crops thanks to its origins in Sudan and Ethiopia [20], and its multifaceted history of improvement offers a wider range of genetic variation than found in many crops. This is exemplified by the fact that sorghum is one of the few crops suited to all proposed approaches for renewable fuel production (such as from starch, sugar, and/or cellulose; Table 1). About 30% of the US sorghum crop is presently used as feedstock in the grain-to-ethanol process, which has also been commercialized in India and China.
The completely sequenced genome of sorghum, which has the further advantages of being relatively small and with minimal gene duplication [21], together with transformation potential, knowledge of cell wall composition and architecture and other features ([22] and references therein), makes sorghum an important model for research concerning bioenergy grasses [22,23].
Plants in the Andropogoneae use C4 photosynthesis (Box 1), which avoids photorespiration, leading to higher maximal photosynthetic energy conversion efficiency than the C3 pathway used by rice, wheat and many other grasses [5,24], resulting in more biomass accumulation. In elevated CO2 conditions, the C4 grasses sugarcane [25], maize and sorghum [26] show better responses to drought stress than C3 grasses. Plants in the Saccharinae have some further advantages in comparison with other C4 grasses, such as maize. First, many routinely produce a 'ratoon' crop, regrowing after harvest and thus eliminating the need for replanting each year. Indeed, the Sorghum genus, with annual and perennial species that are genetically compatible, has become a botanical model for study of attributes related to perenniality [27][28][29]. Second, sugarcane and Miscanthus have lower nitrogen-input requirements [13,30], and the latter can relocate some nutrients from aerial parts to the roots and/or rhizomes at the end of the growing season [31]. Third, some reports show better photosynthetic features of Saccharinae plants than other Andropogoneae. Light interception by the leaves is higher in Miscanthus than in maize [15] and Miscanthus can sustain higher levels of CO2 assimilation than maize in lower temperatures [32]. Sugarcane photosynthesis is enhanced in elevated CO2 in open-top chambers, increasing biomass productivity [33], which does not occur in maize grown in open-air elevation of CO2 [34]. However, this finding is controversial since enclosure and open-air studies give different results for the same crop, and some authors argue that enclosed studies are not the best scenario to mimic future increases in CO2 concentration [35]. Moreover, experiments with Miscanthus in ambient and open-air elevation of CO2 show no differences in yield [36].

Box 1. C4 photosynthesis
Many of the most productive agricultural crops use the C4 photosynthetic pathway to increase net carbon assimilation at high temperature (Figure 3, adapted from [97]). Discovered in sugarcane [98], C4 photosynthesis may have been an adaptation to hot, dry environments or CO2 deficiency [99][100][101][102], and appears to have evolved repeatedly from ancestors that used C3 photosynthesis [103,104], including multiple origins within some angiosperm families [105,106]. Most C4 plants are grasses, including the entire Andropogoneae tribe (including sorghum, sugarcane and Miscanthus), and it has been inferred that C4 photosynthesis first arose in grasses during the Oligocene epoch (24 to 35 million years ago) [107,108]. The high photosynthetic capacity of C4 plants is achieved by CO2 assimilation in mesophyll cells (by phosphoenolpyruvate carboxylase together with carbonic anhydrase to facilitate rapid equilibrium between CO2 and HCO3⁻), then diffusion of the resulting C4 acids into bundle sheath cells, where CO2 is discharged by various decarboxylases at up to 10-fold higher than atmospheric level at the site of ribulose-1,5-bisphosphate carboxylase oxygenase (Rubisco), the primary enzyme of C3 photosynthesis. This high CO2 concentration mitigates wasteful fixation of oxygen by Rubisco, reducing photorespiration, or CO2 loss during C3 photosynthesis, at high temperatures [109]. C4 plants are classified in part based on the type of decarboxylases used in the bundle sheath: NADP malic enzyme, NAD malic enzyme or phosphoenolpyruvate carboxykinase.

Since lignocellulosic biofuels use the plant cell wall as a source for fermentable sugars, it is important to understand the composition and architecture of the cell wall to develop strategies to degrade it efficiently. Grasses present a particular cell wall structure and composition (Figure 2), making a 'type II' cell wall that differs substantially from the 'type I' cell walls of other feedstocks, such as wood species [22,37,38]. This also implies the evolution of different gene families involved in the synthesis of the cell wall [22]. Recently, a model for sugarcane cell wall architecture and for hierarchical enzymatic hydrolysis was proposed [39]. By understanding the structure of the cell wall, it is possible to choose the best method to improve hydrolysis yield, and design breeding strategies or develop improved procedures to recover the released carbohydrates.

Genomics meets biotechnology for the improvement of Saccharinae biofuel grasses
Improvements in sorghum are characteristic of many other major food and feed crops, and Miscanthus improvement is just beginning; examining sugarcane improvement therefore exemplifies the methods and approaches likely to be employed in biofuel grasses.
Sugarcane improvement efforts follow both molecular-assisted breeding and transgenic routes [40]. S. officinarum is a plant with high sugar content in its stems but low productivity, and S. spontaneum has high tillering and biomass yield but low sugar accumulation. Modern sugarcane cultivars derive from a few crosses between S. officinarum and S. spontaneum and have been shown to be genetically very similar [41]. Breeding programs have been able to increase yield and sucrose content by crossing cultivars but gains are becoming slimmer. To continue the improvement of yield it may be necessary to turn back to ancestral genotypes and broaden the genetic basis of crosses. S. spontaneum and S. robustum are also being used as parents, with the goal of designing a crop more amenable for cellulosic biofuel production, with increased stress tolerance and increased yield but less emphasis on stalk sugar concentration, the so-called 'energy cane'.

Figure 2. [...] The text in red denotes the main differences. Surrounding the cellulose microfibrils, the inner and outer hemicellulose circles show tightly and loosely bound polysaccharides, respectively. Grasses have glucuronoarabinoxylans (GAX) as the main crosslinking hemicellulose and a primary wall matrix enriched in mixed-linkage glucans, with lower pectin content. The thin red boundary in the primary wall of the grasses denotes the phenolic compounds, mainly ferulic acid, linked to GAX molecules. In grasses, seven cellulose microfibrils can be structured in a cellulose macrofibril. Typically, grasses have more lignin than other angiosperms. Non-grasses possess xyloglucan as the major crosslinking hemicellulose, a pectin-based matrix and structural proteins. In the secondary wall, note that pectins and mixed-linkage glucans are minor components. Lignin also forms a structural barrier surrounding the carbohydrates. Adapted from [39] and [110] with permission.

World collections of Saccharum germplasm are held in Florida [42] and India [43], which keep ancestral genotypes and cultivars, and many private collections are also kept and used for crosses in specific breeding programs. Each world collection has over 1,500 accessions of ancestral genotypes, most of them S. officinarum (about 750), S. spontaneum (about 600) and S. robustum (about 130), and 500 to 1,000 hybrids or cultivars. Sorghum, like sugarcane, has large germplasm collections held by the US National Plant Germplasm System and at the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT, the CGIAR center with a sorghum improvement mandate). Only a few small Miscanthus collections are held publicly, but several private collections associated with breeding programs are similar in size to the Saccharum collections.
Crosses between members of the Saccharinae are viable. In fact, sugarcane has been crossed to both Miscanthus and sorghum, generating viable progenies, and the strategy has been used to incorporate cold and drought resistance traits from Miscanthus into sugarcane [19].
The transformation of sugarcane is becoming an interesting and growing field. Methods for transformation are already established with efforts aimed mostly at sugar yield and quality [44][45][46], disease resistance [47,48], and the use of sugarcane as a biofactory to produce high-value bioproducts [49,50]. For biofuel production, some approaches show interesting results, with lower biomass recalcitrance [51] and expression and accumulation of microbial cellulolytic enzymes in sugarcane leaves [52] to improve biomass hydrolysis. The most widely used promoters are the constitutive CaMV 35S and maize ubi1, but sugarcane promoters have already been used or characterized, including tissue-specific [46,47] and responsive promoters [53]. However, sugarcane transformation is not a trivial task since problems such as transgene silencing frequently occur ([40,54] and references therein). Sorghum transformation is also routine (although at lower efficiency than in some crops [55]), and Miscanthus transformation methods have been established [56].

Advantages of a reference genome
For both molecular-assisted and transgenic strategies outlined above, the availability of a reference genome sequence is highly desirable, as well as the definition of the complete complement of genes and proteins. For the Saccharinae, the relatively small (740 Mb) and diploid genome of sorghum, which has not experienced genome duplication in about 70 million years [21], has become the best reference for genomics and transcriptomics in sugarcane [57]. Nonetheless, the sugarcane genome itself is being sequenced using a combination of approaches. In a first phase, researchers are sequencing bacterial artificial chromosomes (BACs) combined with whole-genome shotgun sequencing to produce a reference genome [58]. Currently, three sugarcane BAC libraries are available: from variety R570 [59], selfed progenies of R570 [60] and SP80-3280 [61]. The two former libraries have 103,000 to 110,000 clones comprising about 12 times coverage of the basic genome complement but only about 1.3 to 1.4 times coverage of the individual alleles. The latter library has [...] [61][62][63]. Unaligned regions between sorghum and sugarcane genomes are largely repetitive [62], enriched in transposon-related sequences [61,63]. Consistent with several genetic mapping efforts, the sequencing of BAC clones revealed high levels of gene structure/sequence conservation and collinearity among hom(oe)ologous haplotypes of the sugarcane genome [64], and several putative sugarcane-specific genes/sequences [61][62][63]. Groups from Australia, Brazil, France, South Africa and the USA are advancing these efforts in genome sequencing, increasing the number of BACs sequenced and producing shotgun data of several cultivars. It is expected that reference genome sequences will be made available for both cultivars and ancestral genotypes [65] and, to that end, researchers are developing statistical models using SNPs where homology groups with any ploidy level may be estimated [66].
This will be essential to obtain a saturated genetic map of the sugarcane genome that may aid genome assembly. The greatest challenge that distinguishes the sequencing of Saccharum and Miscanthus from the more tractable genomes of sorghum and other cereal models is large physical size (approximately 10 Gb) and large copy numbers of even 'low-copy' elements (8 to 12 in sugarcane; 4 to 6 in Miscanthus). During assembly of such genomes, many closely related alleles 'collapse' into single gene/element models that fail to capture allelic and perhaps also paralogous diversity within even a single genotype. The sorghum genome will greatly help in the assembly, but around 20% of the sugarcane expressed sequence tags (ESTs) from the SUCEST project [67] appear to be specific to sugarcane, since they do not match sorghum, Miscanthus, maize, rice or Brachypodium [68], requiring other strategies in the assembly. Linkage maps based on molecular markers have shown synteny and collinearity of sorghum and sugarcane genomes, but are complicated to make in sugarcane due to the polyploidy and absence of inbred lines ([69] and references therein). This problem was partly overcome with the use of single-dose markers [70], which segregate in a 1:1 ratio in the gametes of a heterozygous genotype, and account for approximately 70% of polymorphic loci in sugarcane [71]. However, among 20 to 30 linkage maps based on a few thousand markers available for sugarcane ([71,72] and references therein), it remains true that only 33% to 60% of the sugarcane genome is represented on these maps [71]. A recent development that may help breeders in marker-assisted selection efforts has been the development of an algorithm and software (ONEMAP) for constructing linkage maps of outcrossing plant species that has been successfully applied to sugarcane [73].
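The single-dose marker criterion described above can be illustrated with a goodness-of-fit test. The following is a minimal sketch, not the procedure of any cited mapping study: it assumes a marker present in only one parent of a cross, scored as present or absent in the progeny, and tests the counts against the expected 1:1 ratio with a chi-square statistic (1 degree of freedom, 5% significance level). The function names and the progeny counts are hypothetical.

```python
# Critical value of the chi-square distribution, 1 degree of freedom, alpha = 0.05.
CHI2_CRIT_DF1_ALPHA05 = 3.841


def chi_square_1to1(present: int, absent: int) -> float:
    """Chi-square statistic for observed counts against a 1:1 expectation."""
    total = present + absent
    if total == 0:
        raise ValueError("no progeny scored")
    expected = total / 2  # under 1:1, half the progeny carry the marker
    return ((present - expected) ** 2 + (absent - expected) ** 2) / expected


def fits_single_dose(present: int, absent: int,
                     critical: float = CHI2_CRIT_DF1_ALPHA05) -> bool:
    """True when the counts do not deviate significantly from 1:1."""
    return chi_square_1to1(present, absent) <= critical


# Hypothetical progeny counts (marker present, marker absent).
print(fits_single_dose(52, 48))  # True: compatible with a single-dose marker
print(fits_single_dose(75, 25))  # False: closer to 3:1, not single dose
```

Markers that fail the test may be present in higher dosage or in both parents, which is one reason a large share of polymorphic loci cannot be placed with this approach alone.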
Enriched mapping of DNA polymorphisms that also provide for deconvolution of closely related sequences may also aid in assembly of such highly polyploid genomes.
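The BAC library figures quoted earlier in this section (about 12 times coverage of the basic genome complement, but only 1.3 to 1.4 times coverage of individual alleles) show how polyploidy dilutes sampling depth. As a rough worked example, the ratio of the two coverages gives the implied number of haplotype copies, and a simple Poisson sampling model gives the fraction of any one haplotype expected to be absent from the library. The Poisson model (clones drawn uniformly at random) is an assumption introduced here for illustration, and the function names are hypothetical.

```python
import math


def implied_copy_number(basic_coverage: float, per_allele_coverage: float) -> float:
    """Haplotype copies implied by basic-genome versus per-allele coverage."""
    if per_allele_coverage <= 0:
        raise ValueError("per-allele coverage must be positive")
    return basic_coverage / per_allele_coverage


def expected_missing_fraction(coverage: float) -> float:
    """Fraction of a haplotype with no clone, assuming Poisson sampling."""
    return math.exp(-coverage)


for per_allele in (1.3, 1.4):
    copies = implied_copy_number(12.0, per_allele)
    missing = expected_missing_fraction(per_allele)
    print(f"{per_allele}x per allele: ~{copies:.1f} copies, "
          f"~{missing:.0%} of each haplotype unsampled")
```

Under these assumptions the libraries imply roughly 8.6 to 9.2 copies per locus, consistent with the 8 to 12 copies cited for sugarcane, and about a quarter of any single haplotype would be expected to be missing from the library.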

Saccharinae transcriptomics
Changes in gene expression associated with allopolyploidy are well known, but sugarcane functional genomics is a challenge due to the complexity of its largely autopolyploid and aneuploid genome and the absence of a reference sequence. Again, the sorghum genome has been serving as a reference to define putative transcripts. The sorghum transcriptome has been studied by different high-throughput technologies such as cDNA microarrays and massively parallel sequencing (Tables 3 and 4) to understand the expression profiling and biological function of genes in response to herbivory, biotic and abiotic stress in different tissues and treatments [68], and how the genes and their structural/functional changes contribute to the morphological variations between sorghum lines integrating genome evolution and expression divergence [74]. Deep RNA sequencing methods have overcome many limitations of microarray technologies and have allowed recent studies to reveal sorghum genes, gene networks, and a strong interplay among various metabolic pathways in different treatments [75], as well as the identification of particular paralogs that putatively encode enzymes involved in specific metabolic networks [76]. Despite the absence of a sequenced genome and the complexities associated with the presence of about 8 to 12 copies of each gene, functional genomics has made considerable progress towards understanding unique biological attributes of sugarcane. These studies assist in the development of new applications for bioenergy, biomaterial industries and improved 'energy' cultivars [57]. The fundamental databases and resources for studies of functional genomics in sugarcane have been reviewed recently [57,77,78] and a sugarcane computational environment (SUCEST-FUN Database) has been developed for storage, retrieval and integration of genome sequencing, transcriptome, expression profiling, gene catalogs, physiology measures and transgenic plant data [79].
Studies on sugarcane gene expression have been based mainly on EST information from different tissues, treatments and genotypes. The largest contribution to the available ESTs (>80%) comes from the SUCEST project [67], and most of the remainder comes from Australia, USA, South Africa and India (reviewed by [57,68]). To obtain a less redundant dataset including ESTs not sampled by the SUCEST project, a comparison with SoGI [80] was carried out and 8,106 sequences lacking detectable similarity to SAS (sugarcane assembled sequences) were identified. The clustering strategy in SoGI produces redundant clusters and makes the SUCEST assembly more appropriate for gene and orthology-based analysis [81]. The SUCEST-FUN project and SAS sequences have been updated with all sugarcane ESTs from the National Center for Biotechnology Information (NCBI) and compared with the SoGI assembly (Table 5). A total of 282,683 ESTs are currently catalogued in the SUCEST-FUN Database. Comparison of ESTs from sorghum with sugarcane, maize and rice has revealed mean sequence identities of 97%, 93% and 86%, respectively, indicating a close relationship between sorghum and sugarcane (Figure S7 of [21]). A total of 39,021 sugarcane proteins were predicted from 43,141 clusters [67] using ESTScan [82] and the Oryza sativa matrix (Table 5). Putative orthologs and paralogs were identified by pairwise proteome comparisons with InParanoid software [83].
With the aid of MultiParanoid software [84], we found orthology relationships among multiple proteomes ([...] in maize, 16,913 in rice and 13,998 in Arabidopsis), with a confidence score ≥0.05 and group merging cut-off >0.5, using the BLOSUM80 matrix, suitable for closely related sequences (Table 6).
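To make the idea of pairwise ortholog detection concrete, the sketch below shows the reciprocal-best-hit principle that pairwise tools of this kind typically start from. It is a simplified illustration only and not the InParanoid or MultiParanoid algorithm: it ignores in-paralogs, confidence scores and group merging. The protein identifiers and similarity scores are hypothetical.

```python
from typing import Dict, List, Tuple

# (query, subject) -> similarity score, e.g. from an all-against-all search.
Hits = Dict[Tuple[str, str], float]


def best_hits(scores: Hits) -> Dict[str, str]:
    """Map each query to its highest-scoring subject."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Hits, b_vs_a: Hits) -> List[Tuple[str, str]]:
    """Pairs (a, b) where a's best hit is b and b's best hit is a."""
    a_best = best_hits(a_vs_b)
    b_best = best_hits(b_vs_a)
    return sorted((a, b) for a, b in a_best.items() if b_best.get(b) == a)


# Hypothetical scores: "sc" = sugarcane proteins, "sb" = sorghum proteins.
sugarcane_vs_sorghum: Hits = {
    ("sc1", "sb1"): 950.0, ("sc1", "sb2"): 400.0,
    ("sc2", "sb1"): 600.0, ("sc2", "sb2"): 300.0,
}
sorghum_vs_sugarcane: Hits = {
    ("sb1", "sc1"): 950.0, ("sb1", "sc2"): 600.0,
    ("sb2", "sc1"): 400.0, ("sb2", "sc2"): 300.0,
}

print(reciprocal_best_hits(sugarcane_vs_sorghum, sorghum_vs_sugarcane))
```

In this toy case only sc1 and sb1 form a reciprocal pair; sc2 also hits sb1 best but is not sb1's best hit, the kind of case where a dedicated tool would evaluate it as a possible in-paralog rather than discard it.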
The sugarcane transcriptome has been studied using several technologies, including cDNA macroarrays (nylon membranes), cDNA microarrays spotted onto glass slides, and oligonucleotide arrays either spotted or synthesized in situ. A summary of the available platforms, samples and related works for sugarcane and sorghum using array technologies is shown in Table 3 and has been reviewed recently [57,68,78,85]. Sugarcane transcriptomics has identified genes associated with sucrose content, biotic and abiotic stresses, photosynthesis, carbon partitioning and roles of phytohormones and signaling pathways in adaptive responses. These studies also allowed for the identification of promoters that can be used to drive expression of transgene components in a tissue-specific or controlled manner. Several other methods to study sugarcane expression profiles at a moderate scale have been used to confirm the expression patterns observed in large-scale transcript studies [57].
More recently, the use of oligoarrays has included studies on the regulation of antisense gene expression in sugarcane, pointing to a role for these transcripts in drought responses [86]. Some years ago, serial analysis of gene expression (SAGE) in sugarcane revealed an unexpectedly high proportion of antisense transcripts and chimeric SAGE [87]. High-throughput sequencing (Table 4) is useful for assessing transcriptomes, providing detailed information for transcript variants, particularly SNPs, assessment of the expression of hom(oe)ologous alleles in the polyploid genome, spliced isoforms and so on [88]. Using this strategy, some sugarcane genes were characterized for SNP density and gene haplotypes across varieties [89]. In recent studies, it has become apparent that small RNAs, particularly microRNAs, have important regulatory roles in sugarcane, playing a key role in development and responses to biotic and abiotic stresses [90][91][92]. Evidence suggests that long terminal repeat retrotransposon (LTR-RT) families may affect nearby genes by generating a diverse set of small RNAs that trigger gene-silencing mechanisms [93].
In contrast to sorghum and sugarcane, genomic and transcriptomic studies on Miscanthus are just beginning.
The recent high-throughput sequencing of its genome and transcriptome identified the presence of repeats that are actively producing small RNAs [94], and the construction of a genetic map identified informative simple sequence repeats in sugarcane and a genome-wide duplication in Miscanthus relative to S. bicolor [95]. These studies will increase the understanding of complex genomes [96].

Conclusions
The Saccharinae grasses sugarcane, Miscanthus and sorghum are promising and complementary elements of a portfolio of bioenergy feedstocks. As sustainability criteria take dominant roles in the commercialization of biomass sources, these plants are likely to contribute to provide cheap, reliable and politically viable options for bioenergy production. Biotechnology for these crops is less advanced than in food crops such as maize and rice, but it is progressing quickly. Many efforts are underway to define genes associated with traits of interest such as sucrose content, drought tolerance, yield and adaptation to climate changes, and much is known about genes and markers for the improvement of these crops. Breeding programs are improving germplasm collections and defining routes to speed up selection of progenies and choice of ideal parents for crossing. It is expected that prudent integration of conventional breeding methods with marker-assisted and transgenic options may increase the (currently slow) rates of yield improvement, decreasing the amount of land required for large-scale biofuel production, as well as the need for inputs such as water, herbicides and fertilizers to maintain economical levels of production. Finally, the transition to a more biobased economy may be expedited by the increased value of biobased chemicals that might be harvested from the production chain through the adoption of integrated biorefinery systems. Better understanding of and greater control over carbon partitioning in these plants may greatly increase the number of co-products, including bioethanol, biodiesel, biokerosene, bioplastics and bioelectricity to name a few.

Abbreviations
BAC, bacterial artificial chromosome; EST, expressed sequence tag; SAGE, serial analysis of gene expression; SAS, sugarcane assembled sequence; SNP, single nucleotide polymorphism.