Structural genomics of membrane proteins
© BioMed Central Ltd 2004
Published: 15 March 2004
Improvements in the fields of membrane-protein molecular biology and biochemistry, technical advances in structural data collection and processing, and the availability of numerous sequenced genomes have paved the way for membrane-protein structural genomics efforts. There has been significant recent progress, but various issues essential for high-throughput membrane-protein structure determination remain to be resolved.
The goal of determining the structure of membrane proteins continues to define a substantial region of the structural biology horizon. While significant progress has been made over the past five years, the ratio of structures solved for membrane proteins to those solved for soluble proteins remains small, such that membrane proteins comprise less than 1 in 100 of the structures deposited in the Protein Data Bank (PDB).
The relatively small number of membrane-protein structures determined to date stems primarily from the requirement for solubilization of membrane proteins before crystallization, while preserving the structural integrity of the solubilized protein. Despite this challenge, the need to increase the number of known membrane-protein structures is clear and is further emphasized by the estimate that more than 30% of a typical cell's proteins are membrane proteins  and that more than half of all membrane proteins are predicted to be pharmaceutical targets . The recent modest increase in the rate of determining membrane-protein structures has been facilitated by improvements in the areas of membrane-protein molecular biology and biochemistry, and through technical advances in synchrotron X-ray beamlines for crystallography, high-field nuclear magnetic resonance (NMR) and high-resolution electron microscopy. The availability of sequenced genomes spanning a broad range of species has vastly improved searches for structural homologs and the prediction of previously unknown membrane proteins. These factors have converged to help set the stage for the determination of membrane-protein structures rapidly and on a large scale.
In recent years a number of consortia, bringing together researchers from a variety of academic and research institutions [5–7], have been established to address and execute the goals of structural genomics - that is, to dramatically increase the database of known protein structures by developing and applying methodologies to determine them as rapidly and cost-effectively as possible. To date, however, only one group (Mycobacterium tuberculosis Membrane Protein Structural Genomics) has taken on as its primary mission the high-throughput determination of membrane-protein structures. While the efforts of this group are ongoing, substantial progress has already been made in the construction of expression vectors on a large scale.
This article provides an overview of the factors essential for the determination of membrane-protein structures in high-throughput fashion and the progress that has been made so far in these areas. The key issues that arise for a researcher who wishes to determine the structure of membrane proteins at the atomic level are: how to produce sufficient protein, and once produced how to solubilize and purify the protein; then, how to crystallize the protein, or whether instead to study it in solution; and finally, how to scale up such methods for high-throughput structure determination.
High-resolution structure-determination efforts typically require milligram quantities of proteins. Overexpression of prokaryotic genes in bacterial vectors currently provides the most direct and productive route to fulfilling this need [9, 10]. Studies on genes with introns will require full-length cDNAs derived from mRNA libraries, and this represents another degree of complexity. Groups such as the Mammalian Gene Collection (MGC) , for example, have created resources for the production and distribution of full-length human genes .
Prokaryotic genomes are also logical choices as target genomes for membrane-protein structural genomics efforts . The initial goals of these efforts will be to clone, overexpress and purify the known and putative membrane proteins of their selected genomes. Potential membrane-protein targets can be identified from functional studies or on the basis of knowledge from previously characterized homologous genes. In many instances, however, homology-based predictions of protein type and function will not be possible. For these proteins, assignment of putative membrane-protein status will have to be based on predictions of transmembrane segments using the many bioinformatic tools now available (for example, see ).
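As a rough illustration of how transmembrane segments can be flagged from sequence alone, the sketch below applies a sliding-window hydropathy average using the Kyte-Doolittle scale. This is a classic, simple method chosen here for illustration; it is not necessarily what any given consortium uses, and dedicated predictors are considerably more sophisticated. The 19-residue window and the 1.6 cutoff follow the commonly quoted Kyte-Doolittle guideline, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(seq, window=19):
    """Mean hydropathy of every full-length window along the sequence.

    Raises KeyError for letters outside the 20 standard amino acids.
    """
    scores = [KD[aa] for aa in seq.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Return merged (start, end) ranges, 0-based and half-open,
    covered by windows whose mean hydropathy reaches the threshold."""
    segments = []
    for i, mean in enumerate(window_hydropathy(seq, window)):
        if mean >= threshold:
            start, end = i, i + window
            if segments and start <= segments[-1][1]:
                # Overlapping or adjacent windows belong to one segment.
                segments[-1] = (segments[-1][0], end)
            else:
                segments.append((start, end))
    return segments


if __name__ == "__main__":
    # Toy sequence: a hydrophobic stretch flanked by charged residues.
    toy = "MKTDE" * 4 + "LLIVALLGAVLIAFLVLLA" + "KRDEQ" * 4
    print(candidate_tm_segments(toy))
```

A sequence with one or more flagged segments would be a candidate for putative membrane-protein status; the boundaries reported by a window average are approximate and tend to overshoot the hydrophobic core by a few residues on each side.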
Ideally, all successfully expressed and solubilized target membrane proteins should be distributed to X-ray and electron crystallography groups, and appropriate protein samples to NMR spectroscopy teams, for simultaneous efforts at structure determination and maximization of the likelihood that rapid progress will be made.
Once a membrane protein has been prioritized, by whatever means, for structure determination, the next step must be to overexpress the protein in a way that allows significant quantities to be isolated and purified for further study. The majority of structural genomics consortia are pursuing high-throughput protein expression through constructs expressed in Escherichia coli. Expression in E. coli has numerous attributes that make it such a strong choice. It has clear advantages currently with respect to cost per gene expressed, the variety of specialized expression vectors available and the well-developed methods for labeling target proteins for NMR and X-ray diffraction studies [9, 10].
Expression vectors based on promoters used by T7 RNA polymerase are in widespread use for the overexpression of soluble proteins among the various consortia [9, 10]. It also appears that for the immediate future this class of vectors will be favored by research groups overexpressing membrane proteins. One concern, especially with respect to the overexpression of membrane proteins, is the effect of target-protein expression levels in the uninduced state ('leakiness') on host-cell health, and subsequently on the ability to overexpress properly folded proteins at high levels. Vectors with promoters less prone to leakage expression may have to be sought for the successful overexpression of certain membrane proteins.
Purification of overexpressed protein is greatly simplified, and made better suited to high-throughput studies, through the use of constructs in which the target gene is fused to an affinity tag. The tag can be placed at either the amino- or the carboxy-terminal end of the target protein, giving a number of options in construct design. Examples of tags include glutathione S-transferase, maltose-binding protein and polyhistidine. By virtue of their ease of use, polyhistidine tags have seen the broadest application [9, 10]. Although there are indications that amino-terminal polyhistidine-tag fusion proteins may have a better expression record with respect to membrane proteins, in our view the performance of both amino- and carboxy-terminal tagged constructs should be evaluated on a case-by-case basis with respect to target-protein overexpression, solubility, and crystallizability. To facilitate the subsequent removal of affinity tags, protease recognition sites can also be incorporated into the constructs; in the case of membrane proteins it is desirable that these sites support the use of detergent-resistant proteases, so as to be compatible with detergent-based purification procedures.
Structure-determination efforts on human gene products have been limited, because of difficulties in obtaining high expression levels of protein. Many human genes will probably require some form of eukaryotic expression vector for successful overexpression. Numerous yeast, insect and mammalian cell lines could potentially serve in this capacity ; the development of eukaryotic expression methodologies tailored for high-throughput applications, however, is still in the nascent stages.
The choice of host cells for overexpression of a given protein will depend on various factors, such as the source of the original gene, the protein's fold complexity and the potential need for folding partners, and requirements for post-translational modification. As discussed earlier, the use of E. coli expression vectors runs strongly across the various structural genomics consortia and, in turn, dictates that some strain of E. coli will serve as host cell. Although there are a number of strains that have been used to express membrane proteins, BL21 (DE3) and a derivative of BL21 optimized for membrane-protein expression (designated C43) appear to be best suited for the task. Strain variant C43 grows more slowly than BL21, and in doing so may provide more time for the host cell to deal properly with higher than normal levels of membrane-protein expression . Expression of both soluble and membrane proteins in a given bacterial strain can be quite sensitive to post-induction incubation temperature. The amount of overexpressed target membrane protein localized within lipid bilayers may be increased, and the occurrence of inclusion bodies containing aggregated protein reduced, by lowering incubation temperatures following induction.
Although not as practical for high-throughput purposes as bacterial expression systems, certain eukaryotic target proteins, either single polypeptides or those of multiple subunit complexes, may require 'higher' cell types to achieve adequate expression. Potential drawbacks to the use of eukaryotic cell types can derive from difficulties in protein isolation and yield, longer doubling times, and cost. Dealing with post-translational modifications, such as the removal of glycosylation usually required for success in crystallization, can be particularly challenging and may require inhibition of the host cell's glycosylation pathway during expression, or modification of the construct sequence, or treatment of expressed proteins with glycosidases .
Although still an evolving technology, cell-free expression systems offer an alternative methodology for the overexpression of proteins [17–19]. In such a system, essential protein expression machinery is obtained from cell lysates, which can be isolated and prepared in-house or obtained from commercial sources. Commercial systems are presently available with an advertised capability of producing 150 mg of protein from 30 ml of reaction mixture over a 24-hour time period . Clearly this level of expression is well suited for high-throughput structural studies. For membrane proteins, however, expression away from lipid bilayer environments can, understandably, result in problems with protein folding and solubility. Supplementation of the reaction mixtures with detergents and lipids may provide a means of extending the utility of this approach to membrane proteins.
Once sufficiently high-level expression of the target membrane protein has been established, preferably in the host-cell membrane, the next step is to determine the detergents best suited for solubilization and subsequent purification. A wide variety of detergents suitable for membrane-protein solubilization are currently available (see, for example, ). Some of the most popular detergent families include the alkyl glucosides and maltosides, polyoxyethylenes, alkyldimethylamine oxides, and cholate derivatives. Experience has shown that the detergent selected for membrane extraction may not be the detergent of choice for crystallization. Broadly speaking, both the length of the detergent's hydrocarbon chain (hydrophobic domain) and the size of its polar head group (hydrophilic domain) are major factors affecting the stability of the solubilized protein - longer chain lengths and larger head groups are generally more favorable for the stability of the protein. When necessary, solubilizing detergents can be exchanged for other detergents through dialysis, or while the target protein is bound to chromatographic media. Factors to evaluate in choosing a detergent at the solubilization stage include extraction yield, stability of the solubilized protein, and cost.
A particularly important criterion in selecting a detergent is its effect on a protein's structure and function. Certain detergents, particularly ionic ones, can denature membrane proteins, even when used at relatively low concentrations. Undesirable outcomes can involve varying degrees of denaturation, separation of subunits from multimeric or multisubunit complexes, and aggregation . Such potential results should be avoided prior to attempting crystallization and in collecting solution NMR data, for which samples should be monodisperse and stable, often at concentrations up to 10 mg/ml [10, 22].
Evaluation of a detergent's effect on target-protein stability can provide a relatively quick means of assessing a detergent's suitability. One simple but effective test we have used involves solubilizing target proteins in candidate detergents and storing the mixtures overnight at room temperature. The various preparations can then be quickly evaluated to determine whether or not the protein has precipitated. Those solubilized proteins appearing stable can be examined more closely to determine the extent of homogeneity. Molecular-sieve chromatography, which separates molecules primarily by size, can reveal protein aggregation and/or oligomerization, as well as provide a means of improving sample homogeneity. Dynamic light scattering is another approach that can provide much of the same information about particle size as molecular-sieve chromatography, but more rapidly . A form of NMR spectroscopy, heteronuclear single quantum correlation (HSQC), can also be used as a screening tool for the rapid assessment of target-protein quality . If the function of a target protein is known or confidently predicted, functional assays should ideally be used to ensure that the protein is fully active in the candidate detergent.
As in the case of high-throughput structure-determination efforts for soluble proteins, the process of purifying overexpressed membrane proteins has been substantially streamlined through the use of affinity tags. When coupled to the output of optimized host-cell systems, milligram quantities of relatively pure protein can be obtained following a single chromatographic step. In many instances the target protein will already be sufficiently pure at this stage to begin structure-determination efforts. During this phase of purification it is also appropriate to assess whether the solubilizing detergent is suitable for crystallization and for maintaining a monodisperse solution when the protein is highly concentrated. The effects of alternative detergents can be investigated by exchanging detergents while the target protein is bound to the affinity column. Should the affinity-column-purified sample require further cleanup, use of molecular-sieve chromatography is usually sufficient to remove minor contaminants and aggregates. On those occasions when residual contaminants cannot be separated from the target protein on the basis of molecular weight differences, an additional chromatographic step, such as ion-exchange chromatography, may be necessary.
There are currently several approaches for determining the structure of membrane proteins, notably X-ray crystallography, electron crystallography and NMR spectroscopy. Given its history of demonstrated success, X-ray crystallography is regarded as the most widely proven tool for structure-determination efforts. But target-protein characteristics, such as molecular weight, solubility and crystallizability, may dictate that other methodologies are better suited for a particular gene product. For example, small detergent-solubilized membrane proteins or peptides with a very large hydrophobic surface area to volume ratio, which may not have good solubility properties at high concentration, may be excellent candidates for NMR spectroscopy.
X-ray crystallography provides an established means for obtaining high-resolution structural data from membrane proteins. With this approach, molecular weight seldom limits the choice of target protein, and determination of structures at atomic-level resolution is a very realistic goal. Often diffraction from a single crystal is sufficient for high-resolution structure determination. In many cases, the difficulties that were in the past associated with interpreting X-ray diffraction amplitudes in terms of how they reflect the underlying crystal structure - known as the 'phase problem' - have been dramatically reduced through the use of multi-wavelength anomalous diffraction techniques that rely on the use of X rays of multiple wavelengths and externally provided anomalously scattering atoms that yield reference points within the crystal structure . For example, tunable synchrotron X-ray sources facilitate the rapid phasing of diffraction data obtained from selenomethionine-derivatized target proteins, prepared through the metabolic labeling of proteins expressed in E. coli. Synchrotron X-ray sources also make it possible to obtain high-resolution datasets from microcrystals , which are often no larger than 50 μm in their longest dimension, reducing potential bottlenecks associated with the need to optimize crystallization conditions in an effort to obtain large crystals. Further gains in sample throughput rates can be realized through automation-assisted screening of sample wells for the presence of crystals, and automated crystal handling and data collection.
The major challenge of the X-ray diffraction structure-determination approach lies in obtaining suitable three-dimensional crystals. As with soluble proteins, homogeneity and stability of the purified protein at high concentration is often critical for obtaining crystals. The strategies for crystallizing membrane proteins are similarly centered on reducing the solubility of the target protein under conditions that allow for the establishment of crystal-forming contacts between neighboring molecules . Protein solubility is typically lowered through the use of precipitating agents, such as ammonium sulfate and polyethylene glycols. Experimentally variable parameters affecting the degree and nature of molecule-to-molecule contact include pH, ionic strength and temperature. A factor unique to the crystallization of membrane proteins is the presence of the substantial concentrations of detergent required to maintain solubility of the target protein. To ensure solubility of a target protein the concentration of detergent must be kept above the critical micelle concentration (CMC) which, depending on the detergent in question, could be well into the millimolar range.
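The requirement that detergent stay above its CMC is easy to check numerically once a working concentration in percent (w/v) is converted to millimolar. The sketch below shows that conversion and comparison. The function names are hypothetical, and the molecular weights and CMC values in the example table are approximate figures for two common detergents, included only as placeholders; CMC depends on temperature, ionic strength and buffer composition, so measured or supplier values for the actual conditions should be used instead.

```python
def percent_wv_to_mM(percent_wv, mol_weight):
    """Convert a % (w/v) concentration to millimolar.

    1% (w/v) is 1 g per 100 ml, i.e. 10 g per litre.
    """
    grams_per_litre = percent_wv * 10.0
    return grams_per_litre / mol_weight * 1000.0


def above_cmc(conc_mM, cmc_mM, margin=1.0):
    """True if the concentration exceeds margin x CMC.

    A margin greater than 1 asks for a safety factor above the CMC.
    """
    return conc_mM > cmc_mM * margin


# Approximate, illustrative values only (assumptions, not from the article):
# name -> (molecular weight in g/mol, CMC in mM)
EXAMPLE_DETERGENTS = {
    "dodecyl maltoside": (510.6, 0.17),
    "octyl glucoside": (292.4, 20.0),
}

if __name__ == "__main__":
    for name, (mw, cmc) in EXAMPLE_DETERGENTS.items():
        conc = percent_wv_to_mM(0.1, mw)  # a 0.1% (w/v) working solution
        print(name, round(conc, 2), "mM; above CMC:", above_cmc(conc, cmc))
```

The same working percentage can sit well above the CMC for a low-CMC detergent and well below it for a high-CMC one, which is why the check is best done per detergent rather than by rule of thumb.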
Membrane proteins have been crystallized using vapor diffusion (in which hanging and sitting drops of a solution containing the target protein are allowed to equilibrate with a reservoir solution containing a higher concentration of precipitant), and less frequently by dialysis and batch methods (where protein, precipitant and buffer are mixed to be at or very near the final concentrations required for crystallization). Sparse matrix screens (relatively small sets of crystallization conditions that survey a broad range of parameter space in coarse intervals), which allow for rapid sampling of a diverse range of precipitant, pH and ionic strength conditions [29–31], have been successfully applied and are even available commercially (see, for example, ). These crystallization and screening methods lend themselves well to high-throughput robotics-based automation [33, 34]. Recently developed microfluidics devices can also support the rapid setup and evaluation of extensive crystallization screens using extremely small amounts of sample. For example, in one commercially available system it is possible to survey up to 144 conditions from a total of 3 μl of sample [36, 37]. This method mixes reagent and sample through the process of free interface diffusion, whereby the protein and reagent are free to move throughout the system, and may allow for novel high-throughput surveys of crystallization space.
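To make the idea of coarse sampling concrete, the sketch below builds a full grid of precipitant, pH and salt combinations and then draws a small subset from it, and also works out the sample volume available per condition for the 144-condition, 3 μl figure quoted above. Published sparse matrix screens are typically biased toward conditions that have produced crystals before; the uniform random draw used here is only a stand-in for that selection, and the reagent lists and function names are hypothetical.

```python
import itertools
import random


def full_grid(precipitants, ph_values, salts):
    """Every combination of precipitant, pH and salt additive."""
    return list(itertools.product(precipitants, ph_values, salts))


def sparse_screen(grid, n_conditions, seed=0):
    """Draw a small, reproducible subset of distinct conditions."""
    rng = random.Random(seed)
    return rng.sample(grid, n_conditions)


def volume_per_condition_nl(total_sample_ul, n_conditions):
    """Sample volume per condition in nanolitres."""
    return total_sample_ul * 1000.0 / n_conditions


if __name__ == "__main__":
    grid = full_grid(
        ["PEG 4000", "ammonium sulfate", "MPD"],   # illustrative precipitants
        [4.5, 5.5, 6.5, 7.5, 8.5],                 # coarse pH steps
        ["none", "NaCl", "MgCl2"],                 # illustrative additives
    )
    for condition in sparse_screen(grid, 12):
        print(condition)
    print(volume_per_condition_nl(3.0, 144), "nl per condition")
```

Dividing 3 μl across 144 conditions leaves roughly 21 nl of sample per condition, which gives a sense of why microfluidic formats matter when protein is scarce.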
Techniques directly targeting the unique concerns of membrane-protein crystallization have also been developed; these include methods involving the use of lipidic cubic phases  and bicelles . The rationale behind these methods is the notion that placing the solubilized protein back into a native-like environment will improve the chances of crystallization. Both of these approaches involve crystallization of the membrane protein within the context of lipid bilayers and have been used to produce well-ordered crystals of bacterial rhodopsins; but it remains to be determined to what extent the same approaches will apply to other membrane-protein families.
Several different NMR technologies utilize a wide variety of membrane-mimetic environments. Solution NMR requires isotropic motions of the protein, and hence membrane proteins must be solubilized within detergent micelles. In homogeneous monodisperse samples, membrane proteins typically maintain not only their secondary and tertiary structure, but also their quaternary structure within micelles. In solid-state NMR, membrane proteins can be characterized in aligned planar bilayers by using orientational restraints that relate each atomic site of the protein to a reference axis perpendicular to the plane of the lipid bilayer. Solid-state NMR can also be used with samples that are not uniformly aligned, such as multilamellar liposomes, by using distance and torsional restraints that, respectively, constrain the structure by interatomic distances or define the relative orientation of adjacent atomic groups. Moreover, it may be possible to characterize membrane protein structure by solid-state NMR using micro- and nano-crystals of membrane proteins again through distance and torsional restraints. While NMR does not require diffraction-quality crystallization of membrane proteins, sample preparation is still a bottleneck, whether it is at the stage of detergent solubilization of the protein at high concentration, the reconstitution of protein into liposomes, or the uniform alignment of bilayer samples.
Solution NMR methodology has advanced with new procedures, such as transverse relaxation optimized spectroscopy (TROSY), that aid in data collection of samples that tumble slowly on the NMR frequency scale (500 to 900 MHz). NMR spectra of proteins require the use of 15N and 13C isotopic labeling to achieve sensitivity and resolution in the spectra. Structural restraints collected with this methodology come primarily from residual dipolar couplings (RDCs), which are derived from samples that have a slight degree of alignment with respect to the magnetic field, and from nuclear Overhauser effect (NOE)-derived inter-proton distances from samples that are extensively deuterated. Such deuteration is required to improve the resolution in 1H spectra. Excellent progress has been made recently in the development of partial alignment of these proteins by using stretched polyacrylamide gels that generate an anisotropic environment for the protein. In other words, the protein in these gels has a slight preference for one orientation over other possible orientations. For α helices, a pattern with 3.6 resonances per cycle is observed, in a phenomenon known as a dipolar wave. Because α helices have 3.6 residues per turn about the helical axis, a pattern in the RDCs of the backbone amide 15N resonances is observed with the same periodicity. The amplitude of the waves represented on plots of RDCs versus residue number is characteristic of the tilt angle of the helix with respect to the alignment axis and the magnetic field axis. Recent success with this approach has resulted in submissions of structures to the PDB, and work on polytopic oligomeric proteins is progressing in additional laboratories (see , for example).
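The periodicity described above can be illustrated with a toy fit: RDCs along an ideal helix are modeled as a sinusoid in residue number with a fixed period of 3.6 residues, and the wave amplitude, phase and offset are recovered from the values. This is a minimal sketch under stated assumptions, not the analysis used in the cited work: the sinusoidal model, the synthetic data and the function names are all illustrative, and the projection used for the fit is exact only when the segment spans a whole number of periods (a multiple of 18 residues, since 18 = 5 x 3.6). Real data would call for a general least-squares fit and a separate model relating amplitude to tilt angle.

```python
import math

RESIDUES_PER_TURN = 3.6  # ideal alpha-helix periodicity


def simulate_rdcs(n_residues, amplitude, phase, offset):
    """Synthetic RDCs (arbitrary Hz) following an ideal dipolar wave."""
    omega = 2.0 * math.pi / RESIDUES_PER_TURN
    return [amplitude * math.sin(omega * n + phase) + offset
            for n in range(n_residues)]


def fit_dipolar_wave(rdcs):
    """Recover (amplitude, phase, offset) of a 3.6-residue sinusoid.

    Uses orthogonal projection onto sin and cos at the helix frequency,
    which is exact only for segment lengths that are multiples of 18.
    """
    n = len(rdcs)
    if n == 0 or n % 18 != 0:
        raise ValueError("projection fit needs a multiple of 18 residues")
    omega = 2.0 * math.pi / RESIDUES_PER_TURN
    offset = sum(rdcs) / n
    a = 2.0 / n * sum((d - offset) * math.sin(omega * i)
                      for i, d in enumerate(rdcs))
    b = 2.0 / n * sum((d - offset) * math.cos(omega * i)
                      for i, d in enumerate(rdcs))
    return math.hypot(a, b), math.atan2(b, a), offset


if __name__ == "__main__":
    data = simulate_rdcs(18, amplitude=6.0, phase=0.7, offset=2.0)
    print(fit_dipolar_wave(data))
```

In this picture a larger fitted amplitude corresponds to a more pronounced wave on the RDC-versus-residue plot, which is the quantity the text relates to helix tilt.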
In solid-state NMR two technologies are utilized , one requiring aligned planar bilayer samples and the other using magic angle spinning (MAS) samples, in which samples of liposomes or micro- or nano-crystals are rotated about an axis inclined at 54.7° with respect to the magnetic field. In this way the anisotropic properties of the spectra are removed and a solution-like spectrum is observed. Uniformly aligned bilayers yield anisotropic observables, such as dipolar and quadrupolar couplings, as well as anisotropic chemical shifts. In other words, these NMR spin interactions display an orientation dependence with respect to the axis of the magnetic field of the NMR spectrometer. In this way the observed couplings and chemical shifts can be related to the orientation of the atomic sites with respect to the bilayer normal, which is aligned parallel to the magnetic field. As for the dipolar waves described above, the spectra of uniformly aligned samples in which the 15N-1H dipolar interaction is correlated with the 15N chemical shift, result in circular patterns of resonances for α-helical segments with 3.6 resonances per turn, reminiscent of helical wheels. Here the patterns, known as PISA wheels [46, 47], represent an opportunity to assess helix tilt angles and orientations without the need for, or with minimal, resonance assignments, respectively. This methodology represents an excellent screening tool for low-resolution structural information. Spectral sensitivity and resolution have been dramatically improved in the past decade, such that backbone structures of small proteins are now possible, and complete structures of peptides have been demonstrated. MAS experiments lead to distance and torsion restraints that are highly complementary to orientational restraints. Resolution is dramatically improved in MAS experiments by using crystalline samples in which the conformation and environment of each protein is nearly identical.
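The 54.7° inclination used in MAS is not arbitrary. The anisotropic spin interactions mentioned above scale with the second Legendre polynomial of the cosine of the angle between the rotor axis and the field, (3cos²θ − 1)/2, and the magic angle is the value of θ at which that factor vanishes. The short check below computes it; the function names are hypothetical.

```python
import math


def second_legendre(cos_theta):
    """P2(cos theta) = (3 cos^2 theta - 1) / 2."""
    return (3.0 * cos_theta ** 2 - 1.0) / 2.0


def magic_angle_degrees():
    """Angle at which P2 vanishes: theta = arccos(1 / sqrt(3))."""
    return math.degrees(math.acos(1.0 / math.sqrt(3.0)))


if __name__ == "__main__":
    theta = magic_angle_degrees()
    print(theta)
    print(second_legendre(math.cos(math.radians(theta))))
```

The computed angle is approximately 54.74°, matching the value quoted in the text to the precision given there.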
The first solid-state NMR structure was deposited in the PDB in 1997; since then another nine structures have been deposited, all determined using either uniformly aligned samples or MAS samples. In addition, the structures of three β-barrel membrane proteins have been solved by solution NMR.
Two-dimensional crystals, which are sheet-like crystals one unit-cell thick, are ideally suited for electron-crystallographic structure-determination methods. Membrane proteins crystallized within the context of lipid bilayers represent one excellent example; numerous electron crystallography structural studies of membrane proteins have been performed using such specimens. There is no 'phase problem' in electron crystallography as there is in X-ray crystallography, since electron micrographs of crystalline samples yield images from which phases can be determined directly. To produce a structural dataset consisting of diffraction amplitudes and phases, electron diffraction patterns (to obtain more accurate amplitudes) and electron micrographs (to obtain phases) are collected from tilted and untilted two-dimensional crystals. These data are subsequently processed and merged into three-dimensional sets of structure factors. Three-dimensional density maps of the target molecules obtained from their structure factors are then modeled and interpreted in much the same manner as electron density maps derived from X-ray diffraction data. Recent improvements in electron microscope automation have led to increased data-collection rates and reduced processing times.
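The reason there is no phase problem here can be shown with a one-dimensional toy: once amplitudes and phases are both available, they combine into complex structure factors whose inverse Fourier transform gives the density directly. In the sketch below both quantities are taken from the transform of a known toy density, standing in for amplitudes measured from diffraction patterns and phases measured from micrographs. It is an illustration only; tilt series, contrast-transfer corrections and merging of data from many crystals are all ignored, and the function names are hypothetical.

```python
import cmath


def dft(values):
    """Discrete Fourier transform of a real or complex sequence."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * cmath.pi * h * x / n)
                for x, v in enumerate(values))
            for h in range(n)]


def inverse_dft(coeffs):
    """Inverse transform, returning the real part of the result."""
    n = len(coeffs)
    return [sum(c * cmath.exp(2j * cmath.pi * h * x / n)
                for h, c in enumerate(coeffs)).real / n
            for x in range(n)]


def combine(amplitudes, phases):
    """Structure factors F = |F| * exp(i * phase)."""
    return [a * cmath.exp(1j * p) for a, p in zip(amplitudes, phases)]


if __name__ == "__main__":
    density = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 2.0, 0.0]  # toy 1D "map"
    factors = dft(density)
    amplitudes = [abs(f) for f in factors]          # as if from diffraction
    phases = [cmath.phase(f) for f in factors]      # as if from images
    recovered = inverse_dft(combine(amplitudes, phases))
    print([round(v, 6) for v in recovered])
```

With amplitudes alone the inverse transform could not be taken, which is the situation X-ray crystallography faces before phasing.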
Naturally occurring two-dimensional crystals of bacteriorhodopsin (found in the purple membrane of halobacteria) have yielded the best quality electron crystallographic data from a membrane protein to date, allowing an atomic-level model of this membrane protein to be obtained [48, 49]. A substantial number of detergent-solubilized membrane proteins have been reconstituted to form two-dimensional crystals, some very well ordered and over 1 μm in size. Because of limits in sample tilting, however, the distribution of resolution in density maps produced from two-dimensional crystals is anisotropic. The quality of diffraction, in the best direction, from the most useful of these crystals has ranged from about 7 Å resolution, which is sufficient to reveal the presence of transmembrane helices, up to the 4 to 3 Å resolution range, where the main chain of the polypeptide can be modeled and the larger side chains assigned.
Several methodologies have evolved for obtaining two-dimensional crystals from solubilized membrane proteins; these include reconstitution of membrane proteins into lipid bilayers, and crystallization along lipid monolayers at air-water interfaces or on preformed lipid tubes. The approach based on lipid-bilayer reconstitution is the only method that to date has yielded high-resolution structural data. The reconstitution procedures involve mixing the detergent-solubilized target protein and lipid at relatively low lipid-to-protein ratios, followed by removal, or reduction in the concentration, of detergent. This may be done by dialysis, by adsorption of detergent to polystyrene beads, or by dilution of the sample. Upon removal of detergent, protein and lipid can associate to form membranes with a high density of proteins; under appropriate conditions, these are organized into crystalline arrays. The formation and quality of the resulting crystals depend on parameters such as the choice of lipid, protein concentration, protein-to-lipid ratio, detergents, rates of detergent removal, temperature and other factors, such as pH and ionic strength, that are often found useful in three-dimensional protein crystallization. As with the other structure-determination techniques described above, the target membrane protein to be studied should be in a pure, homogeneous and stable state. The protein concentration required for these experiments is, however, substantially lower than for X-ray and NMR methods, at about 1 mg/ml.
Dramatic improvements in a range of technologies associated with membrane-protein structure determination have been realized over the past ten years, particularly in the areas of protein solubilization, crystallization and NMR sample preparation, as well as data collection and processing. Automation of processes in some of these areas is expected to further accelerate progress. The availability of a broad spectrum of fully sequenced genomes, coupled with advanced molecular biology techniques, means that literally thousands of membrane proteins can be made available for study. Clearly the time is right for membrane-protein structural genomics efforts to move into full swing.
The authors gratefully acknowledge Robert Nakamoto for helpful discussions and Young Do Kwon for figure preparation. This work was supported by funding from the National Institutes of Health (P01-GM64676) and by the US Department of Energy.