Recent developments in membrane-protein structural genomics
© BioMed Central Ltd 2005
Published: 3 January 2006
Recent work has identified the topology of almost all the inner membrane proteins in Escherichia coli, and advances in nuclear magnetic resonance spectroscopy now allow the determination of α-helical membrane protein structures at high resolution. Together these developments will help overcome the current limitations of high-throughput determination of membrane protein structures.
As noted in a previous review, the major bottlenecks in membrane protein structural genomics are the identification of potential membrane proteins in selected genomes and the production of the milligram quantities of protein necessary for most structure determination techniques. In most cases, accurate homology-based prediction of protein type and function is not possible for membrane proteins, as currently available bioinformatic tools detect membrane proteins in genomes solely on the basis of predicting transmembrane segments, and predictions from different programs sometimes do not agree with one another. To provide more information for identifying and characterizing predicted membrane proteins, Daley and colleagues recently used a combination of bioinformatic and experimental approaches to develop a successful method for the topology analysis of almost all the inner membrane proteins in the Escherichia coli genome. Topological models of membrane proteins describe the number of transmembrane segments and the orientation of the protein with respect to the lipid bilayer. An accurate topology model of a membrane protein not only provides reliable information to aid the identification of membrane proteins but is also important for functional protein analysis.
Experimental approaches to determining topology usually deal with proteins individually and are very time-consuming. In contrast, Daley et al. first used a simple and reliable experimental approach to determine the location of the carboxyl termini of nearly all the inner membrane proteins in E. coli. They genetically fused the reporter tags alkaline phosphatase (PhoA) or green fluorescent protein (GFP) to the carboxyl terminus of each prospective membrane protein sequence, exploiting the fact that PhoA activity can only be detected in the periplasm (the space between the inner and outer membranes of E. coli), whereas GFP only fluoresces in the cytoplasm. The location of the carboxyl terminus of a membrane protein with respect to the cytoplasmic membrane can thus be accurately determined. The authors then used the experimentally determined carboxyl terminus location as a constraint for the widely used hidden Markov model (HMM) program for transmembrane topology prediction, TMHMM, to generate a topology model for each protein.
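The reporter logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the scoring procedure of Daley et al. and not part of TMHMM: the function names, the normalized activity scale and the 0.5 cutoff are all hypothetical. The one fact the sketch relies on is that each transmembrane helix crosses the membrane once, so the parity of the helix count fixes which side the amino terminus is on once the carboxyl terminus has been located.

```python
from typing import Optional

# Hypothetical cutoff: a reporter counts as "active" when its signal,
# normalized to a positive control, exceeds this value. Not from the paper.
ACTIVE_THRESHOLD = 0.5


def c_terminus_location(phoa_activity: float, gfp_fluorescence: float) -> Optional[str]:
    """Assign the carboxyl terminus from the two reporter fusions.

    PhoA is active only in the periplasm and GFP fluoresces only in the
    cytoplasm, so exactly one reporter should be active. Returns None when
    both or neither are active (no confident assignment).
    """
    phoa_on = phoa_activity > ACTIVE_THRESHOLD
    gfp_on = gfp_fluorescence > ACTIVE_THRESHOLD
    if phoa_on and not gfp_on:
        return "periplasm"
    if gfp_on and not phoa_on:
        return "cytoplasm"
    return None


def n_terminus_location(c_location: str, n_helices: int) -> str:
    """Infer the amino terminus side from the carboxyl terminus side.

    Each transmembrane helix crosses the membrane once: an even number of
    helices puts both termini on the same side, an odd number on opposite
    sides.
    """
    if c_location not in ("cytoplasm", "periplasm"):
        raise ValueError("c_location must be 'cytoplasm' or 'periplasm'")
    if n_helices < 1:
        raise ValueError("a membrane protein needs at least one helix")
    if n_helices % 2 == 0:
        return c_location
    return "periplasm" if c_location == "cytoplasm" else "cytoplasm"
```

For example, a protein whose GFP fusion fluoresces and whose PhoA fusion is inactive is assigned a cytoplasmic carboxyl terminus; if the predictor reports two helices (a helical hairpin), the amino terminus is cytoplasmic as well.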
Out of approximately 1,000 genes predicted by TMHMM to encode inner membrane proteins in the E. coli genome, Daley and coworkers focused on 737 proteins. Other proteins predicted to have a single transmembrane segment (monotopic proteins) were left out of the study, as it remains a major challenge to distinguish secreted proteins from monotopic integral membrane proteins. Even so, Daley et al. were able to determine the locations of the carboxyl termini of 502 of the 665 proteins whose genes could be cloned into the vectors used. In addition, the carboxy-terminal location of another 99 of the 737 proteins was assigned by finding their homologs among the 502 experimentally determined proteins. When the resulting set of 601 proteins was compared with 71 proteins for which the location of the carboxyl terminus was known previously, 69 agreed with previous assignments. Further studies are needed to resolve the discrepancies associated with the remaining two proteins. This brings the success rate of the carboxyl terminus assignment in the study by Daley et al. to the order of 99% or higher. The accuracy of carboxyl terminus prediction using TMHMM alone was tested for all 601 proteins, and was only 78%. Significant improvements in the quality of the topology models for these inner membrane proteins have therefore been achieved by using the experimentally determined constraints. This combination of bioinformatic and experimental approaches has laid a foundation for the functional analysis of these inner membrane proteins, and the method can be readily applied to integral membrane proteins of other genomes. An interesting finding by Daley et al. is that 57% of the 601 proteins studied have both their amino and carboxyl termini on the cytoplasmic side of the membrane. This indicates that two closely spaced transmembrane helices separated by a short hydrophilic loop (a 'helical hairpin') might be a basic building block of membrane proteins.
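The counts quoted in this paragraph can be turned into simple ratios as a quick check. The short Python sketch below uses only numbers stated in the text; the variable names and the `percent` helper are introduced here for illustration.

```python
# Counts quoted in the text for the study by Daley et al.
TARGETED = 737                 # proteins the study focused on
CLONED = 665                   # genes that could be cloned into the vectors
ASSIGNED_BY_EXPERIMENT = 502   # carboxyl terminus located with reporters
ASSIGNED_BY_HOMOLOGY = 99      # carboxyl terminus inferred from homologs
PREVIOUSLY_KNOWN = 71          # proteins with an earlier assignment
AGREEING = 69                  # of those, assignments that matched


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


total_assigned = ASSIGNED_BY_EXPERIMENT + ASSIGNED_BY_HOMOLOGY   # 601
experimental_yield = percent(ASSIGNED_BY_EXPERIMENT, CLONED)     # 75.5
overall_coverage = percent(total_assigned, TARGETED)             # 81.5
raw_agreement = percent(AGREEING, PREVIOUSLY_KNOWN)              # 97.2
```

On these figures, roughly three quarters of the cloned constructs gave an experimental assignment, about 82% of the 737 targets ended up with an assigned carboxyl terminus, and the raw agreement with the 71 previously characterized proteins is 69 of 71, or about 97%.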
One of the major concerns for membrane protein production in bacteria is the potential toxicity of these proteins to the host, which limits the ability to express proteins at high levels. Another very important finding of Daley et al. is therefore that the overexpression of the vast majority of the membrane protein fusion constructs had only a limited effect on cell growth. Not only are these proteins typically not toxic, but it was also estimated that about 50% of the GFP fusion proteins could be overexpressed with little harmful effect, a rate similar to that usually achieved for soluble proteins. There are many possible reasons why the other 50% of these proteins were not overexpressed; their low stability in the host cells might be one of them. In a study of the attempted expression of 99 putative membrane proteins from Mycobacterium tuberculosis in E. coli, not a single case of cell lysis was observed. In the case of the mycobacterial proteins, the use of E. coli codons and strains, the T7 promoter, and short His-tags as reporters, together with the choice of strain for the expression host, was shown to allow the expression of 'foreign' proteins with a broad range of molecular weights and numbers of transmembrane helices. Some 50% of the 99 putative mycobacterial protein sequences were expressed and 25% were overexpressed, in good agreement with the results of Daley et al.
Another significant challenge for structural genomics is the production of purified membrane proteins in large quantities from cloned genes. As just discussed, Daley et al. and others have shown that a significant percentage of prokaryotic integral membrane proteins can be readily produced. The GFP fusion construct used by Daley et al. has a cleavable His8-tag, which allows the proteins to be purified by Ni-affinity chromatography using a standard protocol. It thus seems that the production of membrane proteins in large enough quantities for structure determination can be achieved in bacteria, and this may no longer be the rate-limiting step for membrane protein structural genomics.
Daley et al. noted that most of the E. coli membrane proteins whose function is still unknown have fewer than six transmembrane helices. This indicates a systematic lack of studies of the smaller integral membrane proteins and reflects the fact that most of the membrane protein structures obtained by X-ray diffraction represent large membrane proteins or membrane protein complexes. This bias probably arises because larger proteins form crystals more easily than smaller ones. The larger the protein, the larger the ratio of protein volume to the protein surface area in contact with lipid, which is more favorable to the development of electrostatic contacts between unit cells in a crystal. The smaller the ratio, the more difficult it is to develop these electrostatic contacts. On the other hand, solution and solid-state NMR spectroscopy may be better suited for determining the structures of smaller proteins, and are therefore largely complementary to X-ray crystallography. Each of these NMR methodologies has its advantages, and very significant breakthroughs have been made in the past year in both technologies. For example, detailed comparisons of a wide range of detergents have guided improved sample preparation protocols for solution NMR. Further sample optimization for expression testing, purification and NMR sample preparation was reported by Tian and colleagues. Today, good tools are in place for obtaining excellent samples of membrane proteins of modest molecular weight. Slightly anisotropic (directionally dependent) samples of detergent-solubilized membrane proteins present specific structural challenges, but methods for preparing such samples have recently become better [9, 10], and the characterization of helical tilt and orientation has also been improved.
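The size argument made above for crystallization can be illustrated with a deliberately crude geometric model. Assume, purely for illustration, that the transmembrane part of a protein is an upright cylinder of radius r spanning a bilayer of thickness h, and that only its lateral surface is in contact with lipid; the cylinder shape and the symbols r and h are assumptions introduced here, not taken from the work discussed.

```latex
\begin{aligned}
V &= \pi r^{2} h               && \text{(volume of the transmembrane region)} \\
A_{\text{lipid}} &= 2 \pi r h  && \text{(lateral surface in contact with lipid)} \\
\frac{V}{A_{\text{lipid}}} &= \frac{\pi r^{2} h}{2 \pi r h} = \frac{r}{2}
\end{aligned}
```

In this model the ratio grows linearly with the radius and does not depend on the bilayer thickness, so doubling the radius of the transmembrane region doubles the ratio of volume to lipid-exposed surface, consistent with the trend described in the text.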
After several decades of hard work, high-resolution structures of α-helical membrane proteins have finally been determined by solution NMR. Most recently, several new structures obtained by solution NMR have appeared that foreshadow a new wave of membrane-protein structures. Oxenoid and Chou have determined the structure at atomic resolution of an α-helical membrane protein, the human phospholamban pentamer, embedded in oriented aggregates (micelles) of the detergent dodecylphosphocholine, which substitutes for the lipid membrane. α-Helical membrane proteins are those in which the transmembrane portion of the protein is in the form of one or more α helices rather than a β barrel. The structure revealed that the phospholamban pentamer forms a channel that allows many physiologically relevant small ions, such as Na+, K+ and Cl-, to pass through the membrane. Howell et al. have solved the backbone structure of the two-α-helix membrane protein MerF, a component of the bacterial mercury detoxification system. These studies show that solution NMR spectroscopy can be used for structure determination of small and medium-sized α-helical membrane proteins.
It has long been thought that bicelles (bilayered mixed micelles) would be an ideal system in which to study membrane proteins, but in practice they have been used primarily to study synthetic peptides. An exciting development in this context is the optimization by De Angelis and colleagues of the use of magnetically aligned bicelles for high-resolution structure determination of membrane proteins by solid-state NMR spectroscopy. The key to these workers' success is the use of nonhydrolyzable ether-linked lipids to prepare stable bicelles. They showed that purified membrane proteins of small molecular weight in bicelles undergo rapid rotational diffusion around an axis perpendicular to the bilayer; high-resolution structure determination then becomes possible because of the averaging of the nuclear spin interactions, which would otherwise give a very broad NMR signal. Careful studies indicated that the membrane proteins were embedded in bicelles with little or none of the structural distortion that often occurs in micelle preparations. Structural characterization is aided by the observation of a helical wheel-like pattern of the resonances in the spectra, called the PISA wheel [15, 16]. The structure of MerF in bicelles is close to being finished (S. Opella, personal communication). It will provide an ideal system for studying the structure and mechanism of action of this and other membrane proteins in a lipid bilayer environment under fully hydrated physiological conditions.
The current rate at which unique structures are being solved for membrane proteins resembles the situation for soluble proteins 20 years ago (see Figure 1). As the international efforts of structural genomics start to focus on membrane proteins, it is reasonable to expect that more and more high-resolution structures will become available. The time may finally have come for membrane protein structural genomics to move forward at the same pace as the rest of the field, and both solution and solid-state NMR spectroscopy will be central technologies in achieving this goal.
The authors thank S.J. Opella for helpful discussions. The work is supported by funding from the National Institutes of Health (P01-GM64676).