Genomic analysis of membrane protein families: abundance and conserved motifs
© Liu et al., licensee BioMed Central Ltd 2002
Received: 21 May 2002
Accepted: 7 August 2002
Published: 19 September 2002
Polytopic membrane proteins can be related to each other on the basis of the number of transmembrane helices and sequence similarities. Building on the Pfam classification of protein domain families, and using transmembrane-helix prediction and sequence-similarity searching, we identified a total of 526 well-characterized membrane protein families in 26 recently sequenced genomes. To this we added a clustering of a number of predicted but unclassified membrane proteins, resulting in a total of 637 membrane protein families.
Analysis of the occurrence and composition of these families revealed several interesting trends. The number of assigned membrane protein domains has an approximately linear relationship to the total number of open reading frames (ORFs) in the 26 genomes studied. Caenorhabditis elegans is an apparent outlier, because of its high representation of seven-span transmembrane (7-TM) chemoreceptor families. In all genomes, including that of C. elegans, the number of distinct membrane protein families has a logarithmic relation to the number of ORFs. Glycine, proline, and tyrosine locations tend to be conserved in transmembrane regions within families, whereas isoleucine, valine, and methionine locations are relatively mutable. Analysis of motifs in putative transmembrane helices reveals that GxxxG and GxxxxxxG (which can be written GG4 and GG7, respectively; see Materials and methods) are among the most prevalent. This was noted in earlier studies; however, we now find that these motifs are particularly well conserved within families, especially those corresponding to transporters, symporters, and channels.
We carried out a genome-wide analysis on patterns of the classified polytopic membrane protein families and analyzed the distribution of conserved amino acids and motifs in the transmembrane helix regions in these families.
Genome-wide structural analyses in terms of patterns of protein folding have been useful in revealing functional and evolutionary relationships [1,2,3,4]. Given the abundance of membrane proteins, it would be highly desirable to have a similar analysis for this major category of structures; however, the number of known membrane protein structures remains small. Here we exploit the fact that membrane proteins can be classified into families on the basis of sequence similarities and topology, and use the family groupings to analyze genomic characteristics of membrane protein families.
Most transmembrane proteins are formed from bundles of helices that traverse the membrane lipid bilayer. It is estimated that 20-30% of the proteins in known genomes are of this type [3,4,5,6]. The most general description of the transmembrane helical regions (TMs) is that they comprise a region of 18 or more amino acids with a largely hydrophobic character. This sequence feature can be identified in primary sequences using hydrophobicity scales [7,8,9]. The most abundant amino acids in transmembrane regions are leucine, isoleucine, valine, phenylalanine, alanine, glycine, serine, and threonine. Taken together, these amino acids account for 75% of the amino acids in transmembrane regions [10,11,12]. Analysis of the distribution of amino acids has revealed patterns in TM regions, for example GxxxG, which are thought to be important in helix-helix interactions [11,12,13,14].
We took advantage of the classification of protein domains provided by others (Pfam-A and Pfam-B) to identify families that appear to be polytopic membrane proteins, and augmented these lists with additional family members based on amino-acid sequence comparisons. Furthermore, we identified additional families on the basis of clustering of amino-acid sequences, resulting in 637 distinct families. We used these families to analyze amino-acid compositions in the helical regions, pair motifs, domain structures, and patterns of families, and arrive at a number of generalizations. Among these are that glycine, tyrosine, and proline appear frequently in conserved locations within family transmembrane helices and that the specific pair motifs are found in families that seem to be transporters, symporters, and channels. The number of kinds of domains and families seems to increase with the number of open reading frames (ORFs) in most genomes. Here we present our analysis and discuss these findings.
Classification of polytopic membrane protein domains
Using TMHMM, a membrane protein prediction program based on a hidden Markov model, TM-helices of membrane proteins in 26 genomes were predicted. Polytopic membrane domains were identified using the loop size between TM-helices as a guide. These domains were then classified into 231 Pfam-A and 318 Pfam-B families either by direct SWISS-PROT ID matching or by sequence similarity matching using FASTA. Of the aligned domains, most of their TM-helices also aligned well, especially in Pfam-A families, which have alignments based on manually crafted hidden Markov models. Unclassified domains were clustered into 121 families by their sequence similarities. For each family, a profile was constructed, as shown in Figure 1b. This included: an averaged hydrophobicity plot of all members in the family based on the Goldman-Engelman-Steitz (GES) scale; a consensus sequence of the family, represented by a sequence logo plot; and consensus sequences of the TM-helices. By analyzing the hydrophobicity plots, we can locate TM-helices in the aligned sequences in protein families, and assign a number of TM-helices to each family. Some families, including 3 in Pfam-A and 20 in Pfam-B, were eliminated at this step, owing to the ambiguity of TM-helices observed in the plot. From this process, we identified 228 Pfam-A, 298 Pfam-B and 121 clustered families for our analyses, with approximately 95% of domains classified in Pfam families.
Analysis of the number of TM-helices in Pfam-A families of polytopic membrane domains
In general, most Pfam-A families tend to have a small number of TM-helices. For those with seven or fewer TM-helices, the number of families does not vary significantly with helix number, although there are more families with two or four TM-helices than with three, five, six, or seven. For families with more than seven TM-helices, the number of families decreases sharply as the number of TM-helices increases. Families with 12 TM-helices are the exception, however; they have a small peak in numbers against the overall downward slope of the plot. We also carried out the same kind of analysis on Pfam-A families that are annotated as transporters, symporters, and channels, and found that 12-TM-helix families are preferred by transporter-like families. In addition, most (11 out of 12) Pfam-A families with 12 TM-helices are transporter-like families. There seems to be a tendency for the transporter-like families to have an even number of TM-helices, because families with 2, 4, 6, 8, and 12 TM-helices have a relatively higher occurrence than those with a neighboring odd number of TM-helices.
Analysis of amino-acid distribution and pair motifs
We selected 168 families from Pfam-A that had more than 20 members. For each of these families, we then generated consensus sequences with conservation value (Rsequence) using the Alpro program. Relatively conserved amino acids in the consensus sequences (Rsequence value > 3.0, representing the top 15% Rsequence value of all amino acids) and in TM-helical regions were analyzed for their composition as well as for pair motifs.
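The conservation value can be illustrated with a minimal sketch. It assumes that Rsequence follows the usual sequence-logo definition of information content, log2(20) minus the Shannon entropy of the alignment column, and it omits the small-sample correction. How the Alpro program treats gaps is not stated here, so ignoring gap characters is an assumption, and the function name `column_information` is hypothetical.

```python
import math
from collections import Counter


def column_information(column: str, alphabet_size: int = 20) -> float:
    """Information content (bits) of one alignment column.

    Assumption: gap characters ('-' and '.') are ignored, and no
    small-sample correction term is applied.
    """
    residues = [c for c in column.upper() if c.isalpha()]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    # Shannon entropy of the observed residue frequencies, in bits.
    entropy = -sum((k / total) * math.log2(k / total) for k in counts.values())
    return math.log2(alphabet_size) - entropy


# A column counts as conserved here if its value exceeds the 3.0 threshold.
is_conserved = column_information("GGGA") > 3.0
```

Under this definition a fully conserved column scores log2(20), about 4.32 bits, so a threshold of 3.0 selects columns dominated by one or two residue types.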
As might be expected, the changes in prevalence of certain amino acids reflect their conservation in the consensus sequence. Therefore, glycine, proline and tyrosine are relatively conserved residues in TM-helical regions, and isoleucine, valine, methionine and threonine have relatively high mutability. This result correlates very well with the mutation data matrix (MDM) for multi-spanning transmembrane regions in membrane proteins. In the MDM of multi-spanning transmembrane α helices, isoleucine, methionine and valine are found to have relatively high mutability as hydrophobic residues, and serine and threonine also rank high in mutability as polar residues. In the matrix, proline appears to be highly conserved. Our results confirm these findings; in addition, we find that glycine and tyrosine are also highly conserved residues in polytopic TM-helices.
Top amino-acid pairs in transmembrane helices of the consensus sequences of classified Pfam-A families
List 1: top 50 pairs and their significance from Senes et al.
List 2: top 50 pairs and their occurrences from random pairs
List 3: top 50 pairs and their occurrences in lists 1 and 2
Significance values for List 1, in rank order: 6.35 × 10^-34, 8.36 × 10^-24, 3.61 × 10^-21, 4.79 × 10^-21, 1.29 × 10^-16, 5.73 × 10^-16, 2.12 × 10^-15, 4.52 × 10^-15, 3.75 × 10^-14, 1.09 × 10^-12, 2.17 × 10^-12, 9.69 × 10^-12, 9.06 × 10^-10, 3.87 × 10^-09, 4.89 × 10^-09, 1.33 × 10^-08, 1.83 × 10^-08, 2.95 × 10^-08, 7.71 × 10^-08, 8.98 × 10^-08, 1.52 × 10^-07, 2.93 × 10^-07, 4.55 × 10^-07, 6.3 × 10^-07, 1.63 × 10^-06, 3.27 × 10^-06, 3.99 × 10^-06, 4.97 × 10^-06, 5.35 × 10^-06, 5.35 × 10^-06, 5.4 × 10^-06, 5.58 × 10^-06, 6.04 × 10^-06, 7.45 × 10^-06, 7.93 × 10^-06, 1.13 × 10^-05, 1.38 × 10^-05, 1.43 × 10^-05, 2.51 × 10^-05, 2.7 × 10^-05, 2.95 × 10^-05, 3.1 × 10^-05, 5.74 × 10^-05, 6.84 × 10^-05, 8.25 × 10^-05, 9.87 × 10^-05, 9.95 × 10^-05, 1.19 × 10^-04, 1.24 × 10^-04, 1.51 × 10^-04
Association of GG4 and GG7 pairs with Pfam-A families annotated as transporters, symporters, and channels
Columns: Pfam-A families annotated as transporter/symporter/channel; all Pfam-A families
Related pair groups compared: GA4, AG4, AA4; GS4, SG4, SS4; GA7, AG7, AA7; GS7, SG7, SS7
A comparison between amino-acid composition of the conserved residues in the TM-helices of 45 transporter Pfam-A families and that of the other 123 Pfam-A families
Columns: conserved residues in TMs of transporter families (%); conserved residues in TMs of the other families (%)
Genome-wide analysis of families of polytopic membrane domains
We classified polytopic membrane domains into Pfam-A, Pfam-B and self-clustered families. Figure 4b shows the distribution of these three kinds of families in all the genomes. Most of the classified polytopic membrane domains belong to Pfam-A and Pfam-B, which cover 95% of classified domains.
This hypothesis was supported by analysis of Figure 5d, which shows the number of families of polytopic membrane domains in relation to the number of ORFs in the studied genomes. The number of families seems to have a logarithmic relation to the number of ORFs in all studied genomes, including C. elegans. Given that C. elegans has an unusually large number of polytopic membrane domains but a normal number of families, the amplification of polytopic membrane domains is limited to a few families.
Polytopic membrane domains of integral membrane proteins in 26 genomes have been classified into 637 families, which include 218 Pfam-A, 298 Pfam-B and 121 clustered families. Only families that are reasonably big (≥ 4 members) were selected. The classified families were used for amino-acid distribution and pattern studies for genome-wide analysis.
Our studies on amino-acid distribution and patterns were conducted on Pfam-A families. We also analyzed Pfam-B and the clustered families, but found less conservation, probably because the Pfam-B and the clustered families are not as carefully aligned as Pfam-A families. In the analysis of amino-acid positions, glycine, proline and tyrosine were found to be the most conserved residues in TM-helical regions, whereas isoleucine, valine, methionine and threonine were identified as the least conserved residues, relative to average occurrence. This result is mostly consistent with previous results from an MDM. Although hydrophobic residues such as leucine and isoleucine are among the most abundant residues in TM-helices, they are not well conserved in position. The observed conservation in position for residues such as glycine, proline and tyrosine raises the question of whether these residues are associated with the functions of integral membrane proteins.
We also studied amino-acid pair motifs in the conserved sequences in classified families. We show that pairs consisting of a glycine and another small amino acid (glycine, alanine or serine) and facing the same direction in TM α-helices are common in conserved positions. As those pair motifs have been shown to be important for packing of TM-helices [12,13,14], conservation of those motifs probably implies their importance in folding stability of integral membrane proteins, as is the case with hydrophobic residues found in the core regions of soluble proteins.
Our results have some interesting implications for the classified Pfam-A families annotated as transporters, symporters and channels. First, there is a preference for 12 TM-helices among these families. As there is no 12-TM transporter protein structure available, we do not know exactly why a 12 TM-helix bundle is preferred for transport. The structure of MsbA from Escherichia coli, an ATP-binding cassette (ABC) transporter homolog, was recently solved. It contains 12 TM-helices in a homodimer of two 6-TM-helical bundles, which form a central chamber to translocate substrates. However, it is unlikely that polytopic membrane domains in the 12-TM Pfam-A families have a structure like that of ABC transporters: as there is no obvious sequence similarity within the sequence containing the 12 TM-helices, such a domain is unlikely to form two 6-TM-helical bundles. By looking at structures of other transport proteins, including the potassium channel, the mechanosensitive ion channel, the aquaporin water channel, and the glycerol facilitator channel, it is apparent that 7-10 TM-helices are needed to form a tunnel and transport molecules. This means that proteins with a small number of TM-helices must oligomerize to form a proper tunnel to translocate molecules through the membrane. Second, families of these proteins tend to have GxxxG and GxxxxxxG instead of related motifs that have one or both glycines changed to alanine or serine. While this preference is interesting, we do not know its origin. Perhaps it reflects especially tight packing among helices in transporters, permitting the Cα-H...O hydrogen bonding that has been discussed.
We also studied the distribution of classified families in 26 genomes. Although the classified families of polytopic membrane domains do not provide complete coverage of the total potential polytopic membrane domains, we think they include most membrane proteins that have essential functions in these genomes. The excluded domains are either unique in function for the organism or falsely predicted. In most genomes the number of classified polytopic membrane domains seems to have a linear relation with the number of ORFs. However, C. elegans is an outlier to this trend. By studying the families in C. elegans, we found that it has an exceptional number of 7-TM-helical membrane domains, most of which are annotated as chemoreceptors. As C. elegans cannot see or hear but must search for food, chemosensation is key to survival. C. elegans mediates chemosensation by 32 neurons that are mostly arranged in bilateral pairs on the left and right sides, and it is estimated that there are about 500 G-protein-coupled receptors that act in chemosensation. We have now identified many chemoreceptors (750), classified into three large families. Therefore, classification of polytopic membrane domains into families gives us another way to look at the distribution and functions of integral membrane proteins in genomes.
Materials and methods
In this study, the following databases were used: SWISS-PROT (release 39 and updated to 19 December, 2000), which contains 91,132 protein entries; Pfam (release 6.1), which contains 2,727 protein families in Pfam-A and 40,230 families in Pfam-B; and the Proteome Analysis Database, from which complete non-redundant proteomes were downloaded. We selected eight genomes from archaea: Archaeoglobus fulgidus (AF), Aeropyrum pernix K1 (AP), Halobacterium sp. (HS), Methanococcus jannaschii (MJ), Methanobacterium thermoautotrophicum (MT), Pyrococcus abyssi (PA), Pyrococcus horikoshii (PH), and Thermoplasma acidophilum (TA); 14 genomes from bacteria: Aquifex aeolicus (AA), Borrelia burgdorferi (BB), Bacillus subtilis (BS), Chlamydia pneumoniae strain AR39 (CP), Chlamydia trachomatis (CT), E. coli strain K12 (EC), Haemophilus influenzae (HI), Helicobacter pylori strain 26695 (HP), Mycobacterium tuberculosis (MyTu), Mycoplasma genitalium (MG), Mycoplasma pneumoniae (MP), Rickettsia prowazekii (RP), Synechocystis sp. (SS), and Treponema pallidum (TP); and four genomes from eukaryotes: Saccharomyces cerevisiae (SC), Drosophila melanogaster (DM), C. elegans (CE), and Arabidopsis thaliana (AT).
Classification of polytopic membrane protein domains
Figure 1a shows our complete classification procedure. We extracted 8,301 protein entries from the SWISS-PROT database containing at least two TRANSMEM annotations in the FT field. In these proteins, a total of 52,636 transmembrane (TM) regions were allocated to proteins in the Pfam database. By analyzing the location of TM regions in protein domains of each Pfam family, we were able to identify families that contain polytopic membrane protein domains. We went through a relatively conservative procedure to identify potential families of polytopic membrane domains. First, a Pfam family needed to have a significant number of proteins containing no fewer than two TM regions to be identified as a polytopic membrane domain family. Second, all families in Pfam-A, and some in Pfam-B (those with more than seven members), were analyzed, as the Pfam-B database is under development and contains thousands of small protein families. In the end, we identified 183 Pfam-A and 152 Pfam-B families. Proteins in these families contain 36,878 TM regions, representing approximately 70% of the total TM regions extracted from SWISS-PROT. We analyzed sizes of the loops between all the TM regions, as shown in the inner chart of Figure 1. By Pfam's protein domain classification, most loops (> 95%) are short peptides containing fewer than 80 amino acids.
Proteins from the 26 genomes were submitted to the TMHMM server for TM-helix prediction. Predicted membrane proteins were searched for polytopic membrane domains using a rule, derived from the above result, that loops between TM-helices within a domain must be shorter than 80 amino acids. To identify domains that are included in the Pfam families already identified, we searched the defined polytopic membrane domains for SWISS-PROT ID matches and regional matches. Unmatched domains were further classified on the basis of Pfam's classification, and an additional 48 Pfam-A and 166 Pfam-B families were identified (small Pfam-B families with no fewer than four members and no fewer than three matches were selected). In total, we identified 231 Pfam-A and 318 Pfam-B families as polytopic membrane domains. As not all proteins from the 26 genomes are included in Pfam, we then tried to assign the unclassified polytopic membrane domains to the identified Pfam families by sequence similarity matching to proteins in these families. We used the FASTA program to search for matches, and matches with E-values less than 0.01 were considered positive. One could instead assign Pfam-A domains using the HMMer software, with which the Pfam-A families are closely associated. However, we chose the somewhat simpler approach of using FASTA. This is a somewhat more conservative approach (finding fewer homologs), which has the advantage of using consistent thresholds that can be applied to all the searches. Query domains were assigned to the Pfam families to which their best matches belong.
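The loop-size rule can be sketched as follows. This is a minimal illustration, not the procedure actually used: it assumes predicted helices are given as 1-based inclusive (start, end) residue positions, that a loop of 80 or more residues ends a domain, and that a domain needs at least two helices to count as polytopic. The function name `split_polytopic_domains` is hypothetical.

```python
from typing import List, Tuple

Helix = Tuple[int, int]  # (start, end) residue positions, 1-based inclusive


def split_polytopic_domains(helices: List[Helix], max_loop: int = 80) -> List[List[Helix]]:
    """Group consecutive TM-helices joined by loops shorter than max_loop.

    Groups with fewer than two helices are discarded, since they are
    not polytopic.
    """
    domains: List[List[Helix]] = []
    current: List[Helix] = []
    for helix in sorted(helices):
        if current:
            # Number of residues strictly between the two helices.
            loop_len = helix[0] - current[-1][1] - 1
            if loop_len >= max_loop:
                if len(current) >= 2:
                    domains.append(current)
                current = []
        current.append(helix)
    if len(current) >= 2:
        domains.append(current)
    return domains


# Hypothetical predicted helices for one protein.
example = split_polytopic_domains(
    [(10, 30), (40, 60), (200, 220), (230, 250), (400, 420)]
)
```

In this example the long loops after residues 60 and 250 split the protein into two two-helix domains, and the isolated final helix is dropped.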
As for the domains that were not classified into Pfam families by either ID match or sequence-similarity match, we tried to cluster them into families on the basis of their sequence similarities. This was done by an all-against-all sequence similarity search (E-value < 0.01) using FASTA, and polytopic membrane domains were clustered by applying a multiple linkage clustering method to the FASTA results. A family of N members must have more than 0.9N(N-1) links among its members, that is, up to 10% of the possible links may be missing. We selected 121 clustered families that contain no fewer than four members, and aligned protein sequences in each family using the CLUSTAL W program. For a complete list of assigned polytopic membrane domains see Additional data files.
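The link-density criterion for a candidate family can be checked with a short sketch. It assumes that links are directed (query, hit) pairs from the all-against-all search, so that N members have N(N-1) possible links; the text does not state whether links are counted this way, and the function name `is_dense_cluster` is hypothetical. The clustering algorithm that proposes candidate families is not shown.

```python
from typing import Iterable, Set, Tuple


def is_dense_cluster(
    members: Iterable[str],
    links: Set[Tuple[str, str]],
    min_fraction: float = 0.9,
) -> bool:
    """Return True if the members share more than min_fraction * N * (N - 1) links.

    Assumption: each link is a directed (query, hit) pair, so a fully
    connected family of N members has N * (N - 1) links.
    """
    member_set = set(members)
    n = len(member_set)
    if n < 2:
        return False
    # Count only links between two distinct members of this candidate family.
    observed = sum(
        1 for a, b in links if a != b and a in member_set and b in member_set
    )
    return observed > min_fraction * n * (n - 1)


# Hypothetical four-member family with every directed link present.
names = ["d1", "d2", "d3", "d4"]
all_links = {(a, b) for a in names for b in names if a != b}
```

With four members there are 12 possible directed links, so the family passes with 11 links but fails with 10.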
TM-helix identification in the families of polytopic membrane domains
We assume that all protein domains in a classified family have a defined number of TM-helices. To identify the number of TM-helices, we made a hydrophobicity plot for each family of polytopic membrane domains. We took the aligned sequences in Pfam's families and in clustered families, and calculated the averaged GES hydrophobicity values of all the residues at each aligned position. (Deleted and inserted residues, represented by '-' and '.' respectively, are given individual values of 0.) The plot for each family was generated from the averaged GES values along their corresponding aligned positions. Most hydrophobic regions were clearly defined, as most TM-helices aligned well in each family. By identifying hydrophobic regions in the plots, we assigned numbers of TM-helices to classified families of polytopic membrane proteins. We also eliminated 3 Pfam-A and 20 Pfam-B families, as they did not contain multiple hydrophobic regions in their hydrophobicity plots. Therefore, we have 228 Pfam-A, 298 Pfam-B and 121 clustered families for further analysis.
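The per-position averaging can be sketched as below. The hydrophobicity scale is passed in as a dictionary, and the two values in the example are placeholders chosen for illustration, not actual GES values. Treating any character missing from the scale (including '-' and '.') as 0 is an assumption that follows the gap rule above, and the function name `average_hydrophobicity` is hypothetical.

```python
from typing import Dict, List


def average_hydrophobicity(alignment: List[str], scale: Dict[str, float]) -> List[float]:
    """Average a hydrophobicity scale over each column of an alignment.

    Characters absent from the scale, including the gap symbols '-' and
    '.', contribute 0 to the column average.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        total = sum(scale.get(row[col].upper(), 0.0) for row in alignment)
        profile.append(total / len(alignment))
    return profile


# Placeholder scale values for illustration only (not the GES scale).
toy_scale = {"L": 2.0, "K": -4.0}
toy_profile = average_hydrophobicity(["LK-", "L.K"], toy_scale)
```

Because gaps count as 0 but still enter the denominator, poorly aligned columns are pulled toward 0, which helps well-aligned hydrophobic stretches stand out in the plot.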
Analysis of amino-acid distribution and pair motifs
We analyzed 168 Pfam-A families with more than 20 members and generated consensus sequences, with sequence logos, of all aligned sequences in these families using the Alpro sequence logo program. The selected family size threshold of 20 members is somewhat arbitrary. We chose it because: first, a significant portion (~75%) of the 228 classified Pfam-A families had more than 20 members; and second, the potential bias from small families could be reduced, as they tend to have more conserved residues than big families. However, we can show that our results remain unaffected by changing this threshold. In particular, we analyzed Pfam-A families containing more than 25, 30, 35, or 40 members, and got essentially the same results. Amino acids with sequence conservation values (Rsequence) of no less than 3.0 (top 15% of all values) were considered as conserved residues. For all the families, we counted the occurrences of amino acids in the consensus sequences and in all aligned sequences in hydrophobic regions, which are defined as stretches of no fewer than 10 contiguous amino acids with GES hydrophobicity value greater than 0.
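The definition of a hydrophobic region can be illustrated with a minimal sketch that scans a list of per-position hydrophobicity values for runs of at least 10 contiguous positions above 0. The input format, the half-open 0-based output intervals, and the function name `hydrophobic_regions` are assumptions made for this example.

```python
from typing import List, Tuple


def hydrophobic_regions(
    profile: List[float], min_len: int = 10, threshold: float = 0.0
) -> List[Tuple[int, int]]:
    """Return (start, end) intervals, 0-based and end-exclusive, where the
    profile stays strictly above threshold for at least min_len positions."""
    regions = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold:
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_len:
                regions.append((start, i))
            start = None
    # Close a run that extends to the end of the profile.
    if start is not None and len(profile) - start >= min_len:
        regions.append((start, len(profile)))
    return regions


# Made-up profile: a 12-position run, a 9-position run, and a 10-position run.
demo_profile = [1.0] * 12 + [-1.0] * 3 + [0.5] * 9 + [0.0] + [2.0] * 10
demo_regions = hydrophobic_regions(demo_profile)
```

The 9-position run is rejected because it is shorter than 10, and the position with value exactly 0 breaks a run because the comparison is strict.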
We used the pair definition from a previous study. For example, a pair XYn (X and Y represent amino acids and n a number) corresponds to amino acids X and Y separated by (n-1) residues. We analyzed occurrences of pair motifs of all combinations of amino acids separated by 1 to 10 residues. This result was compared with a previous study of the 200 most significant over-represented pairs [12,33].
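The XYn notation can be made concrete with a short counting sketch: residue X at position i and residue Y at position i + n form the pair XYn, so GxxxG is GG4 and GxxxxxxG is GG7. The range of n scanned here (1 to `max_n`) is an adjustable assumption rather than the exact range used in the analysis, and the function name `count_pairs` is hypothetical.

```python
from collections import Counter


def count_pairs(helix: str, max_n: int = 10) -> Counter:
    """Count every pair XYn in a sequence, where X is at position i and
    Y is at position i + n, for n from 1 to max_n."""
    counts: Counter = Counter()
    seq = helix.upper()
    for i, x in enumerate(seq):
        for n in range(1, max_n + 1):
            j = i + n
            if j >= len(seq):
                break  # no partner residue this far along the sequence
            counts[f"{x}{seq[j]}{n}"] += 1
    return counts


# Made-up helix fragment with glycines at positions 0, 4, and 7.
pair_counts = count_pairs("GLLLGLLG")
```

For this fragment the glycines give one GG4, one GG7, and one GG3 pair; observed counts like these can then be compared with the counts expected by chance.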
Analysis of the families of polytopic membrane domain in genomes
Using simple cross-referencing based on the above procedure, proteomic entries in each genome were searched for matches of polytopic membrane domains of classified families. Numbers of membrane domains in classified families were counted and analyzed in all genomes studied.
Additional data files
M.G. thanks the Keck foundation for financial support. Y.L. is supported by an NLM postdoctoral fellowship. This research was supported in part by NIH grant T15 LM07056 from the National Library of Medicine. We thank Alessandro Senes and Steven Aller for helpful discussions.
- Paulsen IT, Sliwinski MK, Saier MHJ: Microbial genome analyses: global comparisons of transport capabilities based on phylogenies, bioenergetics and substrate specificities. J Mol Biol. 1998, 277: 573-592. 10.1006/jmbi.1998.1609.
- Paulsen IT, Nguyen L, Sliwinski MK, Rabus R, Saier MHJ: Microbial genome analyses: comparative transport capabilities in eighteen prokaryotes. J Mol Biol. 2000, 301: 75-100. 10.1006/jmbi.2000.3961.
- Gerstein M: A structural census of genomes: comparing bacterial, eukaryotic, and archaeal genomes in terms of protein structure. J Mol Biol. 1997, 274: 562-576. 10.1006/jmbi.1997.1412.
- Gerstein M: Patterns of protein-fold usage in eight microbial genomes: a comprehensive structural census. Proteins. 1998, 33: 518-534. 10.1002/(SICI)1097-0134(19981201)33:4<518::AID-PROT5>3.0.CO;2-J.
- Wallin E, von Heijne G: Genome-wide analysis of integral membrane proteins from eubacterial, archaean, and eukaryotic organisms. Protein Sci. 1998, 7: 1029-1038.
- Krogh A, Larsson B, von Heijne G, Sonnhammer EL: Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol. 2001, 305: 567-580. 10.1006/jmbi.2000.4315.
- Kyte J, Doolittle RF: A simple method for displaying the hydropathic character of a protein. J Mol Biol. 1982, 157: 105-132.
- Engelman DM, Steitz TA, Goldman A: Identifying nonpolar transbilayer helices in amino acid sequences of membrane proteins. Annu Rev Biophys Biophys Chem. 1986, 15: 321-353. 10.1146/annurev.bb.15.060186.001541.
- von Heijne G: Membrane protein structure prediction. Hydrophobicity analysis and the positive-inside rule. J Mol Biol. 1992, 225: 487-494.
- Jones DT, Taylor WR, Thornton JM: A mutation data matrix for transmembrane proteins. FEBS Lett. 1994, 339: 269-275. 10.1016/0014-5793(94)80429-X.
- Arkin IT, Brunger AT: Statistical analysis of predicted transmembrane alpha-helices. Biochim Biophys Acta. 1998, 1429: 113-128. 10.1016/S0167-4838(98)00225-8.
- Senes A, Gerstein M, Engelman DM: Statistical analysis of amino acid patterns in transmembrane helices: the GxxxG motif occurs frequently and in association with beta-branched residues at neighboring positions. J Mol Biol. 2000, 296: 921-936. 10.1006/jmbi.1999.3488.
- Russ WP, Engelman DM: The GxxxG motif: a framework for transmembrane helix-helix association. J Mol Biol. 2000, 296: 911-919. 10.1006/jmbi.1999.3489.
- Senes A, Ubarretxena-Belandia I, Engelman DM: The Calpha-H...O hydrogen bond: a determinant of stability and specificity in transmembrane helix interactions. Proc Natl Acad Sci USA. 2001, 98: 9056-9061. 10.1073/pnas.161280798.
- Bateman A, Birney E, Durbin R, Eddy SR, Howe KL, Sonnhammer EL: The Pfam protein families database. Nucleic Acids Res. 2000, 28: 263-266. 10.1093/nar/28.1.263.
- Bairoch A, Apweiler R: The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 2000, 28: 45-48. 10.1093/nar/28.1.45.
- Gerstein M: How representative are the known structures of the proteins in a complete genome? A comprehensive structural census. Fold Des. 1998, 3: 497-512.
- Pearson WR, Lipman DJ: Improved tools for biological sequence comparison. Proc Natl Acad Sci USA. 1988, 85: 2444-2448.
- Schneider TD, Stephens RM: Sequence logos: a new way to display consensus sequences. Nucleic Acids Res. 1990, 18: 6097-6100.
- Branden C, Tooze J: Introduction to Protein Structure. 1991, London: Garland Publishing.
- Corpet F, Servant F, Gouzy J, Kahn D: ProDom and ProDom-CG: tools for protein domain analysis and whole genome comparisons. Nucleic Acids Res. 2000, 28: 267-269. 10.1093/nar/28.1.267.
- Chang G, Roth CB: Structure of MsbA from E. coli: a homolog of the multidrug resistance ATP binding cassette (ABC) transporters. Science. 2001, 293: 1793-1800. 10.1126/science.293.5536.1793.
- Doyle DA, Morais Cabral J, Pfuetzner RA, Kuo A, Gulbis JM, Cohen SL, Chait BT, MacKinnon R: The structure of the potassium channel: molecular basis of K+ conduction and selectivity. Science. 1998, 280: 69-77. 10.1126/science.280.5360.69.
- Chang G, Spencer RH, Lee AT, Barclay MT, Rees DC: Structure of the MscL homolog from Mycobacterium tuberculosis: a gated mechanosensitive ion channel. Science. 1998, 282: 2220-2226. 10.1126/science.282.5397.2220.
- Murata K, Mitsuoka K, Hirai T, Walz T, Agre P, Heymann JB, Engel A, Fujiyoshi Y: Structural determinants of water permeation through aquaporin-1. Nature. 2000, 407: 599-605. 10.1038/35036519.
- Fu D, Libson A, Miercke LJ, Weitzman C, Nollert P, Krucinski J, Stroud RM: Structure of a glycerol-conducting channel and the basis for its selectivity. Science. 2000, 290: 481-486. 10.1126/science.290.5491.481.
- Bargmann C: Neurobiology of the Caenorhabditis elegans genome. Science. 1998, 282: 2028-2033. 10.1126/science.282.5396.2028.
- Apweiler R, Biswas M, Fleischmann W, Kanapin A, Karavidopoulou Y, Kersey P, Kriventseva EV, Mittard V, Mulder N, Phan I, Zdobnov E: Proteome Analysis Database: online application of InterPro and CluSTr for the functional classification of proteins in whole genomes. Nucleic Acids Res. 2001, 29: 44-48. 10.1093/nar/29.1.44.
- Eddy SR: Profile hidden Markov models. Bioinformatics. 1998, 14: 755-763. 10.1093/bioinformatics/14.9.755.
- Kaufman L, Rousseeuw PJ: Finding Groups in Data: An Introduction to Cluster Analysis. 1990, New York: John Wiley and Sons.
- Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22: 4673-4680.
- Index of genome/tms. [http://bioinfo.mbb.yale.edu/genome/tms]
- TMSTAT: statistical analysis of transmembrane sequences. [http://engelman.csb.yale.edu/tmstat/]