Polymorphism in regulatory gene sequences
© GenomeBiology.com 2000
Published: 20 December 2000
The extensive polymorphism revealed in non-coding gene-regulatory sequences, particularly in the immune system, suggests that this type of genetic variation is functionally and evolutionarily far more important than has been suspected, and provides a lead to new therapeutic strategies.
Table 1 (N-C, non-coding; C, coding)
Periodontal disease, osteoarthritis and others
A (not H)
N-C 5' and intron
N-C 5' and splice variant
Juvenile idiopathic arthritis
Asthma and atopy
RA, SLE and others (several groups)
C (signal sequence)
Atopic dermatitis
HIV, TB, JRA
C and intron
N-C 5' and intron (and C)
MHC class II
IL-12R/Tpm1 (transcription factor?)
Cutaneous leishmaniasis
The collection of genes in Table 1 is of miscellaneous origin. Some members, such as the polymorphisms in the major histocompatibility complex (MHC) class II promoters, emerged from a decades-long search for the genes responsible for variation in immune function and disease susceptibility. Tpm1 (T cell phenotype modifier-1), a major determinant of the Th1/Th2 balance between T-helper-cell subsets, falls into the same category. Others, such as the genes for interleukin (IL)-1, IL-4, IL-10 and tumor necrosis factor-α (TNFα), were first identified because of their functional importance, and were then scrutinized for allelic variants associated with disease. Still other polymorphisms, in interferon γ (IFNγ) and IL-5Rα for example, were first identified by microsatellite-based genome searches (quantitative trait locus analysis). This last category is the one most likely to grow in future: because it has no prior bias, it provides objective sampling of disease-associated genes. It has been validated by the concordance between searches carried out in humans and animals. The ACE1 gene, encoding the angiotensin I-converting enzyme involved in blood pressure regulation, is also included in Table 1 because the biological impact of its polymorphisms is so well understood. Substitutions in the regulatory regions of ACE1 are known to have a larger effect than those in the coding regions.
The preponderance of non-coding polymorphism
Non-coding polymorphisms make up a clear majority of the polymorphisms listed in Table 1. Furthermore, the coding polymorphisms include some special cases. The one in IL-2, for instance, is of minor interest because of its limited disease association. Those in Tpm1 and TGFβ are, in a sense, regulatory, as discussed below. The coding polymorphisms of the chemokine receptors were presumably selected by infection, and thus fall into the extrovert category (genes whose products interact with the external environment, such as pathogens) that has otherwise been excluded from this survey.
Most of the examples shown in Table 1 involve the orchestration of the immune response by tightly regulated cytokines and their receptors, where regulatory polymorphism might be expected to play an unusually important part. Reassuringly, increasing evidence indicates that cis-regulatory variation also predominates in the evolution of body shape, as recently reviewed. Carroll's review cites the central role of cis-regulatory elements in the evolution of modern maize from its ancient progenitor teosinte, as predicted.
It is striking how rich the immune system turns out to be in introvert polymorphism, given how important extrovert polymorphism is to it, most notably in the MHC. Indeed, in the past the genetics of the immune system was concerned almost exclusively with the extrovert function of antigen recognition. Furthermore, genetic variation of the extrovert type is a major feature of the evolution of the host-parasite relationship, extending far beyond the immune system. In the well-studied case of resistance to malaria, it includes the genes encoding the hemoglobins, glucose-6-phosphate dehydrogenase and the Duffy blood group. Further examples of parasite-selected coding polymorphisms can be expected, for instance in the host proteins that are exploited by intracellular bacteria such as Listeria and Salmonella. Extrovert genes vary not only between alleles (for example, the MHC and the nutrient-handling plant allozymes) but also between gene duplicates (as in lymphocyte antigen receptors, olfactory receptors and plant avirulence receptors). Yet in spite of this remarkable range, it is hard to believe that the extrovert genes make up more than a minor part of the total genome.
Heterozygote advantage - or not?
To what extent are polymorphisms in the regulatory regions subject to natural selection? They could in principle be neutral, transient (reflecting allele replacement or population mixing) or balanced (through increased fitness of the heterozygote). Balanced polymorphism is potentially valuable for the insight it provides into therapeutic approaches. Ideas that have been proposed for the balance of non-coding polymorphisms include: a low-rate ACE promoter for regulation of systemic blood pressure and a high-rate one for local wound healing; a constitutive insulin promoter for negative selection in the thymus and a regulated one for insulin production in the pancreas; a low-rate cytokine promoter for resistance to some types of parasite and a high-rate one for resistance to others; and a low-rate MHC class II promoter for Th2 responses and a high-rate one for Th1 responses. The common theme is that the alleles are expressed in different circumstances: as differentiation proceeds, one allele is transcribed in one type of cell and the other in another. All of these mechanisms would give heterozygotes a selective advantage, and would thus increase the flexibility of the system. These ideas seem reasonable enough, but so far lack firm support; little is known about differential allelic transcription in different cell types.
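Heterozygote advantage, the balancing mechanism invoked here, has a standard textbook consequence that a short calculation makes concrete. If the two homozygotes have fitnesses 1 - s and 1 - t relative to a heterozygote fitness of 1, selection does not eliminate either allele but drives the frequency of allele A towards the stable equilibrium p* = t/(s + t). The sketch below iterates the usual one-locus, two-allele selection recursion under random mating; the selection coefficients are hypothetical illustrative values, not estimates for any of the promoter alleles discussed.

```python
"""Overdominance (heterozygote advantage) at one locus with two alleles.

Assumed relative fitnesses (illustrative only): AA = 1 - s, Aa = 1, aa = 1 - t.
"""


def next_freq(p, s, t):
    """Frequency of allele A after one generation of selection (random mating)."""
    q = 1.0 - p
    w_AA, w_Aa, w_aa = 1.0 - s, 1.0, 1.0 - t
    mean_w = p * p * w_AA + 2.0 * p * q * w_Aa + q * q * w_aa
    # A-bearing gametes come from AA parents and half of the Aa parents
    return p * (p * w_AA + q * w_Aa) / mean_w


def simulate(p0, s, t, generations):
    """Iterate the selection recursion from starting frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_freq(p, s, t)
    return p


def equilibrium(s, t):
    """Stable interior equilibrium under overdominance (requires s > 0, t > 0)."""
    return t / (s + t)


if __name__ == "__main__":
    s, t = 0.1, 0.2  # hypothetical selection coefficients
    for p0 in (0.05, 0.5, 0.95):
        print(p0, simulate(p0, s, t, 500))
    print("predicted p* =", equilibrium(s, t))
```

Whatever the starting frequency, the runs approach p* = 0.2/0.3 = 2/3 for these values, which is the sense in which heterozygote advantage keeps both alleles in the population.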
The information obtained from disease associations may be compared with the much larger body of sequence comparisons emerging from studies using DNA microarrays, which do not (yet) relate to gene function. A recent study found sequence diversity to be almost identical for coding and non-coding regions, but with over twice as much silent as replacement substitution in the coding sequences. These newer data give the same overall impression as the disease associations surveyed here, even though the DNA microarray screen did not distinguish between regulatory and non-regulatory sequences. The study also made comparisons with sequence data from apes, but did not compare coding with non-coding regions in that analysis. A 15-fold variation in nucleotide diversity across genes was noted (the coefficient of variation was 74% for non-coding segments). This opens up a reasonable prospect of finding functionally significant 'nests' of single nucleotide polymorphisms (SNPs), such as those present in MHC class II promoters: sequences that overlap transcription factor binding sites and that are rich in SNPs. Of particular interest are transcription factor recognition elements such as the cAMP response element (CRE), which occurs in many genes and so allows its level of polymorphism to be compared in differing circumstances. The CRE mediates activation of target genes in response to a diverse array of stimuli, including peptide hormones, growth factors and neuronal activity. Our published and unpublished data on the CRE sequences in MHC gene promoters indicate that the level of polymorphism varies significantly between genes with different functions and, as expected, tends to localize not so much in the CRE sequence itself as in its immediate neighborhood.
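The two summary statistics quoted from that study can be made concrete. Nucleotide diversity (π) is the average proportion of differing sites between pairs of sequences from a sample, and the coefficient of variation across genes is the standard deviation of the per-gene π values divided by their mean. The sketch below computes both, plus the max/min fold range, for invented toy alignments; treating the sequences as gap-free and equal in length is a simplifying assumption, and this is not the pipeline used in the cited study.

```python
from itertools import combinations
from statistics import mean, stdev


def nucleotide_diversity(seqs):
    """Mean pairwise proportion of differing sites (pi) for aligned sequences.

    Assumes gap-free sequences of equal length (a simplification).
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total_diffs / (len(pairs) * length)


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage."""
    return 100.0 * stdev(values) / mean(values)


def fold_range(values):
    """Ratio of the largest to the smallest value (e.g. a '15-fold' range)."""
    return max(values) / min(values)


if __name__ == "__main__":
    # Invented haplotypes: four sequences per segment
    segments = {
        "segment_a": ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC", "ACGTACGTAC"],
        "segment_b": ["AAAACCCCGG", "AAATCCCCGA", "AAAACCGCGG", "AAATCCCCGG"],
    }
    pis = {name: nucleotide_diversity(seqs) for name, seqs in segments.items()}
    print(pis)
    values = list(pis.values())
    print("fold range:", fold_range(values))
    print("CV (%):", coefficient_of_variation(values))
```

In real data each segment would have its own length and sample size; the same per-site normalization keeps the π values comparable between segments.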
Care is called for in interpreting these evolutionary patterns. One cannot just extrapolate from a snapshot of contemporary intraspecific variation, even though it is this that provides much of the raw material for long-term change. It may be reassuring to find the importance of cis-regulatory elements repeating itself in the evolution of mice, maize and the vertebrate body, as emphasized here, but that does not mean that the course of evolution is so simple.
New avenues for therapy
The importance of non-coding polymorphism sends a strong message to therapeutic strategists: search the genome for sites of high non-coding polymorphism, identified as SNP nests, with whatever tools come to hand. They will tell you where nature has found a way of intervening at an important checkpoint. Follow her lead! But it is not quite that simple. The counter-argument is that nature tends to conserve functionally important sequences, so the art of therapeutic strategy will be to strike the right balance between these two messages. It would make things easier if we could point to some really important checkpoint that has already been identified by genetics alone. That has not yet been achieved, but current progress in genomics suggests that we may not have long to wait.
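The search for SNP nests can be sketched as a sliding-window scan over SNP coordinates, followed by a check for overlap with transcription factor binding sites. The window size (`window`), the minimum SNP count (`min_snps`) and the binding-site coordinates are hypothetical parameters chosen for illustration; the text does not define what SNP density qualifies as a nest.

```python
from bisect import bisect_left


def find_snp_nests(snp_positions, window, min_snps):
    """Merge windows of `window` bp that contain at least `min_snps` SNPs.

    Returns a list of (first_snp, last_snp) coordinates, one per nest.
    """
    pos = sorted(set(snp_positions))
    nests = []
    for i, start in enumerate(pos):
        # pos[i:j] are the SNPs in the half-open window [start, start + window)
        j = bisect_left(pos, start + window)
        if j - i < min_snps:
            continue
        end = pos[j - 1]
        if nests and start <= nests[-1][1]:
            # Window starts inside the previous nest: extend that nest
            nests[-1] = (nests[-1][0], max(nests[-1][1], end))
        else:
            nests.append((start, end))
    return nests


def nests_overlapping_sites(nests, sites):
    """Keep nests overlapping any inclusive (start, end) binding-site interval."""
    return [
        (n_start, n_end)
        for n_start, n_end in nests
        if any(s_start <= n_end and n_start <= s_end for s_start, s_end in sites)
    ]


if __name__ == "__main__":
    snps = [5, 12, 18, 25, 90, 150, 152, 300]  # made-up SNP positions
    nests = find_snp_nests(snps, window=25, min_snps=2)
    print(nests)                                       # [(5, 25), (150, 152)]
    print(nests_overlapping_sites(nests, [(20, 27)]))  # [(5, 25)]
```

Requiring overlap with a binding site reflects the text's definition of a nest as a SNP-rich sequence that overlaps transcription factor binding sites; the threshold that separates a nest from background variation would have to be calibrated on real data.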
Hitherto the collection of human polymorphism data has been a sort of cottage industry, in which various groups have chosen to focus on different genes without much rhyme or reason. A more systematic approach can now be formulated, based on whole-genome sequencing. Before long the mouse genome, and later the chimpanzee genome, will be sequenced. Then the following scenario could be applied. First, proximal promoters in the human genome would be identified via the Eukaryotic Promoter Database [14,15] and by other means. Next, upstream cis-regulatory sequences would be identified via their conservation between mouse and human, among other means; conservation between chimp and human may be too high for this kind of use. Finally, the cis-regulatory sequences identified in this way would be scanned for divergence between human and chimpanzee, thus identifying the sites that have responded most to selective pressure. One suspects that divergence between mouse and human would be too high for this last step, as the differences driven by selection would be submerged in junk variation. The sites identified in this way would specify candidate checkpoints of importance to the functioning of the body. But before setting out to exploit this information, it would be wise to find out whether these candidates were also sites of polymorphism in humans. Intraspecific and short-term interspecific variation may not always run in parallel, as mentioned above, but it will be reassuring when they do.
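The scenario can be sketched as a filter-and-rank pass over a three-way alignment. This is a minimal illustration under strong assumptions: the human, mouse and chimpanzee sequences are taken to be already aligned without gaps, the region is assumed to have been extracted using promoter coordinates (from the Eukaryotic Promoter Database or elsewhere), and the window size and identity threshold are hypothetical. Human-mouse identity stands in for conservation, human-chimp mismatch for divergence, and known human SNPs for intraspecific polymorphism; `rank_checkpoints` and `Candidate` are illustrative names, not an existing tool.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    start: int               # window start (0-based, inclusive)
    end: int                 # window end (exclusive)
    mouse_identity: float    # fraction of identical sites, human vs mouse
    chimp_divergence: float  # fraction of differing sites, human vs chimp
    polymorphic: bool        # True if any known human SNP falls in the window


def identity(a, b):
    """Fraction of identical positions in two equal-length aligned strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def rank_checkpoints(human, mouse, chimp, human_snps, window=20, min_identity=0.8):
    """Hypothetical filter-and-rank pass over a gap-free three-way alignment."""
    if not len(human) == len(mouse) == len(chimp):
        raise ValueError("expects a gap-free three-way alignment of equal length")
    snps = set(human_snps)
    candidates = []
    # Non-overlapping windows; any trailing partial window is ignored
    for start in range(0, len(human) - window + 1, window):
        end = start + window
        h = human[start:end]
        mouse_id = identity(h, mouse[start:end])
        if mouse_id < min_identity:
            continue  # not conserved in mouse, so not treated as cis-regulatory
        chimp_div = 1.0 - identity(h, chimp[start:end])
        polymorphic = any(start <= p < end for p in snps)
        candidates.append(Candidate(start, end, mouse_id, chimp_div, polymorphic))
    # Most human-chimp divergence first; polymorphic windows win ties
    candidates.sort(key=lambda c: (c.chimp_divergence, c.polymorphic), reverse=True)
    return candidates


if __name__ == "__main__":
    human = "A" * 20 + "C" * 20
    mouse = "A" * 20 + "C" * 10 + "G" * 10  # second window poorly conserved
    chimp = "A" * 18 + "TT" + "C" * 20      # two human-chimp differences
    for cand in rank_checkpoints(human, mouse, chimp, human_snps=[3]):
        print(cand)
```

In this toy input only the first window passes the conservation filter; it shows 10% human-chimp divergence and carries a human SNP, which makes it the kind of candidate the scenario would pass on for further checking.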
My study of progressive evolutionary divergence in MHC gene promoters in the series Mus musculus - M. pahari (approximately 4 million years) - rat (approximately 10 million years) - mole rat (over 10 million years) provides, I hope, a rehearsal of this scenario. One can watch the divergence evident in modern laboratory mice, which is located preferentially around the transcription factor binding sites, becoming gradually swamped by random divergence (presumably junk) as one moves back in time. For humans, the trick will be to find an interspecies comparison in which the divergence in cis-regulatory sequences is high enough to be informative without being swamped by junk. With luck, the comparison with the chimpanzee may provide much of what is needed. If not, then another one or two primate genomes may be required, and the additional information that they could provide would surely justify the effort. One might expect the human-chimp comparison to tell us more about checkpoints in the brain than in the cardiovascular system, for instance. Because the genome will not behave in the same way throughout, having a few comparisons to choose from should in any case be useful. In summary, one can envisage a royal road running from comparative genomics to the identification of key checkpoints, and then on to novel drug discovery.
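The pattern described here, divergence concentrated around transcription factor binding sites against a background of random divergence, can be quantified by computing the proportion of differing sites (p-distance) separately inside and outside binding-site neighborhoods. The sketch below assumes gap-free aligned promoter sequences and uses hypothetical binding-site coordinates and flank width; an inside/outside ratio well above 1 would correspond to the localized pattern, and a ratio drifting towards 1 to the swamping seen in deeper comparisons.

```python
def divergence_by_region(ref, other, sites, flank=0):
    """Return (inside, outside) p-distances between two aligned sequences.

    `sites` are hypothetical half-open (start, end) binding-site intervals;
    `flank` widens each interval on both sides to include its neighborhood.
    """
    if len(ref) != len(other):
        raise ValueError("sequences must be aligned to equal length")
    inside = set()
    for start, end in sites:
        inside.update(range(max(0, start - flank), min(len(ref), end + flank)))
    outside = [i for i in range(len(ref)) if i not in inside]
    in_diff = sum(ref[i] != other[i] for i in inside)
    out_diff = sum(ref[i] != other[i] for i in outside)
    p_in = in_diff / len(inside) if inside else 0.0
    p_out = out_diff / len(outside) if outside else 0.0
    return p_in, p_out


if __name__ == "__main__":
    ref = "ACGTACGTACGTACGTACGT"
    other = "ACGTACGAACTTACGTAGGT"  # differs at positions 7, 10 and 17
    p_in, p_out = divergence_by_region(ref, other, sites=[(8, 12)], flank=2)
    print(p_in, p_out, p_in / p_out)
```

Here two of the eight positions in the widened site differ (0.25) against one of the twelve outside (1/12), a threefold enrichment. Repeating the calculation for each species pair in a series such as the one above would show whether that enrichment fades with evolutionary distance.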
The Leverhulme and Wellcome Trusts supported this work.
- Mitchison A: Partitioning of genetic variation between regulatory and coding gene segments: the predominance of software variation in genes encoding introvert proteins. Immunogenetics. 1997, 46: 46-52. 10.1007/s002510050241.
- Mitchison NA, Schuhbauer D, Muller B: Natural and induced regulation of Th1/Th2 balance. Springer Semin Immunopathol. 1999, 21: 199-210. 10.1007/s002810050063.
- Cargill M, Altshuler D, Ireland J, Sklar P, Ardlie K, Patil N, Shaw N, Lane CR, Lim EP, Kalayanaraman N, et al: Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nat Genet. 1999, 22: 231-238. 10.1038/10290.
- Becker KG, Simon RM, Bailey-Wilson JE, Freidlin B, Biddison WE, McFarland HF, Trent JM: Clustering of non-major histocompatibility complex susceptibility candidate loci in human autoimmune diseases. Proc Natl Acad Sci USA. 1998, 95: 9979-9984. 10.1073/pnas.95.17.9979.
- Smith MW, Dean M, Carrington M, Winkler C, Huttley GA, Lomb DA, Goedert JJ, O'Brien TR, Jacobson LP, Kaslow R, et al: Contrasting genetic influence of CCR2 and CCR5 variants on HIV-1 infection and disease progression. Hemophilia Growth and Development Study (HGDS), Multicenter AIDS Cohort Study (MACS), Multicenter Hemophilia Cohort Study (MHCS), San Francisco City Cohort (SFCC), ALIVE Study. Science. 1997, 277: 959-965. 10.1126/science.277.5328.959.
- Online Mendelian Inheritance in Man. [http://www.ncbi.nlm.nih.gov/Omim/]
- Carroll SB: Endless forms: the evolution of gene regulation and morphological diversity. Cell. 2000, 101: 577-580.
- Ashley-Koch A, Yang Q, Olney RS: Sickle hemoglobin (HbS) allele and sickle cell disease: a HuGE review. Am J Epidemiol. 2000, 151: 839-845.
- Ruwende C, Hill A: Glucose-6-phosphate dehydrogenase deficiency and malaria. J Mol Med. 1998, 76: 581-588. 10.1007/s001090050253.
- Pier GB, Grout M, Zaidi T, Meluleni G, Mueschenborn SS, Banting G, Ratcliff R, Evans MJ, Colledge WH: Salmonella typhi uses CFTR to enter intestinal epithelial cells. Nature. 1998, 393: 79-82. 10.1038/30006.
- Mitchison NA, Muller B, Segal RM: Natural variation in immune responsiveness, with special reference to immunodeficiency and promoter polymorphism in class II MHC genes. Hum Immunol. 2000, 61: 177-181. 10.1016/S0198-8859(99)00141-X.
- Halushka MK, Fan JB, Bentley K, Hsie L, Shen N, Weder A, Cooper R, Lipshutz R, Chakravarti A: Patterns of single-nucleotide polymorphisms in candidate genes for blood-pressure homeostasis. Nat Genet. 1999, 22: 239-247. 10.1038/10297.
- Cowell LG, Kepler TB, Janitz M, Lauster R, Mitchison NA: The distribution of variation in regulatory gene segments, as present in MHC class II promoters. Genome Res. 1998, 8: 124-134.
- Perier RC, Junier T, Bonnard C, Bucher P: The Eukaryotic Promoter Database (EPD): recent developments. Nucleic Acids Res. 1999, 27: 307-309. 10.1093/nar/27.1.307.
- Eukaryotic Promoter Database. [http://www.epd.isb-sib.ch]
- Duret L, Bucher P: Searching for regulatory elements in human noncoding sequences. Curr Opin Struct Biol. 1997, 7: 399-406. 10.1016/S0959-440X(97)80058-9.
- Kornman KS, Crane A, Wang HY, di Giovine FS, Newman MG, Pirk FW, Wilson TG, Higginbottom FL, Duff GW: The interleukin-1 genotype as a severity factor in adult periodontal disease. J Clin Periodontol. 1997, 24: 72-77.
- Moos V, Rudwaleit M, Herzog V, Hölig K, Sieper J, Muller B: Association of genotypes affecting the expression of interleukin-1β or interleukin-1 receptor antagonist with osteoarthritis. Arthritis Rheum. 2000, 43: 2417-2422. 10.1002/1529-0131(200011)43:11<2417::AID-ANR7>3.0.CO;2-R.
- Denny P, Lord CJ, Hill NJ, Goy JV, Levy ER, Podolin PL, Peterson LB, Wicker LS, Todd JA, Lyons PA: Mapping of the IDDM locus Idd3 to a 0.35-cM interval containing the interleukin-2 gene. Diabetes. 1997, 46: 695-700.
- Chouchane L, Sfar I, Bousaffara R, El Kamel A, Sfar MT, Ismail A: A repeat polymorphism in interleukin-4 gene is highly associated with specific clinical phenotypes of asthma. Int Arch Allergy Immunol. 1999, 120: 50-55. 10.1159/000024219.
- Daser A, Koetz K, Bätjer N, Jung M, Rüschendorf F, Goltz M, Ellerbrok H, Renz H, Walter J, Paulsen M: Genetics of atopy in a mouse model: polymorphism of the IL-5 receptor α chain. Immunogenetics. 2000, 51: 632-638. 10.1007/s002510000206.
- Fishman D, Faulds G, Jeffery R, Mohamed-Ali V, Yudkin JS, Humphries S, Woo P: The effect of novel polymorphisms in the interleukin-6 (IL-6) gene on IL-6 transcription and plasma IL-6 levels, and an association with systemic-onset juvenile chronic arthritis. J Clin Invest. 1998, 102: 1369-1376.
- Eskdale J, Kube D, Tesch H, Gallagher G: Mapping of the human IL10 gene and further characterization of the 5' flanking sequence. Immunogenetics. 1997, 46: 120-128. 10.1007/s002510050250.
- Heinzmann A, Mao X, Akaiwa M, Kreomer RT, Gao P, Ohshima K, Umeshita R, Abe Y, Braun S, Yamashita T, et al: Genetic variants of IL-13 signalling and human asthma and atopy. Hum Mol Genet. 2000, 9: 549-559. 10.1093/hmg/9.4.549.
- Barnes KC, Freidhoff LR, Nickel R, Chiu YF, Juo SH, Hizawa N, Naidu RP, Ehrlich E, Duffy DL, et al: Dense mapping of chromosome 12q13.12-q23.3 and linkage to asthma and atopy. J Allergy Clin Immunol. 1999, 104: 485-491.
- Hutchinson IV: The role of transforming growth factor-beta in transplant rejection. Transplant Proc. 1999, 31: 9S-13S. 10.1016/S0041-1345(99)00785-X.
- Jiang Y, Hirose S, Sanokawa-Akakura R, Abe M, Mi X, Li N, Miura Y, Shirai J, Zhang D, Hamano Y, Shirai T: Genetically determined aberrant down-regulation of FcgammaRIIB1 in germinal center B cells associated with hyper-IgG and IgG autoantibodies in murine systemic lupus erythematosus. Int Immunol. 1999, 11: 1685-1691. 10.1093/intimm/11.10.1685.
- McGinnis RE, Spielman RS: Linkage disequilibrium in the insulin gene region: size variation at the 5' flanking polymorphism and bimodality among "class I" alleles. Am J Hum Genet. 1994, 55: 526-532.
- Nickel RG, Casolaro V, Wahn U, Beyer K, Barnes KC, Plunkett BS, Freidhoff LR, Sengler C, Plitt JR, Schleimer RP, et al: Atopic dermatitis is associated with a functional mutation in the promoter of the C-C chemokine RANTES. J Immunol. 2000, 164: 1612-1616.
- Blackwell JM, Searle S: Genetic regulation of macrophage activation: understanding the function of Nramp1 (=Ity/Lsh/Bcg). Immunol Lett. 1999, 65: 73-80. 10.1016/S0165-2478(98)00127-8.
- Lee PL, Gelbart T, West C, Halloran C, Beutler E: The human Nramp2 gene: characterization of the gene structure, alternative splicing, promoter region and polymorphisms. Blood Cells Mol Dis. 1998, 24: 199-215. 10.1006/bcmd.1998.0186.
- Villard E, Tiret L, Visvikis S, Rakotovao R, Cambien F, Soubrier F: Identification of new polymorphisms of the angiotensin I-converting enzyme (ACE) gene, and study of their relationship to plasma ACE levels by two-QTL segregation-linkage analysis. Am J Hum Genet. 1996, 58: 1268-1278.
- Challah M, Villard E, Philippe M, Ribadeau-Dumas A, Giraudeau B, Janiak P, Vilaine JP, Soubrier F, Michel JB: Angiotensin I-converting enzyme genotype influences arterial response to injury in normotensive rats. Arterioscler Thromb Vasc Biol. 1998, 18: 235-243.
- Martin MP, Dean M, Smith MW, Winkler C, Gerrard B, Michael NL, Lee B, Doms RW, Margolick J, Buchbinder S, et al: Genetic acceleration of AIDS progression by a promoter variant of CCR5. Science. 1998, 282: 1907-1911. 10.1126/science.282.5395.1907.
- Guler ML, Gorham JD, Dietrich WF, Murphy TL, Steen RG, Parvin CA, Fenoglio D, Grupe A, Peltz G, Murphy KM: Tpm1, a locus controlling IL-12 responsiveness, acts by a cell-autonomous mechanism. J Immunol. 1999, 162: 1339-1347.