The signature of long-standing balancing selection at the human defensin β-1 promoter
Genome Biology volume 9, Article number: R143 (2008)
Defensins, small endogenous peptides with antimicrobial activity, are pivotal components of the innate immune response. A large cluster of defensin genes is located on human chromosome 8p; among them, the beta defensin 1 (DEFB1) promoter has been extensively studied since the discovery that specific polymorphisms and haplotypes associate with asthma and atopy, susceptibility to severe sepsis, and predisposition to HIV and Candida infection.
Here, we characterize the sequence variation and haplotype structure of the DEFB1 promoter region in six human populations. In all of them, we observed high levels of nucleotide variation, an excess of intermediate-frequency alleles, reduced population differentiation and a genealogy with common haplotypes separated by deep branches. Indeed, a significant departure from the expectation of evolutionary neutrality was observed in all populations and the possibility that this is due to demographic history alone was ruled out. Also, we verified that the selection signature is restricted to the promoter region and not due to a linked balanced polymorphism. A phylogeny-based estimation indicated that the two major haplotype clades separated around 4.5 million years ago, approximately the time when the human and chimpanzee lineages split.
Altogether, these features represent strong molecular signatures of long-term balancing selection, a process that is thought to be extremely rare outside major histocompatibility complex genes. Our data indicate that the DEFB1 promoter region carries functional variants and support previous hypotheses whereby alleles predisposing to atopic disorders are widespread in modern societies because they conferred resistance to pathogens in ancient settings.
Defensins comprise a large family of small endogenous peptides with antimicrobial activity against a wide range of microorganisms [1, 2]. Although they were initially regarded as pivotal components of the innate immune system, recent evidence has indicated that defensins also play roles in the recruitment of adaptive immune cells and in promoting antigen-specific immune responses.
In humans two defensin subfamilies have been described (α and β), the structural difference residing in the linear spacing and pairing of their six conserved cysteine residues. While α-defensins are expressed by neutrophils and intestinal Paneth cells, β-defensins are mainly produced by epithelia .
In mammals, defensins represent large multigene families and a major defensin cluster localizes to human chromosome 8p22-23, where several α- and β-defensin genes are located. Recent evidence  has indicated that β-defensin genes on chromosome 8p originated by successive rounds of duplication followed by a complex evolutionary history involving both negative and positive selection with variable pressures among mammalian lineages . Given the relevance of defensins in antimicrobial response and the conundrum whereby increased protein sequence diversity in the immune system enhances the spectrum of pathogen recognition, defensin coding exons have attracted much more interest in evolutionary studies compared to noncoding sequences. Yet, growing evidence suggests that 5' cis regulatory regions of genes such as CCR5 , HLA-G , HLA-DQA1  and HLA-DPA1/DPB1  have been subjected to balancing selection during recent primate history.
Among defensins, the human β-defensin 1 (DEFB1 [OMIM *602056]) promoter has been extensively studied since specific polymorphisms and haplotypes of it have been associated with asthma and atopy , susceptibility to severe sepsis , as well as HIV [14, 15] and Candida  infection predisposition. Moreover, recent evidence  has indicated that reduced expression of DEFB1 is found in a high percentage of renal and prostate cancers, therefore suggesting that DEFB1 acts as a tumor suppressor gene. These findings, together with the demonstrated functional significance of polymorphisms within DEFB1 5' regulatory sequence, indicate that this region might represent a target of natural selection.
Nucleotide diversity at the DEFB1 promoter region
We sequenced the 1,400 bp region immediately upstream of the DEFB1 translation start site (Figure 1) in 83 individuals of different ethnic origins (Yoruba from Nigeria (YRI), Asians (AS), South American Indians (SAI) and Australian Aborigines (AUA)); additional data derived from full gene resequencing of 47 subjects (24 African Americans (AA) and 23 European Americans (EA)) were retrieved from the Innate Immunity PGA (IIPGA) web site. A total of 27 single nucleotide polymorphisms (SNPs) were identified and haplotypes (Additional data file 1) were inferred using PHASE [20, 21]. The analyzed region encompasses all polymorphic variants previously shown to modulate DEFB1 expression levels. As a control for the AA and EA populations, promoter-region data for other innate immunity genes were retrieved from the IIPGA. In particular, the 2 kb upstream of the translation initiation site of other innate immunity genes genotyped for AA and EA were retrieved only if the initial ATG was located in the first exon (as it is for DEFB1) and if it could be unequivocally identified. Also, promoter regions were discarded if located in recombination hotspots or in resequencing gaps. A total of 20 promoter regions finally constituted the control dataset. Data concerning the number of segregating sites and nucleotide diversity at the DEFB1 promoter region are summarized in Table 1 and indicate that both θW and π are higher for DEFB1 than the maximum values calculated for IIPGA gene promoters.
We excluded that the high degree of polymorphisms at the DEFB1 promoter is due to non-allelic gene conversion with other paralogous defensin genes on chromosome 8 by applying Sawyer's gene conversion algorithm .
Under neutral evolution, the amount of within-species diversity is predicted to correlate with levels of between-species divergence, since both depend on the neutral mutation rate. The HKA test is commonly used to check whether this expectation holds. We performed both pairwise and maximum-likelihood (MLHKA) tests with Rhesus macaque as an outgroup (instead of chimpanzee), since the greater divergence time results in more fixed differences and improves power to detect selection. For pairwise HKA tests we compared polymorphism and divergence levels at the promoter region of DEFB1 with those of the 20 IIPGA genes; we consider these comparisons to be well-suited since lower sequence conservation and faster evolutionary rates are thought to be a widespread feature of immune response genes [28, 29]. Since IIPGA data refer to AA and EA, only these populations were used in the comparison; pairwise HKA tests (Table 2) yielded significant results (p < 0.05) in 11 out of 20 cases (with 5 additional tests yielding p < 0.10), suggesting increased diversity at the DEFB1 promoter compared to most loci. For further confirmation, we performed an MLHKA test by comparing the DEFB1 5' region to all 20 promoter regions: a significant result was obtained (k = 3.31, p = 0.0018).
Another expectation for neutrally evolving genes is that values of θW and π are roughly equal; this is the case for the maximum values of innate immunity gene promoters but not for DEFB1, which shows greater π than θW, a finding consistent with an excess of intermediate frequency variants as a result of balancing selection . The statistics Tajima's D  and Fu and Li's D* and F*  are commonly used to evaluate the difference between θW and π and, therefore, to test departure from neutrality. As shown in Table 1, significantly positive values for the DEFB1 promoter of one or more statistics were obtained for all analyzed populations.
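The comparison between θW and π that underlies Tajima's D can be illustrated with a minimal sketch. The code below is not the libsequence implementation used in this study; it applies the standard published formula for Tajima's D to toy haplotypes written as strings, and the function name and example data are hypothetical. Both θW and π are reported for the whole region rather than per site.

```python
from math import sqrt

def diversity_stats(haplotypes):
    """Return (S, theta_W, pi, Tajima's D) for equal-length haplotype strings.

    theta_W and pi are region-wide values (not divided by sequence length).
    """
    n = len(haplotypes)
    seg_sites = 0
    pi = 0.0
    for column in zip(*haplotypes):
        counts = {}
        for allele in column:
            counts[allele] = counts.get(allele, 0) + 1
        if len(counts) < 2:
            continue  # monomorphic site, contributes nothing
        seg_sites += 1
        # Probability that two distinct sampled sequences differ at this site.
        same_pairs = sum(c * (c - 1) for c in counts.values())
        pi += 1.0 - same_pairs / (n * (n - 1))

    if seg_sites == 0:
        return 0, 0.0, 0.0, float("nan")

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    theta_w = seg_sites / a1

    # Constants of the variance term in Tajima's D.
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    d = (pi - theta_w) / sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))
    return seg_sites, theta_w, pi, d

# Toy data (hypothetical): two deep clades at equal frequency versus singletons only.
balanced = ["0000"] * 5 + ["1111"] * 5
singletons = ["1000", "0100", "0010", "0001"] + ["0000"] * 6
```

With the toy data, the two-clade sample gives π greater than θW and a positive D, whereas the singleton-only sample gives a negative D, which is the contrast the text draws between balancing selection and an excess of rare variants.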
It should be noted that population history, in addition to selective processes, is known to affect frequency spectra and, therefore, all related statistics such as Tajima's D and Fu and Li's D* and F*. In particular, positive values of the statistics are expected under a scenario of population contraction, while negative values are consistent with an increase in population size [31, 33]. We performed all tests under the standard assumption of constant population size, which is unrealistic for human populations. Still, this approach is conservative when applied to African populations since they are thought to have undergone moderate but uninterrupted population expansion; in the case of non-African populations the effects of demography are more difficult to disentangle from balancing selection signatures since bottlenecks possibly occurred following migration out of Africa. One way to circumvent this problem is to exploit the fact that selection acts on a single locus while demography affects the whole genome. As shown in Table 1, Tajima's D, as well as Fu and Li's F* and D*, displays far higher values in the case of DEFB1 compared to the maximum values of innate immunity gene promoters in EA. In order to obtain a more extensive comparison that included YRI and subjects of Asian ancestry, we retrieved information concerning 231 genes resequenced in AA, EA, AS and YRI from the NIEHS SNPs Program (NIEHS panel 2). In particular, for each gene a 5 kb region was randomly selected; the only requirement was that it did not contain any long (>500 bp) resequencing gaps. Genes that could not fulfill this requirement were discarded, as were 5 kb regions displaying fewer than six SNPs. The numbers of analyzed regions for AA, YRI, EA and AS were 209, 203, 177 and 172, respectively. We calculated the percentile rank of DEFB1 values in the distributions of Tajima's D and Fu and Li's F* and D* for this set of loci.
In analogy to the results obtained above, values for DEFB1 ranked above the 95th percentile in all populations (except for Tajima's D in YRI, which ranked 93rd). It is worth mentioning that, as already noticed by other authors , resequenced genes in SNP discovery programs probably represent a sample biased toward non-neutrally evolving loci (in the case of the NIEHS SNPs Program, genes are selected on the basis of their having a role in organism-environment interactions), making comparison with their distribution a conservative test.
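The percentile-rank comparison against an empirical distribution can be sketched as follows. This is an illustration only: the exact definition used here (strictly below versus ties included) is not stated, so the strict version is an assumption, and the function name and numbers are hypothetical.

```python
def percentile_rank(value, background):
    """Percentage of background values that fall strictly below `value`."""
    if not background:
        raise ValueError("background distribution is empty")
    below = sum(1 for x in background if x < value)
    return 100.0 * below / len(background)

# Hypothetical Tajima's D values for a handful of background regions.
background_d = [-1.2, -0.4, 0.1, 0.6]
rank = percentile_rank(2.5, background_d)
```

A locus is then flagged as an outlier when its rank exceeds a chosen threshold such as the 95th percentile.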
A second possibility to disentangle the effect of demographic history from selection is to apply calibrated population genetics models. In particular, one such model that has been proposed recently, cosi , is based on the ability to generate realistic data rather than relying on inference about population histories. We performed coalescent simulations using the cosi package  and its best-fit population parameters for YRI, AA, EA and AS. Data are reported in Table 1 and indicate that for Tajima's D, as well as for Fu and Li's D* and F*, application of a calibrated model allows rejection of neutrality for the four populations at the DEFB1 promoter region.
Population genetic differentiation, quantified by FST , can also be used to detect the signature of balancing selection. In particular, lower FST values are expected at loci under balancing selection compared to neutrally evolving ones [39, 40]. FST among AA, EA and AS was 0.0057, much lower than the genome average of 0.123  and not significantly different from 0 (p = 0.25).
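As an illustration of a sequence-based FST and its permutation test, the sketch below computes FST as 1 - Hw/Hb, where Hw and Hb are the mean numbers of differences between haplotypes drawn from the same or from different populations. This is a simplified reading of the Hudson et al. approach cited in Materials and methods: pooling all within-population pairs is an assumption made here for brevity, the function names and toy data are hypothetical, and the code is not the pipeline used in this study.

```python
import random
from itertools import combinations

def mean_diff(pairs):
    """Mean number of differing sites over a list of haplotype pairs."""
    diffs = [sum(a != b for a, b in zip(x, y)) for x, y in pairs]
    return sum(diffs) / len(diffs)

def fst_hw_hb(populations):
    """FST = 1 - Hw/Hb; needs >= 2 populations and >= 1 within-population pair."""
    within = [p for pop in populations for p in combinations(pop, 2)]
    between = [(x, y) for p, q in combinations(populations, 2)
               for x in p for y in q]
    hb = mean_diff(between)
    if hb == 0:
        return 0.0  # no differences at all: treat as undifferentiated
    return 1.0 - mean_diff(within) / hb

def permutation_p(populations, n_perm=1000, seed=0):
    """Fraction of label permutations with FST >= the observed value."""
    rng = random.Random(seed)
    observed = fst_hw_hb(populations)
    pooled = [h for pop in populations for h in pop]
    sizes = [len(pop) for pop in populations]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if fst_hw_hb(shuffled) >= observed:
            hits += 1
    return hits / n_perm

# Toy data (hypothetical): complete differentiation versus shared haplotypes.
fixed = [["00", "00"], ["11", "11"]]
shared = [["00", "11"], ["00", "11"]]
```

When the same haplotypes occur at similar frequencies in every population, as expected under balancing selection, Hw approaches Hb and FST falls toward zero (or below zero in small samples).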
We next wished to verify that the evolution of the DEFB1 promoter is not influenced by the presence of a linked balanced polymorphism within, for example, the gene coding region. We exploited the availability of full resequencing data for the whole gene and calculated human-macaque divergence, nucleotide diversity, Tajima's D and FST in sliding windows for AA and EA. As shown in Figure 1, while inter-specific divergence is quite homogeneous along DEFB1, a peak in nucleotide diversity (especially π) is observed at the promoter; consistently, in both AA and EA, the same region displays the maximum Tajima's D value and the minimum FST, with no other region showing evidence suggestive of balancing selection.
It should be noted that several defensin genes on 8p23.1, but not DEFB1, exhibit copy number variation (CNV) in humans; a more recent genome-wide analysis of CNVs indicated that the 5' gene region of DEFB1 might be encompassed by a CNV, although the authors indicate that, since the breakpoints are difficult to establish, involved loci might flank rather than be encompassed by the CNVs. The authors studied HapMap subjects and reported a frequency for the CNV ranging from 6% to 14% in different populations. Since our YRI samples comprise a subset of HapMap YRI subjects, we checked whether any of them were reported to display a CNV in this region: two subjects were retrieved, accounting for one gain and one loss. Electropherograms of these two subjects (as well as all other subjects in this study) revealed no evidence of unbalanced peaks at heterozygous SNPs and their removal from the sample did not affect the results for YRI. Previous work had studied CNVs in the defensin cluster on chromosome 8 using real-time PCR assays and found that 24 American subjects of different ethnic origins had 2 copies of DEFB1. Taking these observations together, we consider that either DEFB1 lies outside the CNV or, in any case, that CNVs encompassing DEFB1 are very rare and do not affect the results reported here.
One effect of balancing selection is to preserve two or more lineages over an extended period of time, resulting in clades separated by long branch lengths. To examine the genealogy of DEFB1 promoter haplotypes, we built a median-joining network. The topology of this network (Figure 2) is unambiguous with no reticulations, a pattern consistent with the low level of recombination observed in this gene region (not shown). Two major clades (haplogroups 1 and 2) separated by long branch lengths are evident, each containing one common haplotype. We next wished to estimate the time to the most recent common ancestor (TMRCA) of the two haplotype clades, applying a phylogeny-based method  based on the measure ρ, the average distance of descendant haplotypes from a specified root. By using root 1 (Figure 2), ρ was equal to 9.45 so that, with a mutation rate based on 21 fixed differences between chimpanzee and humans and a separation time of 5 million years ago, we estimated a TMRCA of 4,489,791 years (standard deviation ±1,018,128).
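The arithmetic behind the TMRCA estimate can be sketched as follows. The code assumes that the region-wide mutation rate is obtained by dividing the 21 fixed differences by twice the separation time, because substitutions accumulate along both lineages; this derivation is an assumption made for illustration, and the function name is hypothetical. With the rounded value ρ = 9.45 it gives about 4.5 million years, close to the reported 4,489,791 years; the small gap presumably reflects rounding of ρ.

```python
def tmrca_from_rho(rho, fixed_differences, split_time_years):
    """Convert rho (mean mutational distance to the root) into years."""
    # Substitutions accumulate on both lineages since the split, so the
    # per-lineage rate for the whole region is d / (2 * T).
    rate_per_year = fixed_differences / (2.0 * split_time_years)
    return rho / rate_per_year

estimate = tmrca_from_rho(rho=9.45, fixed_differences=21,
                          split_time_years=5_000_000)
print(f"TMRCA is roughly {estimate / 1e6:.2f} million years")
```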
Comparison with other primates
In order to gain further insight into the evolutionary history of the DEFB1 promoter region, we resequenced it in three chimpanzees and one orangutan. These samples were obtained from the European Collection of Cell Cultures and the Pongo sequence was used in the median-joining network in order to root the phylogeny (Figure 2). A total of 5 polymorphic sites were identified in chimpanzees; one of them (-913 C/T in the human sequence) was shared with humans and, therefore, represents a trans-specific polymorphism. Trans-specific polymorphisms are an expected outcome of long-term balancing selection, while they are highly unlikely under neutrality. Indeed, a neutral polymorphism is expected to persist for 4Ne generations (where Ne is the effective population size, estimated to be around 10,000 for humans) and, therefore, the probability of observing a polymorphism shared between humans and chimpanzees, two species that diverged about 5 million years ago (around 20Ne generations), is extremely low [46, 47]. Although the identification of a human/chimpanzee trans-specific SNP is consistent with the estimated TMRCA of the haplotype clusters (suggesting that balancing selection was established around the same time when the human and Pan lineages split), the possibility exists that the shared SNP is due to a coincidental mutation that occurred after speciation. Indeed, the location of the substitution at a CpG site makes the possibility of a recurrent mutation more likely and, therefore, taking into account the lack of functional data on this SNP, it is difficult to discriminate between the two possibilities.
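The conversion between years and units of Ne generations used in this argument can be made explicit with a short sketch. The generation time of 25 years is an assumption introduced here (it is not stated in the text, but it reproduces the figure of about 20Ne generations), and the function name is hypothetical.

```python
def years_to_ne_units(years, generation_time_years, effective_size):
    """Express a time span in units of Ne generations."""
    generations = years / generation_time_years
    return generations / effective_size

GENERATION_TIME = 25      # years per generation (assumed)
NE_HUMAN = 10_000         # effective population size quoted in the text

split_in_ne = years_to_ne_units(5_000_000, GENERATION_TIME, NE_HUMAN)
neutral_persistence_years = 4 * NE_HUMAN * GENERATION_TIME
```

Under these assumptions the human-chimpanzee split lies about 20Ne generations back, while 4Ne generations correspond to roughly one million years, so a neutral polymorphism would have to persist about five times longer than expected to be shared by both species.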
Haldane's hypothesis  as formulated in 1932 posits that infectious diseases have been a major threat to human populations and have, therefore, exerted strong selective pressures throughout human history. As a result, a number of human loci are thought to have evolved in response to such pressures. Up to now, most evolutionary studies have focused on adaptive immunity, yet the ancient innate immune system, with the production of antimicrobial peptides, provides a critical line of defense in vertebrates . Following Haldane's idea, it is conceivable, therefore, that innate immunity genes have undergone similar selective pressures as their adaptive counterparts. Indeed, in analogy to immunoglobulins  and major histocompatibility complex (MHC) molecules , the paradigm whereby gene duplication followed by rapid divergence has been a powerful adaptive strategy in immune response genes has been verified for defensin loci [6, 7, 51]. Recent studies  demonstrated that, after gene duplication in an ancestral mammalian genome, the mature peptide-coding exons of β-defensins have been subjected to positive selection, while sites within the pre-propeptide region have undergone negative selection in primate lineages.
The data we report add further complexity to the evolutionary history of defensin genes by showing that balancing selection has shaped variability at the promoter region of human DEFB1. Indeed, we have documented here that the DEFB1 promoter region displays elevated nucleotide diversity, excess of polymorphism to divergence levels and reduced population differentiation. In line with these findings, the analysis of DEFB1 haplotypes revealed the presence of two clades separated by long branches approximately dating back to the time when the human and chimpanzee lineages split. Altogether, these features represent strong molecular signatures of long-term balancing selection, a process that is thought to be extremely rare outside MHC genes .
β-Defensin 1, the first human β-defensin to be discovered, shows anti-bacterial activity against a wide range of Gram-negative bacteria (for example, Escherichia coli, Pseudomonas aeruginosa, and Klebsiella pneumoniae), as well as different Candida species [52–54]. β-Defensin 1 is constitutively expressed by most epithelia, with higher levels being detectable in kidney, pancreas, and the urogenital and respiratory tracts [54–56]. Consistently, targeted disruption of the mouse β-defensin 1 gene resulted in animals deficient in the clearance of Haemophilus influenzae from the lung or containing a greater number of bacteria (Staphylococci, in particular) in urine collected from the bladder. Also, DEFB1 expression has been demonstrated [59–61] in the human epidermis, gingival epithelium, oral mucosa and saliva, suggesting that it contributes to host defenses in areas exposed to a variety of microbial challenges. Moreover, recent evidence indicates that the protein product of DEFB1 is detectable in human milk and the mammary epithelium; in particular, pregnant women display higher levels of β-defensin 1, and concentrations comparable to those observed in milk were effective in killing E. coli, suggesting that this antimicrobial peptide might have a fundamental role in protecting breast-fed infants from infectious diarrhea and mothers from lactational mastitis [62, 63].
The promoter region of DEFB1 has recently been subjected to extensive study; in particular, three SNPs have been reported to affect gene expression [17, 64], although contrasting results on transcriptional activity have been obtained by different research groups, possibly reflecting either non-trivial interactions among polymorphic alleles at multiple positions or cell-type specific SNP effects . In SNP typing studies, the -20A/-44C/-52G haplotype has been independently associated with protection against severe sepsis , susceptibility to asthma and atopy  and, in cystic fibrosis patients, with chronic P. aeruginosa lung infection . Also, the -44C allele was shown to predispose to HIV [14, 15] and Candida  infection, while an association with HIV infection in Brazilian children was also reported for SNPs -20G and -52A . Although the biological bases for these associations are presently unknown, their description allows interesting speculations concerning the selective pressures possibly shaping nucleotide diversity at the DEFB1 promoter region. Sepsis is a leading cause of death in infants and children throughout the world ; its incidence and fatal outcome were conceivably higher before the advent of modern sanitation and, therefore, it might have represented a powerful selective force during human history. Indeed, signatures of natural selection have been reported at another human locus, namely CASP12 , as a possible adaptive response to sepsis. Variants in the DEFB1 promoter that protect against sepsis might, therefore, have conferred a selective advantage to carriers, although one or more of these same SNP alleles have been associated with predisposition to candidiasis , as well as to susceptibility to HIV and P. aeruginosa infection (at least in cystic fibrosis patients) [14, 15, 66]. 
In this respect, it is interesting to notice that early hunter-gatherer societies, due to their small population sizes, were likely to support a parasite fauna constituted of pathogens with high transmission rates and inducing little or no immunity . In such a scenario, the role of innate response might have been extremely relevant to ensure protection from infectious agents. The increase in population size that occurred at some time during human history is thought to have allowed maintenance of a different and wider range of pathogen species, including major infectious agents responsible for sepsis. Variable environmental conditions are regarded as a possible explanation underlying the maintenance of balanced polymorphisms ; in a simplistic situation whereby a variant (or haplotype) protects against sepsis while predisposing to other infectious agents, changes in pathogen prevalence, with particular reference to microbes leading to fatal sepsis, might modulate the fitness of subjects carrying either allele.
Unfortunately, little information is available concerning the early epidemiological history of our predecessors; indeed, the timing of human population expansion has been a matter of debate [71–73] and some uncertainty concerns the time of origin of major human pathogens, for example, tuberculosis [74, 75]. Further studies concerning these issues, as well as a better understanding of the role of DEFB1 polymorphisms, will therefore be required before a direct link can be established between pathogen-driven selective pressure and the maintenance of DEFB1 variants.
An additional, non-mutually exclusive possibility to explain the action of balancing selection at the DEFB1 promoter implies heterozygote advantage. This phenomenon is deemed responsible for maintenance of polymorphisms at MHC class II promoters [10, 76] and is thought to enhance immune response flexibility by modulating allele-specific gene expression in different cell-types  and in response to diverse stimuli/cytokines . DEFB1 is considered a constitutive defensin, in that, unlike β-defensin 2, it shows limited inducibility by inflammatory stimuli (reviewed in ); however, previous reports have indicated that DEFB1 shows marked inter-individual variability in expression levels in urine, saliva, gingival epithelium and epidermis [56, 59–61]. Similarly, the ability of lipopolysaccharide to induce DEFB1 expression varied among the blood samples obtained from 51 healthy individuals . These data, together with the functional data indicating allele-dependent promoter activity in different cell types [64, 65], suggest that DEFB1 variants might exert different effects in diverse tissues, possibly accounting both for inter-individual variation of expression levels and for maintenance of divergent clades.
It might also be worth mentioning that evidence, albeit preliminary, indicates that DEFB1 expression is up-regulated during pregnancy [56, 62], suggesting hormone-regulated gene expression. No data have ever been reported concerning the response of different DEFB1 promoter haplotypes to hormone treatment; were any difference identified, the adaptive significance of variants increasing expression in human milk, for example, would be evident.
Finally, it might be interesting to note that, given its high expression in urogenital tissues, DEFB1 has been regarded as a possible innate defense against sexually transmitted pathogens. In line with this view, induction of an antiviral response in cultured uterine epithelial cells resulted in a six-fold increase in DEFB1 expression. Since sexually transmitted diseases are thought to have affected early hominid societies, due to their sustainability in low-density host populations, these observations might help to explain the ancient origin of DEFB1 haplotype clades.
As discussed in the introduction, two recent reports indicated that balancing selection has shaped variability at the promoter region of other loci involved in immune response. In the case of CCR5, available evidence indicates that heterozygosity at this gene region delays HIV-1 disease progression . However, as the authors note, the introduction of HIV-1 in human populations is relatively recent and cannot, therefore, account for the maintenance of balanced polymorphisms in the region; therefore, CCR5 possibly evolved to respond to older pathogens, providing a clue to the difficult task of inferring the origin of selective pressures exerted by human pathogens over long evolutionary times.
Whatever the reason for the maintenance of a balanced variant, it is interesting to note that variation at DEFB1 might fit a previously proposed hypothesis  whereby alleles that conferred resistance to pathogens in ancient settings are now associated with susceptibility to atopic disorders; DEFB1 haplotypes associated with protection against sepsis seem to predispose to asthma and atopy. A similar link between past selection and present disease predisposition has been suggested  in the case of polymorphic variants in the IL4RA gene and might help to explain the high prevalence of atopic conditions in modern societies.
Association studies of DEFB1 variants have focused on a small number of SNPs to be genotyped; it is possible, therefore, that additional variants in this gene region play a role in the above described (or still unknown) conditions. In this regard, it is worth mentioning that the availability of full gene resequencing data allowed us to define a specific DEFB1 gene region as the target of balancing selection and, therefore, as the location of functional variants. This information might be valuable in future association studies, suggesting that DEFB1 promoter SNPs, rather than linked variants, associate with specific phenotypes.
This report represents an example of how population genetics approaches may benefit from association studies by gaining cues about possible selective pressures acting on target gene regions; we hope it also illustrates the possible contribution of evolutionary models to classic SNP-disease association approaches by providing information about the localization of candidate functional variants.
Materials and methods
DNA samples and sequencing
Human genomic DNA was obtained from the European Collection of Cell Cultures (Ethnic Diversity DNA Panel plus additional samples for Australian Aborigine derived from HLA defined panels). From the same source we obtained the genomic DNA of three chimpanzees (Pan troglodytes) and one orangutan (Pongo pygmaeus). Additional DNA samples from South American Indians and Yoruba individuals were derived from the Coriell Institute for Medical Research.
The 1.4 kb region covering the promoter region of DEFB1 was PCR amplified (primer sequences are reported in Table 3). PCR products were treated with ExoSAP-IT (USB Corporation, Cleveland, OH, USA), directly sequenced on both strands with a Big Dye Terminator sequencing Kit (v3.1 Applied Biosystems, Monza, Italy) and run on an Applied Biosystems ABI 3130 XL Genetic Analyzer. All sequences were assembled using AutoAssembler version 1.4.0 (Applied Biosystems), inspected manually by two distinct operators, and singletons were re-amplified and resequenced.
Data retrieval and haplotype construction
DEFB1 genotype data for American subjects of either African or European descent were retrieved from the IIPGA website . From the same source, we derived resequencing data referring to promoter regions (2 kb upstream of the translation initiation site) of other innate immunity genes genotyped for AA and EA. Promoter regions were not selected if the initial ATG was not located in the first exon (as it is for DEFB1) or if it could not be unequivocally identified due to the presence of multiple 5' isoforms, which were identified through manual inspection of UCSC annotation tracks . Also, promoter regions were discarded if located in recombination hotspots (these were manually identified through the UCSC genome annotation tables snpRecombHotspotHapmap and snpRecombHotspotPerlegen ) or in resequencing gaps. A total of 20 promoter regions finally constituted the control dataset.
Genotype data for 231 resequenced human genes were derived from the NIEHS SNPs Program web site . In particular, we selected genes that had been resequenced in populations of defined ethnicity, including Asians (NIEHS panel 2).
Haplotypes were inferred using PHASE version 2.1 [20, 21], a program for reconstructing haplotypes from unrelated genotype data through a Bayesian statistical method. Haplotypes for AS, AUA, SAI and YRI individuals are available as supporting information (Additional data file 1).
Tajima's D , Fu and Li's D* and F*  statistics, as well as diversity parameters θW  and π  were calculated using libsequence , a C++ class library providing an object-oriented framework for the analysis of molecular population genetic data. Departure from neutrality was tested from coalescent simulations computed with ms software  fixing the mutation parameter, assuming no intra-locus recombination and a constant population size with 100,000 iterations. Calibrated coalescent simulations were performed using the cosi package  and its best-fit parameters for YRI, AA, EA and AS populations with 10,000 iterations. The FST statistic  estimates genetic differentiation among populations and was calculated as proposed by Hudson et al. . Significance was assessed by permuting 10,000 times the haplotype distribution among populations .
Pairwise HKA tests were performed using libsequence. The maximum-likelihood-ratio HKA test was performed using the MLHKA software  with multilocus data of 20 selected IIPGA promoter regions and Rhesus macaque (NCBI rheMac2) as an outgroup. In particular, we evaluated the likelihood of the model under two different assumptions: that all loci evolved neutrally and that only the DEFB1 promoter region was subjected to natural selection; statistical significance was assessed by a likelihood ratio test. We used a chain length (the number of cycles of the Markov chain) of 500,000 and, as suggested by the authors, we ran the program several times with different seeds to ensure stability of results.
In order to test for gene conversion events, we applied Sawyer's gene conversion algorithm  implemented in the GENECONV program. GENECONV assesses significance using two methods: permutations and an approximate p-value [88, 89]. We performed several tests by varying the mismatch penalty from 0 to larger positive values and using 10,000 permutations. For all these runs and both methods, no pairwise or global p-value involving DEFB1 was significant, suggesting no inner or outer fragments showing past gene conversion.
The median-joining network used to infer the haplotype genealogy was constructed using NETWORK 4.2. The time to the most recent common ancestor (TMRCA) was estimated using a phylogeny-based approach implemented in NETWORK 4.2, using a mutation rate based on 21 fixed differences between chimpanzee and humans in the 1.4 kb DEFB1 region.
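The arithmetic behind a divergence-based mutation rate can be sketched as follows. The 21 fixed differences and the 1.4 kb length come from the text. The human-chimpanzee split time of 6 million years, the function names and the example value of rho (the mean number of mutations separating sampled haplotypes from the root of the network) are assumptions introduced only for illustration, and the sketch does not reproduce NETWORK's internal procedure.

```python
def substitution_rate(fixed_differences, length_bp, split_time_years):
    """Per-site, per-year rate from between-species fixed differences."""
    # Differences accumulate along both lineages since the split, hence the factor 2.
    return fixed_differences / (2.0 * split_time_years * length_bp)


def tmrca_years(rho, rate_per_site_per_year, length_bp):
    """Convert rho (mean mutations to the root) into years for a locus of length_bp."""
    mutations_per_year = rate_per_site_per_year * length_bp
    return rho / mutations_per_year


# 21 differences over 1,400 bp; the 6-million-year split time is an assumption.
rate = substitution_rate(21, 1400, 6.0e6)
print(f"rate = {rate:.3g} substitutions per site per year")

# Hypothetical rho; the real value would come from the haplotype network.
print(f"TMRCA = {tmrca_years(2.0, rate, 1400):.3g} years")
```

The estimate scales linearly with the assumed split time, so any uncertainty in that calibration carries over directly to the TMRCA.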
All calculations were performed in the R environment.
Additional data files
The following additional data are available. Additional data file 1 is a spreadsheet reporting the DEFB1 promoter haplotypes for the following subjects: 22 YRI, 25 AS, 24 SAI and 12 AUA. SNP positions refer to the NCBI Build 36.1 assembly.
CNV: copy number variation
IIPGA: Innate Immunity PGA
MHC: major histocompatibility complex
SAI: South American Indian
SNP: single nucleotide polymorphism
TMRCA: time to the most recent common ancestor
Boman HG: Gene-encoded peptide antibiotics and the concept of innate immunity: an update review. Scand J Immunol. 1998, 48: 15-25. 10.1046/j.1365-3083.1998.00343.x.
Lehrer RI, Ganz T: Defensins of vertebrate animals. Curr Opin Immunol. 2002, 14: 96-102. 10.1016/S0952-7915(01)00303-X.
Yang D, Biragyn A, Kwak LW, Oppenheim JJ: Mammalian defensins in immunity: more than just microbicidal. Trends Immunol. 2002, 23: 291-296. 10.1016/S1471-4906(02)02246-9.
Yang D, Biragyn A, Hoover DM, Lubkowski J, Oppenheim JJ: Multiple roles of antimicrobial defensins, cathelicidins, and eosinophil-derived neurotoxin in host defense. Annu Rev Immunol. 2004, 22: 181-215. 10.1146/annurev.immunol.22.012703.104603.
Selsted ME, Ouellette AJ: Mammalian defensins in the antimicrobial immune response. Nat Immunol. 2005, 6: 551-557. 10.1038/ni1206.
Semple CA, Rolfe M, Dorin JR: Duplication and selection in the evolution of primate beta-defensin genes. Genome Biol. 2003, 4: R31. 10.1186/gb-2003-4-5-r31.
Semple CA, Maxwell A, Gautier P, Kilanowski FM, Eastwood H, Barran PE, Dorin JR: The complexity of selection at the major primate beta-defensin locus. BMC Evol Biol. 2005, 5: 32. 10.1186/1471-2148-5-32.
Bamshad MJ, Mummidi S, Gonzalez E, Ahuja SS, Dunn DM, Watkins WS, Wooding S, Stone AC, Jorde LB, Weiss RB, Ahuja SK: A strong signature of balancing selection in the 5' cis-regulatory region of CCR5. Proc Natl Acad Sci USA. 2002, 99: 10539-10544. 10.1073/pnas.162046399.
Tan Z, Shon AM, Ober C: Evidence of balancing selection at the HLA-G promoter region. Hum Mol Genet. 2005, 14: 3619-3628. 10.1093/hmg/ddi389.
Loisel DA, Rockman MV, Wray GA, Altmann J, Alberts SC: Ancient polymorphism and functional variation in the primate MHC-DQA1 5' cis-regulatory region. Proc Natl Acad Sci USA. 2006, 103: 16331-16336. 10.1073/pnas.0607662103.
Liu X, Fu Y, Liu Z, Lin B, Xie Y, Liu Y, Xu Y, Lin J, Fan X, Dong M, Zeng K, Wu CI, Xu A: An ancient balanced polymorphism in a regulatory region of human major histocompatibility complex is retained in Chinese minorities but lost worldwide. Am J Hum Genet. 2006, 78: 393-400. 10.1086/500593.
Leung TF, Li CY, Liu EK, Tang NL, Chan IH, Yung E, Wong GW, Lam CW: Asthma and atopy are associated with DEFB1 polymorphisms in Chinese children. Genes Immun. 2006, 7: 59-64. 10.1038/sj.gene.6364279.
Chen QX, Lv C, Huang LX, Cheng BL, Xie GH, Wu SJ, Fang XM: Genomic variations within DEFB1 are associated with the susceptibility to and the fatal outcome of severe sepsis in Chinese Han population. Genes Immun. 2007, 8: 439-443. 10.1038/sj.gene.6364401.
Braida L, Boniotto M, Pontillo A, Tovo PA, Amoroso A, Crovella S: A single-nucleotide polymorphism in the human beta-defensin 1 gene is associated with HIV-1 infection in Italian children. AIDS. 2004, 18: 1598-1600. 10.1097/01.aids.0000131363.82951.fb.
Milanese M, Segat L, Pontillo A, Arraes LC, de Lima Filho JL, Crovella S: DEFB1 gene polymorphisms and increased risk of HIV-1 infection in Brazilian children. AIDS. 2006, 20: 1673-1675. 10.1097/01.aids.0000238417.05819.40.
Jurevic RJ, Bai M, Chadwick RB, White TC, Dale BA: Single-nucleotide polymorphisms (SNPs) in human beta-defensin 1: high-throughput SNP assays and association with Candida carriage in type I diabetics and nondiabetic controls. J Clin Microbiol. 2003, 41: 90-96. 10.1128/JCM.41.1.90-96.2003.
Sun CQ, Arnold R, Fernandez-Golarz C, Parrish AB, Almekinder T, He J, Ho SM, Svoboda P, Pohl J, Marshall FF, Petros JA: Human beta-defensin-1, a potential chromosome 8p tumor suppressor: control of transcription and induction of apoptosis in renal cell carcinoma. Cancer Res. 2006, 66: 8542-8549. 10.1158/0008-5472.CAN-06-0294.
The International HapMap Consortium: The International HapMap Project. Nature. 2003, 426: 789-796. 10.1038/nature02168.
Innate Immunity in Heart, Lung and Blood Disease: Programs for Genomic Applications. [http://innateimmunity.net]
Stephens M, Smith NJ, Donnelly P: A new statistical method for haplotype reconstruction from population data. Am J Hum Genet. 2001, 68: 978-989. 10.1086/319501.
Stephens M, Scheet P: Accounting for decay of linkage disequilibrium in haplotype inference and missing-data imputation. Am J Hum Genet. 2005, 76: 449-462. 10.1086/428594.
Watterson GA: On the number of segregating sites in genetical models without recombination. Theor Popul Biol. 1975, 7: 256-276. 10.1016/0040-5809(75)90020-9.
Nei M, Li WH: Mathematical model for studying genetic variation in terms of restriction endonucleases. Proc Natl Acad Sci USA. 1979, 76: 5269-5273. 10.1073/pnas.76.10.5269.
Sawyer S: Statistical tests for detecting gene conversion. Mol Biol Evol. 1989, 6: 526-538.
Kimura M: The Neutral Theory of Molecular Evolution. 1983, Cambridge: Cambridge University Press
Hudson RR, Kreitman M, Aguadé M: A test of neutral molecular evolution based on nucleotide data. Genetics. 1987, 116: 153-159.
Wright SI, Charlesworth B: The HKA test revisited: a maximum-likelihood-ratio test of the standard neutral model. Genetics. 2004, 168: 1071-1076. 10.1534/genetics.104.026500.
Castillo-Davis CI, Kondrashov FA, Hartl DL, Kulathinal RJ: The functional genomic distribution of protein divergence in two animal phyla: coevolution, genomic conflict, and constraint. Genome Res. 2004, 14: 802-811. 10.1101/gr.2195604.
Sironi M, Menozzi G, Comi GP, Cagliani R, Bresolin N, Pozzoli U: Analysis of intronic conserved elements indicates that functional complexity might represent a major source of negative selection on non-coding sequences. Hum Mol Genet. 2005, 14: 2533-2546. 10.1093/hmg/ddi257.
Hudson RR, Kaplan NL: The coalescent process in models with selection and recombination. Genetics. 1988, 120: 831-840.
Tajima F: Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989, 123: 585-595.
Fu YX, Li WH: Statistical tests of neutrality of mutations. Genetics. 1993, 133: 693-709.
Wooding S, Rogers A: The matrix coalescent and an application to human single-nucleotide polymorphisms. Genetics. 2002, 161: 1641-1650.
Marth GT, Czabarka E, Murvai J, Sherry ST: The allele frequency spectrum in genome-wide human variation data reveals signals of differential demographic history in three large world populations. Genetics. 2004, 166: 351-372. 10.1534/genetics.166.1.351.
National Institute of Environmental Health Sciences. [http://egp.gs.washington.edu]
Thompson EE, Kuttab-Boulos H, Witonsky D, Yang L, Roe BA, Di Rienzo A: CYP3A variation and the evolution of salt-sensitivity variants. Am J Hum Genet. 2004, 75: 1059-1069. 10.1086/426406.
Schaffner SF, Foo C, Gabriel S, Reich D, Daly MJ, Altshuler D: Calibrating a coalescent simulation of human genome sequence variation. Genome Res. 2005, 15: 1576-1583. 10.1101/gr.3709305.
Wright S: Genetical structure of populations. Nature. 1950, 166: 247-249. 10.1038/166247a0.
Bowcock AM, Kidd JR, Mountain JL, Hebert JM, Carotenuto L, Kidd KK, Cavalli-Sforza LL: Drift, admixture, and selection in human evolution: a study with DNA polymorphisms. Proc Natl Acad Sci USA. 1991, 88: 839-843. 10.1073/pnas.88.3.839.
Akey JM, Zhang G, Zhang K, Jin L, Shriver MD: Interrogating a high-density SNP map for signatures of natural selection. Genome Res. 2002, 12: 1805-1814. 10.1101/gr.631202.
Hollox EJ, Armour JA, Barber JC: Extensive normal copy number variation of a beta-defensin antimicrobial-gene cluster. Am J Hum Genet. 2003, 73: 591-600. 10.1086/378157.
Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W, Cho EK, Dallaire S, Freeman JL, Gonzalez JR, Gratacos M, Huang J, Kalaitzopoulos D, Komura D, MacDonald JR, Marshall CR, Mei R, Montgomery L, Nishimura K, Okamura K, Shen F, Somerville MJ, Tchinda J, Valsesia A, Woodwark C, Yang F, et al: Global variation in copy number in the human genome. Nature. 2006, 444: 444-454. 10.1038/nature05329.
Linzmeier RM, Ganz T: Copy number polymorphisms are not a common feature of innate immune genes. Genomics. 2006, 88: 122-126. 10.1016/j.ygeno.2006.03.005.
Bandelt HJ, Forster P, Röhl A: Median-joining networks for inferring intraspecific phylogenies. Mol Biol Evol. 1999, 16: 37-48.
Takahata N: Allelic genealogy and human evolution. Mol Biol Evol. 1993, 10: 2-22.
Clark AG: Neutral behavior of shared polymorphism. Proc Natl Acad Sci USA. 1997, 94: 7730-7734. 10.1073/pnas.94.15.7730.
Asthana S, Schmidt S, Sunyaev S: A limited role for balancing selection. Trends Genet. 2005, 21: 30-32. 10.1016/j.tig.2004.11.001.
Haldane JBS: The Causes of Evolution. 1932, New York: Harper & Row
Ota T, Sitnikova T, Nei M: Evolution of vertebrate immunoglobulin variable gene segments. Curr Top Microbiol Immunol. 2000, 248: 221-245.
Hughes AL, Yeager M: Natural selection at major histocompatibility complex loci of vertebrates. Annu Rev Genet. 1998, 32: 415-435. 10.1146/annurev.genet.32.1.415.
Hughes AL, Yeager M: Coordinated amino acid changes in the evolution of mammalian defensins. J Mol Evol. 1997, 44: 675-682. 10.1007/PL00006191.
Goldman MJ, Anderson GM, Stolzenberg ED, Kari UP, Zasloff M, Wilson JM: Human beta-defensin-1 is a salt-sensitive antibiotic in lung that is inactivated in cystic fibrosis. Cell. 1997, 88: 553-560. 10.1016/S0092-8674(00)81895-4.
Fang XM, Shu Q, Chen QX, Book M, Sahl HG, Hoeft A, Stuber F: Differential expression of alpha- and beta-defensins in human peripheral blood. Eur J Clin Invest. 2003, 33: 82-87. 10.1046/j.1365-2362.2003.01076.x.
Singh PK, Jia HP, Wiles K, Hesselberth J, Liu L, Conway BA, Greenberg EP, Valore EV, Welsh MJ, Ganz T, Tack BF, McCray PB: Production of beta-defensins by human airway epithelia. Proc Natl Acad Sci USA. 1998, 95: 14961-14966. 10.1073/pnas.95.25.14961.
Zhao C, Wang I, Lehrer RI: Widespread expression of beta-defensin hBD-1 in human secretory glands and epithelial cells. FEBS Lett. 1996, 396: 319-322. 10.1016/0014-5793(96)01123-4.
Valore EV, Park CH, Quayle AJ, Wiles KR, McCray PB, Ganz T: Human beta-defensin-1: an antimicrobial peptide of urogenital tissues. J Clin Invest. 1998, 101: 1633-1642. 10.1172/JCI1861.
Moser C, Weiner DJ, Lysenko E, Bals R, Weiser JN, Wilson JM: beta-Defensin 1 contributes to pulmonary innate immunity in mice. Infect Immun. 2002, 70: 3068-3072. 10.1128/IAI.70.6.3068-3072.2002.
Morrison G, Kilanowski F, Davidson D, Dorin J: Characterization of the mouse beta defensin 1, Defb1, mutant mouse model. Infect Immun. 2002, 70: 3053-3060. 10.1128/IAI.70.6.3053-3060.2002.
Krisanaprakornkit S, Weinberg A, Perez CN, Dale BA: Expression of the peptide antibiotic human beta-defensin 1 in cultured gingival epithelial cells and gingival tissue. Infect Immun. 1998, 66: 4222-4228.
Ali RS, Falconer A, Ikram M, Bissett CE, Cerio R, Quinn AG: Expression of the peptide antibiotics human beta defensin-1 and human beta defensin-2 in normal human skin. J Invest Dermatol. 2001, 117: 106-111. 10.1046/j.0022-202x.2001.01401.x.
Mathews M, Jia HP, Guthmiller JM, Losh G, Graham S, Johnson GK, Tack BF, McCray PB: Production of beta-defensin antimicrobial peptides by the oral mucosa and salivary glands. Infect Immun. 1999, 67: 2740-2745.
Jia HP, Starner T, Ackermann M, Kirby P, Tack BF, McCray PB: Abundant human beta-defensin-1 expression in milk and mammary gland epithelium. J Pediatr. 2001, 138: 109-112. 10.1067/mpd.2001.109375.
Tunzi CR, Harper PA, Bar-Oz B, Valore EV, Semple JL, Watson-MacDonell J, Ganz T, Ito S: Beta-defensin expression in human mammary gland epithelia. Pediatr Res. 2000, 48: 30-35. 10.1203/00006450-200007000-00008.
Milanese M, Segat L, Crovella S: Transcriptional effect of DEFB1 gene 5' untranslated region polymorphisms. Cancer Res. 2007, 67: 5997. 10.1158/0008-5472.CAN-06-3544.
Petros J: Transcriptional effect of DEFB1 gene 5' untranslated region polymorphisms. Cancer Res. 2007, 67: 5997. 10.1158/0008-5472.CAN-07-0204.
Tesse R, Cardinale F, Santostasi T, Polizzi A, Manca A, Mappa L, Iacoviello G, De Robertis F, Logrillo VP, Armenio L: Association of beta-defensin-1 gene polymorphisms with Pseudomonas aeruginosa airway colonization in cystic fibrosis. Genes Immun. 2008, 9: 57-60. 10.1038/sj.gene.6364440.
Watson RS, Carcillo JA: Scope and epidemiology of pediatric sepsis. Pediatr Crit Care Med. 2005, 6 (3 Suppl): S3-S5. 10.1097/01.PCC.0000161289.22464.C3.
Xue Y, Daly A, Yngvadottir B, Liu M, Coop G, Kim Y, Sabeti P, Chen Y, Stalker J, Huckle E, Burton J, Leonard S, Rogers J, Tyler-Smith C: Spread of an inactive form of caspase-12 in humans is due to recent positive selection. Am J Hum Genet. 2006, 78: 659-670. 10.1086/503116.
Dobson A: People and disease. The Cambridge Encyclopedia of Human Evolution. Edited by: Jones S, Martin R, Pilbeam D. 1992, Cambridge: Cambridge University Press, 411-420.
Charlesworth D: Balancing selection and its effects on sequences in nearby genome regions. PLoS Genet. 2006, 2: e64. 10.1371/journal.pgen.0020064.
Wall JD, Przeworski M: When did the human population size start increasing?. Genetics. 2000, 155: 1865-1874.
Rogers AR, Harpending H: Population growth makes waves in the distribution of pairwise genetic differences. Mol Biol Evol. 1992, 9: 552-569.
Reich DE, Goldstein DB: Genetic evidence for a Paleolithic human population expansion in Africa. Proc Natl Acad Sci USA. 1998, 95: 8119-8123. 10.1073/pnas.95.14.8119.
Gutierrez MC, Brisse S, Brosch R, Fabre M, Omais B, Marmiesse M, Supply P, Vincent V: Ancient origin and gene mosaicism of the progenitor of Mycobacterium tuberculosis. PLoS Pathog. 2005, 1: e5. 10.1371/journal.ppat.0010005.
Smith NH: A re-evaluation of M. prototuberculosis. PLoS Pathog. 2006, 2: e98. 10.1371/journal.ppat.0020098.
Cowell LG, Kepler TB, Janitz M, Lauster R, Mitchison NA: The distribution of variation in regulatory gene segments, as present in MHC class II promoters. Genome Res. 1998, 8: 124-134.
Beaty JS, Sukiennicki TL, Nepom GT: Allelic variation in transcription modulates MHC class II expression and function. Microbes Infect. 1999, 1: 919-927. 10.1016/S1286-4579(99)00225-7.
Beaty JS, West KA, Nepom GT: Functional effects of a natural polymorphism in the transcriptional regulatory sequence of HLA-DQB1. Mol Cell Biol. 1995, 15: 4771-4782.
Schaefer TM, Fahey JV, Wright JA, Wira CR: Innate immunity in the human female reproductive tract: antiviral response of uterine epithelial cells to the TLR3 agonist poly(I:C). J Immunol. 2005, 174: 992-1002.
Gonzalez E, Bamshad M, Sato N, Mummidi S, Dhanda R, Catano G, Cabrera S, McBride M, Cao XH, Merrill G, O'Connell P, Bowden DW, Freedman BI, Anderson SA, Walter EA, Evans JS, Stephan KT, Clark RA, Tyagi S, Ahuja SS, Dolan MJ, Ahuja SK: Race-specific HIV-1 disease-modifying effects associated with CCR5 haplotypes. Proc Natl Acad Sci USA. 1999, 96: 12004-12009. 10.1073/pnas.96.21.12004.
Barnes KC, Grant AV, Gao P: A review of the genetic epidemiology of resistance to parasitic disease and atopic asthma: common variants for common phenotypes?. Curr Opin Allergy Clin Immunol. 2005, 5: 379-385.
Wu X, Di Rienzo A, Ober C: A population genetics study of single nucleotide polymorphisms in the interleukin 4 receptor alpha (IL4RA) gene. Genes Immun. 2001, 2: 128-134. 10.1038/sj.gene.6363746.
UCSC Genome Browser. [http://genome.ucsc.edu]
Thornton K: Libsequence: a C++ class library for evolutionary genetic analysis. Bioinformatics. 2003, 19: 2325-2327. 10.1093/bioinformatics/btg316.
Hudson RR: Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics. 2002, 18: 337-338. 10.1093/bioinformatics/18.2.337.
Hudson RR, Slatkin M, Maddison WP: Estimation of levels of gene flow from DNA sequence data. Genetics. 1992, 132: 583-589.
Hudson RR, Boos DD, Kaplan NL: A statistical test for detecting geographic subdivision. Mol Biol Evol. 1992, 9: 138-151.
Karlin S, Altschul SF: Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc Natl Acad Sci USA. 1990, 87: 2264-2268. 10.1073/pnas.87.6.2264.
Karlin S, Altschul SF: Applications and statistics for multiple high-scoring segments in molecular sequences. Proc Natl Acad Sci USA. 1993, 90: 5873-5877. 10.1073/pnas.90.12.5873.
The R Project for Statistical Computing. [http://www.r-project.org]
We are grateful to Roberto Giorda for helpful discussions about the manuscript.
RC and SR performed all resequencing experiments and analyzed the data. MF and GM retrieved genotype data and performed population genetics analyses. MS, MF, RC, GPC and UP analyzed and interpreted the data. NB participated in the study coordination. MS and MF wrote the paper. MS conceived and coordinated the study.
Cite this article
Cagliani, R., Fumagalli, M., Riva, S. et al. The signature of long-standing balancing selection at the human defensin β-1 promoter. Genome Biol 9, R143 (2008) doi:10.1186/gb-2008-9-9-r143