Genetic architecture of body size in mammals
© BioMed Central Ltd 2012
Published: 30 April 2012
Much of the heritability for human stature is caused by mutations of small-to-medium effect. This is because detrimental pleiotropy restricts large-effect mutations to very low frequencies.
Body size, as measured by height in humans or weight in domestic species, is an archetypical quantitative or complex trait that shows continuous variation. It has been extensively recorded and studied for over a century because of its importance to ecology, its relevance in farming, and because it is an important indicator of human growth and health. The genetic architecture underlying body size was initially uncertain, and Fisher proposed an infinitesimal model that was successfully applied for many years. This model, with an infinite number of loci, each with infinitesimal effect, is not literally true but it does provide a good fit to the data. In more recent times the infinitesimal model has gradually been replaced by a model with a finite number of loci, each with discrete mutations. However, observations now form two almost disjoint sets: one set in which individual mutations have large effects (that is, so-called Mendelian traits) and another set where variants have small effects. This review attempts to bridge the gap between these two sets of observations using body size as an example of an extensively studied complex trait in mammalian species.
The genetic architecture underlying variation in complex traits is currently a topic of extensive debate. This is particularly true for human complex diseases but also for agriculture because of its impact in predicting future phenotypes (for example, [3–6]). Primarily it is the number, size and frequency of mutations that are under the most scrutiny. Taking the human disease example, some argue for a 'common disease, common variant' hypothesis where genetic susceptibility to disease is the result of many relatively high-frequency mutations, each with a small effect on disease susceptibility. However, others argue for a 'rare variant, common disease' hypothesis where many low-frequency mutations have large effects. As we shall see, observations on the genetic architecture underlying body size for humans and other mammals provide evidence for both hypotheses. Our discussion begins by describing the number, frequency and size of mutations with large effects for humans, mice and domesticated species. We then move on to genome-wide association studies (GWASs) that have investigated segregating variation in these species. We find evidence for moderate-to-large effect mutations in domestic species but highlight that this category of mutations goes undetected in human studies. Finally, we apply simple evolutionary theory to explain the observed distribution of mutation effects for human stature. Our model implies that most of the segregating variation in human height is caused by mutations with small-to-moderate effects.
Identification of causative mutations for so-called Mendelian traits has been possible by studying the segregation within families of mutations and phenotypes. Such mutations must have large effects so that individuals can be classified into genotype classes using their phenotype despite the background variation caused by other genes and environmental effects. Abnormal stature, for example, is generally diagnosed by clinicians when individuals are more than 2 standard deviation (SD) units above or below the population average. A recent survey of Mendelian traits causing aberrant stature and other obvious skeletal abnormalities in humans revealed the involvement of at least 241 genes.
Genetic properties and complexities of ten conditions reported in humans with short or tall stature phenotypes (cases represent a cross-section of rare and extremely rare disorders)
Marfan syndrome (OMIM 154700)
FBN1 (fibrillin 1)
Autosomal dominant; 25% de novo mutations; prevalence of 2 to 3 in 10,000 live births; most alleles exhibit haploinsufficiency (where the product from a single functional copy of the gene is insufficient for normal function)
FBN1 is a very large (>600 kb) and highly fragmented (65 exons) gene; phenotypic heterogeneity and a spectrum of Marfan-like disorders suggest involvement of other genes; mutations in the functionally related transforming growth factor-β receptor, type II gene (TGFBR2) are also known to cause Marfan syndrome; symptoms include disproportionate overgrowth of limbs, and ocular and cardiovascular abnormalities
Sotos syndrome (OMIM 117550)
NSD1 (nuclear receptor binding SET domain protein 1)
Autosomal dominant; 95% de novo mutations; 1 in 14,000 live births
NSD1 mutations in 80% to 90% of cases; symptoms include characteristic facial features, overgrowth and mild-to-severe learning disabilities with possible cardiac, dental and renal abnormalities; increased tumor risk
Beals syndrome (OMIM 121050)
FBN2 (fibrillin 2)
Autosomal dominant; rare; mostly inherited
Similar phenotype to Marfan syndrome but with fewer complications; FBN2 abnormalities in 27% to 70% of cases; probable involvement of other loci; reports of a lethal mutation and somatic/germline mosaicism
3M syndrome (OMIM 273750)
CUL7 (cullin 7)
Autosomal recessive; very rare, 40 to 50 cases reported
Mutations in OBSL1 and CCDC8 can also cause 3M syndrome; symptoms include severe prenatal and postnatal growth retardation, characteristic facial features and normal intelligence
Costello syndrome (OMIM 218040)
HRAS (v-Ha-ras Harvey rat sarcoma viral oncogene homolog)
Autosomal dominant; mostly de novo mutations; very rare, 250 cases worldwide
Recurrent missense mutation in HRAS reported in up to 80% of cases; somatic/germline mosaicism confirmed in one case and suspected in others; symptoms include postnatal failure to thrive, intellectual disability, coarse facial features, cardiac abnormalities and an increased risk of malignant tumors
Achondroplasia (OMIM 100800)
FGFR3 (fibroblast growth factor receptor 3)
Autosomal dominant; 80% de novo mutations; 0.5 to 1.5 in 10,000 live births
Most common form of dwarfism; 97% of cases show one of two mutations that cause a missense glycine to arginine substitution at position 380 in FGFR3; missense mutation associated with gain-of-function and overactivation of negative growth control; evidence for increasing prevalence with increasing paternal age; other mutations in FGFR3 implicated in other diseases (including more severe skeletal dysplasias); symptoms include shortened limbs and facial features; unexplainably high prevalence and de novo mutations suggest other factors (such as positive selection of sperm) may influence the prevalence of the disease
Cornelia de Lange syndrome
NIPBL (nipped-B homolog) (Drosophila)
Autosomal dominant; mostly de novo mutations; 1 in 10,000 to 1 in 30,000 live births
NIPBL abnormalities reported in 60% of cases; mutations of SMC1A (X-linked) and SMC3 in <6% of cases; possible involvement of other loci; germline mosaicism implicated; phenotypic heterogeneity, including characteristic facial features, postnatal growth retardation, hirsutism and possible oligodactyly (missing digits); symptoms may approach non-syndromic mental retardation
Growth hormone insensitivity syndrome (GHIS) (OMIM 262500)
GHR (growth hormone receptor)
-7 to -3.6 SD
Mostly autosomal recessive; rare, 100 to 200 cases reported worldwide mostly in two large cohorts
Biochemical and clinical heterogeneity; most severe form (Laron syndrome, effect -7 SD) to partial growth hormone insensitivity (-3.6 SD); one case of autosomal dominant inheritance; probable involvement of other loci as no abnormalities in GHR detected in some patients; symptoms may include severe postnatal growth failure, underdeveloped facial bones and slow motor development
Geleophysic dysplasia (OMIM 231050)
ADAMTSL2 (ADAMTS-like 2)
Autosomal recessive; very rare, 31 reported cases
Very similar to Weill-Marchesani syndrome and acromicric dysplasia; missense and nonsense mutations detected in 70% of individuals; possible involvement of other genes; high early childhood mortality (33%) resulting from cardiac and respiratory dysfunctions
Hypochondroplasia (OMIM 146000)
FGFR3 (fibroblast growth factor receptor 3)
Autosomal dominant; assumed mostly de novo mutations with prevalence similar to achondroplasia (that is, 1 in 15,000 to 1 in 40,000 births)
Severe hypochondroplasia is similar to mild achondroplasia; most hypochondroplasia cases are associated with alanine-for-asparagine substitution in exon 10 of FGFR3; other mutations account for <2% of cases; suspected involvement of other loci for milder forms of the disorder
Spontaneous and chemically induced mutations affecting size have been reported in mice. As with family studies in humans, the effect of the mutation needs to be large (>3 SD) for the mutation to be recognized and the causative gene identified and reported. These approaches probably miss loci where the mutations have more subtle effects. Despite this, spontaneous and chemically induced screens have been successful in identifying over 500 genes associated with abnormal postnatal growth or body size. By comparison, this is more than twice the number of loci identified in humans.
Allelic heterogeneity is not a feature of the mouse mutations as it is in humans, because relatively few alleles are sampled in the small numbers of inbred lines typically used. Similarly, the identification methods also bias the observed inheritance patterns for mutations. Thus, many of the spontaneous mutations initially identified, such as little, pygmy or Ames dwarf, are recessive [16, 17], whereas the chemically induced mutagenesis screens tend to identify dominant, rather than recessive, alleles. These idiosyncrasies relate to identification methods, because of inbreeding strategies or the efficiencies for detecting dominant phenotypes, rather than biological characteristics.
Genes with known mutations affecting body size identified from domestic species, and corresponding conditions in humans and/or mice
Conditions in humans and/or mice
Disproportionate chondrodysplasia in Japanese Brown cattle
EVC2 (Ellis-van Creveld syndrome 2)
SNP and deletion variant; recessive mutation
Humans: skeletal dysplasia, Ellis-van Creveld syndrome (OMIM 225500); autosomal recessive
Disproportionate chondrodysplasia in Angus cattle
PRKG2 (protein kinase, cGMP-dependent, type II)
SNP variant; recessive homozygotes are 15.8 cm shorter at birth than wild-type; suggestion of embryonic lethality
Mice: homozygous null mice exhibit disproportionate dwarfism, and decreased weight and body length
Dwarfism in Dexter cattle
Insertion variant; recessive lethal; heterozygotes show disproportionate chondrodysplasia
Humans: short stature and skeletal dysplasia (for example, OMIM 165800); dominant and recessive forms. Mice: spontaneous mutation results in dwarfism and skeletal abnormalities; recessive lethal
Dwarfism in Brahman cattle
GH1 (growth hormone 1)
SNP variant; recessive homozygotes are 70% of wild-type phenotype height and weight
Humans: proportionate short stature (for example, OMIM 173100); dominant and recessive forms. Mice: ENU-induced mutation with additive effects causing dwarfism
Canis lupus familiaris
Disproportionate chondrodysplasia in dogs
FGF4 (fibroblast growth factor 4)
SNP variant; identified by between-breed analyses; shortened limbs
Humans: involved in cancer and limb development. Mice: null homozygous mice show embryonic mortality; mice with conditional mutations show normal limb development
Sus scrofa domesticus
Disproportionate chondrodysplasia in Danish Landrace
COL10A1 (collagen, type X, alpha 1)
SNP variant; dominant mutation; shortened limbs
Humans: dominant mutation causes Schmid metaphyseal chondrodysplasia (OMIM 156500). Mice: dominant mutation shows abnormal skeletal growth
Disproportionate chondrodysplasia in Suffolk sheep
FGFR3 (fibroblast growth factor receptor 3)
SNP variant; overgrowth of limbs; semi-lethal in homozygotes; cannon bone length +1 cm in heterozygotes; recessive but speculated co-dominance
Humans: associated with 13 phenotypes, including dwarfing syndromes and cancer (for example, OMIM 100800). Mice: homozygous null mice show abnormal skeletal development, decreased growth and premature death; mild symptoms in heterozygotes
Only a handful of causative mutations affecting body size in domestic livestock and companion species have been identified (Table 2). Six of the seven genes also have mutations with large effect for body size in mice and/or humans. Most of the mutations have been identified within a single breed, such as the FGFR3 mutation causing limb overgrowth in Suffolk sheep, and these tend to be recessive deleterious mutations. This bias toward identification of recessive conditions can be explained by the relatively small effective population size of many domestic species. Thus, a recessive deleterious mutation may drift to high frequency within a breed before a problem is recognized. In contrast, an animal with a deleterious, dominant mutation will be immediately culled and the causal gene is unlikely to be investigated. For recessive mutations, once the syndrome is recognized an effort is made to discover the cause.
An alternative to the within-breed approach is to examine differences across breeds. This approach aims to identify breed-defining loci under the assumption that selective sweeps will be evident in the genome. Examples of mutations identified using across-breed methods include the FGF4 mutation for disproportionate short stature in dogs such as the dachshund (Table 2) and, although the causative mutations are unknown, the IGF1 locus in small dog breeds and the PLAG1-CHCHD7 intergenic region in Holstein-Friesian and Jersey cattle [23–25]. Common haplotypes for these mutant alleles suggest strong selection and mutations that are identical by descent, rather than selection for recurring new (de novo) mutations. For example, at least one copy of the haplotype carrying the mutant FGF4 allele is found in 19 different dog breeds with chondrodysplasia. Similarly, the small IGF1 haplotype was homozygous in 23 different small dog breeds. One problem created by these selective sweeps is that causative mutations are often difficult to isolate. Only analysis of different breeds or outbred populations, preferably where the mutation is not under selection, will enable the identification of the mutation by breaking up the observed linkage disequilibrium (LD) blocks. As observed in mice, the relative allelic and locus homogeneity found for dogs and livestock contrasts with the diversity observed for mutations conferring obvious phenotypes in humans (as listed in Table 1).
Results from livestock and companion species highlight that mutations with large-to-moderate effects are present without severe pleiotropic effects and can reach high frequencies in artificially selected populations. The PLAG1-CHCHD7 intergenic region, for example, was identified by Karim et al. because alternative alleles were at high frequency in the Holstein-Friesian and Jersey breeds. The effect of the region is moderate, with the two homozygotes approximately 0.4 SD above and below the heterozygote (assuming the SD of body weight in cattle is about 50 kg). Similarly in dogs, the genetic architecture is such that ≤ 3 loci can explain much of the between-breed phenotypic variation for body weight. This implies the presence of high-frequency alleles with large-to-moderate effects on body size within a breed.
GWASs for stature in humans provide one of the best resources for studying the segregating genetic variation in body size. Over 20 GWASs for human height have been published and 389 genes have been associated with height (P < 1 × 10⁻⁵; Table S1 in Additional file 1). In contrast to the mutations with large effect, causative mutations underlying significant associations have seldom been identified. The assumption is that significant SNP markers are in high LD with a causal mutation in a nearby gene. Sometimes there are difficulties in distinguishing between two genes near to a single marker and, occasionally, no known genes are located in the region. However, associations between SNPs and height are highly significant, replicate in independent samples of people and, in some cases, in different racial groups (Table S2 in Additional file 1). The genes near significant SNPs are not a random sample of genes because they are enriched for genes implicated in skeletal development and often they are in high LD with non-synonymous coding mutations or known regulatory mutations.
The estimated effects from human height GWASs are very small (0.02 to 0.13 SD) and usually additive rather than recessive or dominant, and the associated SNPs have moderate minor allele frequency (MAF; 0.01 to 0.5). The effect of the causal mutation could be larger than the estimated effect of the SNP and its MAF lower, but the most parsimonious explanation is that the effect sizes and MAF for mutations are similar to those of the associated SNP. This implies that the mutations currently detected by GWASs are relatively common and the effect size for these mutations is small. However, the SNPs identified by GWASs explain only a small proportion (approximately 12%) of the known inherited variation for stature. This has caused much debate among geneticists (Box 1). If the 180 loci identified by Lango Allen et al. explain 12% of the genetic variation, then this implies a minimum of 1,500 loci to explain the segregating genetic variance for stature in human populations (that is, 180/0.12 = 1,500). This number is the minimum expected because loci identified by Lango Allen et al. are presumably a subset of loci with the largest influence on the genetic variance.
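The extrapolation from 180 detected loci to a minimum of 1,500 is a simple proportional scaling, and can be sketched as follows. The function name `min_loci` is hypothetical; the calculation assumes, as the text does, that undetected loci explain on average no more variance than the detected ones, which is why the result is a lower bound.

```python
def min_loci(n_detected: int, fraction_explained: float) -> float:
    """Lower bound on the number of loci, assuming every locus explains as
    much genetic variance as the average detected locus."""
    if not 0.0 < fraction_explained <= 1.0:
        raise ValueError("fraction_explained must lie in (0, 1]")
    return n_detected / fraction_explained


# 180 loci explaining about 12% of the genetic variance (figures from the text).
print(round(min_loci(180, 0.12)))
```

If the undetected loci have smaller effects than the detected ones, as seems likely, the true number is larger than this bound.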
Genes identified with both large and small-effect mutations that affect stature and skeletal formation in humans
OMIM phenotypes associated with the gene
GWASs identifying the gene
Example stature phenotype
Spondyloepimetaphyseal dysplasia, aggrecan type (-) (OMIM 612813)
ADAMTS10 (ADAM metallopeptidase with thrombospondin type 1 motif, 10)
Weill-Marchesani syndrome 1, recessive (-) (OMIM 277600)
ARSE (arylsulfatase E) (chondrodysplasia punctata 1)
Chondrodysplasia punctata, X-linked recessive (-) (OMIM 302950)
BBS1 (Bardet-Biedl syndrome 1)
Bardet-Biedl syndrome 1 (-) (OMIM 209900)
BBS7 (Bardet-Biedl syndrome 7)
Bardet-Biedl syndrome 7 (-) (OMIM 209900)
BRCA2 (breast cancer 2, early onset)
Fanconi anemia, complementation group D1 (-) (OMIM 605724)
COL11A1 (collagen, type XI, alpha 1)
Fibrochondrogenesis (-) (OMIM 228520)
CYP19A1 (cytochrome P450, family 19, subfamily A, polypeptide 1)
Aromatase deficiency (-) (OMIM 613546)
Smith-McCort dysplasia (-) (OMIM 607326)
EIF2AK3 (eukaryotic translation initiation factor 2-alpha kinase 3)
Wolcott-Rallison syndrome (-) (OMIM 226980)
EXT1 (exostosin 1)
Exostoses, multiple, type 1 (-) (OMIM 133700)
FANCC (Fanconi anemia, complementation group C)
Fanconi anemia, complementation group C (-) (OMIM 227645)
FANCE (Fanconi anemia, complementation group E)
Fanconi anemia, complementation group E (-) (OMIM 600901)
FBN2 (fibrillin 2)
Contractural arachnodactyly, congenital (+) (OMIM 121050)
FGFR3 (fibroblast growth factor receptor 3)
Achondroplasia (-) (Table 1)
FLNB (filamin B, beta)
Larsen syndrome (-) (OMIM 150250)
GALNS (galactosamine (N-acetyl)-6-sulfate sulfatase)
Mucopolysaccharidosis IVA (-) (OMIM 253000)
GDF5 (growth differentiation factor 5)
Acromesomelic dysplasia, Hunter-Thompson type (-) (OMIM 201250)
GH1 (growth hormone 1)
Growth hormone deficiency, isolated, type IA (-) (OMIM 262400)
GHR (growth hormone receptor)
Laron dwarfism (-) (OMIM 262500)
GHSR (growth hormone secretagogue receptor)
Short stature (-) (OMIM 604271)
HMGA2 (high-mobility group AT-hook 2)
Leiomyoma, uterine, somatic (-) (OMIM 150699)
IHH (Indian hedgehog)
Acrocapitofemoral dysplasia (-) (OMIM 607778)
KCNJ2 (potassium inwardly-rectifying channel, subfamily J, member 2)
Atrial fibrillation, familial, 9 (-) (OMIM 613980)
PTCH1 (patched 1)
Basal cell nevus syndrome (-) (OMIM 109400)
RNF135 (ring finger protein 135)
Macrocephaly, macrosomia, facial dysmorphism syndrome (+) (OMIM 614192)
RPL5 (ribosomal protein L5)
Diamond-Blackfan anemia 6 (-) (OMIM 612561)
RUNX2 (runt-related transcription factor 2)
Cleidocranial dysplasia (-) (OMIM 119600)
SLC39A13 (solute carrier family 39 (zinc transporter), member 13)
Spondylocheirodysplasia, Ehlers-Danlos syndrome-like (-) (OMIM 612350)
TBX15 (T-box 15)
Cousin syndrome (-) (OMIM 260660)
Generally, GWASs in domestic species explain a much higher percentage of the genetic variance than human GWASs. In mice, for example, the study by Valdar et al. explains an average of 75% of the genetic variance in 97 traits, including body weight. GWASs in mice are not performed with a wild population but with a heterogeneous population derived from inbred strains. These heterogeneous strains show LD over long genomic distances and this probably explains why a higher proportion of variance is captured. In addition, Valdar et al. assign an identical-by-descent probability to each marker. This may track the causal polymorphisms from each strain better than the use of individual SNPs because, for example, Valdar et al. also show that diallelic markers (that is, SNPs) cannot account for the described loci in a third of cases.
A further consequence of LD over long distances is that positioning the causal polymorphism can be very difficult from GWASs in model species. This is similar to the problem encountered for breed-defining loci in dogs, for example; however, loci identified from GWASs are segregating variants causing within-population variation in body size. In the study of Valdar et al., quantitative trait loci regions for body weight contain up to 22 genes within the 50% confidence interval. It is likely that some of these regions contain multiple causal polymorphisms, particularly for regions that show differences between the identical-by-descent and diallelic results.
Association studies in livestock generally have small sample sizes (approximately 2,000 records), and hence relatively high false discovery rates compared with GWASs in humans. This means that genes identified by livestock GWASs are more uncertain than those identified in humans. However, Pryce et al. tested 55 genes that had previously been identified in human GWASs for effects on stature in dairy and beef cattle. A total of eight genomic regions with ten genes (Table 2; Table S2 in Additional file 1) showed significant associations (P < 1 × 10⁻³). This confirms that real causal mutations are being detected by human GWASs, and highlights that mutations in the same genes contribute to segregating variation for both human and cattle populations. This implies that mutations in these genes may occur without obvious pleiotropic effects.
One explanation is that the intermediate effects are not efficiently detected in humans because of the current detection methodologies. This is because family-based studies detect only extreme phenotypes, whereas association studies usually detect only rare, large-effect mutations and segregating mutations in high LD with SNPs at moderate allele frequencies. The power of GWASs to detect a mutation depends on both the variance explained by the mutation (that is, 2pqa², where p is the allele frequency, p + q = 1, and a is the effect size) and the LD between the mutation and SNP. Therefore, mutations with intermediate effect but frequencies lower than common SNPs are unlikely to be detected because of the small variance explained by the locus and also because of weak LD with common SNPs. Loci with intermediate effects and poor LD with common SNPs may also explain some of the so-called 'missing heritability' (Box 1).
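The dependence of detection on both allele frequency and LD can be illustrated with a minimal sketch. It uses the variance formula given in the text, 2pqa², and additionally assumes that the variance visible at a marker SNP scales with r², the squared correlation between the SNP and the causal mutation. The function names and the two scenarios are hypothetical and chosen only for illustration; they are not estimates from any study.

```python
def variance_explained(p: float, a: float) -> float:
    """Variance contributed by a biallelic additive locus: 2 * p * q * a**2.

    p is the allele frequency (q = 1 - p) and a is the allelic effect in
    phenotypic SD units, so the result is a fraction of phenotypic variance.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    q = 1.0 - p
    return 2.0 * p * q * a ** 2


def tagged_variance(p: float, a: float, r2: float) -> float:
    """Variance visible at a marker SNP, assuming it scales with r2,
    the squared correlation (LD) between marker and causal mutation."""
    return r2 * variance_explained(p, a)


# Illustrative, made-up scenarios (not taken from the article).
scenarios = {
    "common, small effect, well tagged": dict(p=0.30, a=0.05, r2=0.9),
    "rare, intermediate effect, poorly tagged": dict(p=0.001, a=0.50, r2=0.1),
}
for label, params in scenarios.items():
    print(f"{label}: {tagged_variance(**params):.2e}")
```

Under these assumed values the rare mutation has a tenfold larger effect, yet the variance seen at the marker is far smaller, which is the sense in which such mutations escape detection.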
Mutations with intermediate effects might be detected as follows. (1) In populations subject to strong, recent artificial or natural selection, alleles that were previously rare can be driven to intermediate frequency where they are easier to detect; the PLAG1-CHCHD7 polymorphism in cattle may be an example of this. (2) Identical-by-descent haplotypes may be in complete LD with a rare mutation even though it is not in complete LD with any single SNP. (3) Genomic sequence should include the causal mutations so that imperfect LD is not a problem. However, the power to detect a mutation is still determined by the variance it explains (that is, 2pqa²) such that large sample sizes will still be necessary to detect variants explaining small proportions of the phenotypic variance.
The genetic architecture that we observe in populations today is the result of evolutionary processes. Mutation creates new variants and then selection and genetic drift determine the observed allele frequency. Not all mutations affect stature, but the results reviewed here suggest that there are many sites in the genome where mutation does affect size. Mutations in 241 genes are known to cause large effects on stature and skeletal features in humans. In many cases, >20 alleles at a gene with a large effect have been discovered and presumably not all possible sites in these genes have been discovered. Assuming there are 50 sites at 250 genes, this implies that there are 12,500 sites where mutations have a large effect (>2 SD). This is likely to be an underestimate because we previously estimated that there are 2,500 sites where mutation can cause Marfan syndrome alone.
The mutations of large effect are subject to strong selection, presumably due to their pleiotropic effects on fitness. This is shown by the high rate of de novo mutations among people carrying a mutant allele (Table 1). The selection coefficient is equal to the proportion of mutant alleles that are new mutations because an equal number of mutant alleles must be eliminated each generation. Disorders vary, but 25% de novo mutation rates are not uncommon (Table 1). Assuming a per gene mutation rate of 5 × 10⁻⁷ (50 sites and a mutation rate of 1 × 10⁻⁸ per site), the equilibrium allele frequency for mutations at such a locus is 5 × 10⁻⁷/0.25 = 2 × 10⁻⁶ (or a prevalence of approximately 1 in 500,000). Genes where mutations occur at much higher frequencies, such as 10⁻⁴ (or 1 in 10,000), must be due to higher mutation rates, a lower selection coefficient, genetic drift, or they may be recessive. Even at a frequency of 10⁻⁴, a mutation with effect of 2 SD explains only 8 × 10⁻⁴ of the phenotypic variance (for details see Additional file 1). However, the average frequency for all mutations is much less than 10⁻⁴. If we assume the average frequency is 1 × 10⁻⁵, the variance explained per locus is 8 × 10⁻⁵ and the total variance explained for all mutations is 0.025 of the total phenotypic variance. Considering that 0.8 of the phenotypic variance is due to inherited genetic factors, we conclude that most of the genetic variance is not due to mutations of large effect.
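The arithmetic in this argument can be checked with a short sketch. It assumes the mutation-selection balance used in the text (equilibrium frequency = mutation rate / selection coefficient) and the variance formula 2pqa²; the function and variable names are hypothetical. The total of 0.025 is derived in Additional file 1 and is not recomputed here.

```python
def equilibrium_frequency(mutation_rate: float, selection_coefficient: float) -> float:
    """Mutation-selection balance as used in the text: q = u / s."""
    if selection_coefficient <= 0.0:
        raise ValueError("selection_coefficient must be positive")
    return mutation_rate / selection_coefficient


def variance_explained(p: float, a: float) -> float:
    """Variance contributed by a biallelic additive locus: 2 * p * q * a**2."""
    return 2.0 * p * (1.0 - p) * a ** 2


sites_per_gene = 50     # assumed number of large-effect sites per gene
per_site_rate = 1e-8    # assumed mutation rate per site per generation
s = 0.25                # selection coefficient, taken as the de novo fraction

per_gene_rate = sites_per_gene * per_site_rate
q_eq = equilibrium_frequency(per_gene_rate, s)
print(f"per-gene mutation rate: {per_gene_rate:.1e}")
print(f"equilibrium allele frequency: {q_eq:.1e}")

# Variance explained by a 2 SD mutation at the two frequencies discussed.
for freq in (1e-4, 1e-5):
    print(f"frequency {freq:.0e}: variance explained {variance_explained(freq, 2.0):.1e}")
```

Because q is very close to 1 at these frequencies, 2pqa² is effectively 2pa², which is how the rounded values of 8 × 10⁻⁴ and 8 × 10⁻⁵ arise.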
It seems likely that other mutations at these 250 genes can cause smaller effects on height and experimental results support this assertion. Lango Allen et al. discovered 180 loci for height in humans that were estimated to explain about 10% of the phenotypic variance. When they allowed for the lack of power of their experiment, they concluded there were 700 loci associated with height but these would still only explain 16% of the variance. Therefore, 700 is likely to be a considerable underestimate. As stated previously, mouse knockout experiments suggest 6,000 genes can affect height. If there are 50 sites within a gene where mutation has a large effect, there are likely to be many more, including sites regulating gene expression, with small effects. If we assume 6,000 genes each with 200 sites, there are 1,200,000 mutable sites that affect height.
GWASs find SNPs with effects <0.13 SD. Even with a possible 1.2 million sites, a mutation rate of 10⁻⁸ per site and an average effect per mutation of 0.1 SD, new mutations would add only 2.4 × 10⁻⁴ to the genetic variance each generation (for details see Additional file 1). This mutation variance would need to accumulate for 3,300 generations to account for the known heritability. This implies that selection against these mutations of small effect is weak.
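The figures in this paragraph can be reproduced with a minimal sketch. It assumes that the new variance per generation is (genome copies) × (sites) × (mutation rate per site) × (effect)², with two genome copies per individual; this form is an assumption that reproduces the quoted 2.4 × 10⁻⁴, and the full derivation is in Additional file 1. The function name is hypothetical.

```python
def mutational_variance(n_sites: int, rate_per_site: float,
                        effect_sd: float, copies: int = 2) -> float:
    """New genetic variance added per generation, in phenotypic SD^2 units.

    Assumes each new mutation contributes effect_sd**2 and that every
    individual carries `copies` genome copies (2 for a diploid).
    """
    return copies * n_sites * rate_per_site * effect_sd ** 2


n_sites = 6_000 * 200   # 6,000 genes x 200 sites per gene (values from the text)
v_m = mutational_variance(n_sites, rate_per_site=1e-8, effect_sd=0.1)
heritability = 0.8      # share of phenotypic variance that is inherited

generations = heritability / v_m
print(f"mutable sites: {n_sites:,}")
print(f"variance added per generation: {v_m:.1e}")
print(f"generations needed to reach h2 = {heritability}: {generations:,.0f}")
```

The several thousand generations required indicate that these mutations must persist for a long time, consistent with weak selection against them.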
The selection pressure against mutations probably decreases as the size of the effect on height decreases. Mutations with very small effects may be effectively neutral in the human population and drift in allele frequency until they are lost or fixed by chance. Other mutations with intermediate effect will also drift in frequency but selection (due to pleiotropic effects) will cause most to have a low frequency. It is these mutations of intermediate and small effect that appear to explain most of the genetic variance. Although the mutations of intermediate effect are poorly detected by current experiments because they are not in strong LD with SNPs used in GWASs, we know they are biologically plausible because we occasionally detect such mutations in domestic animals when artificial selection or genetic drift caused by inbreeding causes their frequency to increase.
In summary, we have surveyed the current known mutations affecting body size in humans, mice, dogs and livestock species. Although mutations of intermediate effect are poorly detected by current experiments in humans, we know they are biologically plausible because we occasionally detect such mutations in domestic animals when artificial selection or genetic drift increases their frequency, and enables their detection. We see that genomic information is gradually building a model for genetic architecture implying many thousands of discrete genes, each with many mutable sites and (possible) segregating mutations. The frequency and size of effect for mutations differs between populations where natural selection and recent history play significant roles in determining the observed distribution. We see extensive (detrimental) pleiotropy for large-effect mutations for rare conditions in humans and also occasionally in livestock. Mutations with less obvious pleiotropy and more modest effects are observed in domestic populations because of selection and drift, but this class of mutations is rarely observed in humans. Mutations with very small effects occur at intermediate frequencies in both humans and livestock. However, the current data are limited because associations of phenotype are with genetic markers (that is, SNP) and not causal mutations.
In the future, genomic sequence data will offer the opportunity to discover the causal mutations underlying quantitative traits such as body size. Of key interest are the number, effect size and frequency of such mutations. It remains to be seen, for example, if the missing intermediate effect size mutations for human height are identified from genomic sequence and if these mutations will further explain some of the 'missing heritability'. It is likely that body size, as a model trait, will continue to inform and direct research into the future.
The inability of genome-wide association studies (GWASs) to explain most of the known inherited variation in complex traits (including human height) has caused much debate. For instance, Lango Allen et al. used >100,000 people to find 180 SNPs significantly associated with height. When the effects of these SNPs were estimated in an independent sample, they explained only 10% of the phenotypic variance although the heritability of height is approximately 80%. So why do we not detect these missing variants? Part of the reason for the difference between 80% and 10% is that the experiment lacked power to find SNPs with small effects despite the large sample size. Lango Allen et al. estimate the power to find associations of the size they discovered (0.02 to 0.13 standard deviation (SD) units) and suggest that there would be 700 loci with effect sizes in this range that would collectively explain 16% of the phenotypic variance. In contrast, Yang et al. estimate that 45% of the phenotypic variance could be explained by all the SNPs together when the significance level of individual markers is ignored. The difference between 16% and 45% arises because SNPs with real associations with height had an effect that was too small to be detected by Lango Allen et al. This group of SNPs could include two classes of mutations, one with very small effects (<0.02 SD) and another with small (0.02 to 0.13 SD) or intermediate (0.1 to 1 SD) effects but in low linkage disequilibrium (LD) with many SNPs. Under these conditions, the analysis of Lango Allen et al. may underestimate the number of mutations with small effects, because if mutations are associated with several SNPs the effect of any one SNP may be too small to be significant in a GWAS. However, the collective variance explained by all associated SNPs is included in the analysis of Yang et al. The difference between 45% and 80% is likely to be caused by imperfect LD between the SNPs and the causal mutations.
That is, even when multiple SNPs track a mutation they may not completely explain the variance at the locus. This lack of perfect LD could be due to the causal polymorphisms having a low frequency (for example, a minor allele frequency <0.1). We suggest that some of these loci could be the missing intermediate effect mutations highlighted by Figure 2.
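The partition of the 'missing heritability' described in this box can be laid out as a short tabulation. The percentages are those quoted above; the labels and the helper name `gaps` are hypothetical, and the attribution of each gap follows the explanation given in the box rather than any independent analysis.

```python
# Cumulative fraction of phenotypic variance in height accounted for at each
# level of analysis (values quoted in the box).
levels = [
    ("180 genome-wide significant SNPs", 0.10),
    ("about 700 loci projected with effects of 0.02 to 0.13 SD", 0.16),
    ("all SNPs fitted together", 0.45),
    ("heritability of height", 0.80),
]


def gaps(cumulative):
    """Difference in variance explained between each pair of successive levels."""
    return [
        (f"{lower_name} -> {upper_name}", upper - lower)
        for (lower_name, lower), (upper_name, upper) in zip(cumulative, cumulative[1:])
    ]


for step, size in gaps(levels):
    print(f"{step}: {size:.2f}")
```

Read this way, the largest single gap (45% to 80%) is the one attributed to imperfect LD between SNPs and causal mutations.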
GWAS: genome-wide association study
MAF: minor allele frequency
SNP: single nucleotide polymorphism
This research was supported under Australian Research Council's Discovery Projects funding scheme (project DP1093502). The views expressed herein are those of the authors and are not necessarily those of the Australian Research Council.