The domestic dog: man's best friend in the genomic era
© BioMed Central Ltd 2011
Published: 22 February 2011
The domestic dog genome - shaped by domestication, adaptation to human-dominated environments and artificial selection - encodes tremendous phenotypic diversity. Recent developments have improved our understanding of the genetics underlying this diversity, unleashing the dog as an important model organism for complex-trait analysis.
In the 5 years since the publication of the genome sequence of the domestic dog (Canis familiaris), our understanding of dog origins and evolution has improved considerably. Before this genome sequence was available (the breed chosen to sequence was a boxer, sequenced at 7.8× coverage), canine genomic analysis relied on linkage and radiation hybrid maps encompassing at most 3,000 to 4,000 markers, or approximately 1 Mb resolution of the genome [2, 3]. Comparing the boxer genome to an earlier 1.5× coverage poodle genome and low-coverage sequencing from nine other dog breeds and wolves, scientists have now cataloged over 2.5 million single nucleotide polymorphisms (SNPs). Genotyping technology has enabled tens of thousands of these SNPs to be typed at a modest cost (approximately US$200 per sample for a 20,000 to 60,000 marker array), giving unprecedented resolution of canine population genetics [5–7] and leading to the rapid identification of loci underlying complex and Mendelian traits (see Additional file 1).
The phenotypic diversity of the world's 350 to 400 dog breeds is mirrored in their genetic diversity. Although most breeds have existed for less than two centuries, the level of genetic differentiation among breeds (FST) in dogs is about twice that found among human populations (FST averages 0.28 among dog breeds) [6, 8]. In an effort to create a perfect companion, dog fanciers have embarked on an 'experiment', faithfully rearing, selecting, breeding and adapting, generation after generation, millions of pedigreed animals with genetically based proclivities and susceptibilities awaiting genomic interrogation. The recent release of the new 170K Illumina HD canine SNP array coupled with an improved genome assembly (canFam3) and advances in targeted and high-throughput DNA and RNA sequencing will surely accelerate the pace of canine genomics in the near future, expanding our understanding of evolution in dogs and their utility as a model genetic system.
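As a minimal illustration of what an FST value such as 0.28 summarizes, the sketch below computes a Nei-style FST for a single biallelic SNP from per-breed allele frequencies, as the proportion of total expected heterozygosity that lies between rather than within breeds. The function name and the example frequencies are hypothetical and are not taken from the cited studies; published estimates average over many SNPs and correct for sample size, which this sketch does not.

```python
def fst_biallelic(allele_freqs):
    """Nei-style F_ST for one biallelic SNP.

    allele_freqs: frequency of the same allele in each population (breed),
    with every population weighted equally (a simplifying assumption).
    """
    if not allele_freqs:
        raise ValueError("need at least one population")
    n = len(allele_freqs)
    # Expected heterozygosity of the pooled population (H_T).
    p_bar = sum(allele_freqs) / n
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        return 0.0  # locus is monomorphic everywhere; no differentiation to measure
    # Mean expected heterozygosity within populations (H_S).
    h_within = sum(2.0 * p * (1.0 - p) for p in allele_freqs) / n
    return (h_total - h_within) / h_total


# Hypothetical frequencies of one allele in four breeds.
example_breeds = [0.10, 0.35, 0.60, 0.90]
example_fst = fst_biallelic(example_breeds)
```

Values near 0 mean breeds share similar allele frequencies; values near 1 mean breeds are close to fixed for different alleles.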
Because of the incredible diversity of modern dogs and the number of derived characteristics distinguishing dogs from their ancestors, determining the ancestor of dogs required genetic data. From Charles Darwin to Konrad Lorenz, early researchers believed that admixture with multiple canid species, including jackals, was necessary to explain domestic dog diversity [9–11]. However, modern mitochondrial DNA (mtDNA) analysis has instead shown that the gray wolf (Canis lupus) was the sole ancestor of modern dogs [12, 13].
Shedding light on the specifics of the location and timing of dog origins has been difficult. The earliest definitive archeological evidence for dog burials is around 11,500 years ago in Israel [14, 15], although evidence also exists for dog burials in Germany around 14,000 years ago . Burials or artistic depictions of dogs before this are lacking from the archeological record, suggesting a relatively recent origin for dogs (less than 16,000 years ago) or at least a major shift in dog anatomy and/or human interaction at this time. Older dog-like canid fossil remains exist, but they are difficult to group unequivocally as being early dogs or small wolves (for example [17, 18]).
In the same way, genetic studies have yet to provide a definitive account of dog origins. Initial estimates of a dog-wolf divergence of more than 100,000 years ago based on mtDNA sequence data relied on the flawed assumption that each modern dog mtDNA haplogroup descended from a single wolf mtDNA lineage. Research by Savolainen and colleagues [19, 20], however, gives a more recent estimate for dog domestication (5,400 to 16,300 years ago) using a domestication model that allows for dozens of founding maternal lines. These studies conclude that dogs probably originated in East Asia south of the Yangtze River, on the basis of the high level of mtDNA diversity found in extant dogs in this area, but they make a crucial assumption that there was a single origin for dogs and that introgression from wolves in Asia has been minimal. If mtDNA haplogroups entered the dog lineage at different times and places, the true timing and location of domestication could differ substantially from those inferred from a single-origin model. This seems to be the case: using whole-genome genotyping data from purebred dogs and wolves, vonHoldt et al. found evidence for allele sharing between Asian dog breeds and Asian wolves (and between European dog breeds and European wolves), suggesting non-trivial levels of regional introgression or multiple founding events. More sophisticated models using whole-genome data from both dogs and wolves are needed to disentangle the complex demographic process underlying dog domestication.
In part because of the difficulties in resolving the location and timing of dog domestication, considerable debate surrounds the roles that humans, wolves and early dogs played in the process. Certain populations of gray wolves probably 'pre-adapted' themselves for domestication by scavenging from human settlements, an activity that not only placed wolves at close quarters to people, but also selected for reduced fearfulness. Individuals willing to forage in proximity to humans were better able to exploit their food sources, particularly if these individuals learned to read human cues (Box 1). Whether increased competition from bow-wielding human hunters or the emergence of rubbish dumps in the villages of these hunter-gatherers caused the shift to scavenging is unclear [22, 23], but competition with humans, or persecution by them, certainly could have been an important isolating force keeping feral wolves from wiping out early proto-dog populations by swamping gene flow.
Notably, dog domestication occurred before the advent of agriculture . The appearance of agriculture shortly after the origin of dogs suggests that dog domestication itself may have been an important precursor to the transformation of humans into agriculturalists. However, without knowing what roles early dogs played in human settlements, this hypothesis is highly speculative. Once agriculture was established, scavenging around human settlements became a highly profitable endeavor, and dogs rapidly spread throughout the world . At some point, people began using these dogs as sentries, food sources and companions, but whether domestication was complete by this time (as proposed in Coppinger's model of dog 'self-domestication' ) or whether directed human selection was required to make dogs fully domesticated (as proposed in Crockford's model of 'classic domestication' ) is still debated.
Over the past two centuries, dog breeders have fortuitously generated a powerful system for mapping genes underlying phenotypic variation. Variants with large, observable effects, like hairlessness or dwarfism, were strongly selected, reaching fixation within certain breeds. The canine genome itself can contribute to generating such variants. Canidae have been shown to exhibit elevated levels of DNA slippage contributing to microsatellite diversity and have a highly active SINE (short interspersed nuclear element), SINE_Cf, akin to Alu SINE repeats found in primates, that segregates at a rate ten-fold higher than the SINE rate in humans [1, 33]. However, these types of variants underlie only a small proportion of the causal mutations that have been found to date in dogs, suggesting that population structure and selection for extreme phenotypes, not canine-specific mutational biases, are the main forces driving the rapid diversification of dogs.
Often, several breeds are characterized by a phenotype in which the causal genetic variant in each breed is identical because of shared descent or directed introgressive breeding for the trait . For example, disproportionate dwarfism (chondrodysplasia) is a defining characteristic of at least 19 breeds, including dachshunds, pekingese, and basset hounds. A genome-wide association study (GWAS) using 797 dogs from eight chondrodysplastic breeds and 64 nonchondrodysplastic breeds found a region of canine chromosome 18 (CFA18) corresponding to a 5 kb expressed retrotransposon insert of fibroblast growth factor 4 (FGF4) unique to the chondrodysplastic breeds . Multi-breed GWAS have likewise shown that short-snouted (brachycephalic) breeds share a haplotype near thrombospondin-2 (THBS2) on CFA1 [8, 36] and floppy-eared breeds share a haplotype near methionine sulfoxide reductase B3 (MSRB3) on CFA10 , although the causal mutations in these regions remain undiscovered.
It is perhaps surprising that for so many traits, the causal variants are identical across often distantly related breeds. In some cases (such as floppy ears or small body size), the causal variant is ancient and segregating at high frequency in natural village-dog populations. Thus, the variant was present in the founder population for several breeds and subsequently selected in parallel in a subset of them. In other cases, the causal variant might have been introduced directly from one breed to another (for example, the ridgeback phenotype caused by an identical 133-kb duplication of CFA18 in Thai and Rhodesian ridgebacks). Currently, little is known about the evolutionary age and history of most causal alleles, although haplotype analysis and/or the genotyping of diverse village-dog and wolf populations can be highly informative. Variants with a global distribution probably arose early in dog evolutionary history, whereas geographically restricted variants are expected to be more recent. Interestingly, two ancient causal haplotypes (the 'small dog' IGF1 haplotype and the chondrodysplasia FGF4 variant) both appear to have arisen on ancestral haplotypes associated with Middle Eastern or European gray wolves and not East Asian gray wolves [37, 43].
For traits found in just one or a few breeds, linkage and GWAS have also been highly successful despite the increased difficulty of fine mapping across large linkage blocks (see  for a recent review). Furthermore, haplotype-based and FST-based methods to detect recent selection can improve the power of traditional GWAS and find genomic regions underlying selected phenotypes even if they are only present in a single breed. For example, excessive skin wrinkling (found almost exclusively in the Shar-Pei breed) was mapped to hyaluronan synthase 2 (HAS2) using an FST-based approach. The recent introduction of higher-density genotyping arrays (such as the 170K Illumina HD array) should further improve the power of these methods from what was possible with older (20K to 60K) SNP arrays.
An important application of trait mapping in dogs is the discovery of variants underlying genetic disease. Traditional GWAS using well phenotyped cases and controls have proved highly successful in finding regions containing causal variants underlying more than 70 Mendelian diseases in dogs (see Additional data file 1). Many of these diseases have close human analogs: genetic mapping of narcolepsy , copper toxicosis [46, 47] and ichthyosis (C André, personal communication) in dogs led to the discovery of new causal variants in the human ortholog affecting the human disease. Because diseases that are complex and/or rare in humans are often monogenic and common in some dog breeds, and because of dogs' relatively large family sizes, dogs are particularly useful for identifying new candidate genes underlying such disorders.
In addition, dogs occupy a valuable intermediate position between the human and mouse genetic systems, increasing their utility as a model system . Despite mice and humans last sharing a common ancestor more recently than dogs and humans (approximately 75 million years ago versus approximately 87 million years ago), the faster rate of evolution in the rodent lineage means that there is less sequence divergence between human and dogs than between humans and mice, and therefore approximately 650 Mb more human sequence can be syntenically aligned to the dog genome than to that of the mouse . Furthermore, dogs are more similar to humans than are mice in terms of body size, longevity and behavior, which also leads to similarity in various genetic pathologies. Finally, dogs have co-habited with humans longer than any other domestic animal, sharing our nutritional and pathogenic environment during our species' unprecedented shift from a hunter-gatherer lifestyle to agriculture. Some human adaptations to this dramatic environmental shift that contribute through antagonistic pleiotropy to disease (such as highly reactive immune systems that protect from infectious disease but predispose individuals to autoimmune disorders [49, 50]) might have evolved in parallel in dogs.
Strong artificial selection has contributed to the diversity of disorders exhibited in dogs. Independent, severe founder effects for each breed cause diseases that are at extremely low prevalence in natural dog populations to, by chance, reach appreciable frequency in one or a few breeds, either from the founder bottleneck itself or through the subsequent propagation of popular sires harboring the variant. In particular, some recessive disorders caused by loss-of-function mutations and some cancers can be rare in humans but common in certain dog breeds (for example, osteosarcoma and amyotrophic lateral sclerosis (ALS)-like canine degenerative myelopathy). Diseases can also be associated with variants selected for a pleiotropic effect - for example, dermoid sinus, a neural tube defect in dogs, is caused by the same variant that produces the ridgeback coat phenotype. Finally, the large selective sweeps containing artificially selected variants can also harbor linked disease variants that hitchhiked to high frequency during the sweep. It is perhaps not coincidental that the gene underlying lens dislocation in terriers is adjacent to a gene implicated in controlling body size among small breed dogs (B Hoopes, personal communication).
Mapping causal variants for quantitative traits is generally more difficult than mapping monogenic traits, if only because accurate phenotyping and controlling for genetic background can be problematic. Nevertheless, GWAS have elucidated dozens of regions underlying quantitative variation in dogs, although most of the causal variants in these regions remain undiscovered. As next-generation sequencing costs decline and more and more canine genomes are sequenced, candidate loci in these regions and others should begin to emerge, improving our understanding of the genetic basis of phenotypic variation in this system. In particular, advances in targeted sequence capture (seq-cap) and DNA barcoding currently enable efficient sequencing and analysis of multiple individuals across candidate quantitative trait loci (QTL) regions [55, 56].
Several studies have performed multi-breed GWAS for body weight and morphological measurements [8, 44, 57, 58]. Despite great differences in the breeds and marker sets used by each, the results are highly consistent for several traits. All studies identified IGF1 as the primary locus affecting body weight and also consistently found other significant QTLs not yet associated with causal variants on CFA7 near a SMAD family gene (SMAD2), CFA10 near high-mobility group protein-A2 (HMGA2) and CFA34 near IGF2 mRNA-binding protein (IGF2BP2). After controlling for allometry, height was principally controlled by the FGF4 retrotransposon on CFA18 and also by an unknown variant near RNF4 and MXD on CFA3 in all four studies.
Does this simplified genetic architecture characterize other complex canine phenotypes, including those associated with behavior, longevity and common multifactorial diseases? Analyses of highly differentiated genomic regions among breeds (which might represent the 'low-hanging fruit' for across-breed mapping studies) show that, overwhelmingly, the highest differentiated regions correspond to known morphological traits involving body size, proportion, coat characteristics and ear type . Although these regions can also be associated with other traits - IGF1, for example, has also been implicated by GWAS as significantly affecting boldness, age of death and prevalence of several diseases [57, 58] - the most parsimonious explanation is that selection towards breed standards has led to stronger differentiation at loci affecting morphology than those affecting other traits. Perhaps this result is not surprising, as behavior can be difficult to quantify and disease prevalence and longevity, while highly breed-dependent, are not breed-defining characters undergoing direct diversifying selection.
Even for morphological variation, the influence of QTLs of major effect may be overstated. For example, although variation in IGF1 explains 50% of the variation in size between breeds and nearly 50% of the variation within Portuguese water dogs where it segregates, it explains only 17% of the variation of body size within a population of village dogs where it segregates [8, 60]. Portuguese water dogs were chosen to study body-size variation because they exhibit high intra-breed variation for size; other breeds also have intermediate IGF1 frequencies and it is not clear if they exhibit more intra-breed variation in body size than other breeds or if they possess loci that 'canalize' IGF1 and other high-effect QTLs to reduce their effects.
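One textbook reason why the same locus can explain very different fractions of variance in different populations can be sketched as follows. For a purely additive biallelic locus in Hardy-Weinberg proportions, the variance it contributes is 2pqa², where p is the allele frequency, q = 1 - p and a is the effect of one allele copy, so the fraction explained depends on both the allele frequency and the total phenotypic variance of the population studied. The function name and all numbers below are hypothetical and chosen only for illustration; they are not estimates for IGF1, and the sketch ignores dominance and epistasis.

```python
def fraction_variance_explained(p, a, total_variance):
    """Share of phenotypic variance due to one additive biallelic locus.

    p: allele frequency in the population (0 to 1)
    a: additive effect of one allele copy, in trait units
    total_variance: total phenotypic variance of the trait in that population
    Assumes Hardy-Weinberg proportions and no dominance (a simplification).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    if total_variance <= 0.0:
        raise ValueError("total variance must be positive")
    locus_variance = 2.0 * p * (1.0 - p) * a * a
    return locus_variance / total_variance


# Same allelic effect, three hypothetical populations.
intermediate_freq = fraction_variance_explained(p=0.5, a=1.0, total_variance=1.0)
rare_allele = fraction_variance_explained(p=0.1, a=1.0, total_variance=1.0)
more_background = fraction_variance_explained(p=0.5, a=1.0, total_variance=3.0)
```

With an unchanged allelic effect, the fraction falls when the allele is rare or when other sources of variation (more segregating loci, more environmental heterogeneity) inflate the total variance.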
In addition to the uncertainties regarding the simplified architecture underlying additive genetic variance in dogs, the degree to which non-additivity (dominance and recessiveness) and epistasis (interactions between loci) have an impact on complex traits in dogs is also uncertain. Although Lark  found a non-additive epistatic interaction between IGF1 and a locus on the X chromosome controlling body size, most studies of complex traits in dogs have assumed additivity and ignored the X chromosome (Boyko et al.  did include the X chromosome, albeit only using an additive model, and found evidence for two body-size loci but not in the region reported in .) In addition, even the estimates for the proportion of variation explained by QTLs of large effect such as IGF1 might be overstated. If diversifying selection is strong and several loci contribute to the trait but only the ones of largest effect are detected, then these large-effect loci will be associated with both the morphological effect they engender and also with the effects of the rare or small-effect loci that were also swept to some degree but not detected.
Nevertheless, dogs have clearly exhibited a shift in the genetic architecture of complex traits towards variants of large effect for several important phenotypes. This shift facilitates complex-trait mapping, making the dog an extremely important model system. The extent to which this shift has affected non-morphological variation and whether this simplification extends to other aspects of genetic architecture is still unclear. Important insights into evolutionary biology, including the nature of standing genetic variation and the genomic consequences of adaptation to new environmental pressures, could be gained by determining the degree to which domestication, selective breeding and the genetic structure of the canine genome have each altered the genomic architecture of complex traits in this species.
Over the past decade, genetic analysis of dogs has not only enriched our understanding of their origins, but also led to the discovery of causal variants underlying myriad diverse phenotypes and diseases. As dogs became genome-enabled, candidate gene approaches gave way to GWAS, which have proved a particularly powerful method in this genetic system. Dogs have fewer allelic variants per locus and long tracts of linkage disequilibrium within breeds, meaning that far fewer markers are needed to find significant associations in dogs, perhaps an order of magnitude fewer than needed in human studies [61, 62]. In addition to helping to identify regions associated with morphology and disease, this should make dogs particularly valuable in the near future when studies begin focusing on gene × gene interactions. In that case, the number of hypotheses to test scales by roughly the square of the number of markers; thus, depending on the breed, significant tracts of linkage disequilibrium could effectively reduce this number 100-fold or more.
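The quadratic scaling can be made concrete with a short calculation: the number of distinct marker pairs among n markers is n(n - 1)/2, so a 10-fold reduction in markers yields roughly a 100-fold reduction in pairwise tests. The marker counts below are hypothetical round numbers used only to show the arithmetic, not figures from any particular study.

```python
def pairwise_tests(n_markers):
    """Number of distinct marker pairs to test for gene x gene interaction."""
    if n_markers < 0:
        raise ValueError("marker count cannot be negative")
    return n_markers * (n_markers - 1) // 2


# Hypothetical array sizes differing by one order of magnitude.
markers_large = 500_000
markers_small = 50_000

tests_large = pairwise_tests(markers_large)
tests_small = pairwise_tests(markers_small)

# Fold reduction in the number of pairwise hypotheses.
fold_reduction = tests_large / tests_small
print(f"{tests_large:,} vs {tests_small:,} pairs: about {fold_reduction:.0f}-fold fewer tests")
```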
Great progress is currently being made in mapping complex diseases in dogs, including cancer, diabetes, immune disorders, behavioral pathologies, osteoarthritis, and cardiac disease. Causal variants contributing to certain conditions in certain breeds will likely be identified in the near future, but it is less clear when these studies will begin to identify gene-gene and gene-environment interactions that could contribute even more to our understanding of the biological basis of the diseases. Reduced genetic (and perhaps environmental) heterogeneity compared with humans also makes expression QTL approaches to detecting complex traits promising in dogs. Previously, expression studies were limited by the power of microarrays and the reliability of gene annotation in the dog genome, but next-generation approaches such as direct RNA sequencing (RNA-seq) should greatly enhance functional genomics and expression profiling in dogs .
A significant impediment to fully realizing the potential of the dog as a model genomic system is the lack of uniform data-release standards for published dog genomic studies. Making complete genetic and phenotypic data release de rigueur for canine genomics projects would provide the opportunity for meta-analyses utilizing many thousands of dogs that would be highly informative for many of these traits and interactions. Such meta-analyses are common in human genetic studies even though standards of subject confidentiality are much more rigorous, and have yielded valuable insights about the genetic basis of complex disease (see, for example ).
Future genomic studies may be able to unravel what it is that makes a dog a dog. For example, what are the genetic variants underlying the traits such as barking and neoteny (juvenilization) that became fixed very early in dog evolutionary history? How has the novel social, nutritional and disease environment of the dog affected its genome? The human genome is packed with strong signatures of selection at variants underlying phenotypic diversity of behavioral, metabolic and immune traits; is the dog genome littered with a parallel complement? The unique relationship between dogs and humans gives canine genomic studies the opportunity not only to flush out what it is that makes a dog a dog, but also to motivate comparative genomic approaches in their human companions.
One of the most remarkable aspects of dog cognition is their ability to 'read' people. Like humans, dogs seem naturally inclined to use cues like pointing or gazing to find hidden food sources in object-choice tests, unlike wolves and non-human primates. Dogs do not outperform these species in non-social tasks , suggesting that domestication itself has selected for human-like social responsiveness.
In a typical experimental set-up (reviewed in ), dogs are allowed to choose between two inverted cups, one of which has been randomly chosen to be baited with food hidden under the cup. A human experimenter stationed between the cups gives a signal (such as gazing at the baited cup while pointing), and the proportion of times the animal chooses the correct cup is recorded for each cue. Dogs of all breeds seem remarkably adept at these experiments .
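Because one of two cups is baited at random, a dog that ignores the cue is expected to choose correctly about half the time. A minimal sketch of how performance could be compared against that chance level is an exact one-sided binomial test, shown below. The function name and the trial counts are hypothetical, and the cited studies may have used different statistics.

```python
from math import comb


def binomial_p_at_least(successes, trials, chance=0.5):
    """Exact probability of at least `successes` correct choices in `trials`
    independent trials if the animal were choosing at the `chance` rate."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    return sum(
        comb(trials, k) * chance**k * (1.0 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical session: 15 correct choices out of 20 trials with two cups.
correct, total = 15, 20
proportion_correct = correct / total
p_value = binomial_p_at_least(correct, total)
```

A small p-value indicates that the observed proportion of correct choices would be unlikely under random choice between the two cups.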
Early studies concluded that making use of human social cues was a skill present in dogs at a very early age, regardless of upbringing, and absent in wolves. Subsequent work, however, showed that both dog and wolf performance on this task is highly dependent on experience, environment and experimental set-up - pet dogs do not perform well outdoors, shelter dogs do better with obvious cues (point + gaze) than less obvious ones (point or gaze separately), and neither dogs nor wolves excel when they are separated from the cue-giving experimenter by a fence. Furthermore, the evidence for cue use in puppies less than 16 weeks old is ambiguous [70, 71].
R Boyko and colleagues (personal communication) found differences in performance between shelter dogs in Western and non-Western countries. In Western countries where shelter dogs were raised in homes or shelters, dogs successfully followed human point and gaze cues whereas in non-Western countries where shelter dogs were raised on the streets, they did not. This supports the hypothesis that socialization depends on a critical developmental window with the process of domestication acting to lengthen the window . Natural selection has clearly given dogs the cognitive abilities and temperament to excel at reading human signals, but early socialization is still critical for these skills to develop .
Reduction of genetic diversity due to demographic forces. The founding and maintenance of dog breeds, and to a lesser extent bottlenecks associated with dog domestication, have reduced genetic diversity throughout the dog genome . As a consequence, the genetic background on which a causal variant acts is less heterogeneous in dogs than it is in humans. Furthermore, allelic heterogeneity within a locus is significantly reduced, facilitating the discovery of candidate genes through GWAS. Whereas human populations may harbor several rare variants with varying (and sometimes opposite) effects at a locus, dogs have far fewer variants, often only one per locus. These single variants are more easily tagged in genotyping array studies, particularly since they are often relatively common in one or more breeds (and absent in others). In effect, breed structure significantly reduces the prevalence of rare variants that are believed to account for much of the missing heritability in human complex traits.
Artificial selection favoring novelty. Victorian dog fanciers actively selected for distinguishing characteristics in their animals. In fact, dog populations with distinguishing characteristics were probably more likely to gain official recognition as a sanctioned breed. Thus, among breeds, phenotypic variance in many traits has increased, favoring a simplified genetic architecture for the trait . Furthermore, active selection for saltational mutations that would have been weeded out by natural selection further enhances the proportion of variance explained by large-effect QTLs.
Short evolutionary history. Most breeds were formed and adopted their breed standards little more than 100 generations ago. QTLs of minor effect necessarily have small fitness values and therefore have not had sufficient time to substantially differentiate in frequency in select breeds, a requirement if these alleles are to underlie a significant proportion of genetic variance. In contrast, QTLs of major effect can be very efficiently selected by breeders over short timescales. A corollary of this effect, however, is that most 'modifier loci', such as those that increase canalization of breed-defining characteristics or reduce recombination rates between epistatic loci, tend to be weakly selected, reducing the likelihood that such effects are a major part of the canine genetic architecture for complex traits.
Relaxation of selective constraint. Artificial selection by breeders dramatically reduces the efficacy of natural and sexual selection, allowing for genetic drift and phenotypic variation in traits that would otherwise be constrained by these forces. In general, selectively neutral traits exhibit simplified genetic architectures, as evidenced by the relatively large proportion of late- versus early-onset human diseases explained by just a few major QTLs (for example, Alzheimer's disease and macular degeneration ).
I gratefully acknowledge the helpful comments of RH Boyko, JM Kidd and SM Myles on an earlier draft of this manuscript and insightful discussions with CD Bustamante. AR Boyko has been funded by NSF (DEB) 0948510 and (DBI) 0701382.