What use is the human genome for understanding the mouse?
Genome Biology volume 2, Article number: comment2009.1 (2001)
Having a working draft of the human genome sequence is proving invaluable to mouse genetic and genomic studies, providing a useful stepping-stone towards the finished sequence of the mouse genome.
All over the world, mouse geneticists have applauded the publication of the initial analysis of the human genome sequence(s) [1,2]. Why? One simplistic answer is that mouse and human are two flavors of mammal, and a genome sequence for one is a surrogate for the other. So perhaps the pertinent question then becomes: how can mouse geneticists make use of the human sequence? In this article, we briefly describe some ways in which the human working draft sequence can be used as a tool in mouse genomics, not only for assembling the mouse genome but also for identifying conserved sequence elements and providing new insights into genome evolution.
Before the genome
Even before the inception of 'The Human Genome Project', mouse and human genetics formed a two-way street. An early example of this was the observation that inherited traits exhibiting sex-linkage in humans were also sex-linked in mice, for example hypophosphatemia. As genetic maps improved in both species, it became clear that there were blocks of conserved synteny along chromosomes (synteny literally means 'on the same thread'). Indeed, with the development of dense, genome-wide maps, it has become possible to infer with confidence the location of a mouse homolog of a human gene on the basis of the location of the genes that flank it in the human genome, and vice versa (Figure 1) [5,6].
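The flanking-gene inference described above can be sketched in a few lines of Python. This is a minimal illustration, not a mapping tool: the gene names, map positions, and the function name `infer_mouse_interval` are all hypothetical, and the sketch assumes that a prediction is only made when both nearest mapped neighbours lie on the same mouse chromosome.

```python
# Hypothetical toy data: gene order along a human chromosome segment, and
# known mouse map positions (chromosome, cM) for some of those genes.
HUMAN_ORDER = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
MOUSE_POSITIONS = {
    "GENE_A": ("11", 40.0),
    "GENE_B": ("11", 42.5),
    "GENE_D": ("11", 45.0),
    "GENE_E": ("7", 10.0),
}


def infer_mouse_interval(gene, human_order, mouse_positions):
    """Predict a mouse interval for `gene` from its nearest mapped human neighbours.

    Returns (chromosome, low, high) when both flanking genes map to the same
    mouse chromosome, otherwise None (no confident inference is made).
    """
    idx = human_order.index(gene)
    # Nearest mapped gene on each side in the human order.
    left = next((g for g in reversed(human_order[:idx]) if g in mouse_positions), None)
    right = next((g for g in human_order[idx + 1:] if g in mouse_positions), None)
    if left is None or right is None:
        return None
    left_chrom, left_pos = mouse_positions[left]
    right_chrom, right_pos = mouse_positions[right]
    if left_chrom != right_chrom:
        return None  # the flanking genes straddle a break in conserved synteny
    return (left_chrom, min(left_pos, right_pos), max(left_pos, right_pos))


print(infer_mouse_interval("GENE_C", HUMAN_ORDER, MOUSE_POSITIONS))
```

Returning `None` when the flanking genes disagree is a deliberately conservative choice: at the edge of a conserved block the neighbours give no reliable guide to the mouse location.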
Mice suffer from diseases similar to those of humans. Furthermore, this biological similarity can extend to defects in the same molecules: for example, mutations in the leptin gene cause morbid obesity and mutations in myosin VII cause deafness, both in humans and in mice. In complex genetic diseases, such as diabetes, or where mutations may be subtle rather than obviously deleterious, the mouse is of particular importance as it allows experimental testing of the validity of candidate mutations by targeted mutagenesis. From the outset, the Human Genome Project recognized the importance of model organisms, from bacteria to mouse, and devoted funding to developing resources for their genetic and physical mapping. As a genetically malleable social mammal, well suited to living and breeding in (relatively) modest space, the mouse has become the premier genetic model for humans. A secondary aspect of the publicly funded Human Genome Project that has been immensely valuable to scientists working on model organisms is the policy of rapid data release, which has meant that data could be used before formal publication.
Building mouse genome sequence using a human scaffold
One direct way in which the human genome sequence can be used in mouse genomics is as a scaffold to support the anchoring and merging of clone contigs (contiguous assemblies) of large-insert bacterial artificial chromosomes (BACs; Figure 2). Draft or finished human genomic sequence is compared with mouse sequence taken from the ends of BAC inserts. BAC-insert ends showing highly significant similarity to the human genomic sequence are assumed to represent homologous sequences from conserved syntenic segments, where both gene function and gene order are conserved. The names of the BAC clones from which these sequences are derived are then used to search the public database of fingerprinted BAC clones (where the restriction digest band pattern on electrophoresis makes a 'fingerprint') constructed by the British Columbia Genome Sequencing Center (BC-GSC). This can be done in a number of ways: by using the text version of the database on the BC-GSC website; by downloading the data onto a local computer and searching it using the fingerprinted contigs (FPC) software; or by using an assembly and annotation website maintained by the Center for Bioinformatics, University of Pennsylvania.
The version of the contig data generated in the last of these ways is made particularly powerful by the way in which it links together data from several sources: the sequence-tagged site (STS) content of the contigs, plus assemblies of expressed sequence tags (ESTs) in the database of transcribed sequences (DOTS) and data from radiation hybrid (RH) maps. In some cases, this helps to confirm anchoring of mouse BAC contigs and also to orient the contigs on the mouse chromosome (M. Bucan, personal communication). Furthermore, a refinement of this approach has been used recently to produce a physical map of the whole mouse genome.
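The anchoring step described in this section can be illustrated with a short Python sketch. It is not the procedure used by any of the resources named above: the input formats, the function name `anchor_contigs`, the `min_hits` threshold, and the use of a median position are all assumptions made for illustration. The sketch takes BAC-end matches to human sequence, looks up which fingerprint contig each clone belongs to, and orders the contigs along the human scaffold.

```python
from collections import Counter, defaultdict
from statistics import median


def anchor_contigs(bac_end_hits, clone_to_contig, min_hits=2):
    """Place fingerprint contigs on a human scaffold using BAC-end matches.

    bac_end_hits: iterable of (clone_name, human_chrom, human_pos) for BAC ends
        with a highly significant match to human genomic sequence.
    clone_to_contig: dict mapping clone name -> fingerprint contig id.
    Returns a list of (contig, human_chrom, median_pos, n_hits), sorted by
    human chromosome and position.
    """
    hits_by_contig = defaultdict(list)
    for clone, chrom, pos in bac_end_hits:
        contig = clone_to_contig.get(clone)
        if contig is not None:  # skip clones absent from the fingerprint map
            hits_by_contig[contig].append((chrom, pos))

    anchored = []
    for contig, hits in hits_by_contig.items():
        # Anchor to the human chromosome supported by the most BAC ends.
        chrom, n_hits = Counter(c for c, _ in hits).most_common(1)[0]
        if n_hits < min_hits:
            continue  # too little support to anchor this contig
        position = median(p for c, p in hits if c == chrom)
        anchored.append((contig, chrom, position, n_hits))
    return sorted(anchored, key=lambda a: (a[1], a[2]))


# Hypothetical example data.
hits = [
    ("bacA", "7", 1_000_000), ("bacB", "7", 1_200_000), ("bacF", "7", 1_100_000),
    ("bacC", "7", 5_000_000), ("bacD", "7", 5_100_000), ("bacE", "12", 300),
]
contigs = {"bacA": "ctg2", "bacB": "ctg2", "bacF": "ctg2",
           "bacC": "ctg1", "bacD": "ctg1", "bacE": "ctg3"}
for row in anchor_contigs(hits, contigs):
    print(row)
```

Contigs that end up adjacent on the human scaffold are candidates for merging, but any such join would still need confirmation from fingerprint overlap or independent map data, as the text notes.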
Identifying and visualizing conserved sequences
One of the most powerful uses of the human draft sequence again depends on the high level of sequence similarity between mouse and human. It is assumed that sequence elements with the highest similarity are those with critical functions, such as the transcribed and regulatory elements of genes. Evolutionary forces will have actively selected against mutations in these elements, whereas the sequences of non-functional genomic regions will acquire differences to an extent approximately proportional to the time passed since the divergence of the two organisms from a common ancestor. It can therefore be highly informative to take mouse and human sequences from a region of synteny, and to align them and graphically display the degree of sequence similarity (Figure 3) [16,17,18]. Known or predicted gene structures can be overlaid on the alignments displayed by the PipMaker and VISTA programs, allowing identification of novel evolutionarily conserved regions (ECRs). Such regions may represent coding exons not predicted by conventional methods, regulatory elements, genes expressed at low levels and so not represented in EST databases, or perhaps genes that are transcribed to make non-coding RNAs. A further development of the VISTA package incorporates prediction of transcription-factor binding sites in conserved regions, to aid identification of potential regulatory elements. In some cases, experimental evidence supports the prediction that ECRs are transcribed ( and R.B., unpublished observations) or that they represent regulatory elements. It seems that not all genomic regions acquire mutations at similar rates, however, so some sequences will be conserved as a result of insufficient time to diverge, adding 'noise' to the alignments. One way of improving the discrimination of actively and passively conserved sequences may be to make multiple comparisons with different species.
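The idea of scanning an alignment for highly similar stretches can be illustrated with a sliding-window percent-identity calculation. The sketch below is not the algorithm used by PipMaker or VISTA, which perform their own alignment and scoring; it assumes the two sequences have already been aligned (with '-' for gaps), and the window size, step, and 70% threshold are illustrative assumptions.

```python
def window_identity(aligned_a, aligned_b, window=100, step=50):
    """Percent identity in sliding windows over a gapped pairwise alignment.

    Columns containing a gap ('-') in either sequence count as mismatches.
    Returns a list of (start_column, percent_identity).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    a, b = aligned_a.upper(), aligned_b.upper()
    profile = []
    for start in range(0, len(a) - window + 1, step):
        columns = zip(a[start:start + window], b[start:start + window])
        matches = sum(1 for x, y in columns if x == y and x != "-")
        profile.append((start, 100.0 * matches / window))
    return profile


def conserved_windows(profile, threshold=70.0):
    """Start columns of windows whose identity meets the threshold."""
    return [start for start, identity in profile if identity >= threshold]


# Hypothetical toy alignment: a conserved stretch followed by a diverged one.
human = "ACGTACGTAC" + "AAAAAAAAAA"
mouse = "ACGTACGTAC" + "CCCCCCCCCC"
profile = window_identity(human, mouse, window=10, step=10)
print(profile)
print(conserved_windows(profile))
```

Windows that pass the threshold but do not overlap any annotated exon would be the candidate ECRs discussed above.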
Turning the idea of comparative analysis on its head, we can use the human genome to find areas of the mouse genome where conservation of gene content and order breaks down. What happens in these evolutionary 'breakpoint' regions? Studies of this kind are still quite rare, but there are suggestions of an emerging consensus. A correlation has been noted between genetic instability and sites that are rich in repetitive elements and may, therefore, be more prone to rearrangement through inappropriate homologous recombination. Transposition events, leading to both insertion and deletion, are also apparent in regions of chromosomal rearrangements [24,25,26]. The evolutionary breakpoint disrupting the conservation of human 19p13.3 with mouse chromosomes 10 and 17, for example, is rich in simple tandem repeats, and repetitive elements were identified in the section of mouse chromosome 10 bridging the junction between conserved syntenic regions of human chromosomes 21 and 22. Indeed, many of the breaks in conservation of human chromosome 19 relative to mouse chromosomes seem to be associated with localized repetitive elements, such as tandemly repeated gene families. As more extensive mouse genome sequence becomes available, it will be interesting to assess whether the prediction that regions rich in repetitive elements are associated with genome rearrangements holds up.
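A simple way to locate candidate breakpoint regions is to walk along the human chromosome and record where the mouse chromosome assignment of successive orthologs changes. The Python sketch below makes that concrete under strong simplifying assumptions: the gene names and coordinates are hypothetical, and only the mouse chromosome is considered, so rearrangements of gene order within a single mouse chromosome (such as inversions) are not detected.

```python
from itertools import groupby


def synteny_blocks(orthologs):
    """Group orthologs into runs that share a mouse chromosome.

    orthologs: list of (gene, human_pos, mouse_chrom) in any order.
    Returns blocks as (mouse_chrom, human_start, human_end, n_genes),
    ordered along the human chromosome.
    """
    ordered = sorted(orthologs, key=lambda o: o[1])
    blocks = []
    for mouse_chrom, group in groupby(ordered, key=lambda o: o[2]):
        members = list(group)
        blocks.append((mouse_chrom, members[0][1], members[-1][1], len(members)))
    return blocks


def breakpoint_intervals(blocks):
    """Human intervals between adjacent blocks, with the flanking mouse chromosomes.

    Each interval is (human_start, human_end, left_mouse_chrom, right_mouse_chrom).
    """
    return [(a[2], b[1], a[0], b[0]) for a, b in zip(blocks, blocks[1:])]


# Hypothetical orthologs along one human chromosome arm.
orthologs = [
    ("g1", 100, "10"), ("g2", 200, "10"),
    ("g3", 300, "17"), ("g4", 400, "17"),
    ("g5", 500, "10"),
]
blocks = synteny_blocks(orthologs)
print(blocks)
print(breakpoint_intervals(blocks))
```

The intervals returned are the regions whose sequence could then be examined for repetitive elements, which is the comparison the text suggests will become possible as more mouse sequence is produced.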
Finishing the mouse genome?
Finally, we would like to argue the case for producing a finished mouse genomic sequence. When mouse genome sequencing first became a serious endeavor, it was unclear what the quality of the 'product' might be. Some suggested that all that was needed was a low-to-medium coverage whole-genome shotgun (about 3-6-fold sequencing depth), which could be assembled and aligned with the finished human genome. Indeed, unassembled, low-coverage mouse shotgun sequence can be used efficiently to find exons in the human working draft sequence and is a valuable resource for gene and marker discovery. Lack of long-range contiguity hampers accurate prediction of mouse gene structure, however, and accurate prediction is invaluable for efficient mutation scanning. Studies of two critical classes of mutations would benefit from high-quality, finished sequence: point mutations such as those induced by the supermutagen ethylnitrosourea (ENU), and mutations responsible for quantitative traits, as it is believed that many of the latter may be found in regulatory elements. Despite the unequivocal utility of a draft human genome sequence [1,2], it is clear that draft sequence has limitations. Even with a finished human genome sequence, interpretation of genome function will be enhanced by access to a second, finished mammalian genome. Because the mouse is the premier genetic model mammal, it makes sense to finish its genome sequence.
International Human Genome Sequencing Consortium: Initial sequencing and analysis of the human genome. Nature. 2001, 409: 860-921. 10.1038/35057062.
Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, et al: The sequence of the human genome. Science. 2001, 291: 1304-1351. 10.1126/science.1058040.
Eicher EM, Southard JL, Scriver CR, Glorieux FH: Hypophosphatemia: mouse model for human familial hypophosphatemic (vitamin D-resistant) rickets. Proc Natl Acad Sci USA. 1976, 73: 4667-4671.
Shows TB, Brown JA, Chapman VM: Comparative gene mapping of HPRT, G6PD, and PGK in man, mouse, and muntjac deer. Cytogenet Cell Genet. 1976, 16: 436-439.
Hudson TJ, Church DM, Greenaway S, Nguyen H, Cook A, Steen RG, Van Etten WJ, Strivens MA, Trickett P, Heuston C, et al: A radiation hybrid map of mouse genes. Nat Genet. 2001, 29: 201-205. 10.1038/ng1001-201.
O'Rahilly S: Life without leptin. Nature. 1998, 392: 330-331. 10.1038/32769.
Brown SDM, Steel KP: Deafness (DFN) genes. In Encyclopedia of molecular medicine. New York: John Wiley and Sons.
National Human Genome Research Institute: Understanding Our Genetic Inheritance: The U.S. Human Genome Project. The First Five Years Fiscal Years 1991-1995. [http://www.nhgri.nih.gov/HGP/HGP_goals/5yrplan.html]
TIGR: Mouse BAC Ends. [http://www.tigr.org/tdb/bac_ends/mouse/bac_end_intro.html]
British Columbia Genome Sequence Center: A BAC fingerprint map of the mouse genome. [http://www.bcgsc.bc.ca/projects/mouse_mapping/]
Soderlund C, Humphray S, Dunham A, French L: Contigs built with fingerprints, markers, and FPC V4.7. Genome Res. 2000, 10: 1772-1787. 10.1101/gr.GR-1375R.
Mouse chromosome 5 annotation project. [http://www.cbil.upenn.edu/mouse/chromosome5/fpc-search.php3]
Mouse Ensembl. [http://mouse.ensembl.org]
Schwartz S, Zhang Z, Frazer KA, Smit A, Riemer C, Bouck J, Gibbs R, Hardison R, Miller W: PipMaker-a web server for aligning two genomic DNA sequences. Genome Res. 2000, 10: 577-586. 10.1101/gr.10.4.577.
Mayor C, Brudno M, Schwartz JR, Poliakov A, Rubin EM, Frazer KA, Pachter LS, Dubchak I: VISTA: visualizing global DNA sequence alignments of arbitrary length. Bioinformatics. 2000, 16: 1046-1047. 10.1093/bioinformatics/16.11.1046.
Mallon AM, Platzer M, Bate R, Gloeckner G, Botcherby MR, Nordsiek G, Strivens MA, Kioschis P, Dangel A, Cunningham D, et al: Comparative genome sequence analysis of the Bpa/Str region in mouse and man. Genome Res. 2000, 10: 758-775. 10.1101/gr.10.6.758.
Loots GG, Locksley RM, Blankespoor CM, Wang ZE, Miller W, Rubin EM, Frazer KA: Identification of a coordinate regulator of interleukins 4, 13, and 5 by cross-species sequence comparisons. Science. 2000, 288: 136-140. 10.1126/science.288.5463.136.
Dubchak I, Brudno M, Loots GG, Pachter L, Mayor C, Rubin EM, Frazer KA: Active conservation of noncoding sequences revealed by three-way species comparisons. Genome Res. 2000, 10: 1304-1306. 10.1101/gr.142200.
Carver EA, Stubbs L: Zooming in on the human-mouse comparative map: genome conservation re-examined on a high-resolution scale. Genome Res. 1997, 7: 1123-1137.
DeBry RW, Seldin MF: Human/mouse homology relationships. Genomics. 1996, 33: 337-351. 10.1006/geno.1996.0209.
Ehrlich J, Sankoff D, Nadeau JH: Synteny conservation and chromosome rearrangements during mammalian evolution. Genetics. 1997, 147: 289-296.
Kamnasaran D, O'Brien PC, Ferguson-Smith MA, Cox DW: Comparative mapping of human chromosome 14q11.2-q13 genes with mouse homologous gene regions. Mamm Genome. 2000, 11: 993-999. 10.1007/s003350010183.
Puttagunta R, Gordon LA, Meyer GE, Kapfhamer D, Lamerdin JE, Kantheti P, Portman KM, Chung WK, Jenne DE, Olsen AS, et al: Comparative maps of human 19p13.3 and mouse chromosome 10 allow identification of sequences at evolutionary breakpoints. Genome Res. 2000, 10: 1369-1380. 10.1101/gr.145200.
Pletcher MT, Roe BA, Chen F, Do T, Do A, Malaj E, Reeves RH: Chromosome evolution: the junction of mammalian chromosomes in the formation of mouse chromosome 10. Genome Res. 2000, 10: 1463-1467. 10.1101/gr.146600.
Dehal P, Predki P, Olsen AS, Kobayashi A, Folta P, Lucas S, Land M, Terry A, Ecale-Zhou CL, Rash S, et al: Human chromosome 19 and related regions in mouse: conservative and lineage-specific evolution. Science. 2001, 293: 104-111. 10.1126/science.1060310.
Bouck JB, Metzker ML, Gibbs RA: Shotgun sample sequence comparisons between mouse and human genomes. Nat Genet. 2000, 25: 31-33. 10.1038/75563.
Brown SD, Nolan PM: Mouse mutagenesis-systematic studies of mammalian gene function. Hum Mol Genet. 1998, 7: 1627-1633. 10.1093/hmg/7.10.1627.
Flint J, Mott R: Finding the molecular basis of quantitative traits: successes and pitfalls. Nat Rev Genet. 2001, 2: 437-445. 10.1038/35076585.
Katsanis N, Worley KC, Lupski JR: An evaluation of the draft human genome sequence. Nat Genet. 2001, 29: 88-91. 10.1038/ng0901-88.
We thank Steve Brown for improvements to the manuscript.