Public mouse sequence published
© BioMed Central Ltd 2002
Published: 5 December 2002
You might think that formal publication of the publicly funded mouse genome would be a bit of an anticlimax. Some of its major findings have been known and scrutinized for months, and the draft has been in the free databases since May. But the papers published in the December 5, 2002 issue of Nature not only make clear how momentous the mouse genome project is, but also offer a glimpse of the many ways it can contribute to human life and knowledge.
The mouse genome sequence has helped fill in many gaps in the human sequence, which is expected to be finished next year, said Kerstin Lindblad-Toh of the Whitehead Institute/MIT Center for Genome Research, lead author of the massive main paper. "Still, for me the greatest satisfaction from the mouse genome comes from the wealth of information one can get about the human genome by comparing the two," she told us.
Take, for example, the three papers on the relationship between human chromosome 21 and mouse equivalents of the DNA in that chromosome. They demonstrate the range of ways the mouse genome sequence will assist with specific questions that have clinical import, as well as more abstract basic inquiries into the nature of genomes. Human chromosome 21 (HSA 21) is small, and its complete sequence was published two years ago. Several of its 200 or so genes are known to be related to human disorders, among them Alzheimer's disease, myopathy, anemia, platelet disorders, deafness and cataracts. Its most visible pathology is probably Down syndrome, which results from trisomy of the chromosome.
Examining expression patterns of about 160 mouse orthologs of HSA 21 genes in 12 adult tissues (including brain) and at 6 developmental stages, researchers in Europe put together an "atlas" of gene expression for the chromosome. The atlas took about 18 months, Andrea Ballabio told us. Ballabio, of the Telethon Institute of Genetics and Medicine in Naples, Italy, is senior author of one of two papers describing the atlas. The authors have already zeroed in on several HSA 21 genes active in cardiac and gut tissue that could figure in abnormalities of the heart and gastrointestinal tract that often accompany Down syndrome. They also have hypotheses about the function of several other genes on HSA 21.
"We have here established the first expression map for a virtually complete chromosome," Marie-Laure Yaspo told us. Yaspo, of the Max-Planck Institute for Molecular Genetics in Berlin, is senior author of the second paper on the chromosome 21 atlas. That paper offers a brain map of human chromosome 21 genes in the mouse, and a map showing which genes are expressed in early development. The collection of HSA 21 orthologs was started two years ago and would have gone faster if the mouse sequence had been available at that time, Yaspo said. In addition to suggesting candidate genes, the work is expected to aid the design of better models for Down syndrome, "including engineering mouse models carrying multiple genes at gene dosage imbalance."
Ballabio agreed that functional analysis of candidate genes on chromosome 21 and overexpression of both single genes and regions containing multiple genes in transgenic mice are crucial next steps. A further challenge, Ballabio said, will be finding additional mouse genes that are not on human chromosome 21 but still display expression abnormalities as a consequence of the triple dosage of chromosome 21 genes in Down syndrome.
The third paper on chromosome 21 has a lot to say about the increasingly elevated status of non-coding DNA. Once dismissed as junk, non-coding regions are now often regarded as treasured antiques, tended carefully by evolution and handed down through mammalian generations. Swiss researchers, led by Stylianos Antonarakis of the University of Geneva, showed that HSA 21 and its mouse equivalents are larded with blocks of DNA that, although similar in both species, are not genes. The authors offer half a dozen regulatory and structural suggestions about what that DNA is doing, ranging from alternatively spliced exons to non-functional regions with an exceptionally low mutation rate.
The mouse genome means big changes not just in researchers' mindsets, but in research infrastructure. Some of these are easy to foresee. For example, Allan Bradley of the Wellcome Trust Sanger Institute in Cambridge, UK, points out in his commentary on the papers that we can expect an explosion of mouse mutants, all of which will need maintaining and archiving for decades.
The world's best-known mouse house, the Jackson Laboratory in Bar Harbor, Maine, says it is mindful of the challenge. The laboratory is evaluating new ways of making mice that increase efficiency and lower costs, including a frozen embryo program that is in pilot stages, according to Warren Cook, president of JAX Research Systems, the production and services division. "Overall production space will expand, though not all in Bar Harbor - we're looking at expansion on the east and west coasts for mice and services, along the lines of our facilities at the University of California, Davis, and in West Sacramento," he told us. "The scientific community is also looking for more support on the technology side - embryo freezing is one example, phenotyping another - so this whole resource science area will need and receive a lot of attention." JAX volume is growing at about 10% every year, he reported, and it is trying to anticipate demand by at least a year.
Finishing the mouse sequence is expected to take another two years. But now that the public databases of mouse sequence information are relatively complete, is there any incentive for researchers to use the privately held mouse database put together by Celera Genomics and now administered by its partner Applied Biosystems of Foster City, California? For some scientists, the answer seems to be yes, or at least maybe.
"More data is always better, and mistakes will be found in both data sets. People who cannot find the information they need in the one data set will want access to the other one," Lincoln Stein of Cold Spring Harbor Laboratory told us. "Celera's data is not as valuable as it would be if they had a monopoly on it, but it is not completely worthless."
Michael Zhang and his colleagues, also at Cold Spring Harbor, have demonstrated those points via a direct comparison between the Celera and public mouse databases. They found that the two new assemblies released last May differ in about 10% of the mouse genome. The Celera assembly, they say, has higher base-level accuracy and higher overall coverage of the genome. But the public assembly has higher sequence quality in newly finished BAC regions. It also has one unbeatable advantage: it's free. The paper appears in Genome Biology on December 5.
Zhang told us that, although Celera's current database has a slight edge in coverage and accuracy, the public database is closing the gap fast. It is much better than he had thought it would be, and is already superior in the finished BAC regions, he said. In addition, as the public mouse genome project folks keep pointing out, any Celera advantage springs in part from the fact that it includes not just the company's data, but the free public mouse data sets too.
Applied Biosystems argues that its database, which it calls the Celera Discovery System (CDS), has other advantages. Spokesperson Lori Murray pointed to data from more than one mouse strain (the public mouse project covers only one, C57BL/6J, known as Black 6), plus SNPs and human–mouse orthologs for comparison. The company, she said, is planning a new web portal that will help researchers design and analyze experiments, and in January will also offer a new pricing plan. "The new pricing model will be designed to make subscriptions to CDS more accessible for researchers, including existing customers," Murray said. Details are not yet available.
For some researchers, analytical bells and whistles, and even price, are all irrelevant. The real sticking point is that the CDS databases are proprietary. Sean Eddy of Washington University in St. Louis told us there never was an incentive for him to subscribe, because he needs open access to genome data in order to disseminate the results of his large-scale bioinformatics analyses.
"Celera's contracts understandably block open publication of whole genome analyses that would disclose significant parts of their proprietary data. Only their mouse chromosome 16 assembly, which they published in Science and deposited in GenBank, 'exists' in my world," he said. "The Celera data access model has always been incompatible with the way we do our work."
- Moore P, Mining the mouse, The Scientist, May 17, 2002. [http://www.the-scientist.com/news/20010517/04/]
- Moore P, Mouse control, The Scientist, May 8, 2002. [http://www.the-scientist.com/news/20010508/03/]
- Whitehead Institute/MIT Center for Genome Research, [http://www-genome.wi.mit.edu/]
- The mouse genome, [http://genomebiology.com/researchnews/default.asp?arx_id=gb-spotlight-20021205-02]
- Weitzman J, Chromosome 21 sequenced, Genome Biology 2000, 1:reports0058. [http://genomebiology.com/2000/1/2/reports/0058]
- Telethon Institute of Genetics and Medicine, [http://www.tigem.it/]
- Max-Planck Institute for Molecular Genetics, [http://www.molgen.mpg.de/]
- University of Geneva, [http://www.unige.ch/]
- Wellcome Trust Sanger Institute, [http://www.sanger.ac.uk/]
- Jackson Laboratory, [http://www.jax.org/]
- Celera Genomics, [http://www.celera.com/]
- Applied Biosystems, [http://home.appliedbiosystems.com/]
- Cold Spring Harbor Laboratory, [http://www.cshl.org/]
- Xuan Z et al, Computational comparison of two mouse draft genomes and the human golden path, Genome Biology 2002, 4:R1. [http://genomebiology.com/2002/4/1/R1]
- Washington University in St. Louis, [http://www.wustl.edu/]