Integrating computationally assembled mouse transcript sequences with the Mouse Genome Informatics (MGI) database
© Zhu et al; licensee BioMed Central Ltd. 2003
Received: 9 October 2002
Accepted: 19 December 2002
Published: 3 February 2003
Databases of experimentally generated and computationally derived transcript sequences are valuable resources for genome analysis and annotation. The utility of such databases is enhanced when the sequences they contain are integrated with such biological information as genomic location, gene function, gene expression and phenotypic variation. We present the analysis and results of a semi-automated process of connecting transcript assemblies with highly curated biological information for mouse genes that is available through the Mouse Genome Informatics (MGI) database.
The volume and diversity of expressed sequence tag (EST) data in the public databases make them an important resource for gene identification, genome annotation and comparative genomics. The value of EST data is enhanced when the sequences are clustered (on the basis of sequence overlap) to reduce redundancy. In some cases these sequence clusters can be used to generate a consensus sequence that represents a virtual transcript. Examples of electronic transcript data resources include UniGene, TIGR Gene Indices [2,3], DoTS and STACK. Each of these resources differs in the methods used to reduce the redundancy in EST sequence data and in how the data are represented. For example, UniGene uses pairwise sequence comparisons to group and partition EST and other transcript sequences from GenBank into gene-orientated clusters with no consensus sequence. The other three resources (TIGR Gene Indices, DoTS, and STACK) cluster sequences from ESTs and known transcripts and then assemble the members of each cluster to produce a consensus representation of the transcripts. The algorithms and/or parameters used to guide the clustering and assembly process for the virtual transcript resources are similar, but not identical. These resources also differ with respect to the number of species for which EST assemblies are available. For example, the STACK database includes only human sequence data, DoTS has both human and mouse assemblies, and TIGR Gene Indices maintains separate electronic transcript databases for over 50 species. In contrast to computational approaches to transcript analysis and representation, the Mammalian Gene Collection (MGC) and the RIKEN Mouse Encyclopedia projects are systematically generating full-length cDNA clones with the aim of having at least one full-length clone reagent and sequence for every human (MGC) and mouse (MGC, RIKEN) gene.
Table 1. Selected database content statistics for the MGI information resource
Rows: genetic markers mapped; curated mouse/human orthologs; genes with molecular probes and segments data; genetic markers with molecular polymorphisms; genes with molecular polymorphisms; MGI markers with GenBank sequence associations; genes with SwissProt-TrEMBL protein sequences.
The utility of both experimentally and computationally derived transcript resources is greatly enhanced when the transcripts are linked to well-curated biological knowledge about the corresponding genes. However, manual curation of computationally derived transcript data is not feasible because the underlying data for these resources are constantly changing. Therefore, we have developed a semi-automated curation process to create and update associations between constantly changing electronic transcript databases and the genes represented in the MGI database. Associations are based on GenBank sequence accession identifiers shared between MGI genes and transcript clusters/assemblies. Although associations between the genes in MGI and the electronic transcripts could also be made on the basis of sequence similarity, the use of shared accession ids is faster and avoids inconsistencies in sequence-to-gene associations that arise from highly similar sequences among members of multigene families.
Transitive associations between MGI genes and assembled transcripts
We first generated a set of GenBank sequence accession identifiers that have trusted associations (that is, they have been manually curated) with mouse genes represented in MGI. A daily report of the associations of MGI markers and GenBank sequences, MRK_Sequence.rpt, is available from the MGI public FTP site. We removed sequences associated with more than one gene object in MGI (for example, large cloned inserts containing multiple genes) to avoid ambiguous sequence-to-gene associations. After this filtering step, the relationships of MGI genes to GenBank sequences were maintained in a dictionary data structure, MGI-GB, with MGI accession numbers as keys and GenBank accession identifiers as values.
A second dictionary, GB-MGI (GB as keys and MGI as values), was generated by reversing the keys and values of MGI-GB. A report with all DT identifiers and their constituent GenBank sequence accession identifiers, the musDoTS_rel5_accessionsPerAssembly.dat.gz file (Release 5.0, 19 August 2002), was downloaded from CBIL's website.
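The construction of MGI-GB and GB-MGI can be illustrated with a minimal Python sketch. It assumes the marker-to-sequence report has already been parsed into (MGI id, GenBank id) pairs; the function name `build_mgi_gb` and the toy accessions are hypothetical and are not taken from MRK_Sequence.rpt.

```python
from collections import defaultdict

def build_mgi_gb(pairs):
    """Build MGI-GB (gene -> sequences) and GB-MGI (sequence -> gene).

    `pairs` is an iterable of (mgi_id, genbank_id) tuples. Sequences tied
    to more than one MGI gene are dropped before either dictionary is built.
    """
    genes_per_seq = defaultdict(set)
    for mgi_id, gb_id in pairs:
        genes_per_seq[gb_id].add(mgi_id)

    mgi_gb = defaultdict(set)
    gb_mgi = {}
    for gb_id, genes in genes_per_seq.items():
        if len(genes) != 1:
            continue  # ambiguous, e.g. a large insert spanning several genes
        (mgi_id,) = genes
        mgi_gb[mgi_id].add(gb_id)
        gb_mgi[gb_id] = mgi_id  # reversed view of the same relationship
    return dict(mgi_gb), gb_mgi

# Toy input: placeholder identifiers, not real MGI or GenBank records.
pairs = [
    ("MGI:0000001", "X00001"),
    ("MGI:0000001", "X00002"),
    ("MGI:0000002", "X00003"),
    ("MGI:0000002", "X00009"),  # X00009 is shared by two genes
    ("MGI:0000003", "X00009"),
]
mgi_gb, gb_mgi = build_mgi_gb(pairs)
```

In this sketch a gene whose only sequence is ambiguous (the third placeholder gene) simply drops out of MGI-GB, which mirrors the filtering step described in the text.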
A report containing TIGR Mouse Gene Index TC identifiers and their constituent GenBank sequence identifiers was generated from TIGR Mouse Gene Index Release 9.0 (October 1, 2002) (Geo Pertea, personal communication). This report was used to create two dictionaries, TC-GB (TC as keys and GB as values) and GB-TC (GB as keys and TC as values) to map TCs to GenBank sequence accession ids. There were many more GenBank sequences (mostly ESTs) in the TIGR Mouse Gene Index than in the MGI database because the TIGR Mouse Gene Index was built on all available GenBank sequences, and MGI curated only a subset of them (mostly mRNA and RefSeq sequences). Only GenBank sequence accession identifiers that appear in the MGI database were retained in TC-GB and GB-TC because those sequences will bridge the transitive associations between MGI genes and TCs. Sequences associated with more than one TC were removed.
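The two filters applied to the TC report (keep only accessions known to MGI, then drop accessions placed in more than one TC) might look like the following sketch. The input layout, the function name `build_tc_dicts` and all identifiers are assumptions made for illustration; the actual TIGR report format is not described here.

```python
from collections import defaultdict

def build_tc_dicts(tc_members, mgi_accessions):
    """Return (TC-GB, GB-TC) restricted to bridging sequences.

    tc_members: dict mapping a TC id to its constituent GenBank accessions.
    mgi_accessions: set of accessions that also appear in MGI.
    """
    tcs_per_seq = defaultdict(set)
    for tc_id, members in tc_members.items():
        for gb_id in members:
            if gb_id in mgi_accessions:  # ESTs unknown to MGI cannot bridge
                tcs_per_seq[gb_id].add(tc_id)

    tc_gb = defaultdict(set)
    gb_tc = {}
    for gb_id, tcs in tcs_per_seq.items():
        if len(tcs) == 1:  # sequences placed in several TCs are removed
            (tc_id,) = tcs
            tc_gb[tc_id].add(gb_id)
            gb_tc[gb_id] = tc_id
    return dict(tc_gb), gb_tc

# Toy input with placeholder identifiers.
tc_members = {
    "TC_a": ["X00001", "E10001", "E10002"],
    "TC_b": ["X00002", "X00003"],
    "TC_c": ["X00003", "E10003"],
}
mgi_accessions = {"X00001", "X00002", "X00003"}
tc_gb, gb_tc = build_tc_dicts(tc_members, mgi_accessions)
```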
Table 2. Statistics of associations between MGI genes and transcript assemblies
Rows: sequences used to build TCs and DTs; sequences included in the assemblies (excluding singletons); assemblies (excluding singletons); GenBank sequences shared by MGI markers and assemblies; MGI genes linked to assemblies through GenBank sequences; assemblies linked to MGI genes through GenBank sequences.
Two other dictionaries, TC-GB and GB-MGI, were used to link TCs to MGI genes in the same way. A dictionary, TC-MGI-via-GB (TC as keys and MGI as values), was used to maintain links from TCs to MGI genes and the supporting GenBank sequences. Figure 2b shows examples of links from TCs to MGI genes with the supporting GenBank sequences. Of the TCs (excluding singletons), 19.8% (20,942 out of 105,520) were linked to one or more MGI genes.
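The transitive step can be sketched as a join of TC-GB with GB-MGI that records, for every TC-to-gene link, the GenBank accessions supporting it. This is a minimal illustration under the assumption that both dictionaries are held in memory as shown; `link_tcs_to_genes` and the identifiers are hypothetical.

```python
from collections import defaultdict

def link_tcs_to_genes(tc_gb, gb_mgi):
    """Return TC-MGI-via-GB: {tc_id: {mgi_id: set of supporting accessions}}."""
    tc_mgi_via_gb = defaultdict(lambda: defaultdict(set))
    for tc_id, gb_ids in tc_gb.items():
        for gb_id in gb_ids:
            mgi_id = gb_mgi.get(gb_id)
            if mgi_id is not None:  # accession bridges a TC and a gene
                tc_mgi_via_gb[tc_id][mgi_id].add(gb_id)
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {tc: dict(genes) for tc, genes in tc_mgi_via_gb.items()}

# Toy dictionaries with placeholder identifiers.
tc_gb = {
    "TC_a": {"X00001", "X00002"},
    "TC_b": {"X00003"},
    "TC_c": {"X00004"},  # X00004 has no gene in gb_mgi
}
gb_mgi = {
    "X00001": "MGI:0000001",
    "X00002": "MGI:0000001",
    "X00003": "MGI:0000002",
}
tc_mgi_via_gb = link_tcs_to_genes(tc_gb, gb_mgi)
```

Keeping the supporting accessions alongside each link makes it possible to trace a questionable association back to the individual sequence that produced it.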
The same approaches were also used to associate MGI genetic markers with DoTS DTs. A report with all DT identifiers and their constituent GenBank sequence accession identifiers was downloaded from CBIL's website (Release 5.0, 19 August 2002). The musDoTS_1-7-02_containedIds.dat.gz file can be downloaded from this site. The report lists both DoTS assemblies (excluding singletons) and singletons. We included only assemblies in our analysis. Statistics of associations of MGI markers and DTs are shown in Table 2. The analysis linked 83.5% (24,340 out of 29,144) of MGI markers with sequence information to 20.1% (25,799 out of 128,341) of DTs. It is not surprising that only about 20% of DTs or TCs can be associated with genes in MGI because the majority of the assemblies are composed solely of EST sequences and the MGI curation processes focus primarily on collecting and curating associations with genomic and mRNA sequence data. There are a total of 26,440 DTs (including singletons) with mRNA sequences and 20,908 of them have MGI associations. The remaining 5,532 DTs with mRNA sequences might represent alternative transcripts of MGI genes, known genes not yet represented by MGI, or novel genes. We will evaluate the component sequences of these DTs and incorporate them into the MGI database over time through manual curation.
Classification of the relationships between MGI genes and transcript assemblies
Table 3. Classification of associations between MGI genes and both DT and TC gene indices
Rows: one-to-one MGI gene to assembly; one-to-many MGI gene to assembly; many-to-one MGI gene to assembly; many-to-many MGI gene to assembly.
The majority of the associations were one-to-one relationships: 16,996 MGI-to-DT and 13,451 MGI-to-TC. Among these, a large number of MGI genes (9,509 in MGI-to-DT and 5,742 in MGI-to-TC associations) have only a single GenBank sequence. The remaining MGI genes in the one-to-one category have two or more GenBank sequences associated with them. The one-to-one associations between MGI genes and TCs/DTs suggest that these genes have only one form of transcript or that the data needed to detect transcript variants are not yet available in public databases.
One-to-many associations between genes and transcripts are related to transcript diversity
The TIGR Mouse Gene Index and DoTS databases are transcript orientated. That is, the sequence clustering and assembly process seeks to generate distinct assemblies for every form of transcript. The MGI database is gene-centric and associates transcripts from the same locus to a single gene object in the database. Therefore, in many cases, there are multiple TCs/DTs associated with a single gene in the MGI database. The average numbers of DTs/TCs per MGI gene among the one-to-many associations were 2.29 and 2.24, respectively.
Multiple TCs or DTs could also represent products of transcription from alternative promoters and/or polyadenylation sites of a single gene. For example, Dtna (dystrobrevin alpha) in the MGI database (MGI:106039) was associated with five TCs (TC546133, TC569762, TC577975, TC590157, TC633947) and with five DTs (DT.50316348, DT.60101497, DT.87050693, DT.91340878 and DT.91393353). The Dtna gene has three promoters that are active in a tissue-specific manner. Figure 3b shows multiple transcripts from the three promoters of the Dtna gene on mouse chromosome 18. Experimental results suggested that Ncam1 contains more than one poly(A) addition site and produces transcripts with different 3' untranslated regions. Figure 3a shows that multiple virtual transcripts with different 3' ends align to the Ncam1 gene on mouse chromosome 9.
Another explanation for one-to-many MGI gene to DT/TC associations is that multiple site-specific recombination or DNA rearrangement events occur normally in certain cell types. For example, the Igh-VS107 (immunoglobulin heavy chain (S107 family)) locus in MGI (MGI:96490) was associated with four TCs (TC632874, TC632875, TC632877 and TC643641) and with three DTs (DT.94166135, DT.94209475 and DT.94398318). All of the above TCs/DTs and sequences associated with Igh-VS107 were mapped to the same locus on chromosome 12 using the UCSC genome browser (data not shown). The sequence differences of Igh-VS107 transcripts are readily explained by normal DNA rearrangements (V(D)J recombination). Another example is H2-Eb1 (histocompatibility 2, class II antigen E beta), which in the MGI database (MGI:95901) was associated with four TCs (TC575977, TC608775, TC638140 and TC640785) and with two DTs (DT.493389 and DT.55100612). The H2-Eb1 gene contains a recombination hotspot, which has a predominant role in generating different recombinants through meiotic crossing-over within the I region of the mouse major histocompatibility complex (MHC). All of the above TCs/DTs except TC575977 and DT.55100612 were mapped to the annotated H2-Eb1 gene on chromosome 17 using the UCSC genome browser (data not shown). TC575977 and DT.55100612 were associated with H2-Eb1 through one GenBank sequence, AK012147, which was mapped to chromosome 5. Further analysis indicated that AK012147 was incorrectly associated with H2-Eb1.
Non-biological factors may also explain some of the one-to-many associations among genes and transcripts in our analysis. For example, low-quality sequence data or problematic sequences that are not filtered out before clustering and assembly can cause errors in the sequence assemblies. Another possible explanation is that nucleotide and protein sequences are occasionally associated with the wrong gene in MGI. This will always be a challenge to the database community when both completeness (including as much data as possible in the database) and accuracy (associating every sequence with the right gene) are goals. Fortunately, on the basis of our ongoing internal quality control and manual checks of portions of the data in this analysis, non-biological reasons for one-to-many associations between transcripts and genes account for only a small percentage of the whole dataset.
The many-to-one associations between genes and transcripts are evidence for over-clustering or gene redundancy in MGI
In the analysis reported here, 14.8% (4,302 out of 29,144) of MGI genes with sequence information were involved in many-to-one gene to TC transcript associations and 12.4% (3,621 out of 29,144) in many-to-one gene to DT transcript associations. The average numbers of MGI genes per DT/TC were 2.16 and 2.23, respectively. First, some of the many-to-one associations are due to sequence clusters that contain mistakenly grouped similar sequences from closely related genes (paralogs). For example, sequences from 14 members of the defensin-related cryptdin gene family were clustered into one TC (TC6H932) and into one DT (DT.94272645). These genes share similar structure and sequence. Their mRNAs are distinguished by a 45-nucleotide 5' untranslated sequence (UTS) encoded completely by the first exon. Second, genes adjacent to each other in the same chromosomal location were occasionally clustered together and assembled into one sequence because their transcripts overlap each other. For example, the 3' end of Stk11 (serine/threonine kinase 11; MGI:1341870) is in very close proximity to the 3' end of a functionally unrelated gene, Dos (downstream of Stk11; MGI:1354170), and it seems that overlapping transcripts of the two genes are produced. Both Stk11 and Dos were linked to one DT (DT.493186) because sequences associated with both genes were clustered and assembled together. Third, there are rare cases of polycistronic transcripts in mammalian genomes. For example, Snrpn (small nuclear ribonucleoprotein N; MGI:98347) and Snurf (SNRPN upstream reading frame; MGI:1891236) are expressed as a bicistronic Snurf-Snrpn transcript, and both of them were associated with a single TC (TC619385) and a single DT (DT.535946) in our analysis. Finally, many-to-one associations can be caused by uncorrected gene redundancy (one gene represented by multiple entries) in the MGI database.
The majority of the redundant records are the result of genes in MGI that are represented solely by EST sequences. As these redundancies are identified in MGI they are corrected.
Many-to-many associations between genes and transcripts could be the result of any combination of many-to-one and one-to-many associations
In the analysis reported here, we had 531 MGI-DT and 454 MGI-TC many-to-many associations. There were 4.1% (1,202 out of 29,144) of MGI genes with sequence information and 1,355 DTs involved in many-to-many MGI gene to DT transcript associations and 3.6% (1,044 out of 29,144) of MGI genes with sequence information and 1,127 TCs in many-to-many MGI gene to TC transcript associations. The average numbers of MGI genes per DT/TC were 2.26 and 2.30, respectively, and the average numbers of DTs/TCs per MGI gene were 2.55 and 2.48, respectively. The majority of the many-to-many associations had only two MGI genes and two DTs/TCs. The group with the largest number of MGI genes in MGI-DT associations included eight paired-Ig-like receptor A genes (Pira1, Pira2, Pira3, Pira4, Pira5, Pira7, Pira10, Pira11) and two DTs (DT.87053023 and DT.94272531). DNA blot analysis indicated the presence of multiple paired-Ig-like receptor A genes in the genome, and cDNA sequencing analysis suggested a 0.2-4.7% frequency of overall nucleotide variations. The group with the largest number of MGI genes in MGI-TC associations included six eosinophil-associated ribonuclease genes (Ear1, Ear2, Ear3, Ear8, Ear9 and Ear10), the RNA guanylyltransferase and 5'-phosphatase gene (Rngtt) and four TCs (TC561767, TC569331, TC557280 and TC557281). The mouse Ear family has at least 13 members: 11 functional genes and 2 pseudogenes. The genes within this family share a common genomic structure that is conserved with primate Ear genes. The mouse Ear gene family forms four unique clades (Ear1/2/3/8/9/10 genes form subfamily A). The members of each clade share a high degree of sequence identity. Transcripts from Ear1/2/3/8/9/10 were over-clustered into one TC (TC561767). TC569331 was associated with Ear2 because of the shared EST sequence AA510162 and associated with Rngtt because of the shared GenBank sequence AK002922. Sequence analysis indicated that AA510162 encodes Rngtt instead of Ear2.
Further analysis suggested that a typographical error in the publication caused the reported EST sequence associated with Ear2 to be AA510162 instead of AA510161. This many-to-many association can be resolved into one many-to-one association (six Ear genes to TC561767) caused by over-clustering and one one-to-many association (one MGI gene, Rngtt, to three TCs) caused by transcript diversity. The group with the largest number of DTs in MGI-DT associations included 18 DTs and three immunoglobulin heavy-chain genes (Igh-4, Igh-VJ558 and Igh-V). The group with the largest number of TCs in MGI-TC associations included 25 TCs and three immunoglobulin heavy-chain genes (Igh-4, Igh-VJ558 and Igh-1). The complexity of the many-to-many associations demonstrates the challenges of creating links between genes and electronic transcripts and highlights the caveats that users of these resources must keep in mind.
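The four relationship classes discussed in this section can be reproduced with a short Python sketch. It assumes, as an interpretation rather than a documented detail of the analysis, that each association group is a connected component of the bipartite graph linking genes to assemblies; `classify_associations` is a hypothetical name. The example links are taken from the cases described above, except for the placeholder pair ("GeneX", "TC_x").

```python
from collections import defaultdict

def classify_associations(links):
    """Group (gene, assembly) links into connected components and label each
    component as one/many genes to one/many assemblies."""
    adj = defaultdict(set)
    for gene, asm in links:
        g, a = ("gene", gene), ("asm", asm)
        adj[g].add(a)
        adj[a].add(g)

    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            component.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        genes = sorted(n for kind, n in component if kind == "gene")
        asms = sorted(n for kind, n in component if kind == "asm")
        label = ("one" if len(genes) == 1 else "many") + "-to-" + \
                ("one" if len(asms) == 1 else "many")
        groups.append((label, genes, asms))
    return groups

links = [("GeneX", "TC_x")]  # placeholder one-to-one pair
links += [("Dtna", tc) for tc in
          ("TC546133", "TC569762", "TC577975", "TC590157", "TC633947")]
links += [("Snrpn", "TC619385"), ("Snurf", "TC619385")]
links += [(g, "TC561767") for g in
          ("Ear1", "Ear2", "Ear3", "Ear8", "Ear9", "Ear10")]
links += [("Ear2", "TC569331"), ("Rngtt", "TC569331"),
          ("Rngtt", "TC557280"), ("Rngtt", "TC557281")]

for label, genes, asms in classify_associations(links):
    print(label, len(genes), "gene(s),", len(asms), "assembly(ies)")
```

Removing the single erroneous link ("Ear2", "TC569331") from the input splits the many-to-many group into a many-to-one group and a one-to-many group, which matches the resolution described in the text.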
Comparison of DoTS and TIGR Mouse Gene Index
For the one-to-one MGI-DT and MGI-TC associations, the number of shared sequences between DT and TC pairs linked to the same MGI gene varies
Table 4. Comparison of the constituent sequences of TCs and DTs
Rows: DT and TC pairs analyzed; DT and TC that have the same constituent sequences; DT is a subset of TC; TC is a subset of DT; DT and TC assemblies that share one sequence; 2-4 sequences; 5-9 sequences; 10-99 sequences; 100 or more sequences; zero sequences.
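The row categories of this comparison correspond to simple set relations between the constituent sequences of a DT and a TC linked to the same MGI gene. The sketch below is one possible way to compute them; the function name, the relation labels and the placeholder accessions are assumptions, and the original analysis may have tallied overlapping categories differently.

```python
def compare_pair(dt_seqs, tc_seqs):
    """Return (relation, number of shared sequences) for one DT/TC pair."""
    dt, tc = set(dt_seqs), set(tc_seqs)
    shared = len(dt & tc)
    if dt == tc:
        relation = "identical"
    elif dt < tc:  # proper subset
        relation = "DT is a subset of TC"
    elif tc < dt:
        relation = "TC is a subset of DT"
    elif shared == 0:
        relation = "no shared sequence"
    else:
        relation = "partial overlap"
    return relation, shared

# Placeholder accessions for illustration only.
print(compare_pair({"A", "B"}, {"A", "B"}))
print(compare_pair({"A"}, {"A", "B", "C"}))
print(compare_pair({"A", "B", "C"}, {"B"}))
print(compare_pair({"A", "B"}, {"B", "C"}))
print(compare_pair({"A"}, {"C"}))
```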
For the one-to-many MGI-DT and MGI-TC associations, DoTS and TIGR Mouse Gene Index did not consistently cluster the GenBank sequences
There are 2,522 MGI genes associated with multiple DTs and 1,975 MGI genes associated with multiple TCs; 1,475 MGI genes had both MGI-to-TC and MGI-to-DT one-to-many associations. We considered all TCs or DTs associated with the same MGI gene as different forms of transcripts and grouped them together. We compared the identity and grouping of the component sequences between the TC group and its corresponding DT group. We included only the sequences curated in the MGI database in the comparison because they are mostly high-quality mRNA sequences and should be reliably clustered. Only 245 TC group/DT group pairs were associated with the same set of MGI-curated GenBank sequences and also clustered those sequences in the same way. The remaining pairs differ either in their set of associated GenBank sequences or in how those sequences were clustered.
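The group comparison described above amounts to asking two questions for each gene: do the TC group and the DT group contain the same MGI-curated sequences, and do they partition those sequences identically? A minimal sketch follows, with hypothetical function names and placeholder identifiers.

```python
def partition(assemblies, curated):
    """Reduce a group of assemblies to a set of blocks of curated sequences."""
    blocks = set()
    for members in assemblies.values():
        kept = frozenset(members) & curated  # ignore non-curated ESTs
        if kept:
            blocks.add(kept)
    return frozenset(blocks)

def compare_groups(tc_group, dt_group, curated):
    tc_blocks = partition(tc_group, curated)
    dt_blocks = partition(dt_group, curated)
    tc_seqs = frozenset().union(*tc_blocks)
    dt_seqs = frozenset().union(*dt_blocks)
    if tc_seqs != dt_seqs:
        return "different sequence sets"
    if tc_blocks != dt_blocks:
        return "same sequences, different clustering"
    return "same sequences, same clustering"

# Placeholder data: three curated sequences and a few uncurated ESTs.
curated = {"A", "B", "C"}
tc_group = {"TC_1": {"A", "B", "E1"}, "TC_2": {"C"}}
dt_same = {"DT_1": {"A", "B"}, "DT_2": {"C", "E9"}}
dt_regrouped = {"DT_1": {"A"}, "DT_2": {"B", "C", "E2"}}
dt_missing = {"DT_1": {"A", "B"}}

print(compare_groups(tc_group, dt_same, curated))
print(compare_groups(tc_group, dt_regrouped, curated))
print(compare_groups(tc_group, dt_missing, curated))
```

Only pairs falling into the first outcome would count toward the 245 concordant pairs reported above.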
The differences between the two electronic transcript databases are likely to be due to the different criteria used by the two groups for clustering and assembly of EST and mRNA sequences. One possibility is different degrees of trimming of poor-quality sequence from the ends of ESTs (C.J.S., personal communication). Less trimming in the DoTS build might result in more assemblies than TIGR TCs. In testing, fewer, larger assemblies were generated when trimming was not limited. Limited trimming was chosen in an attempt to preserve better representation of differentially processed transcripts in the DoTS build. The comparison of the two databases using curated data from MGI as a reference provides some measures to evaluate and improve computational methods.
Utility of the analysis
The association of MGI genes with electronic transcript assemblies supplies biological context to the computationally assembled transcripts and allows researchers to access these data from biological as well as sequence perspectives. The curation process described here permits us to rapidly build high-confidence associations between MGI genes and electronic transcript sequences. The results reveal the complications that can arise from the clustering process as well as errors in the MGI database. The assessment of the results will provide measures to evaluate and improve the EST-assembly protocols and to check the quality of gene representation in the MGI database.
Access to the links between the MGI database and the TIGR and CBIL electronic transcript databases
Only associations between MGI genes and TCs/DTs that are supported by non-conflicting evidence (one-to-one and one-to-many associations) are accessible from the web browsers for these resources. The links from MGI genes to TCs and DTs are available from the MGI gene detail pages. The links from TCs to MGI genes are available from TIGR's TC report page and through another TIGR database resource, RESOURCERER. The links from DTs to MGI genes are available from the AllGenes DT report page. Users can query for related DTs by MGI gene accession identifiers or symbols. The data files for MGI-DT/TC associations are available from the MGI public FTP site. These data will be updated after each build of TIGR's Mouse Gene Index and CBIL's DoTS database or after every major change in the MGI database.
Additional data files
The original datasets from TIGR (TIGR Mouse Gene Index Release 9.0, 1 October 2002), DoTS (DoTS mouse assembly Release 5.0, 19 August 2002) and MGI (11 October 2002) are available as additional data files. Links from MGI to the DoTS (one-to-one, one-to-many, many-to-one and many-to-many) and TIGR (one-to-one, one-to-many, many-to-one and many-to-many) electronic transcript associations from the analysis done on 11 October 2002 are also available as additional data files. The most recent data files for MGI-DT/TC associations can be obtained from the MGI public FTP site.
This work was supported by the Department of Energy (FG02-99ER62850) and NIH/NHGRI (HG0030-P1). The comments of Martin Ringwald, Molly Bogue, and Dong (Donnie) Qi helped to improve the clarity of the manuscript. The authors thank Jim Kadin and David Miers (MGI Software Group) and Geo Pertea (TIGR) for their technical assistance.
1. Boguski MS, Schuler GD: ESTablishing a human transcript map. Nat Genet. 1995, 10: 369-371.
2. Quackenbush J, Liang F, Holt I, Pertea G, Upton J: The TIGR gene indices: reconstruction and representation of expressed gene sequences. Nucleic Acids Res. 2000, 28: 141-145. 10.1093/nar/28.1.141.
3. Quackenbush J, Cho J, Lee D, Liang F, Holt I, Karamycheva S, Parvizi B, Pertea G, Sultana R, White J: The TIGR Gene Indices: analysis of gene transcript sequences in highly sampled eukaryotic species. Nucleic Acids Res. 2001, 29: 159-164. 10.1093/nar/29.1.159.
4. DoTS: a database of transcribed sequences for human and mouse genes. [http://www.cbil.upenn.edu/downloads/DoTS]
5. Christoffels A, van Gelder A, Greyling G, Miller R, Hide T, Hide W: STACK: sequence tag alignment and consensus knowledge-base. Nucleic Acids Res. 2001, 29: 234-238. 10.1093/nar/29.1.234.
6. The Mammalian Gene Collection. [http://mgc.nci.nih.gov]
7. Kawai J, Shinagawa A, Shibata K, Yoshino M, Itoh M, Ishii Y, Arakawa T, Hara A, Fukunishi Y, Konno H, et al: Functional annotation of a full-length mouse cDNA collection. Nature. 2001, 409: 685-690. 10.1038/35055500.
8. Mouse Genome Informatics. [http://www.informatics.jax.org]
9. Blake JA, Richardson JE, Bult CJ, Kadin JA, Eppig JT: The Mouse Genome Database (MGD): the model organism database for the laboratory mouse. Nucleic Acids Res. 2002, 30: 113-115. 10.1093/nar/30.1.113.
10. Ringwald M, Eppig JT, Begley DA, Corradi JP, McCright IJ, Hayamizu TF, Hill DP, Kadin JA, Richardson JE: The Mouse Gene Expression Database (GXD). Nucleic Acids Res. 2001, 29: 98-101. 10.1093/nar/29.1.98.
11. Bult CJ, Richardson JE, Blake JA, Kadin JA, Ringwald M, Eppig JT, the Mouse Genome Informatics Staff: MGI: Mouse Genome Informatics in a New Age of Biological Inquiry. In Proceedings of the IEEE International Symposium on BioInformatics and Biomedical Engineering. 2000, Los Alamitos, CA: IEEE Computer Society, 29-32.
12. Naf D, Krupke DM, Sundberg JP, Eppig JT, Bult CJ: The Mouse Tumor Biology Database: a public resource for cancer genetics and pathology of the mouse. Cancer Res. 2002, 62: 1235-1240.
13. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25: 25-29. 10.1038/75556.
14. SWISS-PROT. [http://www.ebi.ac.uk/swissprot]
15. Maglott DR, Katz KS, Sicotte H, Pruitt KD: NCBI's LocusLink and RefSeq. Nucleic Acids Res. 2000, 28: 126-128. 10.1093/nar/28.1.126.
16. Pruitt KD, Maglott DR: RefSeq and LocusLink: NCBI gene-centered resources. Nucleic Acids Res. 2001, 29: 137-140. 10.1093/nar/29.1.137.
17. MGI Data and Statistical reports. [ftp://ftp.informatics.jax.org/pub/reports/index.html]
18. DoTS download. [http://www.cbil.upenn.edu/downloads/DoTS/release_5]
19. Barbas JA, Chaix JC, Steinmetz M, Goridis C: Differential splicing and alternative polyadenylation generates distinct NCAM transcripts and proteins in the mouse. EMBO J. 1988, 7: 625-632.
20. Santoni MJ, Barthels D, Vopper G, Boned A, Goridis C, Wille W: Differential exon usage involving an unusual splicing mechanism generates at least eight types of NCAM cDNA in mouse brain. EMBO J. 1989, 8: 385-392.
21. UCSC Genome Browser-Mouse Genome Assembly February, 2002. [http://genome.ucsc.edu/]
22. Holzfeind PJ, Ambrose HJ, Newey SE, Nawrotzki RA, Blake DJ, Davies KE: Tissue-selective expression of alpha-dystrobrevin is determined by multiple promoters. J Biol Chem. 1999, 274: 6250-6258. 10.1074/jbc.274.10.6250.
23. Ferguson SE, Rudikoff S, Osborne BA: Interaction and sequence diversity among T15 VH genes in CBA/J mice. J Exp Med. 1988, 168: 1339-1349.
24. Zimmerer EJ, Passmore HC: Structural and genetic properties of the Eb recombinational hotspot in the mouse. Immunogenetics. 1991, 33: 132-140.
25. Huttner KM, Selsted ME, Ouellette AJ: Structure and diversity of the murine cryptdin gene family. Genomics. 1994, 19: 448-453. 10.1006/geno.1994.1093.
26. Smith DP, Spicer J, Smith A, Swift S, Ashworth A: The mouse Peutz-Jeghers syndrome gene Lkb1 encodes a nuclear protein kinase. Hum Mol Genet. 1999, 8: 1479-1485. 10.1093/hmg/8.8.1479.
27. Gray TA, Saitoh S, Nicholls RD: An imprinted, mammalian bicistronic transcript encodes two independent proteins. Proc Natl Acad Sci USA. 1999, 96: 5616-5621. 10.1073/pnas.96.10.5616.
28. Kubagawa H, Burrows PD, Cooper MD: A novel pair of immunoglobulin-like receptors expressed by B cells and myeloid cells. Proc Natl Acad Sci USA. 1997, 94: 5261-5266. 10.1073/pnas.94.10.5261.
29. Cormier SA, Larson KA, Yuan S, Mitchell TL, Lindenberger K, Carrigan P, Lee NA, Lee JJ: Mouse eosinophil-associated ribonucleases: a unique subfamily expressed during hematopoiesis. Mamm Genome. 2001, 12: 352-361. 10.1007/s003350020007.
30. McDevitt AL, Deming MS, Rosenberg HF, Dyer KD: Gene structure and enzymatic activity of mouse eosinophil-associated ribonuclease 2. Gene. 2001, 267: 23-30. 10.1016/S0378-1119(01)00392-4.
31. RESOURCERER 5.0. [http://pga.tigr.org/tigr-scripts/magic/r1.pl]
32. AllGenes. [http://www.allgenes.org]
This article is published under license to BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.