Mutation discovery in mice by whole exome sequencing
- Heather Fairfield1,
- Griffith J Gilbert1,
- Mary Barter1,
- Rebecca R Corrigan2,
- Michelle Curtain1,
- Yueming Ding3,
- Mark D'Ascenzo4,
- Daniel J Gerhardt4,
- Chao He5,
- Wenhui Huang6,
- Todd Richmond4,
- Lucy Rowe1,
- Frank J Probst2,
- David E Bergstrom1,
- Stephen A Murray1,
- Carol Bult1,
- Joel Richardson1,
- Benjamin T Kile7,
- Ivo Gut8,
- Jorg Hager8,
- Snaevar Sigurdsson9,
- Evan Mauceli9,
- Federica Di Palma9,
- Kerstin Lindblad-Toh9,
- Michael L Cunningham10,
- Timothy C Cox10,
- Monica J Justice2,
- Mona S Spector5,
- Scott W Lowe5,
- Thomas Albert4,
- Leah Rae Donahue1,
- Jeffrey Jeddeloh4,
- Jay Shendure10 and
- Laura G Reinholdt1
© Fairfield et al.; licensee BioMed Central Ltd. 2011
Received: 27 May 2011
Accepted: 14 September 2011
Published: 14 September 2011
We report the development and optimization of reagents for in-solution, hybridization-based capture of the mouse exome. By validating this approach in multiple inbred strains and in novel mutant strains, we show that whole exome sequencing is a robust approach for discovery of putative mutations, irrespective of strain background. We found strong candidate mutations for the majority of mutant exomes sequenced, including new models of orofacial clefting, urogenital dysmorphology, kyphosis and autoimmune hepatitis.
Phenotype-driven approaches in model organisms, including spontaneous mutation discovery, standard N-ethyl-N-nitrosourea (ENU) mutagenesis screens, sensitized screens and modifier screens, are established approaches in functional genomics for the discovery of novel genes and/or novel gene functions. As over 90% of mouse genes have an ortholog in the human genome, the identification of causative mutations in mice with clinical phenotypes can directly lead to the discovery of human disease genes. However, mouse mutants with clinically relevant phenotypes are not maximally useful as disease models until the underlying causative mutation is identified. Until recently, the gene discovery process in mice has been straightforward, but greatly hindered by the time and expense incurred by high-resolution recombination mapping. Now, the widespread availability of massively parallel sequencing has brought about a paradigm shift in forward genetics by closing the gap between phenotype and genotype.
Both selective sequencing and whole genome sequencing are robust methods for mutation discovery in the mouse genome [3–5]. Nonetheless, the sequencing and analysis of whole mammalian genomes remains computationally burdensome and expensive for many laboratories. Targeted sequencing approaches are less expensive and the data are accordingly more manageable, but this technique requires substantial genetic mapping and the design and purchase of custom capture tools (that is, arrays or probe pools). Targeted sequencing of the coding portion of the genome, the 'exome', provides an opportunity to sequence mouse mutants with minimal mapping data and alleviates the need for a custom array/probe pool for each mutant. This approach, proven to be highly effective for the discovery of coding mutations underlying single gene disorders in humans [6–12], is particularly relevant to large mutant collections, where high-throughput gene discovery methods are desirable.
Currently, there are nearly 5,000 spontaneous and induced mouse mutant alleles with clinically relevant phenotypes catalogued in the Mouse Genome Informatics database. The molecular basis of the lesions underlying two-thirds of these phenotypes is currently unknown. For the remaining one-third that have been characterized, the Mouse Genome Informatics database indicates that 92% occur in coding sequence or are within 20 bp of intron/exon boundaries, regions that are purposefully covered by exome targeted re-sequencing. While this estimate is impacted by an unknown degree of ascertainment bias (since coding or splice site mutations are easier to find and hence reported and since many uncharacterized mutations remain so because they are understudied), we anticipated that exome sequencing would still be likely to capture a considerable percentage of spontaneous and induced mouse mutations. Therefore, to significantly reduce the time, effort, and cost of forward genetic screens, we developed a sequence capture probe pool representing the mouse exome. Here, we describe the utility of this tool for exome sequencing in both wild-type inbred and mutant strain backgrounds, and demonstrate success in discovering both spontaneous and induced mutations.
Results and discussion
Mouse exome content and capture probe design
The coding sequence selected for the mouse exome probe pool design includes 203,225 exonic regions, including microRNAs, and collectively comprises over 54.3 Mb of target sequence (C57BL/6J, NCBI37/mm9). The design was based on a unified, Mouse Genome Database-curated gene set, consisting of non-redundant gene predictions from the National Center for Biotechnology Information (NCBI), Ensembl and The Vertebrate Genome Annotation (VEGA) database. The gene list is available from the Mouse Exome Gene List FTP resource listed in the references. To manage the size of the probe pool and to avoid non-uniquely mappable regions, we excluded olfactory receptors and pseudogenes from the target sequence. In cases where an exon contained both UTR and coding sequence, the UTR sequence was included in the design. Two DNA probe pools, alpha and beta prototypes, were ultimately designed and tested. To maximize the uniformity of the sequencing libraries after capture, re-sequencing data from the alpha prototype design were empirically studied and used to inform a coverage re-balancing algorithm. That algorithm altered the probe coverage target ratio of a second design (beta prototype) in an attempt to decrease over-represented sequence coverage and increase under-represented sequence coverage. The target (primary design) coordinates and the coordinates of the capture probes in the beta design are available from the Mouse Exome Design FTP resource listed in the references. The summary statistics for each probe pool are shown in Additional file 1.
Exome capture performance and optimization
Table: Direct comparison of coverage statistics from exome re-sequencing (2 × 40 bp, Illumina) of four inbred strains with two exome probe pool designs, alpha and beta. Metrics: target bases covered; percentage target bases covered; target bases not covered; percentage target bases not covered.
The beta design was made using a proprietary rebalancing algorithm from Roche NimbleGen (Madison, WI, USA) that removes probes from targets with high coverage and adds probes to low coverage targets in order to maximize coverage across targets. In addition to testing the beta design by exome capture and 2 × 40 bp PE Illumina sequencing of four different inbred strains, the beta design was also tested with four independent captures of C57BL/6J female DNA and sequenced on the Illumina GAII platform, 2 × 76 bp PE. The most dramatic improvement was observed in the fraction of targeted bases covered at 20× or more where the increase in uniformity resulted in 12% improvement (Additional file 2).
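The coverage metrics discussed here (fraction of targeted bases covered at 1×, 5×, 10× and 20× or more) can be illustrated with a minimal Python sketch. It assumes per-base read depths over the target are already available as a list of integers; the function name `fraction_covered` and the example depths are hypothetical and are not taken from the study's pipeline.

```python
from typing import Dict, Iterable, Sequence


def fraction_covered(
    depths: Sequence[int], thresholds: Iterable[int] = (1, 5, 10, 20)
) -> Dict[int, float]:
    """Return the fraction of target bases with depth >= each threshold."""
    total = len(depths)
    if total == 0:
        raise ValueError("no target bases supplied")
    return {t: sum(1 for d in depths if d >= t) / total for t in thresholds}


# Hypothetical per-base depths for a ten-base target (illustrative values only).
example_depths = [0, 3, 7, 12, 25, 30, 22, 9, 1, 40]
coverage = fraction_covered(example_depths)

for threshold, fraction in coverage.items():
    print(f"covered at {threshold}x or more: {fraction:.0%}")
```

Comparing these fractions between two probe designs on the same sample is one simple way to express a uniformity gain such as the improvement at 20× reported above.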
Sequencing of mutant exomes
Table: Representative coverage statistics from exome re-sequencing (2 × 100 bp) of six mutant strains. Metrics: final target bases; target bases covered; percentage target bases covered; target bases not covered; percentage target bases not covered; number of reads in target regions; percentage reads in target regions; coverage at 20×, 10×, 5× and 1×.
Mapping and variant calling
Mapping to the mouse reference sequence (C57BL/6J, NCBI37/mm9) and subsequent variant calling yielded between approximately 8,000 (C57BL/6J background) and over 200,000 (more divergent strain backgrounds) single nucleotide variant (SNV) and insertion/deletion (INDEL) calls per mutant exome, depending on strain background and depth of coverage. Generally, approximately two-thirds of the variants called were SNVs rather than INDELs. However, in mutants on the C57BL/6J background, this ratio was closer to approximately one-half (Additional file 3). This is not surprising given that a large proportion of false positive calls from reference-guided assembly are INDELs and the number of true variants in any C57BL/6J exome is expected to be low because the mouse reference strain is, primarily, C57BL/6J. The one exception was mutant 12860 (nert), which was reported to be on a C57BL/6J background; however, the relatively large number of variants detected in this mutant exome suggests that the reported strain background is incorrect.
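As an illustration of the SNV-to-INDEL proportion described above, the following Python sketch classifies calls by the lengths of their reference and alternate alleles and reports the SNV fraction. The allele-length rule and the function names are assumptions made for this example; the callers used in the study may classify variants differently.

```python
from collections import Counter
from typing import Iterable, Tuple


def classify_variant(ref: str, alt: str) -> str:
    """Label a call 'SNV' when both alleles are single bases, 'INDEL' when lengths differ."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) != len(alt):
        return "INDEL"
    return "OTHER"  # equal-length multi-base substitutions


def snv_fraction(calls: Iterable[Tuple[str, str]]) -> float:
    """Fraction of calls that are SNVs (0.0 when there are no calls)."""
    counts = Counter(classify_variant(ref, alt) for ref, alt in calls)
    total = sum(counts.values())
    return counts["SNV"] / total if total else 0.0


# Hypothetical (ref, alt) allele pairs, for illustration only.
example_calls = [("G", "A"), ("A", "T"), ("CT", "C"), ("G", "GAA")]
print(snv_fraction(example_calls))
```

A fraction near one-half on a nominally C57BL/6J sample, rather than near two-thirds, would be consistent with the pattern described in the text.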
Variant annotation and nomination of candidate mutations
Table: Analysis of annotated variant data from mutant exome sequencing. Columns: mutant number (allele); mutation type and strain background; in gene (introns, exons); overlap with map position; non-synonymous coding variants, splice sites. Mutation types and backgrounds listed: spontaneous, stock (mixed B6); ENU, C57BL/6J and C3HeB/FeJ; ENU, C57BL/6J, C3H/HeJ and Cast/EiJ; ENU, B6 and 129; spontaneous, C57BL/6J and AKR/J. Candidate variants listed: Notch3, splice donor site (G to A), intron 31; Prkra, intron 5, splice donor; Rundc3a, Y46F; Nek8, V343E.
Validation of putative causative mutations
In some cases, more than one potentially damaging variant was found to correlate with the phenotype when additional affected and unaffected animals from the pedigree were genotyped (Table 3). In two cases, hpbk and vgim, where more than one variant was found, only one variant could be validated while the other variants were false positives. In two other cases where more than one potentially damaging variant was found, both were validated. Not surprisingly, these cases were ENU-induced mutant exomes (Cleft and l11Jus74); ENU is known to cause mutations at a rate of greater than 1 in 750 per locus per gamete at doses of 85 mg/kg. Cleft is a dominant craniofacial ENU mutation that causes cleft palate. Both variants nominated for validation were SNVs residing in Col2a1, a gene coding for type II procollagen. The two SNVs lie approximately 10.5 kb apart (Chr15:97815207 and Chr15:97825743) and, not surprisingly, were found to be concordant with the phenotype when multiple animals from the pedigree were genotyped. The most likely causative lesion (G to A at Chr15:97815207) is a nonsense mutation that introduces a premature stop codon at amino acid 645. The second closely linked variant is an A to T transversion in intron 12 that could potentially act as a cryptic splice site. However, since RT-PCR did not reveal splicing abnormalities, it is more likely that the nonsense mutation is the causative lesion (Figure 2b). Mice homozygous for targeted deletions in Col2a1 and mice homozygous for a previously characterized, spontaneous missense mutation, Col2a1 sedc, share similar defects in cartilage development to Cleft mutants, including recessive perinatal lethality and orofacial clefting [19, 20], providing further support that the Cleft phenotype is the result of a mutation in Col2a1.
The l11Jus74 mutation was isolated in a screen for recessive lethal alleles on mouse chromosome 11 using a 129.Inv(11)8Brd Trp53-Wnt3 balancer chromosome [21, 22]. The screen was performed as described previously using C57BL/6J ENU-treated males, mated to the balancer, which was generated in 129S5SvEv embryonic stem cells. Embryos from the l11Jus74 line were analyzed from timed matings, as previously described, to determine that homozygotes die perinatally. Two potentially causative missense mutations were found in Nek8 (NIMA (never in mitosis gene a)-related expressed kinase 8; V343E) and Rundc3a (Run domain containing 3a; Y46F). Mutations in Nek8 cause polycystic kidney disease, but no phenotypes have been ascribed to mutations in Rundc3a. Although the cause of death of l11Jus74 homozygotes has not been determined, polycystic kidneys have not been observed; the Rundc3a mutation is therefore the more likely cause of perinatal death, although the Nek8 mutation may cause a delayed-onset phenotype.
For all four of the ENU-induced mutant exomes sequenced, putative causative mutations were nominated and validated. Mutations induced by ENU are usually single nucleotide substitutions. The high sensitivity of current analytical pipelines for detecting single nucleotide substitutions (and particularly homozygous substitutions), combined with the propensity of damaging single nucleotide substitutions to occur in coding sequences, likely explains the high success rate of exome sequencing for detecting induced lesions. Similarly, Boles et al. showed that targeted sequencing of exons and highly conserved sequences from ENU mutants mapping to chromosome 11 yielded a high success rate, with candidate mutations nominated in nearly 75% of mutants.
While mutagens like ENU are known to cause single nucleotide substitutions, spontaneous mutations are the result of a variety of lesions, including single nucleotide substitutions, small INDELs and larger deletions or insertions of mobile DNA elements. Of the nine potentially damaging coding or splicing mutations discovered in this set of mutant exomes, the spontaneous Sofa mutant was the only one for which a single nucleotide substitution was not discovered. Instead, a 15-bp deletion in Pfas (Table 3; Figure 2d,e) was found, demonstrating that small deletions in coding sequence can be discovered using this approach.
Interestingly, the allele ratio for the Sofa deletion was 0.2, which is lower than expected for a heterozygote; therefore, a stringent cutoff of 0.5 or even 0.35, which we previously found was sufficient for calling heterozygous variants at approximately 80% confidence, would have eliminated this variant from consideration. The lower allele ratio is likely the result of bias in either the capture of the INDEL-containing fragments, and/or the ability to appropriately map some of the INDEL-bearing reads. Since the library fragments are larger than both the probes and the exons they target and because each target is tiled with multiple probes, there are expected to be perfect match probes somewhere within an exon for nearly every allele despite the presence of an INDEL. Consequently, we favor a mapping problem as the major driver for the lower than expected allele ratio observed (Figure 2e). Longer reads may alleviate some systematic issues associated with discovering relevant deletions or insertions. A 15-bp deletion would maximally comprise a mismatch of nearly 38% along a 40-bp read, but only 20% within a 76-bp read. Large gaps (20% or more of the read) would impose a stiff mapping penalty on that end of read pairs. Presumably, longer reads (100 bp or longer) would incur lower penalties, thereby moderating adverse mapping effects.
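The arithmetic in this paragraph can be checked with a short Python sketch. It computes the largest share of a read that an INDEL of a given size can occupy, and applies an allele-ratio cutoff to a call. The function names and the read counts (4 variant reads out of 20) are hypothetical, chosen only to reproduce the 0.2 ratio mentioned in the text.

```python
def indel_fraction_of_read(indel_bp: int, read_bp: int) -> float:
    """Maximum fraction of a read that an INDEL of indel_bp bases can span."""
    if read_bp <= 0:
        raise ValueError("read length must be positive")
    return min(indel_bp, read_bp) / read_bp


def allele_ratio(variant_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that support the variant allele."""
    if total_reads <= 0:
        raise ValueError("total read count must be positive")
    return variant_reads / total_reads


def passes_cutoff(ratio: float, cutoff: float) -> bool:
    """True when the allele ratio meets or exceeds the cutoff."""
    return ratio >= cutoff


for read_length in (40, 76, 100):
    share = indel_fraction_of_read(15, read_length)
    print(f"15-bp deletion in a {read_length}-bp read: {share:.1%}")

# Hypothetical read counts giving the 0.2 allele ratio described in the text.
ratio = allele_ratio(variant_reads=4, total_reads=20)
for cutoff in (0.5, 0.35, 0.2):
    print(f"cutoff {cutoff}: kept = {passes_cutoff(ratio, cutoff)}")
```

Under these assumptions, a heterozygous INDEL with a ratio of 0.2 is discarded by either of the stricter cutoffs, which is why a more permissive threshold for INDEL calls may be worth considering.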
Table: In silico analysis of all induced or spontaneous alleles (4,984) with phenotypes reported in the Mouse Genome Database. Column: number of alleles. Categories: unknown or uncharacterized; introns, UTRs, regulatory regions (including instances where the lesion is not known but coding sequence has been sequenced), cryptic splice sites, inversions; exons (single nucleotide substitutions, deletions, insertions); conserved splice acceptor or donor.
Traditional genetic mapping and exome sequencing
In all cases, either coarse mapping data (chromosomal linkage) or a fine map position (< 20 Mb) was available to guide analysis and ease validation burden (Additional file 3). For example, the shep mutation was previously linked to chromosome 7 (approximately 152 Mb), while repro7 was fine mapped to a 4.5 Mb region on chromosome 17. The mapping of shep to chromosome 7 was accomplished using a group of 20 affected animals, while the fine mapping of repro7 to a 4.5 Mb region on chromosome 17 required the generation of 524 F2 animals, requiring over a year of breeding in limited vivarium space. In both cases, the mapping data coupled with the additional filtering of annotated data, as shown in Table 3, significantly reduced the validation burden to a single variant. Therefore, high-throughput sequencing (exome or whole genome) represents a cost efficient alternative to fine mapping by recombination, especially in cases where vivarium space and time are limited resources.
In the absence of chromosomal linkage, the validation burden is significantly larger. For example, the vgim mutant exome was reanalyzed without utilizing mapping information (Table 3, last row) and 38 variants were nominated for validation. Addition of just the chromosomal linkage data for vgim (chromosome 13), but not the fine mapping data (chr13:85473357-96594659) reduces the validation burden to two candidates. Therefore, coarse mapping to establish chromosomal linkage provides significant reduction in validation burden at minimal additional animal husbandry cost and time. In the absence of mapping data and/or when mutations arise on unusual genetic backgrounds, exome sequencing of additional samples (affected animal and parents) would similarly reduce the validation burden to just one or a few variants.
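The effect of mapping data on validation burden can be sketched as a simple positional filter in Python. The example uses the vgim fine-map interval quoted above (chr13:85473357-96594659); the `Variant` record, the `filter_by_map` function and the three example calls are hypothetical and are not variants from the study.

```python
from typing import List, NamedTuple, Optional, Sequence, Tuple


class Variant(NamedTuple):
    chrom: str  # chromosome name, for example "chr13"
    pos: int    # 1-based position (NCBI37/mm9 coordinates assumed)
    label: str  # free-text identifier for the call


def filter_by_map(
    variants: Sequence[Variant],
    chrom: str,
    interval: Optional[Tuple[int, int]] = None,
) -> List[Variant]:
    """Keep calls on the linked chromosome and, if given, inside the fine-map interval."""
    kept = []
    for v in variants:
        if v.chrom != chrom:
            continue
        if interval is not None and not (interval[0] <= v.pos <= interval[1]):
            continue
        kept.append(v)
    return kept


VGIM_FINE_MAP = (85473357, 96594659)  # interval quoted in the text

# Hypothetical calls, for illustration only.
calls = [
    Variant("chr13", 90000000, "call_1"),  # inside the fine-map interval
    Variant("chr13", 40000000, "call_2"),  # linked chromosome, outside the interval
    Variant("chr2", 90000000, "call_3"),   # unlinked chromosome
]

coarse = filter_by_map(calls, "chr13")                # chromosomal linkage only
fine = filter_by_map(calls, "chr13", VGIM_FINE_MAP)   # linkage plus fine mapping
print(len(calls), len(coarse), len(fine))
```

Most of the reduction comes from the chromosome-level filter, which mirrors the observation that coarse linkage alone removes the bulk of candidates.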
Limitations of exome sequencing for mutation discovery
Table: Validation of putative causative coding mutations in 15 mutant exomes. Columns: mutant number (allele); in gene (introns, exons); overlap with map position; non-synonymous coding variants, splice sites; validation of coding/splice variants; variants in UTRs. Strain backgrounds listed: spontaneous, C57BL/6J and 129S1/SvImJ; spontaneous, C57BL/6J and A/J. Variant entries listed: 3 (Kcnab3, Pigs, Accn1); 4 (4931406P16Rik, Shisa7, Nipa1, Alpk3); 4 (Eif2ak3, Mrpl35, Usp39 (2)).
In a parallel effort, we used targeted sequencing of contiguous regions to discover spontaneous mutations that have been mapped to regions of 10 Mb or less. Interestingly, the success rate for nominating putative mutations via targeted sequencing of contiguous regions was comparable to that of exome sequencing (at approximately 60%), demonstrating that despite the availability of sequence data representing the entire candidate region, existing analysis pipelines are not sufficient for discovery of all disease-causative genetic lesions. Moreover, systematic errors in the mm9 reference sequence or insufficient gene annotation are also likely to contribute to failed mutation discovery, since current analytical approaches rely upon reference and contemporary gene annotation as assumed underlying truth.
In this context, it is notable that the exome-based analysis of human phenotypes that are presumed to be monogenic is also frequently unsuccessful, although such negative results are generally not reported in the literature. Consequently, we anticipate that deeper analysis of the mouse mutants that fail discovery by exome sequencing may also shed light on the nature of both non-coding and cryptic coding mutations that contribute to Mendelian phenotypes in humans.
Whole exome sequencing is a robust method for mutation discovery in the mouse genome and will be particularly useful for high-throughput genetic analyses of large mutant collections. Due to the nature of the underlying mutations and the current methods available for massively parallel sequence data analysis, ENU mutation discovery via exome sequencing is more successful than spontaneous mutation discovery. In all cases, coarse mapping data (chromosomal linkage) significantly eased validation burden (Table 3); however, fine mapping to chromosomal regions < 10 to 20 Mb, while useful, did not provide significant added value (Table 3; Additional file 3). A similar conclusion was drawn by Arnold et al. for mutation discovery via whole genome sequencing. In addition, since the data shown here include mutations on a variety of strain backgrounds, comparison across unrelated exome data sets and to whole genome sequencing data from the Mouse Genomes Project proved critical in reducing the validation burden, especially where mapping data were not available to guide analysis.
Although we are 10 years past the assembly of both the human and mouse genomes, the biological function of the vast majority of mammalian genes remains unknown. We anticipate that the application of exome sequencing to the thousands of immediately available mutant mouse lines exhibiting clinically relevant phenotypes will make a large and highly valuable contribution to filling this knowledge gap.
Materials and Methods
Exome capture and sequencing
The following protocol for exome capture and sequencing is the standard protocol generally followed by all sites providing data for proof-of-concept experiments. Site-specific deviations in the standard protocol can be provided upon request. The mouse exome probe pools developed in this study, SeqCap EZ Mouse Exome SR, are commercially available on request from Roche NimbleGen.
DNA for high-throughput sequencing was isolated from spleen using a Qiagen DNeasy Blood and Tissue kit (Qiagen, Santa Clarita, CA, USA) or by phenol/chloroform extraction of nuclear pellets. Briefly, spleen samples were homogenized in ice-cold Tris lysis buffer (0.02 M Tris, pH 7.5, 0.01 M NaCl, 3 mM MgCl2). Homogenates were then incubated in 1% sucrose, 1% NP40 to release nuclei, which were subsequently pelleted by centrifugation at 1,000 rpm, 4°C. Isolated nuclei were then extracted by phenol chloroform in the presence of 1% SDS. DNA for PCR was extracted from small (1 to 2 mm) tail biopsies by lysing in 200 μl of 50 mM NaOH at 95°C for 10 minutes. Samples were neutralized by adding 20 μl of 1 M Tris HCl, pH 8.0 and used directly for PCR amplification.
Capture library preparation and hybridization amplification
Illumina PE libraries (Illumina, San Diego, CA, USA) were constructed using Illumina's Multiplexing Kit (part number PE-400-1001) with a few modifications. Size selection was done using the Pippin Prep from Sage Science, Inc. (Beverly, MA, USA). The target base pair selection size was set at 430 bp. The entire 40 μl recovery product was used as template in the pre-hybridization library amplification (using ligation-mediated PCR (LMPCR)). Pre-hybridization LMPCR consisted of one reaction containing 50 μl Phusion High Fidelity PCR Master Mix (New England BioLabs, Ipswich, MA, USA; part number F-531L), 0.5 μM of Illumina Multiplexing PCR Primer 1.0 (5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-3'), 0.001 μM of Illumina Multiplexing PCR Primer 2.0 (5'-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-3'), 0.5 μM of Illumina PCR Primer, Index 1 (or other index at bases 25-31; 5'-CAAGCAGAAGACGGCATACGAGAT(CGTGATG)TGACTGGAGTTC-3'), 40 μl DNA, and water up to 100 μl. PCR cycling conditions were as follows: 98°C for 30 s, followed by 8 cycles of 98°C for 10 s, 65°C for 30 s, and 72°C for 30 s. The last step was an extension at 72°C for 5 minutes. The reaction was then kept at 4°C until further processing. The amplified material was cleaned with a Qiagen Qiaquick PCR Purification Kit (part number 28104) according to the manufacturer's instructions, except that the DNA was eluted in 50 μl of water. DNA was quantified using the NanoDrop-1000 (Wilmington, DE, USA) and the library was evaluated electrophoretically with an Agilent Bioanalyzer 2100 (Santa Clara, CA, USA) using a DNA1000 chip (part number 5067-1504). Sample multiplexing was performed in some cases, after capture and prior to sequencing.
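The layout of the index primer described above (a fixed 24-base 5' portion, a 7-base index at bases 25-31, and a fixed 3' portion) can be sketched in Python. The flanking sequences are copied from the Index 1 primer given in the text; the function name `build_index_primer` and the validation rules are assumptions made for this illustration.

```python
INDEX_PRIMER_PREFIX = "CAAGCAGAAGACGGCATACGAGAT"  # bases 1-24 of the index primer
INDEX_PRIMER_SUFFIX = "TGACTGGAGTTC"              # bases following the index
INDEX_LENGTH = 7                                  # index occupies bases 25-31


def build_index_primer(index: str) -> str:
    """Return the index primer with the given 7-base index at bases 25-31."""
    index = index.upper()
    if len(index) != INDEX_LENGTH:
        raise ValueError(f"index must be {INDEX_LENGTH} bases long")
    if set(index) - set("ACGT"):
        raise ValueError("index must contain only A, C, G or T")
    return INDEX_PRIMER_PREFIX + index + INDEX_PRIMER_SUFFIX


def extract_index(primer: str) -> str:
    """Recover bases 25-31 (1-based, inclusive) from an index primer."""
    return primer[24:24 + INDEX_LENGTH]


index1_primer = build_index_primer("CGTGATG")  # Index 1 as given in the text
print(index1_primer)
print(extract_index(index1_primer))
```

Swapping in a different 7-base index changes only bases 25-31, so the same flanking sequences serve every multiplexed sample.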
Liquid phase sequence capture and processing
Prior to hybridization the following components were added to a 1.5 ml tube: 1.0 μg of library material, 1 μl of 1,000 μM oligo 5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATC*T-3' (asterisk denotes phosphorothioate bond), 1 μl of 100 μM oligo 5'-CAAGCAGAAGACGGCATACGAGATCGTGATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATC*T-3' (bases 25 to 31 correspond to index primer 1), and 5 μg of Mouse COT-1 DNA (part number 18440-016; Invitrogen, Inc., Carlsbad, CA, USA). Samples were dried down by puncturing a hole in the 1.5-ml tube cap with a 20 gauge needle and processing in an Eppendorf Vacufuge (San Diego, CA, USA) set to 60°C for 20 minutes. To each sample 7.5 μl NimbleGen SC Hybridization Buffer (part number 05340721001) and 3.0 μl NimbleGen Hybridization component A (part number 05340721001) were added; each sample was vortexed for 30 s, centrifuged, and placed in a heating block at 95°C for 10 minutes. The samples were again mixed for 10 s, and spun down. This mixture was then transferred to a 0.2-ml PCR tube containing 4.5 μl of Mouse Exome Solution Phase probes and mixed by pipetting up and down ten times. The 0.2-ml PCR tubes were placed in a thermocycler with heated lid at 47°C for 64 to 72 hours. Washing and recovery of captured DNA were performed as described in chapter 6 of the NimbleGen SeqCap EZ Exome SR Protocol version 2.2 (available from the Roche NimbleGen website). Samples were then quality checked using quantitative PCR as described in chapter 8 of the SR Protocol version 2.2. Sample enrichment was calculated and used as a means of judging capture success. Captures with a mean fold enrichment greater than 50 were considered successful and were sequenced. NimbleGen Sequence Capture Control (NSC) quantitative PCR assay NSC-0272 was not used to evaluate captures in these experiments.
Post-hybridization amplification (for example, LMPCR via Illumina adapters) consisted of two reactions for each sample using the same enzyme concentration as the pre-capture amplification, but with a modified primer concentration (2 μM) and different versions of the Illumina Multiplexing 1.0 and 2.0 primers: forward primer 5'-AATGATACGGCGACCACCGAGA and reverse primer 5'-CAAGCAGAAGACGGCATACGAG. Post-hybridization amplification consisted of 16 cycles of PCR with the same cycling conditions as the pre-hybridization LMPCR (above), except that the annealing temperature was lowered to 60°C. After completion of the amplification reaction, the samples were purified using a Qiagen Qiaquick column following the manufacturer's recommended protocol. DNA was quantified spectrophotometrically, and electrophoretically evaluated with an Agilent Bioanalyzer 2100 using a DNA1000 chip (Agilent). The resulting post-capture enriched sequencing libraries were diluted to 10 nM and used in cluster formation on an Illumina cBot, and PE sequencing was done using Illumina's Genome Analyzer IIx or Illumina HiSeq. Both cluster formation and PE sequencing were performed using the Illumina-provided protocols.
High-throughput sequencing data analysis
Mapping, SNP calling and annotation
The sequencing data were mapped using Maq, BWA (Burrows-Wheeler alignment tool) and/or GASSST (global alignment short sequence search tool), and SNP calling was performed using SAMtools and/or GenomeQuest. SNP annotation was performed using GenomeQuest, custom scripts and Galaxy tools. Alignments were visualized with the UCSC genome browser, Integrative Genomics Viewer (Broad Institute) and/or SignalMap (Roche NimbleGen).
Candidate mutations were validated by PCR amplification and sequencing of affected and unaffected samples if available from the mutant colony or from archived samples. Sequencing data were analyzed using Sequencher 4.9 (Gene Codes Corp., Ann Arbor, MI, USA). Primers were designed using Primer3 software.
Total RNA was isolated from heterozygous and homozygous tail biopsies and/or embryos using the RNeasy Mini Kit (Qiagen) according to the manufacturer's protocols. Total RNA (1 μg) was reverse transcribed into cDNA using the SuperScript III First-Strand Synthesis SuperMix for quantitative RT-PCR (Invitrogen) according to the manufacturer's protocols. cDNA (3 μl) was used as template in a 30 μl PCR with the following cycling conditions for all primers (0.4 μM final concentration): 94°C (45 s), 56°C (45 s), 72°C (45 s) for 30 cycles. Primers used for Cleft were Cleft_11-14f (5'-CTGGAAAACCTGGTGACGAC) and Cleft_11-14R (5'-ACCAGCTTCCCCCTTAGC).
dbSNP: Single Nucleotide Polymorphism Database
NCBI: National Center for Biotechnology Information
PCR: polymerase chain reaction
RefSeq: NCBI Reference Sequence
RT-PCR: reverse transcriptase polymerase chain reaction
SNV: single nucleotide variant
VEGA: The Vertebrate Genome Annotation database.
We are grateful to the Mouse Genome Informatics team at The Jackson Laboratory for providing custom queries of the Mouse Genome Database. We would also like to thank Belinda Harris, Son Yong Karst, Louise Dionne, Pat Ward-Bailey and Coleen Kane of The Jackson Laboratory Mutant Mouse Resource for animal husbandry and technical assistance. We also thank Lindsay Felker, Alexandra MacKenzie, and Choli Lee at the University of Washington for analytical and technical assistance. We are grateful to the Illumina High Throughput Sequencing Service and the DNA Resource at The Jackson Laboratory for providing sequencing support and archived DNA samples. The repro7 mutant was obtained from The Reproductive Genomics program at The Jackson Laboratory (NICHD P01 HD42137) and was sequenced at the Broad Institute under the Mouse Mutant Re-sequencing Project. This work was supported in part by the Australian Phenomics Network. This work was also supported in part by The Mouse Mutant Resource and the Craniofacial Resource at The Jackson Laboratory, NIH-NCRR RR001183, NEI EY015073. MSS was supported by a generous contribution from The Don Monti Memorial Research Foundation. SWL is a Howard Hughes Medical Institute Investigator and is also supported in part by the Mouse Models of Human Cancer Consortium, grant 5U01 CA105388.
- Mouse Genome Informatics. [http://www.informatics.jax.org/mgihome/homepages/stats/all_stats.shtml]
- Shendure J, Ji H: Next-generation DNA sequencing. Nat Biotechnol. 2008, 26: 1135-1145. 10.1038/nbt1486.PubMedView ArticleGoogle Scholar
- Zhang Z, Alpert D, Francis R, Chatterjee B, Yu Q, Tansey T, Sabol SL, Cui C, Bai Y, Koriabine M, Yoshinaga Y, Cheng JF, Chen F, Martin J, Schackwitz W, Gunn TM, Kramer KL, De Jong PJ, Pennacchio LA, Lo CW: Massively parallel sequencing identifies the gene Megf8 with ENU-induced mutation causing heterotaxy. Proc Natl Acad Sci USA. 2009, 106: 3219-3224. 10.1073/pnas.0813400106.PubMedPubMed CentralView ArticleGoogle Scholar
- D'Ascenzo M, Meacham C, Kitzman J, Middle C, Knight J, Winer R, Kukricar M, Richmond T, Albert TJ, Czechanski A, Donahue LR, Affourtit J, Jeddeloh JA, Reinholdt L: Mutation discovery in the mouse using genetically guided array capture and resequencing. Mamm Genome. 2009, 20: 424-436. 10.1007/s00335-009-9200-y.
- Arnold CN, Xia Y, Lin P, Ross C, Schwander M, Smart NG, Muller U, Beutler B: Rapid identification of a disease allele in mouse through whole genome sequencing and bulk segregation analysis. Genetics. 2011, 187: 633-641. 10.1534/genetics.110.124586.
- Ng SB, Bigham AW, Buckingham KJ, Hannibal MC, McMillin MJ, Gildersleeve HI, Beck AE, Tabor HK, Cooper GM, Mefford HC, Lee C, Turner EH, Smith JD, Rieder MJ, Yoshiura K, Matsumoto N, Ohta T, Niikawa N, Nickerson DA, Bamshad MJ, Shendure J: Exome sequencing identifies MLL2 mutations as a cause of Kabuki syndrome. Nat Genet. 2010, 42: 790-793. 10.1038/ng.646.
- Ng SB, Buckingham KJ, Lee C, Bigham AW, Tabor HK, Dent KM, Huff CD, Shannon PT, Jabs EW, Nickerson DA, Shendure J, Bamshad MJ: Exome sequencing identifies the cause of a mendelian disorder. Nat Genet. 2010, 42: 30-35. 10.1038/ng.499.
- Zuchner S, Dallman J, Wen R, Beecham G, Naj A, Farooq A, Kohli MA, Whitehead PL, Hulme W, Konidari I, Edwards YJ, Cai G, Peter I, Seo D, Buxbaum JD, Haines JL, Blanton S, Young J, Alfonso E, Vance JM, Lam BL, Pericak-Vance MA: Whole-exome sequencing links a variant in DHDDS to retinitis pigmentosa. Am J Hum Genet. 2011, 88: 201-206. 10.1016/j.ajhg.2011.01.001.
- Ostergaard P, Simpson MA, Brice G, Mansour S, Connell FC, Onoufriadis A, Child AH, Hwang J, Kalidas K, Mortimer PS, Trembath R, Jeffery S: Rapid identification of mutations in GJC2 in primary lymphoedema using whole exome sequencing combined with linkage analysis with delineation of the phenotype. J Med Genet. 2011, 48: 251-255. 10.1136/jmg.2010.085563.
- Walsh T, Shahin H, Elkan-Miller T, Lee MK, Thornton AM, Roeb W, Abu Rayyan A, Loulus S, Avraham KB, King MC, Kanaan M: Whole exome sequencing and homozygosity mapping identify mutation in the cell polarity protein GPSM2 as the cause of nonsyndromic hearing loss DFNB82. Am J Hum Genet. 2010, 87: 90-94. 10.1016/j.ajhg.2010.05.010.
- Bainbridge MN, Wang M, Burgess DL, Kovar C, Rodesch MJ, D'Ascenzo M, Kitzman J, Wu YQ, Newsham I, Richmond TA, Jeddeloh JA, Muzny D, Albert TJ, Gibbs RA: Whole exome capture in solution with 3 Gbp of data. Genome Biol. 2010, 11: R62. 10.1186/gb-2010-11-6-r62.
- Choi M, Scholl UI, Ji W, Liu T, Tikhonova IR, Zumbo P, Nayir A, Bakkaloglu A, Ozen S, Sanjad S, Nelson-Williams C, Farhi A, Mane S, Lifton RP: Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proc Natl Acad Sci USA. 2009, 106: 19096-19101. 10.1073/pnas.0910672106.
- Blake JA, Bult CJ, Kadin JA, Richardson JE, Eppig JT: The Mouse Genome Database (MGD): premier model organism resource for mammalian genomics and genetics. Nucleic Acids Res. 2011, 39: D842-848. 10.1093/nar/gkq1008.
- Mouse Exome Gene List. [ftp://ftp.jax.org/Genome_Biology_mouse_exomes/mouse_exome_genes.xls.zip]
- Mouse Exome Design. [ftp://ftp.jax.org/Genome_Biology_mouse_exomes/100803_MM9_exome_rebal_2_EZ_HX1.gff.bz2]
- Mouse Genomes Project. [http://www.sanger.ac.uk/resources/mouse/genomes/]
- Brancho D, Ventura JJ, Jaeschke A, Doran B, Flavell RA, Davis RJ: Role of MLK3 in the regulation of mitogen-activated protein kinase signaling cascades. Mol Cell Biol. 2005, 25: 3670-3681. 10.1128/MCB.25.9.3670-3681.2005.
- Hitotsumachi S, Carpenter DA, Russell WL: Dose-repetition increases the mutagenic effectiveness of N-ethyl-N-nitrosourea in mouse spermatogonia. Proc Natl Acad Sci USA. 1985, 82: 6619-6621. 10.1073/pnas.82.19.6619.
- Leung AW, Wong SY, Chan D, Tam PP, Cheah KS: Loss of procollagen IIA from the anterior mesendoderm disrupts the development of mouse embryonic forebrain. Dev Dyn. 2010, 239: 2319-2329. 10.1002/dvdy.22366.
- Donahue LR, Chang B, Mohan S, Miyakoshi N, Wergedal JE, Baylink DJ, Hawes NL, Rosen CJ, Ward-Bailey P, Zheng QY, Bronson RT, Johnson KR, Davisson MT: A missense mutation in the mouse Col2a1 gene causes spondyloepiphyseal dysplasia congenita, hearing loss, and retinoschisis. J Bone Miner Res. 2003, 18: 1612-1621. 10.1359/jbmr.2003.18.9.1612.
- Kile BT, Hentges KE, Clark AT, Nakamura H, Salinger AP, Liu B, Box N, Stockton DW, Johnson RL, Behringer RR, Bradley A, Justice MJ: Functional genetic analysis of mouse chromosome 11. Nature. 2003, 425: 81-86. 10.1038/nature01865.
- Zheng B, Sage M, Cai WW, Thompson DM, Tavsanli BC, Cheah YC, Bradley A: Engineering a mouse balancer chromosome. Nat Genet. 1999, 22: 375-378. 10.1038/11949.
- Hentges KE, Nakamura H, Furuta Y, Yu Y, Thompson DM, O'Brien W, Bradley A, Justice MJ: Novel lethal mouse mutants produced in balancer chromosome screens. Gene Expr Patterns. 2006, 6: 653-665. 10.1016/j.modgep.2005.11.015.
- Boles MK, Wilkinson BM, Wilming LG, Liu B, Probst FJ, Harrow J, Grafham D, Hentges KE, Woodward LP, Maxwell A, Mitchell K, Risley MD, Johnson R, Hirschi K, Lupski JR, Funato Y, Miki H, Marin-Garcia P, Matthews L, Coffey AJ, Parker A, Hubbard TJ, Rogers J, Bradley A, Adams DJ, Justice MJ: Discovery of candidate disease genes in ENU-induced mouse mutants by large-scale sequencing, including a splice-site mutation in nucleoredoxin. PLoS Genet. 2009, 5: e1000759. 10.1371/journal.pgen.1000759.
- Galaxy. [http://main.g2.bx.psu.edu]
- GenomeQuest. [http://www.genomequest.com/]
- Primer3. [http://frodo.wi.mit.edu/primer3/]
- Mouse Mutant Re-sequencing Project. [http://www.broadinstitute.org/scientific-community/science/projects/mammals-models/mouse/mouse-mutant-resequencing]
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.