Enrichment of sequencing targets from the human genome by solution hybridization
- Ryan Tewhey†1, 2,
- Masakazu Nakano†1, 4,
- Xiaoyun Wang1, 4,
- Carlos Pabón-Peña3,
- Barbara Novak3,
- Angelica Giuffre3,
- Eric Lin3,
- Scott Happe3,
- Doug N Roberts3,
- Emily M LeProust3,
- Eric J Topol1,
- Olivier Harismendy1, 4 and
- Kelly A Frazer1, 4
© Tewhey; licensee BioMed Central Ltd. 2009
Received: 17 June 2009
Accepted: 16 October 2009
Published: 16 October 2009
To exploit fully the potential of current sequencing technologies for population-based studies, one must enrich for loci from the human genome. Here we evaluate the hybridization-based approach by using oligonucleotide capture probes in solution to enrich for approximately 3.9 Mb of sequence target. We demonstrate that the tiling probe frequency is important for generating sequence data with high uniform coverage of targets. We obtained 93% sensitivity to detect SNPs, with a calling accuracy greater than 99%.
Over the past several years, genome-wide association (GWA) studies have identified compelling statistical associations between more than 350 different loci in the human genome and common complex traits. However, great difficulty occurs in moving beyond these statistical associations to identifying the causative variants and functional basis of the link between the genomic interval and the given complex trait. Population sequencing of these genomic intervals has been proposed as a method for identifying the causal common variants underlying the statistical associations and also for examining the potential contribution of rare variants in the interval to the complex trait of interest. Next-generation sequencing technologies and their increased capacity have made it feasible to sequence efficiently hundreds of megabases of DNA. However, the current cost of sequencing entire human genomes makes this approach prohibitively expensive for population studies. Targeted sequencing of the specific loci associated with a complex trait in large numbers of individuals is a promising approach for using current sequencing technologies to identify and characterize the variants in these intervals. Additionally, population sequencing of candidate genes or the entire human exome may, in the near future, make sequence-based association studies possible.
Several methods have been proposed for enrichment of sequence targets from the human genome. PCR has been used to amplify a large hundred-kilobase-size interval associated with prostate cancer for targeted sequencing in 79 individuals and also the exons of hundreds of genes to identify somatic mutations in hundreds of individual tumors [3, 4]. Although PCR enriches target sequences with high specificity and sensitivity, the method is difficult to scale. A second approach uses hybridization with oligonucleotide probes, either attached to a solid array [5–7] or in solution, to capture the sequencing targets. The solid-phase hybridization approach has been used to capture the entire human exome, as reported in several published studies [7, 9]; however, the process is difficult to scale for large population studies. A proof-of-principle study for solution-phase hybridization by using long 170-bp capture probes has recently been published. Although this study clearly demonstrated the utility of the approach, at a depth of 84× coverage, the variant-detection sensitivity was only 64% to 80% within the exonic sequences, likely because of insufficient coverage uniformity.
Results and discussion
Targeted genomic sequences
In total, about 3.9 Mb of human sequences consisting of three contiguous intervals (0.4 Mb) and the coding and potential regulatory elements of 622 genes (3.5 Mb) distributed across the genome were targeted for enrichment. The three contiguous genomic intervals spanned 125 kb on 8q24, 196 kb on 9p21, and 100 kb on 19q13 (Additional data file 1). The targeted sequences of the 622 genes comprised 9,215 exons and 4,886 evolutionarily conserved sequences (ECSs) located within 10 kb upstream or 20 kb downstream of the genes (Additional data file 2). ECSs were identified as stretches of contiguous sequence greater than 50 bases that had conservation scores of 0.75 or more within the 28-way placental mammalian conservation track at the UCSC genome browser.
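The ECS definition above (contiguous stretches longer than 50 bases with conservation scores of 0.75 or more) can be illustrated with a minimal Python sketch. The function name `find_ecs`, the list-of-scores input, and the half-open coordinates are assumptions made for illustration; the text does not describe how the conservation track was actually processed.

```python
from typing import List, Tuple


def find_ecs(scores: List[float], min_score: float = 0.75,
             min_len: int = 51) -> List[Tuple[int, int]]:
    """Return half-open (start, end) runs in which every per-base score is
    >= min_score and the run spans at least min_len bases (greater than 50).

    Hypothetical helper: thresholds follow the text, the procedure is assumed.
    """
    runs: List[Tuple[int, int]] = []
    start = None  # index where the current high-scoring run began
    for i, score in enumerate(scores):
        if score >= min_score:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the score array.
    if start is not None and len(scores) - start >= min_len:
        runs.append((start, len(scores)))
    return runs
```

A run of exactly 50 conserved bases is rejected under this reading of "greater than 50 bases".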
Probe design efficiency
Efficiency of target enrichment
Table 1. Efficiency of target enrichment. Columns: filtered reads (Mb); mapped bases (Mb); uniquely mapped bases (Mb); on or near target.
Repetitive elements compose a significant fraction of the human genome, and it is important to reduce their presence in the solution-hybridization step to enrich efficiently for targeted sequences. We examined the efficiency of masking repetitive elements during the capture probe design and of reducing their nonspecific hybridization by adding Cot-1 DNA in the solution-hybridization step by determining how many of the off-targeted sequences map to LINE and SINE elements that compose 20% and 13% of the human genome, respectively (Figure 3). Of the 52% (153 Mb) of filtered bases that do not map on or near target, 8% correspond to LINE, and 4%, to SINE elements, indicating that the fraction of sequences that are repetitive elements in the background of target enriched samples (Figure 1) is about one third of that in the genome at large. These data show that about 400-fold enrichment of the targeted sequences was achieved when capturing approximately 3.9 Mb by the solution-hybridization method.
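The two ratios discussed above can be checked with a short worked calculation. This is a sketch under stated assumptions: the haploid genome size of 3.1 Gb is not given in the text, and the on-or-near-target fraction (the complement of the 52% off-target bases) is counted against the 3.9 Mb target, which slightly overstates strict on-target enrichment.

```python
def fold_enrichment(on_target_fraction: float, target_bp: float,
                    genome_bp: float) -> float:
    """Observed on-target fraction divided by the fraction expected by chance."""
    expected_fraction = target_bp / genome_bp
    return on_target_fraction / expected_fraction


GENOME_BP = 3.1e9   # assumed haploid genome size; not stated in the text
TARGET_BP = 3.9e6   # approximate target size from the text

on_or_near_target = 1.0 - 0.52  # 52% of filtered bases were off target
fold = fold_enrichment(on_or_near_target, TARGET_BP, GENOME_BP)

# Repeat content of off-target bases (LINE + SINE) relative to the genome.
repeat_ratio = (0.08 + 0.04) / (0.20 + 0.13)

print(f"fold enrichment ~ {fold:.0f}, repeat ratio ~ {repeat_ratio:.2f}")
```

Under these assumptions the enrichment comes out in the high 300s, the same order as the roughly 400-fold figure, and the repeat ratio is close to one third.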
Uniformity of sequence coverage
Table 2. Uniformity of sequence coverage. Proportion of mapped bases on targets (%); surviving row label: 1/5 to <5.
We expect capture probes of different GC content to behave differently in the solution-hybridization step and that this would have an effect on the resulting sequence coverage. We plotted the GC content of each probe versus the normalized coverage of the probe (Figure 4b). The GC content of the capture probes ranged from 15% to 86%, and, as expected, the scatterplot appears to have a Gaussian distribution, with the peak of the distribution at about 45% GC content. The normalized coverage decreased to less than 0.5 when the GC content was lower than about 23% or higher than about 66%. These results seem to reflect the low efficiency of hybridization to targets whose base composition is either AT rich or GC rich. However, other possible explanations exist for these observations, including potential oligonucleotide synthesis issues with high- and low-GC content probes, potential self-structure of targeted DNA during hybridization, and potential biases in the PCR amplification step during generation of the sequencing libraries, resulting in fewer corresponding targeted sequences.
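As an illustration of the quantity plotted in Figure 4b, the following minimal sketch computes the GC fraction of a probe sequence and flags probes outside the approximate 23% to 66% range in which normalized coverage stayed above 0.5. The function names are hypothetical, and the thresholds are simply the approximate values quoted above, not a fitted model.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G or C among the unambiguous (A/C/G/T) bases of a sequence."""
    called = [base for base in seq.upper() if base in "ACGT"]
    if not called:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(base in "GC" for base in called) / len(called)


def in_low_coverage_gc_range(gc: float, low: float = 0.23,
                             high: float = 0.66) -> bool:
    """True when GC content lies in a range where coverage was reduced.

    The cutoffs are the approximate values reported in the text.
    """
    return gc < low or gc > high
```

For example, a hypothetical probe with 45% GC would not be flagged, whereas one with 15% GC would.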
Effect of probe-tiling frequency on sequence coverage
To gain insight into the optimal density for tiling the capture probes, we assessed the effect of probe-tiling frequency on sequence coverage (Figure 4c). As described in Methods, the probe-tiling frequency varied from 1× to 4× for the targeted sequences. We separated the 52,187 capture probes into five bins based on their probe-tiling frequency and then plotted the distribution of the normalized coverage for each bin (Figure 4c). The normalized coverage increased from 1× to 1.5× to 2× probe-tiling frequency and then formed a plateau. These results suggest that sequence coverage is improved if each targeted base pair is contained within two different capture probes but is not affected by a greater tiling density. To examine further the effects of probe-tiling frequency, we plotted the length of targeted exons and ECS regions compared with normalized coverage (Figure 4d). The lengths of the targets varied from 120 bp to 7,860 bp, and targets less than 180 bp in length (1× through 1.5× tiling frequency) had less coverage than longer exonic sequences, which had 2× tiling frequency (see Methods). These results indicate that, for optimal coverage of human exons shorter than 180 bp in length, at least three 120-mer capture probes per exon should be used to achieve an optimal tiling frequency.
The ability to capture targeted sequences reproducibly across multiple samples is of high importance for performing sequence-based association studies. We assessed the reproducibility of this enrichment method by comparing technical replicates of the solution-hybridization step (Figure 1). For each of the two samples, a single genomic DNA library was generated, and two aliquots of each sample library were independently hybridized with capture probes (Capture 1 and Capture 2).
We next examined sample-to-sample reproducibility by comparing the normalized coverage of one technical replicate of NA15510 with one technical replicate of HE00069 (Figure 5b). The correlation of the two samples was very good (r² = 0.85) but significantly lower than that observed for the technical replicates of the same sample. These results could reflect sequence-variant differences between the two samples, differences in the genomic DNA-fragment library step that affect the solution-hybridization step, or both. In either case, our results suggest that the reproducibility of capturing targeted sequences across samples in different experiments will be sufficient to allow sequence-based association studies.
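A comparison of this kind can be sketched as the squared correlation between two vectors of normalized coverage, one value per probe or target. The text does not name the correlation statistic, so the use of the Pearson coefficient here is an assumption, as is the function name.

```python
from math import sqrt
from typing import Sequence


def pearson_r2(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two equal-length coverage vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    r = sxy / sqrt(sxx * syy)
    return r * r
```

Applied to the per-probe normalized coverage of two captures, a value near 1 would indicate that the same probes are over- or under-represented in both.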
Accuracy of variant calling
Table 3. Variant detection rate and concordance. Columns: variant detection rate (%); variant concordance (%); number of discordant SNPs; each reported on or near target.
To gain insight into the source of the variant-calling errors in the sequence data, we carefully examined the discrepancies that occurred for on-target bases with at least five reads and an MAQ quality score of 30 or more. In the four samples, in total, 51 discrepant variants were found across 33 positions (Additional data file 3). For 24 of the discrepancies, the variant calls between the two replicates agreed with each other, suggesting that the microarray data are incorrectly calling the SNP, but not ruling out the less-likely possibility of a systematic error in the sequencing. In 18 of the discrepancies, the replicate sample was unable to make a high-quality call. The majority of these positions (72%) were missed heterozygote genotypes in which the sequence coverage was low for both samples. The remaining nine discrepancies were called correctly in one of the replicates and incorrectly in the second. All but one of the variants were below the mean coverage, and the majority (six of nine) were missed heterozygote calls. It is important to note that these errors represent a minor fraction of the heterozygous sites and that the vast majority are correctly called in the sequence data. Twenty-two positions were discordant in only one replicate, the majority of which (64%) failed to be called in the second replicate. The remaining one position was discordant in one NA15510 and one HE00069 replicate and not called in the other replicate. These results suggest that approximately half of the discrepant variant bases are likely attributable to errors in the microarray data, and the other half are likely errors in the sequence data. Thus, the accuracy of calling SNPs in the sequence data may be greater than 99.7% (Table 3). Additionally, because most of the discrepancies attributed to sequencing errors were of lower coverage, it is reasonable to assume that an increase in sequencing depth or capture uniformity would rescue these variants.
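The three classes of discrepancy described above can be expressed as a small decision procedure. The sketch below is illustrative only: the function name, the category labels, and the representation of a no call as `None` are assumptions, and genotype strings are assumed to be normalized (for example, alleles sorted) before comparison.

```python
from typing import Optional


def classify_discrepancy(array_gt: str, rep1: Optional[str],
                         rep2: Optional[str]) -> str:
    """Classify one position by comparing an array genotype with two
    sequencing replicates. None represents a no call (hypothetical encoding).
    """
    calls = [rep1, rep2]
    discordant = [c for c in calls if c is not None and c != array_gt]
    if not discordant:
        return "no_discrepancy"
    if rep1 is not None and rep1 == rep2:
        # Both replicates agree with each other but not with the array.
        return "replicates_agree"
    if None in calls:
        # One replicate is discordant, the other lacks a high-quality call.
        return "other_replicate_no_call"
    if array_gt in calls:
        # One replicate matches the array, the other does not.
        return "one_replicate_correct"
    # Both replicates differ from the array and from each other.
    return "replicates_disagree"
```

The three categories reported in the text account for all 51 discrepancies (24 + 18 + 9).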
Novel and functional variants
Table 4. Zygosity and functional annotation of exonic variants. Surviving row labels: 3' UTR; 5' UTR.
Both samples had roughly an equal number of nonsynonymous variants in the 622 genes, with one in every five genes having a heterozygote and one in every 10 having a homozygote nonsynonymous variant. Most of the nonsynonymous SNPs are present in dbSNP and might thus be common variants not specific to our samples (Table 4). Of the 191 nonsynonymous variants found in NA15510, nine were predicted to cause an amino acid substitution that results in a functional change, as determined by the program SIFT. This number was slightly higher in HE00069, with 13 of the 205 nonsynonymous changes predicted to cause a change in function.
Only a few extensive coding variation surveys have been performed in the human genome. Our analysis is consistent with previous whole-exome analysis and thus supports the use of solution hybridization for targeted exon sequencing.
Our results show that the solution hybridization-based method can generate highly uniform coverage of sequence targets that is reproducible across samples. The method has limited, if any, systematic allelic biases resulting in dropout effects, as demonstrated by the greater than 99% SNP calling accuracy and especially the ability to call correctly most heterozygous sites. The solution hybridization-based method is clearly dependent on the ability to design successful capture probes to target sequences of interest. The ability to design capture probes is dependent on local sequence characteristics, and whereas 97% of the base pairs in exonic targets can be targeted, the success rate is only about 50% for base pairs in genomic intervals. We demonstrated that shorter 120-mer probes and an overlapping tiling strategy for probe design produce greater uniformity than previously published results for a solution hybridization-based study with 170-mer probes tiled with an end-to-end strategy. It is important to note that some of this increase in overall uniformity of coverage may in part be due to the fact that the shorter 120-mer probes are easier to synthesize in a reliable and consistent fashion than are 170-mer probes. This greater coverage uniformity allowed us to call confidently a higher proportion of variant bases at a sequence-coverage depth almost one third lower than that produced in the previous study. This improvement will result in reduced costs and more-complete variant detection for large-scale resequencing studies.
Two general types of population-based sequencing studies are currently under consideration in the community. The first type is sequence-based association studies that specifically focus on elements with known function. Relatively few repetitive sequences occur in the majority of known functional elements, and thus the success rate for designing capture probes is high. The second type is targeted sequencing of intervals associated through genome-wide association studies with a particular complex trait. In our study, the repetitive content of the three genomic intervals we targeted varied from 45% to 63%. Thus, although the base pairs in these genomic intervals for which capture probes can be designed are well represented in the resulting sequence data, a considerable fraction of bases cannot be investigated. It is important to note that analysis methods for investigating variants outside of exons and regulatory elements for function are currently nonexistent. Thus, the solution-hybridization approach for targeted sequencing is clearly optimal for sequence-based association studies, and the limitations of capture-probe design have to be taken into account for targeted sequencing of genomic intervals.
Overall, our study demonstrates that the solution hybridization-based method is well suited for the enrichment of loci in the mega-base-pair scale from the human genome for population studies using current sequencing technologies.
Materials and methods
One sample (NA15510; Caucasian) was obtained from the Coriell Institute for Medical Research, and the second sample (HE00069; Caucasian) was obtained from the Scripps Translational Science Institute "Wellderly" cohort. The genomic DNA for NA15510 was isolated from an Epstein-Barr virus-transformed cell line. The Wellderly Study has been approved by the Institutional Review Board of Scripps Health, and enrollment of participants and blood collection were carried out in accordance with the Helsinki Declaration. Genomic DNA of HE00069 was isolated from blood by using the PAXgene Blood DNA Kit (Qiagen, Inc., Valencia, CA, USA), according to the manufacturer's instructions.
Probe design and synthesis
The biotinylated-cRNA probe solution was manufactured by Agilent Technologies and was provided as capture probes. The sequences corresponding to the three genomic intervals and the 622 genes were uploaded to the Web-based probe-design tool, eArray. The coordinates of the sequence data in this study are based on NCBI Build 36.1 (UCSC hg18). The parameters chosen were capture-probe length (120 bp), capture-probe tiling frequency (2×), allow overlap into avoid regions (20 bp), and the avoid standard repeat masked regions option (eliminates repetitive sequences by using the RepeatMasker program alignment-based method). The 2× tiling-frequency parameter designed one capture probe for targeted sequences of 120 bp or less (1× coverage per capture probe) and two probes for targeted sequences between 120 and 180 bp (1.5× coverage per capture probe); base pairs in targeted sequences of more than 180 bp have 2× coverage, except for those at the ends of the sequence, which are covered at 1.5×. The genes in the targeted intervals also were included individually in the set of 622 genes. Therefore, the probe-tiling frequency varied from 1× to 4× for the targeted sequences. In total, 52,187 probes were designed (Additional data file 4), synthesized on a wafer, subsequently released off the solid support by selective chemical reaction, PCR amplified through universal primers attached on the probes, and then amplified and biotin-conjugated by in vitro transcription.
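The notion of probe-tiling frequency used above can be made concrete by counting, for each targeted base, how many capture probes overlap it. The sketch below makes no claim about how eArray places probes; the function names, the half-open coordinates, and the example probe positions are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open (start, end) coordinates


def per_base_probe_depth(target: Interval,
                         probes: List[Interval]) -> List[int]:
    """Number of probes overlapping each base of the target interval."""
    t_start, t_end = target
    if t_end <= t_start:
        raise ValueError("target interval must be non-empty")
    depth = [0] * (t_end - t_start)
    for p_start, p_end in probes:
        # Clip the probe to the target before counting.
        lo = max(p_start, t_start)
        hi = min(p_end, t_end)
        for pos in range(lo, hi):
            depth[pos - t_start] += 1
    return depth


def mean_tiling_frequency(target: Interval,
                          probes: List[Interval]) -> float:
    """Average number of probes per targeted base."""
    depth = per_base_probe_depth(target, probes)
    return sum(depth) / len(depth)
```

For a hypothetical 150-bp target covered by two 120-bp probes starting at positions 0 and 30, the central 90 bases are covered twice and the mean tiling frequency is 240/150 = 1.6. Probes from a second, overlapping design add to the per-base count, which is how targets included twice can reach frequencies above 2×.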
Genomic DNA-fragment library
Genomic DNA-fragment libraries were prepared according to the manufacturer's instructions (Illumina, Inc., San Diego, CA, USA) with slight modifications, as described. In brief, 3 μg of each genomic DNA (NA15510 and HE00069) was fragmented by Adaptive Focused Acoustics (Covaris S2; Covaris, Inc., Woburn, MA, USA) by using the following conditions: 20% duty cycle at intensity 5 for 90 seconds with 200 cycles per burst. This resulted in fragmentation of the genomic DNA to an average size of about 200 bp. After end repair and A-base tailing, the Illumina single-end adaptor was ligated. After size selection for a mean insert size of about 250 bp, each fragment library was enriched by 14-cycle PCR amplification by using 4 μl per fragment library as a template. The PCR-amplified fragment libraries were quantified by NanoDrop (ND8000; NanoDrop Technologies, Inc., Wilmington, DE, USA).
Solution hybridization and target enrichment
Technical replicates of the target-enrichment step were performed for both samples, NA15510 and HE00069 (Figure 1). The genomic DNA-fragment libraries of the samples were split into two aliquots, with the target-enrichment step performed on one aliquot at Agilent Technologies and on the other aliquot at the Scripps Translational Science Institute. At both institutes, the same protocol was used from the solution hybridization through the PCR-enrichment steps. In a PCR plate, one unit of the capture probe (Agilent Technologies, Inc., Santa Clara, CA, USA; ELID number: 0220261) was mixed with 20 units of RNase inhibitor (SUPERase-In, Ambion, Inc., Austin, TX, USA), heated for 2 minutes at 65°C in a GeneAmp PCR System 9700 thermocycler (Applied Biosystems, Inc., Foster City, CA, USA), and then mixed with prewarmed (65°C) 2× hybridization buffer (Agilent Technologies, Inc., Santa Clara, CA, USA; part number: G3360A). In a separate PCR plate, 500 ng of each genomic DNA-fragment library was mixed with 2.5 μg of human Cot-1 DNA, 2.5 μg of salmon sperm DNA, and 1 unit of blocking oligonucleotides complementary to the Illumina single-end adaptor, heated for 5 minutes at 95°C, and held for 5 minutes at 65°C in the thermocycler. Within 5 minutes, the mixture was added to the capture probes, and the solution hybridization was performed for 24 hours at 65°C.
After the hybridization, the captured targets were selected by pulling down the biotinylated probe/target hybrids by using streptavidin-coated magnetic beads (Dynal DynaMag-2; Invitrogen Corporation, Carlsbad, CA, USA). The magnetic beads were prepared by washing 3 times and resuspending in binding buffer (1 M NaCl, 1 mM EDTA, and 10 mM Tris-HCl, pH 7.5). The captured target solution was then added to the beads and rotated for 30 minutes at room temperature. The beads/captured targets were then pulled down by using a magnetic separator (DynaMag-Spin; Invitrogen Corporation), the supernatant was removed, and the beads were resuspended in prewarmed (65°C) wash buffer (Agilent Technologies, Inc.; part number: G3360A) and incubated for 15 minutes at room temperature. The beads/captured probes were then pulled down with the magnetic separator and washed by resuspension and incubation for 10 minutes at 65°C in wash buffer. After three washes, elution buffer (0.1 M NaOH) was added and incubated for 10 minutes at room temperature. The eluted captured targets were then transferred to a tube containing neutralization buffer (1 M Tris-HCl, pH 7.5) and desalted with the MinElute PCR Purification Kit (Qiagen, Inc., Valencia, CA, USA). Finally, the targets were enriched by 18-cycle PCR amplification by using 1 μl per sample as a template, and the amplified targets were purified with the QIAquick PCR Purification Kit (Qiagen, Inc.).
Sequencing by Illumina GAII
The four target-enriched samples (Figure 1) were quantified by PicoGreen dsDNA Quantitation Assay (Invitrogen Corporation) in quadruplicate. The samples were diluted to 10 nM, denatured with NaOH, and then 2.3 pM of each target-enriched sample was loaded into separate lanes (lane 1 to lane 4) of the same flow cell. Sequencing was performed for 36 cycles by using Illumina Single-Read Cluster Generation Kit and 36 Cycle Sequencing Kit according to manufacturer's instructions.
Mapping, coverage uniformity, and SNP detection
The sequencing data produced by Illumina GAII were processed through the Illumina pipeline v1.3 by using default parameters. For all analyses, the high-quality filtered reads were mapped to the reference sequence (NCBI Build 36.1, UCSC hg18) by using MAQ v0.7.1 with default parameters, except that three mismatches were allowed during alignment (-n 3). SNP calling was performed by using the Perl-based SNP filter of MAQ after alignment (-map), assembly (-assemble), and consensus calling (-cns2snp). We used the default parameters for both SNP calling and alignment, except that we required a variant quality score of 30 or more and a read depth of 5 or greater. Variants with fewer than five reads or a quality of less than 30 were marked as no calls. Sequence variants were compared with microarray genotypes generated for both samples (NA15510 and HE00069) by using the Illumina 1 M Infinium bead arrays according to the manufacturer's instructions. Illumina 1 M genotypes were converted to reference strand positive from dbSNP forward Bead Studio reports. We removed 15 SNPs from the reports because of discrepancies between the Illumina genotypes and the strands and alleles reported in dbSNP. Calls were considered discordant regardless of the type of discordance; for example, an AB to AA error affected the concordance score the same as an AA to BB error. All coverage calculations were performed with a combination of custom Perl scripts and the statistics package R. Coverage and uniformity calculations were performed by using the 3.9 Mb of targeted sequence. Mean coverage was calculated as total bases on target divided by 3,886,910 (total bases captured). Normalized coverage for each base was calculated by dividing the coverage at that base by the mean coverage for the sample. For functional analysis of the variants, the variants in the two replicates were combined, and only those with matching high-quality calls in both samples were considered for analysis. Variants were processed through the SIFT program to determine their functional role.
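The coverage definitions and the call filter described above can be written down compactly. The original calculations used custom Perl scripts and R, which are not shown in the text, so the following Python functions are an illustrative restatement with hypothetical names rather than the authors' code.

```python
from typing import List, Sequence

TARGET_BASES = 3_886_910  # total bases captured, as stated in the text


def mean_coverage(total_on_target_bases: int,
                  target_bases: int = TARGET_BASES) -> float:
    """Total sequenced bases on target divided by the number of targeted bases."""
    return total_on_target_bases / target_bases


def normalized_coverage(depths: Sequence[float],
                        mean_cov: float) -> List[float]:
    """Per-base coverage divided by the sample's mean coverage."""
    if mean_cov <= 0:
        raise ValueError("mean coverage must be positive")
    return [d / mean_cov for d in depths]


def passes_call_filter(depth: int, quality: int,
                       min_depth: int = 5, min_quality: int = 30) -> bool:
    """True when a variant meets the read-depth and quality thresholds;
    otherwise the position is treated as a no call."""
    return depth >= min_depth and quality >= min_quality
```

With this normalization, a base sequenced at exactly the sample mean has a normalized coverage of 1.0, which makes samples sequenced to different depths directly comparable.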
Additional data files
The following additional data are available with the online version of this article: an Excel file listing the three contiguous genomic intervals that we have targeted in this study (Additional data file 1), an Excel file listing the genes and ECSs that we have targeted in this study (Additional data file 2), the 33 discordant positions and the variant characteristics in each targeted sequencing experiment (Additional data file 3), and an Excel file listing the probes that we have designed by using eArray in this study (Additional data file 4).
Abbreviations
ECS: evolutionarily conserved sequence
This work was partly funded by NIH CTSA grant 1U54RR025204-01. MN was supported by Japan Foundation for Aging and Health. We thank Karrie Trevarthen for excellent technical assistance and Sarah Murray and Greg Cooper (University of Washington) for providing the NA15510 and HE00069 Illumina 1 M genotype data.
- Frazer KA, Murray SS, Schork NJ, Topol EJ: Human genetic variation and its contribution to complex traits. Nat Rev Genet. 2009, 10: 241-251. 10.1038/nrg2554.PubMedView ArticleGoogle Scholar
- Yeager M, Xiao N, Hayes RB, Bouffard P, Desany B, Burdett L, Orr N, Matthews C, Qi L, Crenshaw A, Markovic Z, Fredrikson KM, Jacobs KB, Amundadottir L, Jarvie TP, Hunter DJ, Hoover R, Thomas G, Harkins TT, Chanock SJ: Comprehensive resequence analysis of a 136 kb region of human chromosome 8q24 associated with prostate and colon cancers. Hum Genet. 2008, 124: 161-170. 10.1007/s00439-008-0535-3.PubMedPubMed CentralView ArticleGoogle Scholar
- Ding L, Getz G, Wheeler DA, Mardis ER, McLellan MD, Cibulskis K, Sougnez C, Greulich H, Muzny DM, Morgan MB, Fulton L, Fulton RS, Zhang Q, Wendl MC, Lawrence MS, Larson DE, Chen K, Dooling DJ, Sabo A, Hawes AC, Shen H, Jhangiani SN, Lewis LR, Hall O, Zhu Y, Mathew T, Ren Y, Yao J, Scherer SE, Clerc K, et al: Somatic mutations affect key pathways in lung adenocarcinoma. Nature. 2008, 455: 1069-1075. 10.1038/nature07423.PubMedPubMed CentralView ArticleGoogle Scholar
- Cancer Genome Atlas Research Network: Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 2008, 455: 1061-1068. 10.1038/nature07385.View ArticleGoogle Scholar
- Albert TJ, Molla MN, Muzny DM, Nazareth L, Wheeler D, Song X, Richmond TA, Middle CM, Rodesch MJ, Packard CJ, Weinstock GM, Gibbs RA: Direct selection of human genomic loci by microarray hybridization. Nat Methods. 2007, 4: 903-905. 10.1038/nmeth1111.PubMedView ArticleGoogle Scholar
- Okou DT, Steinberg KM, Middle C, Cutler DJ, Albert TJ, Zwick ME: Microarray-based genomic selection for high-throughput resequencing. Nat Methods. 2007, 4: 907-909. 10.1038/nmeth1109.PubMedView ArticleGoogle Scholar
- Hodges E, Xuan Z, Balija V, Kramer M, Molla MN, Smith SW, Middle CM, Rodesch MJ, Albert TJ, Hannon GJ, McCombie WR: Genome-wide in situ exon capture for selective resequencing. Nat Genet. 2007, 39: 1522-1527. 10.1038/ng.2007.42.PubMedView ArticleGoogle Scholar
- Gnirke A, Melnikov A, Maguire J, Rogov P, LeProust EM, Brockman W, Fennell T, Giannoukos G, Fisher S, Russ C, Gabriel S, Jaffe DB, Lander ES, Nusbaum C: Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nat Biotechnol. 2009, 27: 182-189. 10.1038/nbt.1523.PubMedPubMed CentralView ArticleGoogle Scholar
- Ng SB, Turner EH, Robertson PD, Flygare SD, Bigham AW, Lee C, Shaffer T, Wong M, Bhattacharjee A, Eichler EE, Bamshad M, Nickerson DA, Shendure J: Targeted capture and massively parallel sequencing of 12 human exomes. Nature. 2009, 461: 272-276. 10.1038/nature08250.PubMedPubMed CentralView ArticleGoogle Scholar
- Vijg J, Campisi J: Puzzles, promises and a cure for ageing. Nature. 2008, 454: 1065-1071. 10.1038/nature07216.PubMedPubMed CentralView ArticleGoogle Scholar
- Aguilaniu H, Durieux J, Dillin A: Metabolism, ubiquinone synthesis, and longevity. Genes Dev. 2005, 19: 2399-2406. 10.1101/gad.1366505.PubMedView ArticleGoogle Scholar
- Guarente L, Kenyon C: Genetic pathways that regulate ageing in model organisms. Nature. 2000, 408: 255-262. 10.1038/35041700.PubMedView ArticleGoogle Scholar
- Kenyon C: The plasticity of aging: insights from long-lived mutants. Cell. 2005, 120: 449-460. 10.1016/j.cell.2005.02.002.PubMedView ArticleGoogle Scholar
- Easton DF, Pooley KA, Dunning AM, Pharoah PD, Thompson D, Ballinger DG, Struewing JP, Morrison J, Field H, Luben R, Wareham N, Ahmed S, Healey CS, Bowman R, Meyer KB, Haiman CA, Kolonel LK, Henderson BE, Le Marchand L, Brennan P, Sangrajrang S, Gaborieau V, Odefrey F, Shen CY, Wu PE, Wang HC, Eccles D, Evans DG, Peto J, Fletcher O, et al: Genome-wide association study identifies novel breast cancer susceptibility loci. Nature. 2007, 447: 1087-1093. 10.1038/nature05887.PubMedPubMed CentralView ArticleGoogle Scholar
- Kiemeney LA, Thorlacius S, Sulem P, Geller F, Aben KK, Stacey SN, Gudmundsson J, Jakobsdottir M, Bergthorsson JT, Sigurdsson A, Blondal T, Witjes JA, Vermeulen SH, Hulsbergen-van de Kaa CA, Swinkels DW, Ploeg M, Cornel EB, Vergunst H, Thorgeirsson TE, Gudbjartsson D, Gudjonsson SA, Thorleifsson G, Kristinsson KT, Mouy M, Snorradottir S, Placidi D, Campagna M, Arici C, Koppova K, Gurzau E, et al: Sequence variant on 8q24 confers susceptibility to urinary bladder cancer. Nat Genet. 2008, 40: 1307-1312. 10.1038/ng.229.PubMedPubMed CentralView ArticleGoogle Scholar
- Gudmundsson J, Sulem P, Manolescu A, Amundadottir LT, Gudbjartsson D, Helgason A, Rafnar T, Bergthorsson JT, Agnarsson BA, Baker A, Sigurdsson A, Benediktsdottir KR, Jakobsdottir M, Xu J, Blondal T, Kostic J, Sun J, Ghosh S, Stacey SN, Mouy M, Saemundsdottir J, Backman VM, Kristjansson K, Tres A, Partin AW, Albers-Akkers MT, Godino-Ivan Marcos J, Walsh PC, Swinkels DW, Navarrete S, et al: Genome-wide association study identifies a second prostate cancer susceptibility variant at 8q24. Nat Genet. 2007, 39: 631-637. 10.1038/ng1999.PubMedView ArticleGoogle Scholar
- Yeager M, Orr N, Hayes RB, Jacobs KB, Kraft P, Wacholder S, Minichiello MJ, Fearnhead P, Yu K, Chatterjee N, Wang Z, Welch R, Staats BJ, Calle EE, Feigelson HS, Thun MJ, Rodriguez C, Albanes D, Virtamo J, Weinstein S, Schumacher FR, Giovannucci E, Willett WC, Cancel-Tassin G, Cussenot O, Valeri A, Andriole GL, Gelmann EP, Tucker M, Gerhard DS, et al: Genome-wide association study of prostate cancer identifies a second risk locus at 8q24. Nat Genet. 2007, 39: 645-649. 10.1038/ng2022.
- Helgadottir A, Thorleifsson G, Manolescu A, Gretarsdottir S, Blondal T, Jonasdottir A, Sigurdsson A, Baker A, Palsson A, Masson G, Gudbjartsson DF, Magnusson KP, Andersen K, Levey AI, Backman VM, Matthiasdottir S, Jonsdottir T, Palsson S, Einarsdottir H, Gunnarsdottir S, Gylfason A, Vaccarino V, Hooper WC, Reilly MP, Granger CB, Austin H, Rader DJ, Shah SH, Quyyumi AA, Gulcher JR, et al: A common variant on chromosome 9p21 affects the risk of myocardial infarction. Science. 2007, 316: 1491-1493. 10.1126/science.1142842.
- McPherson R, Pertsemlidis A, Kavaslar N, Stewart A, Roberts R, Cox DR, Hinds DA, Pennacchio LA, Tybjaerg-Hansen A, Folsom AR, Boerwinkle E, Hobbs HH, Cohen JC: A common allele on chromosome 9 associated with coronary heart disease. Science. 2007, 316: 1488-1491. 10.1126/science.1142447.
- Scott LJ, Mohlke KL, Bonnycastle LL, Willer CJ, Li Y, Duren WL, Erdos MR, Stringham HM, Chines PS, Jackson AU, Prokunina-Olsson L, Ding CJ, Swift AJ, Narisu N, Hu T, Pruim R, Xiao R, Li XY, Conneely KN, Riebow NL, Sprau AG, Tong M, White PP, Hetrick KN, Barnhart MW, Bark CW, Goldstein JL, Watkins L, Xiang F, Saramies J, et al: A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science. 2007, 316: 1341-1345. 10.1126/science.1142382.
- Zeggini E, Weedon MN, Lindgren CM, Frayling TM, Elliott KS, Lango H, Timpson NJ, Perry JR, Rayner NW, Freathy RM, Barrett JC, Shields B, Morris AP, Ellard S, Groves CJ, Harries LW, Marchini JL, Owen KR, Knight B, Cardon LR, Walker M, Hitman GA, Morris AD, Doney AS, McCarthy MI, Hattersley AT: Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes. Science. 2007, 316: 1336-1341. 10.1126/science.1142364.
- Saxena R, Voight BF, Lyssenko V, Burtt NP, de Bakker PI, Chen H, Roix JJ, Kathiresan S, Hirschhorn JN, Daly MJ, Hughes TE, Groop L, Altshuler D, Almgren P, Florez JC, Meyer J, Ardlie K, Bengtsson Bostrom K, Isomaa B, Lettre G, Lindblad U, Lyon HN, Melander O, Newton-Cheh C, Nilsson P, Orho-Melander M, Rastam L, Speliotes EK, Taskinen MR, Tuomi T, et al: Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science. 2007, 316: 1331-1336. 10.1126/science.1142358.
- Hardy J: ApoE, amyloid, and Alzheimer's disease. Science. 1994, 263: 454-455. 10.1126/science.8290946.
- van Bockxmeer FM: Apolipoprotein E and Alzheimer's. Nature. 1995, 375: 285. 10.1038/375285b0.
- Willer CJ, Sanna S, Jackson AU, Scuteri A, Bonnycastle LL, Clarke R, Heath SC, Timpson NJ, Najjar SS, Stringham HM, Strait J, Duren WL, Maschio A, Busonero F, Mulas A, Albai G, Swift AJ, Morken MA, Narisu N, Bennett D, Parish S, Shen H, Galan P, Meneton P, Hercberg S, Zelenika D, Chen WM, Li Y, Scott LJ, Scheet PA, et al: Newly identified loci that influence lipid concentrations and risk of coronary artery disease. Nat Genet. 2008, 40: 161-169. 10.1038/ng.76.
- Miller W, Rosenbloom K, Hardison RC, Hou M, Taylor J, Raney B, Burhans R, King DC, Baertsch R, Blankenberg D, Kosakovsky Pond SL, Nekrutenko A, Giardine B, Harris RS, Tyekucheva S, Diekhans M, Pringle TH, Murphy WJ, Lesk A, Weinstock GM, Lindblad-Toh K, Gibbs RA, Lander ES, Siepel A, Haussler D, Kent WJ: 28-way vertebrate alignment and conservation track in the UCSC Genome Browser. Genome Res. 2007, 17: 1797-1808. 10.1101/gr.6761107.
- Web-based probe design tool, eArray. [https://earray.chem.agilent.com/earray]
- Morgulis A, Gertz EM, Schaffer AA, Agarwala R: WindowMasker: window-based masker for sequenced genomes. Bioinformatics. 2006, 22: 134-141. 10.1093/bioinformatics/bti774.
- Quail MA, Kozarewa I, Smith F, Scally A, Stephens PJ, Durbin R, Swerdlow H, Turner DJ: A large genome center's improvements to the Illumina sequencing system. Nat Methods. 2008, 5: 1005-1010. 10.1038/nmeth.1270.
- Sakharkar MK, Chow VT, Kangueane P: Distributions of exons and introns in the human genome. In Silico Biol. 2004, 4: 387-393.
- Wheeler DA, Srinivasan M, Egholm M, Shen Y, Chen L, McGuire A, He W, Chen YJ, Makhijani V, Roth GT, Gomes X, Tartaro K, Niazi F, Turcotte CL, Irzyk GP, Lupski JR, Chinault C, Song XZ, Liu Y, Yuan Y, Nazareth L, Qin X, Muzny DM, Margulies M, Weinstock GM, Gibbs RA, Rothberg JM: The complete genome of an individual by massively parallel DNA sequencing. Nature. 2008, 452: 872-876. 10.1038/nature06884.
- Ng PC, Henikoff S: SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res. 2003, 31: 3812-3814. 10.1093/nar/gkg509.
- Ng PC, Levy S, Huang J, Stockwell TB, Walenz BP, Li K, Axelrod N, Busam DA, Strausberg RL, Venter JC: Genetic variation in an individual human exome. PLoS Genet. 2008, 4: e1000160. 10.1371/journal.pgen.1000160.
- Coriell Institute for Medical Research. [http://www.coriell.org]
- Scripps Translational Science Institute. [http://www.stsiweb.org]
- Li H, Ruan J, Durbin R: Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 2008, 18: 1851-1858. 10.1101/gr.078212.108.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium provided the original work is properly cited.