CpG island density and its correlations with genomic features in mammalian genomes
© Han et al.; licensee BioMed Central Ltd. 2008
Received: 7 April 2008
Accepted: 13 May 2008
Published: 13 May 2008
CpG islands, which are clusters of CpG dinucleotides in GC-rich regions, are considered gene markers and represent an important feature of mammalian genomes. Previous studies of CpG islands have largely focused on specific loci or on a single genome. To date, there appears to be no comparative analysis of CpG islands and their density at the DNA sequence level among mammalian genomes, or of their correlations with other genomic features.
In this study, we performed a systematic analysis of CpG islands in ten mammalian genomes. We found that both the number of CpG islands and their density vary greatly among genomes, even though many of these genomes encode similar numbers of genes. We observed significant correlations between CpG island density and genomic features such as number of chromosomes, chromosome size, and recombination rate. We also observed a trend of higher CpG island density in telomeric regions. Furthermore, we evaluated the performance of three computational algorithms for CpG island identification. Finally, we compared our observations in mammals with those in non-mammalian vertebrates.
Our study revealed that CpG islands vary greatly among mammalian genomes. Some factors such as recombination rate and chromosome size might have influenced the evolution of CpG islands in the course of mammalian evolution. Our results suggest a scenario in which an increase in chromosome number increases the rate of recombination, which in turn elevates GC content to help prevent loss of CpG islands and maintain their density. These findings should be useful for studying mammalian genomes, the role of CpG islands in gene function, and molecular evolution.
CpG islands (CGIs) are clusters of CpG dinucleotides in GC-rich regions and represent an important feature of mammalian genomes. Mammalian genomic DNA generally shows a great deficit of CpG dinucleotides; for example, the ratio of observed to expected CpGs (ObsCpG/ExpCpG) is approximately 0.20-0.25 in the human and mouse genomes [2–4]. This deficit is largely attributed to the hypermutability of methylated CpGs to TpGs (or CpAs on the complementary strand) [5, 6]. In comparison, CpGs in CGIs are often unmethylated and their frequencies are close to random expectation (for example, ObsCpG/ExpCpG is approximately 0.8 in promoter-associated CGIs). CGIs are often associated with the 5' end of genes and are considered gene markers [8, 9]. However, a comparison of the human, mouse, and rat genomes indicated that, although these three genomes encode similar numbers of genes, the number of CGIs in the mouse (15,500) or rat (15,975) genome is far smaller than the number (27,000) identified in the non-repetitive portions of the human genome [10–12]. The difference is probably due to a faster rate of CGI loss in the rodent lineage rather than faster gains of CGIs in the human lineage [7, 9]. However, it remains unclear whether the loss-of-CGI model holds for other mammalian genomes. Furthermore, to the best of our knowledge, there has been no comprehensive analysis of CGIs and their density at the DNA sequence level in mammals.
There are three major algorithms for identifying CGIs in a genomic sequence. The original algorithm was proposed by Gardiner-Garden and Frommer in 1987; its three parameters are GC content >50%, ObsCpG/ExpCpG >0.60, and length >200 bp. This algorithm, often with some modifications, has been widely applied in the analysis of CGIs in single genes, small sets of genomic sequences, or single genomes. However, many repeats (for example, Alu), which are abundant in vertebrate genomes, also meet these criteria, so this algorithm has usually been used to scan for CGIs only in the non-repeat portions of a genome [2, 11, 12]. Second, Takai and Jones evaluated the three parameters in Gardiner-Garden and Frommer's algorithm using human gene data and suggested an optimal set of parameters (GC content ≥55%, ObsCpG/ExpCpG ≥0.65, and length ≥500 bp). This algorithm can effectively exclude false positive CGIs arising from repeats and is more likely to identify CGIs associated with the 5' end of human genes; it seems to be suitable for other genomes too. Third, more recently, Hackenberg et al. developed a new algorithm, CpGcluster, that depends entirely on the statistical significance of a CpG cluster relative to random sequences in the same chromosome. Because CpGcluster does not require a minimum length (for example, it identified CpG clusters as short as 8 bp), it likely identifies many more CGIs (for example, 197,727 in the human genome) than the other algorithms. In particular, CpGcluster may exaggerate the number of CGIs (that is, CpG clusters) in low GC-content chromosomes, which often have low gene density, because its CpG clusters are identified relative to the background (random) CpG property. Another similar CpG cluster algorithm identifies CpG clusters by requiring a minimum number of CpGs in each sequence fragment.
Since loss of CGIs is likely an evolutionary trend in at least some genomes [7, 9, 17], CpGcluster may be able to identify those CGIs that have undergone degradation and thus cannot meet the criteria of Takai and Jones' or Gardiner-Garden and Frommer's algorithms.
Our major aim is to survey extant CGIs (that is, CGIs that meet the three typical criteria: length, GC content, and ObsCpG/ExpCpG) and their distribution in today's genomes, rather than to identify regions that might originally have been CGIs but no longer meet these criteria. A comparative study of the features of such CGIs will be helpful for studying the evolution of CGIs and sequence composition changes in the course of genome evolution. Recent genome sequencing projects have released a number of mammalian genomes with good quality annotations, but only a few non-mammalian vertebrate genomes. Thus, in this study we focused on the analysis and comparison of CGIs and their correlations with genomic features in mammalian genomes. For this aim, it is appropriate to apply the same CGI detection algorithm to all genomes being compared. Based on the comparison of the three algorithms above, we selected Takai and Jones' algorithm as the primary algorithm in this study.
We conducted a systematic survey of CGIs in ten sequenced mammalian genomes: eight completely sequenced eutherian genomes (human (Homo sapiens), chimpanzee (Pan troglodytes), macaque (Macaca mulatta), mouse (Mus musculus), rat (Rattus norvegicus), dog (Canis familiaris), cow (Bos taurus), and horse (Equus caballus)); one completely sequenced metatherian genome (opossum (Monodelphis domestica)); and one prototherian genome (platypus (Ornithorhynchus anatinus)), which was sequenced at 6× coverage but has not been completely assembled. We also compared the observations from these mammals with those from seven non-mammalian vertebrates.
CGIs and CGI density in ten mammalian genomes
CpG islands and other genomic features in ten mammalian genomes
Number of chromosome pairs
Number of arms†
GC content (%)
Number of CGIs
CGI density (/Mb)
Average length (bp)
GC content (%)
Correlations between CGI density and other genomic features
CGI densities in chromosomes with different sizes in nine mammalian genomes
Chromosome size (Mb)
Number of chromosomes
CGI density/Mb ± SD
29.7 ± 17.7
24.0 ± 13.2
21.7 ± 11.3
14.7 ± 7.4
11.7 ± 4.6
9.7 ± 2.6
9.4 ± 3.6
16.4 ± 10.5
The dog has overall smaller chromosomes and high CGI density, while the opossum has a few large chromosomes and low CGI density. To check whether our correlation analysis was largely driven by these two species, we repeated the analysis after excluding the dog and opossum data. The same conclusions held. For example, we found a significant correlation between CGI density and number of chromosome pairs (r = 0.75, P = 0.026) and a significant correlation between CGI density and log10(chromosome size) (r = -0.49, P = 5.9 × 10⁻¹²).
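The per-chromosome correlation reported above can be illustrated with a short Python sketch. It computes CGI density (CGIs per Mb) for each chromosome and the Pearson correlation coefficient between that density and log10(chromosome size). The function names and the four example chromosomes are hypothetical and are not data from this study; the P value calculation is omitted.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def density_vs_log_size(chromosomes):
    """Correlate CGI density with log10(size).

    chromosomes: iterable of (size_in_mb, number_of_cgis) pairs.
    """
    chromosomes = list(chromosomes)
    log_sizes = [math.log10(size_mb) for size_mb, _ in chromosomes]
    densities = [n_cgis / size_mb for size_mb, n_cgis in chromosomes]
    return pearson_r(log_sizes, densities)


# Hypothetical chromosomes (size in Mb, CGI count); NOT data from the study.
example = [(250.0, 2500), (150.0, 2100), (100.0, 1900), (50.0, 1500)]
r = density_vs_log_size(example)
print(f"r = {r:.2f}")
```

In this toy input the smaller chromosomes carry more CGIs per Mb, so the coefficient is negative, which is the direction of the relationship described in the text.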
CGIs are considered gene markers, so they are expected to correlate strongly with gene density [2, 22]. It is therefore of interest to investigate whether the above correlations still hold when genic regions are excluded. We identified CGIs in the intergenic regions of nine mammalian genomes and found significant correlations between intergenic CGI density and log10(chromosome size) (r = -0.55, P = 7.3 × 10⁻¹⁹), GC content of the chromosome (r = 0.39, P = 8.6 × 10⁻¹⁰), and ObsCpG/ExpCpG (r = 0.67, P = 3.7 × 10⁻³⁰). Details are shown in Additional data file 3.
Correlation between CGI density and genomic features in different human genomic regions
Gene-associated CGIs (24,228)
Intergenic CGIs (13,026)
Intragenic CGIs (12,136)
TSS CGIs (11,192)
3.9 × 10⁻³
3.4 × 10⁻³
3.1 × 10⁻³
7.0 × 10⁻³
1.7 × 10⁻⁸
2.9 × 10⁻⁸
1.9 × 10⁻⁷
5.4 × 10⁻¹⁰
1.5 × 10⁻¹⁰
8.3 × 10⁻¹⁰
2.5 × 10⁻¹⁰
1.0 × 10⁻⁹
Summary of correlations between CGI density and genomic features
Shown in figure
TJ (9 genomes)
7.9 × 10⁻⁴
2.6 × 10⁻¹⁶
Chromosome GC content
3.5 × 10⁻²⁸
2.8 × 10⁻⁴¹
Genomic GC content
TJ (9 genomes, intergenic CGIs)
7.3 × 10⁻¹⁹
Chromosome GC content
8.6 × 10⁻¹⁰
3.7 × 10⁻³⁰
TJ (10 genomes)
2.6 × 10⁻³⁷
Chromosome GC content
3.7 × 10⁻²⁹
1.5 × 10⁻⁸¹
GF (9 genomes)
2.0 × 10⁻⁴
1.3 × 10⁻²⁵
Chromosome GC content
3.2 × 10⁻³⁷
2.4 × 10⁻⁵³
CpGcluster (9 genomes)
1.6 × 10⁻¹⁶
Chromosome GC content
5.5 × 10⁻²⁴
CGI density and recombination rate
Recombination rate correlates with both the number of chromosomes and the number of chromosome arms, and elevates GC content, probably via biased gene conversion [23, 24]. Fine-scale recombination rates vary extensively among populations [25, 26], among genomic regions, and between homologous regions of two closely related organisms (human and chimpanzee) [28, 29], suggesting rapid evolution of local recombination patterns. Many genomic features, including CpG dinucleotide frequencies (but not CGIs or CGI density) in genomic sequences, have been used to analyze the pattern of recombination rate. Here we specifically examined the relationship between CGI density and recombination rate at the genome level. We retrieved human recombination rate data (window size, 1 Mb; 2,772 windows) from the UCSC Genome Browser. We found a significant positive correlation between CGI density and recombination rate (r = 0.18, P = 1.1 × 10⁻²²).
Correlation between CGI density and recombination rate in human, mouse and rat
Window size (Mb)
1.1 × 10⁻²²
5.9 × 10⁻¹⁶
1.7 × 10⁻¹²
3.6 × 10⁻⁷
8.0 × 10⁻⁸
8.1 × 10⁻⁵
1.7 × 10⁻⁵
Comparison of CGIs in non-mammalian vertebrate genomes
To retrieve information on the CGIs in vertebrate genomes, we scanned CGIs in seven non-mammalian vertebrate genomes, including the chicken, lizard and five fish (tetraodon, medaka, zebrafish, stickleback and fugu) genomes. Except for lizard and fugu, all these genomes had assembled chromosomes.
CpG islands and other genomic features in non-mammalian genomes
Number of chromosome pairs
GC content (%)
Number of CGIs
CGI density (/Mb)
Average length (bp)
GC content (%)
CGI densities in the five fish genomes varied to a much greater extent than in the mammalian genomes. The CGI densities in tetraodon (161.6 per Mb) and stickleback (157.8 per Mb) were about 11 times that in zebrafish (14.7 per Mb). The ObsCpG/ExpCpG ratios in the fish genomes (0.479-0.662) were also much higher than those in the mammalian (0.129-0.296), chicken (0.248) and lizard (0.296) genomes. Fishes are cold-blooded vertebrates and lack GC-rich isochores. An early study found that certain fish did not have elevated GC content in nonmethylated CGIs, so our comparison of CGIs in fishes should be interpreted with caution.
Influence of CGI identification algorithms
There are three major algorithms for identifying CGIs in a genomic sequence (reviewed in the Background). The major aim of this study is to investigate and compare the CGIs in today's mammalian genomes, rather than to identify CGIs in mammalian ancestral sequences. Thus, our analysis may provide insights into how CGIs have evolved and into their association with gene function and other genomic factors. Since CGIs have been widely documented to be approximately 1 kb long [2, 6], Takai and Jones' stringent criteria seem to be the most appropriate for our analysis. To assess the reliability of our analysis, we performed similar analyses using Gardiner-Garden and Frommer's algorithm (only on the non-repeat portions of the genomes) and CpGcluster with the ten mammalian genomes and seven other vertebrate genomes under study. The conclusions were the same; see detailed results in Table 4 and Additional data files 6 and 7. For example, there was a significant positive correlation between CGI density and chromosome number using Gardiner-Garden and Frommer's algorithm (r = 0.92, P = 2.0 × 10⁻⁴; Additional data file 6) or CpGcluster (r = 0.81, P = 0.004; Additional data file 7).
However, we found that the number of CGIs identified by CpGcluster or Gardiner-Garden and Frommer's algorithm was markedly larger than that identified by Takai and Jones' algorithm (Additional data file 8); for example, the numbers of CGIs identified in the human genome were 37,531 (Takai and Jones), 76,678 (Gardiner-Garden and Frommer), and 197,727 (CpGcluster). The number of genes in mammalian genomes has been estimated to be approximately 20,000-30,000 (Additional data file 1). Since CGIs have been widely considered gene markers, both the Gardiner-Garden and Frommer algorithm and CpGcluster likely identified either many CGIs that are not associated with genes or multiple CGIs that share one gene. To address the latter case, we evaluated the length distribution of CGIs identified by the three algorithms. Across all these vertebrate genomes, the majority of CGIs identified by CpGcluster were shorter than 500 bp (Additional data file 8), which is the minimum length in Takai and Jones' algorithm. For example, the proportions of human CGIs identified by CpGcluster were 44.3% (<200 bp), 45.9% (200-500 bp), 7.3% (500-1,000 bp), 1.9% (1,000-1,500 bp), 0.4% (1,500-2,000 bp), and 0.2% (≥2,000 bp). For Gardiner-Garden and Frommer's algorithm, the proportion of CGIs shorter than 500 bp was also large, for example, 65.8% of the human CGIs and 64.8% of the opossum CGIs (Additional data file 8). Based on this evaluation, we consider our analysis using Takai and Jones' algorithm to be the most reliable and appropriate, though further evaluation of species-specific algorithms may enhance our results.
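A length distribution of the kind described above can be tabulated with a few lines of Python. The sketch below is illustrative only: the bin labels follow the size classes quoted in the text, but treating each bin as including its lower bound and excluding its upper bound is an assumption, and the example lengths are invented rather than taken from Additional data file 8.

```python
from collections import Counter

# (lower bound inclusive, upper bound exclusive or None, label).
# The half-open boundaries are an assumption made for this sketch.
BINS = [
    (0, 200, "<200 bp"),
    (200, 500, "200-500 bp"),
    (500, 1000, "500-1,000 bp"),
    (1000, 1500, "1,000-1,500 bp"),
    (1500, 2000, "1,500-2,000 bp"),
    (2000, None, ">=2,000 bp"),
]


def length_distribution(lengths):
    """Return the percentage of CGIs falling in each length bin."""
    lengths = list(lengths)
    if not lengths:
        return {}
    counts = Counter()
    for length in lengths:
        for low, high, label in BINS:
            if length >= low and (high is None or length < high):
                counts[label] += 1
                break
    total = len(lengths)
    return {label: 100.0 * counts[label] / total for _, _, label in BINS}


# Hypothetical CGI lengths in bp; NOT data from the study.
example_lengths = [8, 150, 200, 499, 500, 999, 1200, 1700, 2000, 2500]
distribution = length_distribution(example_lengths)
for label, percent in distribution.items():
    print(f"{label}: {percent:.1f}%")
```

Summing the first two bins gives the share of CGIs shorter than 500 bp, the quantity used in the text to compare the algorithms against the Takai and Jones minimum length.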
Evolution of CGIs
It was hypothesized that CGIs arose once at the dawn of vertebrate evolution and that vertebrate ancestral genes were embedded in entirely non-methylated DNA during the divergence of vertebrates. Genome-wide methylation has been found to be common in vertebrates (except for promoter-associated CGIs), whereas fractional methylation is common in invertebrates. The transition from fractional to global methylation likely occurred around the origin of vertebrates. Many CGIs might have lost their typical features through de novo methylation at their CpG sites and subsequent high deamination rates at the newly methylated CpG sites, leading to TpG and CpA dinucleotides. An excess of TpGs and CpAs, as well as other vanishing CGI features (decreasing length, ObsCpG/ExpCpG ratio and GC content), has been found in homologous gene regions, providing evidence of frequent CGI losses in mouse and human genes and a faster loss rate in mice [7, 9, 17]. Recent methylation studies revealed that weak CGIs in promoter regions (promoters with intermediate CpG content, ICPs), most of which were not found in the CGI library, had a faster loss rate of CpGs than stronger CGIs (promoters with high CpG content, HCPs), suggesting that strong CGIs might be protected from methylation and are thus better conserved during evolution [22, 37, 38]. Using the data in Weber et al. and Mikkelsen et al., we found that HCP density has stronger correlations with genomic features than ICP density in both the human and mouse genomes. The CGIs identified by the Takai-Jones algorithm are different from HCPs or ICPs. However, when we separated the promoter-associated CGIs identified by the Takai-Jones algorithm into HCGIs (those that satisfied the HCP criteria) and non-HCGIs, we also found that HCGIs had stronger correlations with genomic features than non-HCGIs. This supports the observations from the methylation studies mentioned above.
Although loss of CGIs is likely a major evolutionary scenario in mammals, little comparative analysis at the DNA sequence level has been performed, because CGIs have been thought to be poorly conserved between species [7, 9]. Our CGI analysis indicated that rodents have the lowest CGI density and most other eutherians have moderate CGI density when compared to platypus (Table 1). Platypus is one of only three extant monotremes and has a fascinating mixture of features typical of mammals and of reptiles and birds. Monotremes (mammalian subclass Prototheria) are the oldest branch of the mammalian tree, having diverged from the therian mammals 210 million years ago. Although the platypus genome is incomplete, its higher CGI density is likely genuine because high frequencies of GC and CG dinucleotides and high GC content have been reported. Further, our analysis of the genomic sequences of the chicken (a bird) and the green anole lizard (the only reptilian genome available at present) showed higher CGI density than in most of the therians we examined (the dog being the exception). These data support an overall decrease in CGIs in mammalian genomes.
Below we discuss specific CGI features of a few species. The low number of CGIs in rodent genomes is likely due to a much higher rate of CGI loss and a weaker selective constraint in the rodent lineage [7, 17]. Interestingly, the dog has a notably large number of CGIs and a high CGI density among the nine therians investigated. Our further analysis revealed that the difference is due to substantial enrichment of CGIs in the dog's intergenic and intronic regions, while the number of CGIs associated with the 5' end of genes is similar to that in the human and the mouse (data not shown). Whether and how CGIs have accumulated in the dog requires further investigation. It is also worth noting that the opossum, a metatherian, represents another evolutionarily ancient lineage of mammals. Its CGI density is very low (7.5 per Mb). This is likely attributable to its large chromosomes (Table 1), as large chromosomes are correlated with low CGI density (Figure 1). Large chromosomes reduce recombination rate, which has a positive correlation with CGI density (Figure 2).
Other possible factors that might influence CGI density
This study represents a systematic comparative genomic analysis of CGIs and CGI density at the DNA sequence level in mammals. It reveals significant correlations between CGI density and genomic features such as number of chromosome pairs, chromosome size, and recombination rate. Our results suggest a genome evolution scenario in which an increase in chromosome number increases the rate of recombination, which in turn elevates GC content to help prevent loss of CGIs and maintain CGI density. We compared CGI features in other non-mammalian vertebrates and discussed other factors such as body temperature and lifespan that have previously been speculated to influence sequence composition evolution.
Materials and methods
Genome sequences and genome information
Names and sequence information of ten mammals and other vertebrates
Green anole lizard‡
Identification of CpG islands
We used three algorithms to identify CGIs. First, we used the stringent search criteria in the Takai and Jones algorithm: GC content ≥55%, ObsCpG/ExpCpG ≥0.65, and length ≥500 bp. Second, we used the algorithm originally developed by Gardiner-Garden and Frommer: GC content >50%, ObsCpG/ExpCpG >0.60, and length >200 bp. Because some repeats (for example, Alu) meet these criteria, we scanned for CGIs only in the non-repeat portions of these genomes, as was done in other genome-wide identification studies [2, 11]. For both the Takai and Jones and the Gardiner-Garden and Frommer algorithms, we used the publicly available CpG island searcher program (CpGi130). Third, we used CpGcluster, developed by Hackenberg et al., to scan for CGIs in the whole genome.
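To make the two threshold sets concrete, the following Python sketch checks whether a single candidate sequence satisfies each of them. It is not the CpGi130 program and does not perform the window scanning or merging that a real CGI search involves. The expected CpG count is taken as (number of C × number of G) / sequence length, the definition commonly attributed to Gardiner-Garden and Frommer, which is stated here as an assumption; all names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criteria:
    """Thresholds for one CGI definition (field names are illustrative)."""
    min_gc: float        # GC content as a fraction
    min_obs_exp: float   # ObsCpG/ExpCpG ratio
    min_length: int      # length in bp
    strict: bool         # True: compare with '>', False: compare with '>='


# Thresholds as quoted in the text.
GARDINER_GARDEN_FROMMER = Criteria(0.50, 0.60, 200, strict=True)
TAKAI_JONES = Criteria(0.55, 0.65, 500, strict=False)


def gc_content(seq: str) -> float:
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def obs_exp_cpg(seq: str) -> float:
    """ObsCpG/ExpCpG, with ExpCpG = (#C * #G) / length (assumed formula)."""
    seq = seq.upper()
    n_c, n_g = seq.count("C"), seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    observed = seq.count("CG")
    expected = n_c * n_g / len(seq)
    return observed / expected


def meets_criteria(seq: str, criteria: Criteria) -> bool:
    def passes(value, threshold):
        return value > threshold if criteria.strict else value >= threshold

    return (
        passes(len(seq), criteria.min_length)
        and passes(gc_content(seq), criteria.min_gc)
        and passes(obs_exp_cpg(seq), criteria.min_obs_exp)
    )


# Artificial test sequences, chosen only to exercise the thresholds.
short_island = "CG" * 150   # 300 bp, all CpG
long_island = "CG" * 250    # 500 bp, all CpG
at_rich = "AT" * 300        # 600 bp, no C or G

print(meets_criteria(short_island, GARDINER_GARDEN_FROMMER))
print(meets_criteria(short_island, TAKAI_JONES))
```

The 300 bp sequence passes the Gardiner-Garden and Frommer thresholds but fails the Takai and Jones length requirement, which mirrors why the stricter criteria return fewer, longer CGIs.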
We used the method of Jiang and Zhao to identify CGIs in different genomic regions (genes, intergenic regions, intragenic regions, and TSS regions). Briefly, we compared the locations of CGIs with the coordinates of genic, intergenic, and intragenic regions and TSSs based on the human gene annotation information from the NCBI database (build 35.1) [44, 49]. CGIs that overlapped with any gene were classified as gene-associated CGIs; CGIs whose whole sequences were in intergenic regions were classified as intergenic CGIs; CGIs whose sequences were in gene regions were classified as intragenic CGIs; and CGIs that overlapped with TSSs were classified as TSS CGIs.
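The classification rules above can be sketched as a simple interval comparison. This Python example is a reading of those rules rather than the published method: it assumes 0-based, half-open coordinates on a single chromosome, takes the TSS to be the first base of a gene on its own strand, and interprets "intragenic" as a CGI lying entirely within one gene. All names and coordinates are hypothetical.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    start: int    # 0-based, inclusive (assumed convention)
    end: int      # exclusive
    strand: str   # "+" or "-"


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def tss_position(gene: Gene) -> int:
    """First transcribed base: gene start on '+', last base on '-'."""
    return gene.start if gene.strand == "+" else gene.end - 1


def classify_cgi(cgi_start, cgi_end, genes):
    """Return the set of region labels for one CGI.

    Labels are not mutually exclusive: a CGI covering a TSS is also
    gene-associated. A CGI that overlaps no gene is intergenic.
    """
    labels = set()
    for gene in genes:
        if not overlaps(cgi_start, cgi_end, gene.start, gene.end):
            continue
        labels.add("gene-associated")
        if gene.start <= cgi_start and cgi_end <= gene.end:
            labels.add("intragenic")
        if cgi_start <= tss_position(gene) < cgi_end:
            labels.add("TSS")
    if not labels:
        labels.add("intergenic")
    return labels


# Hypothetical annotation; NOT taken from the NCBI build used in the study.
genes = [Gene(1000, 5000, "+"), Gene(8000, 9000, "-")]
print(classify_cgi(900, 1500, genes))
print(classify_cgi(6000, 6600, genes))
```

Returning a set rather than a single label reflects that the classes described in the text overlap, for example TSS CGIs are a subset of gene-associated CGIs.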
Recombination rate and CGI density
We retrieved human recombination rate data based on the deCODE genetic map from the UCSC Genome Browser. The recombination rates were measured in 1 Mb windows. We obtained another set of recombination rates from Jensen-Seaman et al. These data were measured in 5 Mb and 10 Mb windows for the human, mouse and rat and are available in the supplementary material of that study. For both datasets, we discarded regions in which more than 50% of bases were 'N's. We also discarded regions whose recombination rates were 0 because too few genetic markers were found in these regions.
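The two filtering rules above (drop windows with more than 50% 'N's and windows with a recombination rate of 0) can be expressed as a small Python sketch. The `Window` record and its field names are hypothetical and do not correspond to any UCSC or Jensen-Seaman file format; the toy windows are far shorter than the 1 Mb, 5 Mb, or 10 Mb windows used in the study.

```python
from typing import NamedTuple


class Window(NamedTuple):
    """One genomic window (field names are illustrative)."""
    chrom: str
    start: int
    sequence: str        # window sequence, 'N' marks unresolved bases
    recomb_rate: float   # recombination rate in cM/Mb
    n_cgis: int          # number of CGIs assigned to this window


def n_fraction(sequence: str) -> float:
    """Fraction of 'N' bases; an empty sequence counts as all 'N'."""
    if not sequence:
        return 1.0
    return sequence.upper().count("N") / len(sequence)


def keep_window(window: Window, max_n_fraction: float = 0.5) -> bool:
    """Keep windows with at most 50% 'N's and a non-zero rate."""
    return n_fraction(window.sequence) <= max_n_fraction and window.recomb_rate > 0


def cgi_density_per_mb(window: Window) -> float:
    return window.n_cgis / (len(window.sequence) / 1_000_000)


def rate_density_pairs(windows):
    """(recombination rate, CGI density) pairs for retained windows."""
    return [(w.recomb_rate, cgi_density_per_mb(w)) for w in windows if keep_window(w)]


# Toy windows of 20 bp each; NOT data from the study.
windows = [
    Window("chr1", 0, "ACGT" * 5, 1.2, 3),            # kept
    Window("chr1", 20, "N" * 11 + "A" * 9, 0.8, 1),   # dropped: 55% N
    Window("chr1", 40, "N" * 10 + "A" * 10, 0.8, 1),  # kept: exactly 50% N
    Window("chr1", 60, "ACGT" * 5, 0.0, 2),           # dropped: rate is 0
]
pairs = rate_density_pairs(windows)
print(len(pairs), "of", len(windows), "windows retained")
```

The retained pairs are what would then be passed to a correlation test between recombination rate and CGI density.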
Body temperature and lifespan in mammals
Records of body temperature for a species may vary to some extent in the literature because they might have been measured under different conditions (for example, time of day, season, or geographical location) or at different sites of the body. The body temperatures of the ten mammals in this study were obtained from the literature (details are shown in Additional data file 9). When a range of body temperatures was reported for a species, the average was used as the representative temperature. There are several measurements of lifespan, such as maximum lifespan, average lifespan, and lifespan of each sex. We used maximum lifespan, based on reports in the literature and on the AnAge database (Additional data file 9).
Additional data files
The following additional data are available. Additional data file 1 is a table that lists the numbers of genes estimated in mammalian genomes. Additional data file 2 shows the correlations between CGI density and genomic features in ten mammalian genomes (including platypus). Additional data file 3 shows the correlations between intergenic CGI density and genomic features in nine mammalian genomes. Additional data file 4 shows the correlations between CGI density and average recombination rate (cM/Mb) in the human, mouse and rat genomes. Additional data file 5 provides the comparison of CpG islands and other genomic features between mammalian and non-mammalian genomes. Additional data file 6 shows the correlations between CGI density and genomic features in mammalian genomes using the Gardiner-Garden and Frommer algorithm in the non-repeat portions of genomes. Additional data file 7 shows the correlations between CGI density and genomic features in mammalian genomes using the CpGcluster algorithm. Additional data file 8 lists the numbers of CGIs in each genome identified by the three algorithms and shows their length distribution. Additional data file 9 lists the body temperature and lifespan for each species.
HCGI: CGI satisfying the HCP criteria
HCP: high CpG content promoter
ICP: intermediate CpG content promoter
TSS: transcriptional start site.
We thank the two anonymous reviewers for valuable comments. We are grateful to Dr John Speakman for suggestions on estimating lifespan and body temperature. This project was supported by the Thomas F and Kate Miller Jeffress Memorial Trust Fund and a NARSAD Young Investigator Award to Z Zhao and NIH grants to WH Li.
- Bird AP: CpG-rich islands and the function of DNA methylation. Nature. 1986, 321: 209-213. 10.1038/321209a0.PubMedView ArticleGoogle Scholar
- Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, et al: Initial sequencing and analysis of the human genome. Nature. 2001, 409: 860-921. 10.1038/35057062.PubMedView ArticleGoogle Scholar
- Zhao Z, Zhang F: Sequence context analysis in the mouse genome: Single nucleotide polymorphisms and CpG island sequences. Genomics. 2006, 87: 68-74. 10.1016/j.ygeno.2005.09.012.PubMedView ArticleGoogle Scholar
- Zhao Z, Zhang F: Sequence context analysis of 8.2 million single nucleotide polymorphisms in the human genome. Gene. 2006, 366: 316-324. 10.1016/j.gene.2005.08.024.PubMedView ArticleGoogle Scholar
- Bird AP: DNA methylation and the frequency of CpG in animal DNA. Nucleic Acids Res. 1980, 8: 1499-1504. 10.1093/nar/8.7.1499.PubMedPubMed CentralView ArticleGoogle Scholar
- Antequera F: Structure, function and evolution of CpG island promoters. Cell Mol Life Sci. 2003, 60: 1647-1658. 10.1007/s00018-003-3088-6.PubMedView ArticleGoogle Scholar
- Jiang C, Han L, Su B, Li WH, Zhao Z: Features and trend of loss of promoter-associated CpG islands in the human and mouse genomes. Mol Biol Evol. 2007, 24: 1991-2000. 10.1093/molbev/msm128.PubMedView ArticleGoogle Scholar
- Bird AP: CpG islands as gene markers in the vertebrate nucleus. Trends Genet. 1987, 3: 342-347. 10.1016/0168-9525(87)90294-0.View ArticleGoogle Scholar
- Antequera F, Bird A: Number of CpG islands and genes in human and mouse. Proc Natl Acad Sci USA. 1993, 90: 11995-11999. 10.1073/pnas.90.24.11995.PubMedPubMed CentralView ArticleGoogle Scholar
- Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, Gocayne JD, Amanatides P, Ballew RM, Huson DH, Wortman JR, Zhang Q, Kodira CD, Zheng XH, Chen L, Skupski M, Subramanian G, Thomas PD, Zhang J, Gabor Miklos GL, Nelson C, Broder S, Clark AG, Nadeau J, McKusick VA, Zinder N, et al: The sequence of the human genome. Science. 2001, 291: 1304-1351. 10.1126/science.1058040.PubMedView ArticleGoogle Scholar
- Mouse Genome Sequencing Consortium, Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, Antonarakis SE, Attwood J, Baertsch R, Bailey J, Barlow K, Beck S, Berry E, Birren B, Bloom T, Bork P, Botcherby M, Bray N, Brent MR, Brown DG, Brown SD, Bult C, Burton J, Butler J, Campbell RD, Carninci P, et al: Initial sequencing and comparative analysis of the mouse genome. Nature. 2002, 420: 520-562. 10.1038/nature01262.View ArticleGoogle Scholar
- Gibbs RA, Weinstock GM, Metzker ML, Muzny DM, Sodergren EJ, Scherer S, Scott G, Steffen D, Worley KC, Burch PE, Okwuonu G, Hines S, Lewis L, DeRamo C, Delgado O, Dugan-Rocha S, Miner G, Morgan M, Hawes A, Gill R, Celera , Holt RA, Adams MD, Amanatides PG, Baden-Tillson H, Barnstead M, Chin S, Evans CA, Ferriera S, Fosler C, et al: Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature. 2004, 428: 493-521. 10.1038/nature02426.PubMedView ArticleGoogle Scholar
- Gardiner-Garden M, Frommer M: CpG islands in vertebrate genomes. J Mol Biol. 1987, 196: 261-282. 10.1016/0022-2836(87)90689-9.PubMedView ArticleGoogle Scholar
- Takai D, Jones PA: Comprehensive analysis of CpG islands in human chromosomes 21 and 22. Proc Natl Acad Sci USA. 2002, 99: 3740-3745. 10.1073/pnas.052410099.PubMedPubMed CentralView ArticleGoogle Scholar
- Hackenberg M, Previti C, Luque-Escamilla PL, Carpena P, Martinez-Aroza J, Oliver JL: CpGcluster: a distance-based algorithm for CpG-island detection. BMC Bioinformatics. 2006, 7: 446-10.1186/1471-2105-7-446.PubMedPubMed CentralView ArticleGoogle Scholar
- Glass JL, Thompson RF, Khulan B, Figueroa ME, Olivier EN, Oakley EJ, Zant GV, Bouhassira EE, Melnick A, Golden A, Fazzari MJ, Greally JM: CG dinucleotide clustering is a species-specific property of the genome. Nucleic Acids Res. 2007, 35: 6798-6807. 10.1093/nar/gkm489.PubMedPubMed CentralView ArticleGoogle Scholar
- Matsuo K, Clay O, Takahashi T, Silke J, Schaffner W: Evidence for erosion of mouse CpG islands during mammalian evolution. Somat Cell Mol Genet. 1993, 19: 543-555. 10.1007/BF01233381.PubMedView ArticleGoogle Scholar
- Lindblad-Toh K, Wade CM, Mikkelsen TS, Karlsson EK, Jaffe DB, Kamal M, Clamp M, Chang JL, Kulbokas EJ, Zody MC, Mauceli E, Xie X, Breen M, Wayne RK, Ostrander EA, Ponting CP, Galibert F, Smith DR, DeJong PJ, Kirkness E, Alvarez P, Biagi T, Brockman W, Butler J, Chin CW, Cook A, Cuff J, Daly MJ, DeCaprio D, Gnerre S, et al: Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature. 2005, 438: 803-819. 10.1038/nature04338.PubMedView ArticleGoogle Scholar
- International Human Genome Sequencing Consortium: Finishing the euchromatic sequence of the human genome. Nature. 2004, 431: 931-945. 10.1038/nature03001.View ArticleGoogle Scholar
- Grutzner F, Graves JA: A platypus' eye view of the mammalian genome. Curr Opin Genet Dev. 2004, 14: 642-649. 10.1016/j.gde.2004.09.006.PubMedView ArticleGoogle Scholar
- McQueen HA, Fantes J, Cross SH, Clark VH, Archibald AL, Bird AP: CpG islands of chicken are concentrated on microchromosomes. Nat Genet. 1996, 12: 321-324. 10.1038/ng0396-321.PubMedView ArticleGoogle Scholar
- Illingworth R, Kerr A, Desousa D, Jorgensen H, Ellis P, Stalker J, Jackson D, Clee C, Plumb R, Rogers J, Humphray S, Cox T, Langford C, Bird A: A novel CpG island set identifies tissue-specific methylation at developmental gene loci. PLoS Biol. 2008, 6: e22-10.1371/journal.pbio.0060022.PubMedPubMed CentralView ArticleGoogle Scholar
- Pardo-Manuel de Villena F, Sapienza C: Recombination is proportional to the number of chromosome arms in mammals. Mamm Genome. 2001, 12: 318-322. 10.1007/s003350020005.PubMedView ArticleGoogle Scholar
- Meunier J, Duret L: Recombination drives the evolution of GC-content in the human genome. Mol Biol Evol. 2004, 21: 984-990. 10.1093/molbev/msh070.PubMedView ArticleGoogle Scholar
- Evans DM, Cardon LR: A comparison of linkage disequilibrium patterns and estimated population recombination rates across multiple populations. Am J Hum Genet. 2005, 76: 681-687. 10.1086/429274.PubMedPubMed CentralView ArticleGoogle Scholar
- McVean GAT, Myers SR, Hunt S, Deloukas P, Bentley DR, Donnelly P: The fine-scale structure of recombination rate variation in the human genome. Science. 2004, 304: 581-584. 10.1126/science.1092500.PubMedView ArticleGoogle Scholar
- Myers S, Bottolo L, Freeman C, McVean G, Donnelly P: A fine-scale map of recombination rates and hotspots across the human genome. Science. 2005, 310: 321-324. 10.1126/science.1117196.PubMedView ArticleGoogle Scholar
- Ptak SE, Hinds DA, Koehler K, Nickel B, Patil N, Ballinger DG, Przeworski M, Frazer KA, Pääbo S: Fine-scale recombination patterns differ between chimpanzees and humans. Nat Genet. 2005, 37: 429-434. 10.1038/ng1529.PubMedView ArticleGoogle Scholar
- Winckler W, Myers SR, Richter DJ, Onofrio RC, McDonald GJ, Bontrop RE, McVean GA, Gabriel SB, Reich D, Donnelly P, Altshuler D: Comparison of fine-scale recombination rates in humans and chimpanzees. Science. 2005, 308: 107-111. 10.1126/science.1105322.PubMedView ArticleGoogle Scholar
- UCSC Genome Browser. [http://genome.ucsc.edu/]
- Jensen-Seaman MI, Furey TS, Payseur BA, Lu Y, Roskin KM, Chen C-F, Thomas MA, Haussler D, Jacob HJ: Comparative recombination rates in the rat, mouse, and human genomes. Genome Res. 2004, 14: 528-538. 10.1101/gr.1970304.
- McQueen HA, Siriaco G, Bird AP: Chicken microchromosomes are hyperacetylated, early replicating, and gene rich. Genome Res. 1998, 8: 621-630.
- International Chicken Genome Sequencing Consortium: Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature. 2004, 432: 695-716. 10.1038/nature03154.
- Bernardi G, Bernardi G: Compositional transitions in the nuclear genomes of cold-blooded vertebrates. J Mol Evol. 1990, 31: 282-293. 10.1007/BF02101123.
- Cross S, Kovarik P, Schmidtke J, Bird A: Non-methylated islands in fish genomes are GC-poor. Nucleic Acids Res. 1991, 19: 1469-1474. 10.1093/nar/19.7.1469.
- Tweedie S, Charlton J, Clark V, Bird A: Methylation of genomes and genes at the invertebrate-vertebrate boundary. Mol Cell Biol. 1997, 17: 1469-1475.
- Weber M, Hellmann I, Stadler MB, Ramos L, Pääbo S, Rebhan M, Schübeler D: Distribution, silencing potential and evolutionary impact of promoter DNA methylation in the human genome. Nat Genet. 2007, 39: 457-466. 10.1038/ng1990.
- Mikkelsen TS, Ku M, Jaffe DB, Issac B, Lieberman E, Giannoukos G, Alvarez P, Brockman W, Kim TK, Koche RP, Lee W, Mendenhall E, O'Donovan A, Presser A, Russ C, Xie X, Meissner A, Wernig M, Jaenisch R, Nusbaum C, Lander ES, Bernstein BE: Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature. 2007, 448: 553-560. 10.1038/nature06008.
- Varriale A, Bernardi G: DNA methylation and body temperature in fishes. Gene. 2006, 385: 111-121. 10.1016/j.gene.2006.05.031.
- Bernardi G: The neoselectionist theory of genome evolution. Proc Natl Acad Sci USA. 2007, 104: 8385-8390. 10.1073/pnas.0701652104.
- Eskes T, Haanen C: Why do women live longer than men? Eur J Obstet Gynecol Reprod Biol. 2007, 133: 126-133. 10.1016/j.ejogrb.2007.01.006.
- Brown-Borg HM: Hormonal regulation of aging and life span. Trends Endocrinol Metab. 2003, 14: 151-153. 10.1016/S1043-2760(03)00051-1.
- Brown-Borg HM: Hormonal regulation of longevity in mammals. Ageing Res Rev. 2007, 6: 28-45. 10.1016/j.arr.2007.02.005.
- NCBI RefSeq Database. [ftp://ftp.ncbi.nih.gov/genomes/]
- Olson SA: EMBOSS opens up sequence analysis. European Molecular Biology Open Software Suite. Brief Bioinform. 2002, 3: 87-91. 10.1093/bib/3.1.87.
- Ensembl. [http://www.ensembl.org/]
- CpG Island Searcher Program. [http://cpgislands.usc.edu/]
- Jiang C, Zhao Z: Mutational spectrum in the recent human genome inferred by single nucleotide polymorphisms. Genomics. 2006, 88: 527-534. 10.1016/j.ygeno.2006.06.003.
- Zhao Z, Jiang C: Methylation-dependent transition rates are dependent on local sequence lengths and genomic regions. Mol Biol Evol. 2007, 24: 23-25. 10.1093/molbev/msl156.
- Kong A, Gudbjartsson DF, Sainz J, Jonsdottir GM, Gudjonsson SA, Richardsson B, Sigurdardottir S, Barnard J, Hallbeck B, Masson G, Shlien A, Palsson ST, Frigge ML, Thorgeirsson TE, Gulcher JR, Stefansson K: A high-resolution recombination map of the human genome. Nat Genet. 2002, 31: 241-247.
- AnAge Database. [http://genomics.senescence.info/species/]
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.