Open Access

Universality in large-scale structure of complete genomes

  • Li-Ching Hsieh1, 2,
  • Ta-Yuan Chen1,
  • Chang-Heng Chang1,
  • Wen-Lang Fan1 and
  • Hoong-Chien Lee1, 2, 3
Genome Biology 2004, 5:P7

Received: 26 January 2004

Published: 28 January 2004


The abundance of duplications in genomes, in the form of paralogs, pseudogenes and a variety of repeats, suggests that genomes may have used duplication as one mode of growth. However, systematic knowledge of all possible duplications in whole genomes is still lacking. This paper reports the results of a detailed study of the occurrence frequencies of short oligonucleotides in all extant complete genomes. We found a systematic pattern of repeats of short oligonucleotides that places all the complete genomes except Plasmodium in a single universality class expressed by an extremely simple formula. Our analysis of the data, combined with computer simulation of genome growth models, suggests a simple coarse-grained representation of genome growth: the ancestors of the genomes began to grow when they were no longer than 300 b, via a mechanism whose main components were neutral stochastic segmental replicative translocations and random small mutations.
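
The two ingredients named above, counting occurrence frequencies of short oligonucleotides and growing a short ancestral sequence by segmental duplication plus point mutation, can be illustrated with a minimal Python sketch. This is not the authors' simulation code: the function names (`kmer_counts`, `grow_genome`) and the parameters (`max_seg`, `mut_prob`) are hypothetical, and the rules for choosing segment length, insertion site and mutation frequency are assumptions made only for illustration.

```python
import random
from collections import Counter


def kmer_counts(seq, k):
    """Count occurrences of every overlapping k-letter oligonucleotide in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def grow_genome(seed, target_len, max_seg=50, mut_prob=0.01, rng=None):
    """Toy growth model (illustrative assumption, not the paper's model).

    Repeatedly copies a randomly chosen segment and inserts the copy at a
    random position, occasionally applying a single-base substitution,
    until the sequence reaches target_len.
    """
    rng = rng or random.Random(0)
    genome = list(seed)
    while len(genome) < target_len:
        # Pick a segment of random length (assumed uniform in 1..max_seg).
        seg_len = rng.randint(1, min(max_seg, len(genome)))
        start = rng.randrange(len(genome) - seg_len + 1)
        segment = genome[start:start + seg_len]
        # Insert the copy at a random site (replicative translocation).
        insert_at = rng.randrange(len(genome) + 1)
        genome[insert_at:insert_at] = segment
        # With probability mut_prob, overwrite one random base.
        if rng.random() < mut_prob:
            genome[rng.randrange(len(genome))] = rng.choice("ACGT")
    # Trim any overshoot so the result has exactly target_len bases.
    return "".join(genome[:target_len])


if __name__ == "__main__":
    rng = random.Random(1)
    # A random 300-base seed, matching the ancestral length quoted above.
    seed = "".join(rng.choice("ACGT") for _ in range(300))
    genome = grow_genome(seed, 20000, rng=rng)
    counts = kmer_counts(genome, 6)
    # Histogram: how many distinct 6-mers occur exactly n times.
    spectrum = Counter(counts.values())
    for n in sorted(spectrum)[:10]:
        print(n, spectrum[n])
```

Comparing the printed spectrum with that of a random sequence of the same length and base composition is one way to see how duplication broadens the distribution of oligonucleotide frequencies.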