Predicting effective genome size from marker gene density. (a) Gene counts for various functional classes  and their relationship with genome size. Although counts of genes belonging to the categories T (signal transduction mechanisms), K (transcription) and J (translation, ribosome structure, and biogenesis) scale (to a greater or lesser extent) with genome size, the set of 35 universal, single-copy genes used in this study does not. (b) Calibration plane used to identify the relationship between marker gene density, read length, and genome size. The calibration was based on a simulated shotgun dataset of randomly extracted 'reads' from the sequenced genomes (see Materials and methods), because insufficient raw shotgun sequence data are currently available in the trace archives to allow a robust calibration based on 'real' data. Circles represent shotgun datasets. Circle fill color indicates the goodness of fit to the plane (blue = <1 standard deviation [SD], green = <2 SD, yellow = <3 SD, red = >3 SD). Circle border indicates position relative to plane (blue = above, red = below). OG, orthologous group.