Number of domain classes versus genome size. (a) Plot of empirical data for 327 bacterial, 75 eukaryotic, and 27 archaeal genomes. Data refer to superfamily domain classes from the SUPERFAMILY database. Larger data points indicate specific examples. Data on SCOP folds follow the same trend (section A2 in Additional data file 1). (b) Comparison of data on prokaryotes (red circles) with simulations of 500 realizations of different variants of the model (yellow, grey, and green shaded areas in the different panels), for fixed parameter values. Data on archaea are shown as squares. The variant with α = 0 (left panel, log-linear scale) gives a trend that is more compatible with the observed scaling than the variant with α > 0 (middle panel). However, the empirical distribution of folds in classes agrees quantitatively better with α > 0 (Table 1 and Figure 2). The model that breaks the symmetry between domain classes and includes specific selection of domain classes (right panel) predicts a saturation of this curve even for high values of α, resolving this quantitative conflict. (c) Usage profile of SUPERFAMILY domain classes in prokaryotes, used to generate the cost function in the model with specificity. On the x-axis, domain classes are ordered by the fraction of genomes in which they occur. The y-axis reports their occurrence fraction. The red lines indicate occurrence in all or none of the prokaryotic genomes of the data set.
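As an illustration of how a usage profile like the one in panel (c) could be computed, the following minimal sketch takes a presence/absence matrix (genomes by domain classes) and returns the occurrence fractions in sorted order. The function name `usage_profile`, the toy matrix, and the choice of ascending order are assumptions made for this example; they are not taken from the original analysis.

```python
import numpy as np

def usage_profile(presence):
    """Return (order, fractions) for a presence/absence matrix.

    presence: 2D array-like, rows = genomes, columns = domain classes,
              truthy where the class occurs in the genome (hypothetical input).
    """
    presence = np.asarray(presence, dtype=bool)
    # Fraction of genomes in which each domain class occurs.
    fractions = presence.mean(axis=0)
    # Order classes by occurrence fraction (ascending is an assumption here).
    order = np.argsort(fractions, kind="stable")
    return order, fractions[order]

# Toy data: 3 genomes, 4 domain classes (made up for illustration).
toy = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
]
order, profile = usage_profile(toy)
```

Plotting `profile` against its index would give a curve of the same kind as panel (c), with values of 0 and 1 corresponding to classes found in none or all of the genomes.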