Table 1 Erroneously duplicated sequences in vertebrate genomes

                             Gallus gallus (chicken)   Pan troglodytes (chimpanzee)   Bos taurus (cow)   Canis familiaris (dog)
Assembled genome size        1.00 Gb                   2.89 Gb                        2.57 Gb            2.33 Gb
DCCs                         4,418 (7.6 Mb)            5,467 (8.0 Mb)                 1,297 (3.71 Mb)    80 (170 Kb)
Mis-assembled DCCs           2,303 (3.61 Mb)           2,298 (2.97 Mb)                394 (1.18 Mb)      2 (1.8 Kb)
DOCs                         5,947 (11.2 Mb)           13,571 (14.1 Mb)               1,366 (1.88 Mb)    22 (34.7 Kb)
Mis-assembled DOCs           5,698 (10.8 Mb)           13,159 (13.7 Mb)               1,094 (1.09 Mb)    8 (7.9 Kb)
Total mis-assembled contigs  8,001 (14.4 Mb)           15,457 (16.7 Mb)               1,488 (2.27 Mb)    10 (9.7 Kb)
  1. Genome sizes were determined by summing the lengths of all contigs and linked gaps in each assembly. Duplicated contained contigs (DCCs) are contigs that aligned to nearby sequence and were completely contained within another contig, as shown in Figure 1b. Mis-assembled DCCs are the subset of DCCs that mate-pair evidence identified as erroneous duplications (assembly errors). Duplicated overlapping contigs (DOCs) are pairs of nearby contigs that overlap at their ends; mis-assembled DOCs are the subset whose mate pairs were more consistent when the overlapping contigs were merged. Contigs not designated as mis-assembled either had consistent mate pairs in their original location or lacked sufficient mate-pair data to make a determination. This analysis used the UMD 1.6 version of the Bos taurus genome; based on these results, erroneous duplications were removed from the published UMD 2.0 assembly.
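
The "Total mis-assembled contigs" row appears to be the sum of the mis-assembled DCC and DOC rows. The short Python sketch below checks that arithmetic and expresses the total as a fraction of each assembled genome. The values are transcribed from Table 1; the dictionary layout, the function names, the rounding tolerance, and the conversion 1 Gb = 1,000 Mb are assumptions made for illustration and are not part of the original analysis.

```python
# Values transcribed from Table 1 as (contig count, total length in Mb).
# Genome sizes are converted with the assumption 1 Gb = 1,000 Mb.
TABLE = {
    "chicken": {"genome_mb": 1000.0, "dcc": (2303, 3.61),
                "doc": (5698, 10.8), "total": (8001, 14.4)},
    "chimpanzee": {"genome_mb": 2890.0, "dcc": (2298, 2.97),
                   "doc": (13159, 13.7), "total": (15457, 16.7)},
    "cow": {"genome_mb": 2570.0, "dcc": (394, 1.18),
            "doc": (1094, 1.09), "total": (1488, 2.27)},
    "dog": {"genome_mb": 2330.0, "dcc": (2, 0.0018),
            "doc": (8, 0.0079), "total": (10, 0.0097)},
}


def totals_consistent(row, tol_mb=0.05):
    """Return True if mis-assembled DCCs + DOCs match the reported total.

    Counts must match exactly; lengths may differ by tol_mb because the
    table reports rounded values (tolerance chosen for illustration).
    """
    count = row["dcc"][0] + row["doc"][0]
    length = row["dcc"][1] + row["doc"][1]
    return count == row["total"][0] and abs(length - row["total"][1]) <= tol_mb


def misassembled_fraction(row):
    """Fraction of the assembled genome in mis-assembled contigs."""
    return row["total"][1] / row["genome_mb"]


if __name__ == "__main__":
    for species, row in TABLE.items():
        print(f"{species}: consistent={totals_consistent(row)}, "
              f"fraction={misassembled_fraction(row):.6%}")
```

Under these assumptions, the fraction is largest for the chicken assembly and smallest for the dog assembly, in line with the ordering of the raw totals in the table.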