Table 1 Data collection description: summary of the data sources

From: A critical assessment of Mus musculus gene function prediction using integrated genomic evidence

| Data type | Description | Representation |
| --- | --- | --- |
| Gene expression | Expression data from oligonucleotide arrays for 13,566 genes across 55 mouse tissues (Zhang et al. [21]) | Median-subtracted, arcsinh intensity measurements |
| | Expression data from Affymetrix arrays for 18,208 genes across 61 mouse tissues (Su et al. [44]) | gcRMA-condensed intensity measurements |
| | Tag counts at quality 0.99 cut-off from 139 SAGE libraries for 16,726 genes [45] | Average and total tag counts |
| Sequence patterns | Protein sequence pattern annotations from Pfam-A (release 19) for 15,569 genes with 3,133 protein families [46] | Binary annotation patterns |
| | Protein sequence pattern annotations from InterPro (release 12.1) for 16,965 genes with 5,404 sequence patterns [47] | Binary annotation patterns |
| Protein interactions | Protein-protein interactions from OPHID for 7,125 genes [28] (downloaded on 20 April 2006) | Binary interaction patterns and shortest path between genes |
| Phenotypes | Phenotype annotations from MGI for 3,439 genes with 33 phenotypes [48] (downloaded on 21 February 2006 from [49]) | Binary annotation patterns |
| Conservation profile | Conservation pattern from Ensembl (v38) for 15,939 genes across 18 species [50] | Binary conservation patterns and conservation scores |
| | Conservation pattern from Inparanoid (v4.0) for 15,703 genes across 21 species [51] | Binary conservation patterns and Inparanoid scores |
| Disease associations | Disease associations from OMIM for 1,938 genes to 2,488 diseases/phenotypes [52, 53] (downloaded on 6 June 2006 from [54]) | Binary annotation patterns |
  1. gcRMA, robust multi-array analysis with background adjustment for GC content of probes; OMIM, Online Mendelian Inheritance in Man; OPHID, Online Predicted Human Interaction Database; SAGE, serial analysis of gene expression.
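
Three of the representations named in the table can be illustrated with a minimal sketch: median-subtracted arcsinh intensities, binary annotation patterns, and shortest paths between genes in an interaction network. The function names, gene names and toy values below are hypothetical. The order of operations (arcsinh first, then subtraction of the per-gene median across tissues) is an assumption, since the table does not specify it, and none of this is taken from the authors' own pipeline.

```python
import math
from collections import deque
from statistics import median


def median_subtracted_arcsinh(intensities):
    """Apply arcsinh to each intensity, then subtract the median.

    Assumption: arcsinh is applied first and the median is taken over one
    gene's values across tissues; the table states neither the order nor the axis.
    """
    transformed = [math.asinh(x) for x in intensities]
    center = median(transformed)
    return [t - center for t in transformed]


def binary_patterns(annotations, vocabulary):
    """Encode each gene's annotation set as a 0/1 vector over `vocabulary`."""
    return {
        gene: [1 if term in terms else 0 for term in vocabulary]
        for gene, terms in annotations.items()
    }


def shortest_path_length(edges, source, target):
    """Breadth-first search over an undirected interaction graph.

    Returns the number of edges on a shortest path, or None if unreachable.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Toy usage with hypothetical genes and values.
profile = median_subtracted_arcsinh([0.0, 1.0, 10.0])
patterns = binary_patterns(
    {"geneA": {"PF00001"}, "geneB": {"PF00001", "PF00002"}},
    ["PF00001", "PF00002"],
)
distance = shortest_path_length([("geneA", "geneB"), ("geneB", "geneC")], "geneA", "geneC")
```

In this sketch a missing annotation is encoded as 0, so an unannotated gene cannot be told apart from a gene known to lack the feature. Whether the original data make that distinction is not stated in the table.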