
Table 1 Data collection description: summary of the data sources

From: A critical assessment of Mus musculus gene function prediction using integrated genomic evidence

| Data type | Description | Representation |
| --- | --- | --- |
| Gene expression | Expression data from oligonucleotide arrays for 13,566 genes across 55 mouse tissues (Zhang et al. [21]) | Median-subtracted, arcsinh intensity measurements |
| | Expression data from Affymetrix arrays for 18,208 genes across 61 mouse tissues (Su et al. [44]) | gcRMA-condensed intensity measurements |
| | Tag counts at quality 0.99 cut-off from 139 SAGE libraries for 16,726 genes [45] | Average and total tag counts |
| Sequence patterns | Protein sequence pattern annotations from Pfam-A (release 19) for 15,569 genes with 3,133 protein families [46] | Binary annotation patterns |
| | Protein sequence pattern annotations from InterPro (release 12.1) for 16,965 genes with 5,404 sequence patterns [47] | Binary annotation patterns |
| Protein interactions | Protein-protein interactions from OPHID for 7,125 genes [28] (downloaded on 20 April 2006) | Binary interaction patterns and shortest path between genes |
| Phenotypes | Phenotype annotations from MGI for 3,439 genes with 33 phenotypes [48] (downloaded on 21 February 2006 from [49]) | Binary annotation patterns |
| Conservation profile | Conservation pattern from Ensembl (v38) for 15,939 genes across 18 species [50] | Binary conservation patterns and conservation scores |
| | Conservation pattern from Inparanoid (v4.0) for 15,703 genes across 21 species [51] | Binary conservation patterns and Inparanoid scores |
| Disease associations | Disease associations from OMIM for 1,938 genes to 2,488 diseases/phenotypes [52, 53] (downloaded on 6 June 2006 from [54]) | Binary annotation patterns |

gcRMA, robust multi-array analysis with background adjustment for GC content of probes; OMIM, Online Mendelian Inheritance in Man; OPHID, Online Predicted Human Interaction Database; SAGE, serial analysis of gene expression.
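
The Representation column names several encodings without spelling out how they are computed. The following is a minimal sketch of three of them, not the authors' pipeline. It makes several assumptions: the median is taken over one gene's transformed values across tissues, binary patterns are 0/1 vectors over a fixed term order, and the shortest path is the hop count in an undirected, unweighted interaction graph. All function names, gene labels, and term labels are hypothetical.

```python
import math
from collections import deque
from statistics import median


def arcsinh_median_subtracted(intensities):
    """Apply arcsinh to each intensity, then subtract the median of the result.

    Assumption: the median is taken over the values passed in (for example,
    one gene's measurements across tissues); the table does not say which axis.
    """
    transformed = [math.asinh(x) for x in intensities]
    centre = median(transformed)
    return [t - centre for t in transformed]


def binary_patterns(annotations, vocabulary):
    """Encode {gene: set of terms} as {gene: [0 or 1, ...]} over a fixed term order."""
    return {
        gene: [1 if term in terms else 0 for term in vocabulary]
        for gene, terms in annotations.items()
    }


def shortest_path_lengths(edges, source):
    """Hop counts from `source` by breadth-first search over an undirected graph."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in dist:  # first visit is the shortest in an unweighted graph
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


# Toy inputs, invented purely for illustration.
centred = arcsinh_median_subtracted([0.0, 1.0, 10.0])
patterns = binary_patterns(
    {"geneA": {"fam1"}, "geneB": {"fam1", "fam2"}},
    ["fam1", "fam2"],
)
hops = shortest_path_lengths([("geneA", "geneB"), ("geneB", "geneC")], "geneA")
```

The same binary encoding would apply to any of the rows labelled "Binary annotation patterns" (Pfam-A, InterPro, MGI phenotypes, OMIM), with the vocabulary swapped for the relevant set of families, patterns, phenotypes, or diseases.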