Table 7 Information extracted from different data sources

From: Computational prediction of human metabolic pathways from the complete human genome

Data source (version) | Information extracted (for each gene or locus) | Genes obtained | Nonredundant genes
Ensembl (Build 31) | Gene name, chromosome or contig, start and end positions, strand (transcription direction), exons, gene product (including function name(s) or description(s), synonyms and EC number(s)), cross references (IDs) to other databases (SwissProt, HUGO, PDB, GO, RefSeq, OMIM, Entrez, SPTREMBL, EMBL, LocusLink) | 24,847 | –
LocusLink (03/29/2003) | Gene name, chromosome, gene product (function name or description), function synonyms, EC number(s), gene and protein comments, cross references (IDs) to other databases (Entrez, UCSC Genome, RefSeq, GO, OMIM, UniGene, PubMed) | 18,880 | 3,936
GenBank NC_001807 (mitochondrion) | Gene name, start and end positions, transcription direction, gene product (function name or description) | 35 | –
1. Functional information in Ensembl had to be extensively parsed to extract multiple function names, EC numbers, and/or synonyms. The 'nonredundant' column shows the number of LocusLink genes that had no corresponding gene in either of the other two data sources (Ensembl and GenBank).
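The footnote mentions two processing steps that the table itself does not show: pulling EC numbers out of free-text Ensembl descriptions, and counting LocusLink genes that have no counterpart in the other two sources. The following is a minimal Python sketch of both, under stated assumptions. The description format, the EC-number regular expression, and the matching rule (LocusLink ID present among Ensembl's LocusLink cross-references, or gene symbol equal to a mitochondrial GenBank gene name) are assumptions, not the authors' documented procedure. The function names and toy identifiers are hypothetical.

```python
import re

# Assumed pattern: "EC" followed by four dot-separated fields; trailing
# fields may be "-" for partially classified enzymes (e.g. "EC 1.1.1.-").
EC_PATTERN = re.compile(r"\bEC[:\s]*(\d+\.(?:\d+|-)\.(?:\d+|-)\.(?:\d+|-))")


def extract_ec_numbers(description: str) -> list[str]:
    """Return the distinct EC numbers in a free-text description, in order of appearance."""
    found: list[str] = []
    for ec in EC_PATTERN.findall(description):
        if ec not in found:  # keep the first occurrence only
            found.append(ec)
    return found


def nonredundant_locuslink(
    locuslink: dict[str, str],
    ensembl_locuslink_xrefs: set[str],
    mito_gene_names: set[str],
) -> set[str]:
    """Return LocusLink IDs with no counterpart in Ensembl or the mitochondrial record.

    Hypothetical matching rule (not taken from the source): a LocusLink gene
    matches Ensembl if its ID appears among Ensembl's LocusLink cross-references,
    and matches the mitochondrial GenBank record if its symbol equals one of
    that record's gene names (case-insensitive).
    """
    mito = {name.upper() for name in mito_gene_names}
    return {
        ll_id
        for ll_id, symbol in locuslink.items()
        if ll_id not in ensembl_locuslink_xrefs and symbol.upper() not in mito
    }


if __name__ == "__main__":
    # Toy inputs; IDs, symbols, and descriptions are made up for illustration.
    print(extract_ec_numbers("Hexokinase-1 (EC 2.7.1.1) (Brain form hexokinase)"))
    ll = {"1001": "HK1", "1002": "ND1", "1003": "GENEX"}
    unique = nonredundant_locuslink(ll, {"1001"}, {"ND1", "COX1"})
    print(len(unique), sorted(unique))
```

Using sets makes the nonredundant count a simple set-difference test per gene; a real pipeline would also need to decide how to treat LocusLink entries that map to several Ensembl genes, which this sketch ignores.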