
Table 1 Gold standard sets

From: Sequence-based prediction of protein-protein interactions by means of codon usage

| Organism | GSTD | References | No. of ORFs | No. of ORF pairs | Comments/details |
|---|---|---|---|---|---|
| S. cerevisiae | P | [13, 22] | 732 | 3,400 | Derived from the MIPS [42] complex catalog. We excluded ribosomal proteins to avoid a bias caused by the extreme codon usage similarity of their genes. |
| | N | [13, 22] | 2,760 | 1,442,691 | Pairs of proteins that are not localized in the same cell compartment. We excluded ribosomal proteins. |
| P. falciparum | P | [43] | 352 | 7,689 | Protein pairs within the same KEGG [19] pathway. |
| | N | [43] | 354 | 27,367 | Protein pairs with KEGG information, excluding pairs in the gold standard positive set. |
| E. coli | P | [44] | 2,196 | 7,063 | Pull-down assay using a His-tagged ORF library. |
| | N | - | 3,703 | 4,437,833 | Protein pairs that are not in the gold standard positive set, provided that at least one protein of each pair was copurified with an associated protein by Arifuzzaman et al. [44]. |
Each set comprises only ORFs that could be matched to their genomic sequences using the names provided in the original references. Self-interactions were used in neither training nor testing. GSTD, gold standard dataset; N, negative; P, positive.
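The pair-set rules summarized in the table can be sketched as set operations over unordered ORF pairs. This is a minimal illustration under stated assumptions, not the authors' code: the function names and toy inputs are hypothetical, the ribosomal-protein exclusion is assumed to happen beforehand as a filter on the input ORFs, and the E. coli rule is read literally as "a non-positive pair qualifies if at least one member appears in a hypothetical `copurified` set of ORFs". Building pairs from combinations of distinct ORFs also enforces the exclusion of self-interactions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set

Pair = FrozenSet[str]  # an unordered pair of two distinct ORF names


def all_pairs(orfs: Iterable[str]) -> Set[Pair]:
    """Every unordered pair of distinct ORFs; self-pairs are never produced."""
    return {frozenset(p) for p in combinations(sorted(set(orfs)), 2)}


def same_group_pairs(groups: Dict[str, Iterable[str]]) -> Set[Pair]:
    """Positive pairs: ORFs sharing at least one group (a complex or a pathway)."""
    pairs: Set[Pair] = set()
    for members in groups.values():
        pairs |= all_pairs(members)
    return pairs


def different_compartment_pairs(localization: Dict[str, Set[str]]) -> Set[Pair]:
    """Negative pairs (S. cerevisiae rule): no cell compartment in common."""
    negatives: Set[Pair] = set()
    for pair in all_pairs(localization):
        a, b = tuple(pair)
        if not (localization[a] & localization[b]):
            negatives.add(pair)
    return negatives


def complement_pairs(orfs: Iterable[str], positives: Set[Pair]) -> Set[Pair]:
    """Negative pairs (P. falciparum rule): annotated pairs that are not positives."""
    return all_pairs(orfs) - positives


def copurification_negatives(
    orfs: Iterable[str], positives: Set[Pair], copurified: Set[str]
) -> Set[Pair]:
    """Negative pairs (E. coli rule, assumed reading): non-positive pairs with
    at least one member in the hypothetical `copurified` set."""
    return {p for p in all_pairs(orfs) - positives if p & copurified}


if __name__ == "__main__":
    # Toy pathway annotation (hypothetical): path1 = {A, B, C}, path2 = {C, D}
    pathways = {"path1": ["A", "B", "C"], "path2": ["C", "D"]}
    positives = same_group_pairs(pathways)
    negatives = complement_pairs(["A", "B", "C", "D"], positives)
    print(len(positives), len(negatives))  # 4 2
```

In the toy example, the positives are A-B, A-C, B-C and C-D, and the negatives are the remaining pairs A-D and B-D. Sets of `frozenset` objects make the pairs order-independent, so "A-B" and "B-A" cannot be counted twice.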