
Table 1 Gold standard sets

From: Sequence-based prediction of protein-protein interactions by means of codon usage

| Organism | GSTD | References | No. of ORFs | No. of ORF pairs | Comments/details |
| --- | --- | --- | --- | --- | --- |
| S. cerevisiae | P | [13, 22] | 732 | 3,400 | Derived from MIPS [42] complex catalog. We excluded ribosomal proteins to avoid bias towards extreme codon usage similarity of their genes |
| | N | [13, 22] | 2,760 | 1,442,691 | Pairs of proteins that are not localized in the same cell compartment. We excluded ribosomal proteins |
| P. falciparum | P | [43] | 352 | 7,689 | Protein pairs within the same KEGG [19] pathway |
| | N | [43] | 354 | 27,367 | Protein pairs with KEGG information, excluding pairs in the gold standard positive set |
| E. coli | P | [44] | 2,196 | 7,063 | Pull-down assay using a His-tagged ORF library |
| | N | - | 3,703 | 4,437,833 | We compiled a set of protein pairs that are not in the gold standard positive set, given that at least one protein from each pair is copurified with an associate protein by Arifuzzaman et al. [44] |

  1. Each set comprises only ORFs that could be associated with their genomic sequences using the names that were provided in the original references. Self interactions were considered in neither the training nor the testing process. GSTD, gold standard dataset; N, negative; P, positive.
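
As a rough illustration of how a negative set of the kind described in the table could be assembled (unordered pairs among a list of ORFs, excluding pairs in the gold standard positive set and excluding self interactions), the following is a minimal Python sketch. The function name `build_negative_pairs` and the toy ORF identifiers are hypothetical and do not come from the original work. The organism-specific filters listed in the table (different cell compartments for S. cerevisiae, KEGG annotation for P. falciparum, copurification for E. coli) are not modeled here and would have to be applied to the ORF list or the pairs separately.

```python
from itertools import combinations


def build_negative_pairs(orfs, positive_pairs):
    """Return unordered ORF pairs that are absent from the positive set.

    Hypothetical helper for illustration only; not the authors' code.
    Pairs are stored as frozensets so that (a, b) and (b, a) are identical.
    """
    # Normalise positives to unordered pairs; a self pair collapses to a
    # one-element frozenset and can never match a two-element candidate.
    positives = {frozenset(pair) for pair in positive_pairs}

    negatives = set()
    # combinations() over distinct ORFs never yields a self interaction.
    for a, b in combinations(sorted(set(orfs)), 2):
        pair = frozenset((a, b))
        if pair not in positives:
            negatives.add(pair)
    return negatives


# Toy data (assumed identifiers, not real ORF names).
toy_orfs = ["orfA", "orfB", "orfC", "orfD"]
toy_positives = [("orfA", "orfB"), ("orfB", "orfA"), ("orfC", "orfC")]

toy_negatives = build_negative_pairs(toy_orfs, toy_positives)
print(len(toy_negatives))
```

Storing each pair as a `frozenset` is one simple way to make pair order irrelevant. Without further filtering, a set of n ORFs gives at most n(n-1)/2 candidate pairs, so the negative counts in the table, which are smaller than that bound, reflect the additional criteria described in the Comments/details column.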