Table 1 Statistical significance of the presence of predicted specificity residues in known interfaces of protein-protein and protein-DNA/RNA complexes

From: Determinants of protein function revealed by combinatorial entropy optimization

| PDB^a | Protein name^b | Superfamily^c | Alignment^d | S^e | C^e | Ligand^f | I^g | S&I^g | P(S&I)^g | C&I^g | P(C&I)^g | (S+C)&I^g | P((S+C)&I)^g |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1wq1R^1 (1 to 166) | Ras | P-loop containing nucleoside triphosphate hydrolases | Superfamily (human) 156/0.90/0.90 | 13 | 7 | 1wq1G, GDP, Mg, AF3 | 42 | 8 | 0.00434 | 5 | 0.0118 | 13 | 0.00007 |
| 1wq1G^2 (718 to 1037) | P120Gap | GTPase activation domain, GAP | Superfamily (human) 20/0.90/0.90 | 36 | 15 | 1wq1R, GDP, Mg, AF3 | 33 | 11 | 0.00024 | 6 | 0.00183 | 17 | 0 |
| 1fvuA^3 (1 to 133) | Botrocetin α-chain | C-type lectin-like | Superfamily (swiss) 64/0.90/0.90 | 21 | 14 | 1fvuB | 39 | 10 | 0.092 | 5 | 0.391 | 15 | 0.035 |
| 1fvuB^4 (401 to 525) | Botrocetin β-chain | C-type lectin-like | Superfamily (swiss) 136/0.90/0.90 | 29 | 8 | 1fvuA, Mg | 39 | 13 | 0.077 | 3 | 0.507 | 16 | 0.0668 |
| 1a2kA^5 (10 to 121) | NTF2 | NTF2-like | Pfam 87/0.90/0.90 | 18 | 2 | 1a2kD, GDP, Mg | 16 | 7 | 0.005 | 0 | 1 | 7 | 0.0085 |
| 1a2kD^6 (12 to 170) | RAN | P-loop containing nucleoside triphosphate hydrolases | Superfamily (human) 170/0.90/0.90 | 17 | 7 | 1a2kA, GDP, Mg | 27 | 6 | 0.0445 | 6 | 0.00009 | 12 | 0.00004 |
| 1i2mB^7 (24 to 417) | RCC1 | RCC1/BLIP-II | Superfamily (nrd90) 77/0.90/0.90 | 45 | 23 | 1i2mA | 37 | 10 | 0.008 | 0 | 1 | 10 | 0.089 |
| 1i2mA^8 (12 to 170) | RAN | P-loop containing nucleoside triphosphate hydrolases | Superfamily (human) 170/0.90/0.90 | 17 | 7 | 1i2mB | 42 | 6 | 0.096 | 1 | 0.8 | 7 | 0.18 |
| 1rrpB^9 (17 to 150) | NUP358 | PH domain-like | Superfamily (nrd90+swiss) 59/0.90/0.90 | 31 | 3 | 1rrpA | 51 | 16 | 0.075 | 2 | 0.323 | 18 | 0.032 |
| 1rrpA^10 (12 to 170) | RAN | P-loop containing nucleoside triphosphate hydrolases | Superfamily (human) 170/0.90/0.90 | 17 | 7 | 1rrpB, GNP, Mg | 53 | 3 | 0.964 | 6 | 0.0058 | 9 | 0.4 |
| 1blxB^11 (41 to 72) | P19INK4D | Ankyrin repeat | PFAM (human) 1043/0.95/0.95 | 7 | 3 | 1blxA | 11 | 7 | 0 | 0 | 1 | 7 | 0 |
| 1blxB^11 (73 to 105) | P19INK4D | Ankyrin repeat | PFAM (human) 1043/0.95/0.95 | 7 | 3 | 1blxA | 7 | 5 | 0 | 0 | 1 | 5 | 0 |
| 1blxB^11 (106 to 137) | P19INK4D | Ankyrin repeat | PFAM (human) 1043/0.95/0.95 | 7 | 3 | 1blxA | 1 | 1 | 0.21 | 0 | 1 | 1 | 0.3 |
| 1blxA^12 (5 to 309) | CDK6 | Protein kinase-like (PK-like) | Superfamily (human) 81/0.90/0.95 | 31 | 25 | 1blxB | 24 | 4 | 0.19 | 0 | 1 | 4 | 0.19 |
| 2cciA^13 (4 to 286) | CDK2 | Protein kinase-like (PK-like) | Protein Kinase Resource 390 | 20 | 22 | 1h27B1, 1h27B2, TPO | 78 | 13 | 0.0003 | 11 | 0.0173 | 24 | 0 |
| 2cciB1^14 (181 to 307) | Cyclin A | Cyclin-like | Pfam N-cyclin 379/0.95/0.90 | 17 | 16 | 2cciA, 2cciF, TPO | 48 | 12 | 0.00356 | 7 | 0.396 | 19 | 0.0063 |
| 2cciB2^15 (309 to 431) | Cyclin A | Cyclin-like | Pfam C-cyclin 238/95/90 | 14 | 3 | 2cciA, TPO | 4 | 2 | 0.063 | 0 | 1 | 2 | 0.092 |
| 1n7tA^21 (14 to 98) | Erbin PDZ domain | PDZ domain | PFAM (human) 237/0.90/0.90 | 10 | 3 | peptide | 17 | 6 | 0.0036 | 1 | 0.493 | 7 | 0.0032 |
| 1g4dA^16 (13 to 81) | Repressor protein C | Putative DNA-binding domain | Superfamily (nrd90) 244/0.90/0.95 | 12 | 0 | DNA | 25 | 9 | 0.0034 | 0 | n/a | 9 | 0.0034 |
| 1e3oC^17 (104 to 160) | Oct-1 Pou | lambda repressor-like DNA-binding domains | Superfamily (swiss) 397/0.90/0.90 | 4 | 5 | DNA | 17 | 4 | 0.00603 | 3 | 0.151 | 7 | 0.0018 |
| 2up1A^18 (10 to 92) | Hnrnp A1, Up1 | RNA-binding domain (RBD) | Superfamily (swiss) 552/0.90/1.0 | 16 | 2 | DNA | 21 | 10 | 0.001 | 0 | 1 | 10 | 0.00166 |
| 1ec6A^19 (4 to 90) | NOVA-2 | Eukaryotic type KH-domain (KH-domain type I) | Superfamily (nrd90+swiss) 463/0.90/0.80 | 12 | 2 | RNA | 24 | 7 | 0.019 | 2 | 0.074 | 9 | 0.0019 |
| 1serB^20 (501 to 610) | Seryl tRNA synthetase | tRNA-binding arm | Superfamily (swiss) 96/0.90/0.90 | 18 | 8 | tRNA | 19 | 7 | 0.022 | 2 | 0.412 | 9 | 0.0106 |
^a Protein Data Bank (PDB) four-character code followed by the chain identifier.
^b Name of the protein chain in the title of the PDB file.
^c Name of the corresponding Structural Classification of Proteins (SCOP) superfamily.
^d Source of the alignment (Superfamily or Protein Families [PFAM]), followed by the number of homologous sequences used in the calculations and the fractional values of the two selection filters used to clean the alignments (sequence identity and gap).
^e S and C are the numbers of specificity and conserved residues, respectively.
^f PDB identifiers of the molecular fragments and co-factors (excluding water) interacting with the corresponding protein.
^g I, S&I, C&I, and (S+C)&I are, respectively, the total number of interface residues (selected using an atom-atom distance threshold of ≤4.5 Å between the ligands and the protein), the number of specificity residues in the interface, the number of conserved residues in the interface, and the combined number of specificity and conserved residues in the interface. P(S&I), P(C&I), and P((S+C)&I) are the corresponding probabilities of obtaining these numbers by chance. Low probabilities indicate good agreement between prediction and observation. Significant P values (< 0.05) are in bold.
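The footnote does not say how the chance probabilities were computed. One common way to score this kind of overlap is a one-sided hypergeometric tail, sketched below in Python. The function name `overlap_p_value`, its parameters, and the choice of null model are all assumptions made for illustration, as is using the length of the listed residue range as the total number of residues. The sketch is not claimed to reproduce the values in the table.

```python
from math import comb


def overlap_p_value(n_total, n_interface, n_predicted, n_overlap):
    """One-sided hypergeometric tail probability (assumed null model).

    n_total     -- residues in the analysed segment (assumption: range length)
    n_interface -- interface residues, I in the table
    n_predicted -- predicted residues, e.g. S, C, or S+C
    n_overlap   -- predicted residues found in the interface, e.g. S&I

    Returns P(X >= n_overlap) when n_predicted residues are drawn at random
    without replacement from n_total residues, n_interface of which are
    interface residues.
    """
    upper = min(n_interface, n_predicted)
    denom = comb(n_total, n_predicted)
    # Count the draws that place k predicted residues in the interface,
    # for every k from the observed overlap up to the maximum possible.
    tail = sum(
        comb(n_interface, k) * comb(n_total - n_interface, n_predicted - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / denom


if __name__ == "__main__":
    # Illustrative inputs taken from the Ras row (1wq1R): residues 1 to 166,
    # I = 42, S = 13, S&I = 8. Treating 166 as n_total is an assumption.
    print(overlap_p_value(166, 42, 13, 8))
```

Under this model, a small return value means the observed overlap would rarely arise from a random choice of residues, which is how the table's P columns are read.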