Table 1 A comparison of the number of pufferfish domain hits found by hmmsearch against the pufferfish database before and after the THoR process

From: THoR: a tool for domain discovery and curation of multiple alignments

| Domain name (SMART name) | N(SMART) | N(THoR) | N(THoR) - N(SMART) |
|---|---:|---:|---:|
| 14-3-3 homologs (14_3_3) | 9 | 9 | 0 |
| Domains in Ataxins and HMG-containing proteins (AXH) | 6 | 6 | 0 |
| Breast cancer carboxy-terminal domain (BRCT) | 31 | 39 | 8 |
| Bromo domain (BROMO) | 89 | 89 | 0 |
| Bulb-type mannose-specific lectins (B_lectin) | 1 | 2 | 1 |
| Chromatin organization modifier domain (CHROMO) | 62 | 69 | 7 |
| Calpain-like thiol protease family (CysPc) | 31 | 32 | 1 |
| Tandem repeat (DM15) | 6 | 6 | 0 |
| Endothelin (END) | 5 | 5 | 0 |
| Exonuclease (EXOIII) | 10 | 12 | 2 |
| Receptor for Ubiquitination Targets (FBOX) | 34 | 45 | 11 |
| Formin homology 2 domain (FH2) | 20 | 35 | 15 |
| Fibronectin type 1 domain (FN1) | 49 | 49 | 0 |
| High mobility group (HMG) | 82 | 84 | 2 |
| Homeodomain (HOX) | 319 | 323 | 4 |
| Protein kinase C-related kinase homology region 1 homologs (HR1) | 19 | 19 | 0 |
| Short calmodulin-binding motif containing conserved Ile and Gln residues (IQ) | 228 | 226 | -2 |
| Kyprides, Ouzounis, Woese motif (KOW) | 12 | 12 | 0 |
| Kringle (KR) | 33 | 34 | 1 |
| Zinc-binding domain present in Lin-11, Isl-1, Mec-3 (LIM) | 204 | 214 | 10 |
| Pleckstrin homology (PH) | 373 | 436 | 63 |
| Zinc finger (PHD) | 216 | 303 | 87 |
| Phosphoinositide 3-kinase, region postulated to contain C2 domain (PI3K_C2) | 10 | 12 | 2 |
| Motif in proteasome subunits, Int-6, Nip-1 and TRIP-15 (PINT) | 16 | 17 | 1 |
| Phosphatidylinositol phosphate kinases (PIPKc) | 14 | 15 | 1 |
| Domain found in a protein subunit of human RNase MRP and RNase P ribonucleoprotein complexes and archaeal proteins (POP4) | 1 | 1 | 0 |
| Domain found in Plexins, Semaphorins and Integrins (PSI) | 116 | 119 | 3 |
| Domain with conserved PWWP motif (PWWP) | 27 | 29 | 2 |
| Guanine nucleotide exchange factor for Rho/Rac/Cdc42-like GTPases (RhoGEF) | 99 | 111 | 12 |
| Src homology 2 domains (SH2) | 142 | 153 | 11 |
| Src homology 3 domains (SH3) | 358 | 373 | 15 |
| Staphylococcal nuclease homologs (SNc) | 3 | 6 | 3 |
| Domain in short gastrulation protein and chordin (SOG) | 3 | 3 | 0 |
| snRNP Sm proteins (Sm) | 18 | 18 | 0 |
| Topoisomerase II (TOP2c) | 3 | 3 | 0 |
| Tetratricopeptide repeats (TPR) | 573 | 552 | -21 |
| Tudor domain (TUDOR) | 25 | 44 | 19 |
| Domain present in VPS-27, Hrs and STAM (VHS) | 15 | 14 | -1 |
  1. N(SMART) is the number of domains found by hmmsearch, with SMART thresholds, in the predicted set of pufferfish proteins. N(THoR) is the number of domains found by hmmsearch in the same set, with the same SMART thresholds, using the alignment created by THoR. N(THoR) - N(SMART) is the difference between the THoR and SMART results; a positive value means that more pufferfish domains were detected with the THoR alignment. The SMART domain families COLIPASE, ChW, CheW, Galanin, IL10, IL2, LIGANc, POLIIIc, POX and REC were used as negative controls in the benchmark. None of these families was expected to hit the pufferfish database, because they are prokaryote-specific or mammal-specific domains; indeed, THoR detected no pufferfish homologs for any of them. The AAA and WD40 domains were each searched by THoR with only one round of PSI-BLAST instead of five. Both families were known to contain many members, and a full five-round search would have taken an unnecessarily long time to complete. They are not shown because hmmbuild encountered memory-allocation errors for them and their search iterations did not complete.
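The difference column can be recomputed directly from the two count columns. As a check, the Python sketch below does this for the 38 families in the table and groups them by whether they gained, kept or lost hits. The counts are transcribed from the table. The names `ROWS`, `differences` and `summarize` are illustrative only and are not part of THoR or SMART.

```python
"""Recompute the N(THoR) - N(SMART) column of Table 1 from its count columns."""

# (SMART name, N(SMART), N(THoR)), transcribed from Table 1.
ROWS = [
    ("14_3_3", 9, 9), ("AXH", 6, 6), ("BRCT", 31, 39), ("BROMO", 89, 89),
    ("B_lectin", 1, 2), ("CHROMO", 62, 69), ("CysPc", 31, 32), ("DM15", 6, 6),
    ("END", 5, 5), ("EXOIII", 10, 12), ("FBOX", 34, 45), ("FH2", 20, 35),
    ("FN1", 49, 49), ("HMG", 82, 84), ("HOX", 319, 323), ("HR1", 19, 19),
    ("IQ", 228, 226), ("KOW", 12, 12), ("KR", 33, 34), ("LIM", 204, 214),
    ("PH", 373, 436), ("PHD", 216, 303), ("PI3K_C2", 10, 12), ("PINT", 16, 17),
    ("PIPKc", 14, 15), ("POP4", 1, 1), ("PSI", 116, 119), ("PWWP", 27, 29),
    ("RhoGEF", 99, 111), ("SH2", 142, 153), ("SH3", 358, 373), ("SNc", 3, 6),
    ("SOG", 3, 3), ("Sm", 18, 18), ("TOP2c", 3, 3), ("TPR", 573, 552),
    ("TUDOR", 25, 44), ("VHS", 15, 14),
]


def differences(rows):
    """Map each SMART name to N(THoR) - N(SMART)."""
    return {name: n_thor - n_smart for name, n_smart, n_thor in rows}


def summarize(rows):
    """Group families by the sign of the difference and total both count columns."""
    diffs = differences(rows)
    return {
        "gained": sorted(name for name, d in diffs.items() if d > 0),
        "unchanged": sorted(name for name, d in diffs.items() if d == 0),
        "lost": sorted(name for name, d in diffs.items() if d < 0),
        "total_smart": sum(n_smart for _, n_smart, _ in rows),
        "total_thor": sum(n_thor for _, _, n_thor in rows),
    }


if __name__ == "__main__":
    summary = summarize(ROWS)
    print("gained:", len(summary["gained"]),
          "unchanged:", len(summary["unchanged"]),
          "lost:", summary["lost"])
    print("totals:", summary["total_smart"], "->", summary["total_thor"])
```

Across all 38 rows, this gives 23 families with more hits after THoR, 12 unchanged and 3 (IQ, TPR and VHS) with fewer. The totals are 3262 hits with the SMART alignments versus 3519 with the THoR alignments, a net gain of 257.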