Table 1 Number of pufferfish domain hits found by hmmsearch against the pufferfish protein database before and after the THoR process

From: THoR: a tool for domain discovery and curation of multiple alignments

| Domain name (SMART name) | N(SMART) | N(THoR) | N(THoR) - N(SMART) |
|---|---|---|---|
| 14-3-3 homologs (14_3_3) | 9 | 9 | 0 |
| Domains in Ataxins and HMG-containing proteins (AXH) | 6 | 6 | 0 |
| Breast cancer carboxy-terminal domain (BRCT) | 31 | 39 | 8 |
| Bromo domain (BROMO) | 89 | 89 | 0 |
| Bulb-type mannose-specific lectins (B_lectin) | 1 | 2 | 1 |
| Chromatin organization modifier domain (CHROMO) | 62 | 69 | 7 |
| Calpain-like thiol protease family (CysPc) | 31 | 32 | 1 |
| Tandem repeat (DM15) | 6 | 6 | 0 |
| Endothelin (END) | 5 | 5 | 0 |
| Exonuclease (EXOIII) | 10 | 12 | 2 |
| Receptor for Ubiquitination Targets (FBOX) | 34 | 45 | 11 |
| Formin homology 2 domain (FH2) | 20 | 35 | 15 |
| Fibronectin type 1 domain (FN1) | 49 | 49 | 0 |
| High mobility group (HMG) | 82 | 84 | 2 |
| Homeodomain (HOX) | 319 | 323 | 4 |
| Protein kinase C-related kinase homology region 1 homologs (HR1) | 19 | 19 | 0 |
| Short calmodulin-binding motif containing conserved Ile and Gln residues (IQ) | 228 | 226 | -2 |
| Kyrpides, Ouzounis, Woese motif (KOW) | 12 | 12 | 0 |
| Kringle (KR) | 33 | 34 | 1 |
| Zinc-binding domain present in Lin-11, Isl-1, Mec-3 (LIM) | 204 | 214 | 10 |
| Pleckstrin homology (PH) | 373 | 436 | 63 |
| Zinc finger (PHD) | 216 | 303 | 87 |
| Phosphoinositide 3-kinase, region postulated to contain C2 domain (PI3K_C2) | 10 | 12 | 2 |
| Motif in proteasome subunits, Int-6, Nip-1 and TRIP-15 (PINT) | 16 | 17 | 1 |
| Phosphatidylinositol phosphate kinases (PIPKc) | 14 | 15 | 1 |
| Domain found in a protein subunit of human RNase MRP and RNase P ribonucleoprotein complexes and archaeal proteins (POP4) | 1 | 1 | 0 |
| Domain found in Plexins, Semaphorins and Integrins (PSI) | 116 | 119 | 3 |
| Domain with conserved PWWP motif (PWWP) | 27 | 29 | 2 |
| Guanine nucleotide exchange factor for Rho/Rac/Cdc42-like GTPases (RhoGEF) | 99 | 111 | 12 |
| Src homology 2 domains (SH2) | 142 | 153 | 11 |
| Src homology 3 domains (SH3) | 358 | 373 | 15 |
| Staphylococcal nuclease homologs (SNc) | 3 | 6 | 3 |
| Domain in short gastrulation protein and chordin (SOG) | 3 | 3 | 0 |
| snRNP Sm proteins (Sm) | 18 | 18 | 0 |
| Topoisomerase II (TOP2c) | 3 | 3 | 0 |
| Tetratricopeptide repeats (TPR) | 573 | 552 | -21 |
| Tudor domain (TUDOR) | 25 | 44 | 19 |
| Domain present in VPS-27, Hrs and STAM (VHS) | 15 | 14 | -1 |

N(SMART) is the number of domains found in the predicted set of pufferfish proteins by hmmsearch with SMART thresholds. N(THoR) is the number of domains found in the same set by hmmsearch with SMART thresholds, using the alignment created by THoR. N(THoR) - N(SMART) is the difference between the two counts; a positive value means the THoR alignment detected additional pufferfish domains. The SMART domain families COLIPASE, ChW, CheW, Galanin, IL10, IL2, LIGANc, POLIIIc, POX and REC were used as negative controls in the benchmark. None of these families was expected to produce hits in the pufferfish database because they are prokaryote-specific or mammal-specific domains, and indeed THoR detected no pufferfish homologs for any of them. The AAA and WD40 domains were each searched by THoR with only one round of PSI-BLAST rather than the full five, because these families are known to have very many members and a five-round search would have taken an unnecessarily long time. They are not shown because hmmbuild encountered memory-allocation errors and their search iterations did not complete.
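As a quick sanity check on the table, the Python sketch below recomputes N(THoR) - N(SMART) for every row, compares it with the reported difference column, and tallies the families that gained, kept or lost hits. The values are transcribed by hand from Table 1; `ROWS` and `summarize` are hypothetical names used only for illustration and are not part of THoR.

```python
"""Consistency check and summary for Table 1 (values transcribed by hand)."""

# (SMART name, N(SMART), N(THoR), reported N(THoR) - N(SMART))
ROWS = [
    ("14_3_3", 9, 9, 0), ("AXH", 6, 6, 0), ("BRCT", 31, 39, 8),
    ("BROMO", 89, 89, 0), ("B_lectin", 1, 2, 1), ("CHROMO", 62, 69, 7),
    ("CysPc", 31, 32, 1), ("DM15", 6, 6, 0), ("END", 5, 5, 0),
    ("EXOIII", 10, 12, 2), ("FBOX", 34, 45, 11), ("FH2", 20, 35, 15),
    ("FN1", 49, 49, 0), ("HMG", 82, 84, 2), ("HOX", 319, 323, 4),
    ("HR1", 19, 19, 0), ("IQ", 228, 226, -2), ("KOW", 12, 12, 0),
    ("KR", 33, 34, 1), ("LIM", 204, 214, 10), ("PH", 373, 436, 63),
    ("PHD", 216, 303, 87), ("PI3K_C2", 10, 12, 2), ("PINT", 16, 17, 1),
    ("PIPKc", 14, 15, 1), ("POP4", 1, 1, 0), ("PSI", 116, 119, 3),
    ("PWWP", 27, 29, 2), ("RhoGEF", 99, 111, 12), ("SH2", 142, 153, 11),
    ("SH3", 358, 373, 15), ("SNc", 3, 6, 3), ("SOG", 3, 3, 0),
    ("Sm", 18, 18, 0), ("TOP2c", 3, 3, 0), ("TPR", 573, 552, -21),
    ("TUDOR", 25, 44, 19), ("VHS", 15, 14, -1),
]


def summarize(rows):
    """Recompute the difference column and tally gains, losses and totals."""
    # Rows whose reported difference disagrees with N(THoR) - N(SMART).
    mismatches = [name for name, smart, thor, reported in rows
                  if thor - smart != reported]
    diffs = [thor - smart for _, smart, thor, _ in rows]
    return {
        "families": len(rows),
        "gained": sum(d > 0 for d in diffs),
        "unchanged": sum(d == 0 for d in diffs),
        "lost": sum(d < 0 for d in diffs),
        "total_smart": sum(smart for _, smart, _, _ in rows),
        "total_thor": sum(thor for _, _, thor, _ in rows),
        "net_change": sum(diffs),
        "mismatches": mismatches,
    }


if __name__ == "__main__":
    for key, value in summarize(ROWS).items():
        print(f"{key}: {value}")
```

The tallies follow directly from the table: every reported difference is consistent, 23 families gained hits, 12 were unchanged and 3 (IQ, TPR and VHS) lost hits, for a net gain of 257 domains (3,262 to 3,519).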
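The counts above come from hmmsearch run with SMART thresholds. A minimal sketch of that counting step might look like the following, assuming hmmsearch was run with HMMER3's `--domtblout` option and that each family has a single per-domain bit-score cutoff. The cutoff values are hypothetical placeholders rather than SMART's actual thresholds, and `count_domains` is an illustrative name.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_domains(domtbl_lines: Iterable[str],
                  thresholds: Mapping[str, float]) -> Counter:
    """Count domain hits per query HMM in hmmsearch --domtblout output.

    Assumes the HMMER3 --domtblout layout: whitespace-separated columns in
    which column 4 (index 3) is the query HMM name and column 14 (index 13)
    is the per-domain bit score. `thresholds` maps each family name to a
    hypothetical per-domain score cutoff standing in for the SMART threshold.
    """
    counts: Counter = Counter()
    for line in domtbl_lines:
        # Skip blank lines and the '#'-prefixed header/footer comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        family = fields[3]
        domain_score = float(fields[13])
        cutoff = thresholds.get(family)
        # Families without a cutoff are ignored rather than counted.
        if cutoff is not None and domain_score >= cutoff:
            counts[family] += 1
    return counts


if __name__ == "__main__":
    # Two synthetic SH3 domain rows; only the first (25.3 bits) passes the
    # hypothetical 20.0-bit cutoff.
    example = [
        "# target name  accession  tlen  query name ...",
        "prot1 - 250 SH3 - 57 1e-10 40.0 0.1 1 2 1e-05 1e-06 25.3 0.0 "
        "1 57 10 66 8 68 0.95 hypothetical protein",
        "prot1 - 250 SH3 - 57 1e-10 40.0 0.1 2 2 1e-02 1e-03 15.0 0.0 "
        "1 55 90 144 88 146 0.90 hypothetical protein",
    ]
    print(count_domains(example, {"SH3": 20.0}))
```

Under these assumptions, running the same counter on the output from the original SMART HMM and on the output from an HMM rebuilt from the THoR alignment would yield the N(SMART) and N(THoR) columns for each family.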