
Table 5 Impact of different context types on human gene mention normalization

From: Gene mention normalization and interaction extraction with context models and sentence motifs

| Context type | Precision (%) | Recall (%) | F measure (%) |
|---|---|---|---|
| Baseline: NER only | 9.7 | 91.1 | 17.5 |
| NER + GeneRIFs | 50.8 | 78.3 | 61.6 |
| NER + GO terms | 46.3 | 81.2 | 59.0 |
| NER + EntrezGene summaries | 49.0 | 66.7 | 56.5 |
| NER + diseases | 22.7 | 43.9 | 29.9 |
| NER + functions | 50.8 | 72.5 | 59.7 |
| NER + keywords | 53.0 | 53.6 | 53.3 |
| NER + locations | 74.2 | 14.8 | 24.7 |
| NER + tissues | 39.4 | 29.1 | 33.4 |
| NER + immediate context filter (heuristics) | 23.5 | 89.8 | 37.2 |
| NER + immediate context filter (HMM) | 52.9 | 80.8 | 63.4 |
| NER + PMIDs | 96.2 | 50.8 | 66.4 |

  1. Starting from a baseline configuration (pure named entity recognition; see text), each context type was evaluated separately. In addition, we present the impact of filtering by the immediate context in two ways. The first uses heuristics, such as excluding genes from the wrong species, abbreviations, and similar cases. The second uses a hidden Markov model (HMM) learned from the training data. Using PubMed IDs (PMIDs) curated for each gene (for instance, via GeneRIFs, Gene Ontology [GO] annotation, and UniProt) gives the highest precision and F measure of all context types. It would therefore be the best way to ensure high precision, although these data were not used for the BioCreative II evaluation. NER, named entity recognition.
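The F measure column matches the balanced harmonic mean of precision and recall, F = 2PR/(P + R), for most rows. The following minimal sketch recomputes it from the published percentages. It assumes the balanced (F1) form, because the caption does not state a weighting. The function name `f_measure` is illustrative only. Recomputed values may not match every row exactly, because the published precision and recall are themselves rounded.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F measure (assumed F1): harmonic mean of precision and recall.

    Inputs and output are percentages, as in Table 5.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from Table 5
rows = {
    "Baseline: NER only": (9.7, 91.1),
    "NER + GeneRIFs": (50.8, 78.3),
    "NER + keywords": (53.0, 53.6),
    "NER + locations": (74.2, 14.8),
}

for name, (p, r) in rows.items():
    # Prints 17.5, 61.6, 53.3 and 24.7, matching the table
    print(f"{name}: F = {f_measure(p, r):.1f}")
```

The harmonic mean is dominated by the smaller of the two scores. This explains why the location filter reaches 74.2% precision yet scores only 24.7% F measure: at 14.8% recall, most correct genes are discarded.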