
Table 7 Filtering rules for species, direct references, and chromosomal locations

From: Gene mention normalization and interaction extraction with context models and sentence motifs

 

Species

(-)  non-human-species <candidate name>
(+)  human and non-human-species <candidate name>
(-)  <candidate name> {(, ','} {a, an, the} non-human-species
(+)  <candidate name> {(, ','} {a, an, the} human
(+)  human <candidate name> {(, ','} {a, an, the}

 

Direct mentions, cell lines, chromosomal loci

(+)  <candidate name> {gene, protein}
(-)  <candidate name> {cell(s), culture(s)}
(+)  {locus, loci, location, chromosome, chromosomal, gene * associated}

  1. Examples of heuristic rules for filtering out candidate names that appear to refer to some other concept (a gene from another species, a cell line, a disease locus). '<candidate name>' denotes the occurrence of the potential gene name under consideration. A candidate name is kept (+) or removed (-) when the sentence contains the pattern; '+' rules take precedence over '-' rules. 'human' also includes references to mammals.
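The decision procedure described in the footnote (check every rule against the sentence, let any '+' match win, otherwise drop the candidate on a '-' match) can be sketched in Python with regular expressions. This is a minimal illustration, not the authors' implementation. The species word lists (HUMAN_TERMS, NON_HUMAN_TERMS), the reading of {...} as a set of alternatives, the treatment of '*' in "gene * associated" as arbitrary intervening text, case-insensitive matching, and keeping a candidate when no rule fires are all assumptions.

```python
import re

# Hypothetical word lists: the table only names the classes.
# Per the footnote, "human" also covers references to mammals.
HUMAN_TERMS = ["human", "mammalian", "mouse", "murine", "rat"]
NON_HUMAN_TERMS = ["yeast", "drosophila", "zebrafish", "bacterial", "arabidopsis"]

HUMAN = r"(?:" + "|".join(HUMAN_TERMS) + r")"
NON_HUMAN = r"(?:" + "|".join(NON_HUMAN_TERMS) + r")"
# {(, ','} {a, an, the}: an opening parenthesis or comma, then a determiner.
PUNCT_DET = r"\s*[(,]\s*(?:a|an|the)\b"


def build_rules(cand: str) -> tuple[list[str], list[str]]:
    """Instantiate the table's patterns for one candidate name."""
    c = re.escape(cand)
    keep = [
        rf"\b{HUMAN}\s+and\s+{NON_HUMAN}\s+{c}\b",   # human and non-human-species <cand>
        rf"\b{c}{PUNCT_DET}\s+{HUMAN}\b",            # <cand> {(, ','} {a, an, the} human
        rf"\b{HUMAN}\s+{c}{PUNCT_DET}",              # human <cand> {(, ','} {a, an, the}
        rf"\b{c}\s+(?:gene|protein)\b",              # <cand> {gene, protein}
        r"\b(?:locus|loci|location|chromosome|chromosomal)\b",
        r"\bgene\b.*\bassociated\b",                 # assumption: '*' = any text
    ]
    remove = [
        rf"\b{NON_HUMAN}\s+{c}\b",                   # non-human-species <cand>
        rf"\b{c}{PUNCT_DET}\s+{NON_HUMAN}\b",        # <cand> {(, ','} {a, an, the} non-human-species
        rf"\b{c}\s+(?:cells?|cultures?)\b",          # <cand> {cell(s), culture(s)}
    ]
    return keep, remove


def keep_candidate(sentence: str, cand: str) -> bool:
    """Return True to keep the candidate gene name, False to filter it out."""
    keep, remove = build_rules(cand)
    if any(re.search(p, sentence, re.IGNORECASE) for p in keep):
        return True   # '+' rules take precedence
    if any(re.search(p, sentence, re.IGNORECASE) for p in remove):
        return False
    return True       # assumption: no rule fired, so the candidate is kept
```

For example, keep_candidate("Rad51, a yeast protein, binds DNA.", "Rad51") returns False, while keep_candidate("Both human and yeast RAD51 bind DNA.", "RAD51") returns True because the '+' rule overrides the '-' match on "yeast RAD51".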