From: Gene mention normalization and interaction extraction with context models and sentence motifs
n | Cause | Evidence or examples |
---|---|---|
False negatives | | Evidence from abstract/closest lexicon entry |
24 | Polluting tokens | spectrin betaIV/spectrin beta non-erythrocytic |
35 | Unrecognized variations (orthographic, lexical, structural, morphological) | DCoHm/DCOHM; prothrombin/thrombin |
4 | Segmentation of name failed | hOBP (IIb)/hOBPIIb |
2 | Syntactically unrelated | polycomblike/PHD finger protein |
66 | Removed by filtering step | |
False positives | | Examples, with EntrezGene ID |
30 | Triggered by wrong name boundary | type II IL-1 receptor |
30 | Context filtering (reference to cell etc.) | CD4+ |
22 | TF*IDF filter | five EGF-like domains; ARC complex |
11 | Disambiguation picked wrong gene | Nup358 (440872 instead of 5903) |
8 | Abbreviation resolution failed | Wolf-Hirschhorn syndrome (WHS) |
4 | Wrong species | Notch1 (...) murine tissues |
2 | Overlap of names not recognized | |
2 | NER missed correct ID | TR2 (8740 instead of 10587) |
26 | Multiple identifiers for one name | |
40 | Other | |