From: Automating curation using a natural language processing pipeline
File type   Normalization   Correct interactions   % of gold
---------   -------------   --------------------   ---------
PDF         Exact           1,204                  59.0
HTML                        1,196                  58.7
            Fuzzy           1,503                  73.7