Table 5 An example for constructing string features

From: Mining physical protein-protein interactions from the literature

Input and processed documents and candidate string features

Details

Input document

The Three Human Syntrophin Genes Are Expressed in Diverse Tissues, Have Distinct Chromosomal Locations, and Each Bind to Dystrophin and Its Relatives

Processed document

the three human syntrophin genes are expressed in diverse tissues have distinct chromosomal locations and each bind to dystrophin and its relatives

Candidate string features

the thr
he thre
e three
three h
hree hu
ree hum
ee huma
e human
...

  1. The substring length is fixed at 7. The example document contains only one sentence (the title of the document with PMID:8576247). A seven-character window moves along the sequential text. All characters are converted to lower case. Only alphabetic letters and the space character are processed; punctuation is converted to the space character.
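
The procedure in the footnote can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `preprocess` and `string_features` are hypothetical, every non-letter character (not only punctuation) is mapped to a space, and runs of spaces are collapsed to a single space so that the output matches the processed document shown in the table. The sketch emits every window, including ones that begin with a space such as " three ", which does not appear among the candidates listed above; whether the original method skips such windows is not stated.

```python
import re


def preprocess(text: str) -> str:
    """Lower-case the text and keep only letters and single spaces."""
    lowered = text.lower()
    # Assumption: any character that is not a-z or a space becomes a space.
    letters_only = re.sub(r"[^a-z ]", " ", lowered)
    # Assumption: collapse runs of spaces, which reproduces the
    # single-spaced processed document shown in the table.
    return re.sub(r" +", " ", letters_only).strip()


def string_features(text: str, width: int = 7) -> list[str]:
    """Return every substring of `width` characters, sliding one character at a time."""
    processed = preprocess(text)
    return [processed[i:i + width] for i in range(len(processed) - width + 1)]


title = (
    "The Three Human Syntrophin Genes Are Expressed in Diverse Tissues, "
    "Have Distinct Chromosomal Locations, and Each Bind to Dystrophin "
    "and Its Relatives"
)
features = string_features(title)
print(preprocess(title))
print(features[:3])
```

A text shorter than the window width yields an empty list here; how the original method treats such documents is not described in the table.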