From: Automating curation using a natural language processing pipeline
Description | Regexp |
---|---|
Capitals, lower case, hyphen then digit | [A-Z]+[a-z]*-[0-9] |
Capitals followed by digit | [A-Z]{2,}[0-9]+ |
Single capital | [A-Z] |
Single Greek character | \p{InGreek} |
Letters followed by digits | [A-Za-z]+[0-9]+ |
Lower case, hyphen then capitals | [a-z]+-[A-Z]+ |
Single digit | [0-9] |
Two digits | [0-9][0-9] |
Four digits | [0-9][0-9][0-9][0-9] |
Two capitals | [A-Z][A-Z] |
Three capitals | [A-Z][A-Z][A-Z] |
Four capitals | [A-Z]{4} |
Five or more capitals | [A-Z]{5,} |
Digit then hyphen | [0-9]+- |
All lower case | [a-z]+ |
All digits | [0-9]+ |
Nucleotide | [AGCT]{3,} |
Capital, lower case then digit | [A-Z][a-z]{2,}[0-9] |
Lower case, capitals then any | [a-z][A-Z][A-Z].* |
Greek letter name | Match any Greek letter name |
Roman digit | [IVXLC]+ |
Capital, lower, capital and any | [A-Z][a-z][A-Z].* |
Contains digit | .*[0-9].* |
Contains capital | .*[A-Z].* |
Contains hyphen | .*-.* |
Contains period | .*\..* |
Contains punctuation | .*\p{Punct}.* |
All digits | [0-9]+ |
All capitals | [A-Z]+ |
Is a personal title | (Mr|Mrs|Miss|Dr|Ms) |
Looks like an acronym | ([A-Za-z]\.)+ |
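
As a rough illustration of how patterns like these could be turned into token-level features, the sketch below applies a subset of the table to single tokens. Several things here are assumptions rather than details from the source: the feature names, the `orthographic_features` helper, and the choice to require a match on the whole token. The `\p{InGreek}` and `\p{Punct}` classes follow Java regex syntax, which Python's `re` module does not support, so they are approximated with the Greek and Coptic block (U+0370 to U+03FF) and ASCII punctuation respectively.

```python
import re
import string

# Subset of the table's patterns, keyed by hypothetical feature names.
# Python's `re` has no \p{...} classes, so two rows are approximated.
FEATURE_PATTERNS = {
    "capitals_lower_hyphen_digit": r"[A-Z]+[a-z]*-[0-9]",
    "capitals_then_digit": r"[A-Z]{2,}[0-9]+",
    "single_capital": r"[A-Z]",
    "single_greek_char": r"[\u0370-\u03FF]",  # stands in for \p{InGreek}
    "letters_then_digits": r"[A-Za-z]+[0-9]+",
    "nucleotide": r"[AGCT]{3,}",
    "roman_digit": r"[IVXLC]+",
    "contains_digit": r".*[0-9].*",
    "contains_hyphen": r".*-.*",
    "contains_period": r".*\..*",
    # stands in for .*\p{Punct}.* (ASCII punctuation only)
    "contains_punctuation": ".*[" + re.escape(string.punctuation) + "].*",
    "all_capitals": r"[A-Z]+",
    "personal_title": r"(Mr|Mrs|Miss|Dr|Ms)",
    "acronym_like": r"([A-Za-z]\.)+",
}

COMPILED = {name: re.compile(pattern) for name, pattern in FEATURE_PATTERNS.items()}


def orthographic_features(token):
    """Return the sorted names of all patterns that match the whole token.

    Whole-token matching (fullmatch) is an assumption; the table itself
    does not say whether patterns are anchored.
    """
    return sorted(name for name, rx in COMPILED.items() if rx.fullmatch(token))


if __name__ == "__main__":
    for tok in ["IL2", "Cdc-2", "GATTACA", "Dr", "e.g."]:
        print(tok, orthographic_features(tok))
```

Under these assumptions a token usually fires several features at once: `IL2`, for instance, matches "Capitals followed by digit", "Letters followed by digits" and "Contains digit". The patterns are not mutually exclusive, which is the expected behaviour for features of this kind.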