By default the gazetteer creates annotations of type Lookup with majorType and minorType features. For example, an entry in the .def file of
oss.lst:software:open_source
would create Lookups with majorType "software" and minorType "open_source" for entries in the list. The usual approach is then to write JAPE rules that process the Lookup annotations and create the final annotations.
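To make the field layout concrete, here is a minimal Python sketch of how a .def entry could be split into its colon-separated fields. It is only an illustration, not GATE's own parser: the names DefEntry and parse_def_entry are hypothetical, and it assumes the optional fourth and fifth fields are language and annotation type, with the annotation type defaulting to Lookup when absent.

```python
from typing import NamedTuple, Optional


class DefEntry(NamedTuple):
    """Hypothetical container for one line of a gazetteer .def file."""
    list_file: str
    major_type: Optional[str]
    minor_type: Optional[str]
    language: Optional[str]
    annotation_type: str


def parse_def_entry(line: str) -> DefEntry:
    """Split a .def line on ':' (illustrative only, not GATE's parser)."""
    fields = line.strip().split(":")
    # Pad short entries so that missing trailing fields become empty strings.
    fields += [""] * (5 - len(fields))
    list_file, major, minor, language, ann_type = fields[:5]
    return DefEntry(
        list_file,
        major or None,
        minor or None,
        language or None,
        ann_type or "Lookup",  # assumed default when no type is given
    )


print(parse_def_entry("oss.lst:software:open_source"))
print(parse_def_entry("oss.lst:software:open_source::Software"))
```

The first call yields an entry whose annotation type is Lookup; the second, with an empty language field and a fifth field, yields Software.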
It is possible to create other annotation types directly from the gazetteer by adding more fields to the .def line:
oss.lst:software:open_source::Software
would create annotations of type Software instead of Lookup (the fields are list file name, major type, minor type, language, and annotation type). In general, though, I'd recommend sticking with Lookup and creating your final annotations with JAPE, so you can add further rules as necessary. The gazetteer blindly annotates every mention of anything in the list, so you often need heuristics to filter the matches down. For example, "Apache" might be considered software most of the time, but not when followed by the word "License".
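The kind of filtering heuristic meant here can be sketched in plain Python. In GATE itself this logic would normally live in a JAPE rule; the code below only simulates the idea on a token list, and the Lookup tuple and filter_software function are hypothetical names introduced for illustration.

```python
from typing import List, NamedTuple


class Lookup(NamedTuple):
    """Hypothetical stand-in for a gazetteer match over token indices."""
    start: int       # index of the first token of the match
    end: int         # index one past the last token of the match
    major_type: str


def filter_software(tokens: List[str], lookups: List[Lookup]) -> List[Lookup]:
    """Keep software Lookups, except "Apache" directly followed by "License"."""
    kept = []
    for lk in lookups:
        if lk.major_type != "software":
            continue
        mention = tokens[lk.start:lk.end]
        next_token = tokens[lk.end] if lk.end < len(tokens) else None
        if mention == ["Apache"] and next_token == "License":
            # Here "Apache" names a licence, not the software.
            continue
        kept.append(lk)
    return kept


tokens = "Apache is released under the Apache License".split()
lookups = [Lookup(0, 1, "software"), Lookup(5, 6, "software")]
print(filter_software(tokens, lookups))
```

Only the first match survives. Keeping the gazetteer output as generic Lookups and applying rules like this afterwards is what leaves room for adding further exceptions later.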
Finally, if you want to add your own gazetteer lists and/or JAPE rules, we recommend that you don't edit the files under plugins/ANNIE directly. Instead, create your own lists.def somewhere else and load it into a separate instance of the gazetteer PR, inserted at the appropriate place in the pipeline.