Just a quick update on the Unlock Text interface I posted about last time… Eric noticed that the ‘gazref’ attribute didn’t point back to the original source gazetteer, so you needed a second query to the server to get that information. That has now been changed, and an extra attribute has been added to the output for locations: ‘source-gazref’.
So, for example, ‘Halicarnassus’ appears in the output (the .lem. files, as explained in the previous post) as:
<ent feat-type="other" gazref="unlock:14186862" id="rb1" in-country="" lat="37.25" long="27.25" pop-size="" source-gazref="plplus:599636" type="location"> <parts> <part ew="w67" sw="w67">Halicarnassus</part> </parts> </ent>
The ‘gazref’ attribute is a local Unlock reference, whereas the ‘source-gazref’ points to Pleiades+ (which was specified as the gazetteer for this run) and corresponds to http://pleiades.stoa.org/places/599636, which is indeed Halicarnassus.
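As a rough illustration of how that link can be followed in code, here is a minimal Python sketch. It parses a trimmed copy of the entity above (only the attributes used here are kept) and turns the ‘source-gazref’ value into a Pleiades URL. The function name `pleiades_url` and the rule "take whatever follows `plplus:` and append it to the places URL" are my own assumptions based on this single example, not a documented part of the Unlock API.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <ent> element shown in the post.
SAMPLE = (
    '<ent gazref="unlock:14186862" id="rb1" '
    'source-gazref="plplus:599636" type="location">'
    '<parts><part ew="w67" sw="w67">Halicarnassus</part></parts>'
    '</ent>'
)

# Base URL taken from the example link in the post.
PLEIADES_BASE = "http://pleiades.stoa.org/places/"


def pleiades_url(ent):
    """Return a Pleiades URL for an <ent> element, or None.

    Assumption: a Pleiades+ reference looks like 'plplus:<id>' and the
    same <id> is used in the Pleiades places URL.
    """
    ref = ent.get("source-gazref", "")
    prefix, _, ident = ref.partition(":")
    if prefix != "plplus" or not ident:
        return None
    return PLEIADES_BASE + ident


ent = ET.fromstring(SAMPLE)
print(ent.get("gazref"), "->", pleiades_url(ent))
```

Entities without a ‘source-gazref’, or with one from a different gazetteer, simply yield `None` here rather than a guessed URL.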
Note, incidentally, that each ‘part’ element inside ‘parts’ has attributes indicating the start word (‘sw’) and end word (‘ew’) of the string identified as a ‘location’ in the tokenised text. The .lem. output files present the entire input text in tokenised format first, followed by standoff XML pointers for the entities, such as locations, that the geoparser found.
Many thanks to Colin at Edina and Claire at LTG for fixing this. We now have an API that will allow you to specify that you want the Pleiades+ gazetteer, and link placename mentions found in your input text directly back to the corresponding entry at Pleiades.
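To make the standoff idea concrete, here is a small Python sketch of resolving an ‘sw’/‘ew’ pair back to the words it covers. It deliberately does not parse a real .lem. file: the token list is a hypothetical stand-in for the tokenised text, given as (word id, text) pairs in document order, and every id other than ‘w67’ is invented for the example. The helper name `span_text` is likewise my own.

```python
def span_text(tokens, sw, ew):
    """Return the text covered by a standoff span.

    tokens: list of (word_id, text) pairs in document order
            (a hypothetical stand-in for the tokenised text).
    sw, ew: ids of the first and last word of the span, inclusive.
    """
    ids = [word_id for word_id, _ in tokens]
    start = ids.index(sw)  # raises ValueError if the id is unknown
    end = ids.index(ew)
    return " ".join(text for _, text in tokens[start:end + 1])


# Hypothetical tokens; only 'w67' comes from the example in the post.
tokens = [
    ("w55", "Herodotus"),
    ("w64", "of"),
    ("w67", "Halicarnassus"),
    ("w81", "wrote"),
]

print(span_text(tokens, "w67", "w67"))
```

The lookup goes by position in the token list rather than by arithmetic on the ids, so it does not depend on how the word ids are numbered.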