Even more unlocked

Just a quick update on the Unlock Text interface I posted about last time… Eric noticed that the ‘gazref’ attribute didn’t point back to the original source gazetteer: you needed a second query to the server to get that information. That has now been changed, and an extra attribute, ‘source-gazref’, has been added to the output for locations.

So, for example, ‘Halicarnassus’ appears in the output (the .lem. files, as explained in the previous post) as:

<ent feat-type="other" gazref="unlock:14186862" id="rb1" in-country="" lat="37.25" long="27.25" pop-size="" source-gazref="plplus:599636" type="location">
 <part ew="w67" sw="w67">Halicarnassus</part>

The ‘gazref’ attribute is a local Unlock reference, whereas the ‘source-gazref’ points to Pleiades+ (which was specified as the gazetteer for this run) and corresponds to http://pleiades.stoa.org/places/599636, which is indeed Halicarnassus.
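If you want to follow that link programmatically, something like the following Python sketch should do it. It parses the entity from the example above (with a closing tag added so the fragment parses on its own) and builds the Pleiades URL from ‘source-gazref’. The function name `pleiades_url` is my own invention for illustration, and treating every `plplus:` value as a Pleiades place ID is an assumption generalised from this one example.

```python
import xml.etree.ElementTree as ET

# Entity fragment based on the example above; the closing </ent> is added
# here only so that the fragment parses as a standalone element.
SAMPLE = (
    '<ent feat-type="other" gazref="unlock:14186862" id="rb1" in-country="" '
    'lat="37.25" long="27.25" pop-size="" source-gazref="plplus:599636" '
    'type="location"><part ew="w67" sw="w67">Halicarnassus</part></ent>'
)

PLEIADES_BASE = "http://pleiades.stoa.org/places/"


def pleiades_url(ent):
    """Return a Pleiades URL for an <ent>, or None if it has no plplus source.

    Assumption: the part after 'plplus:' is the Pleiades place ID.
    """
    source = ent.get("source-gazref", "")
    prefix, _, ident = source.partition(":")
    if prefix != "plplus" or not ident:
        return None
    return PLEIADES_BASE + ident


ent = ET.fromstring(SAMPLE)
url = pleiades_url(ent)
print(ent.findtext("part"), url)
```

Entities whose ‘source-gazref’ is missing or comes from another gazetteer return None rather than a guessed URL.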

Note, incidentally, that the ‘part’ element has attributes indicating the start word (‘sw’) and end word (‘ew’) of the string identified as a ‘location’ in the tokenised text. The .lem. output files present the entire input text in tokenised format first, followed by standoff XML pointers for the entities, such as locations, that the geoparser found.
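As a rough illustration of how the ‘sw’ and ‘ew’ pointers could be resolved back to tokens, here is a small Python sketch. Be warned that the miniature document in it is hypothetical: the `<w id="...">` token markup and the `document`, `text` and `standoff` wrapper names are assumptions made for the example, not the actual .lem. schema, so adjust the element names to match your real output.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document: the <w id="..."> token markup and the
# wrapper element names are assumptions, not the real .lem. schema.
DOC = """<document>
  <text>
    <w id="w66">of</w> <w id="w67">Halicarnassus</w> <w id="w68">,</w>
  </text>
  <standoff>
    <ent id="rb1" type="location" source-gazref="plplus:599636">
      <part sw="w67" ew="w67">Halicarnassus</part>
    </ent>
  </standoff>
</document>"""


def entity_tokens(root):
    """Map each entity id to the token strings from its sw to its ew."""
    words = list(root.iter("w"))
    # Index of each token id in document order.
    position = {w.get("id"): i for i, w in enumerate(words)}
    spans = {}
    for ent in root.iter("ent"):
        part = ent.find("part")
        if part is None:
            continue  # entity without a <part> pointer
        start = position[part.get("sw")]
        end = position[part.get("ew")]
        spans[ent.get("id")] = [w.text for w in words[start:end + 1]]
    return spans


spans = entity_tokens(ET.fromstring(DOC))
print(spans)
```

For a single-word name like Halicarnassus the start and end word are the same token; a multi-word placename would simply give a longer list.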

Many thanks to Colin at Edina and Claire at LTG for fixing this. We now have an API that lets you specify that you want the Pleiades+ gazetteer, and links placename mentions found in your input text directly back to the corresponding entry at Pleiades.


About katefbyrne

Researcher in the Language Technology Group at Edinburgh University.
