Now that the new extended version of the Pleiades name-set based on GeoNames (aka Pleiades+) is available, I’ve altered the Geoparser to use it as the gazetteer in the georesolving step, which works out the geographical location of places mentioned in the text. I’ve posted some sample results, for Book 1 of the HESTIA Herodotus, at http://synapse.inf.ed.ac.uk/~kate/gap/plplusdisplay.html. The page shows the place-names found in the geotagging step and, wherever Pleiades+ had at least one match, the location that the georesolver ranked first.
As this sample shows, there are some erroneous “places” (like “Priam”) and some valid places for which no location was found (like “Egypt”). The first issue is to do with improving precision in the geotagging step, as discussed in my last post. The second issue arises because Pleiades+ does not include modern place-names: Pleiades obviously has Egypt in its dataset, but it resides under the label “Aegyptus”. (Pleiades also prefers the Latinised forms of places to the Greek.)
We are currently trying to work out the best way of dealing with the missing place-names problem. One option is to use GeoNames as a default if no match can be found in Pleiades+, but this solution brings with it the danger of swamping the user with contemporary place-names of the kind that we wouldn’t expect to find in ancient texts. A neat idea suggested by Leif is to look up missing places like Egypt in GeoNames, and then try to match the alternative names listed there against Pleiades+. In principle this would work for Egypt, because “Aegyptus” is indeed one of the alternative names, but a lookup on “Egypt” alone would miss it, because “Aegyptus” is only listed as an alternative for “Arab Republic of Egypt”, not for “Egypt” itself. Over the next week or so I’m going to investigate whether this option is feasible within the geoparser’s architecture. (We may find there are simply too many alternative names to handle: in general every place-name has multiple candidates in GeoNames, each of which in turn has multiple alternative forms.)
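To make Leif’s suggestion concrete, here is a minimal sketch in Python of how such a fallback might look. It is not the Geoparser’s actual code: the record layout, the function names and the tiny in-memory data are all assumptions made for illustration, and the Pleiades+ identifier is a placeholder. The sketch also assumes that the missing name turns up among either the primary or the alternative names of some GeoNames record.

```python
from typing import Dict, List, Tuple

# Toy stand-ins for the two resources (hypothetical layout, not real data).
GEONAMES_TOY: List[Dict] = [
    {"name": "Arab Republic of Egypt", "alternates": ["Egypt", "Aegyptus"]},
]
PLEIADES_PLUS_TOY: Dict[str, str] = {"aegyptus": "toy-id-1"}  # placeholder id


def geonames_candidates(name: str, records: List[Dict]) -> List[Dict]:
    """Return records whose primary OR alternative names match the query."""
    key = name.casefold()
    return [
        rec for rec in records
        if key == rec["name"].casefold()
        or key in (alt.casefold() for alt in rec["alternates"])
    ]


def resolve_via_alternates(
    name: str, records: List[Dict], gazetteer: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Try the gazetteer directly; failing that, try every name that
    GeoNames lists for each candidate record."""
    key = name.casefold()
    if key in gazetteer:
        return [(name, gazetteer[key])]

    matches: List[Tuple[str, str]] = []
    seen = set()
    for rec in geonames_candidates(name, records):
        for alt in [rec["name"]] + rec["alternates"]:
            alt_key = alt.casefold()
            if alt_key in gazetteer and alt_key not in seen:
                seen.add(alt_key)
                matches.append((alt, gazetteer[alt_key]))
    return matches


print(resolve_via_alternates("Egypt", GEONAMES_TOY, PLEIADES_PLUS_TOY))
```

Searching the alternative names as well as the primary ones is what lets “Egypt” reach the “Arab Republic of Egypt” record here. The inner loop is also where the cost worry shows up: the number of strings to test grows with candidates times alternatives.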
Another intriguing idea has been proposed by Prof. Bruce Robertson of Mount Allison University (Canada), who acts as a Technical Observer for the Pleiades Project: he wonders whether we could make effective use of the Latin Wikipedia, especially given that Pleiades has a penchant for Latin names. Indeed, in our case, this resource would find Egypt using the string “Aegyptus”. Additionally, each page gives the latitude and longitude of the relevant entity at the top, which at the very least could be used as a sanity check!
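One way such a sanity check could work is to compare the coordinates chosen by the georesolver with the reference coordinates from the page, and flag the result if the two are implausibly far apart. The sketch below is only an illustration under assumptions of my own: the function names and the 200 km threshold are arbitrary, and nothing here fetches or parses Wikipedia pages.

```python
import math
from typing import Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two lat/long points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_plausible(
    resolved: Tuple[float, float],
    reference: Tuple[float, float],
    max_km: float = 200.0,  # arbitrary threshold, chosen for illustration
) -> bool:
    """True if the resolved location lies within max_km of the reference."""
    return haversine_km(*resolved, *reference) <= max_km
```

A fixed threshold is crude, since a reasonable tolerance for a country is very different from one for a town, so in practice it would probably need to vary with the type of place.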
In the end, however, it may turn out that we need an enhanced version of Pleiades+ – a Pleiades++ as it were – which would contain the kinds of names that we expect to occur.