We’ve now finished the geoparsing work – all the source texts have been put through the pipeline to identify place-names (geotagging) and provide spatial co-ordinates for them (georesolution). Geoparsing is the first step for GAP, providing the material for the various visualisations.
Previous posts have described the story so far on the geoparsing side of the project:
- processing the Hestia Herodotus data for the Edinburgh Geoparser and working out how to evaluate the geotagging step, using the hand-annotated Herodotus text as a gold standard for toponym identification (described here);
- setting up a local Pleiades+ database and experimenting with making it cross-searchable with Geonames (described here);
- analysing the georesolution step to work out how to improve it (described here);
- doing the same for the geotagging step (described here).
In my next posts I’ll bring things up to date by describing what I’ve been doing over the past few months to complete my end of the project:
- Improving the geotagging.
- Improving the georesolution.
- Using the tuned system on Google Books and Open Library texts.