Filling in some Gaps in GAP

Sometimes there’s nothing like a little face-to-face time to get a project going, even one that’s dealing predominantly with on-line material and involving collaborators from around the globe. So last week I flew to the UK to work first with Elton and Leif in Oxford, and then with Kate in Edinburgh. It was surprising just how much we could achieve in one week, and how much fun we could have in the process!

First off, it’s important to highlight the extent to which GAP builds on an impressive body of prior work. In fact, much of this initial visit was spent learning about, understanding and marshaling the resources we will use to identify ancient places in the Google Books corpus. For example, I quickly learned that:

HESTIA had already identified some 757 places in the Histories of Herodotus. HESTIA itself built upon prior digitization work by the Perseus Digital Library at Tufts University, which supplied HESTIA with its digital text of Herodotus in the first place;

The places identified in HESTIA are only the tip of the iceberg: with GAP, we want to identify far more places in far more books (ultimately the entire Google Books corpus). To do this, we need a larger database of places. Fortunately, the prior (and ongoing) work of the Pleiades Project provides this larger database.

In many ways, HESTIA represents a microcosm of what we hope to achieve with GAP. Because it has already identified a good number of ancient places in one text, it provides an excellent dataset for experimentation. We therefore intend to use the HESTIA data to test ideas about what we might do with the potentially thousands of ancient places we may identify across many thousands of books in the Google corpus.

Identifying places is one thing; deciding what to do with them once identified is another. Since I am a novice to HESTIA and the Histories, I thought it might be interesting to see whether there was any meaningful relationship between where placenames appear in the text and the geospatial connections between those places. It turns out that there may well be a number of significant relationships, but they reflect many different semantic dimensions that go beyond geospatial proximity alone. For example, the following placenames appear in proximity to the placename “Byzantium” in the Histories:

Bosporus (9), Hellespont (6), Miletus (5), Plataea (3), Euxine (3), Thrace (3)

The first two places, the Bosporus and the Hellespont, are geographically close to Byzantium. However, consider the placenames that appear in proximity to “Sparta”, a place that is much more prominent than Byzantium in the Histories:

Lacedaemon (50), Athens (45), Delphi (42), Hellas (39), Aegina (37), Asia (21), Susa (21)

In the case of Sparta, places that appear nearby in the text of the Histories tend not to have much to do with geospatial proximity. Instead, the place terms seem to be much more indicative of key political and military relationships having to do with the Greco-Persian Wars. (“Lacedaemon” here is a special case, since it is a synonym for Sparta.)
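One simple way to produce counts like those above is to tally which other placenames fall within a fixed distance of each mention of a target place. The sketch below is a hypothetical illustration, not the procedure actually used for the Byzantium and Sparta figures: it assumes each place mention has already been resolved to a name and a token offset, and the function name `cooccurring_places` and the 50-token default window are assumptions chosen for the example. The sample data is made up.

```python
from collections import Counter


def cooccurring_places(mentions, target, window=50):
    """Count placenames mentioned within `window` tokens of any mention of `target`.

    Hypothetical sketch: `mentions` is a list of (token_offset, placename) pairs.
    Each mention of another place is counted at most once, even if it is near
    several mentions of the target.
    """
    target_positions = [pos for pos, name in mentions if name == target]
    counts = Counter()
    for pos, name in mentions:
        if name == target:
            continue  # skip the target place itself
        if any(abs(pos - t) <= window for t in target_positions):
            counts[name] += 1
    return counts


# Made-up mentions, purely for illustration.
mentions = [
    (10, "Byzantium"), (15, "Bosporus"), (40, "Hellespont"),
    (300, "Sparta"), (320, "Byzantium"), (330, "Bosporus"),
]
print(cooccurring_places(mentions, "Byzantium").most_common())
# [('Bosporus', 2), ('Hellespont', 1), ('Sparta', 1)]
```

Synonyms such as “Lacedaemon” would need to be mapped to a single canonical name before counting if they are to be treated as the same place; the choice of window (tokens, sentences or paragraphs) will also shape which relationships surface.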

It will be interesting to see how these results from HESTIA and the Histories compare with results from the much larger Google Books corpus. Here are some issues we want to explore:

Will the proximity (in text) of placenames show any historical significance?

Will we be able to note changes in the geospatial orientation of research in Classics over the centuries? Can we even identify different national traditions of classical scholarship, where, for example, certain regions may be favored for discussion in German literature as opposed, say, to French or English?

But first things first: we have to identify the ancient places. Looking ahead to scaling up beyond HESTIA and the Histories to other books in the Google Books corpus, we spent a great deal of effort reconciling HESTIA places with the places currently being compiled by the Pleiades project. The Pleiades gazetteer is largely based on the Barrington Atlas, arguably the key reference work on Greco-Roman geography, and has amassed a database of some 35,000 places. Our first major task has been to relate all of the HESTIA places to those covered by Pleiades, which we have just about done! Not only does this allow us to cite Pleiades (a key Web-based source), but it also means that we can add the toponyms used in HESTIA to the name variants provided by the Pleiades gazetteer. Along these lines, we have also related some 47,000 additional toponyms (from GeoNames) to Pleiades place entities. As a result, we have a much richer database of toponyms that we can use when we index the Google Books corpus.
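The reconciliation described above amounts to building a lookup table from every known name variant to the gazetteer entities it might denote. The sketch below is a hypothetical illustration, not GAP’s actual pipeline: it assumes toponyms from each source (Pleiades, HESTIA, GeoNames) arrive as simple `(place_id, toponym)` pairs, the normalization rule (case-folding and stripping diacritics) is an assumption, and the IDs such as `P-SPARTA` are invented placeholders rather than real Pleiades identifiers.

```python
import unicodedata
from collections import defaultdict


def normalize(name):
    """Case-fold and strip diacritics so that 'Athēnai' and 'athenai' match (assumed rule)."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold().strip()


def build_toponym_index(sources):
    """Map each normalized toponym to the set of place IDs it may refer to.

    `sources` is any iterable of (place_id, toponym) pairs, e.g. names drawn
    from a gazetteer plus variants contributed by other datasets.
    """
    index = defaultdict(set)
    for place_id, toponym in sources:
        index[normalize(toponym)].add(place_id)
    return index


def lookup(index, name):
    """Return candidate place IDs for `name`, or an empty set if it is unknown."""
    return index.get(normalize(name), set())


# Hypothetical IDs, for illustration only.
sources = [
    ("P-SPARTA", "Sparta"),
    ("P-SPARTA", "Lacedaemon"),
    ("P-ATHENS", "Athens"),
    ("P-ATHENS", "Athēnai"),
]
index = build_toponym_index(sources)
print(sorted(lookup(index, "athenai")))  # ['P-ATHENS']
```

A name that maps to more than one ID (for instance, a placename shared by several ancient towns) flags an ambiguity that would have to be resolved from context when indexing the Google Books corpus.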


About erickansa

Working to do the right thing with open data, on Open Context (http://opencontext.org)
This entry was posted in progress.
