Text analysis
The document index provides access to plain-text versions of all the texts in the collection (in UTF-8 encoding) through simple URLs, so that text-analysis tools can be run against them. Text analysis across the whole collection, or across subsets of it, will most likely be more interesting, however, and we will provide suitable links here in future. For the moment, plain-text and XML versions of the whole collection are available:
All editorial content (annotations, metadata, notes, etc.) has been stripped from the plain-text versions, leaving only the original text; for marked-up images, this means only the text that appears as part of the engraving. The XML corpus, by contrast, retains all headers and editorial annotations.
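Because the plain-text versions are served as UTF-8 through simple URLs, they can also be pulled straight into a script. The sketch below is a minimal illustration only, assuming Python 3.9+ and a simple word-frequency count as the analysis. The helper names `fetch_plain_text` and `word_frequencies` are hypothetical and not part of the archive or of any tool mentioned here, and the URL you pass in must be one of the collection's own plain-text links.

```python
import re
import urllib.request
from collections import Counter


def fetch_plain_text(url: str) -> str:
    """Download a plain-text file and decode it as UTF-8.

    Hypothetical helper: `url` should be one of the collection's
    plain-text links from the document index.
    """
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def word_frequencies(text: str, top: int = 10) -> list[tuple[str, int]]:
    """Return the `top` most common words as (word, count) pairs.

    Text is lowercased, and a "word" is any run of Unicode letters,
    so accented characters are kept but digits, underscores and
    apostrophes act as separators.
    """
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(words).most_common(top)
```

For example, `word_frequencies(fetch_plain_text(url), top=20)`, with `url` set to a plain-text link from the index, lists the twenty most frequent words in that text. Matching letters only keeps accented characters intact but splits forms such as "o'er" into two tokens; dedicated text-analysis tools will usually tokenize more carefully.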
A simple way to get started is to plug one of these URLs into the TAPoR Tools available here:

