OCR’d: A Draft of XXX Cantos

I have completed the OCR work on A Draft of XXX Cantos, and though the scans were of a high quality and there were almost no images, Pound’s text presents some unique challenges. The primary challenge for the program, ABBYY 11, is that sections of the text are in so many different languages. Sections of French, Italian, Latin, and ancient Greek (more on this in a minute) mean ABBYY is continually trying to read words it doesn’t recognize. For instance, if a French word has accents and ABBYY reads it as English, not only does the program flag that section of text as misspelled, but those accents are lost. It took me far too long to figure out how to switch between languages in ABBYY and then how to have ABBYY do it automatically, but by the last third of the text, ABBYY was reading any modern language with relative ease. (I will say that I feel ABBYY’s English dictionary is occasionally disappointing; I’m not fluent enough in any other language to make that assessment.)

There are sections of these early cantos that are in ancient Greek, however, and they present problems I am not currently sure how to solve. Because the ancient Greek alphabet is not recognized by ABBYY (so far as I know), not only will the program flag those words as misspelled, but it won’t read them at all. ABBYY will translate the graphemes into more familiar contemporary Western letters (or symbols like #). This means that the text file will have nonsensical and, more importantly, non-searchable sections. My provisional solution to this issue is to type Carroll Terrell’s translation of these sections, from his Companion to the Cantos of Ezra Pound, into the text file. This way the Greek will be visible in the PDF for anyone who searches for that section of text. Going forward, I have a few small concerns related to this solution.
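For anyone facing the same problem, here is a minimal sketch in Python (not something ABBYY provides, and not part of my actual workflow) of how one might locate the garbled passages in an exported txt file so they can be corrected by hand. The function names and the list of suspect symbols are my own assumptions; the symbol list would need tuning for a given text.

```python
import unicodedata

# Assumption: symbols that rarely occur in the poem itself but often
# show up when the OCR engine misreads an unfamiliar script.
SUSPECT_SYMBOLS = set("#@^~|\\{}<>")


def has_greek(text):
    """Return True if any character's Unicode name marks it as Greek."""
    return any("GREEK" in unicodedata.name(ch, "") for ch in text)


def flag_suspect_lines(lines, symbols=SUSPECT_SYMBOLS):
    """Return (line_number, line) pairs for lines containing a suspect symbol."""
    flagged = []
    for number, line in enumerate(lines, start=1):
        if any(ch in symbols for ch in line):
            flagged.append((number, line.rstrip("\n")))
    return flagged


# Illustrative sample only; none of these lines come from The Cantos.
sample = [
    "a line of ordinary English verse",
    "y#~s mo|o# garbled output",
    "λόγος",
]

for number, line in flag_suspect_lines(sample):
    print(f"line {number}: {line}")
```

A check like `has_greek` could also confirm, after manual correction, which lines of the text file actually contain Greek characters rather than Latin look-alikes.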

As so many critics, including Terrell himself, remark, Pound’s translations from different languages are idiosyncratic, evolving, and highly important to understanding the context in which these passages appear (especially for the relational, emergent meaning-making in a text like The Cantos). Terrell’s translations of these sections of Greek (or those of whoever he draws from in his annotations) have the potential to be different enough from Pound’s understanding of a given passage to alter the reader’s understanding of the text as a whole. Naturally, these concerns are common to any translation; however, they will become more of an issue when Chinese ideograms appear in later sections of The Cantos. There are no ideograms in A Draft of XXX Cantos, and when they do appear in the later cantos Terrell is very good at pointing out how scholars feel Pound translated the figures as well as presenting more standard translations. On the other hand, this creates a specific technical problem for my project. If there are multiple ways of translating an ideogram and I choose one to type into the text file (because ABBYY won’t read them, as with the ancient Greek), I run the risk of foreclosing interpretive avenues for those using the texts I’ve generated. This is not an apocalyptic concern, but something I’m becoming aware of as I move forward with the project. Below is a link to the PDF:

Page 62 from “Canto 14” with ancient Greek text

Here is a link to the text file with Terrell’s translation:

This is the txt file with Terrell’s translation of the ancient Greek, which is “picture of the earth”

One thing I’ve been doing, and this is less a solution than a way of leaving myself room to go back and change things, is to save the FineReader files ABBYY creates. These files allow me to go back and re-save any of the file types (PDF, HTML, txt) with different settings and then replace files I feel are outmoded or that reflect my own mistakes and flawed methodologies. At any rate, so far this has been a learning experience and presents many challenges I haven’t encountered doing similar types of work for the Modernist Journals Project.

The next stage is to learn more advanced TEI encoding to attach metadata to the text. Luckily, the TEI Consortium provides amazing (free) tools for learning TEI methods. For now, however, the heavy-duty computing necessary to run ABBYY and handle large PDFs is complete.
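As a rough illustration of where TEI might help with the translation concern above, here is a small Python sketch that wraps a foreign-language passage in a TEI `foreign` element and attaches more than one gloss as `note` elements, so no single translation has to stand in for the original. This is only a sketch under my own assumptions: the function name, the `#terrell` and `#other` source pointers, and the sample word are hypothetical, and the glosses are not taken from Terrell or from The Cantos.

```python
import xml.etree.ElementTree as ET

# ElementTree writes this namespaced key out as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def encode_foreign(text, lang, glosses):
    """Build a TEI-style verse line holding a foreign passage and its glosses.

    glosses is a list of (source_pointer, translation) pairs; each pair
    becomes its own <note>, so alternative readings sit side by side.
    """
    line = ET.Element("l")
    foreign = ET.SubElement(line, "foreign", {XML_LANG: lang})
    foreign.text = text
    for source, translation in glosses:
        note = ET.SubElement(line, "note", {"type": "gloss", "source": source})
        note.text = translation
    return ET.tostring(line, encoding="unicode")


# Illustrative values only: "grc" is the language code for ancient Greek.
encoded = encode_foreign(
    "λόγος",
    "grc",
    [("#terrell", "word"), ("#other", "reason")],
)
print(encoded)
```

Keeping each gloss in a separate `note` with its own source would let a reader of the encoded text see that a translation is one scholar's reading rather than a fixed equivalent.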
