Archive

Tools

This morning I was notified that I have received a second DHSI tuition scholarship for June 2015. This means I will be taking Text Encoding Fundamentals from June 8-12 and then XSLT: A Collaborative Approach from June 15-19. First of all, I want to thank DHSI for offering these classes, and also the various organizations that contribute funds to make these scholarships possible. Without the funding, it would likely be too expensive for someone on my budget to attend one class, let alone two.

I applied for these two classes because they will provide practical skills I need to complete the TEI encoding of Ezra Pound’s A Draft of XXX Cantos that I am working on for my dissertation. As I’ve outlined in other posts, this project is part of a hybrid digital and written chapter of my dissertation in which I attempt to enact some of the textual dynamics I argue Pound employs in the XXX Cantos. The TEI version of the XXX Cantos I have created thus far is a specific interpretation of how the text works. The coding structure I’ve developed encodes specific types of information that I’ve gathered under the conceptual umbrella of intertextual references. The chief goal is to use TEI to encode associational relationships between informational elements and to use these tags to represent how the text is generated procedurally. In other words, what is on the page of the XXX Cantos is the result of informational processing that “calls” another associationally related piece of information to the page.

In order to do this, I have settled on tags that indicate person names, place names, ventriloquisms and allusions to other texts, and quotations, and I have tried to block off the different sections of Pound’s text that are associated with these intertextual sources. In addition to the structural tags for line numbers, page breaks, Canto breaks, and so forth, these are the tags I have settled on for now:

<persName key="OVI1" reg="Ovid">Ovidio</persName>
<placeName>Tuscany</placeName>
<cit type="quotation" text="The Odyssey" author="Homer" part="12.200-205">… </cit>
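As a rough illustration of what tags like these make possible, here is a minimal sketch in Python that builds an index of person names from an encoded fragment. The sample markup, the `<lg>` and `<l>` wrappers, and the function names are invented for this example and are not taken from my encoding of the XXX Cantos. The sketch also assumes the elements carry no XML namespace; a full TEI file normally declares the TEI namespace, so the tag lookups would need to account for that.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented sample fragment (not Pound's text), using tags like those above.
SAMPLE = """<lg>
  <l n="1">And <persName key="OVI1" reg="Ovid">Ovidio</persName> in
  <placeName>Tuscany</placeName></l>
  <l n="2"><persName key="OVI1" reg="Ovid">Ovid</persName> again</l>
</lg>"""


def index_person_names(xml_text):
    """Map each persName key to the surface forms that appear in the text."""
    root = ET.fromstring(xml_text)
    index = defaultdict(list)
    for elem in root.iter("persName"):
        # Hypothetical fallback label for names encoded without a key.
        key = elem.get("key", "UNKEYED")
        index[key].append("".join(elem.itertext()).strip())
    return dict(index)


def list_place_names(xml_text):
    """Return the text of every placeName element, in document order."""
    root = ET.fromstring(xml_text)
    return ["".join(e.itertext()).strip() for e in root.iter("placeName")]


if __name__ == "__main__":
    print(index_person_names(SAMPLE))
    print(list_place_names(SAMPLE))
```

Grouping by the `key` attribute is what lets variant spellings such as “Ovidio” and “Ovid” resolve to the same person, which is the kind of associational link the tags are meant to record.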

The other half of this argument is that Pound was very focused on textual dynamics as they materialize on the page as an interface. This draws on my previous class at DHSI, Digital Humanities Databases, in which I learned about the components of databases, the most essential of which is a relay between the user interface and the informational infrastructure. This is to say that my dissertation’s argument about information, interface, and modernist textual effects in the work of H. D., Joyce, and Pound is tightly bound up with the practical skills I have been and will be able to acquire at DHSI.

Developing the encoding structure and methodology for my TEI of the XXX Cantos has proven a bit more troublesome than I anticipated.  The tags I have listed above have been changed back and forth between other options multiple times as I’ve read the TEI P5 Guidelines and the Gentle Introduction.  These guides are essential tools and I would be nowhere without them, but I need to describe the goals of my project to someone and work with them to arrive at a way to achieve them.  I have had some difficulty determining which tags to use, which validation schema to insert into my header, and how to make the cleanest, most translatable version of the text that I can.  This is all toward the goal of using languages such as XSLT to process the TEI encoded text I’ve created.  In this sense then, I’ve chosen classes for 2015 that will help me solidify my technical foundation as well as teach me the skills and concepts I will need to complete the project.


I have completed the OCR work on A Draft of XXX Cantos, and though the scans were of a high quality and there were almost no images, Pound’s text presents some unique challenges. The primary challenge for the program, ABBYY 11, is that sections of the text are in so many different languages. Sections of French, Italian, Latin, and ancient Greek (more on this in a minute) mean ABBYY is continually trying to read words it doesn’t recognize. For instance, if a French word has accents and ABBYY reads it as English, not only does the program flag that section of text as misspelled, but those accents are lost. It took me far too long to figure out how to switch between languages in ABBYY and then how to have ABBYY do it automatically, but by the last third of the text, ABBYY was reading any modern language with relative ease. (I will say that ABBYY’s English dictionary is occasionally disappointing; I’m not fluent enough in any other language to make that assessment.) There are sections of these early cantos that are in ancient Greek, however, and they present problems I am not currently sure how to solve. Because the ancient Greek alphabet is not recognized by ABBYY (so far as I know), not only will the program flag those words as misspelled, but it won’t read them at all. ABBYY will render the graphemes as more familiar contemporary Western letters (or symbols like #). This means that the text file will have nonsensical and, more importantly, non-searchable sections. My provisional solution to this issue is to type Carroll Terrell’s translation of these sections, from his Companion to the Cantos of Ezra Pound, into the text file. This way the Greek will be visible in the PDF for anyone who searches for that section of text. Going forward, I have a few small concerns related to this solution.
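One way to find the passages that need this kind of manual attention is to scan the text file for the substitute symbols. The following is only a minimal sketch in Python under stated assumptions: the set of suspect symbols contains just “#” (the one symbol mentioned above), the function names are my own, and the sample string is invented and not drawn from the actual OCR output.

```python
import unicodedata

# Assumption: "#" stands in for whatever substitute symbols the OCR output uses.
SUSPECT_SYMBOLS = {"#"}


def contains_greek(text):
    """Return True if any character's Unicode name marks it as Greek."""
    return any("GREEK" in unicodedata.name(ch, "") for ch in text)


def flag_suspect_lines(text):
    """Return (line_number, line) pairs for lines containing a suspect symbol."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if any(ch in SUSPECT_SYMBOLS for ch in line)
    ]


# Invented sample, not taken from the OCR output of the Cantos.
SAMPLE = "a clean line of verse\nyn# #nq garbled\nanother clean line"

if __name__ == "__main__":
    for number, line in flag_suspect_lines(SAMPLE):
        print(number, line)
```

The `contains_greek` helper is there for the opposite check: confirming whether a line in a corrected file actually contains Greek characters rather than Latin look-alikes.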

As so many critics, including Terrell himself, remark, Pound’s translations of different languages are idiosyncratic, evolving, and highly important to understanding the context in which these passages appear (especially for the relational, emergent meaning-making in a text like The Cantos). Terrell’s translation of these sections of Greek (or that of whoever he draws from in his annotations) has the potential to be different enough from Pound’s understanding of a given passage to alter the reader’s understanding of the text as a whole. Naturally, these concerns are not unique to this project; they apply to translations generally. However, they will become more of an issue when Chinese ideograms appear in later sections of The Cantos. There are no ideograms in A Draft of XXX Cantos, and when they do appear in the later cantos, Terrell is very good at pointing out how scholars feel Pound translated the figures as well as presenting more standard translations. On the other hand, this creates a specific technical problem for my project. If there are multiple ways of translating an ideogram and I choose one to type into the text file (because ABBYY won’t read them, as with the ancient Greek), I run the risk of foreclosing interpretive avenues for those using the texts I’ve generated. This is not an apocalyptic concern, but something I’m becoming aware of as I move forward with the project. Below is a link to the PDF:

Page 62 from “Canto 14” with ancient Greek text

Here is a link to the text file with Terrell’s translation:

This is the txt file with Terrell’s translation of the ancient Greek, which is “picture of the earth”

One thing I’ve been doing, and this isn’t a solution so much as a way of leaving myself room to go back and change things, is to save the Finereader files ABBYY creates. These files allow me to go back and resave any of the types of files (PDF, HTML, txt) I make with different settings and then replace files I feel are outmoded or represent my own mistakes and flawed methodologies. At any rate, so far this has been a learning experience and presents many challenges I haven’t encountered doing similar types of work for the Modernist Journals Project.

The next stage is to learn more advanced TEI encoding to attach metadata to the text. Luckily, the TEI Consortium provides amazing (free) tools for learning TEI methods. For now, however, the heavy-duty computing necessary to run ABBYY and handle large PDFs is complete.


I am in the very beginning stages of a new digital project in which I will put the skills I have learned as a member of the Modernist Journals Project staff to use on A Draft of XXX Cantos.  I will use ABBYY 11 to OCR A Draft of XXX Cantos (thanks to TU Copy for those scans).  The Finereader files ABBYY creates will allow me to create viewable and searchable PDFs of the text for the future.  For now, I am most interested in the text files ABBYY creates.  I will be pulling the full text of these early cantos into Oxygen XML Editor and then using TEI markup language to create new ways of searching, arranging, and most importantly, sharing the text.  The project will end up being part of a dual format chapter in my dissertation tentatively entitled Database Modernism.  My dissertation examines the role of what I’m calling database aesthetics as well as textual and formatting experimentation in modernist literature.  Using TEI to attach metadata to Pound’s Cantos is an attempt to enact the text’s interfacing ambitions, close the gap between print and digital ontologies, and describe how these innovations of literary modernism on both the formal and content levels forecast late twentieth and early twenty-first century media culture.

This is the first in what I hope will be a series of posts about research, writing, and analysis with digital tools as I begin the dissertation process.

Having recently passed my comprehensive exams, I began doing some preliminary reading for my abstract/prospectus and eventually my dissertation. The first few texts I read were Bruno Latour’s Reassembling the Social, Matthew Kirschenbaum’s Mechanisms, Wolfgang Ernst’s Digital Memory and the Archive, Sven Spieker’s The Big Archive, Katherine Hayles’ How We Think, and Alexander Galloway’s The Interface Effect. Most of that reading is, at least for the way I work, about comprehension and understanding the framework for their approaches. For theory-type reading (as opposed to primary texts or literary criticism), I almost always have to go back and reread, and taking notes on these texts often isn’t a productive use of time (in-the-moment marginal notation seems more useful). Now that I’m reading more texts on and by the various authors I’m thinking about including in my dissertation (H.D., Pound, Eliot, Joyce, Williams, Tolson, Stein, etc.), I am starting to take what I feel are more traditional textual notes, ways to find things of interest later on. So far I’ve read three or four texts and tried two different ways of taking notes: Google Drive/Docs and Zotero. Of course, each platform has its positives and negatives.

Positives:

Google Drive/Docs: The main positives Google products provide me with are versatility and mobility. Because both my phone and tablet run on Android, I am able to seamlessly access and add to my Google files in any environment. This means that I can take a tiny idea and put it in Google Keep, or do long-form, detailed notations of poetry, novels, and films. I can do it on my phone quickly, and I can access those notes at almost any time. Also, Google provides huge, seemingly infinite, amounts of free storage, and I can embed the docs into this WordPress site with minimal expertise, which, as a neophyte digital humanist, is essential for me. The user-friendliness Google provides makes it extremely valuable.

Zotero: The main benefit of Zotero is that I am able to use it for a much deeper tracking of my reading and research activity. It goes without saying how amazing a (free) product Zotero is; just the collection of data from webpages is terrific, but, in addition, with the new standalone version that operates in Chrome (having it on Chrome is, in itself, great), I am able to do a lot more things with the data I collect (instant timelines!). Most importantly, I am able to attach metadata and then export it to different formats with that metadata intact. I’m not just talking about auto-generated bibliographies, but also exporting to spreadsheets, timelines, MODS records, TEI, etc. Having that metadata is essential for using programs like Gephi, some of the items in the MIT SIMILE suite, etc.

Negatives:

Google Drive/Docs: There is no automated metadata. This means that whenever I want to transfer the work I do with texts from one place to another, I have to record the metadata manually (which is mostly limited to content tagging) at each step. Of course, this can be avoided by using spreadsheets, but even then, they’re a brittle, often reductive format that requires information translation that proves costly (in terms of lost nuance and specificity; I’m continually fitting things into categories or endlessly granulating them), and is therefore less than ideal. This metadata problem probably outweighs the convenience benefits of Google products.

Zotero: The only problem I have with Zotero is that it’s not that portable. If I want to use Zotero, it’s necessary to bring my laptop wherever I read (which isn’t actually that inconvenient, but I tend to get distracted by an available internet terminal, as this blog post testifies; I should be reading). Unless I’m on my computer, it’s difficult to choose which collection to save bibliographic entries to, and I also can’t take notes or create tags associated with those entries. I have played with Zotero’s mobile site some and I bought the Zandy Android app, both of which I feel are admirable efforts to mobilize a superb and hugely useful (again, free) program, but I find the interface slows down the research process enough to make it somewhat prohibitive. I should add that I probably haven’t done due diligence with Zotero’s capabilities, and so as I explore, perhaps these problems will solve themselves. I will also continue to test Zandy and the mobile site.

Takeaway: I still haven’t used many note-taking products.  Evernote seems the next logical step and I just downloaded it after I learned it has connectivity with Zotero.  As I experiment and work through my dissertation project I hope to find a way to adapt my research methods to fit the tools I have available.  I am also currently trying out different visualization and analytic tools and it stands to reason that each one will work better with different data collection methods.
