A couple of years ago I was approached by the staff at the Modernist Versions Project (MVP) at the University of Victoria in Victoria, British Columbia, to help digitize a rare copy of James Joyce’s Ulysses. The copy in question is held in the University of Tulsa’s McFarlin Library Special Collections as part of its extensive Joyce collection. The MVP’s aim was to include this variant text in its Algorithmic Ulysses text analysis tool. Having been on the staff of the Modernist Journals Project (MJP) for just about two years by then, I was familiar with the processes involved in digitization. For this project, however, I had to expand my comfort zone a bit. My work at the MJP to that point had been OCR work in ABBYY FineReader, TEI text encoding and MODS record generation in oXygen, and PDF creation in Acrobat. None of this was too technical, but it was all digital. I had no contact with the paper copies of the magazines (McClure’s, The Masses, The Smart Set, and Camera Work), because they were scanned by a private company on the East Coast.
The digitization of the Roth Ulysses, on the other hand, required me to get my hands dusty working with the actual text. I worked alongside Special Collections librarians who taught me the procedures and conventions of working with delicate and rare texts. The process was fairly simple in theory, but in practice it involved a lot of trial and error, retaken pictures, data management adjustments, and image tweaks. Oh, and lots and lots of climbing up and down from a chair to take the photos.
The equipment we used included two digital cameras, both borrowed from TU English Department faculty; the small page-turning wand I became very familiar with during the process; and an image-capture apparatus consisting of a cradle for the book, two angled sheets of plexiglass to flatten the pages, and a box frame on which all of this moved. It looked something like this:
The process itself was more difficult than I anticipated. We struggled to get the lighting right and ended up with reflections of each camera appearing in the images because of glare on the plexiglass (you can kind of see it on the cover image above). We were also very cautious with the physical book because its spine was quite delicate. This made us hesitant to use the plexiglass to flatten the pages, because the more we forced the pages against the glass, the more pressure the spine had to endure. Eventually, we settled on a set of practices that worked tolerably well, and the book itself suffered no lasting stress. As a side note, throughout the entire process the Special Collections staff were not only tireless in helping me (they turned every one of the more than 600 pages) but also taught me a great deal about handling fragile artifacts.
The photography itself may not sound all that interesting, but it was certainly the part of the process that required the most effort on my part. Both cameras were digital Nikons with more than enough resolution for our needs, but for several reasons they autofocused with different levels of precision. This meant I had to check nearly every image before moving on to the next page. Partly this was to make sure we had a reliable enough image of the text for OCR, but it was also because Special Collections' payoff for the project was a digital copy they could use in potential online exhibits. So not only did the resolution need to be good enough for ABBYY FineReader to generate a reliable .txt file for integration into the Algorithmic Ulysses, but the images also had to be presentation-worthy for possible digital exhibits and online samples.
To make sure each image was of sufficient quality, I had to climb onto a chair to view each one and delete it if it wasn’t acceptable. After a while this became habit, but it underscored one of the major differences between this kind of digitization and the work I had done at the MJP to that point: working with archival materials is a physical process. I can testify to the sometimes perverse resistance our software programs would put up as we digitized magazines, but there is something about the obduracy of physical objects, their pliability, rigidity, and weight, that leaves a lasting impression.
The physicality of the process, and the care and time it took, also made it clear to me that massive digitization efforts such as Google Books or HathiTrust require not only technical expertise, algorithmic image processing, and data storage but also a carefully refined and engineered physical process. That these efforts have produced such massive resources is truly impressive, whatever you may feel about their legal aspects.
In the end, although we had to retake about 50 page images, the images we produced were reliable enough that the OCR process went pretty smoothly. The editing of the text was taken on by none other than Hans Walter Gabler and his team.
My takeaway from the experience was overwhelmingly positive. It gave me a feel for working with archival materials, a sense of the processes required when the digital meets the analog, experience collaborating with archivists and with initiatives at other universities, and an appreciation of the dedication it takes to carry a project from beginning to end, solving minor problems along the way.