Visualization of James Ralph Jewell’s Agricultural Education Including Nature Study and School Gardens, written in 1907 for the US Dept. of the Interior’s Bureau of Education. This document is part of a digital archive of school garden literature created by MSU librarian Suzi Teghtmeyer.

I just made something for an exercise in a doctoral seminar that looks like it was done by a second grader on Microsoft Paint. Huh. Yet, this is more than just some scribbles. It is a visualization of the five most common terms in a primary document. It was made in Voyant using a tool called “Knots.” An animated line represents each of the five most common words in a text (outside of designated stop words). Each line turns when its word is used and moves in a straight line when it is not. As the application reads the document (it will only do one at a time), it plays music reminiscent of György Ligeti’s “micropolyphonic” compositions. It’s pretty cool, in a retro, 2001: A Space Odyssey sort of way. Press play above, wait a few seconds, and you can experience it.
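Out of curiosity, here is a rough Python sketch of the idea behind the Knots display as I understand it. This is not Voyant’s code: the short stop-word list, the 30-degree turn, and the function names top_terms and knot_paths are all my own hypothetical choices. It picks the most frequent non-stop words, then walks through the text token by token, nudging each word’s line forward and turning it only when that word shows up.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stop-word list; Voyant ships its own, much longer lists.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that",
              "for", "on", "as", "with", "be", "by", "are", "was", "this"}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def top_terms(text, n=5, stop_words=STOP_WORDS):
    """Return the n most frequent words that are not stop words."""
    counts = Counter(w for w in tokenize(text) if w not in stop_words)
    return [w for w, _ in counts.most_common(n)]

def knot_paths(text, terms, turn_degrees=30.0, step=1.0):
    """Trace one path per term (an assumed approximation of Knots):
    every token moves each path forward one step, and a path changes
    direction by turn_degrees only when the token is that path's term."""
    words = tokenize(text)
    paths = {}
    for term in terms:
        x, y, heading = 0.0, 0.0, 0.0
        points = [(x, y)]
        for w in words:
            if w == term:
                heading += math.radians(turn_degrees)  # turn on a hit
            x += step * math.cos(heading)              # otherwise keep going straight
            y += step * math.sin(heading)
            points.append((x, y))
        paths[term] = points
    return paths

# Invented sample sentence, just to show the shape of the output.
sample = "The garden teaches the child. The child tends the garden, and the garden grows."
print(top_terms(sample, n=2))  # → ['garden', 'child']
paths = knot_paths(sample, top_terms(sample, n=2))
```

A word that never appears just produces a straight line, which is roughly what the lines in the animation do between occurrences.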

But does it really mean anything? I understand the uses of text visualizations like word clouds, at least as a quick way to map out some important words in a text. I have had luck using them to get students to discuss readings. But is serious computational analysis a useful tool for historians? Probably, but I just don’t know enough about it.

I came into this with absolutely no clue about what computational text analysis is. After reading Ted Underwood’s retrospective, I feel I am beginning to understand a bit of what this is all about. Although I am not very familiar with literary analysis, I am familiar with mid-century structuralism and content analysis, which used literary criticism as one of their guideposts to study cultures and societies. As Underwood argues, “distant reading” may have been coined by Franco Moretti in 2000 (and I am not going to pretend I have read Moretti), but scholars have been attempting to connect literary canons with society and culture for quite some time.

The gist, at least from what I can garner from my limited background, is that computational text analysis, or what is called “distant reading,” is a form of extreme, objective, structural analysis of texts. More aptly, it is a strict etic perspective applied across a corpus of texts, producing quantitative data that make up for the reality that no one can read everything from a particular genre or time period. Thick meaning is eschewed in favor of algorithmic patterns, found with techniques such as Latent Dirichlet allocation, a topic-modeling method (I don’t really know how this works, but I thought I would look up the acronym LDA that kept popping up). Or, to put it in regular people terms, you try to include every text you can rather than sit down and analyze a few books closely.
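Since LDA kept popping up, I tried to get a feel for the mechanics. Below is a minimal sketch of LDA fit with collapsed Gibbs sampling, one standard way of estimating the model. It assumes each document is already a list of word tokens, and the function name lda_gibbs and the alpha/beta/iteration defaults are hypothetical choices for illustration. Real tools like MALLET are far more optimized; this only shows the basic loop: every word gets a topic, and each pass reassigns words based on how common that topic is in the document and how common the word is in that topic.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Minimal collapsed Gibbs sampler for LDA (illustrative sketch).

    docs: list of token lists. Returns (doc_topic, topic_word):
    doc_topic[d][k] estimates the share of topic k in document d,
    topic_word[k][w] estimates the probability of word w in topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    n_dk = [[0] * num_topics for _ in docs]                 # doc d, topic k counts
    n_kw = [defaultdict(int) for _ in range(num_topics)]    # topic k, word w counts
    n_k = [0] * num_topics                                  # tokens per topic

    # Start by assigning every token to a random topic.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Take the current token out of the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # p(topic t | everything else), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Put the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {w: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in vocab}
        for k in range(num_topics)
    ]
    return doc_topic, topic_word

# Toy corpus, invented for illustration.
docs = [
    "garden soil seed plant garden seed".split(),
    "school teacher pupil lesson school pupil".split(),
    "garden seed school pupil".split(),
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2, iterations=100)
for k, dist in enumerate(topic_word):
    print(k, sorted(dist, key=dist.get, reverse=True)[:3])
```

The “topics” that come out are just lists of words that tend to travel together; naming them (“gardening,” “schooling”) is still left to the human reader.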

Counting words and phrases over an entire archive can uncover patterns that are not apparent from simply reading what are considered representative texts, and those patterns can answer questions and raise new ones. I am a bit skeptical of the process, but then again, I am in no educated position to enter the fray. Its use for historians became a bit clearer, however, after I looked at Robert K. Nelson’s Mining the Dispatch site.

Nelson used MALLET, a free and powerful topic modeling tool, to analyze the complete archive of the Richmond Daily Dispatch, not just samples. Topics emerge from words that tend to occur together, and they can reveal patterns that would be unlikely to surface through close reading. It can be powerful, as Nelson shows by identifying spikes in fugitive slave ads each time the Union Army came close to Richmond. This observation would have been nearly impossible without topic modeling.
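To see how a “spike” might be pulled out of topic-model output, here is a small hypothetical sketch. It assumes you already have, for each newspaper issue, a date string in YYYY-MM-DD form and the proportion of that issue assigned to one topic (say, fugitive slave ads). It averages those shares by month and flags months more than two standard deviations above the overall mean. The z-score threshold and the function names are my own illustrative choices, not Nelson’s method, and the numbers in the example are invented.

```python
from collections import defaultdict
from statistics import mean, pstdev

def monthly_topic_share(issues):
    """issues: iterable of (date 'YYYY-MM-DD', topic proportion for that issue).
    Returns {'YYYY-MM': average proportion}, in chronological order."""
    buckets = defaultdict(list)
    for date, share in issues:
        buckets[date[:7]].append(share)  # group issues by year-month
    return {month: mean(vals) for month, vals in sorted(buckets.items())}

def find_spikes(series, threshold=2.0):
    """Return months whose share lies more than `threshold` population
    standard deviations above the mean of the whole series."""
    values = list(series.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # a perfectly flat series has no spikes
    return [m for m, v in series.items() if (v - mu) / sigma > threshold]

# Invented numbers for illustration only: one topic share per issue.
issues = [(f"1862-{m:02d}-01", 0.01) for m in range(1, 13)]
issues[5] = ("1862-06-01", 0.10)  # a made-up spike in June
print(find_spikes(monthly_topic_share(issues)))  # → ['1862-06']
```

Connecting a flagged month to something like the Union Army’s movements is still the historian’s job; the code only says where to look.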

Although I see the power in these tools, I am not sure they will make their way into my own work. It would be interesting, though, to use topic modeling on Michigan newspapers to see when retail farmers markets opened around the state. I did dink around a bit with the interface for MALLET we were to use for our exercise, but I had trouble getting intelligible output when I switched to some of my own texts. I am also not sure how you get access to an entire archive that is behind a paywall. Do you download the whole thing? That is another question to think about as I soldier forward into more of this text analysis stuff next week. Maybe I can make some more meaningful scribbles.
