Author Archives: Daria Vaisman

DigitalHUAC group update

DigitalHUAC project update

Search Form Update

After finalizing the taxonomy with our historian experts, we created a public project on DocumentCloud and uploaded the five sample testimonies to it. For each testimony, we entered key-value pairs based on our taxonomy.
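To make the tagging step concrete, here is a minimal Python sketch that checks one testimony's key-value pairs against a controlled vocabulary before they are entered on DocumentCloud. The keys and allowed values (witness, witness_type, industry) are hypothetical placeholders, not our actual taxonomy, and the sketch does not call the DocumentCloud API.

```python
# Hypothetical taxonomy: each key maps to its allowed (controlled) values.
# None means any free-text value is accepted for that key.
TAXONOMY = {
    "witness": None,
    "witness_type": {"friendly", "hostile"},
    "industry": {"film", "music", "literature"},
}


def validate_metadata(metadata, taxonomy=TAXONOMY):
    """Return a list of problems found in one testimony's key-value pairs."""
    problems = []
    for key, value in metadata.items():
        if key not in taxonomy:
            problems.append(f"unknown key: {key!r}")
            continue
        allowed = taxonomy[key]
        if allowed is not None and value not in allowed:
            problems.append(f"{key}={value!r} is not in the controlled vocabulary")
    return problems


if __name__ == "__main__":
    testimony = {"witness": "Pete Seeger", "witness_type": "hostile", "industry": "music"}
    print(validate_metadata(testimony))  # []
    print(validate_metadata({"witness": "Ayn Rand", "witness_type": "neutral"}))
```

Keeping the vocabulary in one dictionary means a change agreed on with the historians only has to be made in one place.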

We are still working on the script that will talk to the DocumentCloud API. In the meantime, we started building a search form in plain HTML. After making a few very basic search forms, we came across a form builder for Bootstrap that let us add more search options easily. The form builder also generated the HTML, which we pasted into our test website.
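One way to connect the search form to the API script is to collapse the submitted form fields into a single query string. The sketch below assumes a free-text field named q and a hypothetical key:"value" syntax for metadata filters; both are illustrative assumptions, so the syntax DocumentCloud actually accepts should be checked against its search documentation.

```python
def build_query(form, text_field="q"):
    """Turn submitted search-form fields into one query string.

    Assumes a hypothetical key:"value" syntax for metadata filters.
    """
    parts = []
    free_text = form.get(text_field, "").strip()
    if free_text:
        parts.append(free_text)
    for key in sorted(form):
        value = form[key].strip()
        if key == text_field or not value:
            continue  # skip the free-text box and any field left blank
        escaped = value.replace('"', '\\"')  # escape embedded double quotes
        parts.append(f'{key}:"{escaped}"')
    return " ".join(parts)


if __name__ == "__main__":
    submitted = {"q": "blacklist", "witness": "Walt Disney", "witness_type": ""}
    print(build_query(submitted))  # blacklist witness:"Walt Disney"
```

Skipping blank fields matters with a form-builder layout, since most dropdowns and text boxes will be left empty on any given search.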

Below is a screenshot:


API Script (Form Action) Update

While working with DocumentCloud, we found a) an app, built by The Bay Citizen, that lets users work with DocumentCloud documents through a Django-powered CMS:

[Screenshot, 2015-03-22, 10:08 PM]

And b) a Python wrapper built for the DocumentCloud API:

We also looked at other documentation explaining how to post HTML form values to a Python script, but we are currently working with the Python API wrapper. This required downloading a more recent version of Python with pip installed, and then installing the python-documentcloud library:

[Screenshot, 2015-03-22, 9:40 PM]

Our initial attempts, however, return the following:

[Screenshot, 2015-03-22, 9:55 PM]

We are continuing with a python-documentcloud tutorial so that we can extract text from the HUAC PDFs uploaded to DocumentCloud and return the excerpted text to the user.
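The excerpting step can be sketched independently of DocumentCloud. Assuming a document's full text has already been retrieved as a plain string (for example, through the Python wrapper), the hypothetical excerpts function below returns a keyword-in-context snippet around each case-insensitive match. The sample text is a placeholder, not a real transcript.

```python
import re


def excerpts(full_text, term, width=60):
    """Return a snippet of up to `width` characters on each side of every match."""
    results = []
    for match in re.finditer(re.escape(term), full_text, flags=re.IGNORECASE):
        start = max(0, match.start() - width)
        end = min(len(full_text), match.end() + width)
        snippet = " ".join(full_text[start:end].split())  # collapse line breaks
        results.append(snippet)
    return results


if __name__ == "__main__":
    text = ("Placeholder transcript text.\n"
            "Q: A question mentioning the Communist Party?\n"
            "A: An answer.")
    for snippet in excerpts(text, "communist party", width=20):
        print(snippet)
```

re.escape keeps search terms containing punctuation from being read as regular-expression syntax.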

Meanwhile, we are also experimenting with getting input from a browser via:

- web forms in Django, and
- GET/POST methods inside a Python class called index.
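The class-with-GET/POST-methods pattern can be sketched with only the standard library. This is neither web.py nor Django: it is a minimal WSGI app, assuming a handler class named index that reads GET parameters from the query string and POST parameters from the form-encoded request body, both parsed with urllib.parse.parse_qs.

```python
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server


class index:
    """Hypothetical handler for the search page, one method per HTTP verb."""

    def GET(self, environ):
        params = parse_qs(environ.get("QUERY_STRING", ""))
        return f"GET received: {params}"

    def POST(self, environ):
        try:
            size = int(environ.get("CONTENT_LENGTH") or 0)
        except ValueError:
            size = 0
        body = environ["wsgi.input"].read(size).decode("utf-8")
        return f"POST received: {parse_qs(body)}"


def app(environ, start_response):
    """Minimal WSGI app that dispatches to index.GET or index.POST."""
    method = environ.get("REQUEST_METHOD", "GET")
    if method not in ("GET", "POST"):
        start_response("405 Method Not Allowed", [("Content-Type", "text/plain")])
        return [b"Method not allowed"]
    handler = getattr(index(), method)
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [handler(environ).encode("utf-8")]


def serve(host="localhost", port=8000):
    """Run the app on a local development server until interrupted."""
    with make_server(host, port, app) as server:
        server.serve_forever()
```

Calling serve() and opening http://localhost:8000/?witness=Ayn+Rand should echo the parsed parameters; an HTML form with method="post" pointed at the same address would exercise the POST branch.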



Digital HUAC – Workplan & Wireframe & Update




[Figure: Digital HUAC workflow, page 1]

Workplan: what & why

[Figure: pages from the Digital HUAC workflow]


The documents, which are already scanned, will be tagged manually in an XML editor according to the categories we identified, then loaded into MySQL, an open-source relational database that can import XML documents. The MySQL database will be integrated into the website with PHP embedded in the site’s HTML/CSS pages. Finally, the API will allow users to export their searches to text-analysis resources.
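The tagging-then-loading pipeline can be sketched in Python. This example swaps in SQLite (Python's built-in sqlite3 module) as a stand-in for MySQL so that it runs without a database server, and the XML element and attribute names (testimony, speech, witness, year, speaker) are hypothetical, not our actual tagging scheme.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical markup for one hand-tagged testimony (placeholder text only).
SAMPLE_XML = """
<testimony witness="Example Witness" year="1947">
  <speech speaker="Committee">Placeholder question text.</speech>
  <speech speaker="Example Witness">Placeholder answer text.</speech>
</testimony>
"""


def make_db():
    """Create an in-memory table; a stand-in for the MySQL database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE speeches (witness TEXT, year INTEGER, speaker TEXT, text TEXT)"
    )
    return conn


def load_testimony(conn, xml_text):
    """Parse one tagged testimony and insert each speech as a row."""
    root = ET.fromstring(xml_text.strip())
    rows = [
        (root.get("witness"), int(root.get("year")), speech.get("speaker"), speech.text or "")
        for speech in root.iter("speech")
    ]
    conn.executemany(
        "INSERT INTO speeches (witness, year, speaker, text) VALUES (?, ?, ?, ?)", rows
    )
    return len(rows)


if __name__ == "__main__":
    conn = make_db()
    load_testimony(conn, SAMPLE_XML)
    query = "SELECT speaker, text FROM speeches WHERE witness = ?"
    for row in conn.execute(query, ("Example Witness",)):
        print(row)
```

With MySQL the SQL would be similar, though the parameter placeholder style depends on the driver.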

Historians and Corpus

We’ve identified a number of historians, librarians, archivists, and digital humanists who could serve as advisors on this project, and we are in the process of reaching out to them. In the short term we are seeking guidance on our taxonomy and controlled vocabularies; in the longer term, on developing the project beyond the scope of this semester.

At the top of this list are historians Blanche Cook and Josh Freeman, CUNY professors and experts on the HUAC era; Steve Brier is introducing us to both. Other historians include Ellen Schrecker (Yeshiva), Mary Nolan (NYU), Jonathan Zimmerman (NYU), and Victoria Phillips (Columbia), each with subject expertise and research experience in the period, events, and people central to Digital HUAC. We have also identified as potential advisors Peter Leonard, a DH librarian at Yale; David Gary, the American History subject specialist at Yale, who holds a PhD in American History from CUNY; John Haynes, a historian who served as a specialist in 20th-century political history in the Manuscript Division of the Library of Congress; and Jim Armistead and Sam Rushay, archivists at the Truman Library.
We have narrowed the corpus to five transcripts: the testimonies of Bertolt Brecht, Ronald Reagan, Ayn Rand, Pete Seeger, and Walt Disney. These major cultural figures span the course of the hearings and include both friendly and hostile witnesses, giving users a varied look at the nuances of interrogation. We believe that focusing on a thematically organized set of recognizable witnesses lets users examine each testimony on its own and in context with the others. The importance of this quality of the HUAC hearings cannot be overstated, and Digital HUAC seeks to draw attention to it through the overall user experience.


Different-Sized Data

I really liked Manovich’s overview of ‘big social data’ and was glad I read it first of his three readings for the week, since it served as a guide. In particular, I was fascinated by two of his paradigms: the contrast between ‘surface’ and ‘deep’ data and, in the context of information visualization, the contrast between description and prediction as ways of explaining what information visualization is (and/or is supposed to do).

What does data visualization do? Is it merely descriptive? If so, how does a visual description improve our understanding of the thing in question, compared to (or alongside) a purely textual one? Or is it predictive, a counterpart to inferential statistics? I’ve always put things into pictures to understand them, and I am crazy about Scott McCloud’s highly recommendable book Understanding Comics, which shows (in comic form) how pictures and text together can explain or evoke things like space and time better than either can alone. So I’m a big fan of the new infographics and data visualization trend.

I also wanted to discuss Manovich’s treatment of sampling and behavior data, but class starts in five minutes. To be continued...

Some other good Python resources

Just wanted to add to the list of good resources for (teaching yourself) Python:

Learn Python the Hard Way (LPTHW) and the Codecademy Python tutorial, both mentioned in the previous blog post, are excellent and free. Both are interactive, which I’ve found *hugely* helpful for writing code that actually runs, instead of spending ages rewriting something before realizing the problem is mundane, like a missing colon. LPTHW is also rote (you type out the prompt code verbatim); although rote learning is rarely considered good pedagogy, I’m finding it a pretty good way to learn this stuff.

Another very good resource (also the book assigned in my computational linguistics class) is John Zelle’s Python Programming: An Introduction to Computer Science, which is available for free online, along with Zelle’s companion website, which has slides and downloadable code for all of the book’s program examples:

There’s also another resource with an interactive feature (along with countless examples and explanations), as well as an MIT OpenCourseWare class, A Gentle Introduction to Programming Using Python, which I haven’t looked at yet but which appealingly promises to be a “gentle, yet intense, introduction” to programming.

Interesting piece on Moretti

After our class discussion last Thursday, I went back and found this piece on Moretti by Elif Batuman (who overlapped with Moretti at Stanford) and thought I’d post it. I found the discussion of Moretti in the context of Russian Formalism particularly interesting.

“THE DARWINIAN THREAD is taken up in the third chapter, which opens with an evolutionary mystery: how can we understand the survival of Sherlock Holmes, as opposed to the failure of the other fictional detectives of his time? Again, a triangle: instead of Conan Doyle and his readers—Conan Doyle, his rivals, and his readers. In search of answers, Moretti and his team of graduate students dove into the archive of Victorian detective fiction and resurfaced with 108 long-forgotten rival texts—which, after appropriate analysis, yielded the secret of their own extinction: their authors didn’t know how to use clues.”