Author Archives: Sarah Cohn

Final presentations

We were happy to have the chance to see everyone’s presentations last week. Over the course of the semester we’ve all been working so intently on our own projects that it sometimes felt like we didn’t have enough opportunity to fully appreciate what everyone else has been working on.

Not surprisingly, all of the projects are awesome!

This has been an amazing group of classmates, and we would like to thank you all for your support and input over the last few months. We’re excited to see where things go from here, not just for the projects but for everyone individually as well.

Looking forward to Tuesday. You’re all rockstars!


Gossip Girl Team Digital HUAC


As we close in on the final weeks, we’ve come to realize that we may not be able to write a script that will do all that we want, search-wise. Fortunately, working with DocumentCloud as our database has allowed us to utilize their robust functionality, and we have used their tools to provide basic search and browse functions on our site. With these in place, we’re focused on polishing our front end, pitch, and documentation. We are also considering adding one more layer of fun…

We would like to position this project as part useful tool for historians, part template for a replicable front end to DocumentCloud, and part participatory digital scholarship.

As for the participatory aspect, we’re considering creating and implementing a crowdsourcing platform to help assign the needed metadata to the individual testimonies.

One of the early and lasting story lines behind the project has been making publicly accessible a collection of materials with a shadowy past and curious relationship to public/private spaces, agendas, politics, and notions of guilt. At this stage, scholars would appreciate having the transcripts collected and rendered (simply) searchable; the scattered nature of the testimonies themselves is a major roadblock to HUAC studies that we’re trying to remove. But beyond that, incorporating crowdsourcing would resonate with the true spirit of the Digital HUAC project, which in a sense is the anti-HUAC project, by relying on contributions from the public. To include a wide array of contributors in documenting and publicizing material whose origins lie in silencing or coercing folks seems powerful.

We’d love to hear input from you, our classmates, on this potential new addition to the project.


This week, team Digital HUAC worked on refining our project narrative. This work dovetails with both outreach and site content: we’ll use narrative material to pitch potential users and partners and beef up our site itself. Juliana developed a thorough “pitch kit” with relevant topics and questions, and in response we filled out sections such as “Challenges with the Current State of HUAC Records” and “Our Solution.” We feel that such an approach effectively communicates vital information to all parties. It also helps us think through issues concretely. Nothing forces you to articulate your project’s means and aims better than thinking about how strangers will interact with it all.

We also demoed a new MVP as a fallback plan. Given that we are gravitating towards fully leveraging DocumentCloud’s search interface, we experimented with embedding the DocumentCloud viewer and search mechanism in our site itself. This is less than ideal: for one thing, it only rendered string-search results that didn’t make use of the robust, standardized metadata that we took time to tag each transcript with. But it was helpful to think about recasting our MVP just in case, and we welcomed the chance to get under the hood of DocumentCloud in more detail.

Digital HUAC Update

A short update today, as we continue to push forward on getting our search functional. We’re stalled out on a few specific questions that are, hopefully, the final barriers in putting it all together. We’ve reached out to the digital fellows and a few other people we hope can help us on these questions:

-What is the best way to connect to a REST API? Our code is currently configured using curl. Is that the best approach?

-What is the best way to structure our search in JSON: using a list (with indexed search results by location) or using an associative array of key-value pairs? We have created key-value metatags for our documents in DocumentCloud, but the resulting JSON search results only display the built-in metadata tags (e.g., title: “”, id: “”) and not our created metadata tags. Is that an issue on the DocumentCloud side or on the coding side?
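On the list-versus-associative-array question, the two structures can work together: keep the results as an ordered list, and build a key-value lookup from that list when a document has to be found by its id. The sketch below is in Python rather than the PHP our site uses. The sample response, its field names (`documents`, `data`, `witness`, `year`), and its values are all invented for illustration, so it does not show what DocumentCloud actually returns or settle whether custom tags are included by default.

```python
import json

# Hypothetical search response: every field name and value here is invented
# for illustration and is not copied from a real DocumentCloud result.
SAMPLE_RESPONSE = """
{
  "total": 2,
  "documents": [
    {"id": "doc-1", "title": "Sample testimony A",
     "data": {"witness": "Witness A", "year": "1947"}},
    {"id": "doc-2", "title": "Sample testimony B",
     "data": {"witness": "Witness B", "year": "1951"}}
  ]
}
"""


def parse_results(raw_json):
    """Return the documents as an ordered list (ranked search results)."""
    payload = json.loads(raw_json)
    return payload.get("documents", [])


def index_by_id(documents):
    """Build an associative array (dict) for direct lookup by document id."""
    return {doc["id"]: doc for doc in documents}


def custom_metadata(document):
    """Return the custom key-value tags, or an empty dict if none came back."""
    return document.get("data", {})


results = parse_results(SAMPLE_RESPONSE)
by_id = index_by_id(results)
```

The list keeps the order of the search results for display, while the dict gives direct access to one document. `custom_metadata` returns an empty dict when the assumed `data` key is missing, which is the situation described in the question above.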

We’ve added a bunch more testimonies to our DocumentCloud group, and have started entering the metadata for them. The writing and outreach processes continue to move forward, along with some of the smaller aspects of UX and development.

Digital HUAC update

This week we are working on some large items:

Our number one goal this week has been to get our search functionality up and running. Daria has been a coding machine, working on this non-stop. We’re nearly there. Some of the things Daria has been grappling with are connecting to the DocumentCloud API using a REST API call function and trying to figure out the best taxonomy to be read by both PHP and JSON. The existing tutorials and scripts either explain using PHP to connect to a MySQL database or use Python to connect to the DocumentCloud API. However, Google Developers has a tutorial on using PHP to connect to the Google Books and Google News APIs, which has proven a useful tool in working out the PHP-to-DocumentCloud API situation. For a peek behind the scenes, check out some of Daria’s code here.
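For readers curious what a REST API call function boils down to, here is a minimal sketch. It is not Daria’s code, and it uses Python’s standard library rather than PHP and curl. The endpoint address is a placeholder (`example.org`), not DocumentCloud’s real URL, and the `opener` parameter is a convenience added here so the function can be tried without a network connection.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_url(base_url, params):
    """Append URL-encoded query parameters to the endpoint address."""
    return f"{base_url}?{urlencode(params)}"


def fetch_json(base_url, params, opener=urlopen):
    """Send a GET request and decode the JSON body into Python objects.

    `opener` defaults to urllib's urlopen; passing a stand-in makes it
    possible to exercise the function without touching the network.
    """
    with opener(build_url(base_url, params)) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder endpoint (an assumption, not a real API address).
EXAMPLE_ENDPOINT = "https://example.org/api/search.json"
example_url = build_url(EXAMPLE_ENDPOINT, {"q": "treason"})
```

The same two steps, building a query URL and decoding the JSON that comes back, apply in any language, including the PHP setup described above.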

Juliana and Chris have been hitting Twitter hard, and our followers have doubled in the last week. Juliana created an NYC DH account and is exploring it as a place for potential groups and people who might be interested in our project. We continue to amass a list of historians and institutions that will be interested in Digital HUAC. All of this outreach is working toward our short-term goal of getting our project name out there, and also our long-term goal of finding an institution to partner with (which is one more step on the road to Digital HUAC world-domination).

Juliana and Chris have also begun to write up our overarching narrative (the theme: NO APOLOGIES!) as a way to create a story to pitch, but also looking toward the future beyond class. What direction do we want the project to go in, and how is the narrative helpful in this regard? Along these same lines, we’re simultaneously writing content for our site, since many of our current pages are just placeholders. We’re slowly but steadily working toward a functional, robust site.

We started with 5 testimonies, because that seemed like a manageable number when we had a lot of technological unknowns. Now that we’ve gotten over some of our biggest technology hurdles, we’re able to increase the size of our corpus with relative ease. I am adding new testimonies to our DocumentCloud group daily and the associated metadata will be added in the coming week as well. We don’t have an updated target number of testimonies, but would like to get as many in as possible. This process of adding testimonies will continue throughout the rest of the semester. The added testimonies will make search testing significantly more interesting, as well as showcasing more of what this project’s full potential is.

We’ve also been at work on some smaller items:

–Getting our contact form to send an email to us.
–Getting the browse functionality going, at least in a very beta way. For now, this will just be an alphabetical list of names. Each name, when clicked upon, will provide a results page of all the documents that person is named in.
–We agreed upon a Creative Commons license and have added that to our site in place of the ©.
–We have a new week-by-week action plan that details what needs to get done to get us to a fully-functional MVP by May 19.
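The browse idea in the list above (an alphabetical list of names, each leading to the documents that person is named in) can be sketched as a small index. This is an illustration in Python, not our site’s PHP, and the document titles, names, and the `names` field are placeholders rather than real data.

```python
from collections import defaultdict

# Hypothetical records: titles and names are placeholders, not real data.
SAMPLE_DOCUMENTS = [
    {"title": "Testimony 1", "names": ["Baker, Ann", "Adams, Lee"]},
    {"title": "Testimony 2", "names": ["Adams, Lee"]},
    {"title": "Testimony 3", "names": ["Clark, Sam", "Baker, Ann"]},
]


def build_browse_index(documents):
    """Map each name to the sorted titles of the documents that mention it."""
    index = defaultdict(set)
    for doc in documents:
        for name in doc["names"]:
            index[name].add(doc["title"])
    # Sort names case-insensitively so the browse page reads alphabetically.
    return {name: sorted(index[name]) for name in sorted(index, key=str.lower)}


browse_index = build_browse_index(SAMPLE_DOCUMENTS)
```

Iterating over `browse_index` gives the names in alphabetical order, and each name’s list is what its results page would show.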

HUAC User Stories

User Story #1: A forensic computational linguist doing research on how interviewing style impacts witness responses. The value of the site to the user is being able to compare friendly vs. unfriendly witnesses (difficult to determine in general court transcripts) and the sheer number of transcripts available (also difficult to collect for general court transcripts). The person clicks the API link and follows the prompts to extract a cluster of readings from unfriendly witness testimony, and does a second export for a cluster from friendly witness testimony. The API exports the two corpora into an intermediary location (such as Zotero), which can be used with Python (NLTK) to compare, for example, the number of times interviewers repeated a question for friendly vs. unfriendly witnesses.
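The comparison in this user story could start from something as simple as the sketch below. It uses only Python’s standard library (NLTK’s tokenizers could be swapped in for the normalization step), the interviewer turns are invented, and “repeated question” is assumed to mean a question whose normalized wording has already appeared earlier in the same transcript.

```python
import re
from collections import Counter


def normalize(question):
    """Lowercase and strip punctuation so near-identical wording matches."""
    return re.sub(r"[^a-z0-9 ]", "", question.lower()).strip()


def count_repeated_questions(interviewer_turns):
    """Count how many interviewer questions repeat an earlier question."""
    questions = [
        normalize(turn) for turn in interviewer_turns if turn.strip().endswith("?")
    ]
    counts = Counter(questions)
    # A question asked n times contributes n - 1 repeats.
    return sum(n - 1 for n in counts.values() if n > 1)


# Invented interviewer turns, for illustration only.
FRIENDLY = ["Where were you employed?", "Thank you.", "Who hired you?"]
UNFRIENDLY = [
    "Are you a member?",
    "Are you a member?",
    "I will ask again.",
    "ARE you a member?",
]

friendly_repeats = count_repeated_questions(FRIENDLY)
unfriendly_repeats = count_repeated_questions(UNFRIENDLY)
```

A real analysis would need a looser notion of repetition (paraphrases, partial restatements), but the two counts give the linguist a first number to compare across the corpora.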

User Story #2: High school civics & US history teacher, Chris. He wants to assign the students to search the archive to find primary source documents from the McCarthy era. Students will have a list of topics and names to choose from as their research areas. Chris tests the site to see if it will be useful to his students. Chris uses the simple search box to search for topics such as ‘treason’ and ‘democracy’. Chris uses the advanced search options to combine topics with names. Chris is looking for clean results pages, the option to save and export searches, and help with citation.

User Story #3: American Political Scientist, Jennie, doing research on US government responses during periods of perceived national security threats. Specifically, she is interested in the Foreign Intelligence Surveillance Courts (FISC), which are closed courts. Jennie wants to read about now-released documents that record the conduct of closed courts. Jennie wants to do a mixture of qualitative and quantitative analysis. Qualitative: Jennie uses the advanced search to specify she wants only to look at hearings that have been identified in the category as ‘closed’ hearings, then does the same to specify only ‘open’ hearings. Quantitative: Jennie uses the API and follows the prompts to extract data with the category ‘closed’ and a date filter to do a statistical analysis of the number of closed trials by year and how/if they correlate with outside events.
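The quantitative half of Jennie’s story amounts to a count by year, which might look like the sketch below once the data has been exported. The records, the dates, and the `category` and `date` field names are all invented for illustration; the real export format is not specified here.

```python
from collections import Counter

# Invented hearing records; the `category` and `date` keys are assumed names.
SAMPLE_HEARINGS = [
    {"category": "closed", "date": "1947-10-20"},
    {"category": "open", "date": "1947-10-21"},
    {"category": "closed", "date": "1951-03-08"},
    {"category": "closed", "date": "1951-04-12"},
]


def hearings_per_year(hearings, category):
    """Count hearings of one category, grouped by the year in an ISO date."""
    years = (h["date"][:4] for h in hearings if h["category"] == category)
    return dict(sorted(Counter(years).items()))


closed_by_year = hearings_per_year(SAMPLE_HEARINGS, "closed")
```

The yearly counts could then be set against a timeline of outside events to look for the correlation she is interested in.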

Skillset, Sarah

Developer–I’m pretty sure I have no skills in this area. Basic HTML and the ability to write SQL queries just about covers it. I am going to quote Chris M, who summed up my feelings on this role perfectly: “Relatively little experience in this area and so I fear that this would be biting off more than I could chew. I have interest in learning alongside my team’s developer, but in the interest of efficiency, I’m probably not the team member best suited for this role.”

Outreach–I have social media accounts. Occasionally I use them. Definitely a growth area for me.

Designer–This is the role that interests me the most. I have some basic design program knowledge–Photoshop, InDesign etc. I critique usability and web design all the time, so I’d like to try and put my $ where my mouth is and move beyond critique. As a quilter and a former pastry chef, I have experience working in some really different visual languages.

Program Manager–I love to make lists & organize things. I run super-efficient meetings. I’m goal oriented and good at problem solving. I think this is my strongest skill area.

Hacking Scholarship, Planned Obsolescence & the ACRL Framework

On Friday I went to a talk about the new ACRL (Association of College & Research Libraries) information literacy guidelines. The guidelines currently in place are officially titled Information Literacy Competency Standards for Higher Education and are a rubric of points, subpoints and subsubpoints that guide librarians in teaching and evaluating information literacy. The proposed new guidelines (still under review) are titled Framework for Information Literacy for Higher Education and are based on threshold concepts “which are those ideas in any discipline that are passageways or portals to enlarged understanding or ways of thinking and practicing within that discipline.” (line 26)

As they currently stand, the six threshold concepts in the new Framework are:

  1. Scholarship is a Conversation
  2. Research as Inquiry
  3. Authority is Contextual and Constructed
  4. Format as a Process
  5. Searching as Exploration
  6. Information has Value

I found the talk and the new Framework ideas really interesting, especially in conjunction with this week’s readings, which I see as closely related to the concepts in the Framework, and the direction ACRL is trying to move information literacy in higher education. I like the movement away from a checklist of skills and towards a more encompassing platform that encourages both critical and creative thinking, which are core components of humanities education. Given that trends in higher ed (especially assessment, accreditation and concepts like ROI) are moving toward the quantifiable, I’m sort of surprised (though pleased) at the direction ACRL is taking with this.

I am including the longer explanation from the Framework with the three areas most connected to the readings. Most of these are not fully formed thoughts, but the start of some connections. Fortunately, Fitzpatrick is very supportive of the blog as a way to hash out ideas! (p 70-71)

Scholarship is a Conversation

 Scholarship is a conversation refers to the idea of sustained discourse within a community of scholars or thinkers, with new insights and discoveries occurring over time as a result of competing perspectives and interpretations. (Framework, lines 138-140)

This is right out of Fitzpatrick (maybe it actually is). She states that we need to “…understand peer review as a part of an ongoing conversation among scholars rather than a convenient method of determining “value”…” (p 49) I agree that the traditional peer review model can be really limiting in terms of scholarly conversation, and the idea that it confers value or status is something I don’t necessarily agree with. Yet I have to explain it to students in that way, because that is the model their professors know and expect their students to learn. Trying to explain the peer review model and simultaneously offer ways to question it is hard in a 50 minute class period, where peer review is only one small aspect of what I have to cover.

Daniel J. Cohen says that “Writing is writing and good is good” (Hacking, p 40), and Jo Guldi, in thinking through an alternative wiki-process for review of publications, says that an author should “produce a stronger article then at the beginning [of the process]” (Hacking, p 24). Both of these come back to what gives value to a source. Who decides what good is good? Who decides if an article is stronger after the revision process? Both of these suggested alternative models still need someone to be an arbiter of the final product.

Authority is Constructed and Contextual

 Authority of information resources depends upon the resources’ origins, the information need, and the context in which the information will be used. This authority is viewed with an attitude of informed skepticism and an openness to new perspectives, additional voices, and changes in schools of thought. (Framework, lines 224-227)

Guldi says “The web suffers from a crisis of authority” (Hacking, p 20) and also points out that only 3 types of scholarship are highly valued (editorial, peer review, book review) and that other forms of scholarship have been excluded. Fitzpatrick also argues for a more expansive view of authorship, one that values collaborative efforts more than the current model.


The idea that authority is constructed is a way for me to push a little bit on the idea that scholarly, peer reviewed sources are ‘best’ or more valued. This week (inspired by Friday’s talk), I asked two classes of first year students what conferred authority. The first answers from both classes were ‘published’, ‘researcher’, ‘has PhD’. Only one student said ‘lived experience’, and no one mentioned societal status as something that conferred authority. I didn’t see any obvious lightbulbs going off (or thresholds crossed) but hopefully they’ll continue to think about it.

Format as a Process

Format is the way tangible knowledge is disseminated. The essential characteristic of format is the underlying process of information creation, production, and dissemination, rather than how the content is delivered or experienced. (Framework, lines 279-281)

This element is slightly more obscure than the others, and the title of it has actually been changed in the upcoming draft, although I didn’t write down what the new title will be. There were discussions of format in the readings, and the two that most appealed to me were David Parry’s and Jo Guldi’s essays. Parry says “Books tell us that one learns by acquiring information, something which is purchased and traded as a commodity, consumed and mastered, but the Net shows us that knowledge is actually about navigating, creating, participating.” (Hacking, p 16) Moving away from the scholarly monograph or article as primary and working to include other formats as relevant and valuable is huge. Guldi offers several suggestions as to ways that journals can reposition themselves to take advantage of the potential changes in scholarly publishing. Most students entering college now have been raised in an information environment that encourages participation and would take easily to a wider and more flexible view of what constitutes a scholarly source, and how format can inform the scholarship.

I am very much looking forward to hearing Kathleen Fitzpatrick this week!