Category Archives: Spring 2015

Digital HUAC Progress Report- Outreach Plan

This past week, our team has reached out to HUAC experts to help us with our taxonomy, which needs to be finalized before we can start our development work. We have also made significant progress on the design and workflow front.

Below is our outreach plan, which includes things that we have already done and what we hope to do.

Objective:

  1. Consult with subject matter and technology experts.
  2. Promote Digital HUAC to potential users, supporters, and adopters.

Target:

Objective #1- Consult with subject matter and technology experts.

  1. Historians or librarians familiar with HUAC
    • Josh Freeman
    • Blanche Cook
    • Jerry Markowitz
    • John Hayes
    • KA Cuordileone
    • Ellen Schrecker
  2. DH or technological advisors
    • Dan Cohen, historian on the Old Bailey project, now at DPLA
    • Victoria Kwan and Jay Pinho of SCOTUS Search

Objective #2- Promote Digital HUAC to potential users, supporters, and adopters.

  1. Digital humanities scholars & programs
    • Stanford Digital Humanities
    • UCLA DH
    • DiRT Directory
    • List on HASTAC site
  2. Academics
    • American History
    • Political Science
    • Linguistics
  3. High School Educators
    • History
    • Civics
    • US Government
    • National Council for the Social Studies
  4. Archives, Collections, and Libraries
    • Woodrow Wilson Cold War Archive
    • Truman Library
    • Tamiment Library
    • Harriman Institute
    • Kennan Institute
    • Davis Center at Harvard University
    • The Internet Archive (archive.org)
  5. Associations
    • American Historical Association
    • Association of American Archivists
  6. Academic journals
    • Digital Humanities Quarterly
    • Journal of Digital Humanities
    • American Communist History
  7. Blogs
    • LOC Digital Preservation blog
    • DH Now
    • DH + Lib
  8. Other related DH Projects
    • SCOTUS Search
    • Old Bailey
    • NYPL Labs
    • Digital Public Library of America

Approach

Objective #1:

Outreach started on February 19.

  1. Email referrals from Matt, Steve, Luke and Daria.
  2. Find other experts through online research.

Objective #2:

Outreach to start on March 10.

  1. Social media: Twitter (@DigitalHUAC) and Wikipedia page.
  2. Create email lists of key contacts at the organizations listed above.
  3. Prepare user/supporter-specific emails for email blast.
    • User- why this project is relevant and how it can help them with their research, what this database offers that the current state of HUAC transcripts does not
    • Supporter- why this project is relevant to the academic community and if they would consider doing a write-up or linking our site to their “Resources” page. (try to secure some kind of endorsement.)
  4. Dates of outreach:
    • April 15- Introducing the project (website launch)
    • May 10- Introducing the APIs
    • May 19- Project finalized, with full documentation

Pitch (the “voice”):

Objective #1:

(Students working on a semester-long project, looking for guidance.)

To DH practitioners: Our project, Digital HUAC, aims to develop the House Un-American Activities Committee (HUAC) testimony transcripts into a flexible research environment by opening the transcripts up for data and textual analysis. We seek to establish a working relationship with both digital humanities practitioners and HUAC experts who can advise on the technological and scholarly aspects of our project more broadly, especially given our hope that Digital HUAC will grow and thrive past our work this semester. Our project is the first attempt to organize HUAC materials in this way, using digital humanities methodologies. We see great opportunity for collaboration with the academic community and for additional academic research, as we are opening up a resource that has not previously been easily accessible and usable. We believe our efforts can help uncover new research topics across disciplines by utilizing DH research methods.

To historians: We are working on a semester-long project that aims to turn the full text of the House Un-American Activities Committee (HUAC) testimony transcripts into a searchable online archive. Our project is the first attempt to collect and organize HUAC transcripts online in one central, searchable location. The first stage of this project is to take our sample set of five testimony transcripts and denote common identifiers that will be useful to researchers using the archive. These common identifiers will allow our users to search based on categories of data, as opposed to only simple word searches, giving more value to the transcripts. We have developed a list of these identifiers (also known as a controlled vocabulary), but would like a historian with a deeper working knowledge of the HUAC hearings to advise us on this list.

Going forward, we hope to establish a working relationship with HUAC experts to help advise scholarly aspects of our project more broadly, especially given that our hope is for Digital HUAC to grow and thrive past our work this semester.

Objective #2:

(Pitching to potential users.)

We are excited to present Digital HUAC, an interactive repository of the House Un-American Activities Committee (HUAC) testimonies that uses computational methods for data and textual analysis. This is the first attempt to create such a database for the HUAC transcripts, which currently are not centralized in one location, nor are they all searchable. Our aim is to develop the HUAC transcripts into a flexible research environment by giving users the tools to discern patterns, find testimonies based on categories and keywords, conduct in-depth data and textual analysis, as well as export data sets. For the beta stage of this project, we will start with five selected testimonies.

Researchers: Digital HUAC is an interactive repository that will give researchers unprecedented access to HUAC transcripts. Supported by advanced search functionality across all records and a built-in API for additional data and text analysis, Digital HUAC opens up one of the largest collections of primary source material on American Cold War history. Researchers will now be able to use the HUAC transcripts for comparative political analysis, informant visualization, social discourse analysis of court transcripts, linguistic analysis, and other research topics that have not been pursued due to the previous inaccessibility of the HUAC transcripts.

High school teachers: Digital HUAC aims to provide access to one of the most substantive collections of primary source material on American Cold War history. Your students will have the opportunity to delve further into inquiry learning through the repository’s search functionality and PDF library of the original material. While the subject matter is vast and complex, we have created a supportive research and learning environment with an easy-to-use interface, clean results pages, customizable options to save and export searches, and assistance with citation.

 

TANDEM: Team Project Update

TANDEM related information now has a home on our Commons page.

Technology Notes Week 4 (via Steve)

To build TANDEM, we are drawing on a variety of existing tools:

  1. NLTK: We plan to use NLTK to work with language data after it has been OCR’d by Tesseract. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging and parsing.
  2. FeatureExtractor (or QTip): We need a tool that will analyze the features of the images on each page of a corpus. Both tools were developed by the Software Studies Lab run by Lev Manovich. While FeatureExtractor is more powerful, QTip is easier to use. We have run into an issue determining the viability of redistributing FeatureExtractor, as it relies on proprietary software, and we are still working out whether we can use it. In the meantime we have scheduled a meeting with Lev to discuss our options.
  3. Tesseract: This is an open source OCR tool provided by Google. We will use it to identify text elements on each page of the corpus under analysis. The output of Tesseract will be the input to NLTK for analysis. While we continue to lean on Tesseract, our outreach has yielded information about Ocular, from UC Berkeley. We have reached out to the programmers behind this alternative OCR option, and they have been receptive and helpful. We are vetting it as a possible OCR option.

NLTK

Programming

We are currently writing a Python script that will pass a directory of files to NLTK for processing.

Status

NLTK install is complete on the developer’s and PM’s machines. A prototype program has been completed and tested by the developer, then enhanced by the PM to take multiple input files and pass them to NLTK for several different analyses.

Next Steps

Enhance the script to handle a variable number of input files output from the OCR step.

______

QTip

Programming

TBD. QTip does not seem to have an API or a way to be launched from a Python script.

Status

Install complete and tested.

Next Steps

Determine if there is a way to utilize QTip from a Python program. A meeting is scheduled with Lev Manovich on March 4 to discuss this.

______

Tesseract

Programming

A prototype module has been created and tested by the developer to process a variable number of PNG files through the Tesseract OCR engine.
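A minimal sketch of that prototype, assuming the `tesseract` command-line binary is installed; the directory names are hypothetical. Tesseract takes an image path and an output base name and writes `<base>.txt`:

```python
import glob
import os
import subprocess

def tesseract_cmd(png_path, out_dir):
    # Tesseract writes <out_base>.txt for the output base name we give it.
    base = os.path.splitext(os.path.basename(png_path))[0]
    out_base = os.path.join(out_dir, base)
    return ["tesseract", png_path, out_base]

def ocr_directory(png_dir, out_dir):
    """Run every PNG in png_dir through the Tesseract CLI,
    producing one .txt file per page for NLTK to pick up."""
    os.makedirs(out_dir, exist_ok=True)
    for png in sorted(glob.glob(os.path.join(png_dir, "*.png"))):
        subprocess.run(tesseract_cmd(png, out_dir), check=True)
```

Separating command construction from execution keeps the module easy to test and makes it simple to swap in another OCR engine later.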

Status

Install complete on the developer’s computer. A prototype program has been tested that verifies the viability of using Tesseract with PNG files. A significant challenge will be finding methods to improve the poor quality of the OCR output.

Next Steps

We will focus on improving the quality of the output and on testing the OCR engine with other types of input files (TIF, JPEG, GIF, and BMP are obvious candidates). We also continue to explore alternative OCR engines that may work better than Tesseract.

______

On the outreach front:

  • We continue to start conversations via the #picturebookshare and #tandem hashtags on Twitter.
  • Emails are currently being exchanged on the OCR and image feature extraction front to determine what best practices we can take advantage of.

______

On the design front:

While we focus mainly on our backend functionality, all forward-facing design work remains in the outreach department. We are working on playful picture-book-related logos and marketing materials, as well as maintaining a presence in the illustration community. We are not yet developing designs for the user interface, but we are beginning to consider the types of functionality (buttons, sliders, etc.) that we will incorporate into the final tool.

Process Report CUNYcast

CUNYcast is an online experimental broadcast in the Digital Humanities. The CUNYcast site will model ds106radio. It will also document the process and create a “how to” manual for future CUNYcasters. A link from the CUNYcast group page on the Academic Commons will lead people to an external site where content will be streamed. CUNYcast is a live online radio stream that anyone can take over and populate with their own DH audio broadcast. It is a non-archived broadcast that will be accessible on the web. CUNYcast’s aim is to empower a DH guerrilla broadcast community.

Our team’s goal this week was to test an audio upload to ds106radio and begin to build out our WordPress site, while documenting and reporting on our process and progress. *Note: although documentation appears here, it has not been verified between team members. Please do not attempt these steps or post until we have completed final edits on the manual. Thanks!

Process Report 2/25/15:

Joy edited in-class audio from DH Praxis 2014-15, added music, and recorded an introduction. James’s task was to upload that content in order to better understand how ds106radio works.

  1. Using the edited audio recording of our in-class conversations, James converted an .m4a file (the Advanced Audio Coding (AAC) format) to an MP3 file.
  2. The online converter media.io took about three minutes to convert the file, reducing its size from 28MB to 19MB.

Note: James chose 128kb/s as the quality, remembering that ds106radio has a 128kb/s stream. Next, we needed to figure out what would come first: the ds106radio how-to, Airtime, or Icecast? Airtime has a giant START NOW button on its landing page, so that seemed like a good place to begin: a 30-day free trial; otherwise it’s $9.95 a month.
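As a sanity check on that bitrate choice, file size and a constant bitrate pin down the playing time (taking 1MB = 2^20 bytes):

```python
def mp3_duration_minutes(size_mb, bitrate_kbps):
    """Estimate playing time of a constant-bitrate MP3:
    bits on disk divided by bits streamed per second."""
    bits = size_mb * 1024 * 1024 * 8
    seconds = bits / (bitrate_kbps * 1000)
    return seconds / 60

# The 19MB file at 128 kb/s works out to about 20.75 minutes of audio.
```

The same arithmetic explains the size drop in the conversion above: a lower bitrate means fewer bits per second of audio, hence a smaller file for the same duration.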

Question: If we do work with ds106, we’ll have to get them to grant a login, we think? It is also possible that when we are preparing our own radio station, it might cost us $10 monthly to maintain it via Airtime.

  1. ds106radio lives on the interwebs; for how to access it via Icecast, see: http://networkeffects.ca/?p=1478
  2. Download Icecast here: http://icecast.org/download/
  3. Start Icecast. It launches a console.
  4. Follow instructions by typing the address into Chrome.

Note: If we were hosting Icecast on our local machine, this is how it would be controlled.

  1. Go to the Icecast installation directory and find the .xml configuration doc.
  2. Open it with a text editor.

Note: This seems like it will be very important later, but we’re not sure it will help complete the goal now. The next thing we try is looking at “Broadcasting Software” in the ds106radio how-to. We come across this document and go with Mixxx, another broadcasting tool.

  1. Download Mixxx (an 85MB download): it does audio editing, mixing, broadcasting, and recording.
  2. Enable Live Broadcasting

Note: It began importing James’s whole audio library. He loaded a song and just played with some dials. He encourages everyone to do this.

  1. Open up the cmd (command prompt) and type in some commands to install the codec:

[Photo: James_Broadcast]

Note: Watch for compatibility issues. Our 64-bit version of Mixxx was accidentally installing the 32-bit encoder, and some folders are inaccurately named. For Macs, this process seems smoother.

  1. Load the audio for broadcast on ds106radio into Mixxx by dragging and dropping.
  2. Take the server info from ds106radio and put it into Mixxx:

  • Name: ds106rad.io
  • Server: ds106rad.io
  • Port: 8010
  • Mountpoint: live
  • Username: source
  • Password: ds106
  • Codec: mp3
  • Bitrate: 128 (or less)
  • Protocol: Icecast2
  • Stereo: Y/N

  1. Success = playing live audio from our class on ds106radio

Process Report 2/28/15:

  1. This week Julia went to a workshop on Bootstrap (http://getbootstrap.com/), a framework for building responsive websites. This is our template: http://getbootstrap.com/examples/carousel/#
    Note: We have had some concerns with the constraints of WordPress. Bootstrap will afford us more freedom, although it may require a bit more work to update and change the site: more freedom, less of a fancy WordPress back end.
  2. Download Bootstrap, accessed here: http://getbootstrap.com/
  3. Using TextWrangler (a bare-bones HTML editor), she saved the document as an .html file (e.g. index.html).
  4. Place the file in the same folder on the desktop that holds the Bootstrap files.

Note: We are assuming that this series of files will be able to be uploaded to a server so they may become live. There may be a few steps missing that we’re unaware of since we’re not directly familiar with server setups.

5. Using TextWrangler to build the site, start with a blank text editor. Go to the template mentioned above (http://getbootstrap.com/examples/carousel/#), open the site in a browser, and use the View Page Source option.

6. Copy and paste the page source from that page into a plain text document.

Note: The CSS of this document was all whacked out at first; the relative file paths to the rest of the folders are different when the files sit on the desktop.

7. Go through the preliminary documentation to fix the <!DOCTYPE html> heading issues in the .html file.

8. Screenshot of the website displayed in a browser on her computer. It is bare bones, but it does display. [Image: CUNYcast_Web_Sample]

Note: Julia will next play with the style of the CUNYcast site to reflect the new direction of the project. Barbara Kruger is a visual inspiration, since we’re going guerrilla.

Please join us at our new twitter account @CUNYcast #CUNYcast
Also, we’ll be making our group page public on the commons this week.

NYC Fashion Index: Team Report

[Image: NYCFI_logo_2nd]

Currently, our team is focused on technology and outreach. We have created lists of hashtags related to fashion and sprezzatura.

Developer -Tessa

Decided to use Python as the programming language to pull big image data sets.

Installed Homebrew for more productive data mining, along with iTerm and TextWrangler.

Made a GitHub account and linked it to the terminal with an SSH key.

Cloned the python-instagram repository.

Next Steps:

Download a free database (possibly Postgres, because it has geotagging functionality built in)

Use Python and a text editor to get images and metadata into the database.
Use a Django app and Python to build the front-facing site.

Outreach -Renzo

Created instagram and twitter accounts.

If you are interested, please follow @nycfashionindex.

Designer – Min 

Designed the logo and prototype of the webpage.

Setting up a placeholder web page linking to our social media platforms (Twitter, Instagram, Tumblr).

 

Overall, we are setting up Reclaim Hosting.

CUNYCast PROJECT MAP

ABSTRACT:

CUNYcast is an online experimental podcast in the Digital Humanities. Project organizers will record content, make it available online, document the process, and create a “how to” manual for future podcasters, eventually encouraging others to contribute to a network of podcasts. CUNYcast is free, open source, collaborative, interdisciplinary, and shared. It is more than a single project; it is an initiative that offers students and faculty the tools and knowledge they need to share their work through audio. It can be used in classes, workshops, clubs, or as a stand-alone project to enrich not only the community at the Graduate Center but also the community across all CUNY schools. A link from the CUNYcast group page on the Academic Commons will lead people to an external site where content will be hosted. Links will also connect participants to resources that encourage information sharing and cooperative production through CUNYcast. The CUNYcast site will be hosted via WordPress, which supports audio, video, and a number of other easy-to-use plugins. It will also implement source code from ds106radio and be powered using Icecast, a streaming media server. CUNYcast can enrich the long commute to the city, whether by train or car, or can be enjoyed from the comfort of home when one is sick and unable to get to class. It is a low-cost project capable of producing content and knowledge for individual classes in the future while encouraging community connection. CUNYcast’s aim is to empower a DH community on demand, one that anyone can participate in.

USER STORIES:

Digital Humanities student:

In the new world of multimedia scholarship students are looking for new modes of production. They may ask themselves, “How can I reach a larger audience with my academic work and discoveries?” Publishing in academic journals is an integral part of academia but in an increasingly open source world how do graduate students reach out to wider communities to create digital content? A digital humanities student can tap into the CUNYCast web presence to learn how to make a high quality audio/digital repository. The student will learn different ways they can host their work, including requesting a block on the CUNYCast site. The student can also browse other CUNYCast programs (or more directly, ds106 programs) to see the open and conversational ways Digital Humanities scholarship shows a process-documented approach to scholarship. The WordPress production and code will be shared and easily accessible for student learning.

CUNY Faculty member:

The CUNY faculty member is interested in opening up digital publication opportunities for their students. Instead of a traditional end-of-class paper, the professor may want students to produce something that will exist online, so that students can share their scholarship with a wider community and create their online academic personas. This faculty member would use CUNYcast to show students how to produce and post their podcast final projects. The CUNYcast site would also include documentation about the pedagogical practices forming around digital media production in the classroom.

Outside Non-Affiliate interested in Podcasting:

Academic publishing and information access have historically been a closed system in which information-seeking community members and independent scholars have to jump through hoops to get access to the most current and revolutionary scholarly work. This non-affiliate could look toward CUNYcast and learn how to create his or her own online digital podcast publication, finding the techniques of investigative and scholarly podcasting outlined there.

Technical Specs:

  • We are building an open online course in Digital Humanities. Documentation and technical explanations will be available to GC users via the Academic Commons.
  • Web hosting
  • Icecast account
  • WordPress A/V Plugins, including Soundcloud
  • RSS
  • Airtime
  • ds106 Radio
  • Recording Equipment
  • Audacity (open source audio editing software)


Fashion Index (Sorry…I accidentally deleted the posting)

The Team:

Min: Designer

Tessa: Developer

Renzo: Outreach

You Gene: Project Manager

 

Abstract

 

In this project we will fundamentally navigate what people wear in their daily lives for the sake of function and beauty, and how they like to show it off to the world through the internet. Fashion has a long history, but due to the rise of the Internet and SNS (Social Network Services), fashion now has a chronological narrative. In particular, Instagram provides an enormous number of images as people post their daily outfits. The collection of images can be indexed and archived. As a result, it becomes a record of fashion trends across a wide range of colors, patterns, and styles.

 

Our methodology is to create a visual timeline of fashion trends in New York by harvesting keywords and geospatial data from Instagram. We will focus on collecting and analyzing hashtags on photos taken in New York, specifically one hashtag, “sprezzatura.” For data mining, we will mainly utilize the Instagram API, which provides statistics and helps us analyze trends and popularity. We also want to emphasize a user-friendly design for the final project: viewers will come to the site, see a wall of photos, and clicking on one will bring up information and statistics linking it to other photos.
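To illustrate the kind of harvesting we have in mind, here is a sketch that filters API media records down to the #sprezzatura entries the timeline needs. The record fields below are simplified stand-ins for what the Instagram API returns, not its exact schema:

```python
def index_media(records, tag="sprezzatura"):
    """Filter API media records down to entries carrying the target
    hashtag, keeping just what the timeline needs: id, date, location."""
    indexed = []
    for rec in records:
        if tag in rec.get("tags", []):
            indexed.append({
                "id": rec["id"],
                "created_time": rec["created_time"],
                "location": rec.get("location"),
            })
    return indexed

# Trimmed-down sample records, with fields simplified for illustration:
sample = [
    {"id": "1", "created_time": "1424563200",
     "tags": ["sprezzatura", "ootd"],
     "location": {"latitude": 40.73, "longitude": -73.99}},
    {"id": "2", "created_time": "1424566800", "tags": ["menswear"]},
]
```

Running `index_media(sample)` keeps only the first record; the retained timestamp and geotag are exactly what the visual timeline and the geospatial analysis would consume.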

 

The main audiences for this project are people who belong to fashion communities, including designers, fashion journalists, and those interested in fashion studies and trends. Yet there is currently no social index of fashion images. Our goal is for this project to create a new social index for fashion. We also aim to create a continuously updated visualization of the datasets we have collected. In addition, we hope to define and investigate future fashion trends by looking back at our fashion chronology. We expect this project will bring fashion into the dialogue of the digital humanities and open up interdisciplinary possibilities between digital humanities and fashion studies.

Keywords: fashion, instagram, data mining, API, social index, sprezzatura, fashion trend, visual timeline

Environmental Scan:

 

Fashion acts as a barometer of the popular tastes, social trends, and socio-economic climate of a place and time. Significantly, fashion schools in New York, including MAD, Pratt, and the Parsons New School, reflect scholarly approaches to fashion studies. Social network services, especially Instagram, Twitter, and Snapchat, capture the now: at best they can represent a momentary unified cultural experience; at worst, fleeting moment-to-moment frivolities demanding your attention.

 

As of December 2014, Instagram reported over 300 million unique users uploading more than 70 million photos and videos every day. The New York Times deemed this phenomenon “fashion in the age of Instagram.” A new era has arrived in which digital media is changing not only the way clothes are presented but even the way they are designed. Active users help enrich and index databases by interacting with images and videos in real time. For example, if an Instagram user tagged an image with #sprezzatura, this image would function as an index entry on the site. Other users could then search for the image and check whether it matched their own standard. If three users negate an image, it is discarded from the historical timeline. Unfortunately, these applications are not designed with archival study in mind, so pictures can become outdated and lost. Although new digital tools offer novel ways to capture and share images, they do not include the voices of individuals or trends outside the mainstream fashion world.
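The three-user veto described above can be sketched as a simple vote counter (the threshold constant and function names are our own):

```python
NEGATIVE_VOTES_TO_DISCARD = 3  # "if three users negate the image..."

def surviving_images(votes):
    """Given {image_id: number_of_negative_votes}, return the ids
    that stay on the historical timeline."""
    return [img for img, neg in sorted(votes.items())
            if neg < NEGATIVE_VOTES_TO_DISCARD]
```

For example, with votes `{"a": 0, "b": 3, "c": 2}`, only `"a"` and `"c"` survive; `"b"` has hit the threshold and drops off the timeline.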

 

Recently, a proliferation of fashion forums such as thefashionspot.com and lookbook.nu has allowed users to share snapshots of, and comments on, current fashion trends. Although these sites host a wealth of fashion community images, they are often niche collections curated by fashion experts. Fashion has developed as a history told by the victors, meaning the designers and brands that endure financially and the celebrities they dress, while leaving small designers out of the narrative. There is no social index of fashion images. Thus, we pursue a big-data-driven look at fashion timelines that is not operated by fashion experts. Instead, we will aggregate imagery that helps produce and discover new types of fashion languages. Franco Moretti discusses this aggregative approach in his book Graphs, Maps, Trees, calling for recovering “the lost 99 percent of the archive and reintegrating it into the fabric of literary history.” In this context, we will bring back the 99% of forgotten fashion archives and place them into the dialogue. This project will lay the groundwork for capturing snapshots of fashion at a specific time and place.

What problem does this solve?

 

This project provides a more permanent snapshot of rapidly changing fashion trends by pulling them from sources that are not designed with archiving in mind (Instagram). The internet has the simultaneous problem of hanging around forever while also being here today, gone today.

 

What lacuna does it fill?

In cities like New York there is a constant demand for data on fashion tastes and industry.

By aggregating scattered sources of data and crowdsourcing, we will create a more unified view of fashion in New York in a way that was previously unfeasible due to a lack of archiving.

It also addresses issues of broken images and the credibility of data.

 

What similar projects are there?

There are ongoing projects to digitize fashion records, namely the Vogue Archive, the Women’s Wear Daily Archive, and the Council of Fashion Designers of America Archive, but each of these is institution-specific. The Bloomsbury Fashion Photography Archive, part of the Berg Fashion Library, is in the process of archiving over 600,000 images indexed by expert scholars, but these images are mainly comprised of runway photos from the Niall McInerney Fashion Photography Archive. There has also been a proliferation of fashion forums in recent years, such as thefashionspot.com and lookbook.nu, where fashion enthusiasts gather to share photos of themselves and comment on current fashion.

What technologies will be used?

The Instagram API for data mining Instagram; spatial analysis; some basic website-building tools like WordPress; and integration of analytics data into visualizations.

Which of these are known?

Data, layer verification, indexing, database, tags, the NYPL picture collection

 

Which need to be learned?

We need to learn how to pull data from the Instagram API through Apigee, and to gain a better understanding of analytics integration and UX design.

How will the project be managed?

Documents and communication will be retained on Google Drive, but we’re also considering the Graduate Center’s Redmine.

 

Milestones

Week 1: Gathering the team, organizing roles, defining tasks

Week 2: Website mockup draft and project text and branding drafts

Week 3: Website live with project description and placeholder for indexing interface

Week 4: Social media accounts live, project promotion begins

Week 5: Instagram API loads into indexing interface

Week 6: Users can interact with indexing interface to tag and validate images

Week 7 – 10: Ongoing indexing and data cleaning

Week 11: Initial visualizations prepared

Week 12: Final visualizations uploaded to website

 

Fashion Index User Stories

[Image: IMG_20150217_221537]

 

Fashion studies is an interdisciplinary academic field. We believe it bridges strongly to social science areas including sociology, anthropology, and psychology. People coming from fine arts and design studies, as well as those with social science backgrounds, will take an interest in our project.

1. Socioeconomic scholars

They can analyze data from Instagram based on geographic information, neighborhood, and economic status. Instagram users tag their location and the brands of their clothes when they post their ootd (outfit of the day). Based on this information, socioeconomic scholars can infer users’ economic climate. People dress differently depending on neighborhood and district; thus, dress codes signal occupations along with social classes. Certain clothes are worn on special occasions to discuss social or political issues and to send messages.

2. Psychologists

Fashion is a form of self-expression. Psychologists can infer what certain groups of people want to express through their clothing. They can also read people’s psychological fabric through patterns, colors, and styles. What’s more, fashion is deeply related to consumer psychology, and psychologists can analyze and predict its trends. Personal style, independent of chasing trends, defines one’s identity through what one wears and reflects the strong points of what looks good on one’s body shape.

3. Costume Designers

Costume designers work on the design of items of clothing and pay attention to specific reference materials, especially textiles and colors. They can draw inspiration for designing clothes for performers, especially in film and broadcasting. In particular, costume designers are responsible for the overall look of the clothes and costumes in theatre, film, or television productions. They need excellent design skills as well as the organizational skills to lead a team. They must also reflect the socioeconomic and psychological aspects of people’s lives in different settings.

In this project, we aim to provide straight facts about the who, what, when, and where. The tags will be supplied by the initial user or added through crowdsourcing.

HUAC User Stories

User Story #1: A forensic computational linguist doing research on how interviewing style impacts witness responses. The value of the site to this user is being able to compare friendly vs. unfriendly witnesses (difficult to determine in general court transcripts) and the sheer number of available transcripts (also difficult to collect from general court records). The person clicks the API link and follows the prompts to extract a cluster of readings from unfriendly witness testimony, then does a second export for a cluster from friendly witness testimony. The API exports the two corpora into an intermediary location (such as Zotero), which can be used with Python (NLTK) to compare, for example, the number of times interviewers repeated questions for friendly vs. unfriendly witnesses.
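The NLTK comparison step might be sketched as follows; the “SPEAKER: utterance” transcript format and the interviewer name are illustrative assumptions, not the project’s actual data model:

```python
from collections import Counter

def repeated_question_count(transcript_lines, interviewer="MR. TAVENNER"):
    """Count interviewer questions asked more than once in one transcript.
    Assumes lines formatted 'SPEAKER: utterance' (our own convention)."""
    questions = Counter()
    for line in transcript_lines:
        speaker, _, utterance = line.partition(":")
        if speaker.strip() == interviewer and utterance.strip().endswith("?"):
            questions[utterance.strip().lower()] += 1
    return sum(1 for q, n in questions.items() if n > 1)

# Toy transcripts standing in for the exported corpora:
friendly = [
    "MR. TAVENNER: Are you a member of the Communist Party?",
    "WITNESS: No.",
    "MR. TAVENNER: Where were you employed in 1947?",
]
unfriendly = friendly[:1] * 3 + ["WITNESS: I decline to answer."]
```

Running the function over the two exported corpora and comparing the counts is exactly the friendly-vs-unfriendly contrast the user story describes.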

User Story #2: High school civics & US history teacher, Chris. He wants to assign students to search the archive for primary source documents from the McCarthy era. Students will have a list of topics and names to choose from as their research areas. Chris tests the site to see if it will be useful to his students. He uses the simple search box to search for topics such as ‘treason’ and ‘democracy’, and the advanced search options to combine topics with names. Chris is looking for clean results pages, the option to save and export searches, and help with citation.
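Chris’s advanced search, combining a topic facet with a name facet over the controlled-vocabulary identifiers, could be sketched like this (the field names and sample records are hypothetical):

```python
def advanced_search(records, topic=None, name=None):
    """Combine a topic facet with a name facet; when both are given,
    both must match. Returns the matching transcript ids."""
    hits = []
    for rec in records:
        if topic and topic not in rec.get("topics", []):
            continue
        if name and name not in rec.get("names", []):
            continue
        hits.append(rec["id"])
    return hits

# Hypothetical records carrying controlled-vocabulary identifiers:
transcripts = [
    {"id": "t1", "topics": ["treason"], "names": ["Alger Hiss"]},
    {"id": "t2", "topics": ["democracy", "treason"], "names": ["Paul Robeson"]},
]
```

Here `advanced_search(transcripts, topic="treason")` returns both records, while adding `name="Paul Robeson"` narrows the result to one: the category-based search the identifiers are meant to enable.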

User Story #3: American political scientist, Jennie, doing research on US government responses during periods of perceived national security threats. Specifically, she is interested in the Foreign Intelligence Surveillance Courts (FISC), which are closed courts. Jennie wants to read now-released documents that record the conduct of closed courts, and to do a mixture of qualitative and quantitative analysis. Qualitative: Jennie uses the advanced search to specify that she wants to look only at hearings categorized as ‘closed’, then does the same to specify only ‘open’ hearings. Quantitative: Jennie uses the API and follows the prompts to extract data with the category ‘closed’ and a date filter, to do a statistical analysis of the number of closed trials by year and whether/how it correlates with outside events.