TANDEM Project Update 4.19.15

PROJECT:

DEVELOPMENT:

Steve continues to power away like some sort of half-man, half-robot, mostly magic developer. As of today, we have successfully incorporated a single-file upload functionality to our Django app. Our next action items include:

  • Testing the new upload functionality on the server
  • Implementing multi-file upload and testing on local hosts and our server
  • Hooking up the necessary analytics engine to our Django configuration
  • Adding validation and error checking

UI/UX:

With the technical side of the interactivity mapped, we are working on the mockups for the evolution of the front end. We are working through envisioning each step of the process that a user will experience in the TANDEM front end.

We have begun answering:

  • What are the users met with as a landing page?
  • Are there prompts for users ready to upload their files?
  • As the files upload, what kinds of elements will show what is happening in the backend? (Progress bars, spinners, written prompts)
  • Once the files are completed, what are the users met with?
  • How does a table look with our data fed into it for in-browser viewing?
  • What does the download page look like post-processing?
  • Where and how are the downloads delivered?

In a short time, we will be able to show in full color and depth each of the above.

Giving life to those mockups will be the capstone for our main body of work pre-presentation.

OUTREACH:

This week’s outreach centered around a number of events:

  1. Django NYC MeetUp this Wednesday
    • Geoff Sechter continues to be a valuable resource, though his opinion seems to be that jQuery is our best bet for uploading multiple files. He’s been super patient helping explicate the particularities of Python, as well.
    • Peter Karp of Buzzfeed also had interesting ideas and recommended attending the OpenCV Office Hours hosted at Buzzfeed by Andrew Kelleher, Adam Kelleher and Katarina Kufieta. Their next meetup is April 21 http://www.meetup.com/NYC-OpenCV-Study-Group/events/221727855/.
  2. Theorizing the Web 2015 at ICP
    • While the many fascinating panels on surveillance did not bear directly on TANDEM, several artists whose work involves text and image spoke, including Claudia Pederson @cc3pc and Nicholas Knouf @zeitkunst, who work on #artforspooks, and Ben Grosser @bengrosser, who created #scaremail. Another interesting talk treated Victorian cartes de visite as early social media.
    • Spoke more with Erin Glass about potential publicity for TANDEM
  3. The Verge NYC after party @Thoughtworks
    • On Tessa’s invitation, Jojo attended the closing party for the workshop week for innovative design
    • Met John Bruce, Assistant Professor of Strategic Design & Management at the New School, who seemed interested in DH overlap
    • Ran into Hannah Lane who does UX at Thoughtworks — a contact point should things get thorny moving forward with TANDEM UI/UX

You might want to know about … SSH

A number of you have been asking questions about running on your Reclaim server directly. The tool you need for that is SSH, or Secure Shell. I used to have a great SSH tip sheet but it has been removed from the internet. We can talk about that later. In the meantime, I cobbled together a not-half-bad recap of my original tip sheet.

week 6 project update / TANDEM

Development

On the image processing side of things, Chris has identified the syntax for generating our key values. Now we are working toward stitching the pieces together in a way that makes sense for our output. The extreme minimum of computer vision is accessible via OpenCV, and while the possibilities are tantalizing, we have continued to keep a direct focus on the key pieces we need to access for the MVP. TANDEM is still on track.

We have also begun to reevaluate our progress. To do so, we created a new list of dev tasks that range from bite-sized to larger steps so we can visualize how much further we have to go. Steve has been doing a great job of keeping track of progress and using git for version control of his scripts.

In addition we successfully implemented a routine to convert PDF to TXT. Input files are screened by type. If they are JPG, PNG or TIFF, they are passed to Tesseract for OCR processing. If they are PDF they are passed to a PDFMiner routine that extracts the text. In each case the program writes TXT files to “nltk_data/corpora/ocrout_corpus” with a name that matches the first order name of the input file. The latest version of the backend code is here: https://github.com/sreal19/Tandem
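The screening step described above can be sketched roughly as follows. This is a minimal illustration, not the code in the repository: `route_file`, `ocr_image`, and `extract_pdf` are hypothetical names, the two callables stand in for the Tesseract and PDFMiner routines, and we assume "first order name" means the file name without its extension.

```python
from pathlib import Path

# Extensions screened as images (sent to OCR) or as PDFs (sent to text extraction).
IMAGE_TYPES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}
PDF_TYPES = {".pdf"}


def route_file(path, ocr_image, extract_pdf, out_dir):
    """Screen one input file by extension and write its text to out_dir.

    ocr_image and extract_pdf are hypothetical callables (path -> str) that
    stand in for the Tesseract and PDFMiner routines. Returns the path of the
    TXT file written, or None if the file type is not supported.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix in IMAGE_TYPES:
        text = ocr_image(path)
    elif suffix in PDF_TYPES:
        text = extract_pdf(path)
    else:
        return None

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: the output name is the input name with its extension swapped for .txt.
    out_path = out_dir / (path.stem + ".txt")
    out_path.write_text(text, encoding="utf-8")
    return out_path
```

Passing the two routines in as arguments keeps the screening logic testable without Tesseract or PDFMiner installed.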

Web functionality remains problematic. Most effort this week has been merely trying to get through the Flask tutorial.

To end on a positive note, developmentally, good progress has been made with Text Analysis processing. We are computing the word count and average word length for a single page. The program also creates a complete list of words for each input file. In the very near future work will be completed to create a list of unique words and the count of each. The team must make a decision about whether to strip punctuation from the analysis, since many of the OCR errors are rendered as punctuation.
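To make the punctuation question concrete, here is a small sketch of the counts described above, with punctuation stripping as a switch. The function name `text_stats` and the choice to lowercase words are our assumptions for illustration, not the project's actual code.

```python
import string
from collections import Counter


def text_stats(text, strip_punctuation=True):
    """Return word count, average word length, and per-word counts for one page."""
    words = text.split()
    if strip_punctuation:
        # Trim punctuation from both ends of each token, then drop tokens
        # that were nothing but punctuation (a common OCR artifact).
        words = [w.strip(string.punctuation) for w in words]
        words = [w for w in words if w]
    words = [w.lower() for w in words]

    count = len(words)
    average = sum(len(w) for w in words) / count if count else 0.0
    return {
        "word_count": count,
        "average_word_length": average,
        "unique_words": Counter(words),
    }
```

Running it both ways on the same page shows how much of the word list is OCR noise: with stripping off, stray tokens such as `~~` are counted as words and inflate the totals.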

Design/UI/UX

We’ve been working to identify the ideal UX functionalities for JavaScript. Most of this was fairly straightforward, such as giving the user the ability to browse local folders & view a progress bar of the upload/analysis. It has been difficult to locate a script to produce error messages. Searching for anything involving “error” in the name retrieves a different type of request, and “progress” only gets to half of the need.

For instance, we had discussed having the ability to let users identify upload/analysis errors by file, either with a prompt on the final screen or with indicator text in the CSV output. Such a feature will provide the user with the ability to go back and fix the error for 1 file, versus having to comb through the entire corpus and re-upload. An example of how this would look would be something like this, with text & visual cues that indicate which file needs review:

an image of suggested UX functionality to identify errors in file uploads
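On the CSV side, the indicator text could be as simple as a status column per file. The sketch below assumes a hypothetical result format of (file name, word count, error message) and column names we made up for illustration; nothing here is settled for TANDEM.

```python
import csv


def write_report(results, handle):
    """Write one CSV row per input file, flagging the files that failed.

    results is an iterable of (filename, word_count, error) tuples, where
    error is None when the file processed cleanly. This shape is hypothetical.
    """
    writer = csv.writer(handle)
    writer.writerow(["file", "status", "word_count", "error"])
    for name, word_count, error in results:
        if error is None:
            writer.writerow([name, "ok", word_count, ""])
        else:
            writer.writerow([name, "needs review", "", error])


def files_needing_review(results):
    """List only the files a user would have to fix and upload again."""
    return [name for name, _count, error in results if error is not None]
```

The same list returned by `files_needing_review` could drive the prompt on the final screen, so the user sees which single file to fix.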

There is some documentation on JavaScript progress events and errors, but we need to discuss how it could be employed for TANDEM, and whether it’s necessary for the 0.5 version.

Outreach

Twitter continues to be the primary platform for outreach. While #picturebookshare continues to chime away, we are also now using it to generate research ideas for potential TANDEM users. Fun distant futures for TANDEM might involve the visual trajectories of various aspects of books: visuality of covers or book spines, as well as the visual history of education materials.

Jojo spoke with Carrie Hintz, who is starting a Childhood Studies track via the English Department, to see if she knew anyone studying illustrated books at the GC. She has no leads yet, but said come the fall she would have a better idea of people interested in TANDEM. Meanwhile, Long LeKhac, an English PhD at Stanford, was giving her a sense of the DH scene there and said he would ask around the DH community beyond Moretti’s lab. Jojo is in the process of devising outreach to text studies experts (Kathleen Fitzpatrick at MLA, Steve Jones) and folks in journalism (Nick Diakopoulos, NICAR and Jonathan Stray, per Amanda Hickman’s suggestion). Keep on keeping on; keep the tweets t(w)eeming.

CUNYcast Update week 5

SHOUT IT OUT WITH CUNYcast!

CUNYcast is moving forward and expanding our knowledge of the technical requirements involved in online radio broadcast. This week major strides were taken in outreach and development.

  • Contact was made with support and specialty knowledge in online radio broadcast technology (Mikhail Gershovich)
  • Reclaim hosting server space was finalized
  • Icecast and Airtime were uploaded to server space.

THE MINIMUM VIABLE PRODUCT (MVP)

CUNYcast is a live online radio website offering students an opportunity to stream audio using original content from classes, lectures, and projects. CUNYcast’s aim is to empower a DH guerrilla broadcast community.

CUNYcast will reach out to the GC through an Academic Commons page that will link users, listeners, and curious DHers to our CUNYcast web presence. The CUNYcast web page will have a space for listeners to listen to the live streaming CUNYcast content. It will have a space where users may learn how to access the CUNYcast live stream and upload their own content. CUNYcast is designed to inform and inspire its users. To facilitate this experience, CUNYcast’s web page will house a manual that will empower users to add their own content to the CUNYcast live streaming radio and inform them on how they could create their very own digital live stream radio channel. A portion of the manual will help users learn how to create their own audio content if they wish to explore a more polished radio stream format.

As an added bonus the CUNYcast website will have links to educational audio content and pedagogy surrounding teaching practices that utilize audio creation as a mode of production.

Technical specifications for MVP

The technical map of CUNYcast lies in the Icecast media server and the Airtime client used to manage media on the media server. This back-end structure will be given its public face on our website and our CUNY Commons presence.

projectmap03-02-2015

Outreach: Report of activities to date

How To Succeed Even When You Fail

Spring semester 2015. Our Digital Humanities class broke into teams. We were only mildly anxious. Like the contestants on the television show “Shark Tank,” which features new pitches for products and services each week, we were convinced our ideas were sound and that we could excel. The thing was, within just a few days we started to drown. Instead of devouring the material and spitting it back out for human consumption, we started sinking in a sea of possibilities. No tech geeks on our team. Just dreamers. That didn’t stop us from grabbing at every idea that seemed to float.

But, wait, our group of four people diminished to only three by week two. Man down. He disappeared and dropped the class (we wished him well). The three of us had to take a good hard look at the CUNYcast concept and decide what would assure our chances of survival. (Think of the music to Jaws playing underneath these words).

We abandoned our overblown idea of an RSS-feed calendar linked into the CUNY system that would record remotely via an app, after two afternoons of staring at code and realizing that by the time the project was due, we’d maybe have gotten through a couple of introductory tutorials. There was no way any of us would be coding experts in 12 weeks.

We trimmed the fat. Bit back with strength and vigor, and began on the current instantiation of CUNYcast: a live online radio website offering students an opportunity to stream audio using original content from classes, lectures, and projects. Our professors urged us to aim outside the box and empower an entire DH guerrilla broadcast community at the Graduate Center. Reporting in on week 4 and things are going swimmingly. We’ve gelled as a team and we’re optimistic.

We are not afraid anymore – even if we should be.

Development:

This week, the goal was to configure an Icecast media server in a local environment. Airtime and Icecast were configured on our server when we received the server configuration thanks to Reclaim Hosting.

Icecast is (again) a media server. When you have an online radio station, the media server is where the audio/video lives for the duration of the stream, sort of an intermediary between the streamer (host machine) and the watcher (listener). Airtime is sort of a GUI that gives a face to the media server. Not only does it make the media server friendlier, it also makes it prettier. Airtime comes with a calendar that allows shows to be planned in advance.

One interesting thing about media servers is that if someone has the access information to an Icecast server (ds106 allows theirs to be public, as will we, that’s kind of the point) they can use broadcasting programs to take over the station. If another person tries to take over the station when a show is going on, they’ll be met with an error. Airtime simplifies this with the above-mentioned calendar feature, as it allows users to see when shows are planned, and as such, schedule their planned broadcasts around that. Of course, this also allows for anarchy…

Bugs! The Icecast server worked perfectly. We were able to access it via broadcasting software (Mixxx) and pick up that broadcast via VLC and browser (the address currently being cunycast.net:8000/live, kind of ugly). However, Airtime specifically had some trouble connecting to our Icecast server, even after multiple troubleshooting attempts. When transmitting via Airtime, a connection could be established to the Icecast server for roughly 10 seconds before falling flat, despite Airtime claiming the show was still airing. I hate it when machines lie to me. Anyway, after doing some GoogleFu I came across a thread on the SourceFabric forums (SourceFabric developed Airtime) about this exact problem. The fix stated in the thread claimed that I needed to restart certain Airtime services via the command line using the “sudo” command. Sounds scary. Because Airtime was installed for us, I was a little worried about messing it up, fearing that I would have to reinstall things that I do not understand. However, we were able to fix the bug more easily, by switching the broadcasting format from OGG Vorbis to the simpler MP3 format.
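One way to avoid taking Airtime's word for it is to ask Icecast directly which mounts have a source connected. The sketch below assumes an Icecast version that serves a JSON status page at `/status-json.xsl` (we believe this arrived in Icecast 2.4) and that each source entry carries a `listenurl` field; we have not verified either against our own server. The function names are ours.

```python
import json
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch_status(base_url):
    """Download the Icecast status document, e.g. from http://cunycast.net:8000.

    Assumption: the server exposes /status-json.xsl.
    """
    with urlopen(base_url + "/status-json.xsl", timeout=10) as response:
        return json.load(response)


def active_mounts(status):
    """Return the mount points (such as /live) that currently have a source."""
    sources = status.get("icestats", {}).get("source", [])
    # Assumption: a single source is reported as one object, several as a list.
    if isinstance(sources, dict):
        sources = [sources]
    return [urlparse(s.get("listenurl", "")).path for s in sources]


def mount_is_live(status, mount="/live"):
    """True if a broadcaster is actually connected to the given mount."""
    return mount in active_mounts(status)
```

Calling `mount_is_live(fetch_status("http://cunycast.net:8000"))` every few seconds during a show would have revealed the dropped connection even while Airtime reported the show as airing.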

Development goals include:

  • Figure out how to interact with Airtime via command line (need help from Digital Fellows here)
  • Bring the backend media server to the front ASAP such that we have a simpler/prettier way for users to tune in.
  • Implement an AutoDJ to play over the station and maintain it when no broadcasts are coming in (this is where we may need to talk to a ds106 person).
  • Determine how incoming users will be able to manipulate/interact with Airtime.

Design:

Slow progress is being made constructing the structure and elements of the CUNYcast web presence using Bootstrap. The pre-organized JavaScript and CSS allow for an immediate product, but there is still a bit to understand about the addition of and linking to media.

The CUNYcast Academic Commons site is being designed to mirror the CUNYcast website.

CUNYcast

The guide on how to create websites is being updated to make sure that the CUNYcast manual evolves as the project evolves.

Database Question

Not too long ago (within the last couple of years, I think) Oracle acquired MySQL and there was, I know, a fair amount of concern within the open source community that Oracle wouldn’t support it very well–that they might even deliberately try to kill it or convert it into a profit making product. Perhaps these concerns have come true? Does anybody have a sense that the open source community is moving away from MySQL or whether Oracle has done a good job supporting this DBMS?

Moretti in March

Happy March DH Praxers!

I just wanted to share information the lovely digital fellow Erin Glass alerted me to —

Mr. Graphs, Maps, and Trees himself is in town this week!

Franco Moretti is speaking at NYU and both events are open to the public:

I cut and pasted from the NYU site:

  • Wednesday, March 4th, 6:00 p.m.
    First Wednesday Speaker Series and English Dept. Annual Goldstone Lecture: “Micromégas: The very small, the very large, and the object of Digital Humanities,” Franco Moretti (Stanford University)
    Location: Room 102, Cantor Film Center
  • Thursday, March 5th, 12:30 p.m.
    Goldstone Seminar: “Canon/Archive. Large-scale Dynamics in the Literary Field,” Franco Moretti (Stanford University)
    Please RSVP here.
    Location: The Event Space, 244 Greene St.
    I am going and will report back for those who can’t make it!
    -Jojo

Digital HUAC – Workplan & Wireframe & Update

Wireframe:

wireframe

Workplan:

Digital HUAC - Workflow_Page_1

Workplan: what & why

Pages from Digital HUAC - Workflow

Workflow 

The documents (which are already scanned) will be manually tagged in an XML editor according to identified categories, then read into an open-source relational database (MySQL), which can import XML documents. The website will query the MySQL database using PHP embedded within the site’s HTML/CSS pages. Finally, the API will allow users to export their searches to text-analysis resources.
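The tag-then-load step can be illustrated with a short sketch. To keep it self-contained we use Python's built-in SQLite in place of MySQL and PHP, and the element names, attribute names, and sample text are placeholders we invented, not our actual taxonomy or any real testimony.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical tagging scheme: every element and attribute name is a placeholder.
SAMPLE_XML = """<hearing witness="Witness A">
  <exchange>
    <speaker>Committee member</speaker>
    <category>organization</category>
    <text>Placeholder question.</text>
  </exchange>
  <exchange>
    <speaker>Witness A</speaker>
    <category>person</category>
    <text>Placeholder answer.</text>
  </exchange>
</hearing>"""


def load_transcript(xml_text, conn):
    """Read one tagged transcript into a relational table; return rows added."""
    root = ET.fromstring(xml_text)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS exchanges "
        "(witness TEXT, speaker TEXT, category TEXT, text TEXT)"
    )
    rows = [
        (root.get("witness"), ex.findtext("speaker"),
         ex.findtext("category"), ex.findtext("text"))
        for ex in root.iter("exchange")
    ]
    conn.executemany("INSERT INTO exchanges VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def search_by_category(conn, category):
    """Return (speaker, text) pairs tagged with the given category."""
    cursor = conn.execute(
        "SELECT speaker, text FROM exchanges WHERE category = ?", (category,)
    )
    return cursor.fetchall()
```

Once the tags live in table columns, a search by category is a single query, and its results are what the API would hand off to text-analysis tools.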

Historians and Corpus

We’ve identified a number of historians, librarians and archivists, and digital humanists to potentially work with on this project and are in the process of reaching out to them in an advisory capacity. We seek guidance on our taxonomy and controlled vocabularies in the short term, and on future developments of our project beyond the scope of this semester.

At the top of this list are historians Blanche Cook and Josh Freeman, CUNY professors and experts on the HUAC era. Steve Brier is in the process of introducing us to both Cook and Freeman. Other historians include Ellen Schrecker (Yeshiva), Mary Nolan (NYU), Jonathan Zimmerman (NYU), and Victoria Phillips (Columbia), each with subject expertise and research experience on the time, events, and people central to Digital HUAC. We have also identified Peter Leonard, a DH librarian at Yale; David Gary, the American History subject specialist at Yale who holds a PhD in American History from CUNY; John Haynes, a historian who served as a specialist in 20th-century political history in the Manuscript Division of the Library of Congress; and Jim Armistead and Sam Rushay, archivists at the Truman Library, as potential advisors.

We have narrowed down the corpus of text that we’ll be working with to include 5 transcripts: Bertolt Brecht; Ronald Reagan; Ayn Rand; Pete Seeger; and Walt Disney. This list of major cultural figures spans the hearings themselves and features both friendly and hostile witnesses, offering users a varied look into the nuances of interrogation. It is our opinion that by focusing on a witness base of recognizable figures that is thematically organized, users may examine their testimony as individuals and in context with one another. This quality of the HUAC hearings cannot be overstated, and Digital HUAC seeks to draw attention to it through the overall user experience.