Tag Archives: MVP

TANDEM Project Update 4.11.15

TANDEM Week 9 Presentation

TANDEM: A Brief Agenda

I. Review our project goals

  • Discuss new interested users (advertising, biodiversity cataloging)
  • Discuss output applications in “Mother Goose Counts”

II. Describe our development drive

  • Branches of Dev underway
    • UI/UX dynamic pages
    • Django framework
    • TANDEM tool python script

III. Explain our development steps

  • Two parallel paths were followed building Python “backend” code to run the analytics on the users’ input files
  • The paths were merged and tested on a laptop
  • The Python environment was then built on the server
  • A command line versionTANDEM will now run on the server using local server-based files.
  • @sreal19 will Demo TANDEM! (Fasten your seatbelts, folks!)

IV. Discuss next steps

  • What still needs doing hooking up front and back ends.
  • Getting polished examples of our output up along with clear links to available datavis resources.
  • Getting Kelly’s best practices documentation live.
  • Outreach (not just to beta testers, but to users who might not have considered these tools before — looking for education applications/journalism
  • Now is also the time to start considering the life beyond Praxis:
  • Grants for continuing work?
  • How much labor/manpower/development would be needed to move beyond MVP?
  • What does 1.0 look like?

Thanks for following and stay tuned for updates!

@dhTANDEM #picturebookshare

tufte retweet

 

 

CUNYcast Update week 5

SHOUT IT OUT WITH CUNYcast!

CUNYcast is moving forward and expanding our knowledge of the technical requirements involved in online radio broadcast. This week major strides were taken in outreach and development.

  • Contact was made with support and specialty knowledge in online radio broadcast technology (Mikhail Gershovich)
  • Reclaim hosting server space was finalized
  • Icecast and Artime were uploaded to server space.

THE MINIMUM VIABLE PRODUCT (MVP)

CUNYcast is a live online radio website offering students an opportunity to stream audio using original content from classes, lectures, and projects. CUNYCast’s aim is to empower a DH guerrilla broadcast community.

CUNYcast will reach out to the GC through an academic commons page that will link users, listeners, and curious DHers to our CUNYcast web presence. The CUNYcast web page will have a space for listeners to listen to the live streaming CUNYcast content. It will have a space where users may learn how to access the CUNYcast live stream and upload their own content. CUNYcast is designed to inform and inspire its users, to facilitate this experience CUNYcast’s web page will house a manual that will empower user’s to add their own content to the CUNYcast live streaming radio and inform them on how they could create their very own digital live stream radio channel. A portion of the manual will help users learn how to create their own audio content if the wish to explore a more polished radio stream format.

As an added bonus the CUNYcast website will have links to educational audio content and pedagogy surrounding teaching practices that utilize audio creation as mode of production.

Technical specifications for MVP

The technical map of CUNYcast lays in the Icecast media server and the Airtime client used to manage media on the media server. This back end structure will be given its public face on our website and our cuny commons presence.projectmap03-02-2015

Outreach: Report of activities to date

How To Succeed Even When You Fail

Spring semester 2015. Our Digital Humanities class broke into teams. We were only mildly anxious. Like the television show, “Shark Tank” which features new pitches for products and services each week, we were convinced our ideas were sound and that we could excel. The thing was, within just a few days we started to drown. Instead of devouring the material and spitting it back out for human consumption, we started sinking in a sea of possibilities. No tech geeks on our team. Just dreamers. That didn’t stop us from grabbing at every idea that seemed to float.

But, wait, our group of four people diminished to only three by week two. Man down. He disappeared and dropped the class (we wished him well). The three of us had to take a good hard look at the CUNYcast concept and decide what would assure our chances of survival. (Think of the music to Jaws playing underneath these words).

We took our overblown idea of a RSS-feed calendar linked into the CUNY system, that would record remotely via an app, after two afternoons of staring at code and realizing that by the time the project was due, we’d maybe have gotten through a couple of introductory tutorials. There was no way any of us would be coding experts in 12 weeks.

We trimmed the fat. Bit back with strength and vigor, and began on the current instantiation of CUNYcast: a live online radio website offering students an opportunity to stream audio using original content from classes, lectures, and projects. Our professors urged us to aim outside the box and empower an entire DH guerrilla broadcast community at the Graduate Center. Reporting in on week 4 and things are going swimmingly. We’ve gelled as a team and we’re optimistic.

We are not afraid anymore We are not afraid anymore – even if we should be.

Development:

This week, the goal was to configure an Icecast Media server in a local environment.Airtime and Icecast were configured on our server when we received the server configuration thanks to Reclaim hosting.

Icecast is (again) a media server. When you have an online radio station, the media server is where the audio/video lives for the duration of the stream, sort of an intermediary between the streamer (host machine) and the watcher (listener). Airtime is sort of a GUI that gives a face to the media server. Not only does it make the media server friendlier, it also makes it prettier. Airtime comes with a calendar that allows shows to be planned in advance.

One interesting thing about media servers, is that if someone has the access information to an Icecast server (ds106 allows their’s to be public, as will we, that’s kind of the point) they can use broadcasting programs to take over the station. If another person tries to take over the station when a show is going on, they’ll be met with an error. Airtime simplifies this with the above-mentioned calendar feature, as it allows users to see when shows are planned, and as such, schedule their planned broadcasts around that. Of course, this also allows for anarchy…

Bugs! The Icecast Server worked perfectly. We were able to access it via broadcasting software (Mixxx) and pick up that broadcast via VLC and browser (the address currently being cunycast.net:8000/live, kind of ugly) . However, Airtime specifically had some trouble connecting to our Icecast server, even after multiple troubleshooting attempts. When transmitting via Airtime, a connection could be established to the Icecast server for roughly ~10 seconds before falling flat, despite Airtime claiming the show was still airing. I hate it when machines lie to me. Anyway, after doing some GoogleFu I came across a thread on the SourceFabric forums (SourceFabric developed Airtime) about this exact problem. The fix stated in the thread claimed that  I needed to restart certain Airtime services via commandline using the “sudo” command. Sounds scary. Because Airtime was installed for us, I was a little worried about messing it up, fearing that I would have to reinstall things that I do not understand. However, we were able to fix the bug more easily, by switching the broadcasting format from OGG Vorbis to simpler MP3 format.

development goals include:

  • Figure out how to interact with Airtime via command line (need help from Digital Fellows here)
  • Bring the backend media server to the front ASAP such that we have a simpler/prettier way for users to tune in.
  • Implement an AutoDJ to play over the station and maintain it when no broadcasts are coming in (this is where we may need to talk to a ds106 person).
  • Determine how incoming users will be able to manipulate/interact with Airtime.

Design:

Slow progress is being made constructing the structure and elements to the CUNYcast web presence using Bootstrap. The pre organized Java and CSS allows for immediate product but there is still a bit to understand about the addition of and linking to media.

The CUNYcast Academic commons site is being designed to mirror the CUNYcast website.

CUNYcast

The guide on how to create websites is being updated to make sure that the CUNYcast manual evolves as the project evolves.

Digital HUAC: MVP Post

Over the course of this project so far, and in relation to the feedback that we’ve been receiving, we have scaled up and down our goals and expectations. It has been both humbling and empowering to consider everything we can do within the constraints of a single semester project. When asked to brainstorm our minimum viable product (MVP) this week, over a conference call we all agreed on the following:

– a central repository with basic search functionality that stores our corpus of 5 transcripts.

– a database that can be scaled.

What does this mean, and how does it differ from our current project goals?

We are attempting to generate a platform that connects a relational database to a robust search interface and utilizes an API to allow users to extract data. We envision Digital HUAC to be the start of a broader effort to organize HUAC transcripts and allow researchers and educators access to their every character. By allowing advanced searches driven by keywords and categories, we seek to allow users to drill down into the text of the transcripts.

Our MVP focuses on storing the transcripts in a digital environment that returns simple search results: in absence of a robust search mechanism, users would instead receive results indicating, for example, that a sought after term appeared in a given transcript and not much more.

Our MVP must lay out a model for a scalable database. We are still very much figuring out exactly how our database will operate, so it is hard to fully commit to what even a pared-down version of this would look like. But we know that the MVP version must work with plain text files as the input and searchable files as the output.

Generating an MVP has been a useful thought experiment. It has forced us to hone in on the twin narrative and technical theses of this project: essentially, if everything else was stripped away, what must be left standing. For us, this means providing basic search results and a working model of a relational database that, given appropriate time and resources, could be expanded to accommodate a much greater corpus.

TANDEM Project Update Week 5

TANDEM_big logo

Excitement!

Team TANDEM is working fast and furiously on all fronts. We’ve hit a few snags but all told, we feel like we’ve got a handhold on the mountains we’re climbing. Here’s a brief overview of the ups and downs of the week:

  • Our hope we might springboard off Lev’s tool proved somewhat castles in the air. Lev’s feature extractor was coded in a day. When they went to try to run it again later they couldn’t. Lev suggested we use OpenCV instead.
  • OpenCV seems to be a massive and constantly shifting morass of dependencies.
  • Jojo attended a couple of talks from Franco Moretti and spoke with him afterwards to see if anyone at Stanford was doing anything similar. While he acknowledged the validity of studying text as image, he seems to show no further interest. Bummer, but his loss.

The Details

In development we’ve got a working program for OCR and NLTK (Go Steve!), and we’re making strides in OpenCV (Go Chris!). Lev suggested that we have two different types of picture books for our test corpus — one that’s rich in color, another that’s more gray-scale with more text. These corpus variations will show the range of data values available to future users of TANDEM. Kelly’s working on scanning an initial test corpus now.

BookScanCenter_8

We also have our hosting set up with Reclaim thanks to Tim, as well as a forwarding email domain. Go ahead and send us an email to dhtandem@gmail.com.

In design/UI/UX Kelly has been working on variations of a brand identity, for color schemes, logo, web design elements… all of it (Go Kelly!). TANDEM_Circle GlyphsThe UI is primed to go now that we have hosting for TANDEM. Kelly is currently working on identifying the code for the specific UI elements desired, for an “ideal world” situation. The next steps for design/UI/UX are to pick a final brand image, and apply it to all our outreach initiatives.

In outreach, TANDEM had a good meeting with Lev on Wednesday. He seems to think we’re doing something other DHers aren’t quite doing. We’re not yet convinced that it’s not just because it’s crazy hard. Either way, we’re up for it. Otherwise, Jojo has been working it hard (Go Jojo!) on all outreach fronts. This week we received interest from Dr. Bill Gleason at Cotsen Children’s Library at Princeton, where they’re working on ABC book digitization and seem especially interested in our image analysis. This response is proof of relevance in the field.

In regards to social media, we now have a proper twitter handle, which we will admit happened in the middle of last week’s class thanks to some pressure from Digital HUAC already having one. You can follow us @dhTANDEM. More on Twitter: TANDEM had a couple really useful retweets (hurray Alex Gil, massively connected Columbia DHer!) that generated some traffic on our website (jetpack has us at 138 views so far, which is not a ton, but it’s a start!) and has won us some good DH followers — @NYCDH, @trameproject. We’ve transferred #picturebookshare to the @dhTANDEM account, and inviting our followers to participate, as well as use it as a means to suggest additional items for our test corpus.

THE MINIMUM VIABLE PRODUCT (MVP)

TANDEM_wireframe

MVP version #1

Because TANDEM is leveraging tools that already exist, one very basic minimum deliverable is that TANDEM makes OCR, NLTK, and OpenCV easy to use. Moreover, if TANDEM itself is not easy to use, there is no inherent advantage in using TANDEM over simply installing the existing tools and running them.

TANDEM as this minimum deliverable would solve the issue of having these tools in a web based environment, relieving users of the laborious headache of installing the component elements. Even after installing the component elements a user would likely have to write code to obtain the required output. TANDEM will shield the user from that need to be a programmer. At this minimum deliverable, TANDEM has not wrapped the three together into a single output.

MVP version #2

A second, more advanced minimum viable product would be to have a website on which a person could upload high resolution TIFF files, press a “run TANDEM” button, and receive a .CSV document containing the core (minimum) output.

The minimum output will consist of six NLTK values (average word length, word count, unique word count, word frequency (excluding stop words), bi-grams and tri-grams) and three image statistics for each input page provided by the user. We hope to expand the range of file types that we can support and to improve the quality of our OCR output, as well as build more elaborate modules for feature detection in in both text and illustrations. However,  we contend that demonstrating the comparative values of a couple corpora of picture books will prove that there is relevant information to be found across corpora with heavy image content.

MVP #3

We are shooting for a single featured MVP. A user comes to www.dhTANDEM.com and uploads a folder of image files following the printed instructions on the screen. They are prompted to hit an “analyze” button. After a few moments, a downloadable file is generated containing OCR-ed text, key data points from the OCR-ed text, and key feature descriptors from the overall image. This is purely for an early adopter looking to generate some useful data so that they can continue working on their story and/or data visualization.​

Concerns

Overall things are going swimmingly. But of course there are concerns. This weeks concerns range from:

  • Can we get the OpenCV to do what we need in the time we have available? This seems to be the element that people really want — the visuality of illustrated print.
  • Will be able to scale the project to process the number of pages we would need for users to get the results that would prove TANDEM’s value?

These are, of course, huge questions. But to put it all in perspective: Stephen Zwiebel told Kelly this week that DH Box was held together “by tape” at the time of the final project presentations, and that it has had a lot more time in the past 9 months to become stable. Not to say that we aren’t looking to have a (minimally viable) product come May, but it’s a good feeling to know where other groups were last year. Should we be sharing that widely with the class? Well, we just did. 🙂

THANKS FOR FOLLOWING @dhTANDEM!

a bicycle built by four, for two (text AND image)