With our corpus defined and development goals set, the team is taking a two-pronged approach to the reaching the final project. While Chris and Steve focus on continuing to develop and code the working project, Kelly and Jojo have turned their attention to the work to be done with the corpus. Equally as important as building TANDEM is the ability to show a proof-of-concept and illustrate the value of the output TANDEM generates. While the duties among the team will still bleed as there is still design work that may arise for Kelly, outreach to be done by Jojo, and theoretical questions for Chris and Steve to weigh in on, our focus is much more pointed on particular pieces of achieving a functioning and valuable tool and methodology.
A key milestone was reached this week when the text processing backend coding was completed. It will need to be thoroughly tested which the team expects to complete by 3/31. Additional work requires that the program be merged with the image processing code. This integration step is targeted for completion on 3/24. The current version of the program can be found on Github. The repo contains a number of test data files as well as documentation. The core program is TandemText.py.
The team decided to abandon the Flask web framework in favor of Django, primarily because there is much more local support (from the Digital Fellows) for Django. We were able to switch because we did not have a significant code base built in Flask and much of the work done on Flask may transfer well to Django. Optimistically, the team should be able to get a pilot “Hello World” application running under Django on the Reclaimhosting.com server (with help from Zach Davis and Tim Owens).
Finally, on the development front, the team needs to envision and plan for how we will persist data on the website. Will persistence even be see as a valuable feature by the users? If so, how will store and secure the data? How will we handle requests to amend or edit an existing result set? These decisions are pending, likely to be addressed at the 3/24 class.
This week TANDEM has maintained its twitter activity. Jojo is also working on reaching out to new communities while developing useful skills — she has taken on work at the Tow Center for Digital Journalism and is exploring possible applications of TANDEM there, and she got accepted to Django Girls next weekend and was assigned her team. She looks forward to meeting a number of people across disciplines and fields.
Here are some folks that might be interested in tandem. They’re doing a crowd-sourcing transcription project of seed catalogs because “Seed catalogs are notoriously difficult subjects for Optical Character Recognition software (OCR) to parse (which produces searchable text files of digitized images), so searching the text of online vintage seed catalogs is often problematic. The picturesque fonts and elaborate page layouts so endearingly characteristic of seed catalogs cause the resulting OCR output to be error prone and less than optimal” http://blog.biodiversitylibrary.org/2015/03/help-us-improve-access-to-seed-and.html