Fashion Index Weekly Update

Screenshot_2015-03-10-16-54-23-1

Currently, Instagram carries more than a million images tagged #sprezzatura. We should start learning the Instagram API and MySQL for data mining. To work through the technical side, we visited the Digital Fellows at the media lab. MySQL is the database that will store our data; our scripts connect to the MySQL server to write and read it. One of the Digital Fellows, Even, said we need to set up a local copy of MySQL. He also suggested learning to work with the API in Python. I will look over Python libraries as well.

Our developer, Tessa, approached Shelley Bernstein at the Brooklyn Museum, but we could not reach her. Even so, her tagging project was a very useful reference.

Limitation: The Instagram API does not offer geospatial information. At this stage, we should find third-party applications for the mapping.

For coding, we need TextWrangler, a text editor for writing scripts and code. Renzo, who handles outreach, is using FileZilla, an FTP client, for file transfers. Renzo and Minn (our designer) have also started working on mapping. They installed R and the ggmap package. Together these query Google Maps, import map images from it, and return longitude and latitude for locations.

 

 

State of Your Projects

Hi All:

Below are some notes on the immediate needs of each project that we hope will help focus your work through the next week and a half.

Luke & Amanda

CUNYCast

  • Solidify live streaming via Icecast and broadcast via Airtime paths, and identify the security, gatekeeping, and bandwidth challenges of each.
  • Keep working to master Bootstrap for your site.
  • Consider implementing a git repository and start using it to collaborate.

NYCFashionIndex

  • Go to school on Instagram API. (Tessa)
  • Get MySQL running and start exploring SQL queries to put data into the database and take it back out. (Someone other than Tessa)
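As a starting point for the "put data in, take it back out" exercise, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server; the `posts` table, its columns, and the sample rows are all made up for illustration. The SQL statements are the part that should carry over to MySQL with little change.

```python
import sqlite3

# sqlite3 (Python standard library) stands in for MySQL here so the sketch
# runs without a database server. Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")

# One row per Instagram post we might collect.
conn.execute(
    "CREATE TABLE posts ("
    "id INTEGER PRIMARY KEY, username TEXT, hashtag TEXT, likes INTEGER)"
)

# Put data into the database (sample rows invented for the example).
rows = [
    ("user_a", "sprezzatura", 12),
    ("user_b", "sprezzatura", 40),
    ("user_c", "streetstyle", 7),
]
conn.executemany(
    "INSERT INTO posts (username, hashtag, likes) VALUES (?, ?, ?)", rows
)
conn.commit()

# Take it back out: every #sprezzatura post, most-liked first.
tagged = conn.execute(
    "SELECT username, likes FROM posts WHERE hashtag = ? ORDER BY likes DESC",
    ("sprezzatura",),
).fetchall()
print(tagged)
```

The `?` placeholders keep values separate from the SQL text. MySQL drivers for Python follow the same pattern, though the placeholder character may differ by driver.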

TANDEM

  • Merge image/text processing scripts.
  • Think through recommended pivot to Django.
  • Send us your GitHub repos.

HUAC

  • We are recommending a shift to PHP over Python for scripting in this project.
    • More precisely: get a form action up and running in PHP. Get at least as far as showing me what I entered into the search form, possibly even showing a user the API call that such a search should generate.
  • Go to school on what you can get from DocumentCloud, and start to come up with some example HTML pages for what results might look like.

You might want to know about … git

If you’ve got a couple of different folks working on the same code base, you need some version control. Emailing files back and forth is not going to work for very long. Trust me. Git is a distributed version control system that allows multiple users to work on the same code and resolve conflicts (like two people editing the same file at the same time). Even if it is just you, version control is your friend. This used to work, and then you did … something … and now it doesn’t? Roll back to the version that worked.

Git is a free and open protocol. Anyone can implement it. And there are people who will tell you that what makes git great is that it doesn’t depend on a central hub. Which is true. That is pretty cool. It is also more or less irrelevant to you because you are going to use a central hub. Probably GitHub; that’s what most people use.


You might want to know about … SSH

A number of you have been asking questions about running on your Reclaim server directly. The tool you need for that is SSH, or Secure Shell. I used to have a great SSH tip sheet but it has been removed from the internet. We can talk about that later. In the meantime, I cobbled together a not-half-bad recap of my original tip sheet.


week 6 project update / TANDEM

Development

On the image processing side of things, Chris has identified the syntax for generating our key values. Now we are working toward stitching the pieces together in a way that makes sense for our output. The bare minimum of computer vision that we need is accessible via OpenCV, and while the broader possibilities are tantalizing, we have kept a direct focus on the key pieces we need for the MVP. TANDEM is still on track.

We have also begun to reevaluate our progress. To do so, we created a new list of dev tasks that range from bite-sized to larger steps so we can visualize how much further we have to go. Steve has been doing a great job of keeping track of progress and using git for version control of his scripts.

In addition, we successfully implemented a routine to convert PDF to TXT. Input files are screened by type. If they are JPG, PNG, or TIFF, they are passed to Tesseract for OCR processing. If they are PDF, they are passed to a PDFMiner routine that extracts the text. In each case the program writes a TXT file to “nltk_data/corpora/ocrout_corpus” whose name matches the base name of the input file. The latest version of the backend code is here: https://github.com/sreal19/Tandem
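The screening step described above can be sketched roughly as follows. This is not the code in the TANDEM repository: `route_input` is a hypothetical helper, the handler names are plain labels rather than real Tesseract or PDFMiner calls, and treating ".jpeg" and ".tif" as image types is an assumption.

```python
from pathlib import Path

# Extensions sent to OCR. ".jpeg" and ".tif" are assumed aliases of the
# JPG and TIFF types named in the post.
OCR_TYPES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}
OUTPUT_DIR = Path("nltk_data/corpora/ocrout_corpus")


def route_input(filename):
    """Return (handler_label, output_path) for one input file.

    handler_label only names the tool that should process the file;
    the actual OCR or PDF extraction call is left out of this sketch.
    """
    path = Path(filename)
    ext = path.suffix.lower()
    if ext in OCR_TYPES:
        handler = "tesseract"
    elif ext == ".pdf":
        handler = "pdfminer"
    else:
        raise ValueError(f"unsupported file type: {filename}")
    # The output name reuses the input's base name with a .txt extension.
    return handler, OUTPUT_DIR / (path.stem + ".txt")


for name in ["scans/page01.TIFF", "books/goodnight.pdf"]:
    handler, out_path = route_input(name)
    print(name, "->", handler, out_path.as_posix())
```

Raising an error for unknown types, rather than skipping them silently, is one design option; it would also give the interface something concrete to report per file.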

Web functionality remains problematic. Most of this week’s effort went into simply getting through the Flask tutorial.

To end on a positive note, good progress has been made on text analysis processing. We are computing the word count and average word length for a single page. The program also creates a complete list of words for each input file. The next step, coming very soon, is a list of unique words with the count of each. The team must decide whether to strip punctuation from the analysis, since many of the OCR errors are rendered as punctuation.
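To make the punctuation decision easier to reason about, here is a small sketch of the per-page statistics with punctuation stripping as a switch. It is an illustration, not the TANDEM code: `page_stats` is a hypothetical function, and lowercasing words and splitting on whitespace are assumptions.

```python
import string
from collections import Counter


def page_stats(text, strip_punctuation=True):
    """Compute word count, average word length, and unique word counts."""
    # Assumption: words are whitespace-separated and compared in lowercase.
    words = text.lower().split()
    if strip_punctuation:
        # Trim punctuation from both ends of each token, then drop tokens
        # that were nothing but punctuation (a common form of OCR noise).
        words = [w.strip(string.punctuation) for w in words]
        words = [w for w in words if w]
    word_count = len(words)
    avg_len = sum(len(w) for w in words) / word_count if word_count else 0.0
    return {
        "word_count": word_count,
        "avg_word_length": avg_len,
        "unique_counts": Counter(words),
    }


sample = "The cat, the hat. ~~ The end!"
print(page_stats(sample))
print(page_stats(sample, strip_punctuation=False)["word_count"])
```

Running both settings over the same OCR output shows how much of the word list is noise: in the sample, the stray "~~" counts as a word only when punctuation is kept.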

Design/UI/UX

We’ve been working to identify the ideal UX functionalities to implement in JavaScript. Most of this was fairly straightforward, such as giving the user the ability to browse local folders and view a progress bar of the upload/analysis. It has been difficult to locate a script to produce error messages. Searching for anything involving “error” in the name retrieves a different type of request, and “progress” only gets to half of the need.

For instance, we had discussed letting users identify upload/analysis errors by file, either with a prompt on the final screen or with indicator text in the CSV output. Such a feature would let the user go back and fix the error for one file, rather than having to comb through the entire corpus and re-upload it. An example of how this would look is something like this, with text and visual cues that indicate which file needs review:

an image of suggested UX functionality to identify errors in file uploads

There is some documentation on JavaScript progress events and errors, but we need to discuss how it could be employed for TANDEM, and whether it’s necessary for the 0.5 version.

Outreach

Twitter continues to be the primary platform for outreach. While #picturebookshare continues to chime away, we are also now using it to generate research ideas for potential TANDEM users. Fun distant futures for TANDEM might involve the visual trajectories of various aspects of books: visuality of covers or book spines, as well as the visual history of education materials.

Jojo spoke with Carrie Hintz, who is starting a Childhood Studies track via the English Department, to see if she knew anyone studying illustrated books at the GC. She has no leads yet, but said that come the fall she would have a better idea of people interested in TANDEM. Meanwhile, Long LeKhac, an English PhD at Stanford, gave her a sense of the DH scene there and said he would ask around the DH community beyond Moretti’s lab. Jojo is in the process of devising outreach to text studies experts (Kathleen Fitzpatrick at MLA, Steve Jones) and folks in journalism (Nick Diakopoulos, NICAR, and Jonathan Stray), per Amanda Hickman’s suggestion. Keep on keeping on: keep the tweets t(w)eeming.

Digital HUAC- Project Update

This week, our team found the answer to our biggest development hurdle: DocumentCloud. Prior to this discovery, we were trying to figure out how to create a relational database that would store the meta tags of our corpus and respond to user input in our website’s search form.

It turns out that DocumentCloud, with an OpenCalais backend, is able to create semantic metadata from document uploads and can pull the entities within the text. The ability to recognize entities (places, people, organizations) is particularly helpful for our project since these would be potential search categories. We are also able to create customized search categories through DocumentCloud by creating key-value pairs. On Tuesday, we uploaded our 5 HUAC testimonies and started to create key-value pairs based on our taxonomy. (Earlier this week, we finalized our taxonomy after receiving feedback on it from Professor Schrecker at Yeshiva University and Professor Cuordileone at CUNY City Tech.) In order to create these key-value pairs, we had to read through each transcript and pull our answers, like this:

| Field | Notes & Examples | Rand | Brecht | Disney | Reagan | Seeger |
|---|---|---|---|---|---|---|
| Hearing Date | year-mo-day, 2015-03-10 | 1947-10-20 | 1947-10-30 | 1947-10-24 | 1947-10-23 | 1955-08-18 |
| Congressional Session | number | 80th | 80th | 80th | 80th | 84th |
| Subject of Hearing | | Hollywood | Hollywood | Hollywood | Hollywood | Hollywood |
| Hearing location | City, 2 letter state | Washington, DC | Washington, DC | Washington, DC | Washington, DC | New York, NY |
| Witness Name | Last Name, First Middle | Rand, Ayn | Brecht, Bertolt | Disney, Walt | Reagan, Ronald W. | Seeger, Pete |
| Witness Occupation | or profession | Author | Playwright | Producer | Actor | Musician |
| Witness Organizational Affiliation | | | | Walt Disney Studios | Screen Actors Guild | People’s Songs |
| Type of Witness | Friendly or Unfriendly | Friendly | Unfriendly | Friendly | Friendly | Unfriendly |
| Result of appearance | contempt charge, blacklist, conviction | | Blacklist | | | Contempt charge, but successfully appealed; Blacklist |

With DocumentCloud thrown back into the mix, we had to take a step back and start again with site schematics. We discussed each step of how the user would move through the site, down to the click, and how the backend would work to fulfill the user input in the search form. (Thanks, Amanda!) In terms of development, we will need to create a script (Python or PHP) that will allow the user’s input in the search box to “talk” to the DocumentCloud API and pull the appropriate data.
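Here is a rough sketch, in Python, of how search-form input could be turned into a DocumentCloud API call. Several things are assumptions to check against the DocumentCloud API documentation: the `api/search.json` endpoint, the `q` and `per_page` parameters, and the `key: "value"` query syntax for custom data. The `witness_type` key and the `build_search_url` helper are made up for illustration. The sketch only builds the URL and does not contact the server.

```python
from urllib.parse import urlencode

# Assumed search endpoint; confirm it in the DocumentCloud API docs.
SEARCH_ENDPOINT = "https://www.documentcloud.org/api/search.json"


def build_search_url(free_text="", filters=None, per_page=10):
    """Turn search-form input into the URL of a search API call.

    free_text: whatever the user typed in the search box.
    filters:   key-value pairs from the form's dropdowns (hypothetical
               keys that would mirror the taxonomy fields).
    """
    terms = [free_text] if free_text else []
    for key, value in (filters or {}).items():
        # Assumed query syntax for custom key-value data: key: "value"
        terms.append(f'{key}: "{value}"')
    params = {"q": " ".join(terms), "per_page": per_page}
    # urlencode escapes spaces, colons, and quotes for use in a URL.
    return SEARCH_ENDPOINT + "?" + urlencode(params)


url = build_search_url("Hollywood", {"witness_type": "Unfriendly"})
print(url)
```

Printing the URL before sending it is a cheap way to show a user (or a teammate) exactly which API call a given search generates.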

websiteschematics

Amanda mentioned DocumentCloud to us a while ago, but our group thought it was more of a repository than a tool, so our plan was to investigate it later, after we figured out how to build a database. After hounding the Digital Fellows for the past couple of weeks on how to create a relational database, they finally told us, “You need to look at DocumentCloud.” Moral of the story: Question what you think you know.

On the design front, we started working in Bootstrap and have been experimenting with GitHub. We were able to push a test site through GitHub Pages, but we still need to work out how to upload the rest of our site directory. This is our latest design of the site:

dhuac

THE CUNYCAST COMMONS GROUP IS NOW OPEN TO ALL

Welcome to our world. The CUNYcast Commons Group is now open to all!
Shout it out. #CUNYcast

We started this project in earnest weeks ago. But, looking back to March 1st when we posted our second process report for the DHpraxis class (after we went from a four person team to a three person team), we have definitely made headway. (Bad Shark. Go Away).

We are ready to start opening up the group to others who may be interested: 1. people who want to sign up to produce future CUNYcasts, and 2. techie types who may want to submit their opinions as we build out this platform (speak now or forever hold your peace). We are currently working in Bootstrap to configure our WordPress site and hope to launch by the week of April 1st.

Yesterday our site template went from this (see below) to the version that follows. Julia and Joy worked in tandem (get it? ha.. ha- we luv you classmates :)) on several agenda items:

Web_Page

To this:

unnamed

Next we have to wrestle with remedying the code for the header on the new pages. Go to our CUNYcast Commons Group to view this daunting (and hysterical) code.
[or you can download FULL DOCUMENT w/HTML CODE HERE]

James configured widgets and generally dove into Icecast and Airtime.

Widgets

We have a custom icon now too:

CUNY_cast_Logo_2

We got a little blog action goin’ on by way of a shout out here too.

Next up, we gotta get the website uploaded to the server, make some adjustments to the pages, and get the flow between Airtime and our site ironed out. These are our major goals right now.

Fashion Index Weekly Update

Overall, our team is focusing on updating the webpage and learning new technologies, including coding and programming. WordPress doesn’t fit our project.

NYC_Fashion_Index_Prototype_index

  • Currently, our core goal is to generate HTML. Our team is working on the HTML and CSS.
  • Our team chose to use Bootstrap and the Bootstrap Jumbotron component.
  • We are still learning the grid system (we need to work out accurate column ratios).
  • Learned how to link social media icons: right-click the image and save it, then reference the file name in the src attribute of an <img> tag, e.g. <img src="icons/tumblr.png" alt="tumblr icon" />
  • Setting custom styles for font and background color.
  • Screen Shot 2015-02-24 at 5.06.52 AM
  • Trying to make our social media accounts, including Instagram and Twitter, look more active.
  • Followed several accounts related to the fashion industry and sprezzatura.
  • Screenshot_2015-03-14-19-01-01-1
  • Followed people who are currently attending, or have already graduated from, Parsons, Pratt, and FIT. Fortunately, our follower count increased from 2 to 15.
  • Our best example is the project done at the Brooklyn Museum, which invited users to add tags below images.

    Style

  • The Instagram account is process-based. We plan to share our progress through pictures.
  • The website will be content-based and will concentrate on documenting image archives.

 

CUNYcast Update week 5

SHOUT IT OUT WITH CUNYcast!

CUNYcast is moving forward and expanding our knowledge of the technical requirements involved in online radio broadcast. This week major strides were taken in outreach and development.

  • We made contact with Mikhail Gershovich, who has specialist knowledge of online radio broadcast technology and can offer support.
  • Reclaim Hosting server space was finalized.
  • Icecast and Airtime were uploaded to the server space.

THE MINIMUM VIABLE PRODUCT (MVP)

CUNYcast is a live online radio website offering students an opportunity to stream audio using original content from classes, lectures, and projects. CUNYCast’s aim is to empower a DH guerrilla broadcast community.

CUNYcast will reach out to the GC through an Academic Commons page that will link users, listeners, and curious DHers to our CUNYcast web presence. The CUNYcast web page will have a space for listeners to listen to the live streaming CUNYcast content. It will also have a space where users may learn how to access the CUNYcast live stream and upload their own content. CUNYcast is designed to inform and inspire its users. To facilitate this experience, CUNYcast’s web page will house a manual that will empower users to add their own content to the CUNYcast live streaming radio and show them how they could create their very own digital live stream radio channel. A portion of the manual will help users learn how to create their own audio content if they wish to explore a more polished radio stream format.

As an added bonus, the CUNYcast website will have links to educational audio content and pedagogy surrounding teaching practices that use audio creation as a mode of production.

Technical specifications for MVP

The technical map of CUNYcast rests on the Icecast media server and the Airtime client used to manage media on that server. This back-end structure will be given its public face on our website and our CUNY Commons presence.

projectmap03-02-2015

Outreach: Report of activities to date

How To Succeed Even When You Fail

Spring semester 2015. Our Digital Humanities class broke into teams. We were only mildly anxious. Like contestants on the television show “Shark Tank,” which features new pitches for products and services each week, we were convinced our ideas were sound and that we could excel. The thing was, within just a few days we started to drown. Instead of devouring the material and spitting it back out for human consumption, we started sinking in a sea of possibilities. No tech geeks on our team. Just dreamers. That didn’t stop us from grabbing at every idea that seemed to float.

But, wait, our group of four people diminished to only three by week two. Man down. He disappeared and dropped the class (we wished him well). The three of us had to take a good hard look at the CUNYcast concept and decide what would assure our chances of survival. (Think of the music to Jaws playing underneath these words).

We set aside our overblown idea of an RSS-feed calendar linked into the CUNY system that would record remotely via an app. That came after two afternoons of staring at code and realizing that by the time the project was due, we’d maybe have gotten through a couple of introductory tutorials. There was no way any of us would be coding experts in 12 weeks.

We trimmed the fat. Bit back with strength and vigor, and began on the current instantiation of CUNYcast: a live online radio website offering students an opportunity to stream audio using original content from classes, lectures, and projects. Our professors urged us to aim outside the box and empower an entire DH guerrilla broadcast community at the Graduate Center. Reporting in on week 4 and things are going swimmingly. We’ve gelled as a team and we’re optimistic.

We are not afraid anymore – even if we should be.

Development:

This week, the goal was to configure an Icecast media server in a local environment. Airtime and Icecast were already configured on our server when we received it, thanks to Reclaim Hosting.

Icecast is (again) a media server. When you have an online radio station, the media server is where the audio/video lives for the duration of the stream, sort of an intermediary between the streamer (host machine) and the watcher (listener). Airtime is sort of a GUI that gives a face to the media server. Not only does it make the media server friendlier, it also makes it prettier. Airtime comes with a calendar that allows shows to be planned in advance.

One interesting thing about media servers is that if someone has the access information to an Icecast server (ds106 allows theirs to be public, as will we; that’s kind of the point), they can use broadcasting programs to take over the station. If another person tries to take over the station while a show is going on, they’ll be met with an error. Airtime simplifies this with the above-mentioned calendar feature, as it allows users to see when shows are planned and schedule their own broadcasts around them. Of course, this also allows for anarchy…

Bugs! The Icecast server worked perfectly. We were able to access it via broadcasting software (Mixxx) and pick up that broadcast via VLC and a browser (the address currently being cunycast.net:8000/live, kind of ugly). However, Airtime specifically had some trouble connecting to our Icecast server, even after multiple troubleshooting attempts. When transmitting via Airtime, a connection could be established to the Icecast server for roughly 10 seconds before falling flat, despite Airtime claiming the show was still airing. I hate it when machines lie to me. Anyway, after doing some GoogleFu I came across a thread on the SourceFabric forums (SourceFabric developed Airtime) about this exact problem. The fix given in the thread was to restart certain Airtime services from the command line using the “sudo” command. Sounds scary. Because Airtime was installed for us, I was a little worried about messing it up, fearing that I would have to reinstall things that I do not understand. However, we were able to fix the bug more easily, by switching the broadcasting format from OGG Vorbis to the simpler MP3 format.

Development goals include:

  • Figure out how to interact with Airtime via command line (need help from Digital Fellows here)
  • Bring the backend media server to the front ASAP such that we have a simpler/prettier way for users to tune in.
  • Implement an AutoDJ to play over the station and maintain it when no broadcasts are coming in (this is where we may need to talk to a ds106 person).
  • Determine how incoming users will be able to manipulate/interact with Airtime.

Design:

Slow progress is being made on constructing the structure and elements of the CUNYcast web presence using Bootstrap. The pre-built JavaScript and CSS allow for an immediate product, but there is still a bit to understand about adding and linking to media.

The CUNYcast Academic Commons site is being designed to mirror the CUNYcast website.

CUNYcast

The guide on how to create websites is being updated to make sure that the CUNYcast manual evolves as the project evolves.