
DigitalHUAC project update

Search Form Update

After finalizing the taxonomy with our historian experts, we created a public project on DocumentCloud and uploaded the five sample testimonies. For each testimony, we entered key-value pairs based on our taxonomy.
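We entered these through the DocumentCloud web interface, but the same key-value data can also be set programmatically with the python-documentcloud wrapper discussed below. A minimal sketch, with placeholder credentials, a placeholder document ID, and made-up taxonomy fields (our real schema differs):

```python
from documentcloud import DocumentCloud

# Placeholder credentials and document ID.
client = DocumentCloud("us@example.com", "secret-password")
doc = client.documents.get("283251-huac-testimony-sample")

# Hypothetical taxonomy fields, for illustration only.
doc.data = {
    "witness": "John Doe",
    "hearing_date": "1953-05-04",
    "stance": "friendly",
}
doc.put()  # save the key-value pairs back to DocumentCloud
```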

We are still working on the script that will talk to the DocumentCloud API. In the meantime, we started building a search form in plain HTML. After making some very basic search forms, we came across a form builder for Bootstrap that let us add more search options very easily. The form builder also generated the HTML, which we pasted into our test website.

Below is a screenshot:

[Screenshot: the Bootstrap-generated search form]

API Script (Form Action) Update

Working with DocumentCloud, we found a) an app, built by The Bay Citizen, that lets users work with DocumentCloud documents through a Django-powered CMS:

https://www.baycitizen.org/blogs/sandbox/djangodocumentcloud-integration-theres/

https://github.com/BayCitizen/django-doccloud

[Screenshot: the django-doccloud integration]

And b) a Python wrapper for the DocumentCloud API:

https://github.com/datadesk/python-documentcloud

We looked at other documentation that explains how to POST HTML form values to a Python script (e.g., http://stackoverflow.com/questions/15965646/posting-html-form-values-to-python-script), but we are currently working with the Python API wrapper. Using it required installing a more recent version of Python (with pip included) and then installing the python-documentcloud library:

[Screenshot: installing python-documentcloud with pip]
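For anyone following along, connecting through the wrapper looks roughly like this (the credentials and project title are placeholders, not our real account details):

```python
# Installed with: pip install python-documentcloud
from documentcloud import DocumentCloud

# Placeholder credentials for a DocumentCloud account.
client = DocumentCloud("us@example.com", "secret-password")

# Look up our public project and list its documents
# (the project title here is a placeholder).
project = client.projects.get_by_title("DigitalHUAC")
for doc in project.document_list:
    print(doc.title)
```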

Our initial attempts, though, returned the following:

[Screenshot: error returned by the initial attempts]

We are continuing with the python-documentcloud tutorial:

http://python-documentcloud.readthedocs.org/en/latest/index.html#

https://media.readthedocs.org/pdf/python-documentcloud/latest/python-documentcloud.pdf

Our goal is to extract text from the HUAC PDFs uploaded to DocumentCloud and return the excerpted text to the user:

http://python-documentcloud.readthedocs.org/en/latest/documents.html
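In rough terms, that extraction might look like the sketch below (the credentials and query are placeholders):

```python
from documentcloud import DocumentCloud

client = DocumentCloud("us@example.com", "secret-password")

# Full-text search across DocumentCloud (placeholder query).
for doc in client.documents.search("Bertolt Brecht"):
    # full_text returns the OCR'd plain text of the whole document;
    # get_page_text() fetches a single page instead.
    excerpt = doc.full_text[:500]
    print(doc.title)
    print(excerpt)
```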

Meanwhile, we are also playing with getting input from a browser via:

- Web forms in Django (see the sketch after this list):

https://docs.djangoproject.com/en/1.7/topics/forms/

- And by using GET/POST methods inside a Python index class:

http://learnpythonthehardway.org/book/ex51.html
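A minimal sketch of the Django route, with placeholder names (do_documentcloud_search is a hypothetical helper standing in for the wrapper calls above, not a real function):

```python
# forms.py
from django import forms

class SearchForm(forms.Form):
    query = forms.CharField(label="Search term", max_length=200)

# views.py
from django.shortcuts import render

def search(request):
    form = SearchForm(request.GET or None)
    results = []
    if form.is_valid():
        # Hypothetical helper that would wrap the
        # python-documentcloud calls shown earlier.
        results = do_documentcloud_search(form.cleaned_data["query"])
    return render(request, "search.html", {"form": form, "results": results})
```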


Data Set Troubles

Hi all,

So it seems like the only way to get old data from Twitter is to pay for it. A site called Topsy seemed promising at first, because it does let you get data from several years back. However, making any reliable conclusions from that data would be hard: it returns only ten pages of results, and you can specify a date range but not a time range. For the periods of time I am interested in, there was so much tweeting going on that ten pages of results covers only about two hours (and there is no controlling which two hours they show you). Plus, Topsy seems to show “top tweets” rather than a chronological feed of tweets, which further limits its usefulness, and it doesn’t offer locational data. The people at Topsy support sent me a list of other vendors, including Gnip and DataSift, which both cost money. I also looked at Keyhole, which looks great, but again, it costs money to get “historical data” from them.

Unless someone has ideas about how to get historical data off Twitter without paying for it, I think I am going to shift my focus temporarily to the tweeting surrounding the election, which should be easier because it just happened. I will need to learn how to use the Twitter API to do that, though, so if anyone knows how, I would much appreciate any pointers. I will also need to figure out what my focus would be here, particularly in light of Wendy Davis’s recent loss of the gubernatorial race. Maybe I could compare this year to past years? Perhaps there was more support for Wendy Davis than for other Democratic candidates?
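For anyone else starting out with the Twitter API, here is a minimal sketch using the tweepy library, with placeholder credentials and a placeholder hashtag. Note that Twitter’s Search API only reaches back about a week, which is why this can cover the election but not older events:

```python
# pip install tweepy
import tweepy

# Placeholder credentials from a registered Twitter application.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
api = tweepy.API(auth)

# Pull recent tweets matching a hashtag (placeholder query).
for tweet in tweepy.Cursor(api.search, q="#WendyDavis", lang="en").items(200):
    # coordinates is None unless the user opted in to location sharing.
    print(tweet.created_at, tweet.text, tweet.coordinates)
```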

I am still attached to my earlier idea of mapping the locational data, though, and I am thinking about transforming this into a final project proposal. There are probably some organizations in Texas that might consider funding something like this, though I am also interested in broadening the project to have a wider appeal. Here are some ideas I had for expanding it (I would welcome other ideas or suggestions); a rough sketch of the mapping piece follows the list:

  • Doing something like HyperCities, where video and other media could be layered over the map of tweets
  • Charting the tweets of conservatives as well as liberals
  • Charting the change in tone of the tweets over multiple important moments, such as the hearings and Wendy Davis’s filibuster
  • Doing a data visualization of the other hashtags that were used in relation to the ones I already know about
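As for the mapping itself, geotagged tweets can be written out as GeoJSON, which tools like Leaflet or CartoDB read directly. A minimal sketch, assuming tweets collected with tweepy as above:

```python
import json

def tweets_to_geojson(tweets, out_path="tweets.geojson"):
    # Each geotagged tweet already carries a GeoJSON-style point in
    # tweet.coordinates ({"type": "Point", "coordinates": [lon, lat]}).
    features = []
    for tweet in tweets:
        if tweet.coordinates is None:
            continue  # skip tweets without location data
        features.append({
            "type": "Feature",
            "geometry": tweet.coordinates,
            "properties": {
                "text": tweet.text,
                "created_at": str(tweet.created_at),
            },
        })
    with open(out_path, "w") as f:
        json.dump({"type": "FeatureCollection", "features": features}, f)
```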