Author Archives: Elissa Myers

Final Wendy Davis Work

I have been working on this Wendy Davis data for a long time, but realized I had not caught you guys up to speed with it in a while. Here is what I have been doing for the past month with the data.

First, I found the data, thanks to the amazing Kelly Blanchat, who discovered it in a database at the University of North Texas.

Phase #1: JSON to CSV

Then I agonized for a really long time over what to do with the JSON files. I found an editor that could open them called Sublime Text, but had trouble pasting the data accurately into a spreadsheet. Once I figured that out, I had to make the data nice and neat, getting rid of all the punctuation that wasn't needed in the file I would be using for Gephi. I did some of this by hand, and then found a site called Json-to-CSV that would do some of it for you. At first, this program seemed great, but then I realized that my whole file was too big for it. The progress bar would get stuck about an inch from the end and never finish. I began to think about cutting down the data somehow, but couldn't figure out how I would get a representative sample. Furthermore, because (as I figured out pretty quickly) JSON data do not keep every field in the same predictable place every time (the way columns do in Excel), I could not convert the data set in pieces: the converter could very well arrange the fields differently each time if it thought it was working with more than one set of data. What I ended up doing was upgrading to their pro edition, which was about $5. In the end, however, I realized that the browser was the problem. It stalled every time I tried to load the data, even though the program itself could technically handle the extra memory. So I pasted the JSON data into several different spreadsheet files, cut out the irrelevant data, leaving only the tweet, hashtag, and location, and pasted everything into the same spreadsheet to get ready for Gephi. Then I finally had a useful CSV.
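For anyone facing the same problem, a short script can sidestep the browser-based converter entirely. Here is a minimal sketch using only Python's standard library; the field names ("text", "hashtags", "location") are assumptions about how the tweet JSON might be structured, so they would need to be adjusted to the real schema.

```python
import csv
import json

def tweets_to_csv(json_path, csv_path):
    """Flatten a list of tweet objects into a CSV with one row per hashtag.

    Assumes the JSON file holds a list of objects with 'text',
    'hashtags', and 'location' keys -- adjust to the actual schema.
    """
    with open(json_path, encoding="utf-8") as f:
        tweets = json.load(f)
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["tweet", "hashtag", "location"])
        for tweet in tweets:
            # .get() tolerates fields that are missing or moved around,
            # which is exactly the unpredictability described above
            for tag in tweet.get("hashtags", []):
                writer.writerow([tweet.get("text", ""),
                                 tag.lower(),
                                 tweet.get("location", "")])
```

Because this streams rows out one at a time, file size stops being an issue the way it was in the browser.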

Phase #2: Gephi

After that, I began the long haul of learning how to use Gephi. I imported my data files, which were constructed to model the relationships between the central node "IStandwithWendy" and every other hashtag used in the dataset. As there weren't many other hashtags used anywhere near as frequently, it really didn't make sense to me to model the relationships among those other hashtags. Though Margaret, the librarian who ran the library's Analyzing Data workshop, demonstrated Gephi for us briefly, I hadn't really played around with it before. It took me a while to figure out how to do anything, because I am a very tactile learner. I kind of have to do things to understand them, and cannot get much from reading instructions, though watching how-to videos sometimes helps. At any rate, it took me a while. The hardest thing to figure out was that there didn't seem to be any way to restore the graph to its original look after you changed the format. Also, I kept zooming in or out, and sometimes, even when I would center on the graph and zoom back out or in again, I couldn't find the graph again.
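For anyone rebuilding this, Gephi's Import Spreadsheet feature expects two tables: a node table and an edge table. A star-shaped network around one central hashtag can be generated rather than typed by hand; here is a sketch (the column names Id, Label, Source, Target, Type, and Weight are the ones Gephi's import wizard recognizes, and the hashtag counts in the example are invented).

```python
import csv
from collections import Counter

def star_network(hashtag_counts, center, nodes_path, edges_path):
    """Write Gephi-importable node and edge tables for a star network
    centered on one hashtag, with edge weight = usage count."""
    with open(nodes_path, "w", newline="", encoding="utf-8") as f:
        w = csv.writer(f)
        w.writerow(["Id", "Label"])
        w.writerow([center, center])
        for tag in hashtag_counts:
            if tag != center:
                w.writerow([tag, tag])
    with open(edges_path, "w", newline="", encoding="utf-8") as f:
        w = csv.writer(f)
        w.writerow(["Source", "Target", "Type", "Weight"])
        for tag, count in hashtag_counts.items():
            if tag != center:
                w.writerow([center, tag, "Undirected", count])

# Example with invented counts:
counts = Counter({"istandwithwendy": 5000, "txlege": 900, "sb5": 400})
star_network(counts, "istandwithwendy", "nodes.csv", "edges.csv")
```

Generating the tables this way keeps the node numbering and edge weights aligned automatically, instead of relying on hand entry.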

Phase #3: The Great Fuck-Up

After a while, though, I got something that looked pretty great to me, though because I saved it as a PDF (I didn't yet know how to work the screenshot feature), it doesn't look quite as great to me now as it did at the time. The big developments at this stage were the decision to use Force Atlas, mostly because it looked best, and the decision to color-code the nodes. I made the nodes blue and the edges red, creating a visualization that looked, morbidly but aptly considering the subject, like blood oozing, or like Republican nodes shooting from a Democratic center, simulating the geography of Texas itself.

I had taken off the labels to make the visualization clearer, and then decided to put them back on. That was when I noticed something strange. The only node that was substantially bigger than the rest (though smaller than #IStandwithWendy) was #StanleyCup. I thought it was strange that a bunch of abortion rights activists would also go crazy for the Stanley Cup, but I wasn't discounting the possibility yet. I went back and looked at my original file, found that the hashtag had only been used once, and my blood ran cold. Apparently, when making the edges file, I had left out the hashtag "American," probably because I saw how similar it was to "America" and thought I already had it. That misaligned the edge weights, putting #StanleyCup one field further up, where #StandwithWendy (the imperative as opposed to the declarative) should have been.

Eventually, I was able to laugh about this, but at the time I was pretty pissed off. There probably was a way to make these files through Gephi rather than putting in the weights and assigning the nodes numbers myself that would have precluded me from making such a mistake. I really am the kind of person who has to make mistakes to learn, though. And I learned a lot that day.
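A quick consistency check would have caught this mistake: recount the hashtags straight from the source data and compare the counts against the weights in the edge table. A minimal sketch (the data layout is the edge-list-with-weights setup described above, and the example counts are invented):

```python
from collections import Counter

def check_edge_weights(hashtag_counts, edge_rows):
    """Compare edge weights against independently recounted hashtag
    frequencies; return a list of (hashtag, expected, found) mismatches.

    edge_rows is an iterable of (source, target, weight) tuples.
    """
    mismatches = []
    for source, target, weight in edge_rows:
        expected = hashtag_counts.get(target, 0)
        if int(weight) != expected:
            mismatches.append((target, expected, int(weight)))
    return mismatches

# A shifted weight (the #StanleyCup situation) shows up immediately:
counts = Counter({"stanleycup": 1, "standwithwendy": 312})
edges = [("istandwithwendy", "stanleycup", 312),
         ("istandwithwendy", "standwithwendy", 312)]
print(check_edge_weights(counts, edges))
# → [('stanleycup', 1, 312)]
```

An empty result means every edge weight matches a fresh count; anything else pinpoints exactly which hashtag's weight got misaligned.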

Phase #4: Something Kind of Pretty and Not Wrong

After a few more hours of work, I had something that looked a lot better (and was actually accurate this time, to boot). Learning how to make the size of the nodes correlate to how many times each one was used really helped make the graph both more meaningful and more visually appealing. Learning how to play around with the label font size also helped a lot: I made the words correlate to node size as well, and adjusted the sizing so that as many labels as possible were visible without there being so many that viewers couldn't read them. And I learned how to use the screenshot function, so I could take a decent picture of it. In the end, though, I had to take the labels off, because you still couldn't see anything. For your information, though, some of the most frequently tweeted hashtags (other than "Istandwithwendy") were "txlege," "texlege," "sb5" (the name of the bill), and "standwithwendy." Unfortunately, the screenshot function does not do color, or if it does, I couldn't figure out how. Here it is:

screenshot_Wendy 2

I wasn't entirely happy with this visualization, because I am not sure it really shows very much. It shows that "IStandwithWendy" is the most frequently used hashtag, and that, other than that, most hashtags weren't used very often (except for the few I just mentioned). It doesn't show some of the interesting nuances of the dataset, like the fact that some pro-life people were tweeting "IStandwithWendy" sarcastically, and some people were tweeting "IStandwithWendy" as a joke about the Wendy's restaurant (albeit one in pretty poor taste).


Phase #5: Tableau Public Pie Chart

Because of this drawback, I decided to make a pie chart on Tableau Public that would at least show the relative frequencies of some of the hashtags a little more clearly. I was having a bit of trouble with their export function, so I just took a screenshot, which doesn't allow you to interact with the graph the way you otherwise would have been able to. Here it is:
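The relative frequencies behind a chart like this are easy to compute directly from the hashtag list; here is a stdlib sketch (the example data is invented for illustration):

```python
from collections import Counter

def hashtag_shares(hashtags, top_n=5):
    """Return (hashtag, share-of-total) pairs for the top_n hashtags,
    with everything else lumped into an 'other' slice -- the same
    breakdown a pie chart displays."""
    counts = Counter(h.lower() for h in hashtags)
    total = sum(counts.values())
    top = counts.most_common(top_n)
    shares = [(tag, count / total) for tag, count in top]
    other = total - sum(count for _, count in top)
    if other:
        shares.append(("other", other / total))
    return shares

# Invented example data:
tags = ["txlege"] * 3 + ["sb5"] * 2 + ["standwithwendy"] * 5
print(hashtag_shares(tags, top_n=2))
# → [('standwithwendy', 0.5), ('txlege', 0.3), ('other', 0.2)]
```

Having the shares as plain numbers also makes it easy to re-plot in any tool if an export function misbehaves.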


Pie Chart 3

I really want to continue with this project and do the geographic map of tweets that I wanted to do originally. Thanks, everyone, for reading, and for all of the ways in which you have helped me learn DH skills this semester! Have a great holiday!

Data Set Troubles

Hi all,

So it seems like the only way one can get old data from Twitter is to pay for it. There is a site called Topsy that seemed promising at first, because they do let you get data from several years back. However, making any reliable conclusions from that data would be hard, because they only give you ten pages of results, and you can only specify a date range, not a time range. For the periods of time I am interested in, there was so much tweeting going on that ten pages of results only covers about two hours (and there is no controlling which two hours they show you). Plus, I think they are showing "top tweets" rather than a feed of tweets in the order they happened, so that is another factor limiting their usefulness. Not to mention that Topsy doesn't offer locational data. The people at Topsy support sent me a list of other vendors, including Gnip and Datasift, which both cost money. I also looked at Keyhole, which looks great, but again, it costs money to get "historical data" from them.

Unless someone has any ideas about how to get historical data off Twitter without having to pay for it, I think I am going to shift my focus temporarily to tweeting surrounding the election, which should be easier, because it just happened. I will need to learn how to use the Twitter API to do that, though, so if anyone knows how, I would much appreciate any pointers you could give me. Also, I will need to figure out what my focus would be here, particularly in light of Wendy Davis's recent loss of the gubernatorial race. Maybe I could compare this year to past years? Perhaps there was more support for Wendy Davis than for other Democratic candidates?
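For what it's worth, the searching side of the Twitter API mostly comes down to assembling a query string; authentication and the exact endpoint depend on whichever client library you pick, so this is only a sketch of the query-building step, assuming Twitter's standard since:/until: date operators:

```python
def build_search_query(hashtags, since=None, until=None):
    """Assemble a Twitter search query string (the q parameter that the
    search endpoints and most client libraries accept).

    Hashtags are OR'd together; since/until are YYYY-MM-DD date bounds.
    """
    q = " OR ".join("#" + h.lstrip("#") for h in hashtags)
    if since:
        q += f" since:{since}"
    if until:
        q += f" until:{until}"
    return q

# Example: tweets around election day 2014 (dates chosen for illustration)
query = build_search_query(["IStandWithWendy", "txlege"],
                           since="2014-11-03", until="2014-11-05")
print(query)
# → #IStandWithWendy OR #txlege since:2014-11-03 until:2014-11-05
```

The resulting string would then be URL-encoded and handed to the search endpoint through whatever client library handles the OAuth side.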

I am still attached to my earlier idea about mapping the locational data, though, and am thinking about transforming this into a final project proposal. I think there are probably some organizations in Texas that might consider funding something like this, though I am also interested in broadening the project out to have a wider appeal. Here are some ideas I had for expanding it (I would welcome other ideas or suggestions):

  • Doing something like HyperCities, where video and other media could be layered over the map of tweets
  • Charting the tweets of conservatives as well as liberals
  • Charting the change in tone of the tweets over multiple important moments such as the hearings and Wendy Davis’s filibuster
  • Doing a data visualization of the other hashtags that were used in relation to the ones I already know about

Idea for a Dataset/Project Proposal

I know this post is coming a bit late but I was so inspired by Hypercities that I had to go ahead and write about it. I have been anxiously awaiting our discussion of mapping all semester, and have not been disappointed. I was particularly inspired by the project gathering the tweets occurring during the Egyptian revolution and with Yoh Kawano’s attempt to use GIS to empower Japanese people to acquire knowledge of the radiation in a given area in the wake of the Fukushima disaster. It was a bit of a revelation to me to see scholars taking on projects that will really help people outside of academia, and that have such personal relevance for them.

Reading this also gave me an idea of what I would like to do in DH—possibly for my dataset project or my large project proposal. I would really like to harvest (?) locational data about the tweets sent last August in Texas during a few of the hearings and protests related to House Bill 60, as well as Wendy Davis's filibuster. These bills, which came up for debate in the Texas House and Senate at this time, aimed to severely restrict Texas women's access to abortion and other reproductive care. These measures (which I believe have now been partially put into effect) disproportionately affect the many Hispanic women living in South Texas, where virtually no clinics were able to remain open, and working-class women, who cannot afford to take off several days or more to travel the long distance now necessary to get reproductive care. In a state that takes eight hours to drive across width-wise and probably more to drive across length-wise, there are only seven abortion clinics left—all of which are located in Texas's few and widely scattered urban areas.

As you can see, geography has played a big part in which women have been affected by these laws. However, geography also played a role in whose voices have been heard in the protests, and whose have been silenced. There is a wide perception that Austin is the only area of Texas with a significant liberal population, and that the rest of Texas is hyper-conservative. While this may be true to a large extent, I believe there are pockets of people all over Texas who strongly opposed these measures, and whose voices were marginalized and trivialized when Rick Perry said the protesters did not represent "real" Texans. Though indeed many protesters present were from Austin, there was substantial opposition to the measures on Twitter as well—a medium which, while it certainly is not accessible to all Texans, provided a relatively efficient and economical alternative to in-person protesting. I think plotting on a map the frequency of use of certain hashtags popular in these events, such as #standwithwendy and #comeandtakeit, might give voices to these silenced people, dissipating the feelings of hopelessness and isolation that come with being a liberal person in Texas, and providing a convincing illustration of the extent to which the Republican leadership of Texas, in promoting measures like these, ignores the wishes not only of liberals in Austin, but of a much wider portion of its constituency. This project is very important to me because I was actively involved at the time and feel the need to help, though I no longer live in Texas.
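The mapping itself could start very simply: bin tweet coordinates into a latitude/longitude grid and count tweets per cell, which is exactly the data a heat map needs. A stdlib sketch, with invented coordinates for illustration:

```python
from collections import Counter

def grid_counts(points, cell_size=1.0):
    """Count tweets per lat/lon grid cell.

    points is an iterable of (latitude, longitude) pairs; cell_size is
    the cell width in degrees. Returns a Counter keyed by each cell's
    south-west corner, ready to feed a heat map.
    """
    cells = Counter()
    for lat, lon in points:
        cell = (cell_size * (lat // cell_size),
                cell_size * (lon // cell_size))
        cells[cell] += 1
    return cells

# Invented points near Austin (~30.27, -97.74) and Houston (~29.76, -95.37):
pts = [(30.27, -97.74), (30.30, -97.70), (29.76, -95.37)]
print(grid_counts(pts))
# → Counter({(30.0, -98.0): 2, (29.0, -96.0): 1})
```

Cells with high counts outside the Austin area would be precisely the evidence that opposition was not confined to one city.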

It seems like I will definitely have to learn about APIs, and maybe some kind of mapping software. This may be a very large project I am aiming toward, but if anyone knows what other skills I might need to get started, I would be very much obliged for your input!!!

Alt-Ac Blog

Hello all,

This does not have much to do with our reading per se, but I think y’all will hopefully appreciate it anyway. I found this blog on the Chronicle of Higher Ed about alt-ac careers–a topic about which I have been curious for a while: http://chronicle.com/blognetwork/researchcentered/2013/08/08/advice-for-aspiring-alt-acs/. The first post gives advice to aspiring alt-acs. Though I am not sure I know more about what alt-ac careers are after reading this, it at least made me feel better about not knowing, explaining that serendipity and chance often play a big role in alt-ac jobs, and that the parameters of what constitutes alt-ac are not rigid or even particularly well-defined. Hope y’all enjoy!

DH and Disability

I am a Victorianist in the English Ph.D. program, and one of my particular interests is disability studies. Reading one of this week's articles, "Disability, Universal Design, and the Digital Humanities," made me realize that though we may consider our conceptions of disability more "evolved" than those of the Victorians, the digital world we have built often excludes people to the same degree the material world used to exclude disabled Victorians. Furthermore, digital media have come to be used for so many tasks that are fundamental to our functioning in the larger world that digital exclusion may be just as debilitating as exclusion from participation in the physical world was for the Victorians.

Many people today, citing such examples as Dickens's well-known crippled child, Tiny Tim, may think Victorian representations of disability are sappy and clichéd. However, their sympathetic nature also implied a high level of concern for the plight of the disabled. The mere extent to which disability was portrayed with feeling in novels, particularly in the novels of such (then) well-known writers as Charles Dickens, Dinah Craik, and Charlotte Yonge, testifies to this fact.

Accordingly, there were also large-scale efforts to improve things for the disabled in the real world of Victorian England, especially for the blind. One prominent example was Samuel Gridley Howe, who developed a method of proto-braille by which the deaf-blind girl Laura Bridgman learned to read.

Of course, Victorian representations of disabled people were also far from uncomplicated. Gender, race, and age all inflected them to a great degree. As disability theorist Martha Stoddard Holmes has pointed out, crippled, aging men were quite often the villains of Dickens's novels. The roles disabled people could play (in novels even more than in life) were also severely restricted. However, the fact remains that people like Howe recognized that in the Victorians' hyper-visual culture, a culture of which illustrated periodicals and novels were the primary forms of expression, being able to decode that medium of expression was particularly important.

In our own culture, depictions of disability are often similarly conflicted, laden alternately with sentiment (think inspirational stories and fundraising campaigns) and mistrust (as when people suspect disabled homeless people of really being able to work after all). We also inhabit a world in which physically disabled people sometimes cannot use the tools digital humanists are making. If they are scholars, this lack of access could set their work back years, and if not, it could nevertheless set back their personal drive to learn for its own sake, a drive which the humanities are supposed to foster. I therefore hope that Williams's call for the development of tools such as a freely available online captioning tool and software that converts files into digital talking books will be answered. I also think his idea to use crowd-sourcing to caption online videos illuminates an important difference between the Victorian age and our own: that even people with little technical expertise could (perhaps should) have a share in the responsibility of extending access to disabled users of digital humanities technologies.