Where do the ‘others’ fit in?

I’m writing this amidst a whirl of thoughts and tracks. Kathleen Fitzpatrick’s views on authorship and the status of private scholarship are impinging on my decision to write a paper for the final assignment. On one hand, I am glad I read her while battling the dataset project; on the other, she is making me think about where my final paper fits in the evolving landscape of scholarship. I am suddenly not satisfied to leave the paper to seclusion. And I am beginning to see the wisdom of having a blog and our instructors’ encouragement of documenting our ‘play’ with datasets. Even as blogs give ‘voice’, it also seems that the essential output remains writing, which, contrary to my decision to write a paper, I’m not entirely comfortable with. I’m still wondering why thoughts presented in writing alone qualify as scholarship; can’t a painting or music do the same? I know there are brave folk who’ve battled this; Nick Sousanis and his dissertation, written and drawn entirely in comic book format, comes to mind. But none of that is considered mainstream. It seems that exchange and communication can expand when ‘intermedia’ becomes a reality, moving beyond the notion of ‘interdiscipline’. In the light of DH being a challenger of notions, how ‘other’ forms of expression can be included in scholarship is a thought to ponder.

For further reading on unusual dissertation forms, I invite you to browse through the following:

http://www.hastac.org/blogs/cathy-davidson/2014/08/28/what-dissertation-new-models-methods-media

http://www.spinweaveandcut.blogspot.com/

Tomorrow: Data Visualization II Workshop!

Hi folks!

Tomorrow’s Digital Praxis Workshop will be Data Visualization II!

I’m kind of freaking out about it.

This one will really take it to the next level, with an under-the-hood look at creating interactive visualizations using d3.

Interactive designer Sarah Groff-Palermo will demonstrate, explain and walk users through exercises in d3, a JavaScript-based interactive programming framework, and its associated technologies and libraries.

Attendees of the workshop are strongly encouraged to bring their own laptop computers and should have, or create, a GitHub account prior to the session. Note: Sarah works on a Mac, but the workshop will also be accessible and beneficial to users of other operating systems (including the room’s library-provided desktop computers).

Looking forward to seeing the Praxis students there (Mina Rees Library, Concourse level, Room C196.02) tomorrow after the class!

Thanks,
Micki

Data Set Troubles

Hi all,

So it seems like the only way one can get old data from Twitter is to pay for it. There is a site called Topsy that seemed promising at first, because they do let you get data from several years back. However, drawing any reliable conclusions from that data would be hard, because they only give you ten pages of results, and you can only specify a date range, not a time range. For the periods of time I am interested in, there was so much tweeting going on that ten pages of results only covers like two hours (and there is no controlling which two hours they show you). Plus, I think they are showing “top tweets,” rather than a feed of tweets in the order they happened, so that is another factor limiting their usefulness. Not to mention Topsy doesn’t offer locational data. The people at Topsy support sent me a list of other vendors including Gnip and Datasift, which both cost money. I also looked at Keyhole, which looks great, but again, it costs money to get “historical data” from them.

Unless someone has any ideas about how to get historical data off Twitter without having to pay for it, I think I am going to shift my focus temporarily to working on tweeting surrounding the election, which should be easier, because it just happened. I will need to learn how to use the Twitter API to do that, though, so if anyone knows how, I would much appreciate any pointers you could give me. Also, I will need to figure out what my focus would be here, particularly in light of Wendy Davis’s recent loss of the gubernatorial race. Maybe I could compare this year to past years? Perhaps there was more support for Wendy Davis than for other Democratic candidates?

I am still attached to my earlier idea about mapping the locational data, though, and thinking about transforming this into a final project proposal. I think there are probably some organizations in Texas that might consider funding something like this, though I am also interested in broadening the project out to have a wider appeal. Here are some ideas I had for expanding it. (I would welcome other ideas or suggestions):

  • Doing something like HyperCities, where video and other media could be layered over the map of tweets
  • Charting the tweets of conservatives as well as liberals.
  • Charting the change in tone of the tweets over multiple important moments such as the hearings and Wendy Davis’s filibuster
  • Doing a data visualization of the other hashtags that were used in relation to the ones I already know about.
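
The last idea, finding the other hashtags used alongside the ones already known, comes down to a simple counting step before any visualization. Here is a minimal Python sketch of that step, assuming the tweets are already in hand as plain text strings. The function name `cooccurring_hashtags`, the placeholder tag `#knowntag`, and the sample tweets are all made up for illustration and do not come from any real dataset.

```python
import re
from collections import Counter

# Matches a '#' followed by letters, digits, or underscores.
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def cooccurring_hashtags(tweets, known_tags):
    """Count hashtags that appear in the same tweet as any known tag.

    Hypothetical helper: tags are compared case-insensitively, and each
    co-occurring tag is counted at most once per tweet.
    """
    known = {tag.lower().lstrip("#") for tag in known_tags}
    counts = Counter()
    for text in tweets:
        tags = {tag.lower() for tag in HASHTAG_PATTERN.findall(text)}
        if tags & known:                  # tweet mentions a known tag
            counts.update(tags - known)   # count only the *other* tags
    return counts


# Made-up sample tweets, for illustration only.
sample_tweets = [
    "Watching the hearing now #knowntag #livestream",
    "Still here at midnight #KnownTag #livestream #capitol",
    "Unrelated post #weather",
]

counts = cooccurring_hashtags(sample_tweets, ["#knowntag"])
print(counts.most_common())
```

The resulting counts could then feed a bar chart or a network graph; which one works better would depend on how many distinct hashtags turn up.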

Mapping Fandom

Illust. by Rebecca Sugar.

I was thinking about how many artists and content creators make use of social media sites specializing in sharing, such as tumblr, Deviant Art (which has frankly been waning in popularity), Soup (a newcomer to the scene), and Pixiv (a Japanese outlier), to get their material out there. Some people post on these sites as part of their hobby, some hope to break into a scene, and others make these sites their very livelihood. These creators range in profession from storyboard artists and showrunners for major television networks to professional comic authors to amateur/hobbyist illustrators. For some their content is their job, for others it is their passion, and for a lucky few it is both.

All of these sites have their own unique interfaces and qualities, but they all provide a very basic means for their users to indicate a level of approval for the content that is posted and then disseminate said content among other viewers. All of these sites also attract a wide range of content creators from many different regions. Off the top of my head, on tumblr I follow an artist in Arizona, two in Germany, a handful in Japan, one in Aruba, a bunch in New York and New Jersey, a sizable chunk in California, two or so in Mexico, one in Spain, an Englishman, and I think an Australian or two. Many of these content creators have followers numbering in the tens or hundreds, and some of them have thousands. Their styles are disparate (though there seems to be a general commonality in their age range), yet their followers intersect like hundreds of tiny Venn diagrams.

The question I want the data to answer is: where are the people who are “liking” this content? Is there any correlation between the content posted and where the people who “like” it are? Is there any commonality between where a content creator is from and where her or his fans are from? On a site like tumblr, content posters can add tags for those searching for a certain subject, but it’s a rather crude system, and when someone reblogs a picture the tags are rarely maintained. I still need to figure out the logistics, but I want to try to map regional data with tags and content creators and see how it all connects.

As an aside, I also saw a curious news item on a Spanish-language retro-gaming blog (http://www.repisanintendo.cl/post/101105828474/japon-podria-comenzar-a-enfrentar-un-irreversible). The basic gist is that with the rise in interest in retro gaming from various podcasts, collectors, YouTube personalities and the like, it seems that Japan’s vintage game stores, particularly in Tokyo, have been seeing their stock rapidly depleted by tourist collectors. I can’t put my finger exactly on what data to extract from this or how, but I feel like there’s something fascinating to be found.

 

Desperately Seeking Sustainability

Yale’s Digital Collections Center image

Apologies, Desperately Seeking Susan, for the poor pun.

Hi DHP14.

I wish I were having the dataset success of Liam, Sarah, James, and, I imagine, others (in my vivid imagination you are all succeeding marvelously). Liam, I tried to follow your line of investigation and wound up in a Mallet vortex that left me feeling more out of my depth than before. In order to find a tool that felt a little bit more manageable, I poked around on diRT again. Since the scope of possibilities felt overwhelmingly vast, I looked to the dhcommons directory of projects to see if some might spark an amazing idea. Besides seemingly active projects (Entity Mapper, Boston Marathon archive, Modernist Versions Project, Pulp), there were many forgotten ends (Forget Me Not’s sadly forgotten guest book, Bulgarian dialectology), unrealized projects (Kanon Foundation archive), and proposals unlinked to their outcomes (Fordham DH pedagogy). While I grant that this database, an initiative of CenterNet, might not be their primary focus, the seemingly short shelf-life of some of these projects seems relevant to the approaching Tom Scheinfeldt visit. In his webinar on October 14th, he discussed generating funding and the human needs of sustaining these projects, both in terms of community and of maintenance.

I googled digital sustainability (I know I’m not the only anxious person). About 50,400,000 results. Jisc (historically the Joint Information Systems Committee, now just Jisc) has a guide to sustainability, but it hasn’t relieved my mind much.

Now back to the task I (data)set out to accomplish. If anyone has good suggestions for text analysis tools for the tech-challenged (beyond the Manovich-maligned tag cloud), please point me in the right direction.

-Jojo

 

Also, I really enjoyed this image (even though I’m not talking about sustainability in terms of digital decomposition….)

Kyle Bean, The Future of Books

Data Set: Topic Modeling DfR

Hello Praxisers, I’m writing today about a dataset I’ve found. I’ll be really interested to hear any thoughts on how best to proceed, or more general comments.

I queried JSTOR’s Data for Research (dfr.jstor.org) for citations, keywords, bigrams, trigrams and quadgrams for the full run of PMLA. JSTOR gives this data upon request for all archived content. To do this I had to request an extension of the standard limit of 1000 docs you can request from DfR. I then submitted the query and received an email notification several hours later that the dataset was ready for download at the DfR site. Both the query and the download are managed through the “Dataset Requests” tab at the top right of the website. It was a little over a gig, and I unzipped it and began looking at the files one by one in R.

Here’s where I ran into my first problem. I basically have thousands of small documents, with citation info for one issue per file, or a list of 40 trigrams from a single issue. My next step is to figure out how to prepare these files so that I’m working with a single large dataset instead of thousands of small ones.
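
One possible way to get from thousands of small files to a single dataset is to stack them, tagging every row with the file it came from. The sketch below is in Python rather than R, and it rests on assumptions: that each small file is a CSV with a header row, and that the file name identifies the issue. The function names, the column names (`trigram`, `count`), the file names, and the sample values are all hypothetical stand-ins, not the actual DfR layout.

```python
import csv
import tempfile
from pathlib import Path


def combine_small_files(folder, pattern="*.csv"):
    """Stack every matching CSV into one list of rows.

    Each row gains a 'source_file' key holding the file name (without
    extension), so rows can still be traced back to their issue.
    """
    combined = []
    for path in sorted(Path(folder).glob(pattern)):
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                row["source_file"] = path.stem
                combined.append(row)
    return combined


def write_combined(rows, out_path):
    """Write the stacked rows to a single CSV file."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)  # keep first-seen column order
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


# Demo with two made-up files standing in for the real download.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "issue_001.csv").write_text(
        "trigram,count\nof the poem,12\n", encoding="utf-8")
    Path(folder, "issue_002.csv").write_text(
        "trigram,count\nin the novel,7\n", encoding="utf-8")
    rows = combine_small_files(folder)
    write_combined(rows, Path(folder, "combined.csv"))

print(len(rows), rows[0]["source_file"])
```

The `pattern` argument would need adjusting to match the real file extension, since the match can be case-sensitive depending on the operating system. The same stacking idea carries over to R by reading each file into a data frame and binding the rows together.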

I googled “DfR R analysis” and found a scholar, Andrew Goldstone, who has been working on analyzing the history of literary studies with DfR sets. His GitHub contains a lot of the code and methodology for this analysis, including a description of his use of Mallet topic modeling through an R package. Not only is the methodology available, but so is the resulting artifact, a forthcoming article in New Literary History. My strategy now is simply to try to replicate some of his processes with my own dataset.

 

This week’s readings

I imagine people found these on last year’s syllabus, but I figured since I didn’t see links for this week on our current syllabus, it would be nice to copy them to a place I frequent more often.

Thanks, Evan, for the R Intro Thursday. And thanks, Daria and Stephen for the links!

-Jojo

Different-Sized Data

I really liked Manovich’s overview of ‘big social data’, and was glad to have read it first of his three readings for the week, to use as a guide. In particular, I was fascinated by two of his distinctions: between ‘surface’ and ‘deep’ data and, in the context of information visualization, between description and prediction in explaining what information visualization is (and/or is supposed to do).

What does data visualization do? Is it just descriptive (and if so, how does including a description that is visual improve our comprehension of the thing in question, as compared to or in addition to a description that is purely textual?)? Or predictive, a counterpart to inferential statistics? I’ve always put things into pictures to understand them, and I am crazy about Scott McCloud’s highly recommendable book on comics [Understanding Comics], which shows (in comic form) how pictures and text can go together to explain or evoke things like space and time better than just pictures or text alone. So I’m a big fan of this new infographics/data visualization trend.

But I wanted to discuss Manovich’s discussion of sampling and behavior data, though class starts in 5 minutes. To be continued..

And So It Goes Online: Slaughterhouse Five & Hypercities

“I am a Tralfamadorian, seeing all time as you might see a stretch of the Rocky Mountains. All time is all time. It does not change. It does not lend itself to warnings or explanations. It simply is. Take it moment by moment, and you will find that we are all, as I’ve said before, bugs in amber.” –Kurt Vonnegut, Slaughterhouse Five.

These words were written in 1969, while Vietnam was being doused in napalm, the UNIX system was being developed, and the CDC 7600, arguably the first supercomputer, was being constructed. All of these events have had repercussions echoing through time, while also becoming the proverbial bug in amber. These ideas of time and space being malleable echo some of the concepts of a Hypercity. I can’t help but wonder if Vonnegut’s Slaughterhouse Five is perhaps an indication that a Hypercity is a sort of eventual societal goal, even before anyone knew what it was.

The Fukushima and Tranquil_Dragon portion of Hypercities, with its blow-by-blow recollection of fear and destruction, made me think about how the past on the internet is something we can catalog and return to at any time. In Kurt Vonnegut’s Slaughterhouse Five, the protagonist, Billy Pilgrim, is presented by aliens with the concept that even if an event has passed or a person has died, that moment or person is still always preserved in the past and we can return there any time. The Tweets about acts of human kindness paired with fear, dread, and death in Fukushima recall Billy/Vonnegut’s recollections of Dresden, its citizens, its beauty, and its ultimate destruction at the hands of Allied bombers. Today we have the unfortunate business of figuring out what to do with someone’s Facebook or Twitter account when they have passed on.

 The alien Tralfamadorians also explain time in a way that feels very physical. Their perception of time is a mountain range, while the way we perceive time is akin to being strapped to a train with blinders placed on the side of your head. I find that both views sort of collide when you see people critiquing the use of communication technology today. People are going on Twitter and having conversations that would ordinarily never happen or attending webinars or catching up on the news, but the image that is popularly evoked is one of people just looking down at their phones and blind to the world.

The internet has given us a way to realize this concept of time as a permanent thing we can look back at in a practical way, but not exactly a way that we can emotionally process. Just as triumphs and past accomplishments are posted on Facebook for friends and loved ones to look over, Twitter feuds become preserved ammunition in massive smear campaigns, or at the very least they might make future conversations very awkward. But other pundits, scholars, and thinkers have expounded on this subject of preserving everything we do online, and it’s not like we can put the proverbial toys back in the toybox anyway. Barring some kind of cataclysm that destroys all online data as we know it, we just have to get used to having so much of our lives preserved outside the confines of our personal recollections.

“Billy is spastic in time, has no control over where he is going next, and the trips aren’t necessarily fun. He is in a constant state of stage fright, he says, because he never knows what part of his life he is going to have to act in next.”

Of course there is always the juggling of personas that comes with having a life that takes on an online personality or personalities. Facebook for friends and family, LinkedIn for employers, Tumblr for your artistic endeavors, Flickr for your portfolio, Twitter for making catty comments. Billy’s problem was trying to fit into a place in time when no one around him could possibly understand what it’s like to constantly jump from one place to the next while also having time be constant, instead of something in the past. Nowadays folks are expected to similarly bounce around places, times, and audiences, but for each audience we have to make sure we are fully committed to our role even though everyone else is going through the same game of digital musical chairs. Identity is not so much fleeting as it is affixed to a certain point and place. If time is a mountain range, the internet is the trading posts scattered throughout it. We are brought to a point where we have to wonder if the Hypercity is constructed by us, or if we are the product of the Hypercity. Or is the Hypercity the ultimate crux that we are heading towards, where time, place, and information become structured yet fluid? And so it goes…