There’s a lot of money in Twitter archives. Also, a Data-Driven Look at #gamergate



On September 3rd, #gamergate was the top trending tag on Twitter. This is an impressive feat for several reasons:

1) It was Beyonce’s birthday. #happybirthdaybeyonce was only the SECOND most popular tag.

2) It was in no way related to mainstream media.

3) It’s not a fun hashtag.

The tag #gamergate refers to a discussion happening between game developers, journalists, and enthusiasts following a series of events Erik Kain concisely explains in GamerGate: A Closer Look At The Controversy Sweeping Video Games.

To state it briefly, an ex-boyfriend of game developer Zoe Quinn wrote a blog post claiming Zoe had slept with members of the press for positive coverage of her new game, Depression Quest. Following this, Zoe was doxxed (that is, her personal information was maliciously released online), and she began getting harassed. If it were typical harassment, that would be awful, but she received several death threats, so it’s even worse. Several members of the gaming community stood up to defend Zoe, and bad things began to happen to them as well. Phil Fish, developer of indie darling Fez and owner of Polytron, had his Twitter account hacked, and the Polytron website was taken over by hackers. This led to Fish deleting his Twitter account and declaring that he no longer wished to be associated with the games industry or its fans. Dan Golding explains in his piece “The End of Gamers”:

Campaigns of personal harassment aimed at game developers are nothing new. They are dismayingly common among those who happen to be women, or not white straight men, and doubly so if they also happen to make the sort of game that in any way challenge the status quo, even if that challenge is only made through their very existence. The viciousness and ferocity with which this campaign occurred, however, was shocking, and certainly out of the ordinary. This was something more than routine misogyny (and in games, it often is routine, shockingly). It was an ugly spectacle that should haunt and shame those involved for the rest of their lives.

Several other publications chimed in, including Kotaku, Gamasutra, and Polygon. Later that week, Anita Sarkeesian, host of the gaming vlog Feminist Frequency, posted a piece titled Women as Background Decoration. The threats she received following this were so direct that she was forced from her home.

Despite the large number of publications cited so far, the bulk of the discussion unfolded on Twitter under the #gamergate hashtag. Using TAGS, I have compiled a public archive of several thousand tweets tagged #gamergate here:

Public Archive (Please be patient, it’s a large doc!)

TAGS Explorer: A Visual Representation of the Twitter Conversation (please wait for it to load!) 

The conversation was happening so quickly that every time TAGS retrieved tweets from Twitter, it would freeze and become inaccessible for hours as it compiled the archive. That said, these tweets were collected over the course of roughly ten days, often hundreds at a time, between September 1st and September 10th. Weeks later, #GamerGate is still receiving roughly 100 tweets per minute. I have met the limit on the size of my archive, but if anyone is well-versed in Google Docs and spreadsheets, I would love some assistance moving tweets into a new sheet so that I can continue compiling tweets. Please contact me ASAP.



• The related hashtag #notyourshield appears 1021 times throughout the 4000 tweets archived. #notyourshield is a tag meant to be used by under-represented members of the gaming community, namely women, minorities, and LGBT people, who claim that the primary video game media outlets discuss representation poorly, and often in place of the real issues (such as collusion between game developers and the press).

• “SJW” appears 375 times throughout the 4000 tweets archived. SJW is short for “Social Justice Warrior”, a pejorative term for those who defend the rights of under-represented groups online.

• “Journal” (for journalism and journalist) appears 517 times.

• “fem” appears 350 times.

• “Destiny” appears 140 times. Around the time these tweets were compiled, Destiny was on the verge of being released. Many tweets mentioning it expressed that the release of Destiny would not slow the discussion circulating around #gamergate. This ended up being true.

• “Quinn” appears 365 times.

• “Phil Fish” appears 45 times.

• “Polygon” appears 475 times. This is the name of a publication that is outspoken against the doxxing of Zoe Quinn, and has many well-known and respected female writers on staff. At one point, many were tweeting that Polygon was banning users on its discussion boards for speaking out against Anita Sarkeesian.

• “Kotaku” appears 291 times. Zoe Quinn was alleged to have had relations with a writer who worked for this publication.

• 85% of posters identify as male.

• 58% of posters are from the US.

• 68.1% of posts are made from a desktop computer, 14.2% from an iPhone, and 10.9% from Android devices.

• 64.6% of posts are retweets, 12.5% are replies, and 23% are original posts.
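For anyone who wants to reproduce keyword counts like the ones above, here is a minimal sketch in Python. It is not the method TAGS uses. It assumes the tweet text has already been pulled out of the archive into a plain list of strings, and it counts how many tweets contain each term as a case-insensitive substring (so “journal” also matches “journalism”). The function name and the sample tweets are made up for illustration.

```python
from collections import Counter


def count_keyword_hits(tweets, keywords):
    """Count how many tweets contain each keyword.

    Matching is a case-insensitive substring test, so "journal" also
    matches "journalism" and "journalist". Each tweet is counted at
    most once per keyword, even if the keyword appears in it twice.
    """
    hits = Counter({kw: 0 for kw in keywords})
    for text in tweets:
        lowered = text.lower()
        for kw in keywords:
            if kw.lower() in lowered:
                hits[kw] += 1
    return hits


# Hypothetical sample data; the real input would be the text column
# of the exported archive.
sample_tweets = [
    "Games journalism needs reform #GamerGate #NotYourShield",
    "Another SJW thinkpiece from Polygon #gamergate",
    "Destiny launches tomorrow and #gamergate is still going",
]
keywords = ["#notyourshield", "sjw", "journal", "polygon", "destiny"]

counts = count_keyword_hits(sample_tweets, keywords)
for kw in keywords:
    print(kw, counts[kw])
```

A count like this says nothing about whether a tweet is for or against the thing it mentions, so it is best read as a rough map of what people were talking about.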

Issues with Archiving:

Using a data tool, I discovered that as of this post over 775,000 tweets have been tagged with #GamerGate. If my 4000 tweets seemed like a large set of data, I am sorry to disappoint. I found a service called HashTracking that would retrieve the full history for me, but it would cost $2.00 for every 1,000 tweets… so… more money than I have. Another service, Keyhole, offers a real-time view of the hashtag over several social media platforms at once. They also offer historical archives, beginning at $49. I am currently waiting on them to send me a quote on the cost of my 775k tweet archive. If it’s not over $200, I will probably suck it up and pay… but I won’t like it.

That said, KEYHOLE IS AMAZING. If you did not click the link to Keyhole above, CLICK THIS NOW. Unfortunately, they only offer a 3-day trial; an actual account starts at $130/mo. You might notice the link I’ve provided covers the dates between August 30th and September 2nd. Because of this, their keyword spread is much different than mine, with the top related tag being #justgamingjournothings.
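To put a number on “more money than I have”, the per-1,000-tweet price can be turned into a quick estimate. This is only a back-of-the-envelope sketch: the function name is made up, and it assumes the service bills in whole blocks of 1,000 tweets, which I have not confirmed.

```python
import math


def archive_cost(total_tweets, price_per_block=2.00, block_size=1000):
    """Estimate the cost of a full archive at a flat per-block price.

    Assumes billing in whole blocks (partial blocks round up); the
    actual billing rules of any real service may differ.
    """
    blocks = math.ceil(total_tweets / block_size)
    return blocks * price_per_block


# 775,000 tweets at $2.00 per 1,000 tweets
print(archive_cost(775_000))  # 1550.0
```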

The biggest problem with archiving this set of tweets is that it’s such a large, unwieldy, and lively set of data. For three weeks it’s been twisting, turning, busy, and relevant. Clearly others have also had difficulty retrieving and dissecting data from Twitter, which is why services like Keyhole and HashTracking exist and charge such high rates. Furthermore, as the #gamergate discussion has been going on for roughly three weeks now, looking at a 4000-tweet snapshot of data may not be good enough. In fact, looking at any dataset smaller than the whole thing might not be good enough. My goal is to capture, compile, and dissect the whole event, and because it has taken weeks to unfold, to capture anything smaller than its absolute form would be an injustice… but who has roughly $1,500 to pay for a set of tweets? A big business maybe, not me. Sure, one could work within the limitations of Google Docs, constantly moving the data into new spreadsheets whenever the need arises… but between August 30th and September 2nd, 100,000 tweets were posted. A spreadsheet fills up at 4000 tweets, meaning that in those 3 days you would need to edit your script 25 times to take new archives into account. There’s no doubt that some data would fall between the cracks, considering that the archival tool would sometimes freeze for hours when updating. Archiving a set of data this large using free, easy-to-use tools would be more than just a full-time job; it would be impossible.
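The spreadsheet juggling described above comes down to simple division. The sketch below is illustrative only: the function names are made up, the 4000-row limit is the one I ran into rather than a documented cap, and it does nothing about the freezing problem.

```python
import math


def sheets_needed(total_tweets, rows_per_sheet=4000):
    """Number of spreadsheets needed if each one holds rows_per_sheet tweets."""
    return math.ceil(total_tweets / rows_per_sheet)


def split_into_sheets(rows, rows_per_sheet=4000):
    """Yield consecutive slices of rows, one slice per spreadsheet."""
    for start in range(0, len(rows), rows_per_sheet):
        yield rows[start:start + rows_per_sheet]


print(sheets_needed(100_000))  # 25 sheets for the August 30th to September 2nd burst
print(sheets_needed(775_000))  # 194 sheets for the full hashtag history
```

Even with the splitting automated, each new sheet is another chance for the collection script to miss tweets while it is being repointed.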

What’s Next?

There are more things that I want to try out with TAGS. I could see how it would be useful for tracking a smaller conversation, such as our class tag, #dhpraxis14. I DO plan to find a way to access the entire archive of tweets for #gamergate, as I believe that it might be of some significance later on.

While I do not think this is the place to post my full opinions on the subject, I agree with Dan Golding, who states that the term “gamer” and the connotations behind it are dying. As early as 2010, in Casual Revolution (which Steven Jones referenced in ch. 1 of his book, which we read this week), Jesper Juul alludes to the idea that everyone plays games now, whether on mobile devices or on Facebook. We don’t need a word for “gamer” in the same way that we do not need a word for one who reads books or watches films. Creating this kind of terminology can draw lines of division between those who “do” and those who “do not”. Either way… don’t we need a pristine, complete archive of that? Should I pass around a donation jar?

8 thoughts on “There’s a lot of money in Twitter archives. Also, a Data-Driven Look at #gamergate”

  1. Jojo Karlin (she/her/hers)

    James. Wow. I am so glad you posted this. It’s a conversation I might not have seen except maybe several degrees and many months removed. I feel as though the “eversion” Steve Jones is talking about is really visible in this doxxing (which sounds terrifying).
    I look forward to hearing more of your findings!

  2. Chris Vitale

    What you have exposed here is a commodification of social data. The interesting pivot point here is that your look into #gamergate is reliant upon a data set that has been given value by the advertising and data analysis industry. Although you may be looking to build an archive for the purpose of archiving for later research and analysis, the truth of the matter is this data is a resource for those who want to sell this particular community a product or type of content.

    It is striking to think about the amount, complexity, and nature of data sets sitting on servers at Data Management Platforms. One particular company has the ability to curate editorial content to the specific user based off of a complex system of tags and personalized profiles. Tracking pixels follow users around the web taking information like topic preference, time spent, device, location, time of day, etc. This builds out a complex user profile that takes into account that person’s habits. Those profiles are then sold to major advertising brands, publications, or ad agencies for a ridiculous profit on the grounds that the advertising they put in front of those users is hyper-targeted and personalized.

    While it is borderline scary how detailed these profiles are, there is great academic value in the mapped behaviors and habits of this sort. The problem is, access to this data requires deep pockets.

    This brings forward a deeper appreciation for things like open source data. While one person may not be able to create this archive in its totality, that person initiating the task can see the project through to its completion at the hands of willing contributors. Analysis and organization of that data could take the project even further.

    I will say, what you’ve done here is awesome. I enjoyed it thoroughly.

  3. sydnee wagner

    Hi James,

    This is really great work, and I definitely think it is worth archiving. I’ve been casually keeping tabs on the whole gamergate chaos, and find it really telling how many of the perpetrators feel it necessary to flood articles about gamergate with the same generic comments about how it isn’t a feminist/misogyny issue, and how it’s all about “ethics” (vague and never delved into) and proper journalism. The “gamer” community (gamer in quotations because I, too, feel like the title is dying) is definitely a microcosm of our larger society, but things like gamergate only prove that if any community needs reflection on its misogyny (and racism, homophobia, etc etc), I would definitely think it is that one.

    On a light hearted note, I wonder what kind of statistics you would find for twitter user photos and how many of them are wearing fedoras…

  4. (Martha) Joy Rose

    Great blog. Great feedback. Thanks for identifying the harassment piece of the conversation. #ReallyScary re: “common among those who happen to be women, or not white straight men, and doubly so if they also happen to make the sort of game that in any way challenge the status quo.”

    I also appreciated Chris’s point that data collection is driven by the almighty $dollar$, not for academics to archive game world issues. – MJR

  5. StallChaser

    I’ve been following this as someone who has a morbid fascination with internet trainwrecks. This one’s a doozy. I’d be careful about drawing conclusions about anyone based on the frequency of words. There’s also context. Simply finding a word like “harass” doesn’t say anything about whether it’s for or against. Sarcasm will also mess up results. You can’t really draw any conclusions without actually reading the tweets, and until someone codes an AI capable of interpreting all the nuances in language, this will be true.

    I’ve seen a lot more toxicity from the anti-gamergate people than from gamergate itself. You should check out the gamergate hashtag on twitter if you don’t believe me. Heavy use of words like “SJW” come from the lack of any other concise label for the opposition. Meanwhile, those trolling the hashtag use labels like “neckbeard” and “manchild”, despite the existence of “gamergate” or even GG for when the character limit is an issue. Anti-GG extremists have also attacked NotYourShield users, claiming they’re white males, even after they’ve posted photos showing otherwise. That goes very much against the actual principles of social justice, which is why I refuse to acknowledge them as SJWs. It’s inaccurate. People on both sides have been harassed, threatened, and doxxed by extremists on the other side, and that’s never a good thing.

    I’ve pretty much tossed my arms up in the air at this point. There are some malicious sites I no longer visit, but that’s pretty much the extent of what I’m doing. People will go to the sites that align to their tastes, and the market will sort it all out. I think games media will come out of this better than it went in, but that’s just me being a hopeless optimist, I guess. 🙂

  6. Kelly Blanchat


    Great post! I would urge you not to spend too much money, but Godspeed!

    A few things come to mind here:
    1) The credibility of a group that is capable of wide-reaching harassment, and the credibility of another group that gets facts – Beyonce’s birthday! – wrong. Her birthday is actually September 4th, with the album “4” representing both her and Jay Z’s (“December 4th a star was born”) birthdays. BUT I digress….

    2) How such conversations change as a result of typos or the purposeful inclusion / exclusion of participants? I believe it was 4chan users that coined the popular typos “pwned” & “ZOMG”. Such instances are the result of the standard layout of a keyboard (P near O and shift near Z) yet such accepted typos weren’t the norm when communication was typewriters. The speed of the internet allows for such change, but also an underlying sense of mockery. “You just got pwned.” Etc etc. With that in mind, have you considered looking at how #gamergate has evolved? How overlapping hashtags may represent a new conversation? Typos of #gamergate that may have added to the conversation, tho silently? Could a new iteration of #gamergate take over?

    Currently, Margaret Atwood is participating in a century long novel project. It’s pretty interesting, & you can read it here: 
    In the new ps article she talks about the evolution of language over a century, and at the end gives a story about a lost language that is decoded and transcribed by an American anthropologist. I see overlaps in many ways here with your piece: the tracking, decoding, and appropriation of a conversation. 
    Maybe we can also be called digital anthropologists?

    Kelly Blanchat 

    Ps: I recommend not replying to posts on an iPad within WordPress itself. Toggling between the Margaret Atwood article and the post I lost my reply, and had to rewrite. OOPS. Learn from my technological mistake.
