Data is Beautiful and Finding Datasets.

Hello all,

It seems like a lot of interesting data projects are well under way and many of them look great! For those who are undecided on what to research, or what could even be considered research, I wanted to point out an extremely popular subreddit called /r/DataIsBeautiful.

Here is an example of an image you might find there:

[Image not preserved: a chart of salaries by age group that includes data for PhD holders.]

Even more interesting than the visuals might be the discussion in the comments as people attempt to dissect the data. One commenter on this image hypothesizes:
“The PhD data is more of a representation of academia life than anything. That’s why the salary is so low for 25-29 (people are just graduating at this point), and why the salary shoots up after 35 when the assistant professors get tenure.”

For those unfamiliar with Reddit, the numbers on the side represent the popularity of each post or comment, with higher numbers indicating a more popular topic or comment. If you are looking for the most popular posts ever submitted, sort the subreddit by “top” (be sure to change the time period to “all time”).

I thought I’d post this because you can have a dataset about almost anything, even very specific things. One post looked at the size of chickens and how big they have gotten recently. Here’s another that’s sort of interesting:

Tweets retrieved during the World Cup, USA vs. Germany

For those still lost on where to get datasets, I went to /r/datasets and found a couple of interesting links, including a custom Google Search for datasets and Quandl. Some datasets on Quandl can be pulled directly into popular tools such as MATLAB, Python, R, and Stata, and there is also a Microsoft Excel plugin if that’s your jam. For those more on the technical side, Quandl also offers an API.

Anyway, hopefully some of this will be inspiring and/or useful to those who are still unsure about where to go. Let me know!

5 thoughts on “Data is Beautiful and Finding Datasets.”

  1. JULIANA SON

    Thanks for the tip! Seeing the immense world of raw data sets made me realize the importance of data visualization even more. I came across some data sets that are available but not necessarily accessible to everyone. For instance, the NYPD Stop and Frisk data (http://www.nyc.gov/html/nypd/html/analysis_and_planning/stop_question_and_frisk_report.shtml) requires SPSS to view. I don’t have SPSS, so I was unable to view it. I kept digging and saw that the NYCLU had made some of the data available in Excel. It wasn’t pretty. You had to look up what the values meant (for instance, race was given a numeric value of 1-4; see http://www.nyclu.org/files/SQF_Codebook.pdf), there were more than 30 columns, etc. But at least the data was viewable by the majority of the computer-literate population. (I’m assuming that SPSS isn’t as widely available as Excel.)

    However, it is the visualization of data that makes the message clearer and more accessible. I realized this a while ago (it’s one of the reasons why I joined the DH program), but going through this data set exercise and running into the limits of access was a nice reminder of the importance of this type of work, especially in a social justice context.

  2. Chris Vitale

    /r/dataisbeautiful is one of the most fascinating subreddits.

    I would also note that the related links in the subreddit are very useful. Some highlights include:

    MapPorn – Quite self-explanatory and safe for work – http://www.reddit.com/r/mapporn

    DataVizRequests – A forum-esque environment where people help each other with tools/data viz questions – http://www.reddit.com/r/DataVizRequests

    Infographics – More humanities-centered projects than DataIsBeautiful – http://www.reddit.com/r/Infographics

  3. Elissa Myers

    Seems like a very interesting subreddit! I like that there are graphs about chickens. Thanks also to Chris for the DataVizRequests suggestion. Could really come in handy as I am now working with Gephi!

  4. Mary Catherine Kinniburgh

    Don’t get me started on chicken size and factory farming!

    But seriously, this is a really wonderful resource.

    I’ve avoided Reddit like the plague since forever, mostly because of its reputation as a hostile environment for women and a general hotbed of ick. However, that makes me appreciate the conversations about Reddit in our class all the more, because they are helping me develop a fuller picture of a site I might otherwise be too intimidated to explore.

    All the more reason why building smaller communities, like our DH Praxis 14 here, can help us navigate not only the vast amount of information out there but also its accessibility to different types of users who perceive certain spaces as off-limits, rightly or not.

    Thanks again!
