Author Archives: Sissi Liu

Merry White Christmas~~

Dear Digitalists,

I have to say, this course is absolutely one of the most fascinating courses I have ever taken (and I’m finishing my PhD, so I’ve probably taken the greatest number of courses here!). And I feel lucky to have met you all; you were such an inspiring group! Also, a big round of applause to our two amazing professors: thank you for masterminding this seminar (a year ago, I believe?); your pedagogical conceptions and curriculum designs are truly visionary.

Christmas is only a few days away, and I thought of posting something fun and Christmasy that is also related to my final project “Production of Desire, Consumption of Pleasure, and Creation of National Identity: Broadway Musicals as Cartography of US Sociocultural Values, 1920s-2010s.” In that spirit, why not run a data analysis of White Christmas, the Broadway musical adapted from a movie musical by Irving Berlin? It is by no means my favorite musical; in fact it is a pretty cheesy saccharine piece (with its own adorable moments). But so what? Christmas is all about eating candies and having some damn feel-good fun! So here we go:


What I’d like to see is which words stand out as topics/key words in this musical. Having been told that Mallet is best at handling topic modelling, I spent one afternoon teaching myself how to use Mallet.

I start by installing both Mallet and the Java Development Kit. Then I pull the data (all the lyrics of the 18 songs in White Christmas) into one folder under Mallet, so it’s ready to be imported. I run Mallet from the Command Line and type in commands such as “bin\mallet import-dir --help” to test it. Then I import the data and tell Mallet to create a file called “tutorial.mallet.”


Then Mallet does its job and picks out the key words:



I enter another command to work with this file: by typing “bin\mallet train-topics --input tutorial.mallet --num-topics 20 --output-state topic-state.gz --output-topic-keys tutorial_keys.txt --output-doc-topics tutorial_compostion.txt” I ask Mallet to find 20 topics, and it generates 3 documents:
1. topic-state.gz
2. tutorial_compostion.txt (the tutorial composition)
3. tutorial_keys.txt (the tutorial keys)
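For anyone who would rather script this step than retype it, here is a minimal Python sketch that assembles the same train-topics call as a list of arguments. The options and file names are the ones quoted in the command above (each option takes two plain hyphens); the function name and its parameters are placeholders of my own, and nothing is executed here.

```python
def build_train_topics_command(mallet_bin="bin\\mallet", num_topics=20):
    """Return the train-topics call from this post as an argument list.

    mallet_bin and num_topics are illustrative parameters, not Mallet names.
    """
    return [
        mallet_bin,
        "train-topics",
        "--input", "tutorial.mallet",
        "--num-topics", str(num_topics),
        "--output-state", "topic-state.gz",
        "--output-topic-keys", "tutorial_keys.txt",
        # File name kept exactly as typed in the command above.
        "--output-doc-topics", "tutorial_compostion.txt",
    ]
```

Passing the returned list to subprocess.run from inside the Mallet folder should launch the same run as typing the command by hand, assuming Mallet and Java are installed as described.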


The first one is a compressed file that lists every word in my input corpus along with the topic it is assigned to. And here is what it looks like after extraction:
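The state file can also be read without extracting it first. The sketch below is only a rough illustration: it assumes that header lines start with “#” and that each remaining row ends with the word followed by its topic number, which is how my file looks. The function name is my own.

```python
import gzip
from collections import Counter, defaultdict


def words_by_topic(state_path):
    """Group the words in a gzipped state file by the topic assigned to them."""
    topics = defaultdict(Counter)
    with gzip.open(state_path, "rt", encoding="utf-8") as handle:
        for line in handle:
            # Skip header lines (assumed to start with "#") and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            fields = line.split()
            # Assumption: the last two columns are the word and its topic number.
            word, topic = fields[-2], int(fields[-1])
            topics[topic][word] += 1
    return topics
```

Calling words_by_topic("topic-state.gz") returns one word counter per topic, so topics[8].most_common(5) would list the five most frequent words assigned to topic 8.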


The second one is the breakdown, by percentage, of each topic within each original text file.
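To read this breakdown programmatically, something like the following Python sketch could work. It rests on an assumption about the layout: each row holds an index, the document name, and then alternating topic-number and proportion pairs. Other Mallet versions may instead write one proportion per topic in order, in which case the pairing logic would need to change. The function names are my own.

```python
def parse_doc_topics_line(line):
    """Parse one composition row into (document_name, {topic: proportion}).

    Assumed layout: index, document name, then topic/proportion pairs.
    """
    raw = line.rstrip("\n")
    # Prefer tabs so that document names containing spaces stay intact.
    fields = raw.split("\t") if "\t" in raw else raw.split()
    fields = [f for f in fields if f]
    name, pairs = fields[1], fields[2:]
    shares = {
        int(pairs[i]): float(pairs[i + 1])
        for i in range(0, len(pairs) - 1, 2)
    }
    return name, shares


def read_doc_topics(path):
    """Read a whole composition file, skipping '#' header and blank lines."""
    result = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            name, shares = parse_doc_topics_line(line)
            result[name] = shares
    return result
```

Multiplying each proportion by 100 gives the percentage of a song's lyrics attributed to that topic.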


The third file shows what the top key words are for each topic.


I clean up the data, and the result looks like this:



Now, since Mallet is known for generating a slightly different result each time, I have to try it at least twice. In my second try, I add the “--optimize-interval” option, which is supposed to lead to better results.


What this option does is indicate the weight of each topic! (Under item 8, “0.04778” is the “weight” of the topic “white,” followed by key words such as “bells” “card” “snow” and “sleigh.”)
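Once the weights are there, it is handy to rank the topics by them. Here is a small Python sketch, assuming each row of the keys file holds the topic number, the weight, and the key words, separated by tabs (the layout of my file). The function name is hypothetical.

```python
def parse_topic_keys(lines):
    """Turn rows of the keys file into (topic_id, weight, words) tuples,
    sorted so that the heaviest topics come first."""
    topics = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        # Assumed columns: topic number, weight, space-separated key words.
        topic_id, weight, words = line.rstrip("\n").split("\t", 2)
        topics.append((int(topic_id), float(weight), words.split()))
    return sorted(topics, key=lambda item: item[1], reverse=True)
```

Fed a shortened row built from the example above, "8\t0.04778\twhite bells card snow sleigh", it returns topic 8 with weight 0.04778 and those five words. To rank a real run, pass the open file, for example parse_topic_keys(open("tutorial_keys.txt", encoding="utf-8")).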


This topic-modelling process sounds really simple, but it in fact takes quite some time to become familiar with. This is a try-out example of one musical; for a larger corpus of musicals, Mallet’s power should be more evident.

As for the musical data analysis of my project, I’m thinking of combining Schenkerian analysis with automating chord progression using idiomatic analysis. It is a musicological approach rather than audio signal processing. However, I’m not shutting down the latter option, since it might turn out to be more comprehensible to the general public, our eventual target audience. Also, a shout-out to the musicians in the group (I know there are several): come talk with me!

Merry Christmas everyone! (Looking at these key words makes my mouth covet sweetness; now where is my Twix?! …. nom nom…)


A Blog vs. a Book, or Why I Was Loath to Blog

As I was cleaning up my digital archives the other day, I saw many snippets lying around in a folder called “DH Blogging.” All of them are proto-blogs: ideas I started, but never ended up posting to a blog. I suddenly realized that ever since I began my PhD “career” at GC, I have been resistant to the idea of blogging.

Time and time again I started typing down thoughts for a blog; time and time again I stopped and thought, “Nah, these ideas are not in good shape yet…. The research is not as in-depth as it should be…. It is not academic enough for publishing online…. Forget it.”

“Not academic enough”: that’s what has been preventing me from doing it. My rigorous academic training turned me into an equally rigorous judge of my scholarly output. Whatever I write has to be original, thoroughly researched, substantively thought through, carefully developed, and well polished until it is close to “publication level.” Anything less than that should not present itself to the public. (Besides, my East Asian upbringing, which deprecates communicating ideas before they are well formed, is not a great help.)

Yes, evidently, I’ve been TRAINED to resist blogging.

And I’m not alone.

The education I received has prepared me to write academic BOOKS, a twentieth-century way to evaluate scholarly accomplishment, even though I started my PhD in 2009. And I was totally incognizant of it until I read Kathleen Fitzpatrick’s amazing book Planned Obsolescence. In the book she argues that blogging, or writing that is open rather than closed, is as powerful and valuable as writing a book, if not more so.

A book is a self-contained product; a blog reflects an on-going open conversation.

A book focuses on the moment of completion, whereas blogging emphasizes the process of writing, discussing, revising, and updating.

A book suggests originality and individual intelligence; blogging represents collaborative effort.

The fixation on the originality of the text has been attacked by poststructuralist thinkers since as early as the 1960s. Roland Barthes, famous for his “Death of the Author” aphorism, argues that nothing is ever original, and the text is merely “a fabric of quotations, resulting from a thousand sources of culture” (1967). Julia Kristeva advocates for “intertextuality” and suggests that even the most ostensibly “original” of texts is in fact filled with references to other texts (1986). In the same vein, Fitzpatrick argues that academic voices are never fully individual and scholarship has always been collaborative, as authors have always been in an ongoing conversation. She posits that in a highly interconnected world, a higher value should be placed on the sharing of information than on the individual authorship or ownership of particular texts. She suggests that we will need to let go of what we have come to understand as the individual voice, and to “remix,” “mashup,” and “curate” significant ideas that are already in the existing texts, instead of remaining focused on the illusion of the originality of texts.

Blogging is therefore one of the most efficient ways to disseminate knowledge, as it produces texts that are no longer static, but fluid, alive, and contributing to a network of texts that enables ideas to flow.

In a recent conversation, an older friend of mine, who has never published a single blog post in his entire life, raised another key issue: “Books last long. Websites get defunct God knows when.” He is not entirely wrong. In the chapter “Preservation,” Fitzpatrick addresses the misconception that digital preservation has to do with the ephemeral quality of digital products, and points out that digital artifacts actually tend to last much longer than books. She astutely argues that digital text preservation requires the development of socially organized preservation systems, because the problems we encounter in digital preservation are caused by our social practices and social understanding of the use of digital artifacts, rather than by technical issues.

Moreover, Fitzpatrick points out that the future of the book probably lies in what she calls “multimodal texts,” a mixture of images, audio, video, and other forms of data, which will enable the “author” and the “reader” to interact in new ways.

Highly informed and inspired by her book, I do buy into her conceptualization of future scholarship as an ongoing conversation that is collaborative and open-ended. However, I can’t help but be a little dubious about the limitations of this paradigm. She successfully used MediaCommons, a community-filtered web platform, to invite comments on Planned Obsolescence, which became a part of her ongoing drafting and revising process. At least part of the reason why participants in the conversation had the means to contribute, though, is probably that the subject matter, “publishing, technology and the future of the academy,” constitutes a metatopic: a critique of academia per se. A less “meta,” lesser-known and less-researched topic, such as the oft-neglected theoretical writings of Giovanni Maria Artusi (Italian composer, 1540-1613), who is most famous for his polemic against the music of Claudio Monteverdi, would probably not work well on a community-filtered web platform.

Anyway. Here you go. Hello first blog. Not much scholarly contribution, but an effort to at least partly fulfill what Fitzpatrick summarizes as the three features of blogging: “commenting, linking, and versioning.” Comments, links, “versions” are all welcome!