Word Cloud, etc. Part 2

Here are some responses to the word cloud and corpus I released earlier…

John B. submits the above image, created from just a list of episode titles, using Tagxedo. (I used Wordle for mine.) He also points out that Tagxedo contains more customization filters and tools for creating word clouds, which is good to know. Thanks, John!

Rubrick correctly points out that “It should, I feel, be a crime to refer to ‘words that show up once and only once’ without using the awesome linguistic term for such words, hapax legomenon.”

Apparently there are ~6100 such words in the corpus, and they have helpfully been extracted by Jonathan B. here. How boring a writer I am to have only used ‘breakdancing’ once in nine years!
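If you'd like to hunt for hapax legomena yourself, here is a minimal Python sketch. It is not how Jonathan B. produced his list. The function name and the tokenizing rule (lowercased runs of letters, with internal apostrophes allowed) are assumptions made for illustration, and a different rule would give a different count.

```python
import re
from collections import Counter

# Assumed tokenizer: lowercase runs of letters, allowing internal apostrophes.
WORD_RE = re.compile(r"[a-z]+(?:['’][a-z]+)*")

def hapax_legomena(text):
    """Return the sorted list of words that occur exactly once in text."""
    counts = Counter(WORD_RE.findall(text.lower()))
    return sorted(word for word, n in counts.items() if n == 1)

if __name__ == "__main__":
    # Made-up sample sentence, not taken from the corpus.
    sample = "The bean farm is a farm. The breakdancing bean!"
    print(hapax_legomena(sample))
```

To run it on a real corpus, read the text file into a string and pass that string to `hapax_legomena`.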

Elytsvil accurately notes that the corpus I released is missing a lot of punctuation, which Oh No Robot uses as meta-markers. I did streamline the textfile somewhat (eliminating URLs, strings of punctuation, and the ubiquitous ‘In which’) to try and force the word cloud into something approaching relevance, but the point is duly noted. Here is a completely unredacted ONR data export.
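For the curious, the kind of streamlining described above might look roughly like this in Python. This is a sketch under stated assumptions, not the actual cleanup that was done: the three patterns (what counts as a URL, what counts as a "string of punctuation," and a case-insensitive match on "In which") are guesses, and the function name is hypothetical.

```python
import re

# All three patterns are illustrative assumptions.
URL_RE = re.compile(r"https?://\S+")                      # anything that looks like a web address
IN_WHICH_RE = re.compile(r"\bin which\b", re.IGNORECASE)  # the ubiquitous title opener
PUNCT_RUN_RE = re.compile(r"[^\w\s'’]{2,}")               # two or more punctuation marks in a row

def streamline(text):
    """Strip URLs, 'In which', and runs of punctuation, then tidy whitespace."""
    text = URL_RE.sub(" ", text)
    text = IN_WHICH_RE.sub(" ", text)
    text = PUNCT_RUN_RE.sub(" ", text)
    return " ".join(text.split())

if __name__ == "__main__":
    print(streamline("In which a man visits http://example.com/x ... and leaves!!"))
```

Note that a rule like `PUNCT_RUN_RE` throws away any multi-character punctuation markers along with the noise, which is exactly the trade-off Elytsvil is pointing at.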

Finally, Shmibs extracted a list of the longest words in the corpus, and some of them are pretty great:

nervousenergynervousenergynervousenergy (from the alt-text on #016)

abcdefghijklmnopqrstuvwxyz — from #598

mmmellllltiiiinnnnngggggg — from #247

procrastihibernation — title of #614

radiogrammephonimat — a transcriber-added detail to #401

biepinzingerunting — from #715

biepbiepbieperzung — also from #715

telegrameutophium — a transcriber-added detail to #336

relationshaaaooww — from #588

yardeyardeyaryar — from the alt-text (which I wrote) on #199 (a guest comic)

superdorkasaurus — from the alt-text on #662

dunderschnauzen — from #515

glondxhatzoljlg — from #680

What an erudite collection of completely invented words and/or sounds!
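A list like this is easy to pull out of any text file. Below is a minimal Python sketch, assuming a "word" is any unbroken run of letters. That rule and the function name are illustrative assumptions, not a description of how Shmibs did it.

```python
import re

# Assumed definition of a word: an unbroken run of letters.
WORD_RE = re.compile(r"[a-z]+")

def longest_words(text, n=5):
    """Return up to n distinct words, longest first (ties in alphabetical order)."""
    words = set(WORD_RE.findall(text.lower()))
    return sorted(words, key=lambda w: (-len(w), w))[:n]

if __name__ == "__main__":
    sample = "superdorkasaurus met a dunderschnauzen; Superdorkasaurus won"
    print(longest_words(sample, 2))
```

Because the words are collected into a set first, a word that appears many times is only listed once.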

800 Episodes Word Cloud

On the occasion of Wondermark’s eight hundredth episode, I thought I would celebrate by looking at a complete corpus of words used in Wondermark, and creating a cloud from them (similar to my existing tag cloud of subject matter):

“Huh,” I thought to myself, “I suppose it is unsurprising that the most common words used in a large sample of comics probably closely resemble a list of common words found in the language in general.”

So no great discoveries here, unfortunately. It’s further complicated by the fact that the text I’m using as a corpus is an export of my Oh No Robot database, which contains user-submitted transcriptions of all my comics. Those transcriptions often contain transcriber-invented character names and extensive scene descriptions, both of which are great, but which somewhat muddy the dataset. The heavy incidence of the words “man” and “woman” in the cloud, for example, is probably due to transcriptions reading something like:

Man: I have started a bean farm.
Woman: We’ll be millionaires!
Man: Not if flies eat the crops first.
Woman: Time to invest heavily in pesticides.

In that sample transcription, the words “man” and “woman” both appear twice as frequently as any other word, despite not occurring in the dialogue at all.
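The effect is easy to check in a few lines of Python. The sketch below counts words in the sample transcription twice, once as written and once with the speaker labels stripped. The label pattern (a single capitalized word followed by a colon at the start of a line) is an assumption that fits this sample. Real transcriptions may label speakers differently.

```python
import re
from collections import Counter

TRANSCRIPT = """\
Man: I have started a bean farm.
Woman: We'll be millionaires!
Man: Not if flies eat the crops first.
Woman: Time to invest heavily in pesticides.
"""

WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")
# Assumed label format: one capitalized word plus a colon at the start of a line.
LABEL_RE = re.compile(r"^[A-Z][a-z]+:\s*", re.MULTILINE)

def word_counts(text):
    """Count lowercase word occurrences in text."""
    return Counter(WORD_RE.findall(text.lower()))

def dialogue_only(text):
    """Remove leading speaker labels, keeping only the spoken lines."""
    return LABEL_RE.sub("", text)

raw = word_counts(TRANSCRIPT)                    # labels included
spoken = word_counts(dialogue_only(TRANSCRIPT))  # labels stripped

print(raw.most_common(2))
print(spoken["man"], spoken["woman"])
```

With the labels included, "man" and "woman" each score 2 and every other word scores 1. With the labels stripped, both drop to zero.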

It’d be neat to see, instead of a brute word-frequency cloud, something like a collection of statistically improbable phrases, or words that show up in Wondermark once and only once…things like that. I wonder what interesting things could be mined from the data? If you’d like to play around with the corpus yourself, dirty as the set is, here’s the text file I used. If you derive anything neat, let us know!

Looking Back at Y’haug’f’than


(Flickr photo by Tau Zero)

Over the weekend, the Wondermark calendar marked the eldritch holiday Y’haug’f’than, the day long foretold when forces beyond the imagination or comprehension of mankind crest a horizon of madness and slowly wind down the minutes remaining for all of human existence.

I’ve long felt that Y’haug’f’than is becoming a bit commercial of a holiday, so in an effort to respect the roots of the tradition by subjecting myself to deliberate pain, I underwent hernia surgery:

What about you? Leave a comment and tell us how you commemorated the weekend the air turned to ashes and our still-screaming flesh was melted from our brittle bones!

(Next holiday we will observe: February 30: Imaginary Day)

Wondermark will be offline on January 18th.

UPDATE: Thank you for all the supportive messages during the blackout. The NY Times reports that several dozen lawmakers have publicly shifted their opinions on the bills today, and when I called and talked to staffers for my two senators and my representative, nobody sounded too thrilled about the bills in their present form. So, we won? For now, anyhow, until the spotlight moves onto something else and the lobbyists go right back to work. The press that the blackout has generated is great, but we mustn’t forget that this kind of nonsense gets pushed through Congress all the time.


Wondermark will shut down on January 18th as part of the national Internet strike protesting two bills under consideration in the U.S. Congress, H.R.3261 (the “Stop Online Piracy Act”, or SOPA) and S.968 (“Protect IP Act”, or PIPA).

You may have heard about the general strike (among the sites taking part are Wikipedia, Reddit, and Boing Boing), or about SOPA and PIPA. If you haven’t, here’s the general idea:

Massive entertainment companies, mainly movie studios, have presented these bills to Congress as a means to curb online piracy of their content. But the bills are written in a very broad, dangerous way.

To combat the problem of movie piracy, SOPA and PIPA give the government the power to firewall the entire U.S. internet.

This is like planting dynamite under a busy highway bridge in order to catch fleeing burglars, then handing the trigger to someone who hates cars.

What’s likely to happen?

• What burglars there are will take another route. (SOPA/PIPA do not target pirates, but rather sites that link to alleged piracy. Real pirates can easily sidestep the restrictions.)

• Law-abiding business trucks, scared of the dynamite, will ALSO take another route. (The huge legal and financial burden of compliance with the new law will discourage startups, stifling independent businesses based in the United States.)

• The dynamite is likely to go off whenever the trigger person sees anybody who looks slightly suspicious — burglar or not. (Claims of “piracy” could be used as a weapon against websites to silence them for competitive or political reasons.)

Despite the fact that nobody in Congress can agree on health care, the budget, or anything else, bought-and-paid-for politicians from both sides of the aisle have lined up to defend these bills. It’s pretty disgusting. Movie piracy is simply not more important than the safety and integrity of the entire Internet, which is my whole livelihood.

Don’t take it from me, though. You can read more about the details of SOPA and PIPA here:

• An Overview of SOPA/PIPA [Infographic]
• A technical examination of SOPA and PROTECT IP [Reddit]
• Why Canadians Should Participate in the SOPA/PIPA Protest

As of this writing, a Senate vote on PIPA is planned for January 24, and the House of Representatives intends to continue markup on SOPA in February. So the time to act is right now.

What’s the point of the strike?

It’s to say, “Imagine what the Internet could look like if this level of censorship were legal.” Sites you visit every day could be blocked at the DNS level, making them essentially unreachable. If this is your first introduction to SOPA and PIPA, the strike is to let you know that it’s a real problem, and to solicit your help on behalf of internet users and content creators everywhere.

I rate myself highly cynical when it comes to the government. I’ve had firsthand experience with being completely blown off when I’ve actually taken the time to write to my members of Congress. Once, I wrote letters to both my senators, opposing an issue that both of them were in favor of. I received a form letter back reading “I, too, am in favor of this issue! Thank you for your support!”

So I know that it’s farfetched to claim that our members of Congress will even listen to us. But here’s the thing: these bills need to become toxic. They need to become political nitroglycerin. And that has everything to do with the lawmakers’ perceptions of public opinion.

If our members of Congress learn that supporting bills like this gets them a ton of angry phone calls, maybe they’ll think twice. We hope by calling them en masse, we can at least get some of them to realize there’s more to the issue than the bullet points they’ve been spoon-fed by lobbyists.

If they know that opposing bills like this gets them a ton of supportive phone calls, that’s food for thought as well. So if your representatives are opposed to SOPA/PIPA, call them too and let them know you’ve noticed, and that you appreciate it.

If your representative is undecided, that’s fine, because that means they haven’t cast their lot yet, or don’t realize how big an issue it is. This is your chance to let them know that the proper course of action is opposition.

Since Reddit will be down for the strike as well, I’m sure they won’t mind if I copy some info from a thread there about how to call your Congressperson:

From my experience: A staffer answers the phone. Say, “Hey, my name is [full name] and I live in [City], [State] [Zip] and I just wanted to express my opposition to a bill that [Congressperson] will be voting on soon: the Stop Online Piracy Act [if Representative] / the Protect IP Act [if Senator].

“There are extreme flaws and loopholes in the bill that could seriously harm the freedom of individuals, impact small businesses, and silence political speech on the Internet, and I wanted to ensure that [Congressperson] is aware of how dangerous this bill could be for his/her constituents. His/her willingness to go along with it is extremely surprising, considering his/her strong pro-liberty, pro-small-business beliefs,* and I’d like to ask him/her to reconsider his/her position.”

In my instance the staffer was really polite and said she would forward the message on to my senator, and recommended I request to schedule a meeting via my senator’s website. In the meeting request, I just basically repeated what I said above. Don’t worry what may happen if your meeting request gets accepted, I can’t imagine it would.

* I embellished this sample script a bit. But you catch more flies with honey, etc.

If you’re on the Internet as much as I am, you’re probably tired of hearing everybody talk about this, and secretly believe that the threat is hugely overblown. To be honest, I hope it is.

But I am also intensely curious to know if our democracy actually works. Can a legion of individuals contacting their members of Congress actually change minds? Or are we all just ignorant chickens bobbing about in our coop while the farmers do as they please? I don’t know that I’ve ever seen a more concerted, focused effort to test the theory that we in the U.S. actually live in a representative democracy.

I urge those of you in the U.S. to please call your representative and your senators today, and those of you outside the U.S. to help spread the word in other ways — because where the U.S. goes, the rest of the world may follow soon enough.

• Where Do Your Members of Congress Stand on SOPA and PIPA?
• Find contact info for your members of Congress

Wondermark returns Thursday with a new comic for your trouble. It will contain…jokes.

Happy St. Whinge’s Day!

If you consult your 2012 Wondermark Calendar, you know that today is St. Whinge’s Day! Have you been watching for children with caps in their hands? I spent a good fifteen minutes outside this morning, ranting and raving about the ills of the world and jangling quarters in my pocket, but all I got was a dirty look from an old woman.

Still, I think this is a grand opportunity to whinge, whine and moan about bad luck, the unfairness of the universe, or petty injustices that otherwise would have to go repressed. For the next few hours, this is a safe space to be whiny. Here, I’ll start:

  • I got some strawberries from the farmer’s market just on Sunday and they are already all brown and soft! I suspect that the ones underneath the top layer were going bad before I even bought them. OH ST. WHINGE PRESERVE US!
  • I have been locked in a nonsense nitpicky battle with the IRS for seven months over the fact that our publishing company (Machine of Death) is partly owned by a Canadian! We are trying to run a small business and create jobs but we have been hitting absurd roadblocks every step of the way. OH ST. WHINGE PRESERVE US!
  • I visited an art-supply store and parked in a space with ten minutes left on the meter. I forgot to check the time when I went in, then got engrossed inside the store deciding whether to buy the bigger tube of paint ($8) or the smaller one ($5). I did complicated mental math trying to decide whether I’d really use all of the bigger tube, or whether the smaller, cheaper one would do. In the end my agonizing deliberation about whether I could safely save $3 earned me a $65 parking ticket. OH ST. WHINGE PRESERVE US!

Now you! Leave a comment with your own complaint — and let me be the first to say, “Well, that’s just incredibly bad luck, that is!”

(Next holiday we will observe: January 28: Y’haug’f’than)