Here are some responses to the word cloud and corpus I released earlier…
John B. submits the above image, created from just a list of episode titles, using Tagxedo. (I used Wordle for mine.) He also points out that Tagxedo contains more customization filters and tools for creating word clouds, which is good to know. Thanks, John!
Rubrick correctly points out that “It should, I feel, be a crime to refer to ‘words that show up once and only once’ without using the awesome linguistic term for such words, hapax legomenon.”
Apparently there are ~6100 such words in the corpus, and they have helpfully been extracted by Jonathan B. here. How boring a writer I am to have only used ‘breakdancing’ once in nine years!
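For anyone who wants to hunt hapax legomena in a corpus of their own, here is a minimal Python sketch. It is not necessarily how Jonathan B. did it: the `hapax_legomena` function name and the crude regex tokenizer (lowercase letters plus an optional internal apostrophe) are my own assumptions, and a different tokenizer would give a somewhat different count.

```python
import re
from collections import Counter

def hapax_legomena(text):
    """Return, in alphabetical order, the words that occur exactly once."""
    # Crude tokenizer (an assumption): runs of letters, optionally with one
    # internal apostrophe, compared case-insensitively.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(words)
    return sorted(word for word, n in counts.items() if n == 1)

# Made-up sample text, not taken from the real corpus.
sample = "the robot dances and the robot sings about breakdancing"
print(hapax_legomena(sample))
```

On a real export you would read the file into a string first and pass that in; the counting step stays the same.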
Elytsvil accurately notes that the corpus I released is missing a lot of punctuation, which Oh No Robot uses as meta-markers. I did streamline the text file somewhat (eliminating URLs, strings of punctuation, and the ubiquitous ‘In which’) to try to force the word cloud into something approaching relevance, but the point is duly noted. Here is a completely unredacted ONR data export.
Finally, Shmibs extracted a list of the longest words in the corpus, and some of them are pretty great:
nervousenergynervousenergynervousenergy (from the alt-text on #016)
abcdefghijklmnopqrstuvwxyz — from #598
mmmellllltiiiinnnnngggggg — from #247
procrastihibernation — title of #614
radiogrammephonimat — a transcriber-added detail to #401
biepinzingerunting — from #715
biepbiepbieperzung — also from #715
telegrameutophium — a transcriber-added detail to #336
relationshaaaooww — from #588
yardeyardeyaryar — from the alt-text (which I wrote) on #199 (a guest comic)
superdorkasaurus — from the alt-text on #662
dunderschnauzen — from #515
glondxhatzoljlg — from #680
What an erudite collection of completely invented words and/or sounds!
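A list like this is easy to reproduce. The sketch below is one possible approach, not necessarily the one Shmibs used: the `longest_words` name, the letters-only tokenizer, and the alphabetical tie-breaking are all assumptions of mine.

```python
import re

def longest_words(text, n=13):
    """Return the n longest distinct words, longest first."""
    # Letters-only tokenizer (an assumption); a set removes duplicates.
    words = set(re.findall(r"[a-z]+", text.lower()))
    # Sort by descending length, breaking ties alphabetically.
    return sorted(words, key=lambda w: (-len(w), w))[:n]

# Made-up sample text reusing a few words from the list above.
sample = "a superdorkasaurus met a dunderschnauzen during procrastihibernation"
print(longest_words(sample, n=3))
```

Because ties are broken alphabetically here, words of equal length may come out in a different order than in the list above.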