Recently in the Computational linguistics Category

Obscene intensifiers (probably NSFW)

Today's xkcd comic strip shows a graph of how often the intensifiers “fucking” and “as shit” are used with each of a variety of adjectives.

It's a cute graph, and some of the adjectives are kind of entertaining. I like the phrase “fucking apropos,” for example.

However, a lot of the instances of the phrases in question don't actually consist of intensifiers modifying adjectives at all—especially the instances of “fucking,” because there are a great many random-word spam pages containing that word in which it isn't used as an intensifier.

[A day after publishing this entry, I rephrased the above sentence for clarity, and added a couple of mentions of adverbs below.]

For example, try doing a Google search for ["fucking stochastic" -xkcd]. (The “-xkcd” part is to skip all the instances that were created today in response to the comic.) That search currently tells me there are about 24 results, which is presumably the number Munroe used for the comic. But in fact, if you click through to page 2, you see only 16 results.

Of those sixteen, eight (including all six on the second page of results) are porn spam pages that happen to have the words “fucking” and “stochastic” next to each other. (Including the amusing phrase “fucking stochastic frontier models with spatial component.”)

Two more results are for the phrase “XANAX is that fucking stochastic” on a Xanax spam page.

And five are for phrases in which “fucking” doesn't modify “stochastic” (that is, where it's an adjective rather than an adverb): “Fucking stochastic shite”, “FUCKING STOCHASTIC PROCESSES” (2 identical instances), “Fucking stochastic life-support system” (2 identical instances).

So it turns out that before this comic, there was actually only one instance on the entire web of “fucking stochastic” in which “fucking” modified “stochastic”:

The times, they are achanging, but whither and how, that is beginning to look fucking stochastic.

I imagine similar things are true of other “fucking” items on the list, but I'll stop now.
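
For anyone who wants to check my arithmetic, here's a rough sketch of the tally above in Python. The counts come straight from the breakdown in this entry; the category labels are just my own shorthand for illustration, not anything Google or the comic uses.

```python
from collections import Counter

# Counts of the 16 results actually shown, as broken down in this entry.
# The label strings are hypothetical shorthand, chosen only for this sketch.
ADVERB = "adverb (modifies 'stochastic')"
tally = Counter({
    "porn spam (random words)": 8,
    "Xanax spam": 2,
    "adjective (modifies a following noun)": 5,
    ADVERB: 1,
})

total = sum(tally.values())   # every result that was actually displayed
genuine = tally[ADVERB]       # uses where "fucking" intensifies the adjective
share = genuine / total       # fraction of displayed results that are "real"

for label, count in tally.most_common():
    print(f"{count:2d}  {label}")
print(f"genuine intensifier uses: {genuine} of {total} ({share:.2%})")
```

The same kind of hand classification would have to be repeated for each adjective on the graph before the share of genuine uses could be compared across them.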

I realize that I'm partly nitpicking the comic's phrasing; does it really matter whether “fucking” is modifying an adjective or just appears in the same phrase with the adjective? The numbers are interesting either way.

But I'm also disagreeing with the comic's methodology. The word “fucking” appears on so many web pages that I think the noise will drown out the signal for a lot of the less common adjectives, so the numbers don't really tell us much about how real people use language.

Then again, the whole idea of using Google results numbers to calculate linguistic answers is a little dubious. (For example, notice how the number went from 24 to 16 when I went from page 1 to page 2; and notice that nearly half of the results were effectively duplicates.)

(I posted a slightly different version of this entry as a comment in the xkcd forum, then realized it would make a good entry here.)

Added later: a comment about adverbs from a friend made me realize that this would be a simpler way of saying much of what I said above:

Munroe is assuming that, in all instances of “fucking stochastic” on the web, “fucking” is an adverb. But in fact, there's only one instance where it's actually an adverb; in all the other instances, it's either an adjective or part of a random collection of words.
