
Building a better spambot


I've been remiss about deleting spam from my moderated-comments queue for the past month-plus; as of yesterday, there were over 13,000 comments in the queue. I've been plowing through them (only 6,500 or so to go!) and rescuing the few that aren't spam; apologies to those of you whose comments have been languishing.

Anyway, as usual when I'm deleting comment spam, I can't help but think that I could write a way better spam-comment generator than the ones the spambots are using.

I guess it depends on one's goal. If I were writing a spambot, I would want its comments to look as much like real comments as possible, so that comment moderators would have to spend a lot of time figuring out whether each one was real.

When there are, say, two thousand comments in a five-day period that all say variations on “Lindsay Lohan Goes to Rehab Today” (in a blog that has never once mentioned Lindsay Lohan until now), it takes me less than a second per comment to skim through a hundred at a time and zap them all.

I suppose the people operating the spambots don't care about this kind of thing. They want to make it hard for automated spam-recognition systems to detect the spam, so their systems replace words and phrases in a given piece of spam with synonyms. But I guess they don't really care whether the resulting comments look reasonable to a human.

But I've heard it suggested that we'll get to strong AI via the struggle between spammers (to get their spam posted) and spam-detection systems. And I don't think the spammers are really doing their part here.

Anyway, I'm not gonna post my ideas about how to make comment spam read more like real comments, on the unlikely chance that a spambot writer might (a) not have thought of these ideas, and (b) decide to implement them. But my ideas aren't anything revolutionary; they're based partly on stuff I've known about since college.

So as soon as some spambot writer does start caring about realistic comments, my workload in fighting spam is going to go way up.
