The indexing problem

I don't often remember to mention the fiction at SH. You can assume that I like every week's fiction, and I think some of what we've published so far this year has been even better than usual, but I don't generally single out individual stories for praise; at least, I try not to make a habit of it, because I don't want to imply that the stories I don't praise aren't good. But I wanted to talk a little about some tangential stuff that came up around our latest story.

This week's fiction is the conclusion of a two-parter, "The Final Solution," by K. Mark Hoover, author of last year's "Slugball," which was one of the few action/adventure stories we've printed. "Final Solution" is not an action story, though it has action-story elements; it's rich and thoughtful and intense and powerful, and I like it a lot. But it may've been a mistake to run it across two weeks. Last week a friend posted something to an email list to the effect that she was a little reluctant to read the second half: the first half was so coy about avoiding a certain name that she expected the story to build up to a twist ending in which the "Target" turned out not to be who the reader had been led to believe.

I had a hard time figuring out how to respond without giving away the real ending. I eventually noted two things: (1) that avoiding a particular name or term can sometimes be subtlety rather than misdirection (as in my vampire story that went out of its way to avoid the term "vampire," even though I hoped it would be obvious to readers that the characters were in fact vampires); and (2) that I've been thinking of this story as "the Hitler story." I should perhaps have added that (3) none of us SH fiction editors is terribly fond of the "ha ha we fooled you!" twist-ending approach to fiction; such things tend to annoy us. Which isn't to say that we never publish stories with surprising endings; just that I like endings to derive naturally from character motivations and actions and interactions, rather than being ways to make the reader feel like an idiot.

I won't say more about the story here; you should go read it. But because it's relevant to what follows, I'll make clear that the word "Hitler" doesn't appear anywhere in the story, and that neither the editors nor (I think) the author intended readers to have any doubt about who the Target was.

So I was thinking about that this morning, and it occurred to me that this kind of thing is another aspect of The Indexing Problem. How do you create a useful and relevant index to a work? More generally, how do you create a means of extracting the relevant speck of informational gold from an ever-growing morass of muddy data?

Makers of traditional back-of-book indexes have given this problem a lot of thought. For example, in creating a good index for a book, the indexer should index not only the important terms that appear in the book but also the important concepts it discusses, including terms for those concepts that never actually appear in the text.

Clearly, a human indexing a book that contained "The Final Solution" would want to list it under Hitler (among other entries). But search engines can only go by the text they're fed. If you query Google in a couple of months for a set of terms like "Pommer Inn" and "Schicklgruber," you might find this story; but if you search for "Hitler," you won't.

(Of course, the humans who create the Web page in question can add keywords to it, in a meta tag, to help get around the problem. I've written about that before, in a column on indexing. But I'm not sure Google pays any attention to keywords; they've been abused too much.)
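To make the difference concrete, here's a minimal sketch, in Python and purely illustrative (it's not how Google or any real search engine actually works), of a full-text inverted index, with an optional set of human-supplied concept terms like the ones a back-of-book indexer adds. The function names are hypothetical, and the document text is a stand-in that just mentions the terms above; it is not the story's actual text.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(docs, concept_terms=None):
    """Build a hypothetical inverted index mapping term -> set of doc IDs.

    docs maps a document ID to its text. concept_terms (optional) maps a
    document ID to extra terms a human indexer supplied; those terms need
    not appear anywhere in the document's text.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    for doc_id, terms in (concept_terms or {}).items():
        for term in terms:
            index[term.lower()].add(doc_id)
    return index

def search(index, *terms):
    """Return the IDs of documents indexed under every query term."""
    results = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*results) if results else set()

# Stand-in text (not the real story), mentioning only the terms named above.
docs = {"final-solution": "stand-in text that mentions the Pommer Inn and Schicklgruber"}

full_text = build_index(docs)
print(search(full_text, "Pommer", "Schicklgruber"))  # -> {'final-solution'}
print(search(full_text, "Hitler"))                   # -> set()

# A human indexer adds the concept the text deliberately never names.
curated = build_index(docs, {"final-solution": ["Hitler"]})
print(search(curated, "Hitler"))                     # -> {'final-solution'}
```

The full-text query for "Hitler" comes up empty; only the human-supplied entry connects the story to the name, which is the whole gap in miniature.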

All this ties in with some stuff that Cory Doctorow was talking about a week ago, after Kelly Link's reading in Berkeley. (I seem to be incapable of remembering to mention social events I attend in this here journal. Suffice it to say the reading was really cool, I finally bought Kelly's book, and chatting with folks afterward was fun too.) Cory was saying (roughly paraphrased; I may have it wrong) that attempts at top-down AI design are doomed, and that true AI, if it happens at all, will spring from bottom-up growth. That's much the way Google has become the search engine of choice on the Web: it harnesses the opinions of everyone who has a Web page, rather than relying on a few editors trying to impose order on a vast and chaotic system the way (say) Yahoo does.

It's an intriguing notion, but I'm not sure it's entirely a good thing. Google provides a certain democratization of tastes, in that the pages that come up first are likely to be the most popular ones, which doesn't necessarily mean they're the best. I obviously think there's still room for arbiters of taste (which is to say editors) who sort through the chaff and present what they consider the best material. Of course, Google takes that into account too: popular "editors" (such as people who run popular weblogs, like Cory with boingboing) carry more weight for Google than unpopular ones.

All very interesting and meta-feedback-loop-ish. But I'm not convinced that it solves this aspect of the indexing problem: unless someone like me posts a page that refers to Hitler and Mark's story in the same place, Google will never associate the two.

. . . Does the fact that I've brought up Hitler mean (by vague corollary to Godwin's Law) that this topic is played out?

