Not an atoz, but somewhat atozzical nonetheless

Your Humble Blogger happened to pick up a book called A Guidebook to Learning: for a lifelong pursuit of wisdom, by Mortimer Adler, in which he complains about alphabetiasis, that is, the overreliance on alphabetical rather than conceptual organization. There are some interesting things, and it’s possible I will decide to finish the book, but my reaction to the first handful of pages was mostly perplexity. Had this man never heard of hypertext? It’s true that lots of things are still alphabetized, but surely you can’t talk about arrangement and categorization without talking about hyperlinks.

It turns out that the book was published in 1986, so in all probability Mr. Adler had not, in fact, heard of hypertext, although the concept was not new, and on reflection it seems odd that somebody would have written a whole book about categorization in the mid-eighties without talking to somebody who was working on hypertext or something like it. Still, it would be a couple of years before I heard of hypertext, and I assume the same is true of most people. Less than twenty years, then, for it to have upended everything so completely.

For instance, Mr. Adler complains that our great universities print their catalogs in alphabetical order, rather than in an order that more genuinely reflects and guides their readers and their search for education. I’m pretty sure that even at the time, university catalogs were not primarily alphabetical, although it’s possible that one of the prominent levels of hierarchy was. That is, first the division into schools: the Medical School’s courses kept separate from those of the Law School, the Business School, the undergraduate College, the Art School and any other such division. Then, possibly, a division within the larger schools, so that the undergraduate courses would be divided into the Sciences, the Humanities, and the Social Sciences. This division would not be alphabetical, of course. Within, say, the Humanities, though, the departments might well be listed alphabetically, with Art and Art History before Music before Religion, etc, etc. Within the department, though, I believe most universities arranged courses by number, and those numbers are assigned with an eye to the things Mr. Adler is on about, although there are other administrative things that come into play, as well.

Now, of course, although the Universities do print some catalogues as marketing tools, people look at their courses largely on-line, and can search by various things (instructor, time, department, number of credits, price of texts) that may have something to do with what Mr. Adler was on about and may not. We expect our data to be in tables, we expect those tables to be sortable, and we expect to be able to filter or search them. Order is not fixed, so alphabetiasis is not even relevant.

But something did occur to me, and this is connected with this alphabet business and the recent release of the 6th edition Shorter Oxford English Dictionary. The SOED has about half a million definitions. Which seems like a lot. And it takes up two volumes and weighs 5.8 squintillion pounds, or 13 brazillion kilograms. And it’s a cool thing, and all, but here’s my idea: would it be useful for an OUP or a MW or an American Heritage to put out a set of dictionaries that would be separated by frequency of use? Let’s say that the first volume would have, say, twenty-five thousand words. That’s not a lot; it’s actually a bit smaller than a “compact” or “desk” or “pocket” dictionary. On the other hand, it would have 25,000 words. That would be volume one; volume two would have 100,000 words. That’s about half-way between a “pocket” and a “collegiate”. But it would not have any of the words that were in volume one. Volume 3 would be another 100,000 or so, bringing the three-volume set up to “collegiate” level; again, it would consist of words not in the first two books. Volume 4, then would be a big fellow, 250,000 words, bringing you up to the level of the SOED or an Unabridged. And, if you want, there could be a Volume 5, covering all the really obscure stuff that is in the OED but not the SOED.

The point is that if you are looking something up, you can probably guess what volume it’s in. If you are right, and again, I’m assuming you will be most of the time, it will be quicker to look it up, because there will be substantially fewer words; V3 would be half the size of a collegiate, for instance, and V4 half the size of an unabridged. Even if you guessed wrong, it might still be nearly as quick. And, of course, having fewer words, the typeface of the early volumes could be bigger, which would be nice since really, you’re going to be using V2 most of the time anyway, aren’t you?

Actually, I’m curious whether that’s true. I have no idea. OUP has the Corpus, which means that it would be trivially easy, it seems to me, to sort words by frequency of (written) use; they don’t display that information on-line, so I can’t test it.
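For the curious, here is a minimal sketch of what that sorting step might look like, assuming nothing more than a list of tokens from some corpus. The function name split_into_volumes, the toy corpus, and the volume sizes are all made up for illustration; none of this reflects how OUP actually stores or queries its Corpus.

```python
from collections import Counter

def split_into_volumes(tokens, sizes):
    """Rank distinct words by frequency, then cut the ranking into volumes.

    sizes gives the number of words per volume, most frequent first.
    Any words left over go into one final extra volume.
    Each volume is alphabetized internally, like an ordinary dictionary.
    """
    counts = Counter(tokens)
    # Most frequent first; ties broken alphabetically so the result is stable.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    volumes, start = [], 0
    for size in sizes:
        volumes.append(sorted(ranked[start:start + size]))
        start += size
    if start < len(ranked):
        volumes.append(sorted(ranked[start:]))
    return volumes

# Toy corpus (hypothetical); a real one would have millions of tokens.
corpus = "the cat and the dog and the distaff".split()
volumes = split_into_volumes(corpus, sizes=[2, 2])
for number, words in enumerate(volumes, start=1):
    print(f"Volume {number}: {words}")
```

With real sizes the call would be something like sizes=[25_000, 100_000, 100_000, 250_000], with the leftovers standing in for Volume 5.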

Is this a good idea? Probably not. For one thing, I hardly ever look things up in a print dictionary anymore, anyway, and that’s likely true for a lot of other people. Still, there’s a sense in which I’d like to own the SOED, but I would not want to use it to look up how to spell pabulum. I grew up in a house with a small paperback dictionary, a collegiate dictionary and an unabridged dictionary. All the words from the first were in the second; all the words from the first and second were in the third. Maybe that is the easiest way to do it, but I wonder.

Tolerabimus quod tolerare debemus,

2 thoughts on “Not an atoz, but somewhat atozzical nonetheless”

  1. Chaos

    There’s something to be said for applying standard caching algorithms to human text lookups. (“Okay, and then, when you do go to Volume 3 for ‘marzipan’, you cut out the ‘mar’ page and put it on your coffee table in case you need to look up any other words on that page soon.”) But many humans like consistency, and are probably inclined to be annoyed by cache misses.

  2. Jed

    I use MW11 online. Sadly, not all the words I need are in MW11, so sometimes I have to resort to MW3 (unabridged). Sadly, MW3 hasn’t been updated as recently as MW11, so MW11 is a more authoritative source for current usage.

    So my lookup algorithm goes: (1) Search for the word in MW11. (I have a bookmarklet to make this easy.) In about 3 seconds with a fast connection (or much longer with a slow connection), I get a response page. If I get a page saying the word isn’t in the dictionary, then: (2) Search for the word in MW3. (Using another bookmarklet, because MW11 doesn’t have a button you can click to look up the same unknown word in another dictionary. I keep meaning to file a bug.)
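    The two steps above amount to a simple fallback chain. A rough sketch, assuming each dictionary is just an in-memory mapping from word to definition; the lookup function and the two tiny dicts are hypothetical stand-ins, not anything Merriam-Webster actually exposes:

```python
def lookup(word, sources):
    """Try each (name, entries) source in order and return the first hit.

    Returns a (source_name, definition) pair, or None if no source has the word.
    """
    for name, entries in sources:
        if word in entries:
            return name, entries[word]
    return None

# Toy stand-ins for the two dictionaries (made-up contents).
mw11 = {"rock": "a stone"}
mw3 = {"rock": "a stone", "venery": "the art of hunting"}
sources = [("MW11", mw11), ("MW3", mw3)]

print(lookup("rock", sources))     # found in the first source
print(lookup("venery", sources))   # falls through to the second source
print(lookup("atozzical", sources))  # in neither, so None
```

    The order of the list encodes the preference: the more current source is tried first, and the bigger one is consulted only on a miss.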

    My point here is that looking up the same word in two different places is annoying, even though it hardly takes any extra time at all. I want to go to one authoritative source and have it give me the info I want.

    And I suspect I would feel the same about a multi-volume dictionary that’s organized in such a way that I can’t be totally certain which volume to try first. (One nice thing about separating into volumes by alphabet: you can be sure which volume to try first.)

    Another complication is that there are a lot of words–many but not all of them homomorphs–that have multiple meanings with different degrees of frequency. For example, you might think the noun “rock” would appear in volume 1, and you’d be right if you wanted to know that a rock is a stone. But if you wanted to know that “rock” (with a different etymology but the same spelling and pronunciation) means “the wool or flax on a distaff,” would you have to look in volume 2? Conceptually and paradigmatically, it seems like that meaning belongs in volume 2; but practically, it would be hugely annoying to have multiple identical-looking words spread out across multiple volumes. And similarly, what about a common noun with an uncommon verb form?

    You could certainly fix this by just declaring that all meanings and homomorphs of a given word appear together in the earliest possible volume, but that does add to the ambiguity/uncertainty of which volume to look in first.

    And, really, I don’t think I would know what volume to look in for a lot of words that wouldn’t be in the first volume. I can guess that “venery” wouldn’t show up in the first volume, but would it be in the second or third? No idea.

    Finally, I think that the lookup-time problem (in a paper dictionary) may not need this much solution. If I use a binary-search approach to looking up a word, then doubling the size of the dictionary means only one extra step of the lookup process; multiplying the size by 20 means only 4-5 extra steps. Most people don’t do exact binary search, but I suspect almost nobody does linear search. I imagine most people turn to (or near) the first letter of the word and start flipping back and forth through pages until they get close. So I suspect that if you ask people to look up a word in a 500,000-entry dictionary, it would take them 2-3 times as long as looking up a word in a 25k-entry dictionary, rather than 20 times as long.
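    The step counts can be checked with a base-2 logarithm. This is only a sketch of the idealized case, assuming a perfect halving search over entries; the function names are made up, and real page-flipping is sloppier than this, which is why the guess above is 2-3 times rather than the smaller ratio that exact binary search would give.

```python
import math

def steps(entries):
    """Halvings needed to narrow `entries` candidates down to one."""
    return math.ceil(math.log2(entries))

def extra_steps(growth_factor):
    """Additional halvings when the dictionary grows by `growth_factor`."""
    return math.log2(growth_factor)

small, large = 25_000, 500_000
print(steps(small), steps(large))        # idealized step counts
print(extra_steps(2))                    # doubling the size
print(round(extra_steps(large / small), 2))  # 20 times the size
```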

    I guess the flaw in my argument is physical size and unwieldiness, though. (Which volume does “unwieldiness” appear in?) Luckily for me, I’m out of time.

    Off to eat breakfast.

