Wanted: Author-name pronunciation database


Dear Lazyweb,

For the past few years, I've been thinking that it would be cool to have an online database of pronunciations of sf author names.

Information about author name pronunciation spreads slowly and inaccurately through the sf community and beyond. For example, I've lost count of the number of people I've heard mispronounce Neil Gaiman's name, and I've heard multiple conflicting statements about how to pronounce Vernor Vinge's and Joan Vinge's last names.

It might seem obvious that Wikipedia would be the right place for such information, given that they already include lots of pronunciations. But it's hard to find verifiable published sources of information about name pronunciations. I suspect that an awful lot of Wikipedia pronunciations are added by people who think they know the pronunciation, without any source at all.

So I wanted to build the definitive information source for sf author name pronunciation. And it seemed clear to me that the best way to do that would be to record each author saying their name aloud, and post those recordings online, along with IPA transcriptions.

But doing the recording myself would take a prohibitive amount of work and time, and would likely mean very limited coverage, given that I only go to a couple of cons a year, almost always in the US.

So the next obvious step was to throw the system open to contributions from others. Authors could even record themselves and email me the pronunciations, or upload them directly to the website.

And that's where it's foundered every time I've thought about this. Because once I'm allowing contributions from other people, how do I verify them? How do I know that the recording is actually of the author in question, and that the author is okay with our using it?

At around that point in the thought process, I give up and go think about something else.

But I'm always a little sad about it, because I do think it would be a useful resource.

So I'm hereby giving up on implementing this myself, but tossing the idea out to the crowd. Has anyone done something like this? Would anyone like to take the project on? Does anyone have ideas about how to verify? Does anyone have ideas about better ways to do the whole thing?

It occurs to me that once such a system was in place, it could be usefully expanded in various ways. For example, sf contains many many character and place names with non-obvious pronunciations. (For instance, I still think of "Smaug" as being pronounced like "smog," and have to correct myself whenever the name comes up.) If I had a site containing authors' pronunciations of their own names, it might as well include authors' pronunciations of some of their character and place names as well.

And, of course, it could be expanded to non-sf authors; plenty of those have names with non-obvious pronunciations. But that becomes a much much bigger project; I would be inclined to limit it to sf authors at first.

(Where by "sf," as usual, I mean "speculative fiction," including fantasy and science fiction and related material. Sure, horror too, why not.)


It seems like it would be self-verifying after a certain size. Most SF authors both are web-savvy and come into contact with fans on a fairly regular basis; if an erroneous or malicious pronunciation were posted, either the author would ego-scan themselves and find out or a fan would proudly pronounce the author's name at the next convention, get corrected, and post a rant.

I don't know if I agree that most sf authors are web-savvy. A fair number of them are, but there are a lot of low-tech people in sf.

But regardless, the correction mechanism remains hazy to me. Say two different people post pronunciations; one of them claims to be the author. How does the site maintainer know which is right, without doing a lot of time-consuming research and asking friends of friends and so on?

One answer is to just bail on that question. Take the Wikipedia approach, let anyone edit, and don't worry about the fact that some number of entries will just be wrong some amount of the time. If there's disagreement over an item, then it'll change back and forth over time. If the author posts the correct pronunciation and some overzealous fan comes along and mistakenly "corrects" it, then oh well, it'll get corrected back some day, if the author doesn't get too annoyed and give up.

That's a valid approach, and in the end it may be the only scalable approach. But I'd rather have a system that always provides definitive accurate information.

And that anyone-can-edit approach might mean it would be best handled by Wikipedia after all; they've already got the infrastructure in place. On the one hand, that's good, 'cause it means not having to build another system. On the other hand, it doesn't allow for expanding in the direction of author recordings of their names, nor of recordings of names of characters and places. And less-famous authors might not show up there.

On the other other hand, doing it through Wikipedia does mean it's more likely to get done (a lot of pronunciations are already there), as opposed to my vaporware proposal here, which may never get implemented or maintained.

I'd rather have a system that always provides definitive accurate information.

Wouldn't we all? :) But the only way to have that would be with 100% author buy-in.

If you ask people to post audio recordings of authors, the only false information will be from malicious actors (as opposed to the well-meaning but misinformed, who are legion). Unless there's a concerted troll attack, they will be a minority — people who actually take the time to fake a recording of an author's voice — and (above a certain penetration) should be outnumbered by people who've actually spoken to the authors. At some level of certainty, you could mark entries VERIFIED, and leave the others UNVERIFIED, which is better than nothing.
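A rough sketch of how that VERIFIED/UNVERIFIED marking might work, purely as an illustration. Everything here is an assumption rather than part of any existing system: the `Submission` record, the `verification_status` function, and the thresholds (at least three independent contributors agreeing, making up at least 75% of all contributors) are hypothetical placeholders that a real site would need to tune.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    """One uploaded recording (hypothetical record layout)."""
    author: str       # author whose name is being pronounced
    ipa: str          # IPA transcription of what the recording says
    contributor: str  # account that uploaded the recording


def verification_status(submissions, min_agreeing=3, min_share=0.75):
    """Return (most_common_ipa, status) for one author's submissions.

    The thresholds are illustrative assumptions, not recommendations.
    """
    # Count one vote per contributor so a single account cannot
    # inflate the tally; a contributor's latest upload wins.
    votes = {}
    for sub in submissions:
        votes[sub.contributor] = sub.ipa
    if not votes:
        return None, "UNVERIFIED"

    ipa, agreeing = Counter(votes.values()).most_common(1)[0]
    if agreeing >= min_agreeing and agreeing / len(votes) >= min_share:
        return ipa, "VERIFIED"
    return ipa, "UNVERIFIED"


# Placeholder data: "A" and "B" stand in for two competing transcriptions.
subs = [
    Submission("Example Author", "A", "fan1"),
    Submission("Example Author", "A", "fan2"),
    Submission("Example Author", "A", "fan3"),
    Submission("Example Author", "B", "troll1"),
]
print(verification_status(subs))
```

Counting per contributor rather than per upload is one simple guard against a single troll posting many fakes; it would not stop a concerted attack using many accounts.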

You could ask people to post video, which would be much harder to fake, and search for YouTube videos of authors at readings to verify voice-only posts.

Wikipedia is set up to handle audio files, btw; it would be easy to link to pronunciation WAVs after the IPA (in addition to the IPA auto-pronouncer).

Re definitive/accurate: on a small scale, it's easy: I do all the work, getting personal statements from individual authors, so I know for sure that everything is accurate. But that would mean necessarily limiting the size and scope of the system.

So maybe it's just a tradeoff: definitiveness vs scalability.

If you ask people to post audio recordings of authors, the only false information will be from malicious actors

That's a very good point. I'm not terribly worried about malicious actors, for the reasons you mentioned (though also a good point about troll attacks), so focusing on audio recordings would indeed promote accuracy.

At some level of certainty, you could mark entries VERIFIED, and leave the others UNVERIFIED, which is better than nothing.

Definitely a good point; I hadn't thought about that, and I like it.

Wikipedia is set up to handle audio files, btw

True, but do they have any interest in audio that purports to be spoken by the subject of the article? I think trying to combine the recordings-of-authors aspect of this project with Wikipedia's rules and approaches might result in conflicts; for example, I don't think Wikipedia cares about verifiability of the audio files matching the IPA pronunciations they give. But I'm kind of muddled about this; maybe I should look into Wikipedia's policies about audio.

In the meantime, your suggestions are making me more interested than I previously was in making this a separate non-Wikipedia project. I'm not likely to have time to implement it anytime soon, but it's definitely sounding more practical than I had previously thought it was. So thanks!

Not entirely on-topic, but I wanted to add that just because you have a recording of the person saying his or her own name, doesn't necessarily mean that you have a definitive pronunciation. Our Previous Vice-President had grown up saying his last name to rhyme with weenie, which is how his father said it. Then, when he went into politics, he adopted the pronunciation that rhymes with zany, because it's the more common pronunciation and the one people expect, and a politician is probably better off not starting off a ton of relationships by correcting people. And also, you know, weenie. When he became Veep, he evidently made a half-hearted attempt to switch back, but it didn't take.

But, yes, I think it would be great to have that database, and the associated how-do-you-pronounce-h'k'Tharg one. Although certainly with the latter, I reserve the right to pronounce them how I think of them, so there.


Glad my comments were helpful. For unimpeachability, I think video recordings would be pretty easy to crowd-source; everybody's cell phone takes movies now (even mine!). I guess that'd limit things to authors who go to cons and do readings, mostly, but that seems to be de rigueur these days.

