Author gender algorithm

I've developed what I think is a fairly accurate (but entirely useless to most of the world) algorithm for a computer to automatically guess the gender of an author who submits to SH, but only if the author has a very strongly gendered first name.

It goes like this:

Look up the author's first name in our database. If at least five authors with that name have submitted to us, and all of them have the same gender ID, and that gender ID is either "male" or "female," then assign that gender ID to this new author.

This works really well if the first name is, say, "Karen" or "Joe." It doesn't work at all if the first name is, say, "Pat" or "A. J." or "Jed" (that last one because fewer than five authors with that name have submitted to us).
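For the curious, here's a minimal sketch of that rule in Python. It isn't the actual SH database code. The function name `guess_gender`, the list of (first name, gender ID) pairs standing in for the database, the case-insensitive name matching, and the sample data are all assumptions made for illustration.

```python
MIN_AUTHORS = 5                    # threshold from the rule above
ASSIGNABLE = {"male", "female"}    # the only gender IDs the rule will assign


def guess_gender(first_name, known_authors):
    """Guess a gender ID for a new author from previously entered authors.

    known_authors is an iterable of (first_name, gender_id) pairs; this
    stands in for a database lookup (hypothetical schema).
    """
    key = first_name.strip().lower()  # assumption: match names case-insensitively
    genders = [g for name, g in known_authors if name.strip().lower() == key]

    # Need at least five prior authors with this first name.
    if len(genders) < MIN_AUTHORS:
        return "unknown"

    # All of them must share a single gender ID, and it must be male or female.
    distinct = set(genders)
    if len(distinct) == 1:
        (only,) = distinct
        if only in ASSIGNABLE:
            return only
    return "unknown"


# Made-up sample data, not real submission records.
sample = (
    [("Karen", "female")] * 6
    + [("Pat", "male")] * 3
    + [("Pat", "female")] * 3
    + [("Jed", "male")] * 2
)

for name in ("Karen", "Pat", "Jed"):
    print(name, guess_gender(name, sample))
```

With that sample data, "Karen" gets a gender, "Pat" fails the all-the-same-gender test, and "Jed" fails the at-least-five test, so both of those stay "unknown."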

This algorithm obviously isn't useful to humans; humans fluent in a given language/culture already do something vaguely similar. But for a computer with no culturally instilled ideas about a name's gender, it's pretty useful.

The only reason it was worth the trouble to implement this is that until now, labeling an author as having a particular gender took an extra step when I was entering info about the author's first story. If the author's first name is strongly gendered (and quite a lot of them are, in English-speaking countries anyway), then I assigned the obvious gender; if not, I usually left it as "unknown." But now there's enough info in the database that the system can do that automatically for me; in a sense, the database has "learned" which names are strongly gendered, by being given a couple thousand examples.

So it's not nearly as interesting or cool as an algorithm that doesn't rely on being given a carefully hand-created known-accurate data set. But it's much more accurate (in its very limited domain) than such algorithms as the Gender Genie or Geoff's Gender Guesser (links are to my entries about those).

One area where my algorithm falls short is that it only knows what I've entered by hand, and I don't know anything about name gender in most languages. So author names that aren't Western European or Japanese are unlikely to have a gender in the database, even if someone from the author's culture would immediately know the author's gender based on name alone. (Well, and I do know the genders of some very common and well-known names in Arabic and such.) I could rely on baby-name lists on the web, but I'm always a little dubious about those. So if any of you are experts in gendered-naming conventions in languages other than the ones I listed, lemme know.

2 Responses to “Author gender algorithm”

  1. jacob

    This isn’t really relevant to this purpose, but here’s another interesting name gender guesser. It’s based on what the name sounds like, which has the added advantage that it works pretty well on made-up names like “Frodo” or “Galadriel”.

  2. Shmuel

    I could help with Hebrew and Yiddish names, but I doubt you’d have much call for that service.

    (Rule of thumb: most women’s names in Hebrew end with an “uh” sound, usually transliterated as either “a” or “ah”; most men’s names don’t. There are exceptions — Rochel and Chananya spring to mind — but if you have nothing else to go by, this’ll do it more often than not.)

