Another gender guesser

Geoff's Gender Guesser attempts to determine whether a given name is more likely a male name or a female name, by analyzing Google results.

(And I'm tickled to see a note on his comments page from the Google team saying that they're increasing the daily query limit for his software: "We wouldn't want anyone who's trying to guess their gender to run out of queries.")

It uses a simple algorithm: it compares the number of results for "Mr. name" to the number of combined results for "Mrs. name" or "Ms. name" or "Miss name".
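Here's a minimal sketch of that comparison in Python, as I understand it from the description above. The `hit_count` function is a hypothetical stand-in for "ask a search engine how many results this phrase gets" (it isn't a real Google API), and the counts in the example are made up purely for illustration.

```python
def guess_gender(name, hit_count):
    """Guess a name's gender by comparing title-prefixed result counts.

    hit_count is a hypothetical callable: it takes a phrase and returns
    the number of search results for it. It is an assumption of this
    sketch, not part of any real search API.
    """
    male = hit_count(f"Mr. {name}")
    # The female side is the combined count for all three female titles.
    female = sum(hit_count(f"{title} {name}") for title in ("Mrs.", "Ms.", "Miss"))
    if male == female:
        return "unknown"
    return "male" if male > female else "female"


# Made-up counts, only to show how the comparison behaves.
fake_counts = {
    "Mr. Coffee": 900,
    "Mrs. Coffee": 20,
    "Ms. Coffee": 10,
    "Miss Coffee": 5,
}

print(guess_gender("Coffee", lambda phrase: fake_counts.get(phrase, 0)))
```

Passing the lookup in as a function keeps the comparison logic separate from however the counts are actually fetched.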

This approach is pretty good at guessing the gender of common strongly gendered American names, but it does produce some odd results. For example, the system considers Coffee to be a fairly common boy's name—Mr. Coffee, donchaknow.

And of course this algorithm results in a lot of last names being counted. It doesn't know that James is pretty much an exclusively male name, because if you search for "Mrs. James," the results include "Mrs. James Devereux" (a woman married to a man named James Devereux) and "Mrs. James' birth" (where James is her last name). Searching for "Ms. James" is unlikely to end up with a husband's name, but MS can also stand for "Master of Science" or "manuscript" or just someone's initials.

But for such a simple algorithm, it does seem to give reasonable results a fair bit of the time. Anyone have any thoughts on other purely textual gender markers (in English or other languages) that one could use in a very efficient Google-based gender guesser? Note that gendered pronouns aren't necessarily any use, 'cause they generally won't appear next to a name.

I went to the Guesser's most masculine names page to find out which names are strongly male. It turns out that Yashwant is, by this algorithm, the most masculine name on the web, well over 1000 times as likely to be a male name as to be a female name. Also on the list: Splodge, at #4. (For those unfamiliar with the word, it's a mostly British term meaning "splotch.")

I find it interesting that on the "most masculine names" list, none of the top 50 names are common Western male names (while quite a few of the top 50 "most feminine names" are relatively common Western female names). I also find it interesting that the top ten most common names (regardless of gender) are all common Western male names; and that many common Western male names have a relatively low gender factor, meaning that many of them appear with a female title nearly as often (within an order of magnitude, say) as with a male title.
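To make "within an order of magnitude" concrete, here's a rough sketch. I don't know exactly how the Guesser computes its gender factor, so this assumes it's simply the ratio of male-title results to female-title results; the function names and the counts are my own inventions for illustration.

```python
import math


def male_to_female_ratio(male_hits, female_hits):
    """Assumed gender factor: male-title hits divided by female-title hits.

    This definition is a guess for illustration, not the Guesser's
    documented formula.
    """
    if female_hits == 0:
        return math.inf
    return male_hits / female_hits


def within_order_of_magnitude(ratio):
    """True when neither count is ten or more times the other."""
    return 0.1 < ratio < 10


# Made-up counts: a name with a low factor and one with a very high factor.
low = male_to_female_ratio(5000, 1000)
high = male_to_female_ratio(12000, 10)
print(low, within_order_of_magnitude(low))
print(high, within_order_of_magnitude(high))
```

Under this assumed definition, the first made-up name counts as "low gender factor" and the second, at more than 1000 to 1, doesn't.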
