{"id":18435,"date":"2022-03-19T14:14:58","date_gmt":"2022-03-19T21:14:58","guid":{"rendered":"https:\/\/www.kith.org\/words\/?p=18435"},"modified":"2022-03-19T14:14:58","modified_gmt":"2022-03-19T21:14:58","slug":"word-lists-for-writing-computer-word-games","status":"publish","type":"post","link":"https:\/\/www.kith.org\/words\/2022\/03\/19\/word-lists-for-writing-computer-word-games\/","title":{"rendered":"Word lists for writing computer word games"},"content":{"rendered":"\r\n<p>This started out to be a post about writing a computer version of a word game, but ended up focusing mostly on computerized word lists.<\/p>\r\n<hr width=\"25%\" \/>\r\n<p>Wordle got me thinking about a vaguely related (but not the same) word game called Fives that I learned as a kid. I wrote about <a href=\"https:\/\/www.kith.org\/words\/1997\/06\/01\/v-five\/\">Fives<\/a> in a 1997 Words & Stuff post, in which I also mentioned that I had always wanted to turn it into a computer game.<\/p>\r\n<p>And now I have! But so far it\u2019s just a text-only game to run on the command line. The core of it turned out to be easy to write; I wrote the whole game in about a hundred lines of Perl code, without trying to be particularly brief\/compact. I wrote it in Perl because I figured it would be quick and easy to write that way; but I neglected to take into account the fact that I haven\u2019t written much Perl code in a long time, so I ended up having to look up the Perl syntax for all sorts of basic things.<\/p>\r\n<p>Anyway, it works now, and I\u2019ll probably put it on GitHub sometime soon. But I don\u2019t really expect people to download and run the command-line version. So the next phase of the project is to translate it into JavaScript and make it display nicely in a web browser. 
I don\u2019t expect that to be terribly difficult, but I figured I might as well get the logic working (with the command-line Perl version) before dealing with the layout\/UI issues that the web version will require.<\/p>\r\n<hr width=\"25%\" \/>\r\n<p>One issue that I\u2019ll have to deal with before releasing it publicly is finding a freely licensed word list.<\/p>\r\n<p>It turns out that it\u2019s a good idea for a word game like Wordle to include two lists of five-letter words (or whatever other kind of words match the game\u2019s criteria):<\/p>\r\n<ul>\r\n  <li>One list of as many legitimate five-letter words as possible. When a player enters a guess, the program checks their guess against this list to see whether the guess is a valid five-letter word. If they type <i>GGGGG<\/i> as a guess, then the program can determine that that\u2019s not a valid word, at least not for the purposes of the game.<\/li>\r\n  <li>Another, much shorter, list of <em>common<\/em> five-letter words. When the program needs to pick a word to be the answer for the current game, it picks from this list. (Every word on this list, of course, has to be on the other list as well.)<\/li>\r\n<\/ul>\r\n<p>So, for example, if a player wants to enter the word <i>BAVIN<\/i> as a guess, it\u2019s nice to let them do so; that\u2019s a legitimate word, listed in some dictionaries. But it\u2019s a pretty obscure word, so you probably don\u2019t want the program to pick it as the answer; players who don\u2019t know the word would find it frustratingly hard to figure out.<\/p>\r\n<p>But in order to include those two lists of words, you need to acquire them. One way to do that would be to license a word list from a company that provides them; some word game apps do this. Another way would be to use an existing freely usable list.<\/p>\r\n<p>Many UNIX-derived operating systems include a freely usable word list, as a text file. 
For example, in macOS, the <code>\/usr\/share\/dict<\/code> directory includes a couple of word lists. The <code>web2<\/code> file in that directory contains about 235,000 words from <cite>Webster\u2019s Second International<\/cite>, which was published in 1934 and whose copyright has since lapsed, so that list of words is in the public domain. And it\u2019s easy enough to extract all five-letter words from that list.<\/p>\r\n<p>(Or all five-letter words with no repeating letters, for use in Fives.)<\/p>\r\n<p>But there are three problems with using that list, for my purposes:<\/p>\r\n<ul>\r\n  <li>It doesn\u2019t include inflected forms of words. So, for example, it doesn\u2019t include <i>BAKED<\/i> or <i>BAKES<\/i>.<\/li>\r\n  <li>It doesn\u2019t include words coined since 1934.<\/li>\r\n  <li>It doesn\u2019t indicate how common each word is. So it\u2019s useful for the full guess-validation list, but I would have to extract common words manually to create the possible-answers list.<\/li>\r\n<\/ul>\r\n<p>For the inflections issue, I came up with a kludgy workaround: I extracted all the <em>four<\/em>-letter words and added <i>S<\/i> to the end of each. (I could similarly add <i>D<\/i> to the ends of four-letter words ending in <i>E<\/i>.) This kludge works surprisingly well, creating lots of legitimate five-letter plurals; but it means that my full list now also includes lots of strings that aren\u2019t words, such as <i>INKYS<\/i>.<\/p>\r\n<p>For the lack-of-recent-words issue, I suspect that there aren\u2019t all that many five-letter words with no repeating letters coined in the past 90 years. But I also suspect that modern players would find it frustrating to guess such a word and be told that it\u2019s not valid.<\/p>\r\n<p>For the commonness issue, manually pulling out the common words isn\u2019t that hard; I started to do it, and it went pretty quickly. 
But that approach requires making a lot of decisions about what counts as common.<\/p>\r\n<p>So I went looking for other options.<\/p>\r\n<p>So far, the most promising-looking option I\u2019ve found is <a href=\"http:\/\/wordlist.aspell.net\/\">SCOWL<\/a>, though I need to look into it a bit more. It also has the advantage of scoring words by how common they are. <\/p>\r\n<p>Even if I go with SCOWL\u2019s list, or something similar, I\u2019ll still need to manually look through the possible-answers list before I publish the game. For example, I don\u2019t want the game to pick various common insults as answers. But I think that using the SCOWL list as a starting point will make various things easier.<\/p>\r\n\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":5,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[41,81],"tags":[],"class_list":["post-18435","post","type-post","status-publish","format-standard","hentry","category-games","category-software"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.kith.org\/words\/wp-json\/wp\/v2\/posts\/18435","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.kith.org\/words\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kith.org\/words\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kith.org\/words\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kith.org\/words\/wp-json\/wp\/v2\/comments?post=18435"}],"version-history":[{"count":4,"href":"https:\/\/www.kith.org\/words\/wp-json\/wp\/v2\/posts\/18435\/revisions"}],"predecessor-version":[{"id":18439,"href":"https:\/\/www.kith.org\/words\/wp-json\/wp\/v2\/posts\/18435\/revisions\/18439"}],"wp:attachment":[{"href":"https:\/\/www.kith.org\/words\/wp-json\/wp\/v2\/media?parent=18435"}],"wp:term":[{"taxonomy":"category","embeddable":true,"h
ref":"https:\/\/www.kith.org\/words\/wp-json\/wp\/v2\/categories?post=18435"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kith.org\/words\/wp-json\/wp\/v2\/tags?post=18435"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}