{"id":18625,"date":"2023-09-01T08:22:18","date_gmt":"2023-09-01T15:22:18","guid":{"rendered":"https:\/\/www.kith.org\/words\/?p=18625"},"modified":"2023-09-01T08:22:18","modified_gmt":"2023-09-01T15:22:18","slug":"stylometry-authorship-identification-and-forensic-linguistics","status":"publish","type":"post","link":"https:\/\/www.kith.org\/words\/2023\/09\/01\/stylometry-authorship-identification-and-forensic-linguistics\/","title":{"rendered":"Stylometry, authorship identification, and forensic linguistics"},"content":{"rendered":"\r\n<p>I recently encountered a 1998 <a href=\"http:\/\/linguafranca.mirror.theinfo.org\/9807\/crain.html\">article about Donald Foster<\/a>, the \u201cforensic linguist\u201d who was known, in the 1990s, for using computer techniques to identify the authors of various texts. Among other things, he made a controversial claim that Shakespeare had written a particular poem; he correctly identified Joe Klein as the author of <cite>Primary Colors<\/cite>; and he correctly identified Ted Kaczynski as the author of the Unabomber manifesto.<\/p>\r\n<p>This article, by Caleb Crain, appears to have been originally published in <cite>Lingua Franca<\/cite> in 1998. I also found a <a href=\"https:\/\/steamthing.com\/files\/1998-7.LF.Crain.Bards-fingerprints.pdf\">PDF copy<\/a> of the original article; I suspect that both copies were made without the author\u2019s permission, but I don\u2019t know for sure.<\/p>\r\n<p>At any rate, it turns out that Foster\u2019s later career didn\u2019t go so well.<\/p>\r\n<p>The Crain article quotes Foster as saying, \u201cAll I need to do is get one attribution wrong ever, and it will discredit me not just as an expert witness in civil and criminal suits but also in the academy.\u201d<\/p>\r\n<p>But the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Donald_Wayne_Foster\">Wikipedia article about Foster<\/a> says that in 2002, he conceded that Shakespeare didn\u2019t write that poem after all. 
At that point, Foster wrote: \u201cNo one who cannot rejoice in the discovery of his own mistakes deserves to be called a scholar.\u201d<\/p>\r\n<p>Wikipedia also notes that Foster made conflicting statements about the JonBen\u00e9t Ramsey case, and in 2003 he misidentified the person responsible for the 2001 anthrax attacks. He was sued over that last mistake, and Wikipedia says he hasn\u2019t been active in criminal investigations since that lawsuit was settled in 2007.<\/p>\r\n<p>I find the ideas of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Stylometry\">stylometry<\/a> and <a href=\"https:\/\/en.wikipedia.org\/wiki\/Stylistics\">stylistics<\/a> intriguing\u2014the study of style, especially attempts to determine authorship based on such study. Here are a few links that I collected in 2014 but never got around to turning into a post:<\/p>\r\n<ul>\r\n  <li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Adversarial_stylometry\">Adversarial stylometry<\/a>, in which an author attempts to obfuscate their authorship. 
See especially the 2012 article \u201c<a href=\"https:\/\/www1.icsi.berkeley.edu\/~sadia\/papers\/adversarial_stylometry.pdf\">Adversarial Stylometry: Circumventing Authorship Recognition to Preserve Privacy and Anonymity<\/a>,\u201d by Michael Brennan, Sadia Afroz, and Rachel Greenstadt.<\/li>\r\n  <li><a href=\"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=5315\">Rowling and \"Galbraith\": an authorial analysis<\/a>, by Ben Zimmer (2013).<\/li>\r\n  <li><a href=\"https:\/\/programminghistorian.org\/en\/lessons\/introduction-to-stylometry-with-python\">Introduction to stylometry with Python<\/a>; even if you don\u2019t look at the code, this gives some info about a couple of specific approaches to attempting to identify authorship.<\/li>\r\n<\/ul>\r\n\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":5,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[134,86],"tags":[],"class_list":["post-18625","post","type-post","status-publish","format-standard","hentry","category-style-and-stylometry","category-statistics"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.kith.org\/words\/wp-json\/wp\/v2\/posts\/18625","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.kith.org\/words\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kith.org\/words\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kith.org\/words\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kith.org\/words\/wp-json\/wp\/v2\/comments?post=18625"}],"version-history":[{"count":3,"href":"https:\/\/www.kith.org\/words\/wp-json\/wp\/v2\/posts\/18625\/revisions"}],"predecessor-version":[{"id":18630,"href":"https:\/\/www.kith.org\/words\/wp-json\/wp\/v2\/posts\/18625\/revisions\/18630"}],"wp:attachment":[{"href":"https:\/\/www.kith.org\/words\/wp-json\/wp\/v2\/media?parent
=18625"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.kith.org\/words\/wp-json\/wp\/v2\/categories?post=18625"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kith.org\/words\/wp-json\/wp\/v2\/tags?post=18625"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}