Stylometry, authorship identification, and forensic linguistics

I recently encountered a 1998 article about Donald Foster, the “forensic linguist” who was known, in the 1990s, for using computer techniques to identify the authors of various texts. Among other things, he made a controversial claim that Shakespeare had written a particular poem; he correctly identified Joe Klein as the author of Primary Colors; and he correctly identified Ted Kaczynski as the author of the Unabomber manifesto.

This article, by Caleb Crain, appears to have been originally published in Lingua Franca in 1998. I also found a PDF copy of the original article; I suspect that both copies were made without the author’s permission, but I don’t know for sure.

At any rate, it turns out that Foster’s later career didn’t go so well.

The Crain article quotes Foster as saying, “All I need to do is get one attribution wrong ever, and it will discredit me not just as an expert witness in civil and criminal suits but also in the academy.”

But the Wikipedia article about Foster says that in 2002, he conceded that Shakespeare didn’t write that poem after all. At that point, Foster wrote: “No one who cannot rejoice in the discovery of his own mistakes deserves to be called a scholar.”

Wikipedia also notes that Foster made conflicting statements about the JonBenét Ramsey case, and that in 2003 he misidentified the person responsible for the 2001 anthrax attacks. He was sued over that last mistake, and Wikipedia says he hasn’t been active in criminal investigations since that lawsuit was settled in 2007.

I find stylometry and stylistics intriguing: the study of style, and especially attempts to determine authorship from that study. Here are a few links that I collected in 2014 but never got around to turning into a post:
