{"id":1317,"date":"2003-07-23T08:43:47","date_gmt":"2003-07-23T15:43:47","guid":{"rendered":"http:\/\/www.kith.org\/journals\/jed\/2003\/07\/23\/1317.html"},"modified":"2003-07-23T08:43:47","modified_gmt":"2003-07-23T15:43:47","slug":"spam-filter-effectiveness","status":"publish","type":"post","link":"https:\/\/www.kith.org\/jed\/2003\/07\/23\/spam-filter-effectiveness\/","title":{"rendered":"Spam filter effectiveness"},"content":{"rendered":"\n<p>Saved up my incoming spam for a couple days, a couple days ago, and then looked it over to see how good a job Pair's spam filter was doing.<\/p>\n<p>Over the course of 48 hours, I received 375 pieces of spam total.  (Which is pretty remarkable given that less than a year ago I was still getting only about 5 pieces of spam a day.)<\/p>\n<p>Pair's filter, based on SpamAssassin, correctly marked 347 of those as spam.<\/p>\n<p>My own filters caught an additional 15 pieces of spam&#8212;mostly by the \"everything addressed to Alex is spam\" rule, which I implemented a few months back after months of verifying that it was true by hand.<\/p>\n<p>Pair's filter incorrectly marked 1 non-spam message as spam, but it was a special case: a comment from my journal which quoted my previous posting about the spam filter, using keywords that the spam filter identified as spam.  Which means that non-spam mail about spam is likely to be falsely identified as spam, but I don't receive much of that.<\/p>\n<p>In addition, during that period I received 13 pieces of spam that neither Pair's filters nor mine identified as spam.  Fortunately, Pair's filters mark not-quite-spam with abbreviated information about the criteria that the message did match; if, as Irilyth suggested the other day, I can customize the values that Pair assigns to each of those criteria, I should be able to get most of those items over the threshold.  
Some of them, though (mostly the plain-text spam with a few links in it), got lower spam scores than some of my regular non-spam mail, so there'll always be a few pieces of spam that Pair isn't going to catch.  Still, even if I just leave the filter exactly as it is, it appears to be correctly filtering over 92% of all spam (347 of 375), and the one false positive was an anomaly.  (Though I did receive a couple of false positives on more ordinary messages over the previous few days.)<\/p>\n<p>I meant to, but neglected to, look through my non-spam mail during that period to see how much of it came close to triggering Pair's spam filter.  I might be able to just lower the counts-as-spam threshold to 3 instead of 4, but that might increase the false-positive rate.  Not sure.  Further experimentation is clearly called for.<\/p>\n<p>Oh, and in case anyone cares, the highest-scoring piece of spam I got during that period received a score of 37.1 (where 4 is enough to mark it as spam).  Impressive.<\/p>\n\n","protected":false},"excerpt":{"rendered":"<p>Saved up my incoming spam for a couple days, a couple days ago, and then looked it over to 
see&#8230;<\/p>\n","protected":false},"author":5,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-1317","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.kith.org\/jed\/wp-json\/wp\/v2\/posts\/1317","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.kith.org\/jed\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kith.org\/jed\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kith.org\/jed\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kith.org\/jed\/wp-json\/wp\/v2\/comments?post=1317"}],"version-history":[{"count":0,"href":"https:\/\/www.kith.org\/jed\/wp-json\/wp\/v2\/posts\/1317\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.kith.org\/jed\/wp-json\/wp\/v2\/media?parent=1317"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.kith.org\/jed\/wp-json\/wp\/v2\/categories?post=1317"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kith.org\/jed\/wp-json\/wp\/v2\/tags?post=1317"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}