Monday, August 11, 2003

Spam Filtering

Paul Graham, for those of you who don't know, is the gentleman who popularized the Bayesian spam filter in his article, A Plan for Spam.

He has recently written a new paper, Filters that Fight Back. In it, he proposes that the next generation of spam filters should, on detecting a spam, follow the links in the body of that spam and download a few copies of the web pages they point to. If enough people were using filters that did this, it would have a number of positive effects. Firstly, spammers' bandwidth usage would skyrocket, likely costing them a lot of money. Secondly, it would probably be enough to crash the spammer's website (or at least slow it down so much that it might as well have crashed), denying people everywhere those extra three inches.
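To make the idea concrete, here's a rough sketch of what the "fight back" step might look like, in Python. The punish_spammer function, its parameters, and the crude URL extraction are my own invention for illustration, not anything from Graham's paper; a real filter would also need politeness limits, whitelists, and some way to avoid hammering forged or innocent links.

    # A minimal sketch of the "fight back" step, assuming some other piece of the
    # filter has already classified the message as spam. URL extraction here is
    # deliberately crude.
    import re
    import urllib.request

    URL_PATTERN = re.compile(r'https?://[^\s"<>]+')

    def punish_spammer(message_body, copies=3, timeout=10):
        """Follow each link in a detected spam and download it a few times."""
        for url in set(URL_PATTERN.findall(message_body)):
            for _ in range(copies):
                try:
                    with urllib.request.urlopen(url, timeout=timeout) as response:
                        response.read()  # consume the page, burning the spammer's bandwidth
                except Exception:
                    break  # server unreachable (or already crashed): stop retrying this URL

The point isn't that any one copy of the filter does much damage; each recipient only fetches a handful of pages. It's that the same spam goes out to thousands of people, and if enough of their filters do this, the load on the spammer's server adds up fast.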

This is an interesting idea. The first thing I thought of after reading it was bandwidth. Depending on whose estimate you read, spam accounts for around 40-60% of all the email on the Internet. That is a huge chunk of bandwidth, and it's starting to choke the servers of some small and medium-sized ISPs. Widespread use of a fighting filter would only make this problem worse.

Yes, eventually this technique would probably make spam unprofitable. If spam did become unprofitable, spammers would start to go out of business and the bandwidth devoted to sending and fighting spam would tail off. From the ISP's point of view, relaying a web page from a server to a customer is also a much lower-overhead operation than relaying an email. So maybe my concern is unjustified. Mr. Graham does respond to this very issue in his FFB FAQ, but to me his response seems a bit flippant.

Bandwidth concerns or not, when such a filter becomes an option, I'll use it.