Your article might also have mentioned ifile (http://www.ai.mit.edu/~jrennie/ifile). It does Bayesian analysis of mail messages and has been around for years; I use it to filter spam. Also, although Razor and Pyzor (I haven't used Pyzor) are slow when messages are checked as they are retrieved, suspected spam can be filtered in the background while definite non-spam goes straight to the recipient. (I do this "batching" nightly, so a message may take up to 24 hours to reach me. But that's better than receiving spam.) I don't know about Pyzor, but Razor uses "fuzzy matching" (nilsimsa) based on a moving hash (sort of like rsync and the gdiff format). Although I haven't looked at your trigram code, I imagine it might be a similar idea (though the fuzzy matching code is probably far more complex than the trigram stuff). Regardless, a very interesting article! I didn't know about Pyzor, but when the next version of Debian comes out, I'll probably use it.
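
To make the sliding-window idea concrete, here is a rough Python sketch of fuzzy matching with character trigrams. It is only an illustration: it is not nilsimsa, not Razor's code, and not the article's trigram code. The function names `trigrams` and `similarity` are made up for this example, and comparing the two trigram sets by their overlap (Jaccard similarity) is an assumption chosen for simplicity.

```python
def trigrams(text):
    """Return the set of overlapping 3-character windows in text.

    Case is folded and runs of whitespace are collapsed first, so
    trivial reformatting of a message does not change its trigrams.
    """
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + 3] for i in range(len(normalized) - 2)}


def similarity(a, b):
    """Jaccard overlap of the two trigram sets, from 0.0 to 1.0.

    This is a hypothetical scoring choice for the sketch, not the
    measure any of the tools mentioned above actually uses.
    """
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 1.0  # two texts with no trigrams count as identical
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    spam = "Buy cheap pills now!!! Limited offer"
    variant = "Buy  CHEAP pills now! Limited offer"
    ham = "The meeting has moved to Thursday afternoon"
    print(similarity(spam, variant))  # close to 1: near-duplicate
    print(similarity(spam, ham))      # close to 0: unrelated text
```

A small edit to a message changes only the few trigrams that overlap the edit, so near-duplicates still score high, whereas an exact checksum of the whole message would no longer match at all.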