spam filtering
- To: http://www.gnosis.cx/~mertz
- Subject: spam filtering
- From: http://dummy.us.eu.org/robert (Robert)
- Date: Tue, 1 Oct 2002 11:12:02 -0400
Your article might have also mentioned ifile
http://www.ai.mit.edu/~jrennie/ifile
It does Bayesian analysis of mail messages and has been around for years.
I use it for filtering spam. Also, although Razor & Pyzor (I haven't used
Pyzor) are slow when queried while each message is being retrieved, suspected
spam can be checked in the background while definite non-spam is read by the
recipient right away. (I run this "batching" step nightly, so a suspect e-mail
may take up to 24 hours to reach me. But that's better than receiving spam.)
I don't know about Pyzor, but Razor uses "fuzzy matching" (nilsimsa) based on
a moving hash (sort of like rsync & the gdiff format). Although I haven't
looked at your trigram code, I imagine that it might be a similar idea
(though the fuzzy matching code is probably far more complex than the
trigram stuff).
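
To make the "similar idea" concrete, here is a rough sketch of sliding-window matching with character trigrams. It is not Razor's nilsimsa code, and it is not the article's trigram code, which I haven't read. The function names, the sample messages, and the choice of Jaccard overlap as the score are all my own assumptions for illustration.

```python
def trigrams(text):
    """Return the set of overlapping 3-character windows in text."""
    # Lowercase and collapse whitespace so trivial edits do not change the set.
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + 3] for i in range(len(normalized) - 2)}


def similarity(a, b):
    """Jaccard overlap of two trigram sets, from 0.0 (disjoint) to 1.0 (same)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 1.0  # two texts too short to have trigrams count as identical
    return len(ta & tb) / len(ta | tb)


# Hypothetical sample messages, made up for this sketch.
known_spam = "Buy cheap pills online now, limited offer!!!"
variant = "Buy cheap pi11s online now, limited offer!"
ham = "Your article might have also mentioned ifile."

print(similarity(known_spam, variant))  # lightly altered copy scores high
print(similarity(known_spam, ham))      # unrelated message scores low
```

An exact checksum would treat the altered copy as a completely different message, whereas the window overlap barely changes. That tolerance to small edits is the part I imagine the two approaches share.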
Regardless, a very interesting article! I didn't know about Pyzor, but when
the next version of Debian comes out, I'll probably use it.