
spam filtering



Your article might also have mentioned ifile:

http://www.ai.mit.edu/~jrennie/ifile

It does Bayesian analysis of mail messages and has been around for years;
I use it for filtering spam.  Also, although Razor and Pyzor (I haven't
used Pyzor) are slow when run as each message is retrieved, suspected spam
can be checked in the background while definite non-spam goes straight to
the recipient.  (I do this "batching" nightly, so I may not receive a
suspect e-mail for up to 24 hours.  But that's better than receiving
spam.)  I don't know about Pyzor, but Razor uses "fuzzy matching"
(nilsimsa) based on a moving hash (sort of like rsync and the gdiff
format).  Although I haven't looked at your trigram code, I imagine it
is a similar idea (though the fuzzy matching code is probably far more
complex than the trigram code).
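
To make the "moving window" idea concrete, here is a minimal sketch in
Python.  It is not Razor's nilsimsa code and not the article's trigram
code (I haven't read either); the function names trigrams() and
similarity() are made up for illustration, and Jaccard overlap is just one
simple way to compare two sets of trigrams.

```python
def trigrams(text):
    """Collect the character trigrams seen by a 3-character sliding window.

    Hypothetical helper: case and runs of whitespace are normalized first,
    so trivial reformatting of a message does not change the result.
    """
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + 3] for i in range(len(normalized) - 2)}


def similarity(a, b):
    """Jaccard overlap of two messages' trigram sets, from 0.0 to 1.0.

    Assumption: this stands in for a real fuzzy digest such as nilsimsa,
    which is considerably more involved.
    """
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 1.0  # two empty messages are treated as identical
    return len(ta & tb) / len(ta | tb)
```

A near-duplicate of a known spam message (say, with one extra character
appended) scores close to 1.0, while an unrelated message scores much
lower, so a threshold on similarity() could decide which messages go into
the nightly batch.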

Regardless, a very interesting article!  I didn't know about Pyzor, but when
the next version of Debian comes out, I'll probably use it.


