spam filtering

To: http://www.gnosis.cx/~mertz
Subject: spam filtering
From: http://dummy.us.eu.org/robert (Robert)
Date: Tue, 1 Oct 2002 11:12:02 -0400

Your article might also have mentioned ifile:

http://www.ai.mit.edu/~jrennie/ifile

It does Bayesian analysis of mail messages and has been around for years.
I use it for filtering spam.  Also, although Razor and Pyzor (I haven't used
Pyzor) are slow when run as each message is retrieved, suspected spam can be
checked in the background while definite non-spam is read right away by the
recipient.  (I do this "batching" nightly, so I may not receive a given
e-mail for up to 24 hours.  But that's better than receiving spam.)  I
don't know about Pyzor, but Razor uses "fuzzy matching" (nilsimsa) based on
a moving hash (sort of like rsync and the gdiff format).  Although I haven't
looked at your trigram code, I imagine that it might be a similar idea
(though the fuzzy matching code is probably far more complex than the
trigram stuff).
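
To make the comparison concrete, here is a minimal sketch of what I mean by
trigram-style fuzzy matching.  It is neither Razor's nilsimsa code nor the
code from your article (which I haven't read); the function names are made
up for illustration, and cosine similarity is just one plausible way to
compare the counts.

```python
import math
from collections import Counter


def trigrams(text):
    """Count overlapping 3-character windows (hypothetical helper).

    Case is folded and runs of whitespace are collapsed first, so trivial
    reformatting does not change the counts.
    """
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))


def similarity(a, b):
    """Cosine similarity of two trigram count vectors, between 0.0 and 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    dot = sum(count * tb[gram] for gram, count in ta.items())
    norm_a = math.sqrt(sum(c * c for c in ta.values()))
    norm_b = math.sqrt(sum(c * c for c in tb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a text shorter than 3 characters has no trigrams
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    known_spam = "Buy cheap pills now, limited offer!!!"
    variant = "Buy cheap pi11s now, limited 0ffer!!"
    ham = "Meeting moved to Thursday afternoon"
    print(similarity(known_spam, variant))
    print(similarity(known_spam, ham))
```

The point is only that a lightly mangled copy of a known spam still shares
most of its trigrams with the original, while unrelated mail shares few.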

Regardless, a very interesting article!  I didn't know about Pyzor, but when
the next version of Debian comes out, I'll probably use it.
