
Re: spam filtering

To: http://dummy.us.eu.org/robert (Robert)
Subject: Re: spam filtering
From: http://www.gnosis.cx/~mertz (David Mertz, Ph.D.)
Date: Tue, 01 Oct 2002 12:14:59 -0400
Cc: http://www.gnosis.cx/~mertz
In-Reply-To: <http://www.robert/~HFHv+NZVj3cS0z6ovwQsGQ>
Organization: Gnosis Software
Reply-To: http://www.gnosis.cx/~mertz

http://dummy.us.eu.org/robert (Robert) wrote:
|Your article might have also mentioned ifile
|http://www.ai.mit.edu/~jrennie/ifile

Yeah, maybe.  But it didn't look as easy to set up in the context of my
corpus statistics as the little tools I used.  I guess I could have
listed ifile in the Resources, but there are literally dozens of
different implementations of Bayesian filtering, and I didn't want to
list them all.

I included Bogofilter in Resources because, from its description, it did
some work on improving the lexing algorithm.  But even there, I -tested-
with a quick Python program.  I don't think the results would differ
that much... and my point was to look at the conceptual area, not at the
individual tools.

|don't know about Pyzor, but Razor uses "fuzzy matching" (nilsimsa) using a
|moving hash (sort of like rsync & the gdiff format).  Although I haven't
|looked at your trigram code, I imagine that it might be a similar idea
|(probably the fuzzy matching code is far more complex than the trigram
|stuff, 'though).

Not really the same.  Nilsimsa is interesting, as is the general concept
of a fuzzy hash.  But my use of trigrams has nothing to do with hashing.
The set of trigrams in a message is simply a bunch of data points for my
tool.  Just trigrams instead of words, like I wrote.  There's nothing
particularly complex or interesting about my trigram tool... except the
fact it actually *works* rather well.
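
To illustrate "trigrams instead of words", here is a minimal sketch in
Python.  It is not the tool discussed above: the function names, the
add-one smoothing, and the simple averaging of per-trigram estimates are
all assumptions made for the example.  The point it shows is that each
trigram is used directly as a feature, with no hashing step.

```python
from collections import Counter

def trigrams(text):
    """Return the set of overlapping 3-character substrings of text."""
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}

def train(messages):
    """Count, for each trigram, how many messages contain it."""
    counts = Counter()
    for msg in messages:
        counts.update(trigrams(msg))
    return counts

def spam_score(message, spam_counts, ham_counts, n_spam, n_ham):
    """Average a smoothed per-trigram spam estimate (hypothetical scoring).

    Returns a value between 0 and 1; 0.5 means no evidence either way.
    """
    estimates = []
    for tri in trigrams(message):
        # Add-one smoothing so unseen trigrams do not force 0 or 1.
        p_spam = (spam_counts[tri] + 1) / (n_spam + 2)
        p_ham = (ham_counts[tri] + 1) / (n_ham + 2)
        estimates.append(p_spam / (p_spam + p_ham))
    return sum(estimates) / len(estimates) if estimates else 0.5

# Toy corpora, invented purely for the example.
spam = ["buy cheap pills now"]
ham = ["meeting at noon tomorrow"]
spam_counts, ham_counts = train(spam), train(ham)

print(spam_score("cheap pills", spam_counts, ham_counts, len(spam), len(ham)))
print(spam_score("meeting tomorrow", spam_counts, ham_counts, len(spam), len(ham)))
```

Swapping trigrams() for a word splitter would turn the same sketch into
a word-based filter; only the choice of data points changes.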

--
    _/_/_/ THIS MESSAGE WAS BROUGHT TO YOU BY: Postmodern Enterprises _/_/_/
   _/_/    ~~~~~~~~~~~~~~~~~~~~[http://www.gnosis.cx/~mertz]~~~~~~~~~~~~~~~~~~~~~  _/_/
  _/_/  The opinions expressed here must be those of my employer...   _/_/
 _/_/_/_/_/_/_/_/_/_/ Surely you don't think that *I* believe them!  _/_/
