
Re: spam filtering



 > From: http://www.gnosis.cx/~mertz (David Mertz, Ph.D.)
 > Date: Wed, 02 Oct 2002 16:00:39 -0400
 >
 > |I tried downloading your code, but it wouldn't work.  What version of
 > |python do you use?
 > 
 > This is funny, did you not get my prior message?

No.  I'm not sure what happened.  I wonder if I should worry...

 > ------------------------------------------------------------------------
 > To: http://dummy.us.eu.org/robert (Robert)
 > Subject: Re: spam filtering
 > Date: Tue, 01 Oct 2002 17:59:02 -0400
 > 
 > |bogofilter is still in its infancy.  It has potential.  (I admit that
 > |ifile ends up filtering on some funky "words" -- it ends up that words
 > |like "<table" and "<font" are indicators of spam for my 5000 spam
 > |messages.)
 > 
 > Actually, '<table' seems like a perfectly good "word".  What I'd rather
 > discount is something like 'MtYKC46lkd' (I took that from your GPG
 > signature; it was surrounded by "+" bytes, which might identify it as a
 > word to some lexers).  The point of a good lexer is to eliminate "words"
 > that will never occur anywhere else, not necessarily to get the things
 > that you would look up in a dictionary.  Well, maybe also to count as
 > single words special strings like URLs.
 > 
 > I don't know what ifile does about this...

It definitely doesn't handle URLs.  But it's smart about paring down
words that repeat excessively and words that occur too rarely.  (At
least, it seems that way.)
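
To make the lexer idea above concrete, here is a rough sketch in Python.
It is not what ifile or David's code does.  The token pattern and the
looks_random() heuristic are assumptions made up for illustration: keep
URLs and tag openers like "<table" as single words, and drop long
mixed-case strings with digits (such as the 'MtYKC46lkd' example), since
they are unlikely to show up in any other message.

```python
import re

# Hypothetical token pattern: URLs first, then HTML tag openers such as
# "<table", then ordinary runs of letters, digits, and apostrophes.
TOKEN_RE = re.compile(r"https?://[^\s<>\"]+|<[A-Za-z]+|[A-Za-z0-9']+")


def looks_random(token):
    """Heuristic (an assumption, not from the thread): a long token that
    mixes upper case, lower case, and digits is unlikely to recur."""
    if token.startswith(("http://", "https://", "<")):
        return False  # URLs and tag openers are always kept
    has_digit = any(c.isdigit() for c in token)
    has_upper = any(c.isupper() for c in token)
    has_lower = any(c.islower() for c in token)
    return len(token) >= 8 and has_digit and has_upper and has_lower


def tokenize(text):
    """Return the tokens worth counting, in order of appearance."""
    return [t for t in TOKEN_RE.findall(text) if not looks_random(t)]


sample = "<table border=1> visit http://www.gnosis.cx/~mertz now +MtYKC46lkd+"
tokens = tokenize(sample)
```

The thresholds here are arbitrary; a real filter would want to tune them
against actual mail.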

 > but since I wasn't quickly
 > able to tell what it did from its web page, it quickly fell off my "need
 > to mention it" list.
 >
 > |That's interesting.  You've piqued my interest.  I tried downloading your
 > |code, but it wouldn't work.  What version of python do you use?
 > 
 > I used 2.2.  That's needed since I used generators.  It would be easy to
 > change the code not to do this though.
 > 
 > What do you mean by "wouldn't work" though?  If the problem was the
 > "yield" keyword, the error should have been awfully straightforward.  If
 > the problem was something else... well, who knows.

% ./spam-test.py
  File "./spam-test.py", line 15
    product *= p
             ^
SyntaxError: invalid syntax

I think I'm running Python 1.5.  (Debian Linux is far behind, unfortunately.)
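
For what it's worth, the error above fits an old interpreter: augmented
assignment ("*=") only arrived in Python 2.0, and generators in 2.2.  A
minimal sketch of how such code can be written without either feature
follows.  The function names are hypothetical and this is not the actual
spam-test.py, just the same two idioms spelled the older way.

```python
def combine(probabilities):
    """Multiply a sequence of probabilities together."""
    product = 1.0
    for p in probabilities:
        # Spelled out instead of "product *= p", which older
        # interpreters reject as a syntax error.
        product = product * p
    return product


def above_cutoff(probabilities, cutoff):
    """Return the probabilities greater than cutoff as a list.

    A generator would "yield" each value; building and returning a
    list gives the same values without needing the yield keyword.
    """
    result = []
    for p in probabilities:
        if p > cutoff:
            result.append(p)
    return result


combined = combine([0.5, 0.5, 0.4])
strong = above_cutoff([0.1, 0.9, 0.99, 0.5], 0.8)
```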

 > As you can see, I didn't exactly try hard to make the code polished or
 > reusable.  I don't think it's bad... but basically I wrote it to test a
 > hypothesis rather than to create a general purpose tool.  Not that it
 > would be hard to write something more complete...
 > 
 > |Finally, just a note: your message was marked as spam by my filter because
 > |there are spaces at the end of your Message-ID: line.
 > 
 > Yuck!  This isn't ifile that did this, is it?

No.  I use SpamBouncer (http://spambouncer.org) and it has what I believe
to be a bug.  I told the author about it.

 > (it doesn't sound like a
 > Bayesian thing).  In any case, that's a really terrible filter
 > criterion.  Not that I'm sure why my ID had spaces... my mailer doesn't
 > normally do that... but maybe I accidentally added something in the
 > header area.  Still, sure sounds RFC2822 friendly to me (and not even
 > something I've ever noticed spammers doing... why would they bother?).
 > 
 > Yours, David...
 > 
 > --
 >     _/_/_/ THIS MESSAGE WAS BROUGHT TO YOU BY: Postmodern Enterprises _/_/_/
 >    _/_/    ~~~~~~~~~~~~~~~~~~~~[http://www.gnosis.cx/~mertz]~~~~~~~~~~~~~~~~~~~~~  _/_/
 >   _/_/  The opinions expressed here must be those of my employer...   _/_/
 >  _/_/_/_/_/_/_/_/_/_/ Surely you don't think that *I* believe them!  _/_/
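
On the Message-ID point: the RFC 2822 grammar allows optional whitespace
or comments on either side of the "<id-left@id-right>" part, so trailing
spaces alone should not make a header invalid.  A tolerant check might
look roughly like the sketch below.  The function name and the sample ID
are made up for illustration, the pattern is a loose approximation of
the grammar, and none of this is how SpamBouncer works.

```python
import re

# Loose approximation of an RFC 2822 msg-id: "<", a local part, "@",
# a domain part, ">".  Whitespace around it is simply ignored.
MSG_ID_RE = re.compile(r"<[^<>\s@]+@[^<>\s@]+>")


def extract_msg_id(header_value):
    """Return the <...@...> portion of a Message-ID header value,
    or None if nothing that looks like a msg-id is present."""
    match = MSG_ID_RE.search(header_value)
    if match is None:
        return None
    return match.group(0)


# Hypothetical header value with trailing spaces.
found = extract_msg_id("<abc123@example.invalid>   ")
```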


