> From: http://www.gnosis.cx/~mertz (David Mertz, Ph.D.)
> Date: Wed, 02 Oct 2002 16:00:39 -0400
>
> |I tried downloading your code, but it wouldn't work. What version of
> |python do you use?
>
> This is funny, did you not get my prior message?

No. I'm not sure what happened. I wonder if I should worry...

> ------------------------------------------------------------------------
> To: http://dummy.us.eu.org/robert (Robert)
> Subject: Re: spam filtering
> Date: Tue, 01 Oct 2002 17:59:02 -0400
>
> |bogofilter is still in its infancy. It has potential. (I admit that
> |ifile ends up filtering on some funky "words" -- it ends up that words
> |like "<table" and "<font" are indicators of spam for my 5000 spam
> |messages
>
> Actually, '<table' seems like a perfectly good "word". What I'd rather
> discount is something like 'MtYKC46lkd' (I took that from your GPG
> signature; it was surrounded by "+" bytes, which might identify it as a
> word to some lexers). The point of a good lexer is to eliminate "words"
> that will never occur anywhere else, not necessarily to get the things
> that you would look up in a dictionary. Well, maybe also to count as
> single words special strings like URLs.
>
> I don't know what ifile does about this...

It definitely doesn't handle URLs. But it's smart about paring down
words that repeat excessively and words that occur too rarely. (At
least, it seems that way.)

> but since I wasn't quickly
> able to tell what it did from its web page, it quickly fell off my "need
> to mention it" list.
>
> |That's interesting. You've piqued my interest. I tried downloading your
> |code, but it wouldn't work. What version of python do you use?
>
> I used 2.2. That's needed since I used generators. It would be easy to
> change the code not to do this though.
>
> What do you mean by "wouldn't work" though? If the problem was the
> "yield" keyword, the error should have been awfully straightforward. If
> the problem was something else... well, who knows.
% ./spam-test.py
  File "./spam-test.py", line 15
    product *= p
             ^
SyntaxError: invalid syntax

I think I'm running Python 1.5. (Debian Linux is far behind,
unfortunately.)

> As you can see, I didn't exactly try hard to make the code polished or
> reusable. I don't think it's bad... but basically I wrote it to test a
> hypothesis rather than to create a general purpose tool. Not that it
> would be hard to write something more complete...
>
> |Finally, just a note: your message was marked as spam by my filter because
> |there are spaces at the end of your Message-ID: line.
>
> Yuck! This isn't ifile that did this, is it?

No. I use SpamBouncer (http://spambouncer.org) and it has what I believe
to be a bug. I told the author about it.

> (it doesn't sound like a
> Bayesian thing). In any case, that's a really terrible filter
> criterion. Not that I'm sure why my ID had spaces... my mailer doesn't
> normally do that... but maybe I accidentally added something in the
> header area. Still, sure sounds RFC2822 friendly to me (and not even
> something I've ever noticed spammers doing... why would they bother?).
>
> Yours, David...
>
> --
> _/_/_/ THIS MESSAGE WAS BROUGHT TO YOU BY: Postmodern Enterprises _/_/_/
> _/_/ ~~~~~~~~~~~~~~~~~~~~[http://www.gnosis.cx/~mertz]~~~~~~~~~~~~~~~~~~~~~ _/_/
> _/_/ The opinions expressed here must be those of my employer... _/_/
> _/_/_/_/_/_/_/_/_/_/ Surely you don't think that *I* believe them! _/_/
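
A note on the SyntaxError above: augmented assignment ("*=", "+=" and
friends) was only added in Python 2.0, so a 1.5 interpreter stops at
"product *= p" before it ever reaches a "yield". Below is a minimal
sketch of the same kind of accumulation spelled in a way that 1.5 can
parse. The function name combined_product and the idea that "product"
accumulates a list of numbers are assumptions for illustration; the
actual spam-test.py is not shown in this thread. The generator-based
parts of the script would still need Python 2.2, as noted earlier.

```python
# Hypothetical helper: the real spam-test.py is not shown in this thread,
# so the name and the input format are assumptions.
def combined_product(probabilities):
    """Multiply a sequence of numbers without augmented assignment."""
    product = 1.0
    for p in probabilities:
        # "product *= p" requires Python 2.0 or later; the long form
        # below is equivalent and also parses under Python 1.5.
        product = product * p
    return product


if __name__ == "__main__":
    print(combined_product([0.5, 0.5, 0.5]))
```

The long form computes exactly the same value, so nothing about the
hypothesis being tested would change; only the spelling does.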