Hi. Did you post this code? I'd be interested in seeing it.

Thanks.

> Michael Grant http://www.grant.org/~mg-dcc1
> Sun, 17 Mar 2002 08:52:10 +0100 (MET)
>
> I'm new to this list. I must admit that I've had the idea of using
> fuzzy checksums to spot spam for years. Recently, I started working
> on something to do this, then a couple days ago, a friend pointed me
> at the dcc project. Oh well, it figures, someone had to have had the
> same idea!
>
> I have made some interesting headway on my own fuzzy functions. I had
> a brief look at the fuz1 and fuz2 in the source. fuz1 seems to be
> based around md5. I was never able to get enough fuzz out of using
> md5 myself, even doing md5 sums per line and such.
>
> What I found that worked surprisingly well was simply to take the
> root-mean-squares of the space separated words on each line converted
> to numbers in messages. I'm happy to share the code. Should I post
> it here or what?
>
> I also ran some tests to see how many false positives I would catch
> based on my old email. For me, it was about 1 in 150,000 and I have
> to say that the 1 message did resemble quite a bit one of the spams in
> my spam file.
>
> Michael Grant
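
For anyone following the thread, here is a rough sketch of how I read the per-line root-mean-square idea quoted above. It is not Michael's code, which has not been posted here, and most of the details are my own guesses: the message does not say how a word is "converted to numbers", so summing the lowercased byte values is an assumption, and the bucket size, the function names, and the line-by-line comparison are all hypothetical.

```python
import math


def word_value(word: str) -> int:
    # Assumption: a word becomes a number by summing its lowercased UTF-8 bytes.
    # The quoted message does not specify the conversion.
    return sum(word.lower().encode("utf-8"))


def line_rms(line: str) -> float:
    # Root mean square of the numeric values of the space-separated words.
    values = [word_value(w) for w in line.split()]
    if not values:
        return 0.0
    return math.sqrt(sum(v * v for v in values) / len(values))


def signature(message: str, bucket: float = 25.0) -> list:
    # Hypothetical quantization step: integer-divide each non-blank line's RMS
    # by a bucket size so that small edits can still land on the same value.
    return [
        int(line_rms(line) // bucket)
        for line in message.splitlines()
        if line.strip()
    ]


def similarity(a: str, b: str) -> float:
    # Fraction of line positions whose quantized RMS values agree,
    # relative to the longer of the two signatures.
    sa, sb = signature(a), signature(b)
    if not sa or not sb:
        return 0.0
    matches = sum(1 for x, y in zip(sa, sb) if x == y)
    return matches / max(len(sa), len(sb))
```

With this sketch, similarity(msg, msg) is 1.0 for any message with at least one non-blank line, and changes in letter case do not alter the signature. A positional comparison like this one is thrown off by inserted or deleted lines, so a real checker would probably compare the sets of line values instead.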