
fuzzy check sums



Hi.  Did you post this code?  I'd be interested in seeing it.

Thanks.

 > Michael Grant http://www.grant.org/~mg-dcc1 
 > Sun, 17 Mar 2002 08:52:10 +0100 (MET) 
 > 
 > I'm new to this list.  I must admit that I've had the idea of using
 > fuzzy checksums to spot spam for years.  Recently, I started working
 > on something to do this, then a couple days ago, a friend pointed me
 > at the dcc project.  Oh well, it figures, someone had to have had the
 > same idea!
 > 
 > I have made some interesting headway on my own fuzzy functions.  I had 
 > a brief look at fuz1 and fuz2 in the source.  fuz1 seems to be
 > based around md5.  I was never able to get enough fuzz out of using
 > md5 myself, even doing md5 sums per line and such.
 > 
 > What I found worked surprisingly well was simply to convert each
 > space-separated word on a line to a number and take the root mean
 > square of those numbers, line by line through the message.  I'm happy
 > to share the code.  Should I post it here or what?
 > 
 > I also ran some tests on my old email to see how many false positives
 > I would catch.  For me, it was about 1 in 150,000, and I have to say
 > that the one message did closely resemble one of the spams in my spam
 > file.
 > 
 > Michael Grant
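The quoted message describes the per-line root-mean-square idea but does not include the code. The following is a minimal sketch of that idea, not Michael Grant's implementation. It assumes a word becomes a number by summing its UTF-8 byte values, and that each line's RMS is rounded to an integer. The message does not say how words were converted, and `word_value`, `line_rms`, `fuzzy_fingerprint` and `md5_per_line` are hypothetical names. Per-line MD5 is included for contrast with the "not enough fuzz" remark.

```python
import hashlib
import math
from typing import List, Optional


def word_value(word: str) -> int:
    # Hypothetical word-to-number mapping (not from the original post):
    # the sum of the word's UTF-8 byte values.
    return sum(word.encode("utf-8"))


def line_rms(line: str) -> Optional[float]:
    # Root mean square of the numeric values of the space-separated words.
    values = [word_value(w) for w in line.split()]
    if not values:
        return None  # blank line: nothing to contribute
    return math.sqrt(sum(v * v for v in values) / len(values))


def fuzzy_fingerprint(body: str) -> List[int]:
    # One rounded RMS value per non-blank line; rounding is an assumed
    # extra step that absorbs very small differences.
    fingerprint = []
    for line in body.splitlines():
        rms = line_rms(line)
        if rms is not None:
            fingerprint.append(round(rms))
    return fingerprint


def md5_per_line(body: str) -> List[str]:
    # Exact per-line MD5 digests, for contrast with the fuzzy values.
    return [hashlib.md5(line.encode("utf-8")).hexdigest()
            for line in body.splitlines() if line.split()]


if __name__ == "__main__":
    original = "Make money fast\nClick here now"
    variant = "Make money fast!\nClick here now"
    print(fuzzy_fingerprint(original), fuzzy_fingerprint(variant))
    print(md5_per_line(original)[0] == md5_per_line(variant)[0])
```

Adding a single "!" to the first line moves that line's RMS value by only a few percent, while its MD5 digest changes completely. That contrast illustrates why exact hashes give little fuzz. A real matcher would still need some tolerance rule for comparing fingerprints, which the message does not describe.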


