fuzzy check sums
- To: Michael Grant <http://www.grant.org/~mg-dcc1>
- Subject: fuzzy check sums
- From: http://dummy.us.eu.org/robert (Robert)
- Date: Tue, 2 Apr 2002 12:18:30 -0500
- Folder: folders/o1.work
Hi. Did you post this code? I'd be interested in seeing it.
Thanks.
> Michael Grant http://www.grant.org/~mg-dcc1
> Sun, 17 Mar 2002 08:52:10 +0100 (MET)
>
> I'm new to this list. I must admit that I've had the idea of using
> fuzzy checksums to spot spam for years. Recently, I started working
> on something to do this, then a couple days ago, a friend pointed me
> at the dcc project. Oh well, it figures, someone had to have had the
> same idea!
>
> I have made some interesting headway on my own fuzzy functions. I had
> a brief look at the fuz1 and fuz2 in the source. fuz1 seems to be
> based around md5. I was never able to get enough fuzz out of using
> md5 myself, even doing md5 sums per line and such.
>
> What I found that worked surprisingly well was simply to take the
> root-mean-square of the space-separated words on each line of a
> message, with the words converted to numbers. I'm happy to share the
> code. Should I post it here or what?
>
> I also ran some tests to see how many false positives I would catch
> based on my old email. For me, it was about 1 in 150,000, and I have
> to say that the one message closely resembled one of the spams in
> my spam file.
>
> Michael Grant
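
Michael's code was not posted in this thread, so what follows is only a
rough sketch of how a per-line root-mean-square signature might look. It
is not his implementation. Everything here is an assumption: the function
names, the mapping from a word to a number (the sum of its character
codes), the rounding of each line's value to an integer, and the
comparison of two messages by the fraction of matching line values.

```python
import math


def word_to_number(word):
    # Assumption: a word's number is the sum of its character code points.
    return sum(ord(c) for c in word)


def line_rms(line):
    # Root-mean-square of the numeric values of the words on one line.
    values = [word_to_number(w) for w in line.split()]
    if not values:
        return 0.0
    return math.sqrt(sum(v * v for v in values) / len(values))


def fuzzy_signature(message):
    # One rounded RMS value per non-blank line (rounding is an assumption).
    return [round(line_rms(line))
            for line in message.splitlines() if line.strip()]


def similarity(sig_a, sig_b):
    # Fraction of line values in sig_a that also appear in sig_b.
    if not sig_a:
        return 0.0
    seen = set(sig_b)
    return sum(1 for v in sig_a if v in seen) / len(sig_a)
```

In this sketch a line's value does not depend on word order or on the
amount of whitespace between words, and a change to one line leaves the
values of the other lines untouched. An md5 sum over the whole message,
by contrast, changes completely on any edit.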