
Re: fuzzy check sums




I didn't post the code because nobody ever got back to me.  What do you
think, should I post it to the list?  Having read the list for the last
few days, it seems to be more of a questions list than a dev list, so I
wasn't sure it was the correct venue.  Are you one of the developers of dcc?

I've attached two files, essence4.c and essence6.c.  The main difference
is that essence6 is sensitive to line breaks: it sums the words on each
line before squaring, while essence4 squares each word on its own.  It's
been several weeks since I looked at this stuff, so I sure hope I'm
sending you the correct files!  I did most of my testing with essence6.
I have 9 different functions I played with, so there's some room for
confusion.
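
To illustrate the line-break point, here is a minimal sketch rather than
the attached code.  It assumes words of at most 3 bytes so that plain
64-bit integers can stand in for libgmp, and it returns the sum before
the final square root.  The name digest_squared and its per_line flag
are hypothetical, introduced only for this example.

```c
#include <stdint.h>

/* Toy stand-in for the two attached digests, valid only for short words
   (up to 3 bytes) so that everything fits in 64 bits.
   per_line == 0: essence4 style, square each word and add.
   per_line != 0: essence6 style, add the words of a line, then square.
   Returns the sum BEFORE the final square root. */
uint64_t digest_squared(const char *text, int per_line) {
  uint64_t total = 0, line_sum = 0, n = 0;
  int in_word = 0;

  for (const char *p = text; ; p++) {
    int c = (unsigned char)*p;
    int is_sep = (c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\0');

    if (!is_sep) {
      n = (n << 8) + (uint64_t)c;     /* pack bytes, most significant first */
      in_word = 1;
      continue;
    }
    if (in_word) {                    /* a word just ended */
      if (per_line)
        line_sum += n;
      else
        total += n * n;
      n = 0;
      in_word = 0;
    }
    if (per_line && (c == '\n' || c == '\0')) {
      total += line_sum * line_sum;   /* close out the current line */
      line_sum = 0;
    }
    if (c == '\0')
      break;
  }
  return total;
}
```

Calling it on "ab cd\nef\n" and on "ab\ncd ef\n" (the same three words,
with the line break moved) gives equal results when per_line is 0 and
different results when per_line is 1.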

You'll need to get the GNU arbitrary-precision math library, libgmp, from
your favorite GNU server.

I'd be happy to put one or both of these sums into a form that can be
included in the dcc project if there's interest.  At the moment, they
read stdin (or a file named on the command line) and print a relatively
long number on stdout.  Try adding or deleting a line from the input file
and running it again.  The output should be the same or very similar down
to nearly the final digits.  Obviously these fuzzy sums work best with
large files and few mods.  The good news is that you can always chop the
number at a certain number of digits to make it fuzzier.  You'll have to
do that anyway, since DNS limits the overall length of a name which can
be searched for.  The number also needs to be compacted to use all
available bits.  I'm perfectly willing to do this; I just haven't yet
because I was experimenting with the fuzzy functions first.
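
As a rough sketch of the chopping and compacting steps described above
(neither is part of the attached files), the helpers below work on the
decimal string that the programs print.  The names chop_digest,
fuzzy_match and to_base36 are hypothetical, and base 36 is only an
assumption: it was picked here because DNS names are case-insensitive,
not because dcc uses it.

```c
#include <string.h>

/* Copy at most `keep` leading digits of `digest` into `out`.
   Returns the number of digits copied. */
size_t chop_digest(const char *digest, size_t keep, char *out, size_t outsz) {
  size_t n = strlen(digest);
  if (outsz == 0)
    return 0;
  if (n > keep)
    n = keep;
  if (n > outsz - 1)
    n = outsz - 1;
  memcpy(out, digest, n);
  out[n] = '\0';
  return n;
}

/* Return 1 if two decimal digests have the same number of digits and
   agree in their first `keep` digits, else 0.  Digests of different
   lengths differ in magnitude, so they are treated as a mismatch. */
int fuzzy_match(const char *a, const char *b, size_t keep) {
  size_t la = strlen(a), lb = strlen(b);
  if (la != lb)
    return 0;
  if (keep > la)
    keep = la;
  return strncmp(a, b, keep) == 0;
}

/* Re-encode a decimal digit string in base 36 (0-9, a-z) by repeated
   long division, so each output character carries more bits.
   Returns the output length, or 0 on bad input or a too-small buffer. */
size_t to_base36(const char *dec, char *out, size_t outsz) {
  static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
  char work[512], rev[512];
  size_t len = strlen(dec), start = 0, nrev = 0, i;

  if (len == 0 || len >= sizeof(work))
    return 0;
  for (i = 0; i < len; i++)
    if (dec[i] < '0' || dec[i] > '9')
      return 0;
  memcpy(work, dec, len + 1);

  while (start < len) {
    unsigned rem = 0;
    for (i = start; i < len; i++) {          /* work = work / 36 */
      unsigned cur = rem * 10 + (unsigned)(work[i] - '0');
      work[i] = (char)('0' + cur / 36);
      rem = cur % 36;
    }
    rev[nrev++] = digits[rem];               /* least significant first */
    while (start < len && work[start] == '0')
      start++;                               /* skip leading zeros */
  }
  if (nrev + 1 > outsz)
    return 0;
  for (i = 0; i < nrev; i++)
    out[i] = rev[nrev - 1 - i];
  out[nrev] = '\0';
  return nrev;
}
```

One caveat with comparing chopped prefixes: two digests that are
numerically close can still disagree in their leading digits when a
carry falls between them (for example 1999... against 2000...), so a
prefix match is a sufficient test for similarity, not a necessary one.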

Let me know what you think.

-Mike

p.s. also, please don't repost this with my email address, use
http://www.grant.org/~mg-dcc instead, cheers.


Content-Type: text/plain; charset=us-ascii;  name="essence4.c"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;  filename="essence4.c"

/* essence.c -*- compile-command: "cc essence.c -lgmp -o essence" -*- */
/* Copyright (c) 2002 Michael Grant <http://www.grant.org/~MGrant> */
/* digest = root of the sum of squares of whitespace-separated words, */
/*   each word treated as a number:  sqrt( W^2 + W^2 + W^2 + ... )  */

#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <string.h>
#include <gmp.h>

int main(int argc, char **argv) {
  FILE *fp;
  char s[1024];
  char *cp, *tp;
  mpz_t n, sum;
  char *csum = NULL;

  if (argc > 1) {
    fp = fopen(argv[1], "r");
    if (fp==NULL) {
      perror(argv[1]);
      exit(1);
    }
  } else {
    fp = stdin;
  }

  mpz_init(n);
  mpz_init(sum);

  while (fgets(s, sizeof(s), fp) != NULL) {
    /* split the line on whitespace: space, tab, newline, carriage return */
    for (cp=strtok(s," \t\n\r"); cp && *cp; cp=strtok(NULL," \t\n\r")) {
      mpz_set_ui(n, 0);        /* n = 0 */
      for (tp=cp; *tp; tp++) {
        mpz_mul_2exp(n, n, 8);                  /* n = n<<8 */
        mpz_add_ui(n, n, (unsigned char)*tp);   /* n = n + *tp (byte, 0..255) */
      }
      mpz_mul(n, n, n);        /* n = n**2 */
      mpz_add(sum, sum, n);    /* sum = sum + n */
    }
  }
  mpz_sqrt(sum, sum);      /* sum = floor(sqrt(sum)) */
  printf("%s\n", mpz_get_str(csum, 10, sum));
  mpz_clear(n);
  mpz_clear(sum);
  return 0;
}

Content-Type: text/plain; charset=us-ascii;  name="essence6.c"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;  filename="essence6.c"

/* essence.c -*- compile-command: "cc essence.c -lgmp -o essence" -*- */
/* Copyright (c) 2002 Michael Grant <http://www.grant.org/~MGrant> */
/* digest = root of the sum of squared per-line word sums, */
/*   each word treated as a number:  sqrt( (W+W+W...W)^2 + (W+W+W...W)^2 + ... ) */

#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <string.h>
#include <gmp.h>

int main(int argc, char **argv) {
  FILE *fp;
  char s[1024];
  char *cp, *tp;
  mpz_t n, sum_words, sum_lines;

  if (argc > 1) {
    fp = fopen(argv[1], "r");
    if (fp==NULL) {
      perror(argv[1]);
      exit(1);
    }
  } else {
    fp = stdin;
  }

  mpz_init(n);
  mpz_init(sum_words);
  mpz_init(sum_lines);

  while (fgets(s, sizeof(s), fp) != NULL) {
    mpz_set_ui(sum_words, 0);        /* sum = 0 */
    /* split the line on whitespace: space, tab, newline, carriage return */
    for (cp=strtok(s," \t\n\r"); cp && *cp; cp=strtok(NULL," \t\n\r")) {
      mpz_set_ui(n, 0);        /* n = 0 */
      for (tp=cp; *tp; tp++) {
        mpz_mul_2exp(n, n, 8);                  /* n = n<<8 */
        mpz_add_ui(n, n, (unsigned char)*tp);   /* n = n + *tp (byte, 0..255) */
      }
      mpz_add(sum_words, sum_words, n);    /* sum = sum + n */
    }
    mpz_mul(sum_words, sum_words, sum_words);        /* sum = sum**2 */
    mpz_add(sum_lines, sum_lines, sum_words);        /* lines += sum */
  }
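  mpz_sqrt(sum_lines, sum_lines);    /* sum_lines = floor(sqrt(sum_lines)) */
  printf("%s\n", mpz_get_str(NULL, 10, sum_lines));
  mpz_clear(n);
  mpz_clear(sum_words);
  mpz_clear(sum_lines);
  return 0;
}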






