
Re: clustering



 > From: Leonid Leibman <http://www.gmail.com/~lleibman>
 > Date: Thu, 31 Mar 2005 20:07:05 -0500
 >
 > Conceptually, clustering and disambiguation are the same. Basically, one
 > can think of a concept as represented by a group of items (pages, for
 > example) pertaining to this concept.  Such groups can of course
 > overlap (the same thing can be about cars and about money). Finding good
 > representative clusters and properly "projecting" onto the relevant ones
 > is disambiguation (or one mechanism of it). This is what I mean.

Oh, OK.  Links_2_Links's clustering did not disambiguate at all; its
concepts were still highly ambiguous.
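A minimal sketch of the "projection" idea in the quoted paragraph. It assumes concepts are stored as overlapping sets of page names and uses Jaccard overlap as the relevance score. Both choices are assumptions for illustration, not how Links_2_Links or anyone else's system actually works, and the cluster and page names are hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap score between two sets of items (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def disambiguate(context: Set[str], clusters: Dict[str, Set[str]],
                 top_k: int = 1) -> List[str]:
    """Project a context (a set of related items) onto the best-matching clusters.

    Clusters may overlap: the same page can belong to several concepts.
    Returns up to top_k cluster names with nonzero overlap, best first,
    with ties broken alphabetically by name.
    """
    scored = sorted(clusters.items(), key=lambda kv: (-jaccard(context, kv[1]), kv[0]))
    return [name for name, members in scored[:top_k] if jaccard(context, members) > 0]


# Hypothetical overlapping clusters: "loan.html" is about both cars and money.
clusters = {
    "cars": {"engine.html", "tires.html", "loan.html"},
    "money": {"bank.html", "interest.html", "loan.html"},
}
print(disambiguate({"loan.html", "interest.html"}, clusters))  # ['money']
```

With only "loan.html" as context, both clusters score equally, which is exactly the ambiguous case; asking for top_k=2 returns both.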

 > glimpse was fast at searching but slow at indexing; indexing took a while (for 0.5 GB).

Actually, search is pretty slow, too.

 > Wu-Manber doesn't build a DFA at all. It's actually a much simpler concept,
 > but it doesn't work well with wildcards. I did look up the manual, and
 > it indicates that it works with regular expressions to some extent
 > (the extent, I'm sure, being things like OR-ing with | or small character
 > classes). Or maybe I know a different algorithm? I saw the original
 > Wu-Manber article. Are we talking about the same thing?
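For reference, here is a simplified sketch of the Wu-Manber multi-pattern idea as I understand it; it is not glimpse's or agrep's code. Instead of a DFA, it builds a shift table over short blocks (2 characters here, an assumed default) taken from the first m characters of each pattern, where m is the shortest pattern length. It then skips ahead through the text whenever the current block cannot end a match. It handles exact strings only. A wildcard inside those first m characters would force the table to cover every block the wildcard could stand for, which pushes the shifts toward 1 and loses the speedup.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def wu_manber(patterns: List[str], text: str, block: int = 2) -> List[Tuple[int, str]]:
    """Find all occurrences of several non-empty exact patterns in text.

    Simplified Wu-Manber: only the first m characters of every pattern are
    used to build the tables, where m is the shortest pattern length.
    Returns (start_index, pattern) pairs sorted by start index.
    """
    m = min(len(p) for p in patterns)
    b = min(block, m)            # a block cannot be longer than the shortest pattern
    default_shift = m - b + 1    # safe skip when the block occurs in no prefix
    shift: Dict[str, int] = {}
    hash_table: Dict[str, List[str]] = defaultdict(list)
    for p in patterns:
        prefix = p[:m]
        for q in range(b, m + 1):  # q = 1-based end position of the block in prefix
            blk = prefix[q - b:q]
            shift[blk] = min(shift.get(blk, default_shift), m - q)
        hash_table[prefix[m - b:m]].append(p)  # candidates keyed by their last block

    matches = []
    pos = m - 1  # index of the last character of the current window
    while pos < len(text):
        blk = text[pos - b + 1:pos + 1]
        s = shift.get(blk, default_shift)
        if s == 0:
            # Block ends some prefix: verify each candidate pattern directly.
            start = pos - m + 1
            for p in hash_table.get(blk, []):
                if text.startswith(p, start):
                    matches.append((start, p))
            s = 1
        pos += s
    return sorted(matches)


print(wu_manber(["money", "car", "cars"], "cars cost money; car loans"))
# [(0, 'car'), (0, 'cars'), (10, 'money'), (17, 'car')]
```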

Maybe something different.  The version here says that "a regular
expression must match words that appear in the index for glimpse to find
it".  But it supports full-fledged regular expressions.  There are also
wildcards ('#'), which work like Google's '*' (star) operator.  There is an
option to force glimpse to do a full search without using the index (which
is slower), so that you can do agrep-type searches on your corpus.  In that
case, regular expressions are most useful.
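A rough sketch of the index restriction described above, under assumed data structures. A hypothetical word index maps each lowercase word to the files containing it. '#' is treated as a wildcard of any length and rewritten to the regex '.*'; that is my assumption about agrep-style '#', not something taken from glimpse's source. An index search only finds a file when the pattern matches a whole indexed word. The full_search function stands in for the slower no-index mode. None of these names are glimpse options or APIs.

```python
import re
from typing import Dict, Set


def to_regex(pattern: str) -> str:
    """Assumed agrep-like convention: '#' is a wildcard of any length, i.e. '.*'."""
    return pattern.replace("#", ".*")


def build_index(files: Dict[str, str]) -> Dict[str, Set[str]]:
    """Hypothetical word index: lowercase word -> names of files containing it."""
    index: Dict[str, Set[str]] = {}
    for name, body in files.items():
        for word in re.findall(r"\w+", body.lower()):
            index.setdefault(word, set()).add(name)
    return index


def index_search(pattern: str, index: Dict[str, Set[str]]) -> Set[str]:
    """A file is found only if the pattern matches a whole indexed word."""
    rx = re.compile(to_regex(pattern))
    hits: Set[str] = set()
    for word, names in index.items():
        if rx.fullmatch(word):
            hits |= names
    return hits


def full_search(pattern: str, files: Dict[str, str]) -> Set[str]:
    """Slower agrep-style scan of every file body, ignoring the index."""
    rx = re.compile(to_regex(pattern))
    return {name for name, body in files.items() if rx.search(body.lower())}


files = {"a.txt": "Cars cost money", "b.txt": "A car loan"}
index = build_index(files)
print(sorted(index_search("car#", index)))      # ['a.txt', 'b.txt']
print(sorted(index_search("car loan", index)))  # [] because it spans two words
print(sorted(full_search("car loan", files)))   # ['b.txt']
```

A pattern that spans two words finds nothing through the index but is found by the full scan, which is why full regular expressions pay off most in that mode.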

 > Thanks for WebGlimpse.

Sure.

 > Leonid



