
Re: clustering

To: Leonid Leibman <http://www.gmail.com/~lleibman>
Subject: Re: clustering
From: http://dummy.us.eu.org/robert (Robert)
Date: Thu, 31 Mar 2005 15:37:32 -0800
Keywords: http://www.gmail.com/~lleibman

 > From: Leonid Leibman <http://www.gmail.com/~lleibman>
 > Date: Thu, 31 Mar 2005 17:13:56 -0500
 >
 > I actually installed glimpse and tried it. It is amazingly fast. It
 > doesn't support real regular expressions and is probably based on Wu
 > Manber...
 > My God, it is DEVELOPED by Wu and Manber. Mmm.

It does support regular expressions.  And approximate matching.  Look it
up in the manual.

Yes, Udi Manber, the designer of the uber-approximate matching algorithm
that combines regular expressions and approximate matching in a single
DFA.  It's mind-bogglingly complicated (I tried to read the paper when I
was working on that FAQ parsing project).
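
To make "approximate matching" concrete, here is a minimal sketch of the
textbook dynamic-programming approach (often credited to Sellers).  This
is NOT the Wu/Manber algorithm and not how glimpse does it; it is just the
simplest way to find the places where a pattern occurs with at most k
edits.  The name approx_find and its parameters are made up for
illustration.

```python
def approx_find(pattern, text, max_errors):
    """Return the end positions (1-based) in text where pattern matches
    some substring with at most max_errors insertions, deletions, or
    substitutions."""
    m = len(pattern)
    # prev[i] = fewest edits needed to match pattern[:i] against a
    # substring of text ending at the previous text position.
    prev = list(range(m + 1))
    hits = []
    for j, ch in enumerate(text):
        # An empty pattern matches anywhere at zero cost, so a match
        # is allowed to start at any position in the text.
        cur = [0]
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            cur.append(min(prev[i - 1] + cost,  # match or substitution
                           prev[i] + 1,         # extra character in text
                           cur[i - 1] + 1))     # pattern character missing
        if cur[m] <= max_errors:
            hits.append(j + 1)
        prev = cur
    return hits


# Usage: search for "manber" allowing one error.
print(approx_find("manber", "wu and manbar", 1))
```

This runs in time proportional to len(pattern) * len(text), which is fine
for a toy but is exactly the cost that faster algorithms try to avoid.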

 > Of course just querying an index isn't good enough for an
 > "intelligent" software since it will still be too slow. There must be
 > an interface to the index itself. I'll look it up.

There's WebGlimpse.

 > Another thing is that such an indexing mechanism is most likely very
 > lossy as far as "clustering" goes. Maybe not.

glimpse is not lossy at all.  That's why it's "slow".  (Perhaps it's
faster for you 'cause you're only doing a few megabytes.  Trying to index
several gigabytes gets problematic.)  But, it doesn't cluster at all.
ifile does classification based on words via naive Bayes.
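
For reference, word-based naive Bayes classification boils down to
something like the sketch below.  This is a generic illustration with
add-one (Laplace) smoothing, not ifile's actual code; the function names
train and classify and the sample data are made up.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """labeled_docs: iterable of (label, text) pairs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    doc_counts = Counter()              # label -> number of documents
    for label, text in labeled_docs:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Return the label with the highest log-probability for text."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        # Log prior: fraction of training documents with this label.
        score = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out a label.
            score += math.log((word_counts[label][word] + 1)
                              / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Usage with made-up training data.
model = train([
    ("spam", "buy cheap pills now"),
    ("spam", "cheap pills cheap"),
    ("ham", "meeting notes attached"),
    ("ham", "lunch meeting tomorrow"),
])
print(classify("cheap pills", *model))
```

Note that this assigns documents to labels you supply up front, which is
classification, not clustering.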

 > If you find that free source search engine info, please send it to me.

OK.

 > Or anything that attempts to do disambiguation...

The open source search engine does not do disambiguation.  I don't know of
any open source code that does this...  Actually, I don't know of any
proprietary code that does, either, now that I think about it.

 > Leonid
