Re: clustering
- To: Leonid Leibman <http://www.gmail.com/~lleibman>
- Subject: Re: clustering
- From: http://dummy.us.eu.org/robert (Robert)
- Date: Thu, 31 Mar 2005 15:37:32 -0800
- Keywords: http://www.gmail.com/~lleibman
> From: Leonid Leibman <http://www.gmail.com/~lleibman>
> Date: Thu, 31 Mar 2005 17:13:56 -0500
>
> I actually installed glimpse and tried it. It is amazingly fast. It
> doesn't support real regular expressions and is probably based on Wu
> Manber...
> My God, it is DEVELOPED by Wu and Manber. Mmm.
It does support regular expressions, and approximate matching too. Check
the manual.
Yes, Uri Manber, the designer of the uber-approximate matching algorithm
that combines regular expressions and approximate matching in a single
DFA. It's mind-bogglingly complicated (I tried to read the paper when I
was working on that FAQ parsing project).
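To make "approximate matching" concrete, here is a minimal sketch of the idea. It uses the textbook dynamic-programming formulation (Sellers' algorithm), not the bit-parallel technique from the Wu and Manber paper, and it handles only a literal pattern, not regular expressions. The function name `approx_find` and its parameters are made up for illustration and are not part of glimpse or agrep.

```python
def approx_find(pattern, text, max_errors):
    """Return the end offsets in `text` where `pattern` matches some
    substring with at most `max_errors` edits (insert, delete, substitute).

    Illustrative only: plain dynamic programming, O(len(pattern) * len(text)).
    """
    m = len(pattern)
    # prev[i] = fewest edits needed to match pattern[:i] against some
    # substring of the text ending at the previous character.
    prev = list(range(m + 1))
    hits = []
    for j, ch in enumerate(text):
        # The empty pattern matches anywhere at zero cost, which is what
        # lets a match start at any position in the text.
        cur = [0]
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            cur.append(min(prev[i - 1] + cost,  # match or substitute
                           prev[i] + 1,         # extra character in text
                           cur[i - 1] + 1))     # pattern character missing
        if cur[m] <= max_errors:
            hits.append(j + 1)  # offset just past the matching substring
        prev = cur
    return hits


if __name__ == "__main__":
    # "glimse" is one deletion away from "glimpse".
    print(approx_find("glimpse", "I installed glimse today", 1))
```

With `max_errors` set to 0 this degenerates to exact substring search; raising it trades precision for tolerance of typos.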
> Of course just querying an index isn't good enough for an
> "intelligent" software since it will still be too slow. There must be
> an interface to the index itself. I'll look it up.
There's WebGlimpse.
> Another thing is that such an indexing mechanism is most likely very
> lossy as far as "clustering" goes. Maybe not.
glimpse is not lossy at all, which is why it's "slow". (It's probably
fast for you because you're only indexing a few megabytes; indexing
several gigabytes gets problematic.) But it doesn't cluster at all.
ifile classifies documents by the words in them, using naive Bayes.
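For reference, word-based naive Bayes classification can be sketched roughly as below. This is a generic multinomial naive Bayes with add-one smoothing, written from the textbook description. It is not ifile's actual code, and the class name, method names, and whitespace tokenization are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Toy multinomial naive Bayes over whitespace-separated words."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()                       # every word seen in training

    def train(self, label, text):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior plus the sum of log likelihoods; add-one smoothing
            # keeps unseen words from zeroing out the whole product.
            score = math.log(n_docs / total_docs)
            for w in words:
                score += math.log((counts[w] + 1) /
                                  (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    nb = NaiveBayes()
    nb.train("work", "meeting schedule project deadline")
    nb.train("personal", "dinner movie weekend party")
    print(nb.classify("project meeting tomorrow"))
```

Note that this assigns each document to one of a fixed set of labels it was trained on, which is classification rather than clustering: nothing here discovers groups on its own.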
> If you find that free source search engine info, please send it to me.
OK.
> Or anything that attempts to do disambiguation...
The open source search engine does not do disambiguation. I don't know of
any open source code that does this... Actually, I don't know of any
proprietary code that does, either, now that I think about it.
> Leonid