Re: clustering
- To: leonid leibman <http://profiles.yahoo.com/lleibman>
- Subject: Re: clustering
- From: http://dummy.us.eu.org/robert (Robert)
- Date: Thu, 31 Mar 2005 09:09:46 -0800
- Keywords: http://profiles.yahoo.com/lleibman
> From: leonid leibman <http://profiles.yahoo.com/lleibman>
> Date: Thu, 31 Mar 2005 07:43:52 -0800 (PST)
>
> Hi, Robert -- I was reading recently about clustering
> search engines (like clusty). Do you know what
> technology is behind it?
No, not specifically. I think it clusters by words and may even use an
encyclopedia. (Clusty used to be Vivisimo and, at the time, I was very
impressed. It's still pretty neat, but I don't know exactly how it works.)
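As a rough illustration of "clustering by words", here is a minimal sketch that groups result snippets under the most widely shared term, after dropping the query word (which every result tends to contain). It is an assumption about the general idea, not a description of how Clusty or Vivisimo actually work; `cluster_by_terms`, the stopword list, `exclude`, and `min_docs` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {"the", "a", "an", "and", "of", "for", "in", "on", "to", "is"}

def terms(text):
    """Lowercased word tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def cluster_by_terms(snippets, exclude=frozenset(), min_docs=2):
    """Group snippet indices under the most frequent term they share.

    Terms in `exclude` (e.g. the query words) are ignored. Snippets that
    share no term with at least `min_docs` snippets go into "other".
    """
    doc_terms = [terms(s) - set(exclude) for s in snippets]
    df = Counter(t for ts in doc_terms for t in ts)  # document frequency
    clusters = {}
    for i, ts in enumerate(doc_terms):
        shared = [t for t in ts if df[t] >= min_docs]
        # Most widely shared term wins; ties go to the alphabetically first.
        label = min(shared, key=lambda t: (-df[t], t)) if shared else "other"
        clusters.setdefault(label, []).append(i)
    return clusters

snippets = [
    "Jaguar car dealer prices",
    "Used jaguar car reviews",
    "Jaguar cat habitat in the rainforest",
    "Big cat species: jaguar and leopard",
]
print(cluster_by_terms(snippets, exclude={"jaguar"}))
# {'car': [0, 1], 'cat': [2, 3]}
```

Picking a single label per result keeps the sketch short; real clustering engines may let a result appear under several labels.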
> In general what's your opinion about the state of the
> art in clustering (if any :)? Google says that it's
> not using it (yet) since it is of limited use.
In the Clusty sense, I think it is of limited use. But I think combining
recommendation-type systems (where users have certain interests) with
search engines could be really great. http://www.directhit.com was
working in this direction before Ask Jeeves acquired it and quashed that
part completely.
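One simple way to combine the two is a weighted blend, sketched under the assumption that the search engine supplies a relevance score and a recommender supplies a per-user interest score, both in [0, 1]. `rerank`, `weight`, and the scores are hypothetical; this is not a description of DirectHit's system.

```python
def rerank(search_scores, interest_scores, weight=0.3):
    """Order documents by a blend of search relevance and user interest.

    search_scores: {doc_id: relevance in [0, 1]} from the search engine.
    interest_scores: {doc_id: interest in [0, 1]} from a recommender;
    documents the recommender knows nothing about count as 0.
    weight: hypothetical share given to the user's interests.
    """
    combined = {
        doc: (1 - weight) * score + weight * interest_scores.get(doc, 0.0)
        for doc, score in search_scores.items()
    }
    # Highest blended score first; ties broken by doc id for stable output.
    return sorted(combined, key=lambda d: (-combined[d], d))

search = {"a": 0.9, "b": 0.8, "c": 0.5}
interests = {"b": 1.0, "c": 0.2}
print(rerank(search, interests))            # ['b', 'a', 'c']
print(rerank(search, interests, weight=0))  # ['a', 'b', 'c']
```

With `weight=0` this reduces to plain search ranking, so the blend can be tuned per user or turned off.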
> links_2_links clustering (disambiguation) was kind of weak
> and I was trying to work on some ideas to improve it
> but I'm realizing that the state of the art may have
> improved quite a bit since then.
Links_2_Links had almost no disambiguation. It's an extremely hard problem, and
there may be research on it, but I know of no specific
technology that addresses disambiguation.
> Also, do you know of any simple data
> extraction/classification freeware/shareware?
I'm not sure what you mean by that. My spam filter uses ifile, which uses
Naive Bayes for its classification. It is word-based (although I augment
that by combining word pairs through a separate program). There are a
number of open source machine learning libraries; if I remember correctly,
I was most impressed with Torch.
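For reference, a word-based Naive Bayes classifier with word-pair features looks roughly like this. It is a generic multinomial Naive Bayes sketch with add-one smoothing, not ifile's code; appending adjacent word pairs to the word list is only an assumption about what the separate word-pair program does, and `features`, `NaiveBayes`, and the training messages are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def features(text, use_pairs=True):
    """Word tokens, optionally augmented with adjacent word pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    feats = list(words)
    if use_pairs:
        feats += [f"{a} {b}" for a, b in zip(words, words[1:])]
    return feats

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self, use_pairs=True):
        self.use_pairs = use_pairs
        self.doc_counts = Counter()              # training documents per class
        self.feat_counts = defaultdict(Counter)  # feature counts per class
        self.vocab = set()

    def train(self, text, label):
        feats = features(text, self.use_pairs)
        self.doc_counts[label] += 1
        self.feat_counts[label].update(feats)
        self.vocab.update(feats)

    def classify(self, text):
        # Features never seen in training carry no evidence, so skip them.
        feats = [f for f in features(text, self.use_pairs) if f in self.vocab]
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.doc_counts):
            counts = self.feat_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log P(label) + sum over features of log P(feature | label)
            score = math.log(self.doc_counts[label] / total_docs)
            for f in feats:
                score += math.log((counts[f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

nb = NaiveBayes()
nb.train("win free money now", "spam")
nb.train("free money offer", "spam")
nb.train("meeting notes for the project", "ham")
nb.train("project schedule and meeting time", "ham")
print(nb.classify("free money"))    # spam
print(nb.classify("meeting time"))  # ham
```

Working in log space avoids underflow when a long message multiplies many small probabilities together.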
> On a different topic, as far as interfaces to search
> go I'm imagining that one can have drag and drop
> boxes. Say 3 ("Good match", "Irrelevant match" and
> "Undesirable match"). As a first step the user enters
> keywords. Then he can place the results in the boxes
> and the priorities of the results (and thus the
> results you'll see) will change accordingly. This is
> possible if there is good clustering software behind
> the scenes. Does this sound like a good idea to you?
Sure, although Amazon, Netflix, and MovieLens use a simple star-based
rating system, which may be faster for most users than dragging items. I
don't know.
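The three-box idea can be sketched as simple relevance feedback: terms from results the user has placed in a box get that box's weight, and the remaining results are re-scored by the terms they contain. `BOX_WEIGHTS`, `rerank_with_feedback`, and the base scores are assumptions for illustration; a star rating could be mapped onto the same weights instead of boxes.

```python
import re
from collections import Counter

# Hypothetical weights for the three boxes from the thread; a star rating
# (say 1 to 5) could be mapped onto a similar scale instead.
BOX_WEIGHTS = {"good": 1.0, "irrelevant": -0.25, "undesirable": -1.0}

def terms(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def rerank_with_feedback(results, judgments, base_scores):
    """Re-score results the user has not yet placed in a box.

    results: {doc_id: text}; judgments: {doc_id: box name};
    base_scores: {doc_id: original search score}.
    Each term's weight is the sum of the box weights of the judged results
    containing it; an unjudged result gains the weights of its terms.
    """
    term_weight = Counter()
    for doc, box in judgments.items():
        for t in terms(results[doc]):
            term_weight[t] += BOX_WEIGHTS[box]
    ranked = []
    for doc, text in results.items():
        if doc in judgments:
            continue  # already placed in a box by the user
        bonus = sum(term_weight[t] for t in terms(text))
        ranked.append((base_scores[doc] + bonus, doc))
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc for _, doc in ranked]

results = {
    "r1": "jaguar car dealer",
    "r2": "jaguar big cat",
    "r3": "jaguar car review",
    "r4": "jaguar cat habitat",
}
base = {"r1": 0.9, "r2": 0.8, "r3": 0.7, "r4": 0.6}
print(rerank_with_feedback(results, {}, base))
# ['r1', 'r2', 'r3', 'r4']
print(rerank_with_feedback(results, {"r2": "good", "r1": "undesirable"}, base))
# ['r4', 'r3']
```

Marking the "big cat" result good and the "car dealer" result undesirable lifts the cat habitat page above the car review, even though its original score was lower.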
> Leonid
>
> Leonid Leibman