Re: tag/feature/attribute/dimension correlation
- To: Alex <http://www.gmail.com/~alex.>
- Subject: Re: tag/feature/attribute/dimension correlation
- From: Robert <http://dummy.us.eu.org/robert>
- Date: Sun, 27 Aug 2017 08:24:21 -0700
- Keywords: our-San-Jose-phone-number
> From: Alex <http://www.gmail.com/~alex.>
> Date: Sat, 26 Aug 2017 21:07:27 -0400
>
> Ah, I get it now. I looked up TLSH, that helped. Are you just thinking of
> using it for finding if an email has an approximate match with a known
> spam/ham (in a new way)?
No, although I do use TLSH and Nilsimsa to deduplicate the data during
training. That deduplication has dramatically improved the quality of
the Bayesian model.
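To make the deduplication idea concrete, here is a rough sketch of
near-duplicate removal before training. It is not the actual pipeline:
it uses word-shingle Jaccard similarity from the standard library as a
stand-in for TLSH/Nilsimsa distances, and the function names, the
shingle size and the 0.8 threshold are all made up for illustration.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles of a message (lowercased)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dedupe(messages, threshold=0.8):
    """Keep a message only if it is not a near-duplicate of one already kept.

    The threshold is a hypothetical value; a real setup would tune it
    (or use a TLSH/Nilsimsa distance cutoff instead).
    """
    kept = []
    kept_shingles = []
    for msg in messages:
        s = shingles(msg)
        if all(jaccard(s, t) < threshold for t in kept_shingles):
            kept.append(msg)
            kept_shingles.append(s)
    return kept
```

Calling dedupe() on the training messages before fitting the model
would drop messages that differ from an earlier one only by a token or
two, which is roughly what the fuzzy hashes are doing for me.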
I'm thinking I should try to get Jing to remove co-occurrent (is that a
word??) features, since the SVM can only take a limited set of features
because of memory constraints during training. (There are only a few
thousand features -- even your existing Python code would work for this
purpose.)
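As a rough sketch of what removing co-occurrent features could look
like: assume "co-occurrent" means two features that show up in almost
exactly the same set of messages, so one of them adds little. The
function name, the input layout (one set of feature names per message)
and the 0.9 threshold are assumptions for illustration, not anything
Jing's code or the existing Python code actually does. The pairwise
comparison is quadratic, which should be tolerable for a few thousand
features.

```python
def prune_cooccurring(docs, threshold=0.9):
    """Drop features whose message sets nearly coincide with a kept feature.

    docs: list of sets of feature names, one set per message.
    Returns the kept feature names, most frequent first.
    """
    # Map each feature to the set of message indices it appears in.
    postings = {}
    for doc_id, feats in enumerate(docs):
        for f in feats:
            postings.setdefault(f, set()).add(doc_id)

    kept = []
    # Visit frequent features first; break ties by name for determinism.
    for f in sorted(postings, key=lambda name: (-len(postings[name]), name)):
        p = postings[f]
        redundant = any(
            len(p & postings[g]) / len(p | postings[g]) >= threshold
            for g in kept
        )
        if not redundant:
            kept.append(f)
    return kept
```

With a threshold of 1.0 this only merges features that always appear
together; lowering it trades a smaller feature set for the risk of
throwing away a feature that carried some independent signal.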
> What was the solution to the to do list problem? Something with Markov
> chains, I think?
Yes. That was your suggestion and I began doing some coding using that.
Of course, I don't have time to finish it.
> On Aug 26, 2017 20:55, "Robert" <http://dummy.us.eu.org/robert> wrote:
> > BTW, I only thought to look at your code after I was growing frustrated at
> > having to rearrange my big todo list and wishing that I had a program
> > which rearranged my todo list automatically and wishing that you would
> > write it for me :-).