This program captures common word pairs (which may themselves be concatenated). It is meant to pick up where ifile, the Naive Bayes message classifier, falls short: ifile parses only single space-delimited words, yet phrases can also be significant in determining the category a message belongs to. For example, "get rich" can be an indicator of spam. When the program is used together with ifile, statistically significant phrases bubble their way to the top.
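To illustrate the idea of capturing word pairs, here is a minimal sketch in C++. It is not the code from phrases.cc. The function names split_words and count_pairs are hypothetical, and it assumes that a "pair" is simply two adjacent whitespace-delimited words joined by a single space. How the real program selects or concatenates pairs may differ.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split text on whitespace, giving the
// single-word view of a message that ifile is described as using.
std::vector<std::string> split_words(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> words;
    std::string word;
    while (in >> word) {
        words.push_back(word);
    }
    return words;
}

// Hypothetical helper: count every pair of adjacent words.
// Assumption: a pair is keyed as "first second" with one space.
std::map<std::string, int> count_pairs(const std::string& text) {
    const std::vector<std::string> words = split_words(text);
    std::map<std::string, int> counts;
    // i + 1 < size() also handles empty and one-word input safely.
    for (std::size_t i = 0; i + 1 < words.size(); ++i) {
        ++counts[words[i] + " " + words[i + 1]];
    }
    return counts;
}
```

Calling count_pairs on the text "get rich quick get rich" gives a count of 2 for "get rich" and 1 each for "rich quick" and "quick get". A real tool would presumably keep only the pairs that occur often enough to matter before handing them to the classifier. That filtering step is left out here.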

The phrases.cc program may be obtained here.

Robert's Free Software

Date Last Modified: Thu Feb 15 16:01:22 UTC 2007