Robert's Junkicide (Anti-SPAM) Procmail Script

This procmail script kills/deletes any mail messages suspected of being SPAM or unsolicited e-mail of any kind, while minimizing false positives (i.e., legitimate messages that are marked as spam). (It is very similar to Undertaker and similar in spirit to Software House on the Coasts of Orion's procmailrc. It also uses techniques outlined in Scott Vintinner's Fairly-Secure Anti-SPAM Gateway, except that it is customizable at the individual level.) First, spambouncer (use the FILTER=yes option) or spamassassin (use "rewrite_subject 0", "auto_learn 0", "report_header 1", "use_terse_report 1", "defang_mime 0" in your user_prefs) is used to mark up the message. Then, a whitelist is used to let in known addresses. If the sender is not on the whitelist, various blacklists and borrowed techniques (including the Habeas watermark, razor, DCC, bogofilter, ifile) are used to determine whether the message is "nasty". Only after the message is marked non-nasty are customizable "whitelist" keywords (including company names and friends' names) used, along with some machine learning (again via ifile), to determine whether the recipient has true interest in the mail. The script also deals properly with sneakemail.com embedded addresses and Postini tags. It is released under the GPL.

To use this software, you'll need to do the following:

Ensure you are running The SPAM Bouncer or spamassassin on your ISP's machine, or are otherwise preprocessing your mail with one of them. Use FILTER=yes if you are using The SPAM Bouncer on the same machine as junkicide. The SPAM Bouncer and spamassassin both create markers which indicate that a piece of mail is suspect, and this script leverages those markers. (Note that The SPAM Bouncer catches more spam and has more detailed markers than spamassassin and is the preferred filter.)
Have your trusted e-mail aliases stored in $HOME/.mailaliases and/or $HOME/.mailrc and/or $HOME/bin/resendmail.friends.list and/or $HOME/folders/o1.work. (See the PATHS variable in the script.) The format is unimportant. What is important is that each line have at least one e-mail address of someone you trust.
Have your sent messages in $HOME/folders/o and/or $HOME/folders/o1.work. (See the PATHS and PATHS1 variables in the script.) Besides having the e-mail address of the person that you sent the message to, it's important that the "Subject:" field of the messages you sent be present. "Subject:" must be at the beginning of the line.
Have the list of spammers and/or spam domains in $HOME/bin/junkicide.nasties. (See the NASTIESFILE variable in the script.) An example of this file might be:
```
 onelist.com
.#onelist.com
.onelist.com
@#onelist.com
@onelist.com
[#onelist.com
[onelist.com
errors-256970-0-bizop4u=puffandstuff.com@onelist.com
```
Note that spaces at the beginning of lines are significant and should be added to further filter out SPAM. You can start out with this list of nasties, but lists of spammers and spam/blacklist domains, which should be added to this list regularly, can be found here.
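As a convenience, the sample entries above can be generated for any domain. The following is only a sketch: the helper name nasty_variants and the domain example.invalid are hypothetical, and it simply assumes that every prefix shown in the onelist.com sample is wanted for each new domain.

```shell
# Print blacklist lines for one domain, copying the prefixes used in the
# onelist.com sample (leading space, ".", "@", "[", each with and without "#").
# Assumption: all seven variants are wanted; trim the list if they are not.
nasty_variants() {
    domain=$1
    for prefix in ' ' '.#' '.' '@#' '@' '[#' '['; do
        printf '%s%s\n' "$prefix" "$domain"
    done
}

# Show the lines for a hypothetical domain. To add them for real, redirect
# the output with >> "$HOME/bin/junkicide.nasties" instead.
nasty_variants example.invalid
```

Full addresses, like the last line of the sample, would still have to be added by hand.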
Modify MAINADDR to be your main e-mail address.
Modify ROOTADDR to be your alternative e-mail address.
Modify TRASHDIR to be a directory where temporary files can be stored by you and nobody else. Note: the TRASHDIR name may contain only periods, alphabetic characters, and digits. (There is a complicated security reason why this must be so.)
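A quick way to check a candidate TRASHDIR name against that character restriction is sketched below. The function name valid_trashdir_name is hypothetical, and the sketch assumes the restriction applies to the final path component only, which the text above does not spell out.

```shell
# Succeed only if the last path component is made up of periods, letters,
# and digits. Assumption: only the final component is restricted.
valid_trashdir_name() {
    name=${1##*/}          # strip everything up to the last slash
    case $name in
        ''|*[!.A-Za-z0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Try a few hypothetical names.
for candidate in junkicide.tmp 'my trash' tmp-dir; do
    if valid_trashdir_name "$candidate"; then
        echo "ok: $candidate"
    else
        echo "rejected: $candidate"
    fi
done
```

Once a name passes, creating the directory with mkdir and then restricting it with chmod 700 is one way to keep it readable by you and nobody else.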
Modify LOWPRIORITY to be a file where any suspected, but not definitive, SPAM mail messages should be placed. This file should be examined occasionally to see if any useful mail was received; delete it after examination.
Modify PROCMAIL_TRASH to be a file where you want any designated SPAM to be placed; /dev/null may be acceptable after you have tested the script.
Modify FULLNAMES to be your full name (can be a regular expression).
Modify WORKDOMAINS to be the domain name of your company (before the .com part).
Modify COMPANIES to be the name of your company (if any; can be a regular expression).
Modify KEYWORDS to be an egrep-style regular expression of keywords which, when present, will allow the message to reach you. An example might be:
KEYWORDS="butterflies|poetry"
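Before putting a KEYWORDS expression into the script, it can be tried against some sample text. This is a minimal sketch, not how junkicide itself applies the variable: the helper name has_keyword is hypothetical, grep -E stands in for egrep, and case-insensitive matching is an assumption.

```shell
KEYWORDS="butterflies|poetry"

# Succeed if the text on standard input mentions any keyword.
# Assumption: matching ignores case; the script itself may differ.
has_keyword() {
    grep -E -i -q -e "$KEYWORDS"
}

if printf 'Subject: Poetry reading on Friday\n' | has_keyword; then
    echo "keyword found"
else
    echo "no keyword"
fi
```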
Ensure the following programs are accessible and executable:
- formail
- grep
- fgrep
- egrep
- sed
- echo
- xargs
- wc
- head
- perl
- expr
- printenv
- tee
- sh
- strings
- bc
- ls
- awk
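The list above can be checked in one pass. The sketch below uses a hypothetical helper, missing_programs, and relies on the shell's command -v lookup, so a shell builtin such as echo counts as present.

```shell
# Print the name of every argument that cannot be found as a command.
missing_programs() {
    for prog in "$@"; do
        command -v "$prog" >/dev/null 2>&1 || printf '%s\n' "$prog"
    done
}

missing=$(missing_programs formail grep fgrep egrep sed echo xargs wc head \
    perl expr printenv tee sh strings bc ls awk)

if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Not found in PATH:" $missing
else
    echo "All required programs were found."
fi
```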
Add an INCLUDERC=junkicide.proc line somewhere in your existing .procmailrc. Any solicited e-mail/non-SPAM will pass through. The unsolicited e-mail/SPAM will be put into $PROCMAIL_TRASH.

(Optional.) Include this segment of code before you include "sb.rc" (the spambouncer) or before you filter through spamassassin, but after you set your THISISP (note that THISISP is no longer used by spambouncer, but this code segment still depends upon it):

FORGEDFROM=0
FORGEDRECEIVED=0
:0w
*$!^(from|return-path:)  *<?[a-z][-_.a-z0-9=+]*((@(([a-z][-_a-z0-9]*\.)?$THISISP|\[\]|localhost))?( |$|>))
{
	:0w
	*! ^from:.*[^-_.a-z0-9][a-z0-9][-_.a-z0-9=+]*@[-_.a-z0-9]+\.[a-z]?[a-z][a-z]($|[^-_.a-z0-9])
	{ FORGEDFROM=1 }
	:Ew
	*^received:.*[^-_.a-z0-9@]from ()
	*$^from:.*[^-_.a-z0-9@][a-z][-.a-z0-9]*(@(([a-z][-_a-z0-9]*\.)?$THISISP|localhost)([ >(]|$)|$)
	{ FORGEDFROM=1 }
}
:Ew
{
	:0hw
	HEADER=|formail -c|perl -ne 'print;last if(/^Received:.*by  *([a-z][-_a-z0-9]*\.)?'"$THISISP"'/);exit(1) if (/^$/);'
	:0w
	*^received:.*[^-_.a-z0-9@]from ()
	*$!HEADER??^received:.*[^-_.a-z0-9@]from  *([^ ]+  *\(HELO *)?([a-z][-_a-z0-9]*\.)?$THISISP
	{ FORGEDRECEIVED=1 FORGEDFROM=1 }
}

:0w
*FORGEDFROM??1
{
	VERBOSE=on
	:0w
	*FORGEDRECEIVED??1
	{
		FROM="foo@bar.com"
		:0w
		*HEADER??^received:.*[^-_.a-z0-9@]from  *\/[^ ]+\.[^ ]+
		{
			FROM="somebodybogus@$MATCH"
		}
		:0hfw
		|formail -i Return-Path: -i Reply-To: | grep -v '^From '
	}
	:Ew
	{
		:0hw
		*^return-path:  *<?[a-z0-9][-_.a-z0-9=+]*@[-_.a-z0-9]+\.[a-z]?[a-z][a-z]($|>)
		FROM=|formail -zx Return-Path:|sed -e 's,^<,,;s,>$,,'
		:Ehw
		FROM=|formail -zx 'From '|sed -e 's, .*,,'
	}
	:0hwf
	|formail -R From: X-forged-from: -a "From: $FROM" -i Reply-To:
	VERBOSE=off
}

The above code will get rid of forged return addresses; spambouncer does not check forged addresses completely before referring to its NOBOUNCE, LEGITLISTS, and ALWAYSBLOCK files. Spamassassin does not check addresses which are forged as the e-mail address of the user (i.e., the whitelist won't work as well). (Note that this has only been tested thoroughly with qmail, but not with sendmail, postfix, exim, or smail.)

(Optional.) Set SNEAKEMAILNAME to be your sneakemail.com username.
(Optional.) Get a copy of razor. Once razor-check and razor-report are installed, they will improve spam recognition. (Razor uses fuzzy matching.)
(Optional.) Get a copy of DCC. Once dccproc is installed, it will improve spam recognition.
(Optional.) Have the list of subject lines that you're interested in (one per line) in $HOME/bin/resendmail.subject.list. (See the SUBJLIST variable in the script.) This can be built by passing your sent mail through this procmail code segment:
```
:0whc:$HOME/bin/resendmail.subject.list.lock
|formail -zx Subject | sed -e ':1;s,^R[eE] *: *,,;s,^F[Ww] *: *,,;s,R[eE] *\[[1-9][0-9]*\] *: *,,;t1' >> $HOME/bin/resendmail.subject.list
```
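To see what the sed expression in that recipe does to a subject line, the same substitutions can be run on a few sample subjects outside of procmail. This is only an illustration: the function name normalize_subject and the sample subjects are hypothetical, and the commands are split across several -e options for portability.

```shell
# Strip leading "Re:" and "FW:" markers and "Re[n]:" counters, repeating
# until nothing more is removed (the ":1 ... t1" loop from the recipe).
normalize_subject() {
    sed -e ':1' \
        -e 's,^R[eE] *: *,,' \
        -e 's,^F[Ww] *: *,,' \
        -e 's,R[eE] *\[[1-9][0-9]*\] *: *,,' \
        -e 't1'
}

printf '%s\n' 'Re: FW: Re[2]: lunch on Friday' 'Lunch on Friday' | normalize_subject
```

Both sample lines end in "unch on Friday"; the first has all of its reply and forward markers removed, and the second is passed through unchanged.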
(Optional.) Have the list of the full names of trusted people in $HOME/bin/junkicide.names and/or in $HOME/bin/resendmail.names.list. (See the NAMESLIST1 and NAMESLIST2 variables in the script.) Don't use simple first names -- use full names to stop more SPAM. $HOME/bin/resendmail.names.list can be built by passing your sent mail through this procmail code segment:
```
:0whc:$HOME/bin/resendmail.friends.list.lock
|formail -c -zx To: | sed -e 's,[ 	],,g;s/"\([^[" ]*\)[^"]*" *<ticket[^>]*@sneakemail\.com>/\1/g;s/[^,]*<\([^>]*\)>/\1/g;s,\([^(]*\)([^)]*),\1,g;s/,/\
/g' | (fgrep @ || true) >> $HOME/bin/resendmail.friends.list
```
(Optional.) Have the subject lines of messages that you've sent in the past be in $HOME/folders/archive/subject_line where subject_line is the lowercased subject of the message you sent with underscores (_) substituted for spaces. (I have a program which will generate this from sent mail messages which can be made available if there is interest.)
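The naming rule described above (lowercase the subject, replace spaces with underscores) can be sketched as follows. This is not the author's program: the function name subject_to_filename is hypothetical, and characters other than spaces (for example "/") are left alone because the text does not say how they are handled.

```shell
# Turn a subject line into the file name used under $HOME/folders/archive:
# lowercase it, then replace each space with an underscore.
subject_to_filename() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr ' ' '_'
}

subject_to_filename 'Lunch on Friday'
```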
(Optional.) Get a copy of bogofilter. (It uses improved Bayesian methods, which are allegedly better than those of ifile.) You'll need to teach it SPAM. Ideally, on a weekly basis, you should feed $TRASHDIR/junkicide.spam (see the TEACHFOLDER variable in the script) to bogofilter -s. (Be sure to delete $TRASHDIR/junkicide.spam afterwards, or at least delete it occasionally.) Also, all mail that you send and all valid non-spam that you receive should be fed to bogofilter -n. Note that you need to teach bogofilter both spam and non-spam; otherwise it will not know the difference. My .bogofilter/wordlist.db file is publicly available and can be grabbed here; you should add your own non-spam to it with bogofilter -n. A spamlist.db can be created by feeding this archive of spam into bogofilter -s.
(Optional.) Get a copy of ifile. (It uses Bayesian methods.) As with bogofilter, you'll need to teach it SPAM, usually on a weekly basis. To train ifile so it knows the difference between spam and non-spam, you can start with my .idata file, if you'd like, and add all your sent email as "nonspam" to this.
(Optional.) Get a copy of spambayes. As with bogofilter, you'll need to teach it SPAM, usually on a weekly basis. To train spambayes so it knows the difference between spam and non-spam, you can start with my spambayes database, if you'd like.
(Optional.) Get a copy of spamprobe. As with spambayes, you'll need to teach it SPAM, usually on a weekly basis. To train spamprobe so it knows the difference between spam and non-spam, you can start with a tar file of my spamprobe database, if you'd like.
(Optional.) Get a copy of dspam. As with spamprobe, you'll need to teach it SPAM, usually on a weekly basis.

This software has only been tested under Debian 2.2 GNU/Linux using qmail.

Here's the script: junkicide.proc


Date Last Modified: Sat Aug 15 16:20:42 UTC 2009