This procmail script will
kill/delete any mail messages which are suspected of being SPAM or
unsolicited e-mail of any kind while minimizing false positives (i.e.,
legitimate messages wrongly marked as spam).
(It is very similar to
Undertaker
and similar in spirit to
Software House on the Coasts of Orion's procmailrc. It also uses techniques outlined in
Scott Vintinner's Fairly-Secure Anti-SPAM Gateway, except that it is customizable at the individual-user level.)
First,
spambouncer
(use the FILTER=yes option)
or spamassassin
(use "rewrite_subject 0", "auto_learn 0", "report_header 1",
"use_terse_report 1", "defang_mime 0" in your user_prefs)
is used to mark up the message. Then, a whitelist is used to let in known addresses. If this
fails, various blacklists and borrowed techniques (including the
Habeas watermark,
razor, DCC,
bogofilter,
ifile) are used to determine if the message
is "nasty"; only after the message is marked non-nasty are customizable
"whitelist" keywords (including company names and friends' names) used,
along with some machine learning (again via ifile), to
determine if the recipient has true interest in the mail. It also deals
with
sneakemail.com embedded addresses
and Postini tags
properly. It is released under the
GPL.
To use this software, you'll need to do the following:
- Ensure you are running
The
SPAM Bouncer or spamassassin
on your ISP's machine, or are otherwise preprocessing
your mail using it. Use FILTER=yes if you are using
SPAM Bouncer on the same machine as junkicide.
The
SPAM Bouncer and
spamassassin
both create markers which indicate that a piece of mail
is suspect and this script leverages those markers. (Note that
The
SPAM Bouncer catches more spam and has more detailed markers than
spamassassin
and is the preferred filter.)
- Have your trusted e-mail aliases stored in
$HOME/.mailaliases and/or
$HOME/.mailrc and/or
$HOME/bin/resendmail.friends.list and/or
$HOME/folders/o1.work.
(See the PATHS variable in the script.)
The format is unimportant.
What is important is that each line have at least one e-mail address
of someone you trust.
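The whitelist lookup described above can be pictured with a minimal shell sketch. The helper name is_trusted is hypothetical and is not part of junkicide.proc; the sketch assumes a literal, case-insensitive substring match, which may differ from what the script actually does.

```shell
#!/bin/sh
# Hypothetical helper (not part of junkicide.proc): succeed if ADDRESS occurs
# in at least one of the given alias files. The match is literal and
# case-insensitive, so the file format does not matter.
is_trusted() {
    address=$1
    shift
    for file in "$@"; do
        # Every alias file is optional; skip the ones that are missing.
        [ -r "$file" ] || continue
        if grep -F -i -q -- "$address" "$file"; then
            return 0
        fi
    done
    return 1
}

# Example call (commented out because it depends on your own files):
# is_trusted friend@example.org "$HOME/.mailaliases" "$HOME/.mailrc" && echo trusted
```

Because this is a substring match, a short address can also match inside a longer one, so it is only a rough way to check by hand whether an address is already covered.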
- Have your sent messages in
$HOME/folders/o and/or
$HOME/folders/o1.work.
(See the PATHS and PATHS1 variables
in the script.)
Besides having the e-mail
address of the person that you sent the message to, it's important
that the "Subject:" field of the messages you sent be present.
"Subject:" must be at the beginning of the line.
- Have the list of spammers and/or spam domains
in $HOME/bin/junkicide.nasties.
(See the NASTIESFILE variable in the script.)
An example of
this file might be:
onelist.com
.#onelist.com
.onelist.com
@#onelist.com
@onelist.com
[#onelist.com
[onelist.com
errors-256970-0-bizop4u=puffandstuff.com@onelist.com
Note that spaces at the beginning of lines are significant and should
be added to further filter out SPAM. You can start out with this
list of nasties; sources of spammer addresses and
spam/blacklist domain lists, which should be added regularly to this
list, can be found
here.
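To add a new domain in the same shape as the example entries, something like the following sketch could be used. The function name nasties_for_domain is hypothetical; it only reproduces the seven prefix variants shown in the example and does not add the leading spaces mentioned above, which you would still add by hand where you want them.

```shell
#!/bin/sh
# Hypothetical helper: print the prefix variants used in the example
# junkicide.nasties entries for one domain, one per line.
nasties_for_domain() {
    domain=$1
    for prefix in "" ".#" "." "@#" "@" "[#" "["; do
        printf '%s%s\n' "$prefix" "$domain"
    done
}

# Example call (commented out because it modifies your nasties file):
# nasties_for_domain onelist.com >> "$HOME/bin/junkicide.nasties"
```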
- Modify MAINADDR to be your main e-mail address.
- Modify ROOTADDR to be your alternative e-mail address.
- Modify TRASHDIR to be a directory where temporary
files can be stored by you and nobody else. Note:
the TRASHDIR name may contain only periods, alphabetic
characters, and digits. (There is a complicated security reason
why this must be so.)
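A minimal sketch of creating such a directory follows. The function name make_trashdir is hypothetical, and the sketch assumes that the naming restriction applies to the last path component only (a full path necessarily contains slashes); the script itself may be stricter.

```shell
#!/bin/sh
# Hypothetical helper: create a private directory for TRASHDIR.
# Assumption: only the final path component must consist of periods,
# letters, and digits.
make_trashdir() {
    dir=$1
    name=$(basename "$dir")
    case $name in
        ''|*[!.A-Za-z0-9]*)
            echo "unsuitable TRASHDIR name: $name" >&2
            return 1
            ;;
    esac
    # Mode 700: readable and writable by the owner and nobody else.
    mkdir -p "$dir" && chmod 700 "$dir"
}

# Example call (commented out; pick your own location):
# make_trashdir "$HOME/junkicide.tmp"
```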
- Modify LOWPRIORITY to be a file where
any suspected but not definitive SPAM mail messages should be placed.
This file should be examined occasionally to see if any useful mail
was received; delete it after examination.
- Modify PROCMAIL_TRASH to be a file where you want
any designated SPAM to be placed; /dev/null may be
acceptable after you have tested the script.
- Modify FULLNAMES to be your full name (can be a
regular expression).
- Modify WORKDOMAINS to be the domain name of your
company (before the .com part).
- Modify COMPANIES to be the name of your company (if
any; can be a
regular
expression).
- Modify KEYWORDS to be an egrep-style
regular
expression of keywords which, when present, will allow the message
to reach you. An example might be:
KEYWORDS="butterflies|poetry"
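Before relying on a KEYWORDS expression, it can be tried against a few sample lines. The sketch below uses a hypothetical function, matches_keywords, and assumes case-insensitive matching; whether junkicide.proc itself ignores case when applying KEYWORDS is not stated here.

```shell
#!/bin/sh
# The example value from above; replace it with your own expression.
KEYWORDS="butterflies|poetry"

# Hypothetical helper: succeed if the given text matches KEYWORDS.
# Assumption: matching is case-insensitive.
matches_keywords() {
    printf '%s\n' "$1" | grep -E -i -q -- "$KEYWORDS"
}

# Example call:
# matches_keywords "Subject: Poetry reading on Friday" && echo "would be let through"
```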
- Ensure the following programs are accessible and executable:
- formail
- grep
- fgrep
- egrep
- sed
- echo
- xargs
- wc
- head
- perl
- expr
- printenv
- tee
- sh
- strings
- bc
- ls
- awk
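One way to check the list above is a short loop over the program names. This is only a sketch; missing_programs is a hypothetical name, and the check covers the PATH of the shell you run it from, which may differ from the PATH procmail uses.

```shell
#!/bin/sh
# Hypothetical helper: print each named program that cannot be found
# on the current PATH. Prints nothing when everything is available.
missing_programs() {
    for prog in "$@"; do
        command -v "$prog" >/dev/null 2>&1 || printf '%s\n' "$prog"
    done
}

# Check the programs listed above.
missing_programs formail grep fgrep egrep sed echo xargs wc head perl \
    expr printenv tee sh strings bc ls awk
```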
- Stick an INCLUDERC=junkicide.proc line somewhere in your
existing .procmailrc. Any solicited e-mail/non-SPAM will
pass through. The unsolicited e-mail/SPAM will be put into
$PROCMAIL_TRASH.
- (Optional.)
Include this segment of code before you include "sb.rc" (the
spambouncer)
or before you filter through
spamassassin,
but after you set your THISISP (note that
THISISP
is no longer used by spambouncer, but this code segment still depends
upon it):
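# Flags set to 1 by the recipes below when a forgery is detected.
FORGEDFROM=0
FORGEDRECEIVED=0
# Case 1: the envelope sender ("From " line or Return-Path:) is not local.
:0w
*$!^(from|return-path:) *<?[a-z][-_.a-z0-9=+]*((@(([a-z][-_a-z0-9]*\.)?$THISISP|\[\]|localhost))?( |$|>))
{
  # The From: header holds no syntactically valid address.
  :0w
  *! ^from:.*[^-_.a-z0-9][a-z0-9][-_.a-z0-9=+]*@[-_.a-z0-9]+\.[a-z]?[a-z][a-z]($|[^-_.a-z0-9])
  { FORGEDFROM=1 }
  # Otherwise: the From: header nevertheless claims a local address.
  :Ew
  *^received:.*[^-_.a-z0-9@]from ()
  *$^from:.*[^-_.a-z0-9@][a-z][-.a-z0-9]*(@(([a-z][-_a-z0-9]*\.)?$THISISP|localhost)([ >(]|$)|$)
  { FORGEDFROM=1 }
}
# Case 2: the envelope sender looks local; check the Received: trail.
:Ew
{
  # Collect the headers up to and including the first Received: line written by $THISISP.
  :0hw
  HEADER=|formail -c|perl -ne 'print;last if(/^Received:.*by *([a-z][-_a-z0-9]*\.)?'"$THISISP"'/);exit(1) if (/^$/);'
  # The collected headers contain no Received: line naming $THISISP as the sending host.
  :0w
  *^received:.*[^-_.a-z0-9@]from ()
  *$!HEADER??^received:.*[^-_.a-z0-9@]from *([^ ]+ *\(HELO *)?([a-z][-_a-z0-9]*\.)?$THISISP
  { FORGEDRECEIVED=1 FORGEDFROM=1 }
}
# Rewrite the headers of messages flagged as forged.
:0w
*FORGEDFROM??1
{
  VERBOSE=on
  :0w
  *FORGEDRECEIVED??1
  {
    FROM="foo@bar.com"
    :0w
    *HEADER??^received:.*[^-_.a-z0-9@]from *\/[^ ]+\.[^ ]+
    {
      FROM="somebodybogus@$MATCH"
    }
    :0hfw
    |formail -i Return-Path: -i Reply-To: | grep -v '^From '
  }
  :Ew
  {
    # Prefer a well-formed Return-Path:, else fall back to the "From " line.
    :0hw
    *^return-path: *<?[a-z0-9][-_.a-z0-9=+]*@[-_.a-z0-9]+\.[a-z]?[a-z][a-z]($|>)
    FROM=|formail -zx Return-Path:|sed -e 's,^<,,;s,>$,,'
    :Ehw
    FROM=|formail -zx 'From '|sed -e 's, .*,,'
  }
  # Keep the original From: as X-forged-from: and substitute the derived sender.
  :0hwf
  |formail -R From: X-forged-from: -a "From: $FROM" -i Reply-To:
  VERBOSE=off
}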
The above code will get rid of forged return addresses; spambouncer
does not check forged addresses completely before referring to its
NOBOUNCE, LEGITLISTS, and ALWAYSBLOCK files.
Spamassassin does not check addresses which are forged as the e-mail
address of the user (i.e., the whitelist won't work as well).
(Note that this has been tested thoroughly only with qmail,
not with sendmail, postfix, exim, or
smail.)
- (Optional.) Set SNEAKEMAILNAME to be your
sneakemail.com username.
- (Optional.)
Get a copy of razor.
Once razor-check and razor-report
are installed, they will improve spam recognition. (Razor uses
fuzzy matching.)
- (Optional.)
Get a copy of DCC.
Once dccproc is installed, it will improve spam
recognition.
- (Optional.)
Have the list of subject lines that you're interested in (one per line)
in $HOME/bin/resendmail.subject.list.
(See the SUBJLIST variable in the script.) This can
be built by passing your sent mail through this procmail code segment:
:0whc:$HOME/bin/resendmail.subject.list.lock
|formail -zx Subject | sed -e ':1;s,^R[eE] *: *,,;s,^F[Ww] *: *,,;s,R[eE] *\[[1-9][0-9]*\] *: *,,;t1' >> $HOME/bin/resendmail.subject.list
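To see what the sed expression in this recipe does to a subject line, it can be run by hand. The sketch below wraps the same three substitutions in a hypothetical function, normalize_subject, and passes them as separate -e expressions, which behaves the same but does not rely on a label being followed by a semicolon.

```shell
#!/bin/sh
# Hypothetical helper: strip reply/forward prefixes from one subject line,
# using the same substitutions as the recipe above. The loop (label 1,
# "t1") repeats until no substitution applies any more.
normalize_subject() {
    printf '%s\n' "$1" | sed -e ':1' \
        -e 's,^R[eE] *: *,,' \
        -e 's,^F[Ww] *: *,,' \
        -e 's,R[eE] *\[[1-9][0-9]*\] *: *,,' \
        -e 't1'
}

# Example call:
# normalize_subject 'Re: FW: Re[2]: lunch plans'
```

Note that the patterns only recognize the spellings "Re"/"RE" and "FW"/"Fw"; other prefixes, such as "Fwd:", are left in place.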
- (Optional.)
Have the list of the full names of trusted people
in $HOME/bin/junkicide.names and/or in
$HOME/bin/resendmail.names.list. (See the
NAMESLIST1 and NAMESLIST2 variables
in the script.) Don't use simple first names -- use full names to
stop more SPAM. $HOME/bin/resendmail.names.list can
be built by passing your sent mail through this procmail code segment:
:0whc:$HOME/bin/resendmail.friends.list.lock
|formail -c -zx To: | sed -e 's,[ ],,g;s/"\([^[" ]*\)[^"]*" *<ticket[^>]*@sneakemail\.com>/\1/g;s/[^,]*<\([^>]*\)>/\1/g;s,\([^(]*\)([^)]*),\1,g;s/,/\
/g' | (fgrep @ || true) >> $HOME/bin/resendmail.friends.list
- (Optional.)
Have the subject lines of messages that you've sent in the past be
in $HOME/folders/archive/subject_line where
subject_line is the lowercased subject of the message you sent
with underscores (_) substituted for spaces.
(I have a program which will generate this from sent mail messages
which can be made available if there is interest.)
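The naming rule itself (lowercase, underscores for spaces) is simple enough to sketch in shell. The function name subject_to_filename is hypothetical, and the sketch does nothing about other characters that are awkward in file names, such as slashes; how the author's own program treats those is not described here.

```shell
#!/bin/sh
# Hypothetical helper: turn a subject into the archive file name described
# above by lowercasing it and replacing each space with an underscore.
subject_to_filename() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr ' ' '_'
}

# Example call:
# touch "$HOME/folders/archive/$(subject_to_filename 'Lunch Plans Friday')"
```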
- (Optional.)
Get a copy of bogofilter.
(It uses
improved
Bayesian methods which are allegedly better than those of
ifile.)
You'll need to teach it SPAM. On a weekly basis, ideally, you
should feed $TRASHDIR/junkicide.spam (see the
TEACHFOLDER variable in the script) to
bogofilter -s. (Be sure to delete
$TRASHDIR/junkicide.spam after this or at least
delete it occasionally.) Also, all mail that you send and all valid
non-spam that you receive should be fed
to bogofilter -n. Note that you need to teach bogofilter
both spam and non-spam, otherwise it will not know the difference.
My .bogofilter/wordlist.db file is publicly available and can be grabbed
here; on top of this, you should train your own
nonspam with bogofilter -n. A spamlist.db can be
created by feeding this
archive of spam into bogofilter -s.
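The weekly spam-teaching step could be scripted roughly as follows. This is a sketch under assumptions: teach_spam and the BOGOFILTER override variable are hypothetical names, and the file is fed to bogofilter -s exactly as described above. If the file holds several messages in mbox format, check the bogofilter manual for its mbox handling before relying on this.

```shell
#!/bin/sh
# The program to run; overridable (for example for a dry run).
# The default is the real bogofilter.
BOGOFILTER=${BOGOFILTER:-bogofilter}

# Hypothetical helper: register the collected spam file with bogofilter
# and delete the file afterwards, but only if the registration succeeded.
teach_spam() {
    spamfile=$1
    # Nothing collected since the last run: nothing to do.
    [ -s "$spamfile" ] || return 0
    "$BOGOFILTER" -s < "$spamfile" && rm -f "$spamfile"
}

# Example call (commented out; TRASHDIR must be set as in junkicide.proc):
# teach_spam "$TRASHDIR/junkicide.spam"
```

Removing the file only after a successful run means a failed training run does not lose the collected spam.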
- (Optional.)
Get a copy of ifile.
(It uses
Bayesian methods.)
As with bogofilter, you'll need to teach it SPAM,
usually on a weekly basis.
To train ifile so it knows the difference between spam and non-spam,
you can start with my .idata file, if you'd like,
and add all your sent email as "nonspam" to this.
- (Optional.)
Get a copy of spambayes.
As with bogofilter, you'll need to teach it SPAM,
usually on a weekly basis.
To train spambayes so it knows the difference between spam and non-spam,
you can start with my spambayes database,
if you'd like.
- (Optional.)
Get a copy of spamprobe.
As with spambayes, you'll need to teach it SPAM,
usually on a weekly basis.
To train spamprobe so it knows the difference between spam and non-spam,
you can start with
a tar file of my spamprobe database,
if you'd like.
- (Optional.)
Get a copy of dspam.
As with spamprobe, you'll need to teach it SPAM,
usually on a weekly basis.
This software has only been tested under
Debian 2.2 GNU/Linux
using qmail.
Here's the script: junkicide.proc
Date Last Modified: Sat Aug 15 16:20:42 UTC 2009