Stopping Incoming Spam

As a mail server administrator, you probably want to minimize the amount of spam your system processes. Numerous techniques for combating spam have been developed over the years. Unfortunately, none is perfect. All spam-fighting techniques miss some incoming spam (so-called misses) or misclassify some legitimate e-mail as spam (so-called false positives). You can select one or more spam-fighting techniques to minimize one or both of these types of errors, and you may be able to adjust a spam criterion to decrease one type of error at the expense of increasing the other. It's usually better to let a spam through than to delete a legitimate e-mail, so most antispam tools are weighted to produce misses rather than false positives.
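The trade-off between misses and false positives can be illustrated with a small Python sketch. It assumes a hypothetical filter that assigns each message a spam score between 0 and 1. The scores and labels are made-up illustration data, not measurements from any real tool.

```python
# Hypothetical (score, is_spam) pairs; the values are invented for illustration.
SCORED_MAIL = [
    (0.95, True), (0.70, True), (0.40, True),
    (0.60, False), (0.20, False), (0.05, False),
]

def count_errors(scored, threshold):
    """Return (misses, false_positives) when mail scoring >= threshold is blocked."""
    misses = sum(1 for score, is_spam in scored if is_spam and score < threshold)
    false_positives = sum(
        1 for score, is_spam in scored if not is_spam and score >= threshold
    )
    return misses, false_positives

for threshold in (0.3, 0.5, 0.9):
    print(threshold, count_errors(SCORED_MAIL, threshold))
```

With this data, raising the threshold from 0.5 to 0.9 removes the one false positive but doubles the misses, which is the kind of weighting most antispam tools favor.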

This section begins with a summary of the most popular techniques used to fight spam. It continues with more information on two of the most popular techniques: blackhole lists and Procmail filters.

A Rundown of Spam-Fighting Techniques

Spam-fighting techniques range from tools designed to stop spam before it reaches your mail servers to those that scan message contents in an effort to identify spams based on the language they contain. As a general rule, spam-fighting tools fall into one or more of several categories:

Blackhole Lists Spammers tend to send spam from the same computers time after time. The spammer may control the IP address or may be abusing a server that's misconfigured as an open relay, meaning that it will relay mail from any computer to any other computer. In any event, this characteristic means that your mail server can reject mail based on its originating IP address. This technique generally uses one of dozens of blackhole lists, which are accessible from servers that use the DNS protocols in a unique way. If the blackhole list's DNS server returns a special address for an IP address, that address is on the blackhole list and the mail may well be a spam. Check http://www.declude.com/junkmail/support/ip4r.htm for a very long list of blackhole lists and a brief description of the criteria each uses for listing IP addresses.
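The "unique" use of DNS is easier to see in code. By convention, a blackhole list is queried by reversing the octets of the sender's IPv4 address and prepending them to the list's DNS zone. If the resulting name resolves, the address is listed. The Python sketch below assumes that convention. The zone blackhole.list.address is the placeholder used later in this section, and the function names are hypothetical.

```python
import socket

def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address in a blackhole list zone."""
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError(f"not an IPv4 address: {ip!r}")
    # 192.0.2.99 in zone example.zone becomes 99.2.0.192.example.zone
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ip: str, zone: str) -> bool:
    """Return True if the blackhole list has an entry for ip (needs network access)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except OSError:
        # No such name. Note that a DNS outage also lands here, so this
        # sketch treats "lookup failed" the same as "not listed".
        return False
    return True

print(dnsbl_query_name("192.0.2.99", "blackhole.list.address"))
```

A mail server does this lookup itself once you enable its blackhole list support, so a script like this is mainly useful for checking by hand whether a given address is listed.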

Mail Server Pattern Matches Some mail servers support pattern matching tools that can be used to spot suspicious data in mail headers or even in message texts. For instance, if your system regularly receives mail with a forged return address of [email protected], you can have the mail server refuse such mail, independently of the IP address of the sending system. Exim has very extensive pattern matching tools, and Postfix's tools are not far behind in this respect.

Post-Server Pattern Matches Many mail configurations use a program known as Procmail (http://www.procmail.org) to add filtering capabilities to e-mail after the fact. Procmail uses a pattern-matching file built on recipes, which are descriptions of patterns and actions to be taken in response to those patterns. You can use Procmail to filter message headers or bodies. You can also use Procmail to "glue together" other spam-fighting tools.

Spam Databases One unusual spam-fighting tool is Vipul's Razor (http://razor.sourceforge.net). This system uses an online database of known spam messages. The database includes a checksum for each message. Your computer computes a checksum for each message it receives and looks for that checksum in the database. If a match is found, the message is presumed to be spam and can be discarded or flagged. If no match is found, the message is either legitimate or is a very recent spam. (Vipul's Razor relies on rapid addition of spams to its database.)
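The checksum-lookup idea can be sketched in a few lines of Python. This is not Vipul's Razor's actual algorithm or protocol. The SHA-256 hash, the whitespace and case normalization, and the in-memory set standing in for the online database are all assumptions made for illustration.

```python
import hashlib

def message_checksum(body: str) -> str:
    """Checksum a message body after collapsing whitespace and lowercasing it."""
    normalized = " ".join(body.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def is_known_spam(body: str, database: set) -> bool:
    """Return True if the message's checksum appears in the spam database."""
    return message_checksum(body) in database

# Hypothetical stand-in for the online database of known spam checksums.
known_spam = {message_checksum("Buy cheap pills now!")}

print(is_known_spam("Buy   cheap pills\nNOW!", known_spam))
print(is_known_spam("Lunch on Monday?", known_spam))
```

A plain checksum like this one is defeated by changing a single word in each copy of a spam, which is one reason such systems depend on spams being added to the database quickly.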

Statistical Filters The latest spam-fighting craze is statistical (or Bayesian, after the statistical rule they most commonly employ) spam filtering. Statistical spam filtering took off after Paul Graham described an effective statistical filter in his "Plan for Spam" essay (http://www.paulgraham.com/spam.html). This field is rapidly changing and includes both stand-alone filters, such as Graham's original filter and Bogofilter (http://sourceforge.net/projects/bogofilter/), and filters implemented in individual mail clients.
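To give a feel for how a statistical filter reaches a verdict, here is a minimal naive Bayes sketch in Python. It is a textbook formulation with add-one smoothing, not Graham's exact formula or Bogofilter's implementation. The training messages are invented, and real filters use far larger corpora and smarter tokenizing.

```python
import math
from collections import Counter

def train(messages):
    """Count, for each word, how many of the given messages contain it."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts

def spam_probability(text, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate the probability that text is spam from per-word evidence."""
    log_odds = math.log(n_spam / n_ham)  # prior odds from the corpus sizes
    for token in set(text.lower().split()):
        p_spam = (spam_counts[token] + 1) / (n_spam + 2)  # add-one smoothing
        p_ham = (ham_counts[token] + 1) / (n_ham + 2)
        log_odds += math.log(p_spam / p_ham)
    # Convert log-odds to a probability without overflowing math.exp.
    if log_odds >= 0:
        return 1 / (1 + math.exp(-log_odds))
    odds = math.exp(log_odds)
    return odds / (1 + odds)

# Invented training data for illustration only.
spam = ["buy cheap pills now", "cheap pills online"]
ham = ["meeting agenda for monday", "lunch on monday"]
spam_counts, ham_counts = train(spam), train(ham)

print(spam_probability("cheap pills", spam_counts, ham_counts, len(spam), len(ham)))
print(spam_probability("monday agenda", spam_counts, ham_counts, len(spam), len(ham)))
```

Because the word counts come from one person's mail, the same code trained on another person's mail can reach different verdicts, which is why these filters tune so well to individual users.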

Each of these techniques has its strong and weak points. Blackhole lists and mail server pattern matches can block spam before the bulk of the spam is transmitted to your system, which can conserve bandwidth and CPU time. On the other hand, these techniques tend to be fairly indiscriminate; they produce a lot of false positives. (In-server pattern matches can be tuned, of course; a conservative set of rules may produce few false positives, but it probably won't catch much spam, either.) The remaining techniques require that spams be received in full before they can be analyzed, so they can't help to conserve your network bandwidth. Statistical filters and pattern matches can be tuned very well for individual users, but when applied to a large site, they're more likely to miss spam or produce false positives, because different individuals receive different legitimate and spam e-mail. Spam databases can be very effective, but they are also subject to "poisoning"—a widely distributed but legitimate e-mail (such as a legitimate mailing list's posting) can be added to the database in error or in malice, causing problems for many users.

For these reasons, many sites employ several antispam techniques. For instance, a conservative blackhole list or set of in-server pattern matches might block some of the most egregious spam, while Procmail filters or a statistical filter can be used to block more spam. These filters might be supplemented by filters customized for individual users. One very popular tool that attempts to combine many spam-fighting techniques, but that emphasizes pattern matches, is SpamAssassin (http://spamassassin.org).

Using Blackhole Lists

Blackhole list configuration varies greatly from one mail server to another, but all of the major mail servers support this method of fighting spam. In all cases, you need the address of a blackhole list's server to begin, so consult the list at http://www.declude.com/junkmail/support/ip4r.htm, or some other list, to locate one.

If you're using sendmail, you must add a line to your m4 configuration file and rebuild the sendmail.cf file in order to use a blackhole list. The line in question looks something like this:

FEATURE(dnsbl, `blackhole.list.address', `Rejected - see http://blackhole.list.website/')

This line tells sendmail to use the blackhole list whose server is located at blackhole.list.address. It also includes a message in bounced mail to check http://blackhole.list.website for further information. This message can be important for users whose legitimate mail runs afoul of the spam filter. These users can read the website and send mail from another account or complain to their own mail administrators, who should be in a position to take corrective measures.

The Postfix blackhole list configuration involves two lines. The first sets the address or addresses of blackhole list servers and the second tells Postfix to use those servers:

maps_rbl_domains = blackhole.list.address
smtpd_client_restrictions = reject_maps_rbl

Note The smtpd_client_restrictions option can take additional values, such as reject_unknown_client, to implement additional types of antispam measures. Consult the Postfix documentation for details.

Exim provides extensive support for blackhole lists. The most important exim.conf option is rbl_domains, in which you list the blackhole list server:

rbl_domains = blackhole.list.address

Ordinarily, an entry like this causes the server to reject mail from sites in the blackhole list. You can append the string /warn to the address, though, to cause Exim to add a warning header to the mail rather than reject it outright. Other filters, such as a Procmail filter, might use this header to flag a message for stricter spam-detection processing. Consult the Exim documentation for details on additional blackhole list options.

Using Procmail Filters

Most Linux mail servers either use Procmail by default or can be configured to do so by setting a configuration file option. If you follow the instructions outlined in the next few paragraphs and find that Procmail isn't working, you can try creating a .forward file in your home directory that contains the following line:

"|/path/to/procmail"

Replace /path/to with the name of the directory in which the procmail binary resides. If even this doesn't work, you may need to consult the documentation for Procmail or for your mail server. Once Procmail is in the picture, the system reads the global /etc/procmailrc configuration file and the .procmailrc file in users' home directories. These files contain Procmail recipes, which take the following form:

:0 [flags] [:[lockfile]]
[conditions]
action

Warning The system-wide /etc/procmailrc file is usually read and processed as root. This fact means that a poorly designed recipe in that file could do serious damage. For instance, a typo could cause Procmail to overwrite an important system binary rather than use that binary to process a message. For this reason, you should keep system-wide Procmail processing to a minimum and instead focus on using ~/.procmailrc to process mail using individuals' accounts.

Each recipe begins with the string :0. Various flags may follow, as summarized in Table 25.4. You can combine these flags to produce more complex effects. For instance, using flags of HB causes matching to be done on both the message headers and the body. The lockfile is the name of a file that Procmail uses to signal that it's working with a file. If Procmail sees a lockfile, it delays work on the affected file until the lockfile disappears. Ordinarily, a single colon (:) suffices for this function; Procmail then picks a lockfile name itself. You can specify a filename if you prefer, though.

Table 25.4: Common Procmail Recipe Flags

Flag    Meaning
H       Matching is done to the message headers. (This is the default.)
B       Matching is done to the message body.
D       Matching is done in a case-sensitive manner. (The default is a case-insensitive match.)
c       Matching is done on a "carbon copy" of the message. The "original" is passed on for matching against subsequent recipes. This flag is generally used within nesting blocks (described shortly).
w       Procmail waits for the action to complete. If it doesn't complete successfully, the message is matched against subsequent recipes.
W       The same as a flag of w, but it suppresses program failure messages.

The conditions in a Procmail recipe are essentially ordinary regular expressions, but each conditions line begins with an asterisk. Most characters in a regular expression match against the same characters in the message, but there are exceptions. For instance, a caret (^) denotes the start of a line, a dot (.) matches any single character except for a newline, and the combination of a dot and an asterisk (.*) denotes a string of any length. A regular expression may include a string in parentheses, often with a vertical bar (|) within it. This condition denotes a match against the string on either side of the vertical bar. A backslash (\) effectively undoes special formatting in the following character; for instance, to match an asterisk, you would specify the string \*. An exclamation mark (!) reverses the sense of a match, so that a recipe matches any message that does not meet the specified criteria. Each recipe can have zero, one, or more conditions. (Using no conditions is usually done within nesting blocks or for backing up messages when you experiment with new recipes.) If a recipe includes several conditions, all must match for the recipe to apply. The Procmail man page describes these regular expressions in more detail.
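The matching rules just described can be approximated in Python, which is handy for trying out a pattern before you trust it with real mail. The sketch below is only an emulation under stated assumptions. Python's re module is not Procmail's regular expression engine, the function names are hypothetical, and the sample headers (including the example.org address) are invented.

```python
import re

def condition_matches(condition, text, case_sensitive=False):
    """Test one condition (the part after the asterisk) against text."""
    condition = condition.strip()
    negate = condition.startswith("!")        # ! reverses the sense of the match
    pattern = condition[1:].lstrip() if negate else condition
    flags = re.MULTILINE                      # let ^ match at the start of any line
    if not case_sensitive:                    # emulates the default (no D flag)
        flags |= re.IGNORECASE
    found = re.search(pattern, text, flags) is not None
    return found != negate

def recipe_applies(conditions, text, case_sensitive=False):
    """A recipe applies only if every condition matches; no conditions always match."""
    return all(condition_matches(c, text, case_sensitive) for c in conditions)

headers = (
    "From: someone@example.org\n"
    "To: user@pangaea.edu\n"
    "Subject: Hello     there\n"
)

print(recipe_applies([r"^Subject:.*     "], headers))
print(recipe_applies([r"^Subject:.*     ", r"!^From: .*@example\.org"], headers))
```

The second call fails because the negated From: condition does not hold, even though the Subject: condition does, which mirrors the rule that all conditions must match.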

Finally, a Procmail recipe ends with a single line that tells it what to do—the action. An action line may be any of several things:

A Filename Reference Procmail stores the message in the named file in mbox format. To store messages in the maildir format, append a slash (/) to the end of the filename. For spam fighting, one effective but drastic measure is to store spam in /dev/null. This action effectively deletes the spam.

An External Program If the action line begins with a vertical bar (|), Procmail treats the line as a program to be executed. You can use this feature to pass processing on to another tool, such as a statistical spam filter.

An E-Mail Address An exclamation mark (!) at the start of a line denotes an e-mail address; Procmail sends the message to the specified address instead of delivering it locally.

A Nesting Block An action line that begins with an open curly brace ({) denotes a nested recipe. The nested recipe takes the same form as any other recipe, but it is used only if the surrounding recipe matches the message. The nested recipe ends with a close curly brace (}).
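The four kinds of action line are distinguished by how the line starts or ends, which a short Python sketch can summarize. This is a simplified reading of the rules above, not Procmail's parser, and the function name and the sample addresses are hypothetical.

```python
def classify_action(line):
    """Return (kind, argument) for a Procmail action line, per the rules above."""
    action = line.strip()
    if action.startswith("|"):
        return ("program", action[1:].strip())   # pipe the message to a program
    if action.startswith("!"):
        return ("forward", action[1:].strip())   # send to another e-mail address
    if action.startswith("{"):
        return ("nesting block", "")             # nested recipes follow
    if action.endswith("/"):
        return ("maildir", action)               # trailing slash: maildir format
    return ("mbox", action)                      # anything else: an mbox file

for sample in ("/dev/null", "spam/", "!user@example.org",
               '|/usr/local/bin/spam-block "mail with bright red text"', "{"):
    print(classify_action(sample))
```

Note that /dev/null is classified as an ordinary mbox file. Procmail doesn't treat it specially; writing the message there simply discards it.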

As an example, consider Listing 25.2, which demonstrates many of the features of Procmail recipes. I have found variants of these recipes to be effective at blocking many spams, but of course your experience may differ.

Listing 25.2: Sample Procmail Recipes

# Don't apply recipes to postmaster
:0
*!^To:.*[email protected](pangaea\.edu|smtp\.pangaea\.edu)
{
   # Block mail with more than five spaces in the Subject: header,
   # unless it's from the local fax subsystem
   :0
   *^Subject:.*     .*
   *!^From: [email protected]\.pangaea\.edu \(Fax Getty\)
   /dev/null

   # Pass mail with bright red text through a custom spam blocking script
   :0 B
   *^.*<font.*color.*ff0000
   |/usr/local/bin/spam-block "mail with bright red text"

   # Forward mail that isn't addressed to the local domain or the exception user
   :0
   *!^(To|Cc):.*(pangaea\.edu|[email protected]\.com)
   ![email protected]
}

Note Listing 25.2 indents recipes within the nesting block. This practice improves readability, but isn't required.

Listing 25.2 includes four recipes. Three of them are embedded within the fourth:

• The surrounding recipe matches any To: header that does not include the string [email protected] or [email protected]. This recipe uses the open curly brace ({) character to cause the included recipes to be applied only if the mail is not addressed to postmaster. The intent is to protect the postmaster's mail from the antispam rules. After all, users might forward spam to the postmaster account to complain about it, and such complaints should not be ignored.

• A great deal of spam includes five or more consecutive spaces in the Subject: header. The first true spam rule discards such messages. This rule matches all messages with five or more spaces in the Subject: header except for messages that are from the fax subsystem on the fax.pangaea.edu computer, which presumably generates nonspam fax delivery reports that would otherwise match this criterion. This recipe discards the spam by sending it to /dev/null.

• The second spam rule matches all messages that contain Hypertext Markup Language (HTML) text that sets the font color to ff0000 (that is, bright red). A great deal of spam uses this technique to catch readers' eyes, but in my experience, no legitimate mail uses this technique. This recipe passes the spam through a special program, /usr/local/bin/spam-block, which is not a standard program. I use it in Listing 25.2 as an example of using Procmail to pass a message through an outside program. This example includes a message string that's passed to the outside program along with the spam.

• The final spam rule matches messages that do not contain the local domain name (pangaea.edu) or a special exception username ([email protected]) in the To: or Cc: headers. A rule like this one can be very effective at catching spam, but it can be dangerous when applied system-wide. The problem is that mailing lists, newsletters, and the like may not include the recipients' names in these headers, so the rule needs to be customized for all the mailing lists and other subscription e-mail a recipient receives. Doing this for a large site is a virtual impossibility. This rule will also discard most mail sent to a recipient using a mailer's blind carbon copy (BCC) feature, which causes the recipient name not to appear in any header. This rule uses the exclamation mark action to e-mail the suspected spam to [email protected]. In practice, you're unlikely to e-mail your spam to any site, though; this use in Listing 25.2 is meant more as a demonstration of what Procmail can do than as a practical suggestion of what you should do with spam.

Warning Some spam-fighting tools include provisions to send "bounce" messages to the spam's sender. This practice is reasonably safe when applied in a mail server; the bounce message is generated while the sender is still connected, so the bounce message's recipient is likely to be the correct recipient. You should not attempt to bounce spam from a Procmail recipe or a mail reader, though. Doing so will usually send the bounce message to the wrong address: often a completely bogus address, but sometimes an innocent individual whose address was forged in the spam. Thus, bouncing spam once it's been accepted by your SMTP server will only add to the spam problem.

Overall, Procmail is an extremely useful and powerful tool for combating spam. Because many pattern matches are best applied to individual accounts, I recommend using it in this way rather than as a system-wide filter. There are exceptions, though; some rules are general enough that you may want to apply them system-wide. Also, some tools build on Procmail in a way that's useful when they're applied system-wide.
