Ivenue Postmaster Pages

Spam Scoring

How exactly does Ivenue's SpamAssassin email filter decide which messages are spam and which are wanted? The process is complicated and involves multiple methods and techniques. Here is how it works:

The basic premise is that you have a rule, usually called a test, that looks for a specific pattern, phrase, or sequence of characters. A rule can check the message headers, the message body, or both. If the rule finds something bad, it adds points to the spam score of the message. In the same manner, if the rule finds something good, it subtracts points from the spam score. Each rule has a point value based on how likely a message matching that rule is to be spam.

In general, the goal is to have many rules with small point values, for example 0.3 or 0.4 points per rule, that together add up to a score high enough to detect spam, rather than one or two rules with large point values. However, some rules are referred to as "baseball bat" rules because they match exact phrases and carry high scores, for example 4.5 or 10.0 points. These are typically written to catch new spam batches as they appear. On our mail servers, we can begin using these rules within minutes of detecting a new spam run.
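The additive scoring described above can be sketched in a few lines of Python. This is only an illustration, not Ivenue's or SpamAssassin's actual implementation: the rule names, patterns, and point values are hypothetical, and the sketch assumes a message is labeled spam when its total reaches the threshold.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: re.Pattern
    score: float  # positive = looks like spam, negative = looks legitimate


# Hypothetical rules and point values, invented for this illustration.
RULES = [
    Rule("EXAMPLE_FREE_MONEY", re.compile(r"free money", re.I), 0.4),
    Rule("EXAMPLE_CLICK_HERE", re.compile(r"click here", re.I), 0.3),
    # A "baseball bat" rule: exact phrase, high score.
    Rule("EXAMPLE_EXACT_PHRASE",
         re.compile(r"limited offer from acme pills", re.I), 4.5),
    # A "good" rule that subtracts points.
    Rule("EXAMPLE_KNOWN_SENDER",
         re.compile(r"^From: .*@example\.org$", re.I | re.M), -2.0),
]

THRESHOLD = 4.5  # the spam threshold quoted in this article


def score_message(text):
    """Return (total score, names of matched rules) for a raw message."""
    matched = [rule for rule in RULES if rule.pattern.search(text)]
    total = round(sum(rule.score for rule in matched), 1)
    return total, [rule.name for rule in matched]


def is_spam(text):
    """Assumption: a total at or above the threshold is labeled spam."""
    total, _ = score_message(text)
    return total >= THRESHOLD
```

In this sketch, two small rules alone (0.4 + 0.3) stay well under the threshold, while a single match on the high-scoring exact-phrase rule is enough to push a message over it.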

If you would like to see the actual score an email received, you may be able to, depending on your mail client. All local mail clients, such as Outlook, Outlook Express, Eudora, and Thunderbird, will let you view the headers of a message. For example, in Outlook you can view the headers by looking in the message "Properties". Look for a header named X-Spam-Status. Here are two examples:

X-Spam-Status: Yes, hits=13.2 required=4.5 tests=AWL,BAYES_50,EXTRA_CASH,
    HTML_IMAGE_ONLY_20,HTML_MESSAGE,HTML_SHORT_LINK_IMG_3,SPF_PASS,URIBL_BLACK autolearn=no version=3.1.8

X-Spam-Status: No, hits=-69.2 required=4.5 tests=AWL,BAYES_00,
    DNS_FROM_RFC_ABUSE,ENV_AND_HDR_SPF_MATCH,HTML_MESSAGE,HTML_TEXT_AFTER_BODY,
    RCVD_IN_BSP_TRUSTED,SENDER_IN_ADDRESSBOOK,SPF_PASS,UNSUB_URL,
    USER_IN_DEF_SPF_WL autolearn=ham version=3.1.8

You can see from the first example that the email was determined to be spam because it scored 13.2 points, which is above the threshold. The threshold our system uses to label an email as "spam" is 4.5 points. The score of each rule that matched was added (or subtracted, if it was a negative number) to come up with that final score. The second example is more interesting. It matched many tests for good things, so many points were subtracted from the score, resulting in a very low number. This email scored so low because it matched the rule SENDER_IN_ADDRESSBOOK.
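If you would rather read the header with a script than in a mail client, the following is a minimal Python sketch. It assumes the header follows the layout of the two examples shown here (Yes/No, hits, required, tests, autolearn, version); the function name parse_spam_status and the sample message RAW are made up for illustration.

```python
import re
from email import message_from_string

# Layout assumed from the two example headers in this article.
_STATUS_RE = re.compile(
    r"(Yes|No),\s*hits=(-?\d+(?:\.\d+)?)\s+required=(-?\d+(?:\.\d+)?)"
    r"\s+tests=(.*?)\s+autolearn=(\S+)\s+version=(\S+)"
)


def parse_spam_status(value):
    """Split an X-Spam-Status header value into its fields."""
    # Long headers are folded across lines; collapse that whitespace first.
    flat = " ".join(value.split())
    match = _STATUS_RE.fullmatch(flat)
    if match is None:
        raise ValueError("unrecognized X-Spam-Status layout")
    flag, hits, required, tests, autolearn, version = match.groups()
    return {
        "is_spam": flag == "Yes",
        "hits": float(hits),
        "required": float(required),
        "tests": [t.strip() for t in tests.split(",") if t.strip()],
        "autolearn": autolearn,
        "version": version,
    }


# Hypothetical raw message carrying the first example header.
RAW = (
    "From: sender@example.com\n"
    "Subject: test\n"
    "X-Spam-Status: Yes, hits=13.2 required=4.5 tests=AWL,BAYES_50,EXTRA_CASH,\n"
    "\tHTML_IMAGE_ONLY_20,HTML_MESSAGE,HTML_SHORT_LINK_IMG_3,SPF_PASS,URIBL_BLACK\n"
    "\tautolearn=no version=3.1.8\n"
    "\n"
    "body\n"
)

if __name__ == "__main__":
    header = message_from_string(RAW)["X-Spam-Status"]
    status = parse_spam_status(header)
    print(status["is_spam"], status["hits"], status["required"])
    print(status["tests"])
```

The standard library's email module handles the folded header lines, so the script only needs to collapse the leftover whitespace before splitting the list of tests on commas.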

On the Ivenue mail system, more than 1,000 rules are run on each email. Approximately 90% of all inbound email is classified as spam. Since the system receives around 15 million emails per month, a lot of work goes into detecting and rejecting those bogus emails.