Effectively fighting spam with DSPAM
Have you ever tasted spam? I mean the real spam, the kind that comes in vaguely rectangular cans and is the color of ham. It was served to me once on Christmas Eve, and I can assure you that it's really not good! But the spam we'll talk about here is different: more electronic, but just as disgusting.
Fighting spam is probably the most complex task of a postmaster. There are many techniques, and combining several of them is essential to get a relevant result. DSPAM is a statistical engine that analyzes a text and produces, based on its content, the probability that it is spam. The internal mechanisms of DSPAM are a bit tricky to understand, which is why we recommend spending some time reading this documentation (especially chapter 3: configuration) while preparing your setup.
We will discuss the implementation of DSPAM with a Postfix SMTP server. We will rely on the source code for the installation, but you should probably check your distribution's repository for existing packages.
- Overview
DSPAM was originally written in 2003 by Jonathan Zdziarski, an American developer, following his research on spam classification. DSPAM was subsequently sold to Sensory Networks in 2006. From the beginning, DSPAM was released under the GPLv2 or later; the license was changed to GPLv2 in 2009. Since 2009, DSPAM has been maintained by a small group of developers.
DSPAM is mainly written in C and requires a backend to store its data. Drivers for MySQL, PostgreSQL and SQLite are available; it is also possible to rely on a Hash Driver that creates data files on disk, for use without a relational database. This Hash Driver is the default option and used to be the fastest, but nowadays a PostgreSQL backend is preferred.
DSPAM produces statistical data for each user, because this approach has proven more efficient than a global ruleset for all users (see Technology below). Thus, each address of the domain (either in the form <user>@<domain> or as a plain <user> name) has tokenized data in one of the supported storage engines (MySQL, PostgreSQL, SQLite or Hash Driver), plus some additional data such as logs, statistics for the web UI, quarantine, corpora, preferences, etc. in DSPAM's data directory. It is possible to share information between multiple users in the form of groups. There are several types of groups, which we will detail later.
- Technology
Back in 2002, Paul Graham, another U.S. developer, published “A Plan for Spam”, an article that changed the way people analyze spam. In the early days, antispam rules were largely based on criteria specific to spam, such as “written in capitals” or “contains 48 exclamation points”. Paul Graham had worked on such rules himself, and they worked pretty well, but catching the last few percent of spam, on the order of 1 to 2%, without producing false positives was extremely difficult.
His idea, which others had had before him without the same influence, was to divide the content of an e-mail into tokens (a token typically being a word, a header component, an HTML tag, etc.) and to calculate statistics on them using Bayes' theorem. Since the results were very satisfactory, Graham's technique became the norm, and it is at the heart of DSPAM.
Graham also showed that, to be truly effective, statistics should be produced for each user individually. A global base for all users is less effective, since words commonly used by one user may be considered spam by another (those of you who work for pharmaceutical companies certainly understand the principle).
- Installation
The DSPAM source code is available at http://dspam.sourceforge.net/; the latest version at the time of writing is dspam-3.9.1-RC1.
The archive contains the documentation, which is sorely lacking on the wiki. In fact, the quite detailed README files (and this document) form the core of what one needs to know about DSPAM.
Before starting the compilation, let us define what we want to do:
- DSPAM must interface with Postfix (as a content filter), and will therefore receive and reinject emails via TCP sockets on localhost. DSPAM should run in daemon mode.
- Each user is identified by their full email address.
- Each user will have their own dictionary of tokens and associated statistics.
DSPAM does not really have external dependencies. A fresh Linux install with a few tools (gcc, make, the backend libraries, …) is enough to build it. It is also important to run the daemon with limited privileges (e.g. as user 'dspam'). <note>The following configuration options fit the directory layout of a Debian system. You will have to adapt them to your own setup.</note>
$ su
# useradd -r -s /bin/false -U -d /var/spool/dspam dspam
# exit
$ ./configure --enable-daemon --enable-split-configuration --enable-syslog \
    --enable-clamav --enable-preferences-extension --enable-domain-scale \
    --with-dspam-home=/var/spool/dspam --with-dspam-home-owner=dspam \
    --with-dspam-home-group=dspam --with-dspam-owner=dspam --with-dspam-group=dspam \
    --with-storage-driver=hash_drv \
    --prefix=/usr/local/dspam --sysconfdir=/etc/dspam --mandir=/usr/share/man \
    --bindir=/usr/bin --sbindir=/usr/sbin --libdir=/usr/lib --includedir=/usr/include
$ make
$ su
# make install
The compilation options are detailed in './configure --help'. You can also enable debugging with the options '--enable-debug --enable-bnr-debug --enable-verbose-debug', but beware of the amount of logs produced (in /var/spool/dspam/log). If you want to compile DSPAM with support for PostgreSQL as the storage backend (instead of the Hash Driver), you can use the following configuration parameters:
$ ./configure --enable-daemon --enable-split-configuration --enable-syslog \
    --enable-clamav --enable-preferences-extension --enable-domain-scale \
    --with-dspam-home=/var/spool/dspam --with-dspam-home-owner=dspam \
    --with-dspam-home-group=dspam --with-dspam-owner=dspam --with-dspam-group=dspam \
    --with-storage-driver=pgsql_drv \
    --with-pgsql-includes=/usr/include/postgresql/ --with-pgsql-libraries=/usr/lib/ \
    --enable-virtual-users \
    --prefix=/usr/local/dspam --sysconfdir=/etc/dspam --mandir=/usr/share/man \
    --bindir=/usr/bin --sbindir=/usr/sbin --libdir=/usr/lib --includedir=/usr/include \
    --enable-debug --enable-bnr-debug --enable-verbose-debug
<note>To build DSPAM with the PostgreSQL backend, you need the PostgreSQL client libraries (packages libpq5 and libpq-dev in Debian Squeeze).</note> Of course, it is possible to compile DSPAM with support for more than one storage backend. To do so, you can use the following configuration parameters:
$ ./configure --enable-daemon --enable-split-configuration --enable-syslog \
    --enable-clamav --enable-preferences-extension --enable-domain-scale \
    --with-dspam-home=/var/spool/dspam --with-dspam-home-owner=dspam \
    --with-dspam-home-group=dspam --with-dspam-owner=dspam --with-dspam-group=dspam \
    --with-storage-driver=hash_drv,pgsql_drv \
    --with-pgsql-includes=/usr/include/postgresql/ --with-pgsql-libraries=/usr/lib/ \
    --enable-virtual-users \
    --prefix=/usr/local/dspam --sysconfdir=/etc/dspam --mandir=/usr/share/man \
    --bindir=/usr/bin --sbindir=/usr/sbin --libdir=/usr/lib --includedir=/usr/include \
    --enable-debug --enable-bnr-debug --enable-verbose-debug
As you can see, we have enabled both the Hash Driver and PostgreSQL support. As soon as you use more than one storage backend, DSPAM compiles each one into a separate shared library (libpgsql_drv.so for PostgreSQL and libhash_drv.so for the Hash Driver) and lets you choose in dspam.conf which storage engine to use.
- Configuring DSPAM
Before feeding DSPAM with the flow of emails from Postfix, we will configure and test it. The default dspam.conf configuration file comes with a large number of comments, but is not all that easy to interpret without a careful reading of the README and this documentation. dspam.conf is pre-filled with the parameters from the ./configure command. It contains the configuration options related to the chosen backend database, the home folder, and so on.
[…]
#
# DSPAM Home: Specifies the base directory to be used for DSPAM storage
#
Home /var/spool/dspam
[…]
#
#StorageDriver /usr/lib/dspam/libhash_drv.so
StorageDriver /usr/lib/dspam/libpgsql_drv.so
[…]
If you selected only one storage driver at configure time, you don't need to specify it in dspam.conf: DSPAM knows which storage driver it was built with and will use it automatically.
More on the storage backends in the Storage driver section.
- Communication with the SMTP server
As said earlier, we want Postfix to communicate with DSPAM over TCP sockets. This setup requires two separate channels:
- submit to dspam [SMTP server to DSPAM]:
- DSPAM will listen on the chosen TCP port and wait for connections coming from the SMTP server
- response from DSPAM [DSPAM to SMTP server]:
- After analyzing the message, DSPAM sends it back to the SMTP server.
The submission socket will receive messages from Postfix. It listens on port TCP/10033 (an arbitrary choice) and speaks LMTP (the Local Mail Transfer Protocol, a lightweight variant of SMTP meant for mail transport inside an infrastructure).
ServerPort 10033
ServerQueueSize 32
ServerPID /var/run/dspam/dspam.pid
ServerMode auto
ServerParameters "--deliver=innocent,spam -d %u"
ServerIdent "localhost.localdomain"
The ServerParameters directive tells DSPAM to reinject both innocent emails and spam, instead of keeping spam in quarantine. While testing your setup, it is better to forward suspected spam to the user's mailbox and filter it using a mark in the Subject and/or the headers, rather than quarantining it directly (note that it is possible to send your users a daily list of quarantined messages).
DSPAM will then connect to Postfix and reinject the email after analysis. The following parameters connect back to Postfix on port TCP/10034 (Postfix needs to be configured as well; we'll discuss that later).
DeliveryHost 127.0.0.1
DeliveryPort 10034
DeliveryIdent localhost
DeliveryProto SMTP
Note that we speak SMTP here and not LMTP anymore.
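To test the submission socket before Postfix is wired in, you can hand DSPAM a message over LMTP yourself. The sketch below uses Python's standard `smtplib.LMTP` class; the host and port come from the ServerPort setting above, while the sender and recipient addresses are hypothetical placeholders. It is a minimal test helper under those assumptions, not part of DSPAM.

```python
import smtplib
from email.message import EmailMessage

# Taken from the dspam.conf excerpt above (ServerPort 10033 on localhost).
DSPAM_HOST = "127.0.0.1"
DSPAM_PORT = 10033


def build_test_message(sender: str, recipient: str) -> EmailMessage:
    """Build a small test e-mail; the addresses are placeholders."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "DSPAM LMTP test"
    msg.set_content("Hello, this is a test message for DSPAM.\n")
    return msg


def submit_to_dspam(msg: EmailMessage, host: str = DSPAM_HOST, port: int = DSPAM_PORT) -> None:
    """Hand the message to DSPAM over LMTP, the way Postfix would."""
    # smtplib.LMTP greets with LHLO instead of EHLO; the context manager sends QUIT.
    with smtplib.LMTP(host, port) as lmtp:
        # Raises an smtplib.SMTPException subclass if the message is rejected.
        lmtp.send_message(msg)
```

For example, `submit_to_dspam(build_test_message("alice@example.org", "bob@example.org"))` should make DSPAM process the message for bob@example.org and try to relay it to the DeliveryHost/DeliveryPort configured above; until Postfix listens on port 10034, that relay step will presumably fail.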
- Mode of learning
DSPAM starts operating with empty dictionaries. This means that during the first weeks, DSPAM will learn a lot and filter little, and the balance will progressively shift the other way.
It also means that users are responsible for marking emails as spam (or as ham when DSPAM mistakenly marks a message as spam). It is up to the postmaster to give users a simple way to mark emails.
Several learning methods exist and are described in the man page of DSPAM. The one that interests us here is called 'teft' and forces DSPAM to learn about each email it processes, innocent and spam.
This mode is particularly intensive because it goes through every email and creates or updates all the tokens created from the message in the user dictionary. It's perfect for a new user who needs to quickly build up a dictionary, but may consume too much CPU in a busy environment. To use teft mode, set the following directive in dspam.conf:
TrainingMode teft
To overcome the performance problem, other learning modes exist. Mode 'tum', for example, also learns from every message, but only for a limited period of time (the training period), and afterward updates the dictionary only upon user interaction.
This parameter can be set for each user separately, as we will see in the preferences. The default mode is the one set in dspam.conf.
- Method of Detection
We are now at the core of DSPAM: the mode of detection. DSPAM is essentially a statistical analysis composed of 3 sub-parts:
- Content tokenizing
- Statistical algorithm
- Calculation of probability
- Content Tokenizing
This is the module that will break up content, making each piece into a token, and store the token's unique hash in the user dictionary. These tokens can be of several forms depending on the mode chosen, the most basic being to take the words one by one, every word is a new token.
But there are also more advanced modules, capable of taking into account different parts of each sentence. For those who enjoy German prose, here is how the following sentence is cut up by the different modules:
“Heute Abend war ich mit meiner Freundin im Kino und habe viel gelacht” (“This evening I went to the cinema with my girlfriend and laughed a lot”)
The character '+' means a combination of words, the character '#' denotes a word not taken into account.
WORD module: each word is a token, which gives 13 tokens.
TOKEN: ‘Heute’ CRC: 6716984897371635712
TOKEN: ‘Abend’ CRC: 6670531613365895168
TOKEN: ‘war’ CRC: 4772677679197454336
TOKEN: ‘ich’ CRC: 6329956816985784320
[...]
CHAIN module: each word is joined to the word that follows it, so we get one token fewer: 12 tokens.
TOKEN: ‘Heute+Abend’ CRC: 9299536586222406967
TOKEN: ‘Abend+war’ CRC: 5205867775940263209
TOKEN: ‘war+ich’ CRC: 6329956649787979024
TOKEN: ‘ich+mit’ CRC: 5158416839735805488
[...]
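The WORD and CHAIN modules are simple enough to sketch in a few lines. The Python below is a simplified illustration, not DSPAM's C tokenizer: it splits on runs of non-word characters (an assumption; DSPAM's real delimiter rules and its CRC hashing of tokens are not reproduced) and follows the token counts given above.

```python
import re

SENTENCE = "Heute Abend war ich mit meiner Freundin im Kino und habe viel gelacht"


def split_words(text: str) -> list[str]:
    """Split on runs of non-word characters (simplified delimiter rule)."""
    return [w for w in re.split(r"\W+", text) if w]


def word_tokens(text: str) -> list[str]:
    """WORD: every word is a token of its own."""
    return split_words(text)


def chain_tokens(text: str) -> list[str]:
    """CHAIN: every word is joined to the word that follows it."""
    words = split_words(text)
    return [f"{a}+{b}" for a, b in zip(words, words[1:])]


print(len(word_tokens(SENTENCE)), len(chain_tokens(SENTENCE)))  # 13 12
```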
OSB module (Orthogonal Sparse Bigrams): a sliding window of 5 words moves over the text, and the last word of the window is paired with each of the 4 words preceding it; the words skipped in between are replaced by '#'. Each word is therefore associated with its neighbors up to 4 positions away (-4 / +4).
TOKEN: ‘Heute+#+#+#+mit’ CRC: 2006452661602586241
TOKEN: ‘Abend+#+#+mit’ CRC: 5482652074219693289
TOKEN: ‘war+#+mit’ CRC: 15707817493435847227
TOKEN: ‘ich+mit’ CRC: 5158416839735805488
TOKEN: ‘Abend+#+#+#+meiner’ CRC: 8544044731047037263
TOKEN: ‘war+#+#+meiner’ CRC: 14722667808637756004
[...]
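The OSB pairing can be sketched as follows. This is a simplified Python illustration, not DSPAM's implementation: it skips the hashing step and, as an assumption, also emits the shorter pairs at the start of the text where the window is not yet full (the listing above starts at the first full window).

```python
WORDS = "Heute Abend war ich mit meiner Freundin im Kino und habe viel gelacht".split()


def osb_tokens(words: list[str], window: int = 5) -> list[str]:
    """Pair each word with each of the (window - 1) words before it.

    Words skipped between the two members of a pair are written as '#'.
    """
    tokens = []
    for end in range(len(words)):
        # Farthest neighbour first, matching the order of the listing above.
        for dist in range(window - 1, 0, -1):
            start = end - dist
            if start < 0:
                continue  # window not yet full at the start of the text
            tokens.append("+".join([words[start]] + ["#"] * (dist - 1) + [words[end]]))
    return tokens


# Prints the six tokens of the listing above, from 'Heute+#+#+#+mit' to 'war+#+#+meiner'.
for token in osb_tokens(WORDS)[6:12]:
    print(token)
```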
SBPH module (Sparse Binary Polynomial Hashing): similar to OSB, with the same sliding window of 5 words, but more flexible: instead of always skipping the intermediate words in the window (represented by '#' in OSB), it also generates every combination in which they are kept.
TOKEN: ‘mit’ CRC: 5158417007107899392
TOKEN: ‘ich+mit’ CRC: 5158416839735805488
TOKEN: ‘war+#+mit’ CRC: 15707817493435847227
TOKEN: ‘war+ich+mit’ CRC: 6905336139605378569
TOKEN: ‘Abend+#+#+mit’ CRC: 5482652074219693289
TOKEN: ‘Abend+#+ich+mit’ CRC: 2006454003823721484
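For a single window position, SBPH can be sketched as enumerating every keep/skip combination of the words preceding the last word of the window. The Python below is an illustrative sketch, not DSPAM's code; the enumeration order (nearest word as the lowest bit) was chosen so the first tokens come out in the same order as the listing above, and leading skipped words are simply dropped.

```python
def sbph_window_tokens(window: list[str]) -> list[str]:
    """All SBPH tokens for one window position.

    Every word before the last one is either kept or replaced by '#';
    bit 0 of the mask is the nearest word, bit n-1 the farthest.
    """
    *previous, last = window
    n = len(previous)
    tokens = []
    for mask in range(2 ** n):
        parts = []
        for i, word in enumerate(previous):
            distance = n - i  # 1 = the word right before the last one
            if (mask >> (distance - 1)) & 1:
                parts.append(word)
            elif parts:
                # A skipped word after the first kept word becomes '#';
                # leading skipped words are simply dropped.
                parts.append("#")
        parts.append(last)
        tokens.append("+".join(parts))
    return tokens


# Prints the six tokens of the listing above, from 'mit' to 'Abend+#+ich+mit'.
for token in sbph_window_tokens(["Heute", "Abend", "war", "ich", "mit"])[:6]:
    print(token)
```

A complete tokenizer would slide this window along the text and hash each token; with a window of 5 words, each position yields 2^4 = 16 tokens, which is why SBPH dictionaries grow so quickly.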
Obviously, the SBPH dictionary is much larger than the OSB one, which is in turn larger than those of CHAIN and WORD.
The advantage of tokenizers such as OSB and SBPH is that they can identify phrases that have never been seen before, by using combinations (the '+' character) and skips (the '#' character).
For example, consider the token “Buy+Viagra+#”. On its own, this token is able to identify phrases such as:
Buy Viagra cheep
Buy Viagra good
Buy Viagra Herbal
Buy Viagra exclusive
Buy Viagra boosting
Buy Viagra fresh
Buy Viagra qualitative
In the same situation, the WORD tokenizer is only able to identify individual words, not their combinations. The CHAIN tokenizer would be of almost no help, unless all the combinations are already in the dictionary.
The SBPH mechanism also associates a weight with each token. A 5-word token thus has a much greater weight than a single-word token, according to the formula weight = 2^(2n), where n is the number of neighboring words taken into account (words replaced by '#' do not count).
Still juggling with our German prose, here is the weight table of the tokens that start with “Heute” in the window “Heute Abend war ich mit”:
Token | Weight |
---|---|
Heute | 1 |
Heute+Abend | 4 |
Heute+#+war | 4 |
Heute+Abend+war | 16 |
Heute+#+#+ich | 4 |
Heute+Abend+#+ich | 16 |
Heute+#+war+ich | 16 |
Heute+Abend+war+ich | 64 |
Heute+#+#+#+mit | 4 |
Heute+Abend+#+#+mit | 16 |
Heute+#+war+#+mit | 16 |
Heute+Abend+war+#+mit | 64 |
Heute+#+#+ich+mit | 16 |
Heute+Abend+#+ich+mit | 64 |
Heute+#+war+ich+mit | 64 |
Heute+Abend+war+ich+mit | 256 |
The weight is then used to multiply the impact of the token when calculating probability.
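The weight formula can be checked against the table with a small Python helper. The function name is hypothetical; it simply counts the words of a token that are not '#', takes n as that count minus one, and returns 2^(2n).

```python
def sbph_weight(token: str) -> int:
    """weight = 2^(2n), n = number of neighbouring words kept ('#' does not count)."""
    kept_words = sum(1 for part in token.split("+") if part != "#")
    n = kept_words - 1  # one of the kept words is the word itself, not a neighbour
    return 2 ** (2 * n)


for token in ["Heute", "Heute+#+war", "Heute+Abend+#+ich", "Heute+Abend+war+ich+mit"]:
    print(f"{token:26} {sbph_weight(token)}")  # weights 1, 4, 16 and 256, as in the table
```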
For our configuration, we are going to use the OSB tokenizer. But this should not prevent you from experimenting with SBPH. However, unless you have very specific needs, CHAIN and WORD are probably too primitive for you.
The directive to place in dspam.conf is therefore:
Tokenizer osb
- Statistical Algorithm
With all these tokens, the challenge is to determine which ones influence the decision, and to what extent. As we have said, DSPAM does not come with a pre-filled dictionary: it cannot tell right away whether the token 'Abend+#+#+mit' is relevant to determining whether a message is spam. But it learns, adjusts the probabilities associated with each token, and applies them to new emails.
Beyond the calculation of each token's probability, DSPAM needs a statistical algorithm that defines which tokens are taken into account, and how, when calculating the spam probability of the message. DSPAM lets you choose between several statistical algorithms:
- naive: Naive-Bayesian (all tokens)
- graham: Graham-Bayesian (“A Plan for Spam”)
- burton: Burton-Bayesian (SpamProbe)
- chi-square: Fisher-Robinson's Chi-Square Algorithm
These algorithms can also be combined. The graham+burton combination generally gives a good balance between false positives and false negatives, so that is what we will use, via the following directive:
Algorithm graham burton
But here is a little background to understand the mechanics behind this. The naive approach (used by the algorithm of the same name) considers all the tokens of a message. Each token starts with a statistically neutral value of 0.5, meaning neither spam nor ham (a value closer to 1 means spam). The problem with this simple approach is that a spammer can include a long text made of common words (“and”, “hello”, …) plus one or two sentences carrying the spam message; since the algorithm treats all tokens equally, the “support” text lowers the likelihood of the message being classified as spam.
Paul Graham (him again) discussed this approach and showed that a better solution is possible. The Graham algorithm therefore uses the following criteria:
- Analyze the message and select the 15 most relevant tokens, i.e. those whose probability deviates most from the neutral value of 0.5.
- Ignore tokens that have been seen fewer than 5 times in the past.
- Use each token only once: if a token is present twice in the message, the second occurrence is not taken into account in the calculation.
- Give new tokens an initial probability of 0.4 instead of 0.5. This biases tokens toward innocence until proven guilty.
Brian Burton uses a modified version of Graham's algorithm. The number of tokens considered is increased from 15 to 27, and if a relevant token appears several times, each occurrence is taken into account. This algorithm was first incorporated into the SpamProbe spam filter.
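The token-selection rules of Graham and Burton can be sketched as one Python function with two knobs: the number of tokens kept (15 for Graham, 27 for Burton) and whether repeated tokens count more than once. This is a simplified illustration of the criteria listed above, not DSPAM's code; the `stats` dictionary mapping a token to its (spam, innocent) hit counts is a hypothetical stand-in for the user's dictionary, and the 0.4 default for new tokens is not modeled because tokens seen fewer than 5 times are skipped anyway.

```python
NEUTRAL = 0.5
MIN_HITS = 5  # tokens seen fewer than 5 times are ignored


def select_tokens(message_tokens, stats, max_tokens=15, allow_duplicates=False):
    """Return the (token, probability) pairs that deviate most from 0.5.

    stats maps a token to a hypothetical (spam_hits, innocent_hits) tuple.
    """
    candidates = []
    seen = set()
    for token in message_tokens:
        if not allow_duplicates and token in seen:
            continue  # Graham: each token counts only once
        seen.add(token)
        spam, innocent = stats.get(token, (0, 0))
        if spam + innocent < MIN_HITS:
            continue
        probability = spam / (spam + innocent)
        candidates.append((abs(probability - NEUTRAL), token, probability))
    # Most "interesting" tokens first: largest distance from the neutral value.
    candidates.sort(key=lambda c: c[0], reverse=True)
    return [(token, p) for _, token, p in candidates[:max_tokens]]
```

Call it with `max_tokens=15` for Graham's rules, or with `max_tokens=27, allow_duplicates=True` for Burton's.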
In the same wave of the early 2000s, Gary Robinson published an improved version of Graham's algorithm in a 2003 issue of Linux Journal. His version is at the heart of the SpamBayes project, another classification engine, and handles infrequently seen tokens more effectively. His approach is based on the chi-square statistical test, hence the name of the directive in DSPAM.
It's hard to say which of these algorithms is most appropriate. All are achieving excellent results; feel free to experiment with all of them.
- Calculation of probabilities
So we now have tokens (whose initial value is 0.4 with the Graham algorithm) and the parameters of the calculation.
The last step is to calculate the probability that a message is spam. DSPAM calls this value the pValue, and provides three algorithms to compute it.
These algorithms are:
- markov: from the Russian mathematician Andrey Markov
- robinson: from Gary Robinson
- bcr: Bayesian Chain Rule, by Paul Graham
The standard algorithm, which we will use in our example, is 'bcr' (Bayesian Chain Rule), which is also the algorithm described by Paul Graham in his article “A Plan for Spam”. To use it, set the following parameter in dspam.conf:
Pvalue bcr
- Bayesian filtering
When talking about anti-spam technologies, the work of Thomas Bayes is consistently cited. Bayes' theorem makes it possible to calculate the probability of an event from its recorded occurrences in the past. In other words, it is what we call “experience”.
In our case, the formula to calculate the probability that a message is spam or not is:
P = S / (S + H)
With:
- P: probability of the message being spam
- S: product of the probabilities associated with each token of the message: S = P(token_1) * P(token_2) * … * P(token_n)
- H: product of the complementary probabilities of the tokens: H = (1 - P(token_1)) * (1 - P(token_2)) * … * (1 - P(token_n))
As we saw, when a token is added to the dictionary, it takes a default value of 0.4. Whenever DSPAM learns from a message containing that token, it updates the value. Therefore, if the word “Viagra” is present in 10 messages, 9 of which are spam, the probability associated with this token will be: P(Viagra) = 9 / (9 + 1) = 0.9
Consider the message “Hi! Buy Viagra.” We will apply a WORD tokenizer to this message (WORD is easier to handle for the example).
The first thing the tokenizer does is remove the characters it does not take into account, such as the exclamation point. The message then becomes “Hi Buy Viagra”.
Each word is a token of its own; let us imagine that the user's dictionary is in the following state:
Token | Spam count (s) | Ham count (h) | Probability p=s/(s+h) |
---|---|---|---|
Hi | 25 | 62 | 0.29 |
Buy | 157 | 87 | 0.64 |
Viagra | 231 | 11 | 0.95 |
We can calculate the final pvalue of the message with the Bayes formula:
- S = 0.29 * 0.64 * 0.95 = 0.176
- H = (1 - 0.29) * (1 - 0.64) * (1 - 0.95) = 0.71 * 0.36 * 0.05 = 0.0128
- Pvalue = S / (S + H) = 0.176 / (0.176 + 0.0128) = 0.93
So the final probability of the message being spam is 93%.
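The calculation above can be reproduced with a few lines of Python. The function names are hypothetical; `bcr_pvalue` applies P = S / (S + H) directly to a list of token probabilities, and `token_probability` computes p = s / (s + h) from the counts in the table.

```python
from math import prod


def token_probability(spam_hits: int, innocent_hits: int) -> float:
    """p = s / (s + h) for a single token."""
    return spam_hits / (spam_hits + innocent_hits)


def bcr_pvalue(probabilities: list[float]) -> float:
    """P = S / (S + H), with S = product of p and H = product of (1 - p)."""
    s = prod(probabilities)
    h = prod(1.0 - p for p in probabilities)
    return s / (s + h)


# Rounded probabilities from the table above (Hi, Buy, Viagra).
print(round(bcr_pvalue([0.29, 0.64, 0.95]), 2))  # 0.93

# Same calculation starting from the raw spam/ham counts.
counts = [(25, 62), (157, 87), (231, 11)]
print(round(bcr_pvalue([token_probability(s, h) for s, h in counts]), 3))  # 0.939
```

The small difference between the two results comes only from rounding the token probabilities to two decimals. With many tokens the products quickly underflow, so a real implementation would typically work with logarithms rather than multiplying raw probabilities.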
- Markov
However, when the tokenizer is SBPH, the weight of the tokens can be used in the statistical calculation. That is what the 'markov' method does, and it only works with SBPH (the only tokenizer that associates a weight with its tokens). Markov is an improved version of bcr that can tell when a token is very specific (e.g. 5 out of 5 words kept) and multiply its impact on the pValue accordingly (256 times more than a single-word token). In short, the weight of a token multiplies the impact of its probability in the overall calculation.
- Confidence
DSPAM exports a confidence value of the result produced. The confidence is calculated based on the likelihood that the message is spam or not.
When the message is innocent, the closer the probability is to zero, the more confident DSPAM is in its result. Thus, if the message is innocent, confidence equals (1 - probability). (Example: probability = 0.0184, confidence = 1 - 0.0184 = 0.9816.)
When the message is spam, the closer the probability is to 1, the more confident DSPAM is in its outcome. Thus, if the message is spam, confidence equals the probability.
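The confidence rule boils down to one line. This Python sketch (hypothetical function name) takes the classification as an input rather than guessing DSPAM's decision threshold.

```python
def confidence(probability: float, is_spam: bool) -> float:
    """Spam: confidence = probability; innocent: confidence = 1 - probability."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability if is_spam else 1.0 - probability


print(round(confidence(0.0184, is_spam=False), 4))  # 0.9816
```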
That's all for the mathematics. If you like this topic, do not hesitate to continue the discussion on the mailing list.
- Storage driver
- Using the Hash Driver
The default and most straightforward backend to configure is the Hash Driver. It maintains per-user dictionaries of tokens in the user's folder. To use this driver, set the following parameter at the beginning of dspam.conf.
StorageDriver /usr/lib/dspam/libhash_drv.so
<note>The examples in this document are based on the Hash Driver, but can easily be transferred to any other backend.</note> The tokens DSPAM generates take up space, lots of space. With the Hash Driver, you can set the maximum size of the hash file each user will use (their dictionary), and with a tokenizer such as OSB you must make sure it is large enough.
For example, a rather active account, receiving between 200 and 300 messages a day will generate roughly 2.5 million tokens in the space of two weeks. Obviously, this value will vary greatly depending on whether the messages contain the same tokens or not.
Setting 'HashRecMax' to just over 6 million entries gives DSPAM some leeway, but we will nevertheless allow it to grow up to about 16 million entries (in increments of roughly 50,000), just in case.
HashRecMax 6291469
HashAutoExtend on
HashMaxExtents 10000000
HashExtentSize 49157
This also means that each user's hash file will be initialized with a size close to 100 MB! This can be a problem on a system managing a large number of users.
- Using the Postgresql Driver
While the examples in this documentation are mostly based on the Hash Driver, you will probably choose another type of backend. DSPAM works extremely well with a PostgreSQL backend, and this is the recommended setup.
To use the Postgresql driver, set the following parameter at the beginning of dspam.conf:
StorageDriver /usr/lib/dspam/libpgsql_drv.so
Let's take a closer look at the configuration procedure.
- Granting access to the database
Assuming Postgresql (v8.4 in this example) is installed, the first step is to create a database for dspam, and grant access to the user 'dspam'.
From the command line, create an empty database named dspam:
# su postgres
postgres@server:/$ psql
psql (8.4.7)
Type "help" for help.

postgres=# create role dspam login;
CREATE ROLE
postgres=# alter role dspam password '309dj20ejd903j';
ALTER ROLE
postgres=# create database dspam owner dspam;
CREATE DATABASE
Then edit /etc/postgresql/8.4/main/pg_hba.conf to grant access to user dspam:
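ramiel:/home/julien/dspam# cd /etc/postgresql/8.4/main/
ramiel:/etc/postgresql/8.4/main# vim pg_hba.conf
[...]
# TYPE  DATABASE  USER   CIDR-ADDRESS   METHOD
local   dspam     dspam                 password
# TCP connections from localhost (psql -h localhost, PgSQLServer 127.0.0.1)
host    dspam     dspam  127.0.0.1/32   password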
You can then connect to PostgreSQL as user dspam (make sure user dspam has a login shell such as /bin/bash in /etc/passwd, otherwise 'su' won't work).
server:/# su dspam
dspam@server:/$ psql -d dspam -U dspam -h localhost
psql (8.4.7)
Type "help" for help.

dspam=> \du
            List of roles
 Role name | Attributes  | Member of
-----------+-------------+-----------
 dspam     |             | {}
 postgres  | Superuser   | {}
           : Create role
           : Create DB

dspam=> \q
dspam@server:/$
You can create a test table to check that the dspam user has the appropriate permissions:
dspam=> create table test (test int);
CREATE TABLE
dspam=> \d
        List of relations
 Schema | Name | Type  | Owner
--------+------+-------+-------
 public | test | table | dspam
(1 row)

dspam=> drop table test;
DROP TABLE
- Create the database schema
The database schemas are located in the source code of DSPAM, in the folder src/tools.pgsql_drv.
However, before importing the schemas, we need to create the PL/pgSQL procedural language in the dspam database, using the command below: <note>createlang is a shell command: run it on your server's command line, not at the PostgreSQL prompt.</note>
dspam@server:/$ createlang plpgsql dspam
Now go back to the Postgresql prompt and import the schemas pgsql_objects.sql and virtual_users.sql:
dspam=> \i /home/julien/dspam-3.9.1-RC1/src/tools.pgsql_drv/pgsql_objects.sql
[... tables and sequences creation output ...]
dspam=> \i /home/julien/dspam-3.9.1-RC1/src/tools.pgsql_drv/virtual_users.sql
<note>You might receive some warnings when the import scripts try to run 'analyze' without the permissions to do so. You can safely ignore them.</note> The dspam database should then be in the following state (tables and indexes):
dspam=> \d
                  List of relations
 Schema |          Name          |   Type   | Owner
--------+------------------------+----------+-------
 public | dspam_preferences      | table    | dspam
 public | dspam_signature_data   | table    | dspam
 public | dspam_stats            | table    | dspam
 public | dspam_token_data       | table    | dspam
 public | dspam_virtual_uids     | table    | dspam
 public | dspam_virtual_uids_seq | sequence | dspam
(6 rows)

dspam=> \di
                                  List of relations
 Schema |             Name             | Type  | Owner |        Table
--------+------------------------------+-------+-------+----------------------
 public | dspam_preferences_uid_key    | index | dspam | dspam_preferences
 public | dspam_signature_data_uid_key | index | dspam | dspam_signature_data
 public | dspam_stats_pkey             | index | dspam | dspam_stats
 public | dspam_token_data_uid_key     | index | dspam | dspam_token_data
 public | dspam_virtual_uids_pkey      | index | dspam | dspam_virtual_uids
 public | id_virtual_uids_01           | index | dspam | dspam_virtual_uids
 public | id_virtual_uids_02           | index | dspam | dspam_virtual_uids
(7 rows)
- Configure DSPAM to connect to Postgresql
The last step is simply to feed dspam.conf with the parameters to connect to the database. The configuration file comes with a Postgresql section where you can uncomment the configuration parameters and set the proper values:
# --- PostgreSQL ---

# For PgSQLServer you can use a TCP/IP address or a socket. If your socket is
# in /var/run/postgresql/.s.PGSQL.5432, specify just the path where the socket
# resides (without .s.PGSQL.5432).

PgSQLServer 127.0.0.1
PgSQLPort 5432
PgSQLUser dspam
PgSQLPass 309dj20ejd903j
PgSQLDb dspam

# If you're running DSPAM in client/server (daemon) mode, uncomment the
# setting below to override the default connection cache size (the number
# of connections the server pools between all clients).
#
# PgSQLConnectionCache 3
Upon restart, DSPAM will create 3 connections to the Postgresql database.
dspam    19333     1  0 Apr01 ?        00:16:21 /usr/bin/dspam --daemon
postgres 19334  9851  0 Apr01 ?        00:03:12 postgres: dspam dspam 127.0.0.1(57278) idle
postgres 19337  9851  0 Apr01 ?        00:49:12 postgres: dspam dspam 127.0.0.1(57279) idle
postgres 19341  9851  0 Apr01 ?        00:06:11 postgres: dspam dspam 127.0.0.1(57280) idle
- Whitelist
DSPAM can observe the senders of messages for a given recipient and build a whitelist of senders who have sent more than 20 emails, none of which were flagged as spam. This handy feature requires no configuration other than:
Feature whitelist
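One way to read the whitelist rule described above: a sender is whitelisted for a recipient once more than 20 of their messages have been seen and none of them was flagged as spam. The Python below is a simplified sketch of that rule for a single recipient; the `SenderStats` structure and the lower-casing of addresses are assumptions, and DSPAM's internal bookkeeping is not reproduced.

```python
from dataclasses import dataclass

WHITELIST_THRESHOLD = 20  # same value as the whitelistThreshold preference


@dataclass
class SenderStats:
    innocent: int = 0
    spam: int = 0


def record(history: dict, sender: str, is_spam: bool) -> None:
    """Update the per-sender counters for one recipient."""
    entry = history.setdefault(sender.lower(), SenderStats())
    if is_spam:
        entry.spam += 1
    else:
        entry.innocent += 1


def is_whitelisted(entry: SenderStats, threshold: int = WHITELIST_THRESHOLD) -> bool:
    """More than `threshold` messages, none of them flagged as spam."""
    return entry.spam == 0 and entry.innocent > threshold
```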
- The preferences
Each user can set their own preferences via the web interface (we will install it later). However, it is possible to set default values for these preferences.
For example, the default configuration does not deliver spam to users but places it in quarantine. To change this behavior, we modify the following parameters in dspam.conf:
Preference "spamAction=tag"          # { quarantine | tag | deliver } -> default:quarantine
Preference "spamSubject=[SPAM]"      # { string } -> default:[SPAM]
Preference "tagSpam=on"              # { on | off }
Preference "tagNonspam=off"          # { on | off }
There are many such preferences. You can allow users to modify them by setting:
AllowOverride spamAction
AllowOverride spamSubject
AllowOverride tagSpam
AllowOverride tagNonspam
It is also possible to remove the DSPAM signature from messages via this preference:
Preference "signatureLocation=message"  # { message | headers } -> default:message
However, this signature is quite handy for re-training messages, as we shall see later. So it's recommended to leave it until you have a better solution to retrain spam.
- Ignore some headers
Since DSPAM takes the entire email into account when calculating probabilities, it can be useful to ignore some specific headers. For example, headers added by another antispam, a DKIM signature, a date or a user agent are not very useful for determining whether or not an email is spam.
The configuration example that follows includes an extensive list of headers that can be safely ignored. Feel free to expand or reduce this list.
- dspam.conf
Your final configuration file should look like the listing below. Many options are configurable, but for a quick overview, this configuration is functional. Note that we are using the Hash Driver. If you want to use another backend, you need to edit this configuration.
Home /var/spool/dspam/
StorageDriver /usr/lib/dspam/libhash_drv.so
TrustedDeliveryAgent "/usr/bin/procmail"
DeliveryHost 127.0.0.1
DeliveryPort 10034
DeliveryIdent localhost
DeliveryProto SMTP
OnFail error
Trust root
Trust dspam
TrainingMode teft
TestConditionalTraining on
Feature whitelist
Feature tb=5
Algorithm graham burton
Tokenizer osb
Pvalue bcr
WebStats on

Preference "trainingMode=TEFT"
Preference "spamAction=tag"
Preference "spamSubject=[SPAM]"
Preference "statisticalSedation=5"
Preference "enableBNR=on"
Preference "enableWhitelist=on"
Preference "signatureLocation=message"
Preference "tagSpam=on"
Preference "tagNonspam=off"
Preference "showFactors=on"
Preference "optIn=off"
Preference "optOut=off"
Preference "whitelistThreshold=20"
Preference "makeCorpus=off"
Preference "storeFragments=off"
Preference "localStore="
Preference "processorBias=on"
Preference "fallbackDomain=off"
Preference "trainPristine=off"
Preference "optOutClamAV=off"
Preference "ignoreRBLLookups=off"
Preference "RBLInoculate=off"
Preference "notifications=on"

AllowOverride enableBNR
AllowOverride enableWhitelist
AllowOverride fallbackDomain
AllowOverride ignoreGroups
AllowOverride ignoreRBLLookups
AllowOverride localStore
AllowOverride makeCorpus
AllowOverride optIn
AllowOverride optOut
AllowOverride optOutClamAV
AllowOverride processorBias
AllowOverride RBLInoculate
AllowOverride showFactors
AllowOverride signatureLocation
AllowOverride spamAction
AllowOverride spamSubject
AllowOverride statisticalSedation
AllowOverride storeFragments
AllowOverride tagNonspam
AllowOverride tagSpam
AllowOverride trainPristine
AllowOverride trainingMode
AllowOverride whitelistThreshold
AllowOverride dailyQuarantineSummary
AllowOverride notifications

HashRecMax 6291469
HashAutoExtend on
HashMaxExtents 10000000
HashExtentSize 49157
HashPctIncrease 10
HashMaxSeek 10
HashConnectionCache 10
Notifications on

IgnoreHeader Accept-Language
IgnoreHeader Approved
IgnoreHeader Archive
IgnoreHeader Authentication-Results
IgnoreHeader Cache-Post-Path
IgnoreHeader Cancel-Key
IgnoreHeader Cancel-Lock
IgnoreHeader Complaints-To
IgnoreHeader Content-Description
IgnoreHeader Content-Disposition
IgnoreHeader Content-ID
IgnoreHeader Content-Language
IgnoreHeader Content-Return
IgnoreHeader Content-Transfer-Encoding
IgnoreHeader Content-Type
IgnoreHeader DKIM-Signature
IgnoreHeader Date
IgnoreHeader Disposition-Notification-To
IgnoreHeader DomainKey-Signature
IgnoreHeader Importance
IgnoreHeader In-Reply-To
IgnoreHeader Injection-Info
IgnoreHeader Lines
IgnoreHeader List-Archive
IgnoreHeader List-Help
IgnoreHeader List-Id
IgnoreHeader List-Post
IgnoreHeader List-Subscribe
IgnoreHeader List-Unsubscribe
IgnoreHeader Message-ID
IgnoreHeader Message-Id
IgnoreHeader NNTP-Posting-Date
IgnoreHeader NNTP-Posting-Host
IgnoreHeader Newsgroups
IgnoreHeader OpenPGP
IgnoreHeader Organization
IgnoreHeader Originator
IgnoreHeader PGP-ID
IgnoreHeader Path
IgnoreHeader Received
IgnoreHeader Received-SPF
IgnoreHeader References
IgnoreHeader Reply-To
IgnoreHeader Resent-Date
IgnoreHeader Resent-From
IgnoreHeader Resent-Message-ID
IgnoreHeader Thread-Index
IgnoreHeader Thread-Topic
IgnoreHeader User-Agent
IgnoreHeader X--MailScanner-SpamCheck
IgnoreHeader X-AV-Scanned
IgnoreHeader X-AVAS-Spam-Level
IgnoreHeader X-AVAS-Spam-Score
IgnoreHeader X-AVAS-Spam-Status
IgnoreHeader X-AVAS-Spam-Symbols
IgnoreHeader X-AVAS-Virus-Status
IgnoreHeader X-AVK-Virus-Check
IgnoreHeader X-Abuse
IgnoreHeader X-Abuse-Contact
IgnoreHeader X-Abuse-Info
IgnoreHeader X-Abuse-Management
IgnoreHeader X-Abuse-To
IgnoreHeader X-Abuse-and-DMCA-Info
IgnoreHeader X-Accept-Language
IgnoreHeader X-Admission-MailScanner-SpamCheck
IgnoreHeader X-Admission-MailScanner-SpamScore
IgnoreHeader X-Amavis-Alert
IgnoreHeader X-Amavis-Hold
IgnoreHeader X-Amavis-Modified
IgnoreHeader X-Amavis-OS-Fingerprint
IgnoreHeader X-Amavis-PenPals
IgnoreHeader X-Amavis-PolicyBank
IgnoreHeader X-AntiVirus
IgnoreHeader X-Antispam
IgnoreHeader X-Antivirus
IgnoreHeader X-Antivirus-Scanner
IgnoreHeader X-Antivirus-Status
IgnoreHeader X-Archive
IgnoreHeader X-Assp-Spam-Prob
IgnoreHeader X-Attention
IgnoreHeader X-BTI-AntiSpam
IgnoreHeader X-Barracuda
IgnoreHeader X-Barracuda-Bayes
IgnoreHeader X-Barracuda-Spam-Flag
IgnoreHeader X-Barracuda-Spam-Report
IgnoreHeader X-Barracuda-Spam-Score
IgnoreHeader X-Barracuda-Spam-Status
IgnoreHeader X-Barracuda-Virus-Scanned
IgnoreHeader X-BeenThere
IgnoreHeader X-Bogosity
IgnoreHeader X-Brightmail-Tracker
IgnoreHeader X-CRM114-CacheID
IgnoreHeader X-CRM114-Status
IgnoreHeader X-CRM114-Version
IgnoreHeader X-CTASD-IP
IgnoreHeader X-CTASD-RefID
IgnoreHeader X-CTASD-Sender
IgnoreHeader X-Cache
IgnoreHeader X-ClamAntiVirus-Scanner
IgnoreHeader X-Comment-To
IgnoreHeader X-Comments
IgnoreHeader X-Complaints
IgnoreHeader X-Complaints-Info
IgnoreHeader X-Complaints-To
IgnoreHeader X-DKIM
IgnoreHeader X-DMCA-Complaints-To
IgnoreHeader X-DMCA-Notifications
IgnoreHeader X-Despammed-Tracer
IgnoreHeader X-ELTE-SpamCheck
IgnoreHeader X-ELTE-SpamCheck-Details
IgnoreHeader X-ELTE-SpamScore
IgnoreHeader X-ELTE-SpamVersion
IgnoreHeader X-ELTE-VirusStatus
IgnoreHeader X-Enigmail-Supports
IgnoreHeader X-Enigmail-Version
IgnoreHeader X-Evolution-Source
IgnoreHeader X-Extra-Info
IgnoreHeader X-FSFE-MailScanner
IgnoreHeader X-FSFE-MailScanner-From
IgnoreHeader X-Face
IgnoreHeader X-Fellowship-MailScanner
IgnoreHeader X-Fellowship-MailScanner-From
IgnoreHeader X-Forwarded
IgnoreHeader X-GMX-Antispam
IgnoreHeader X-GMX-Antivirus
IgnoreHeader X-GPG-Fingerprint
IgnoreHeader X-GPG-Key-ID
IgnoreHeader X-GPS-DegDec
IgnoreHeader X-GPS-MGRS
IgnoreHeader X-GWSPAM
IgnoreHeader X-Gateway
IgnoreHeader X-Greylist
IgnoreHeader X-HTMLM
IgnoreHeader X-HTMLM-Info
IgnoreHeader X-HTMLM-Score
IgnoreHeader X-HTTP-Posting-Host
IgnoreHeader X-HTTP-UserAgent
IgnoreHeader X-HTTP-Via
IgnoreHeader X-Headers-End
IgnoreHeader X-ID
IgnoreHeader X-IMAIL-SPAM-STATISTICS
IgnoreHeader X-IMAIL-SPAM-URL-DBL
IgnoreHeader X-IMAIL-SPAM-VALFROM
IgnoreHeader X-IMAIL-SPAM-VALHELO
IgnoreHeader X-IMAIL-SPAM-VALREVDNS
IgnoreHeader X-Info
IgnoreHeader X-IronPort-Anti-Spam-Filtered
IgnoreHeader X-IronPort-Anti-Spam-Result
IgnoreHeader X-KSV-Antispam
IgnoreHeader X-Kaspersky-Antivirus
IgnoreHeader X-MDAV-Processed
IgnoreHeader X-MDRemoteIP
IgnoreHeader X-MDaemon-Deliver-To
IgnoreHeader X-MIE-MailScanner-SpamCheck
IgnoreHeader X-MIMEOLE
IgnoreHeader X-MIMETrack
IgnoreHeader X-MMS-Spam-Filter-ID
IgnoreHeader X-MS-Exchange-Forest-RulesExecuted
IgnoreHeader X-MS-Exchange-Organization-Antispam-Report
IgnoreHeader X-MS-Exchange-Organization-AuthAs
IgnoreHeader X-MS-Exchange-Organization-AuthDomain
IgnoreHeader X-MS-Exchange-Organization-AuthMechanism
IgnoreHeader X-MS-Exchange-Organization-AuthSource
IgnoreHeader X-MS-Exchange-Organization-Journal-Report
IgnoreHeader X-MS-Exchange-Organization-Original-Scl
IgnoreHeader X-MS-Exchange-Organization-Original-Sender
IgnoreHeader X-MS-Exchange-Organization-OriginalArrivalTime
IgnoreHeader X-MS-Exchange-Organization-OriginalSize
IgnoreHeader X-MS-Exchange-Organization-PCL
IgnoreHeader X-MS-Exchange-Organization-Quarantine
IgnoreHeader X-MS-Exchange-Organization-SCL
IgnoreHeader X-MS-Exchange-Organization-SenderIdResult
IgnoreHeader X-MS-Has-Attach
IgnoreHeader X-MS-TNEF-Correlator
IgnoreHeader X-MSMail-Priority
IgnoreHeader X-MailScanner
IgnoreHeader X-MailScanner-Information
IgnoreHeader X-MailScanner-SpamCheck
IgnoreHeader X-Mailer
IgnoreHeader X-Mailman-Version
IgnoreHeader X-Mlf-Spam-Status
IgnoreHeader X-NAI-Spam-Checker-Version
IgnoreHeader X-NAI-Spam-Flag
IgnoreHeader X-NAI-Spam-Level
IgnoreHeader X-NAI-Spam-Report
IgnoreHeader X-NAI-Spam-Route
IgnoreHeader X-NAI-Spam-Rules
IgnoreHeader X-NAI-Spam-Score
IgnoreHeader X-NAI-Spam-Threshold
IgnoreHeader X-NEWT-spamscore
IgnoreHeader X-NNTP-Posting-Date
IgnoreHeader X-NNTP-Posting-Host
IgnoreHeader X-NetcoreISpam1-ECMScanner
IgnoreHeader X-NetcoreISpam1-ECMScanner-From
IgnoreHeader X-NetcoreISpam1-ECMScanner-Information
IgnoreHeader X-NetcoreISpam1-ECMScanner-SpamCheck
IgnoreHeader X-NetcoreISpam1-ECMScanner-SpamScore
IgnoreHeader X-Newsreader
IgnoreHeader X-Newsserver
IgnoreHeader X-No-Archive
IgnoreHeader X-No-Spam
IgnoreHeader X-OSBF-Lua-Score
IgnoreHeader X-OWM-SpamCheck
IgnoreHeader X-OWM-VirusCheck
IgnoreHeader X-Olypen-Virus
IgnoreHeader X-Orig-Path
IgnoreHeader X-OriginalArrivalTime
IgnoreHeader X-Originating-IP
IgnoreHeader X-PAA-AntiVirus
IgnoreHeader X-PAA-AntiVirus-Message
IgnoreHeader X-PGP-Fingerprint
IgnoreHeader X-PGP-Hash
IgnoreHeader X-PGP-ID
IgnoreHeader X-PGP-Key
IgnoreHeader X-PGP-Key-Fingerprint
IgnoreHeader X-PGP-KeyID
IgnoreHeader X-PGP-Sig
IgnoreHeader X-PIRONET-NDH-MailScanner-SpamCheck
IgnoreHeader X-PIRONET-NDH-MailScanner-SpamScore
IgnoreHeader X-PMX
IgnoreHeader X-PMX-Version
IgnoreHeader X-PN-SPAMFiltered
IgnoreHeader X-Posting-Agent
IgnoreHeader X-Posting-ID
IgnoreHeader X-Posting-IP
IgnoreHeader X-Priority
IgnoreHeader X-Proofpoint-Spam-Details
IgnoreHeader X-Qmail-Scanner-1.25st
IgnoreHeader X-Quarantine-ID
IgnoreHeader X-RAV-AntiVirus
IgnoreHeader X-RITmySpam
IgnoreHeader X-RITmySpam-IP
IgnoreHeader X-RITmySpam-Spam
IgnoreHeader X-Rc-Spam
IgnoreHeader X-Rc-Virus
IgnoreHeader X-Received-Date
IgnoreHeader X-RedHat-Spam-Score
IgnoreHeader X-RedHat-Spam-Warning
IgnoreHeader X-RegEx
IgnoreHeader X-RegEx-Score
IgnoreHeader X-Rocket-Spam
IgnoreHeader X-SA-GROUP
IgnoreHeader X-SA-RECEIPTSTATUS
IgnoreHeader X-STA-NotSpam
IgnoreHeader X-STA-Spam
IgnoreHeader X-Scam-grey
IgnoreHeader X-Scanned-By
IgnoreHeader X-Sender
IgnoreHeader X-SenderID
IgnoreHeader X-Sohu-Antivirus
IgnoreHeader X-Spam
IgnoreHeader X-Spam-ASN
IgnoreHeader X-Spam-Check
IgnoreHeader X-Spam-Checked-By
IgnoreHeader X-Spam-Checker
IgnoreHeader X-Spam-Checker-Version
IgnoreHeader X-Spam-Clean
IgnoreHeader X-Spam-DCC
IgnoreHeader X-Spam-Details
IgnoreHeader X-Spam-Filter
IgnoreHeader X-Spam-Filtered
IgnoreHeader X-Spam-Flag
IgnoreHeader X-Spam-Level
IgnoreHeader X-Spam-OrigSender
IgnoreHeader X-Spam-Pct
IgnoreHeader X-Spam-Prev-Subject
IgnoreHeader X-Spam-Processed
IgnoreHeader X-Spam-Pyzor
IgnoreHeader X-Spam-Rating
IgnoreHeader X-Spam-Report
IgnoreHeader X-Spam-Scanned
IgnoreHeader X-Spam-Score
IgnoreHeader X-Spam-Status
IgnoreHeader X-Spam-Tagged
IgnoreHeader X-Spam-Tests
IgnoreHeader X-Spam-Tests-Failed
IgnoreHeader X-Spam-Virus
IgnoreHeader X-Spam-Warning
IgnoreHeader X-Spam-detection-level
IgnoreHeader X-SpamAssassin-Clean
IgnoreHeader X-SpamAssassin-Warning
IgnoreHeader X-SpamBouncer
IgnoreHeader X-SpamCatcher-Score
IgnoreHeader X-SpamCop-Checked
IgnoreHeader X-SpamCop-Disposition
IgnoreHeader X-SpamCop-Whitelisted
IgnoreHeader X-SpamDetected
IgnoreHeader X-SpamInfo
IgnoreHeader X-SpamPal
IgnoreHeader X-SpamPal-Timeout
IgnoreHeader X-SpamReason
IgnoreHeader X-SpamScore
IgnoreHeader X-SpamTest-Categories
IgnoreHeader X-SpamTest-Info
IgnoreHeader X-SpamTest-Method
IgnoreHeader X-SpamTest-Status
IgnoreHeader X-SpamTest-Version
IgnoreHeader X-Spamadvice
IgnoreHeader X-Spamarrest-noauth
IgnoreHeader X-Spamarrest-speedcode
IgnoreHeader X-Spambayes-Classification
IgnoreHeader X-Spamcount
IgnoreHeader X-Spamsensitivity
IgnoreHeader X-TERRACE-SPAMMARK
IgnoreHeader X-TERRACE-SPAMRATE
IgnoreHeader X-TM-AS-Category-Info
IgnoreHeader X-TM-AS-MatchedID
IgnoreHeader X-TM-AS-Product-Ver
IgnoreHeader X-TM-AS-Result
IgnoreHeader X-TMWD-Spam-Summary
IgnoreHeader X-TNEFEvaluated
IgnoreHeader X-Text-Classification
IgnoreHeader X-Text-Classification-Data
IgnoreHeader X-Trace
IgnoreHeader X-UCD-Spam-Score
IgnoreHeader X-User-Agent
IgnoreHeader X-User-ID
IgnoreHeader X-User-System
IgnoreHeader X-Virus-Check
IgnoreHeader X-Virus-Checked
IgnoreHeader X-Virus-Checker-Version
IgnoreHeader X-Virus-Scan
IgnoreHeader X-Virus-Scanned
IgnoreHeader X-Virus-Scanner
IgnoreHeader X-Virus-Scanner-Result
IgnoreHeader X-Virus-Status
IgnoreHeader X-VirusChecked
IgnoreHeader X-Virusscan
IgnoreHeader X-WSS-ID
IgnoreHeader X-WinProxy-AntiVirus
IgnoreHeader X-WinProxy-AntiVirus-Message
IgnoreHeader X-Yandex-Forward
IgnoreHeader X-Yandex-Front
IgnoreHeader X-Yandex-Spam
IgnoreHeader X-Yandex-TimeMark
IgnoreHeader X-cid
IgnoreHeader X-iHateSpam-Checked
IgnoreHeader X-iHateSpam-Quarantined
IgnoreHeader X-policyd-weight
IgnoreHeader X-purgate
IgnoreHeader X-purgate-Ad
IgnoreHeader X-purgate-ID
IgnoreHeader X-sgxh1
IgnoreHeader X-to-viruscore
IgnoreHeader Xref
IgnoreHeader acceptlanguage
IgnoreHeader thread-index
IgnoreHeader x-uscspam

PurgeSignatures 14
PurgeNeutral 90
PurgeUnused 90
PurgeHapaxes 30
PurgeHits1S 15
PurgeHits1I 15

LocalMX 127.0.0.1
SystemLog on
UserLog on
Opt out

ServerHost 127.0.0.1
ServerPort 10033
ServerQueueSize 32
ServerPID /var/run/dspam.pid
ServerMode auto
ServerParameters "--deliver=innocent,spam -d %u"
ServerIdent "localhost.localdomain"

ProcessorURLContext on
ProcessorBias on
StripRcptDomain off
- A quick test that will not work
To start the daemon as user 'dspam', the Debian standard method is to use start-stop-daemon, as follows:
# start-stop-daemon --start --chuid dspam --exec /usr/bin/dspam -- --daemon
<note> DSPAM automatically creates its PID file in /var/run (see ServerPID in dspam.conf). Make sure the user dspam can write to this directory. </note>
We get a running process and a listening port:
UID        PID  PPID  C STIME TTY          TIME CMD
dspam    27473     1  0 03:26 pts/0    00:00:00 /usr/bin/dspam --daemon

Proto Recv-Q Send-Q Local Address           Foreign Address         State       User       Inode       PID/Program name
tcp        0      0 127.0.0.1:10033         0.0.0.0:*               LISTEN      999        18244       27473/dspam
The daemon responds on this port, so we can see what happens when we try to send it an email:
$ nc localhost 10033
220 DSPAM LMTP 3.9.1 Ready
lhlo mail
250-localhost.localdomain
250-PIPELINING
250-ENHANCEDSTATUSCODES
250-8BITMIME
250 SIZE
mail from:<jp.troll@gmail.com>
250 2.1.0 OK
rcpt to:<jean-kevin@debian.lab>
250 2.1.5 OK
data
354 Enter mail, end with "." on a line by itself
From: Jean-Pierre Troll <jp.troll@gmail.com>
To: Jean-Kevin De La Motte <jean-kevin@debian.lab>
Subject: This is Not a Spam

might be a troll, but a spam... no!
.
421 4.3.0 <jean-kevin@debian.lab> Unable to connect to server
quit
221 2.0.0 OK
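The same LMTP exchange can be scripted with Python's standard smtplib module, which provides an LMTP client class. This is a minimal sketch assuming DSPAM listens on 127.0.0.1:10033 as configured above; the function names are our own. Until Postfix is set up, DSPAM should answer the end of DATA with the 421 error shown above, which smtplib reports as an exception.

```python
import smtplib
from email.message import EmailMessage


def build_test_message() -> EmailMessage:
    """Rebuild the message typed by hand in the netcat session above."""
    msg = EmailMessage()
    msg["From"] = "Jean-Pierre Troll <jp.troll@gmail.com>"
    msg["To"] = "Jean-Kevin De La Motte <jean-kevin@debian.lab>"
    msg["Subject"] = "This is Not a Spam"
    msg.set_content("might be a troll, but a spam... no!\n")
    return msg


def send_to_dspam(msg: EmailMessage, host: str = "127.0.0.1", port: int = 10033) -> None:
    """Hand the message to DSPAM's LMTP listener (ServerHost/ServerPort in dspam.conf)."""
    with smtplib.LMTP(host, port) as lmtp:
        try:
            # Envelope sender and recipient are taken from the From:/To: headers.
            lmtp.send_message(msg)
            print("accepted")
        except smtplib.SMTPResponseException as exc:
            # Until Postfix is configured, DSPAM answers the end of DATA with 421.
            reply = exc.smtp_error
            if isinstance(reply, bytes):
                reply = reply.decode(errors="replace")
            print(exc.smtp_code, reply)
```

Calling `send_to_dspam(build_test_message())` on the DSPAM host should print the 421 reply until the return path through Postfix exists.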
DSPAM accepts our message but cannot hand it back to the SMTP server, which is expected since we have not configured Postfix yet. Still, let's take a look at DSPAM's home directory: it has created a tree for the user in /var/spool/dspam/data/debian.lab/jean-kevin/:
# tree -s
.
+-- [ 23] data
|   +-- [ 23] debian.lab
|       +-- [ 114] jean-kevin
|           +-- [ 100663544] jean-kevin.css
|           +-- [ 0] jean-kevin.lock
|           +-- [ 85] jean-kevin.log
|           +-- [ 40] jean-kevin.sig
|           |   +-- [ 384] 4c873bcd274731106759975.sig
|           +-- [ 12] jean-kevin.stats
+-- [ 6] log
+-- [ 115] system.log
Looking more closely at these files, there is a file 'jean-kevin.css' whose size (about 100 MB) follows from the HashRecMax value set in dspam.conf.
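The size can be checked by hand. Assuming each record of the Hash Driver file takes 16 bytes (our inference from the figures above, not a documented value, and possibly platform dependent), HashRecMax accounts for all but 40 bytes of the file, which we take to be a file header:

```python
# Worked check of the jean-kevin.css size shown by `tree -s`.
HASH_REC_MAX = 6291469       # HashRecMax in dspam.conf
OBSERVED_SIZE = 100663544    # bytes, from the tree listing
RECORD_SIZE = 16             # assumption: bytes per record, inferred from these numbers

records_bytes = HASH_REC_MAX * RECORD_SIZE
remainder = OBSERVED_SIZE - records_bytes

print(f"{HASH_REC_MAX} records x {RECORD_SIZE} bytes = {records_bytes} bytes "
      f"(~{records_bytes / 1e6:.1f} MB)")
print(f"left over, presumably a file header: {remainder} bytes")
```

In other words, if you need a smaller or larger per-user file, HashRecMax is the directive to adjust.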
Then, the file 'jean-kevin.log' contains a log of processed messages. There, we find traces of our message:
# cat jean-kevin.log
1283931466 I Jean-Pierre Troll <jp.troll@gmail.com> 4c873d4a274731062016872 This Is Not A Spam Delivered
Each row has six columns: a Unix timestamp, a classification code (I for innocent, W for whitelisted …), the sender's name and email address, a message identifier (the DSPAM signature), the message subject and finally the DSPAM status. In this example, the message is marked 'Delivered' because it was classified as innocent, even though DSPAM could not connect to Postfix.
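To process these logs in scripts, each line can be split into its fields. This minimal sketch assumes the columns are separated by tab characters (the listing above only shows spaces, so check your own file), and it also accepts the optional trailing Message-ID that appears in a later log line of this guide. The class and function names are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class LogEntry:
    timestamp: datetime
    code: str                 # classification code, e.g. "I", "W", "M"
    sender: str
    signature: str
    subject: str
    status: str               # e.g. "Delivered", "Retrained"
    message_id: Optional[str] = None


def parse_log_line(line: str) -> LogEntry:
    """Split one DSPAM user log line (assumed tab-separated) into its fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) not in (6, 7):
        raise ValueError(f"unexpected number of fields: {len(fields)}")
    ts, code, sender, signature, subject, status = fields[:6]
    return LogEntry(
        timestamp=datetime.fromtimestamp(int(ts), tz=timezone.utc),
        code=code,
        sender=sender,
        signature=signature,
        subject=subject,
        status=status,
        message_id=fields[6] if len(fields) == 7 else None,
    )


SAMPLE = ("1283931466\tI\tJean-Pierre Troll <jp.troll@gmail.com>\t"
          "4c873d4a274731062016872\tThis Is Not A Spam\tDelivered")
entry = parse_log_line(SAMPLE)
print(entry.timestamp.isoformat(), entry.code, entry.status)
# 2010-09-08T07:37:46+00:00 I Delivered
```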
When jean-kevin wants to re-train a message as spam or ham, DSPAM will take the signature, look for a file with this name in 'jean-kevin.sig', and update 'jean-kevin.css' with the tokens contained within the file.
This DSPAM configuration is functional; we now configure the communication with Postfix.
- Configure Postfix to connect with DSPAM
Postfix has a generic way of communicating with software such as DSPAM: treating it as a content filter. Postfix can very easily forward a received message to a content filter configured in the master.cf file.
On a blank Postfix configuration, you can add the content filter directly to the main smtp service (the one that listens on TCP port 25). To do so, modify /etc/postfix/master.cf as follows:
# Postfix master process configuration file. For details on the format
# of the file, see the master(5) manual page (command: "man 5 master").
#
# ===============================================================
# service type  private unpriv  chroot  wakeup  maxproc command + args
#               (yes)   (yes)   (yes)   (never) (100)
# ===============================================================
smtp      inet  n       -       -       -       -       smtpd
  -o content_filter=lmtp:127.0.0.1:10033
This is enough for Postfix to send incoming emails to DSPAM. For the way back, however, we have to open a new service in master.cf that listens on TCP port 10034 (the DeliveryPort set in dspam.conf). This time, add the new lines at the end of master.cf.
127.0.0.1:10034 inet  n       -       n       -       -       smtpd
  -o content_filter=
  -o receive_override_options=no_unknown_recipient_checks,no_header_body_checks
  -o smtpd_helo_restrictions=
  -o smtpd_client_restrictions=
  -o smtpd_sender_restrictions=
  -o smtpd_recipient_restrictions=permit_mynetworks,reject
  -o mynetworks=127.0.0.0/8
  -o smtpd_authorized_xforward_hosts=127.0.0.0/8
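The loop only works if both sides agree on the ports: Postfix's content_filter must point at DSPAM's ServerHost:ServerPort, and DSPAM's DeliveryHost:DeliveryPort must match the extra smtpd listener. A mismatch on the way back leads to errors such as the "Unable to connect to server" reply seen earlier. The following is a rough sketch of our own (not a DSPAM or Postfix tool) that compares the text of both files; it only understands the simple `Directive value` and `lmtp:host:port` forms used in this guide.

```python
import re
from typing import Dict, List


def parse_dspam_conf(text: str) -> Dict[str, str]:
    """Map each directive to its value; comments are skipped, the last occurrence wins."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def check_loop(dspam_conf: str, master_cf: str) -> List[str]:
    """Report mismatches between dspam.conf and master.cf (empty list = consistent)."""
    conf = parse_dspam_conf(dspam_conf)
    # Ignore commented-out lines of master.cf.
    active = "\n".join(l for l in master_cf.splitlines() if not l.lstrip().startswith("#"))
    problems = []

    # Postfix -> DSPAM: content_filter must target ServerHost:ServerPort.
    server = f"{conf.get('ServerHost')}:{conf.get('ServerPort')}"
    filters = re.findall(r"content_filter=lmtp:(\S+)", active)
    if server not in filters:
        problems.append(f"no content_filter=lmtp:{server} in master.cf (found {filters})")

    # DSPAM -> Postfix: an smtpd service must listen on DeliveryHost:DeliveryPort.
    delivery = f"{conf.get('DeliveryHost')}:{conf.get('DeliveryPort')}"
    listeners = re.findall(r"^(\S+:\d+)\s+inet\s", active, flags=re.MULTILINE)
    if delivery not in listeners:
        problems.append(f"no listener on {delivery} in master.cf (found {listeners})")
    return problems


DSPAM_CONF = """\
ServerHost 127.0.0.1
ServerPort 10033
DeliveryHost 127.0.0.1
DeliveryPort 10034
"""
MASTER_CF = """\
smtp      inet  n       -       -       -       -       smtpd
  -o content_filter=lmtp:127.0.0.1:10033
127.0.0.1:10034 inet  n       -       n       -       -       smtpd
  -o content_filter=
"""

print(check_loop(DSPAM_CONF, MASTER_CF) or "ports are consistent")
```

In practice you would pass the contents of /etc/dspam/dspam.conf (or wherever your build installed it) and /etc/postfix/master.cf instead of the inline samples.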
Reload Postfix with 'postfix reload'. Receiving emails should now work: repeat the previous netcat test on localhost, and you should receive the message. To debug, check the following files (on Debian):
- /var/log/mail.info contains all logs related to the processing of emails
- /var/spool/dspam/system.log contains the overall activity of DSPAM (one line per message processed)
- if you compiled with the debug mode, then set 'Debug *' in dspam.conf and you will get detailed logs in /var/spool/dspam/log/
- and, in the worst case scenario, use 'tcpdump -s 16436 -SvnXi lo tcp and port 10033' (or 10034) to listen to communication between Postfix and DSPAM
After the mail is passed from Postfix to DSPAM and back to Postfix, it should be received by the recipient as follows:
From jp.troll@gmail.com Wed Sep 8 04:02:27 2010
Return-Path: <jp.troll@gmail.com>
X-Original-To: jean-kevin@debian.lab
Delivered-To: jean-kevin@debian.lab
From: Jean-Pierre Troll <jp.troll@gmail.com>
To: Jean-Kevin De La Motte <jean-kevin@debian.lab>
Subject: This is Not a Spam
Date: Wed, 8 Sep 2010 03:56:49 -0400 (EDT)
X-DSPAM-Result: Innocent
X-DSPAM-Processed: Wed Sep 8 04:02:27 2010
X-DSPAM-Confidence: 0.9899
X-DSPAM-Probability: 0.0000
X-DSPAM-Signature: 4c874313289291828119542

might be a troll, but a spam... no!

!DSPAM:4c874313289291828119542!
The message is 'innocent', as described in 'X-DSPAM-Result'.
'X-DSPAM-Probability' tells us the probability that the message is spam (the closer the value is to 1, the higher the probability of the message being spam).
Finally, 'X-DSPAM-Confidence' indicates the confidence level of the filter.
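A script or a mail filter can read this verdict directly from the headers. Here is a minimal sketch using Python's standard email package; the header names are the ones shown above, while the function name and the returned dictionary are our own choices.

```python
from email import message_from_string

SAMPLE = """\
From: Jean-Pierre Troll <jp.troll@gmail.com>
To: Jean-Kevin De La Motte <jean-kevin@debian.lab>
Subject: This is Not a Spam
X-DSPAM-Result: Innocent
X-DSPAM-Confidence: 0.9899
X-DSPAM-Probability: 0.0000
X-DSPAM-Signature: 4c874313289291828119542

might be a troll, but a spam... no!
"""


def dspam_verdict(raw: str) -> dict:
    """Collect the X-DSPAM-* verdict headers of a raw message (missing ones become None)."""
    msg = message_from_string(raw)

    def as_float(name):
        value = msg.get(name)
        return float(value) if value is not None else None

    return {
        "result": msg.get("X-DSPAM-Result"),
        "probability": as_float("X-DSPAM-Probability"),
        "confidence": as_float("X-DSPAM-Confidence"),
        "signature": msg.get("X-DSPAM-Signature"),
    }


print(dspam_verdict(SAMPLE))
```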
If you want more details on the tests performed and the tokens included, enable the preference 'showFactors = on'. It's wordy, but instructive. Each token is then listed with the associated statistical value.
X-DSPAM-Factors: 27, To*La+#+#+kevin, 0.01000, Subject*This+#+#+a, 0.01000, To*La+#+<jean, 0.01000, To*Kevin+#+La, 0.01000, To*Motte+<jean, 0.01000 [...]
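Because this header gets long, a small helper makes it easier to read. This sketch reads the leading number as the count of factors DSPAM reports (27 here, while the excerpt above only shows five, the rest being elided), followed by token/value pairs; that reading, and the assumption that tokens never contain commas, are ours.

```python
def parse_factors(value: str):
    """Split an X-DSPAM-Factors value into (count, [(token, value), ...]).

    Header folding is removed first; tokens are assumed to contain no commas.
    """
    parts = [p.strip() for p in " ".join(value.split()).split(",")]
    count = int(parts[0])
    rest = parts[1:]
    pairs = [(rest[i], float(rest[i + 1])) for i in range(0, len(rest) - 1, 2)]
    return count, pairs


FACTORS = ("27, To*La+#+#+kevin, 0.01000, Subject*This+#+#+a, 0.01000, "
           "To*La+#+<jean, 0.01000, To*Kevin+#+La, 0.01000, To*Motte+<jean, 0.01000")

count, pairs = parse_factors(FACTORS)
# Values far from 0.5 pull hardest towards spam (near 1) or innocent (near 0).
for token, val in sorted(pairs, key=lambda p: abs(p[1] - 0.5), reverse=True):
    print(f"{val:.5f}  {token}")
```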
The message body also contains the signature, in the form "!DSPAM:<signature>!". As mentioned previously, it is preferable to keep the signature in the body of the message, because it is then not lost when the message is forwarded for training. The other option would be to place the signature in the headers only, but user agents usually strip those headers when a message is forwarded.
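This is what makes forwarding work: whatever the mail client does to the headers, the marker usually survives in the quoted body and can be found again with a simple pattern. A sketch of such a lookup (our own illustration, not DSPAM's parser), assuming signatures are made of letters and digits as in the examples of this guide:

```python
import re
from typing import Optional

# Assumption: signatures consist of letters and digits, as in the examples above.
SIGNATURE_RE = re.compile(r"!DSPAM:([A-Za-z0-9]+)!")


def find_signature(body: str) -> Optional[str]:
    """Return the first !DSPAM:<signature>! marker found in a message body, if any."""
    match = SIGNATURE_RE.search(body)
    return match.group(1) if match else None


forwarded = """\
---------- Forwarded message ----------
Subject: This is Not a Spam

might be a troll, but a spam... no!

!DSPAM:4c874313289291828119542!
"""
print(find_signature(forwarded))  # 4c874313289291828119542
```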
- Managing false positives and false negatives
Obviously, you shouldn't expect DSPAM to get everything right straight away. It must be fed so that it can learn.
First, it is possible to feed DSPAM from the command line using the signature of the message. We can report our previous email as spam with the command:
# dspam --source=error --class=spam --user jean-kevin@debian.lab --signature='4c874313289291828119542'
In the logs of the user, we will see that the message was 'retrained' based on the specified class: spam or innocent.
# tail -n 1 jean-kevin.log
1283934571 M <Not Specified> 4c874313289291828119542 <Not Specified> Retrained
This is certainly not the best solution when you have 15,000 users. It is possible to do better by letting users forward misclassified messages to {spam|notspam}-<user>@<domain> (e.g. spam-jean-kevin@debian.lab), or by using the web interface. Both leave control in the users' hands.
- Learning in forward mode
Training in forward mode works as follows: when DSPAM inspects a message, it sets a signature in the message body. A user can then forward the same message to DSPAM indicating that it made the wrong decision.
For this to work, DSPAM needs two things: the message signature and the identity of the user.
The signature allows DSPAM to find the message in its history and record the change of state; without it, DSPAM cannot identify the message in its history. (Note: the history is kept for 14 days by default. This is set with 'PurgeSignatures'; more on that later.)
DSPAM can deduce the identity of the user automatically from the address {spam|notspam}-<email address>: it strips the prefix and keeps the user's email address. Our user 'jean-kevin@debian.lab' will therefore have two aliases, 'spam-jean-kevin@debian.lab' and 'notspam-jean-kevin@debian.lab', dedicated to retraining.
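In the examples of this guide, the first eight hexadecimal digits of each signature happen to encode the Unix time at which the message was processed: 4c873d4a is 1283931466, the timestamp of the log line shown earlier, and 4c874313 matches the X-DSPAM-Processed time of the delivered message once the -0400 offset of its Date header is taken into account. Assuming this holds for your version too (we have not seen it documented), you can estimate whether a signature is still inside the PurgeSignatures window before asking a user to retrain. The actual purge depends on when your maintenance job runs, so treat the result as an estimate.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

PURGE_SIGNATURES_DAYS = 14  # PurgeSignatures in dspam.conf


def signature_time(signature: str) -> datetime:
    """Decode the leading 8 hex digits as a Unix timestamp (inferred, not documented)."""
    return datetime.fromtimestamp(int(signature[:8], 16), tz=timezone.utc)


def still_retrainable(signature: str, now: Optional[datetime] = None,
                      days: int = PURGE_SIGNATURES_DAYS) -> bool:
    """True if the signature is younger than the purge window."""
    now = now or datetime.now(timezone.utc)
    return now - signature_time(signature) < timedelta(days=days)


print(signature_time("4c874313289291828119542"))  # 2010-09-08 08:02:27+00:00
```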
DSPAM can automatically retrain when an email is sent to one of those aliases. For each incoming message, it looks at the 'To:' header inside the message, and if the address starts with {spam|notspam}-, it analyzes the content and triggers a retrain. The configuration of this feature is quite basic; it relies on the following three directives in 'dspam.conf':
ParseToHeaders on
ChangeModeOnParse on
ChangeUserOnParse full
The directive 'ParseToHeaders' tells DSPAM to parse the 'To:' header of the received email to determine whether the address contains the keywords {spam|notspam}. This 'To:' header is part of the message content (the data sent after the SMTP DATA command); do not confuse it with the envelope recipient given by the SMTP command "RCPT TO".
With parsing enabled, DSPAM can change the mode of learning according to the first part of the 'To:' field. This is controlled by 'ChangeModeOnParse', which will enable the class 'spam' if the address is 'spam-*' and class 'innocent' if the address is 'notspam-*'.
Finally, 'ChangeUserOnParse' tells DSPAM that the remaining portion of the email address contains the ID of the DSPAM user. Setting it to 'full' tells DSPAM to take both the user and the domain as the identifier, for example 'jean-kevin@debian.lab'.
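The combined effect of these three directives can be modeled in a few lines. This is our own illustration of the behavior described above, not DSPAM's code, and it only covers the 'full' setting used here: the class comes from the prefix, the user is everything after it.

```python
from email.utils import getaddresses
from typing import Optional, Tuple

# Prefix -> training class, as described for ChangeModeOnParse.
PREFIXES = {"spam-": "spam", "notspam-": "innocent"}


def parse_retrain_to(to_header: str) -> Optional[Tuple[str, str]]:
    """Return (class, user) for a {spam|notspam}-<user>@<domain> To: header, else None."""
    for _display_name, address in getaddresses([to_header]):
        for prefix, klass in PREFIXES.items():
            if address.startswith(prefix):
                # ChangeUserOnParse full: keep everything after the prefix, domain included.
                return klass, address[len(prefix):]
    return None


print(parse_retrain_to("<spam-jean-kevin@debian.lab>"))
```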
We must now tell Postfix that users 'spam-jean-kevin@debian.lab' and 'notspam-jean-kevin@debian.lab' exist. In a production environment, you'll certainly have a SQL database or LDAP directory to manage aliases, but in our case, we will simply create two entries in /etc/aliases. This will be sufficient for testing.
# vim /etc/aliases
[...]
spam-jean-kevin: jean-kevin
notspam-jean-kevin: jean-kevin
# postalias /etc/aliases
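With many users, these entries are better generated than typed. A sketch that produces the two retraining aliases for each user; the user list is hypothetical and would in practice come from your SQL database or LDAP directory. The output can be appended to /etc/aliases before running postalias.

```python
from typing import Iterable, List


def retrain_aliases(users: Iterable[str]) -> List[str]:
    """Build /etc/aliases lines: spam-<user> and notspam-<user> both point to <user>."""
    lines = []
    for user in users:
        local = user.split("@", 1)[0]   # /etc/aliases keys are local parts
        lines.append(f"spam-{local}: {local}")
        lines.append(f"notspam-{local}: {local}")
    return lines


print("\n".join(retrain_aliases(["jean-kevin@debian.lab", "julien@debian.lab"])))
```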
We can now reconnect to Postfix via netcat and inject the same email as above, but this time addressed to the spam alias. Most headers can be ignored; what matters are the To: header and the DSPAM signature at the end of the message body.
$ nc localhost 25
220 debian.lab ESMTP Postfix (Debian/GNU)
ehlo mail
250-debian.lab
250-PIPELINING
250-SIZE 10240000
250-VRFY
250-ETRN
250-STARTTLS
250-ENHANCEDSTATUSCODES
250-8BITMIME
250 DSN
mail from:<jean-kevin@debian.lab>
250 2.1.0 Ok
rcpt to:<spam-jean-kevin@debian.lab>
250 2.1.5 Ok
data
354 End data with <CR><LF>.<CR><LF>
From: Jean-Kevin De La Motte <jean-kevin@debian.lab>
To: <spam-jean-kevin@debian.lab>
Subject: This is Not a Spam

might be a troll, but a spam... no!
!DSPAM:4c874313289291828119542!
.
250 2.0.0 Ok: queued as 42509114E28
quit
221 2.0.0 Bye
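The same retraining message can be produced by a script, for instance behind a "report as spam" button in an in-house tool. This minimal sketch (hypothetical helper names) builds a message addressed to the spam- or notspam- alias whose body carries the original DSPAM signature, then hands it to Postfix on port 25 with smtplib. It assumes, as in the session above, that Postfix accepts mail from the local machine.

```python
import smtplib
from email.message import EmailMessage


def build_retrain_message(user: str, signature: str, as_spam: bool = True,
                          subject: str = "Retrain") -> EmailMessage:
    """Message for the {spam|notspam}-<user> alias carrying the DSPAM signature."""
    prefix = "spam-" if as_spam else "notspam-"
    msg = EmailMessage()
    msg["From"] = user
    msg["To"] = f"{prefix}{user}"
    msg["Subject"] = subject
    # The signature must appear in the body as !DSPAM:<signature>!
    msg.set_content(f"!DSPAM:{signature}!\n")
    return msg


def submit(msg: EmailMessage, host: str = "localhost", port: int = 25) -> None:
    """Hand the message to Postfix, which routes it through DSPAM."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)
```

For example, `submit(build_retrain_message("jean-kevin@debian.lab", "4c874313289291828119542"))` reproduces the netcat session above.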
Now looking at the DSPAM logs for jean-kevin, we see that the message was 'retrained'.
1283936972 M Jean-Kevin De La Motte <jean-kevin@debian.lab> 4c874313289291828119542 This is Not a Spam Retrained <20100908090905.42509114E28@debian.lab>
DSPAM then passes the message back to Postfix, which delivers it to the user (the prefix is removed). A text is, however, appended to the end of the message to inform the user that the message has been retrained.
These notification texts must be created, as they do not ship with DSPAM: one for spam and one for ham. This can be done as follows:
# echo 'Scanned and tagged as SPAM by DSPAM on Debian.Lab' > /var/spool/dspam/txt/msgtag.spam
# echo 'Scanned and tagged as HAM by DSPAM on Debian.Lab' > /var/spool/dspam/txt/msgtag.nonspam
- Training from the web interface
The web interface is necessary if messages detected as spam are not delivered to users but quarantined (preference "spamAction=quarantine"). Users must then check the interface regularly to make sure no false positive is stuck in quarantine. They can also use the interface to mark emails as spam or ham.
The DSPAM sources provide a directory named 'webui', a set of CGI scripts to control DSPAM through a web interface. Unsurprisingly, it is written in Perl. To run it, you have to configure {apache, lighttpd, nginx, …} to execute Perl CGI scripts.
<note>Documentation already exists for Apache and lighttpd, so we chose to describe the configuration for Nginx.</note>
In fact, it's more complicated than that, because the CGI should be able to determine the identity of the user who connects. So, Nginx, in our case, will have to authenticate the user and forward their identity to DSPAM.
Nginx cannot run external scripts by itself; the only thing it can do here is pass requests to a FastCGI socket. We therefore need another program that sits between Nginx and the CGI scripts and executes them: 'fcgiwrap'.
We will also need some Perl packages required by the DSPAM CGI scripts (to parse HTML, draw graphs with GD, etc.).
Install the following packages:
# aptitude install nginx fcgiwrap libcgi-pm-perl libhtml-parser-perl libgd-graph-perl libgd-graph3d-perl
The DSPAM interface needs permissions to access '/var/spool/dspam' for both reading and writing, since it will change preferences and state of the dictionaries. Since fcgiwrap will be the process executing the Perl scripts, we will launch it as user/group 'dspam'.
We will also give world write access to the fcgiwrap socket so nginx can write to it.
<note>This is a test configuration, as the proverb says “Do not do this at home.”</note>
# vim /etc/init.d/fcgiwrap
[...]
FCGI_USER="dspam"
FCGI_GROUP="dspam"
[...]
# /etc/init.d/fcgiwrap restart
# chmod o+w /var/run/fcgiwrap.socket
The Nginx configuration is then simple: it just forwards CGI requests to fcgiwrap. It must also authenticate users so that DSPAM can determine the identity of the visitor. This identity is stored in the variable REMOTE_USER, set by Nginx and passed to fcgiwrap.
# vim /etc/nginx/sites-available/default
[...]
location /dspam/cgi-bin {
    auth_basic "DSPAM";
    auth_basic_user_file /var/www/dspam/passwords;
    include /etc/nginx/fastcgi_params;
    index dspam.cgi;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_param REMOTE_USER $remote_user;
    if ($uri ~ "\.cgi$") {
        fastcgi_pass unix:/var/run/fcgiwrap.socket;
    }
}
# /etc/init.d/nginx restart
You must then create the file '/var/www/dspam/passwords' with the htpasswd tool. It contains one line per user, and the username is the user's full email address.
# htpasswd -c /var/www/dspam/passwords jean-kevin@debian.lab
New password:
Re-type new password:
Adding password for user jean-kevin@debian.lab
# cat /var/www/dspam/passwords
jean-kevin@debian.lab:H2CigqsDz1U4E
# chown dspam:www-data /var/www/dspam/passwords
# chmod o-rwx /var/www/dspam/passwords
The infrastructure is ready: copy the files from the 'webui' directory of the DSPAM sources into /var/www/dspam/, under the Nginx document root.
# cp -r ~/dspam-3.9.1-RC1/webui/* /var/www/dspam/
# chown dspam:www-data /var/www/dspam -R
At this stage, some configuration remains. The script '/var/www/dspam/cgi-bin/configure.pl' tells the web interface where the DSPAM directories are, so check the values of $CONFIG{'DSPAM_HOME'}, $CONFIG{'DSPAM_BIN'}, etc., so that they match our environment.
$CONFIG{'DSPAM_HOME'} = "/var/spool/dspam";
$CONFIG{'DSPAM_BIN'} = "/usr/bin";
[...]
$CONFIG{'WEB_ROOT'} = "/dspam/htdocs/";
[...]
$CONFIG{'LOCAL_DOMAIN'} = "debian.lab";
With all this, we should be able to open the page http://myserver/dspam/cgi-bin/. Log in as jean-kevin@debian.lab to access the DSPAM interface. Among other things, it allows you to retrain messages already processed from the 'History' tab. You can also change preferences, etc.
The interface provides an administration section. To access it, you need to declare an admin in the file '/var/www/dspam/cgi-bin/admins'.
# echo 'jean-kevin@debian.lab' >> /var/www/dspam/cgi-bin/admins
We can then access the URL http://myserver/dspam/cgi-bin/admin.cgi and admire the beautiful graphics work, or change the default options.
Below are a few screenshots from the web interface:
The home page of the interface displays performance statistics for the current user.
The message history lists all inspected messages. It can be used to retrain a message.
- Users management
When using virtual UIDs, which is the most common way to manage large groups of users, the users are stored in the database. With PostgreSQL, you can browse the "dspam" database (as created previously) using standard SQL commands:
postgres@server:/$ psql
psql (8.4.8)
Type "help" for help.

postgres=# \c dspam
psql (8.4.8)
You are now connected to database "dspam".
dspam=# \d
                   List of relations
 Schema |          Name          |   Type   | Owner
--------+------------------------+----------+-------
 public | dspam_preferences      | table    | dspam
 public | dspam_signature_data   | table    | dspam
 public | dspam_stats            | table    | dspam
 public | dspam_token_data       | table    | dspam
 public | dspam_virtual_uids     | table    | dspam
 public | dspam_virtual_uids_seq | sequence | dspam
(6 rows)

dspam=# \d dspam_virtual_uids
                         Table "public.dspam_virtual_uids"
  Column  |          Type          |                          Modifiers
----------+------------------------+--------------------------------------------------------------
 uid      | integer                | not null default nextval('dspam_virtual_uids_seq'::regclass)
 username | character varying(128) |
Indexes:
    "dspam_virtual_uids_pkey" PRIMARY KEY, btree (uid)
    "id_virtual_uids_01" UNIQUE, btree (username)
    "id_virtual_uids_02" UNIQUE, btree (uid)
As you can see in the description of the database above, there is a table called dspam_virtual_uids that will contain a simple mapping of the username with a generated id.
Here is how you can obtain the UID of a specific user.
dspam=# select * from dspam_virtual_uids where username = 'jean-kevin@debian.lab';
 uid |       username
-----+-----------------------
   1 | jean-kevin@debian.lab
(1 row)
- Deleting a user
If, for any reason, you would like to remove a user from the database, you need to obtain their UID and then remove all rows referencing this UID from the other tables.
If you look at the description of these tables (dspam_preferences below, or dspam_token_data), you will see that each row is attached to the UID of the user it belongs to, which makes the user's data extremely easy to identify and delete.
dspam=# \d dspam_preferences
       Table "public.dspam_preferences"
   Column   |          Type          | Modifiers
------------+------------------------+-----------
 uid        | integer                |
 preference | character varying(128) |
 value      | character varying(128) |
Indexes:
    "dspam_preferences_uid_key" UNIQUE, btree (uid, preference)

dspam=# delete from dspam_token_data where uid in (select uid from dspam_virtual_uids where username = 'jean-kevin@debian.lab');
DELETE 2187
Repeat this step for all tables and delete the user from dspam_virtual_uids at the end.
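Since the order matters (the mapping in dspam_virtual_uids goes last), the whole removal is best run as one transaction. Here is a sketch that generates the statements with `%s` placeholders for a Python DB-API driver such as psycopg2 (an assumption: any driver using that parameter style works). The table list is the one shown by \d above, so check it against your own schema.

```python
from typing import List, Tuple

# Tables holding per-user rows, as listed by \d in the psql session above.
USER_TABLES = ["dspam_token_data", "dspam_signature_data", "dspam_stats", "dspam_preferences"]

UID_SUBQUERY = "SELECT uid FROM dspam_virtual_uids WHERE username = %s"


def delete_user_statements(username: str) -> List[Tuple[str, tuple]]:
    """Return (sql, params) pairs that remove every row belonging to username."""
    statements = [
        (f"DELETE FROM {table} WHERE uid IN ({UID_SUBQUERY})", (username,))
        for table in USER_TABLES
    ]
    # The UID mapping itself goes last, once nothing references it anymore.
    statements.append(("DELETE FROM dspam_virtual_uids WHERE username = %s", (username,)))
    return statements


# Hypothetical usage with psycopg2 (not imported here):
# with psycopg2.connect(dbname="dspam") as conn, conn.cursor() as cur:
#     for sql, params in delete_user_statements("jean-kevin@debian.lab"):
#         cur.execute(sql, params)
for sql, params in delete_user_statements("jean-kevin@debian.lab"):
    print(sql, params)
```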
Also, make sure to delete the user's data folder from /var/spool/dspam/data/<domain>/<user> if you really want to remove all traces of the user.
- Group management and inoculation
While DSPAM's analysis is centered on the individual user, it also allows groups of users to share data. If these groups are defined carefully, for example by bringing together users with similar activity, we can expect their messages, and therefore their token statistics, to be similar. Sharing this information helps speed up DSPAM's training.
We saw that each user has its dictionary of tokens in the file '<user>.css' (if using the Hash driver). This dictionary contains the tokens and associated statistics, produced by the user.
DSPAM can share these tokens and statistics in different ways (or types of groups).
- Shared: group members share the same dictionary, but each member retains his own quarantine directory. Problem: If a user's behavior is different from the rest of the group, it will disrupt the whole group, starting with himself.
- Shared,Managed: same as shared group, but with a single quarantine mailbox.
- Classification: members share their individual dictionaries. If a user's own dictionary does not allow DSPAM to decide whether a message is spam or innocent (confidence < 0.65, or a dictionary trained on fewer than 1000 innocent and 250 spam messages), the dictionaries of the other group members are used. The analysis stops as soon as one member's dictionary classifies the message. In practice, this group is a chain containing all users of the group, which is traversed linearly until a decision is reached. Each user must be listed as a member of the group to be able to query the dictionaries of the other members.
- Global: an alternative to classification groups. This type defines a Global classification group whose listed members' dictionaries can be queried by every user of the system. If a user's dictionary is not sufficient to classify a message, DSPAM asks the members of Global for their opinion, traversing the chain of members until a formal decision is reached. In short, Global is a sort of “council of wise men” that each user can consult.
- Merged: Merged assembles the user dictionary and the dictionary referenced to form one new dictionary and use it for analysis. New user specific tokens are always written back to the user dictionary. Training the Merged group alone (without the members) will influence the accuracy for each Merged group member.
- Inoculation: This last group is somewhat unusual. It is the principle of vaccination, and allows a user having received spam not detected to inform all other users that this message is spam. Thus, each user has its own dictionary, which he uses exclusively for analysis, but users can exchange tokens between them. The first user is infected, the others are vaccinated. This principle of inoculation also allows user to define a bin, a honeypot for spam, which receive only spam and will thus accelerate the learning for everyone. This second mode is called 'external inoculation'.
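To make the classification chain more concrete, here is a minimal Python sketch of the decision logic. It is a simplified model, not DSPAM's actual code. The `Member` and `Verdict` classes and the `is_conclusive` and `classify_with_group` helpers are hypothetical. The way the two training thresholds are combined is also an assumption: here, a dictionary counts as immature if either count is below its threshold. Only the values 0.65, 1000 and 250 come from the description above.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Thresholds quoted in the description of classification groups.
MIN_CONFIDENCE = 0.65
MIN_INNOCENT_TRAINED = 1000
MIN_SPAM_TRAINED = 250


@dataclass
class Verdict:
    is_spam: bool
    confidence: float


@dataclass
class Member:
    """Hypothetical model of a group member and its dictionary."""
    name: str
    innocent_trained: int
    spam_trained: int
    verdict: Verdict  # what this member's dictionary says about the message


def is_conclusive(member: Member) -> bool:
    """True if this member's dictionary is trusted to decide on its own."""
    # Assumption: the dictionary is immature if either count is too low.
    immature = (member.innocent_trained < MIN_INNOCENT_TRAINED
                or member.spam_trained < MIN_SPAM_TRAINED)
    return not immature and member.verdict.confidence >= MIN_CONFIDENCE


def classify_with_group(user: Member, others: List[Member]) -> Tuple[str, Verdict]:
    """Walk the chain linearly (user first) and return (who decided, verdict)."""
    for member in [user, *others]:
        if is_conclusive(member):
            return member.name, member.verdict
    # Nobody was conclusive: keep the user's own (weak) verdict.
    return user.name, user.verdict


if __name__ == "__main__":
    jean_kevin = Member("jean-kevin@debian.lab", 40, 10, Verdict(False, 0.52))
    julien = Member("julien@debian.lab", 1500, 600, Verdict(True, 0.97))
    root = Member("root@debian.lab", 3000, 900, Verdict(True, 0.99))
    print(classify_with_group(jean_kevin, [julien, root]))
```

In this model, a young account such as jean-kevin's defers to the first mature member of the chain, which is julien here.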
- Setting up a group
Setting up a group is rather simple. The hardest part is choosing the right group type for your environment and then monitoring its behavior over several weeks.
In our example we will implement a group of type “classification”. Since this type lets each user keep his personal dictionary, it has little impact on the infrastructure if you later decide to delete the group.
DSPAM reads the group configuration from a text file located in its 'Home Directory'; for us, that is '/var/spool/dspam/group'. The file contains one line per group, in the form <groupName>:<type>:<user 1>,…,<user n>
We will create a group of type 'classification' including users jean-kevin, julien and root, and we will call this group 'class-debian-lab'.
# echo "class-debian-lab:classification:jean-kevin@debian.lab,julien@debian.lab,root@debian.lab" > /var/spool/dspam/group
# chown dspam:dspam /var/spool/dspam/group
# kill `pidof dspam`
# start-stop-daemon --start --chuid dspam --exec /usr/bin/dspam -- --daemon
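It can be worth checking the syntax of the group file before restarting DSPAM. The following Python sketch parses lines in the <groupName>:<type>:<user 1>,…,<user n> format described above. The function names are hypothetical. The set of accepted type names is an assumption derived from the group types listed earlier, so check the exact spelling your DSPAM version expects. Note that the type field itself may contain a comma ("shared,managed"), which is why only the first two colons are used as separators.

```python
from typing import Dict, List, NamedTuple

# Assumed spelling of the group types described above.
KNOWN_TYPES = {"shared", "shared,managed", "classification",
               "global", "merged", "inoculation"}


class Group(NamedTuple):
    name: str
    group_type: str
    members: List[str]


def parse_group_line(line: str) -> Group:
    """Parse one '<groupName>:<type>:<user 1>,...,<user n>' line."""
    # Split on the first two colons only; everything after is the member list.
    name, group_type, member_list = line.strip().split(":", 2)
    if group_type not in KNOWN_TYPES:
        raise ValueError(f"unknown group type {group_type!r} in group {name!r}")
    members = [user.strip() for user in member_list.split(",") if user.strip()]
    if not members:
        raise ValueError(f"group {name!r} has no members")
    return Group(name, group_type, members)


def parse_group_file(text: str) -> Dict[str, Group]:
    """Parse a whole group file, skipping blank lines."""
    groups = {}
    for line in text.splitlines():
        if line.strip():
            group = parse_group_line(line)
            groups[group.name] = group
    return groups
```

To check the real file, read '/var/spool/dspam/group' and pass its contents to `parse_group_file`. Any malformed line raises a `ValueError` that names the offending group.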
With the debug trace enabled in dspam.conf (directive 'Debug *', available only when DSPAM is compiled with debug support), the file '/var/spool/dspam/log/dspam.debug' shows the group being used:
10150: [09/08/2010 14:43:19] user jean-kevin@debian.lab is member of classification group class-debian-lab
10150: [09/08/2010 14:43:19] adding user julien@debian.lab to classification network group
10150: [09/08/2010 14:43:19] adding user root@debian.lab to classification network group
- Maintenance
- dspam_logrotate
This program provides log rotation for both system and DSPAM user logs (those stored in /var/spool/dspam).
The command can be run for a specific user or for all users in the DSPAM directory. In our case, we want to rotate the logs of all users once entries are older than 60 days, so we add the following line to /etc/crontab:
30 5 * * * dspam /usr/bin/dspam_logrotate -a 60 -d /var/spool/dspam/data/
- Hash Driver cleanup
DSPAM's Hash driver stores a large amount of information, whether tokens or history, so DSPAM provides a tool to clean it up: 'dspam_clean' purges the dictionaries using the parameters defined in dspam.conf.
- dspam_clean
By default, 'dspam_clean' keeps signatures for 14 days and purges rarely used tokens after 15, 30 or 90 days, depending on their type. Again, the configuration file shipped with the sources is well commented.
#
# Purge configuration: Set dspam_clean purge default options, if not otherwise
# specified on the commandline
#
PurgeSignatures 14     # Stale signatures
PurgeNeutral    90     # Tokens with neutralish probabilities
PurgeUnused     90     # Unused tokens
PurgeHapaxes    30     # Tokens with less than 5 hits (hapaxes)
PurgeHits1S     15     # Tokens with only 1 spam hit
PurgeHits1I     15     # Tokens with only 1 innocent hit
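The comments above only hint at what each rule matches. As a rough mental model, the following Python sketch decides which token rule would purge a given token (PurgeSignatures concerns signatures, not tokens, and is left out). This is a hypothetical illustration, not dspam_clean's implementation. The `TokenStats` class, the `purge_rule` function, the neutral probability band (0.4 to 0.6), the order in which rules are checked and the use of "days since last hit" as the token's age are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Purge ages in days, copied from the dspam.conf excerpt above.
PURGE_NEUTRAL = 90
PURGE_UNUSED = 90
PURGE_HAPAXES = 30
PURGE_HITS_1S = 15
PURGE_HITS_1I = 15

# Assumption: what counts as a "neutralish" probability.
NEUTRAL_BAND: Tuple[float, float] = (0.4, 0.6)


@dataclass
class TokenStats:
    spam_hits: int
    innocent_hits: int
    probability: float
    days_since_last_hit: int


def purge_rule(token: TokenStats) -> Optional[str]:
    """Return the name of the first rule that would purge the token, or None.

    Hypothetical model: rule meanings follow the dspam.conf comments, but the
    check order and the neutral band are assumptions.
    """
    age = token.days_since_last_hit
    hits = token.spam_hits + token.innocent_hits
    if token.spam_hits == 1 and token.innocent_hits == 0 and age > PURGE_HITS_1S:
        return "PurgeHits1S"
    if token.innocent_hits == 1 and token.spam_hits == 0 and age > PURGE_HITS_1I:
        return "PurgeHits1I"
    if hits < 5 and age > PURGE_HAPAXES:
        return "PurgeHapaxes"
    low, high = NEUTRAL_BAND
    if low <= token.probability <= high and age > PURGE_NEUTRAL:
        return "PurgeNeutral"
    if age > PURGE_UNUSED:
        return "PurgeUnused"
    return None
```

The model makes one point of the configuration visible: a token seen only once is dropped after about two weeks of silence, whereas a frequently seen token with a strong probability survives until it has been unused for 90 days.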
To purge periodically, run dspam_clean from cron as the dspam user, for example with this entry in /etc/crontab, which runs every day at 5:00:
0 5 * * * dspam /usr/bin/dspam_clean -s -p -u
This command performs all three purges: stale signatures (-s), tokens with neutral probabilities (-p) and unused tokens (-u).
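Both maintenance entries above use the /etc/crontab format. In this format, a user field (here `dspam`) sits between the five schedule fields and the command. A personal crontab edited with `crontab -e` has no such field. The Python sketch below splits such a line to make the layout explicit. It is an illustrative helper with hypothetical names, and it ignores special forms such as `@daily` and environment-variable lines.

```python
from typing import NamedTuple


class SystemCronEntry(NamedTuple):
    minute: str
    hour: str
    day_of_month: str
    month: str
    day_of_week: str
    user: str      # present in /etc/crontab, absent from user crontabs
    command: str


def parse_system_crontab_line(line: str) -> SystemCronEntry:
    """Split an /etc/crontab line into its schedule, user and command."""
    # maxsplit=6 keeps the command, with its own spaces, as the last field.
    fields = line.split(maxsplit=6)
    if len(fields) != 7:
        raise ValueError(f"expected 5 schedule fields, a user and a command: {line!r}")
    return SystemCronEntry(*fields)


if __name__ == "__main__":
    for line in ("30 5 * * * dspam /usr/bin/dspam_logrotate -a 60 -d /var/spool/dspam/data/",
                 "0 5 * * * dspam /usr/bin/dspam_clean -s -p -u"):
        entry = parse_system_crontab_line(line)
        print(f"{entry.hour}:{entry.minute:0>2} as {entry.user}: {entry.command}")
```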
- Databases cleanup
If you are using a database backend, and not the Hash driver, you need an external script to connect to the database and clean the tokens.
The script contrib/dspam_maintenance/dspam_maintenance.sh can connect to any of the three database backends supported by DSPAM and perform that cleanup for you.
'dspam_maintenance.sh' reads the purge configuration (described above) from dspam.conf, connects to the backend and performs the cleanup. It requires an external set of database-specific queries, which can be found in src/tools.<backend> (for example src/tools.pgsql_drv):
tools.mysql_drv
├── purge-4.1.sql
└── purge.sql
tools.pgsql_drv
├── purge-pe.sql
└── purge.sql
tools.sqlite_drv
├── purge-2.sql
└── purge-3.sql
Copy the set of queries matching your backend to /var/spool/dspam and give ownership to the 'dspam' user.
# cp -r tools.pgsql_drv/ /var/spool/dspam/
# chown -R dspam:dspam /var/spool/dspam/tools.pgsql_drv/
Now copy the 'dspam_maintenance.sh' script to /etc/cron.daily/ (or /etc/cron.weekly/ if you prefer) and configure it as follows:
# cp contrib/dspam_maintenance/dspam_maintenance.sh /etc/cron.daily/dspam_maintenance
# chmod +x /etc/cron.daily/dspam_maintenance
# vim /etc/cron.daily/dspam_maintenance
[...]
DSPAM_CONFIGDIR="/etc/dspam"
DSPAM_HOMEDIR="/var/spool/dspam/"
DSPAM_PURGE_SCRIPT_DIR="/var/spool/dspam/tools.pgsql_drv/"
DSPAM_BIN_DIR="/usr/bin"
MYSQL_BIN_DIR="/usr/bin"
PGSQL_BIN_DIR="/usr/bin"
SQLITE_BIN_DIR="/usr/bin"
SQLITE3_BIN_DIR="/usr/bin"
[...]
<note>Remember that script names in /etc/cron.* must not contain dots (e.g. not dspam_maintenance.sh, but dspam_maintenance); on Debian, run-parts silently ignores such files.</note>
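On Debian, this naming rule comes from run-parts. The following Python sketch approximates its default name check. It assumes the rule that applies without the `--lsbsysinit` option: names may contain only ASCII letters, digits, underscores and hyphens. The function name is hypothetical. The real run-parts also requires the file to be executable, which this sketch does not check.

```python
import re

# Assumption: Debian run-parts' default naming rule (without --lsbsysinit).
VALID_NAME = re.compile(r"[A-Za-z0-9_-]+")


def run_parts_accepts(filename: str) -> bool:
    """Return True if run-parts would consider this file name."""
    return VALID_NAME.fullmatch(filename) is not None


if __name__ == "__main__":
    for name in ("dspam_maintenance", "dspam_maintenance.sh"):
        print(name, "runs" if run_parts_accepts(name) else "is skipped")
```

On the server itself, `run-parts --test /etc/cron.daily` lists the scripts that would actually be run, which is the authoritative check.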
- Test Procedure
To conclude this section, we will walk through the test procedure described in DSPAM's README file. It not only verifies that the configuration works, but also familiarizes us with DSPAM's internal tools.
Step 1: Create a blank user
# useradd -d /home/michel-rene -U -m michel-rene
# passwd michel-rene
Step 2: Send an email to our new user
# nc localhost 25 << EOF
ehlo mail
mail from:<jp.troll@gmail.com>
rcpt to:<michel-rene@debian.lab>
data
From: <jp.troll@gmail.com>
To: <michel-rene@debian.lab>
Subject: Cours message de test

10 mots c'est pas assez long pour un troll.
.
quit
EOF
Step 3: Check the statistics of the user account with the command dspam_stats
# dspam_stats michel-rene@debian.lab
michel-rene@debian.lab TP: 0 TN: 1 FP: 0 FN: 0 SC: 0 NC: 0
Step 4: Check the list of tokens and the associated probabilities via dspam_dump
# dspam_dump michel-rene@debian.lab
4311867737599848632 S: 00000 I: 00001 P: 0.4000 LH: Wed Sep 8 21:20:22 2010
9486336444479993084 S: 00000 I: 00001 P: 0.4000 LH: Wed Sep 8 21:20:22 2010
18360635214432484661 S: 00000 I: 00001 P: 0.4000 LH: Wed Sep 8 21:20:22 2010
[…]
These tokens are associated with an innocent message, which is why the S (spam) value is zero and the I (innocent) value is one. Note also that the OSB tokenizer creates 114 tokens for this short message (Postfix did add a few headers, however). You can look up the statistics of a particular token in the dictionary by passing its text on the command line. With OSB as the tokenizer, the difficulty is of course knowing the original text of a token.
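To see why guessing the original text is hard, the following Python sketch generates orthogonal sparse bigrams (OSB) from a list of words. It assumes a five-word window, which is consistent with the 'assez+#+#+#+troll' token queried below. The sketch models only the pairing step. DSPAM's real tokenizer also processes headers and stores each token as the numeric hash shown by dspam_dump, and neither step is reproduced here. The function name `osb_tokens` is hypothetical.

```python
from typing import List

WINDOW = 5  # assumption: five-word window, consistent with "assez+#+#+#+troll"


def osb_tokens(words: List[str], window: int = WINDOW) -> List[str]:
    """Pair each word with each of the next (window - 1) words.

    Skipped words are replaced by '#', giving tokens such as 'un+troll'
    (adjacent words) or 'assez+#+#+#+troll' (three words skipped).
    """
    tokens = []
    for i, first in enumerate(words):
        for distance in range(1, window):
            j = i + distance
            if j >= len(words):
                break
            tokens.append("+".join([first] + ["#"] * (distance - 1) + [words[j]]))
    return tokens


if __name__ == "__main__":
    for token in osb_tokens("assez long pour un troll".split()):
        print(token)
```

Five words already yield ten pairs. This helps explain how even a short message, once its headers are added, produces more than a hundred tokens.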
# dspam_dump michel-rene@debian.lab un+troll
1157728372545618534 S: 00000 I: 00001 P: 0.4000
# dspam_dump michel-rene@debian.lab assez+#+#+#+troll
695260355258399736 S: 00000 I: 00001 P: 0.4000
Step 5: Mark the message as Spam, for example in the web interface.
Step 6: Check the statistics of the DSPAM user again:
# dspam_stats michel-rene@debian.lab
michel-rene@debian.lab TP: 0 TN: 0 FP: 0 FN: 1 SC: 0 NC: 0
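In dspam_stats output, TP, TN, FP and FN are the true positive, true negative, false positive and false negative counters. As far as we know, SC and NC count spam and innocent messages fed in as corpus. Comparing the two snapshots shows the effect of retraining: the message moved from TN to FN. The Python sketch below parses lines in the format shown above and reports which counters changed. The function names are hypothetical, and the parser assumes the 'NAME: value' layout of this output.

```python
import re
from typing import Dict

COUNTERS = ("TP", "TN", "FP", "FN", "SC", "NC")
COUNTER_RE = re.compile(r"\b(TP|TN|FP|FN|SC|NC):\s*(\d+)")


def parse_dspam_stats(line: str) -> Dict[str, int]:
    """Turn 'user TP: 0 TN: 1 ...' into {'TP': 0, 'TN': 1, ...}."""
    counters = {name: int(value) for name, value in COUNTER_RE.findall(line)}
    missing = [name for name in COUNTERS if name not in counters]
    if missing:
        raise ValueError(f"missing counters {missing} in {line!r}")
    return counters


def changed_counters(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Return after - before for every counter that changed."""
    return {name: after[name] - before[name]
            for name in COUNTERS if after[name] != before[name]}


if __name__ == "__main__":
    before = parse_dspam_stats("michel-rene@debian.lab TP: 0 TN: 1 FP: 0 FN: 0 SC: 0 NC: 0")
    after = parse_dspam_stats("michel-rene@debian.lab TP: 0 TN: 0 FP: 0 FN: 1 SC: 0 NC: 0")
    print(changed_counters(before, after))  # {'TN': -1, 'FN': 1}
```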
Step 7: Check the status of tokens again:
# dspam_dump michel-rene@debian.lab
4311867737599848632 S: 00001 I: 00000 P: 0.4000 LH: Wed Sep 8 21:28:31 2010
9486336444479993084 S: 00001 I: 00000 P: 0.4000 LH: Wed Sep 8 21:28:31 2010
18360635214432484661 S: 00001 I: 00000 P: 0.4000 LH: Wed Sep 8 21:28:31 2010
[…]
The update was applied correctly: these tokens are now associated with spam (S is one, I is zero).
These few commands not only check that our anti-spam filter works, but also let us follow the lifecycle of tokens over time.
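The change visible in dspam_dump between Step 4 and Step 7 can be summarized as moving one hit from the innocent counter to the spam counter of every token in the message. Here is a minimal Python model of that bookkeeping. The `TokenCounts` class and the `retrain_as_spam` function are hypothetical, and DSPAM's training modes and probability recalculation are deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class TokenCounts:
    spam: int      # the S column of dspam_dump
    innocent: int  # the I column of dspam_dump


def retrain_as_spam(dictionary: Dict[int, TokenCounts], tokens: Iterable[int]) -> None:
    """Move one hit from the innocent side to the spam side for each token.

    Simplified, hypothetical model of what dspam_dump shows after Step 5;
    DSPAM's training modes and probability updates are not modelled.
    """
    for token in tokens:
        counts = dictionary.setdefault(token, TokenCounts(spam=0, innocent=0))
        counts.innocent = max(0, counts.innocent - 1)  # undo the wrong innocent hit
        counts.spam += 1                               # record the spam hit


if __name__ == "__main__":
    user_dict = {
        4311867737599848632: TokenCounts(spam=0, innocent=1),
        9486336444479993084: TokenCounts(spam=0, innocent=1),
    }
    retrain_as_spam(user_dict, list(user_dict))
    print(user_dict)
```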
- Conclusion
Our tour of DSPAM is complete. I have not really discussed success rates and the other criteria generally used to rank anti-spam solutions, for two reasons. First, these figures are generally misleading: results depend heavily on user behavior, so reproducible figures are hard to obtain. Second, there is no real point today in building an infrastructure around a single anti-spam product. Integrating a greylisting system with Postfix is trivial, and it is even possible to chain SpamAssassin and DSPAM one behind the other (just call a SpamAssassin content filter after the message returns to Postfix from DSPAM).
So in the end, the best way to fight spam is to use multiple techniques, and what we have seen in these pages is that DSPAM is a great tool for the job. It can be a bit difficult to pick up at first, but the results and the flexibility of the product are well worth the initial investment.
Julien Vehent, and the DSPAM team - 2011