qsf man page on DragonFly

QSF(1)				 User Manuals				QSF(1)

NAME
       qsf - quick spam filter

SYNOPSIS
       Filtering:	qsf [-snrAtav] [-d DB] [-g DB]
			    [-L LVL] [-S SUBJ] [-H MARK] [-Q NUM]
			    [-X NUM]
       Training:	qsf -T SPAM NONSPAM [MAXROUNDS] [-d DB]
       Retraining:	qsf -[m|M] [-d DB] [-w WEIGHT] [-ayN]
       Database:	qsf -[p|D|R|O] [-d DB]
       Database merge:	qsf -E OTHERDB [-d DB]
       Allowlist query: qsf -e EMAIL [-m|-M|-t] [-d DB] [-g DB]
       Denylist query:	qsf -y -e EMAIL [-m -m|-M -M|-t] [-d DB] [-g DB]
       Help:		qsf -[h|V]

DESCRIPTION
       qsf  reads  a single email on standard input, and by default outputs it
       on standard output.  If the email is determined to be  spam,  an	 addi‐
       tional header ("X-Spam: YES") will be added, and optionally the subject
       line can have "[SPAM]" prepended to it.

       qsf is intended to be used in a procmail(1) recipe, in a	 ruleset  such
       as this:

	       :0 wf
	       | qsf -ra

	       :0 H:
	       * X-Spam: YES
	       $HOME/mail/spam

       For  more examples, including sample procmail(1) recipes, see the EXAM‐
       PLES section below.

TRAINING
       Before qsf can be used properly, it needs to be trained.	 A good way to
       train qsf is to collect a copy of all your email into two folders - one
       for spam, and one for non-spam.	Once you have done this, you  can  use
       the training function, like this:

	       qsf -aT spam-folder non-spam-folder

       This  will generate a database that can be used by qsf to guess whether
       email received in the future is spam or not.  Note  that	 this  initial
       training	 run  may  take a long time, but you should only need to do it
       once.

       To mark a single message as spam, pipe it to qsf with  the  --mark-spam
       or  -m  ("mark as spam") option.	 This will update the database accord‐
       ingly and discard the email.

       To mark a single message as non-spam, pipe it to qsf with  the  --mark-
       nonspam	or  -M	("mark as non-spam") option.  Again, this will discard
       the email.

       If a message has been mis-tagged, simply send it to qsf as the opposite
       type,  i.e.  if it has been mistakenly tagged as spam, pipe it into qsf
       --mark-nonspam --weight=2 to add it to the non-spam side of  the	 data‐
       base with double the usual weighting.

OPTIONS
       The qsf options are listed below.

       -d, --database [TYPE:]FILE
	      Use  FILE	 as the spam/non-spam database.	 The default is to use
	      /var/lib/qsfdb and, if that is not available  or	is  read-only,
	      $HOME/.qsfdb.  This option can also be useful if there is a sys‐
	      tem-wide database but you do not want to	use  it	 -  specifying
	      your own here will override the default.

	      If   you	 prefix	  the  filename	 with  a  TYPE,	 of  the  form
	      btree:$HOME/.qsfdb, then this will specify what kind of database
	      FILE is, such as list, btree, gdbm, sqlite and so on.  Check the
	      output of qsf -V to see which database backends  are  available.
	      The default is to auto-detect the type, or, if the file does not
	      already exist, use list.	Note that TYPE is not case-sensitive.

       -g, --global [TYPE:]FILE
	      Use  FILE	 as  the   default   global   database,	  instead   of
	      /var/lib/qsfdb.	If  you	 also specify a database with -d, then
	      this "global" database will be used in read-only	mode  in  con‐
	      junction with the read-write database specified with -d.	The -g
	      option can be used a second time to specify  a  third  database,
	      which  will also be used in read-only mode.  Again, the filename
	      can optionally be prefixed with a TYPE which specifies the data‐
	      base type.

       -P, --plain-map FILE
	      Maintain	a  mapping  of all database tokens to their non-hashed
	      counterparts in FILE, one token per line.	 This can be useful if
	      you  want	 to be able to list the contents of your database at a
	      later date, for instance to get a list  of  email	 addresses  in
	      your allow-list.	Note that using this option may slow qsf down,
	      and only entries written to the database while  this  option  is
	      active will be stored in FILE.

       -s, --subject
	      Rewrite the Subject line of any email that turns out to be spam,
	      adding "[SPAM]" to the start of the line.

       -S, --subject-marker SUBJECT
	      Instead of adding "[SPAM]", add SUBJECT to the Subject  line  of
	      any email that turns out to be spam.  Implies -s.

       -H, --header-marker MARK
	      Instead of setting the X-Spam header to "YES", set it to MARK if
	      email turns out to be spam.  This can be useful  if  your	 email
	      client can only search all headers for a string, rather than one
	      particular header (so searching for "YES" might match more  than
	      just the output of qsf).

       -n, --no-header
	      Do not add an X-Spam header to messages.

       -r, --add-rating
	      Insert  an  additional header X-Spam-Rating which is a rating of
	      the "spamminess" of a message from 0 to 100; 90  and  above  are
	      counted  as  spam, anything under 90 is not considered spam.  If
	      combined with -t, then the rating (0-100) will be output, on its
	      own, on standard output.

       -A, --asterisk
	      Insert  an  additional  header  X-Spam-Level  which will contain
	      between 0 and 20 asterisks (*), depending on the spam rating.

       -t, --test
	      Instead of passing the message out on  standard  output,	output
	      nothing, and exit 0 if the message is not spam, or exit 1 if the
	      message is spam.	If combined with -r, then the spam rating will
	      be output on standard output.

       -a, --allowlist
	      Enable the allow-list.  This causes the email addresses given in
	      the message's "From:" and "Return-Path:" headers to  be  checked
	      against  a  list;	 if  either  one  matches, then the message is
	      always treated as non-spam, regardless of what the  token	 data‐
	      base says. When specified with a retraining flag, -a -m (mark as
	      spam) will remove that address from the allow-list  as  well  as
	      marking  the  message as spam, and -a -M (mark as non-spam) will
	      add that address to the allow-list as well as marking  the  mes‐
	      sage  as non-spam.  The idea is that you add all of your friends
	      to the allow-list, and then none	of  their  messages  ever  get
	      marked as spam.

       -y, --denylist
	      Enable  the deny-list.  This causes the email addresses given in
	      the message's "From:" and "Return-Path:" headers to  be  checked
	      against a second list; if either one matches, then the message
	      is always treated as spam.  Training works in the	 same  way  as
	      with  -a,	 except that you must specify -m or -M twice to modify
	      the deny-list instead of the allow-list, and  with  the  reverse
	      syntax:  -y  -m  -m  (mark as spam) will add that address to the
	      deny-list, whereas -y -M -M (mark as non-spam) will remove  that
	      address  from  the  deny-list.   This double specification is so
	      that the usual retraining process never touches  the  deny-list;
	      the  deny-list  should be carefully maintained rather than auto‐
	      matically generated.

	      Normally you would not need to use the deny-list.

       -L, --level, --threshold LEVEL
	      Change the spam scoring threshold level which  must  be  reached
	      before an email is classified as spam.  The default is 90.

       -Q, --min-tokens NUM
	      Only  give a score if more than NUM tokens are found in the mes‐
	      sage - otherwise the message is assumed to be non-spam,  and  it
	      is  not  modified	 in  any  way.	The default is 0.  This option
	      might be useful if you find that very short messages  are	 being
	      frequently miscategorised.

       -e, --email, --email-only EMAIL
	      Query  or	 update	 the  allow-list  entry	 for the email address
	      EMAIL.  With no other options, this will simply output "YES"  if
	      EMAIL  is	 in  the allow-list, or "NO" if it is not. With -t, it
	      will not output anything, but will exit 0 (success) if EMAIL  is
	      in  the  allow-list,  or	1  (failure) if it is not. With the -m
	      (mark-spam) option, any previous allow-list entry for EMAIL will
	      be  removed.  Finally,  with the -M (mark-nonspam) option, EMAIL
	      will be added to the allow-list if it is not already on it.

	      If EMAIL is just the word MSG on its own, then an email will  be
	      read  from  standard input, and the email addresses given in the
	      "From:" and "Return-Path:" headers will be used.

	      Using -e automatically switches on -a.

	      If you also specify -y, then the deny-list will be operated  on.
	      Remember that -m and -M are reversed with the deny-list.

	      If you specify an email address of the form @domain (nothing
	      before the @), then the whole domain will be allow-listed or
	      deny-listed.

       -v, --verbose
	      Add  extra  X-QSF-Info headers to any filtered email, containing
	      error messages and so on if applicable.  Specify	-v  more  than
	      once to increase verbosity.

       -T, --train SPAM NONSPAM [MAXROUNDS]
	      Train  the database using the two mbox folders SPAM and NONSPAM,
	      by testing each message in each folder and updating the database
	      each  time  a  message  is miscategorised.  This is done several
	      times, and may take a while to run.  Specify the -a (allow-list)
	      flag  to	add  every sender in the NONSPAM folder to your allow-
	      list as a side-effect of the training process.  If MAXROUNDS  is
	      specified,  training will end after this number of rounds if the
	      results are still not good enough. The default is a  maximum  of
	      200 rounds.

       -m, --mark-spam
	      Instead  of passing the message out on standard output, mark its
	      contents as spam and update the database	accordingly.   If  the
	      allow-list  (-a)	is enabled, the message's "From:" and "Return-
	      Path:" addresses are removed from the allow-list.	 If the	 deny-
	      list  (-y)  is  enabled  and you specify -m twice, the message's
	      addresses are added to the deny-list instead.

       -M, --mark-nonspam
	      Instead of passing the message out on standard output, mark  its
	      contents	as  non-spam  and update the database accordingly.  If
	      the allow-list  (-a)  is	enabled,  the  message's  "From:"  and
	      "Return-Path:" addresses are added to the allow-list (see the -a
	      option above).  If the deny-list (-y) is enabled and you specify
	      -M twice, the message's addresses are removed from the deny-list
	      instead.

       -w, --weight WEIGHT
	      When marking as spam or non-spam, update the database with a
	      weighting of WEIGHT per token instead of the default of 1.
	      Useful when correcting mistakes, e.g. a message that has been
	      mistakenly detected as spam should be marked as non-spam using
	      a weighting of 2, i.e. double the usual weighting, to
	      counteract the error.

       -D, --dump [FILE]
	      Dump the contents of the database as a platform-independent text
	      file, suitable for archival, transfer to another machine, and so
	      on.  The data is output on stdout or into the given FILE.

       -R, --restore [FILE]
	      Rebuild  the  database from scratch from the text file on stdin.
	      If a FILE is given, data is read	from  there  instead  of  from
	      stdin.

       -O, --tokens
	      Instead  of  filtering, output a list of the tokens found in the
	      message read from standard input, along with the number of times
	      each  token  was	found.	This is only useful if you want to use
	      qsf as a general tokeniser for use with another filtering	 pack‐
	      age.

       -E, --merge OTHERDB
	      Merge  the OTHERDB database into the current database.  This can
	      be useful if you want to take one user's mailbox	and  merge  it
	      into  the	 system-wide one, for instance (this would be done by,
	      as root, doing qsf -d /var/lib/qsfdb  -E	/home/user/.qsfdb  and
	      then removing /home/user/.qsfdb).

       -B, --benchmark SPAM NONSPAM [MAXROUNDS]
	      Benchmark	 the  training process using the two mbox folders SPAM
	      and NONSPAM.  A temporary database is created and trained	 using
	      the  first  75%  of  the	messages  in each folder, and then the
	      entire contents of each folder is tested to see how  many	 false
	      positives	 and false negatives occur. Some timing information is
	      also displayed.

	      This can be used to decide which backend is best on your system.
	      Use  -d  to  select  a backend, eg qsf -B spam nonspam -d GDBM -
	      this will create a temporary database which  is  removed	after‐
	      wards.

	      The exception to this is the MySQL backend, where a full
	      database specification must be given
	      (-d MySQL:database=db;host=localhost;...) and the database table
	      given will not be wiped beforehand or dropped afterwards.

	      As with -T, if MAXROUNDS is specified, training  will  never  be
	      done for more than this number of rounds; the default is 200.

       -h, --help
	      Print a usage message on standard output and exit successfully.

       -V, --version
	      Print  version  information, including a list of available data‐
	      base backends, on standard output and exit successfully.
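
       The exit status of -t and the rating printed by -t -r, as described
       in the entries above, can be consumed from a script.  The following
       is a minimal Python sketch, not part of qsf; the function name
       classify and the decision to treat any exit status other than 0 or
       1 as an error are assumptions made for illustration.

```python
import subprocess


def classify(message: bytes, command=("qsf", "-t", "-r")):
    """Return (is_spam, rating) for one email passed on standard input.

    Relies only on the documented behaviour of -t (exit 0 = not spam,
    exit 1 = spam) and -t -r (rating printed on standard output).
    """
    result = subprocess.run(command, input=message, stdout=subprocess.PIPE)
    if result.returncode not in (0, 1):
        # Assumption: anything else means the command itself failed.
        raise RuntimeError(f"unexpected exit status {result.returncode}")
    output = result.stdout.decode("ascii", "replace").strip()
    rating = int(output) if output.isdigit() else None
    return result.returncode == 1, rating
```

       A caller would pass the raw message bytes, for example
       classify(open("message.eml", "rb").read()), where message.eml is a
       hypothetical file holding a single email.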

DEPRECATED OPTIONS
       The following options are only for use with the old binary  tree	 data‐
       base  backend  or  old  databases that haven't been upgraded to the new
       format that came in with version 1.1.0.

       -N, --no-autoprune
	      When marking as spam or nonspam, never automatically  prune  the
	      database.	 Usually the database is pruned after every 500 marks;
	      if you would rather --prune manually, use -N  to	disable	 auto‐
	      matic pruning.

       -p, --prune
	      Remove  redundant	 entries  from	the database and clean it up a
	      little.  This is	automatically  done  after  several  calls  to
	      --mark-spam  or --mark-nonspam, and during training with --train
	      if the training takes a large number of  rounds,	so  it	should
	      rarely be necessary to use --prune manually unless you are using
	      -N / --no-autoprune.

       -X, --prune-max NUM
	      When the database is being pruned, no more than NUM entries will
	      be  considered  for  removal.  This is to prevent CPU and memory
	      resources being taken over.  The default is 100,000 but in  some
	      circumstances  (if  you  find  that pruning takes too long) this
	      option may be used to reduce it to a more manageable number.

FILES
       /var/lib/qsfdb
	      The default (system-wide) spam database.	If you wish to install
	      qsf  system-wide,	 this  should  be read-only to everyone; there
	      should be one user with write access who can update the spam
	      database with qsf --mark-spam and qsf --mark-nonspam when
	      necessary.

       /var/lib/qsfdb2
	      A second, read-only, system-wide database. This  can  be	useful
	      when installing qsf system-wide and using third-party spam data‐
	      bases; the first global database can be updated with system-spe‐
	      cific  changes,  and  this  second  database can be periodically
	      updated when the third-party spam database is updated.

       $HOME/.qsfdb
	      The default spam database	 for  per-user	data.	Users  without
	      write  access  to	 the system-wide database will have their data
	      written here, and the two databases will be read together.   The
	      per-user	database  will	be  given a weighting equivalent to 10
	      times the weighting of the global database.
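
       The tenfold weighting of per-user data can be pictured with a small
       Python sketch.  How qsf combines the two databases internally is
       not described here, so the approach below (multiplying per-user
       counts by ten before adding them to the global counts) is an
       assumption, and the names are illustrative only.

```python
from collections import Counter

USER_WEIGHT = 10  # per-user data counts ten times as much as global data


def combined_counts(user: Counter, global_db: Counter) -> Counter:
    """Merge token counts, weighting the per-user database tenfold."""
    combined = Counter(global_db)
    for token, count in user.items():
        combined[token] += USER_WEIGHT * count
    return combined
```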

NOTES
       Currently, you cannot use qsf to check for spam while the  database  is
       being  updated.	 This  means  that while an update is in progress, all
       email is passed through as non-spam.

       There is an upper size limit  of	 512Kb	on  incoming  email;  anything
       larger  than this is just passed through as non-spam, to avoid tying up
       machine resources.

       The plaintext  token  mapping  maintained  by  --plain-map  will	 never
       shrink,	only  grow.   It  is intended for use by housekeeping and user
       interface scripts that, for instance, the user  can  use	 to  list  all
       email addresses on their allow-list.  These scripts should take care of
       weeding out entries for tokens that are no longer in the database.   If
       you  have no such scripts, there is probably no point in using --plain-
       map anyway.

       Avoid using the deny-list (-y) in any automated retraining, as it can
       cause the filter to reject mail unnecessarily.  In general the
       deny-list is probably best left unused unless explicitly required by
       your particular setup.

       If  both	 the  allow-list  and  the  deny-list  are enabled, then email
       addresses will first be checked against the deny-list, then the	allow-
       list, then the domain of the email address will be checked for matching
       "@domain" entries in the deny-list and then in the allow-list.

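       The lookup order used when both the allow-list and the deny-list
       are enabled can be sketched in Python as follows.  This is an
       illustration only: the real lists hold hashed entries, whereas
       plain sets of strings are used here, and the function name
       list_verdict is hypothetical.

```python
def list_verdict(address, allow, deny):
    """Return "spam", "non-spam", or None if no list entry matches."""
    domain = "@" + address.rpartition("@")[2]
    checks = (
        (address, deny, "spam"),       # 1. full address in the deny-list
        (address, allow, "non-spam"),  # 2. full address in the allow-list
        (domain, deny, "spam"),        # 3. @domain entry in the deny-list
        (domain, allow, "non-spam"),   # 4. @domain entry in the allow-list
    )
    for key, entries, verdict in checks:
        if key in entries:
            return verdict
    return None
```

       With this order, an allow-list entry for a single address takes
       precedence over a deny-list entry for its whole domain.
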
EXAMPLES
       To filter all of your mail through qsf, with the allow-list enabled and
       the  "spam  rating"  header  being  added, add this to your .procmailrc
       file:

	       :0 wf
	       | qsf -ra

       If you want qsf to add "[SPAM]" to the subject line of any messages  it
       thinks are spam, do this instead:

	       :0 wf
	       | qsf -sra

       To  automatically mark any email sent to spambox@yourdomain.com as spam
       (this is the "naive" version):

	       :0 H
	       * ^To:.*spambox@yourdomain.com
	       | qsf -am

       To do the same, but cleverly, so that  only  email  to  spambox@yourdo‐
       main.com	 which	qsf  does  NOT already classify as spam gets marked as
       spam in the database (this  stops  the  database	 getting  too  heavily
       weighted):

	       # If sent to spambox@yourdomain.com:
	       :0
	       * ^To:.*spambox@yourdomain.com
	       {
		  :0 wf
		  | qsf -a

		  # The above two lines can be skipped if you've
		  # already piped the message through qsf.

		  # If the qsf database says it's not spam,
		  # mark it as spam!
		  :0 H
		  * ^X-Spam: NO
		  | qsf -am
	       }

       Remove the -a option in the above examples if you don't want to use the
       allow-list.

       A more complicated filtering example - this will only run qsf  on  mes‐
       sages  which  don't  have a subject line saying "your <something> is on
       fire" and which don't have a sender address  ending  in	"@foobar.com",
       meaning	that  messages	with  that subject line OR that sender address
       will NEVER be marked as spam, no matter what:

	       :0 wf
	       * ! ^Subject: Your .* is on fire
	       * ! ^From: .*@foobar.com
	       | qsf -ra

       For more on  procmail(1)	 recipes,  see	the  procmailrc(5)  and	 proc‐
       mailex(5) manual pages.

       A couple of macros to add to your .muttrc file, if you use mutt(1) as a
       mail user agent:

	       # Press F5 to mark a message as spam and delete it
	       macro index <f5> "<pipe-message>qsf -am\n<delete-message>"
	       macro pager <f5> "<pipe-message>qsf -am\n<delete-message>"

	       # Press F9 to mark a message as non-spam
	       macro index <f9> "<pipe-message>qsf -aM\n"
	       macro pager <f9> "<pipe-message>qsf -aM\n"

       Again, remove the -a option in the above examples if you don't want  to
       use the allow-list.

       Note,  however, that the above macros won't work when operating on mul‐
       tiple tagged messages. For that, you'd need something like this:

	       macro index <f5> ":set pipe_split\n<tag-prefix><pipe-message>qsf -am\n<tag-prefix><delete-message>\n:unset pipe_split\n"

       If you use qmail(7), then to get procmail working with it you will need
       to put a line containing just DEFAULT=./Maildir/ at  the	 top  of  your
       ~/.procmailrc  file,  so	 that procmail delivers to your Maildir folder
       instead of trying to deliver to	/var/spool/mail/$USER,	and  you  will
       need to put this in your ~/.qmail file:

	       | preline procmail

       This  will  cause all your mail to be delivered via procmail instead of
       being delivered directly into your mail directory.

       See the qmail(7) documentation for more about mail delivery with qmail.

       If you use postfix(1), you can set up a system-wide mail filter by cre‐
       ating a user account for the purpose of filtering mail, populating that
       account's .qsfdb, and then creating a shell  script,  to	 run  as  that
       user, which runs qsf on stdin and passes stdout to sendmail(8).

       Doing  this  requires  some knowledge of postfix configuration and care
       needs to be taken to avoid mail loops.  One qsf user's  full  HOWTO  is
       included in the doc/ directory with this package.

THE ALLOW-LIST
       A  feature called the "allow-list" can be switched on by specifying the
       --allowlist or -a option.  This causes messages' "From:"	 and  "Return-
       Path:"  addresses  to be checked against a list of people you have said
       to allow all messages from, and if  a  message's	 "From:"  or  "Return-
       Path:"  address is in the list, it is never marked as spam.  This means
       you can add all your friends to an "allow-list" and qsf will then never
       mis-file	 their	messages - a quick way to do this is to use -a with -T
       (train); everyone in your non-spam folder who has  sent	you  an	 email
       will be added to the allow-list automatically during training.

       You  can	 manually  add and remove addresses to and from the allow-list
       using the -e (email) option. For instance, to add  foo@bar.com  to  the
       allow-list, do this:

	       qsf -e foo@bar.com -M

       To remove bad@nasty.com from the allow-list, do this:

	       qsf -e bad@nasty.com -m

       And  to	see whether someone@somewhere.com is in the allow-list or not,
       just do this:

	       qsf -e someone@somewhere.com

       In general, you probably always	want  to  enable  the  allow-list,  so
       always  specify	the -a option when using qsf.  This will automatically
       maintain the allow-list based on what you classify as spam or non-spam.

       The only times you might want to turn it off are when  people  on  your
       allow-list  are prone to getting viruses or if a virus is causing email
       to be sent to you that is pretending to be from someone on your	allow-
       list.

BACKUP AND RESTORE
       Because	the database format is platform-specific, it is a good idea to
       periodically dump the database to a text file using qsf -D so that,  if
       necessary,  it  can be transferred to another machine and restored with
       qsf -R later on.

       Also note that since the actual contents of email  messages  are	 never
       stored  in  the	database (see TECHNICAL DETAILS), you can safely share
       your qsf database with friends - simply dump your database to  a	 file,
       like this:

	       qsf -D > your-database-dump.txt

       Once  you  have sent your-database-dump.txt to another person, they can
       do this:

	       qsf -R < your-database-dump.txt

       They will then have an identical database to yours.
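
       A periodic dump can also be scripted.  The Python sketch below runs
       qsf -D and writes the result to a file; the function name
       dump_database and the temporary-file step are assumptions made for
       illustration, not part of qsf.

```python
import subprocess
from pathlib import Path


def dump_database(target: Path, command=("qsf", "-D")) -> int:
    """Write the text dump printed by `command` to `target`.

    Returns the number of bytes written.  The dump is written to a
    temporary file first so that a failed run never truncates an
    existing backup.
    """
    result = subprocess.run(command, stdout=subprocess.PIPE, check=True)
    tmp = target.with_suffix(target.suffix + ".tmp")
    tmp.write_bytes(result.stdout)
    tmp.replace(target)  # rename only after the dump completed
    return len(result.stdout)
```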

TECHNICAL DETAILS
       When a message is passed to qsf, any attachments are decoded, all  HTML
       elements	 are  removed,	and  the  message  text is then broken up into
       "tokens", where a "token" is a single  word  or	URL.   Each  token  is
       hashed  using  the  MD5 algorithm (see below for why), and that hash is
       then used to look up each token in the qsf database.
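
       As a rough illustration of the hashing step, the Python sketch
       below splits text on whitespace and computes the MD5 digest of each
       word.  The exact splitting and normalisation rules used by qsf are
       not given here, so whitespace splitting and UTF-8 encoding are
       assumptions.

```python
import hashlib
import re


def hashed_tokens(text: str) -> list[str]:
    """Return the hexadecimal MD5 digest of every whitespace-separated word."""
    words = re.findall(r"\S+", text)
    return [hashlib.md5(word.encode("utf-8")).hexdigest() for word in words]
```

       Only the digests would be used as database keys, so the original
       words cannot be read back out of the database.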

       For full details of which parts of an  email  (headers,	body,  attach‐
       ments, etc) are used to calculate the spam rating, see the TOKENISATION
       section below.

       Within the database, each token has two numbers associated with it: the
       number  of  times  that	token has been seen in spam, and the number of
       times it has been seen in non-spam.  These two numbers, along with  the
       total  number of spam and non-spam messages seen, are then used to give
       a "spamminess" value for	 that  particular  token.   This  "spamminess"
       value  ranges  from  "definitely	 not  spammy" at one end of the scale,
       through "neutral" in the middle, up to "definitely spammy" at the other
       end.

       Once  a "spamminess" value has been calculated for all of the tokens in
       the message, a summary calculation is made to give an overall "is  this
       spam?"  probability rating for the message.  If the overall probability
       is 0.9 or above, the message is flagged as spam.
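
       The exact formulas used by qsf are not given in this manual.
       Purely to make the idea concrete, the Python sketch below uses a
       simple naive-Bayes-style calculation: a per-token value derived
       from the four counts, and a combined probability compared with the
       0.9 threshold.  All function names, the clamping to 0.01..0.99 and
       the formulas themselves are assumptions, not the qsf algorithm.

```python
from math import prod


def token_spamminess(spam_count, nonspam_count, total_spam, total_nonspam):
    """Return a value between 0.01 (not spammy) and 0.99 (spammy)."""
    if spam_count + nonspam_count == 0:
        return 0.5  # token never seen before: neutral
    spam_freq = spam_count / max(total_spam, 1)
    nonspam_freq = nonspam_count / max(total_nonspam, 1)
    value = spam_freq / (spam_freq + nonspam_freq)
    return min(0.99, max(0.01, value))  # no single token decides alone


def message_probability(values):
    """Combine per-token values into one overall probability."""
    spammy = prod(values)
    clean = prod(1.0 - v for v in values)
    return spammy / (spammy + clean)


def is_spam(values, threshold=0.9):
    """Flag the message if the overall probability reaches the threshold."""
    return message_probability(values) >= threshold
```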

       In addition to the probability test is the  "allow-list".   If  enabled
       (with  the  -a  option),	 the whole probability check is skipped if the
       sender of the message is listed in the allow-list, and the  message  is
       not marked as spam.

       When  training  the  database,  a  message  is  split up into tokens as
       described above, and then the numbers in the database  for  each	 token
       are  simply  added  to: if you tell qsf that a message is spam, it adds
       one to the "number of times seen in spam" counter for each  token,  and
       if  you	tell  it  a message is not spam, it adds one to the "number of
       times seen in non-spam" counter for  each  token.   If  you  specify  a
       weight, with -w, then the number you specify is added instead of one.
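
       The counter update described above can be sketched in Python as
       follows.  The in-memory dictionary stands in for the real database,
       and counting each distinct token once per message is an assumption.

```python
from collections import defaultdict


def mark(db, tokens, is_spam, weight=1):
    """Add `weight` to the spam or non-spam counter of each distinct token.

    `db` maps a token to a two-element list: [seen in spam, seen in non-spam].
    """
    index = 0 if is_spam else 1
    for token in set(tokens):
        db[token][index] += weight


# Example: one spam message, then a non-spam correction with weight 2.
db = defaultdict(lambda: [0, 0])
mark(db, ["cheap", "pills"], is_spam=True)
mark(db, ["cheap"], is_spam=False, weight=2)
```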

       To  stop	 the database growing uncontrollably, the database keeps track
       of when a token was last	 used.	 Underused  tokens  are	 automatically
       removed	from  the  database.  (The old method was to "prune" every 500
       updates).

       Finally, the reason MD5 hashes were used is  privacy.   If  the	actual
       tokens  from the messages, and the actual email addresses in the allow-
       list, were stored, you could not share a single	qsf  database  between
       multiple	 users	because	 bits  of  everyone's messages would be in the
       database - things like emailed passwords, keywords relating to personal
       gossip, and so on.  So a hash is stored instead.	 A hash is a "one-way"
       function; it is easy to turn a token into a hash but  very  hard	 (some
       might  say  impossible) to turn a hash back into the token that created
       it.  This means that you end up with a database with no personal infor‐
       mation in it.

TOKENISATION
       When  a	message is broken up into tokens, various parts of the message
       are treated in different ways.

       First, all header fields are discarded, except for the important	 ones:
       From, Return-Path, Sender, To, Reply-To, and Subject.

       Next,  any MIME-encoded attachments are decoded.	 Any attachments whose
       MIME type starts with "text/" (i.e. HTML and text) are tokenised, after
       having  any  HTML  tags	stripped.   Any	 non-textual  attachments  are
       replaced with their MD5 hash (such that two identical attachments  will
       have the same hash), and that hash is then used as a token.

       In  addition to single-word tokens from textual message parts, qsf adds
       doubled-up tokens so that word pairs get added to the  database.	  This
       makes  the  database a bit bigger (although the automatic pruning tends
       to take care of that) but makes matching more exact.
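
       The doubled-up tokens can be pictured with the short Python sketch
       below, which appends every pair of adjacent words to the list of
       single-word tokens.  Joining the two words with a space is an
       assumption; the manual does not say how qsf represents a pair.

```python
def with_pairs(words):
    """Return the single-word tokens followed by all adjacent word pairs."""
    words = list(words)
    pairs = [f"{first} {second}" for first, second in zip(words, words[1:])]
    return words + pairs
```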

SPECIAL FILTERS
       As well as using the textual content of email to detect spam, qsf  also
       uses  special  filters  which  create  "pseudo-tokens" based on various
       rules.  This means that specific patterns, not just  individual	words,
       can be used to determine whether a message is spam or not.

       For  example,  if a message contains lots of words with multiple conso‐
       nants, like "ashjkbnxcsdjh", then each time a word like	that  is  seen
       the  special  token  ".GIBBERISH-CONSONANTS."  is  added to the list of
       tokens found in the message.  If it turns out that most	messages  with
       words  that trigger this filter rule are spam, then other messages with
       gibberish consonant strings will be more likely to be flagged as spam.
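
       A filter of this kind might look like the Python sketch below.  The
       manual does not say how many consecutive consonants count as
       gibberish, so the threshold MIN_RUN of six and the treatment of "y"
       as a consonant are assumptions.

```python
import re

MIN_RUN = 6  # assumed threshold; the real value is not documented here
CONSONANT_RUN = re.compile(
    r"[bcdfghjklmnpqrstvwxyz]{%d,}" % MIN_RUN, re.IGNORECASE
)


def gibberish_tokens(words):
    """Emit one pseudo-token per word containing a long consonant run."""
    return [".GIBBERISH-CONSONANTS." for word in words
            if CONSONANT_RUN.search(word)]
```

       Each pseudo-token is then counted and weighted like any ordinary
       token from the message.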

       Currently the special filters are:

       GTUBE  Flags any message containing the string
	      XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X
	      as spam - useful for testing that your qsf installation is
	      working.

       ATTACH-SCR

       ATTACH-PIF

       ATTACH-EXE

       ATTACH-VBS

       ATTACH-VBA

       ATTACH-LNK

       ATTACH-COM

       ATTACH-BAT
	      Adds a token for every attachment whose filename ends in ".scr",
	      ".pif", ".exe",  ".vbs",	".vba",	 ".lnk",  ".com",  and	".bat"
	      respectively (these are often viruses).

       ATTACH-GIF

       ATTACH-JPG

       ATTACH-PNG
	      Adds a token for every attachment whose filename ends in ".gif",
	      ".jpg" or ".jpeg", and ".png" respectively.

       ATTACH-DOC

       ATTACH-XLS

       ATTACH-PDF
	      Adds a token for every attachment whose filename ends in ".doc",
	      ".xls",  or  ".pdf"  respectively (these tend to indicate a non-
	      spam email).

       SINGLE-IMAGE
	      Adds a token if the message contains exactly one attached image.

       MULTIPLE-IMAGES
	      Adds a token if the message  contains  more  than	 one  attached
	      image.

       GIBBERISH-CONSONANTS
	      Adds  a  token for every word found that has multiple consonants
	      in a row, as described above.  Spam often	 contains  strings  of
	      gibberish.

       GIBBERISH-VOWELS
	      Adds  a token for every word found that has multiple vowels in a
	      row, eg "aeaiaiaeeio".

       GIBBERISH-FROMCONS
	      Like GIBBERISH-CONSONANTS, but only for the "From:" and "Return-
	      Path:" addresses on their own.

       GIBBERISH-FROMVOWL
	      Like  GIBBERISH-VOWELS,  but  only  for the "From:" and "Return-
	      Path:" addresses on their own.

       GIBBERISH-BADSTART
	      Adds a token for every word that starts  with  a	bad  character
	      such as %.

       GIBBERISH-HYPHENS
	      Adds  a  token  for  every  word with more than three hyphens or
	      underscores in it.

       GIBBERISH-LONGWORDS
	      Adds a token for every word with over 30 characters in it (but
	      fewer than 60).

       HTML-COMMENTS-IN-WORDS
	      Adds  a  token  for  every HTML comment found in the middle of a
	      word.   Spam  often  contains  HTML  inside  words,  like	 this:
	      w<!--dsgfhsdgjgh-->ord

       HTML-EXTERNAL-IMG
	      Adds  a  token  for every HTML <img> (image) tag found that con‐
	      tains :// (i.e.  it refers to an external image).

       HTML-FONT
	      Adds a token for every HTML <font> tag found.

       HTML-IP-IN-URLS
	      Adds a token for every URL found containing an IP address.

       HTML-INT-IN-URL
	      Adds a token for every URL found containing an  integer  in  its
	      hostname.

       HTML-URLENCODED-URL
	      Adds  a  token  for  every  URL found containing a % sign in its
	      hostname.

       Normally, a filter just causes a token to be added, and these tokens
       are processed by the normal weighting algorithm.  However, the GTUBE
       filter immediately flags any matching message as spam, bypassing the
       token matching.
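
       As a rough illustration of how three of the simpler filters above could
       be expressed, the following shell sketch prints a token name for each
       matching word read on standard input.  It is not qsf's own code: the
       function name gibberish_tokens is hypothetical, words are assumed to be
       separated by whitespace, hyphens and underscores are counted together,
       and % is the only "bad" start character checked because it is the only
       one named above.

```sh
# Hypothetical helper (not part of qsf): print one token name per match.
# Assumption: a "word" is any run of non-whitespace characters.
gibberish_tokens() {
    tr -s '[:space:]' '\n' | awk '
        length($0) == 0 { next }
        {
            word = $0

            # GIBBERISH-BADSTART: only % is named in the text, so only %
            # is treated as a bad start character in this sketch.
            if (substr(word, 1, 1) == "%") print "GIBBERISH-BADSTART"

            # GIBBERISH-HYPHENS: more than three hyphens or underscores
            # (assumption: the two characters are counted together).
            tmp = word
            if (gsub(/[-_]/, "", tmp) > 3) print "GIBBERISH-HYPHENS"

            # GIBBERISH-LONGWORDS: over 30 characters but less than 60.
            n = length(word)
            if (n > 30 && n < 60) print "GIBBERISH-LONGWORDS"
        }'
}
```

       For example, printf 'a-b-c-d-e\n' | gibberish_tokens prints
       GIBBERISH-HYPHENS, since the word contains four hyphens.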

DATABASE BACKENDS
       The  inbuilt  "list"  database backend will not necessarily provide the
       best performance, but is provided because using it requires no external
       libraries.

       If the appropriate libraries were available when qsf was compiled, it
       can also use alternative database backends.  To find out which backends
       you have available, run qsf -V (capital V) and read the second line of
       output.  To see how well a backend performs, collect some spam and
       non-spam and use qsf -d BACKEND -B SPAM NONSPAM (see the entry for -B
       above).
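
       A minimal sketch of the backend check in shell follows.  The helper
       name second_line is hypothetical, and the call to qsf is skipped if qsf
       is not found in PATH.

```sh
# Hypothetical helper: print only the second line of standard input.
second_line() {
    sed -n '2p'
}

# Show the second line of "qsf -V", which is where the text above says the
# available backends are reported. Skipped when qsf is not installed.
if command -v qsf >/dev/null 2>&1; then
    qsf -V | second_line
fi
```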

       Some people find that they get the best performance  out	 of  the  gdbm
       backend; this is a library that is widely available on many systems.

       To  efficiently	share a qsf database across multiple machines, you may
       find the MySQL backend useful.  However, using it is a little more com‐
       plicated.

       To use the MySQL backend you will need to create a table with the
       fields key1, key2, token, value1, value2 and value3.  The token field
       must be VARCHAR(64), and the value1, value2 and value3 fields must each
       be BIGINT or INT.  Indexing on the token field is a good idea.  The key1
       and key2 fields can be of any type, but they must be present.

       For example:

		USE mydatabase;
		CREATE TABLE qsfdb (
		  key1	    BIGINT UNSIGNED NOT NULL,
		  key2	    BIGINT UNSIGNED NOT NULL,
		  token	    VARCHAR(64) DEFAULT '' NOT NULL,
		  value1    INT UNSIGNED NOT NULL,
		  value2    INT UNSIGNED NOT NULL,
		  value3    INT UNSIGNED NOT NULL,
		  PRIMARY KEY (key1,key2,token),
		  KEY (key1),
		  KEY (key2),
		  KEY (token)
		);

       The key1 and key2 fields allow you to have multiple  qsf	 databases  in
       one table, by specifying different key1 and key2 values on invocation.

       Instead	of specifying a database file with the --database / -d option,
       you must specify either a specification string as described  below,  or
       the name of a file containing such a string on its first line.

       The specification string is as follows:

		database=DATABASE;host=HOST;port=PORT;
		user=USER;pass=PASS;table=TABLE;
		key1=KEY1;key2=KEY2

       This string must be all on one line, with no spaces; it is wrapped
       above only to fit the page.

       DATABASE
	      is the name of the MySQL database.

       HOST   is the hostname of the database server (e.g. "localhost").

       PORT   is the TCP port to connect on (e.g. 3306).

       USER   is the username to connect with.

       PASS   is the password to connect with.

       TABLE  is  the  database	 table to use.	If a table with this name does
	      not exist when qsf is called in update or training mode, then it
	      will be created if permissions allow this to be done.

       KEY1   is the value to use for the key1 field.

       KEY2   is the value to use for the key2 field.

       Since command lines can be seen in the process list, it is probably
       best to specify a filename (e.g. qsf -d mysql:qsfdb.spec) and put the
       specification string inside that file.
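
       One way to produce such a file is sketched below.  The function name
       qsf_mysql_spec and the example values (qsfuser, secret, and 0 for both
       keys) are hypothetical; mydatabase and qsfdb are taken from the table
       example above.  No escaping of field values is attempted.

```sh
# Hypothetical helper: join the eight fields into a one-line specification.
# Arguments, in order: database host port user pass table key1 key2
qsf_mysql_spec() {
    printf 'database=%s;host=%s;port=%s;user=%s;pass=%s;table=%s;key1=%s;key2=%s\n' \
        "$1" "$2" "$3" "$4" "$5" "$6" "$7" "$8"
}

# Write the file inside a subshell so the restrictive umask (owner-only
# access) does not leak into the rest of the session.
# Example values only; substitute your own.
(
    umask 077
    qsf_mysql_spec mydatabase localhost 3306 qsfuser secret qsfdb 0 0 > qsfdb.spec
)

# qsf is then pointed at the file rather than at the string itself:
#   qsf -d mysql:qsfdb.spec
```

       Creating the file with a umask of 077 keeps the password out of reach
       of other local users, which complements keeping it off the command
       line.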

TROUBLESHOOTING
       If  you	have  problems	with qsf, please check the list below; if this
       does not help, go to the qsf home  page	and  investigate  the  mailing
       lists, or email the author.

       Nothing is being marked as spam.
	      First,  use the -r option to switch on the X-Spam-Rating header,
	      and check that this header appears in email passed through  qsf.
	      If  it  does not, then it is likely that qsf is not being run at
	      all - check your configuration of procmail(1) or its equivalent.

	      If you are seeing X-Spam-Rating headers,	and  different	emails
	      have  different scores, then you may simply need to retrain your
	      database a little more.  Take more spam email and pass it to qsf
	      -m.

	      If you are seeing X-Spam-Rating headers but they all give the
	      same spam rating, then the most likely reason is that qsf is not
	      reading any database.  Make sure that whatever is processing the
	      email has read permission on /var/lib/qsfdb and/or ~/.qsfdb.  If
	      you are using ~/.qsfdb, also make sure that ~ ($HOME) refers to
	      the same directory for whatever is processing the email as it
	      did for the user who created the database.
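
	      The checks above can be sketched in shell as follows.  The
	      helper name spam_rating_header and the file sample.eml are
	      hypothetical, and the permission check is only meaningful when
	      run as the same user that processes the email.

```sh
# Hypothetical helper: print any X-Spam-Rating header lines of a message
# read on stdin, looking only at the header block (up to the first blank
# line).
spam_rating_header() {
    sed '/^$/q' | grep -i '^X-Spam-Rating:'
}

# Report whether the current user can read each database location
# mentioned above.
for db in /var/lib/qsfdb "$HOME/.qsfdb"; do
    if [ -r "$db" ]; then
        echo "readable: $db"
    else
        echo "NOT readable: $db"
    fi
done

# With qsf installed and a saved message in ./sample.eml:
#   qsf -r < sample.eml | spam_rating_header
```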

       Retraining sometimes takes a very long time.
	      With the obtree backend, or with 2-column MySQL or SQLite tables,
	      the database is pruned on every 500th retrain (-m or -M).  On some
	      systems this may take some time, and during this time the database
	      is locked (except when using the MySQL or SQLite backends).  If you
	      do a lot of retraining and want to avoid this, use the -N option to
	      suppress auto-pruning, and have a cron(8) job or similar run a
	      manual prune (qsf -p) periodically.

       Running qsf from procmail fails with an error.
	      If you can run qsf from the command line, but your procmail log
	      file shows errors such as "qsf: cannot execute binary file", then
	      contact your system administrator for help.  It may be that
	      incoming email is handled by a different server from the one you
	      normally log in to, and either the two differ in architecture or
	      operating system, or the mail server is not permitted to execute
	      user-owned binaries.
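
	      Before contacting the administrator, it can help to record what
	      the login machine looks like.  The sketch below uses the
	      standard uname(1) and file(1) utilities; the function name
	      describe_host is hypothetical.

```sh
# Hypothetical helper: describe this machine and, if present, the qsf binary.
describe_host() {
    # Operating system name and hardware architecture of this machine.
    echo "system: $(uname -s) $(uname -m)"

    # Binary format of qsf as reported by file(1), if qsf is in PATH.
    if qsf_path=$(command -v qsf 2>/dev/null); then
        echo "binary: $(file "$qsf_path")"
    else
        echo "binary: qsf not found in PATH"
    fi
}

describe_host
```

	      If the same output can be obtained on the mail server, comparing
	      the two "system:" lines shows whether the architectures or
	      operating systems differ.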

ACKNOWLEDGEMENTS
       The following people have contributed suggestions,  comments,  patches,
       and testing:

	      Tom Parker <http://www.bits.bris.ac.uk/palfrey/>
	      Dr Kelly A. Parker
	      Vesselin Mladenov <http://www.antipodes.bg/>
	      Glyn Faulkner
	      Mark Reynolds
	      Sam Roberts
	      Scott Allen
	      Karsten Kankowski
	      M. Kolbl
	      Micha Holzmann
	      Jef Poskanzer <http://www.acme.com/jef/>
	      Clemens Fischer <http://ino-waiting.gmxhome.de/>
	      Nelson A. de Oliveira
	      Michal Vitecek
	      Tommy Pettersson <http://www.lysator.liu.se/~ptp/>

AUTHOR
       The author:

	      Andrew Wood <andrew.wood@ivarch.com>
	      http://www.ivarch.com/

       Project home page:

	      http://www.ivarch.com/programs/qsf/

BUGS
       If  you find any bugs, please contact the author, either by email or by
       using the contact form on the web site.

SEE ALSO
       procmail(1), procmailrc(5), procmailex(5)

       Someone has written a guide to using qsf with KMail that can  be	 found
       at:
       http://www.softwaredesign.co.uk/Information.SpamFilters.html

LICENSE
       This is free software, distributed under the ARTISTIC 2.0 license.

1.2.7				  August 2007				QSF(1)