spamoracle.conf man page on DragonFly

SPAMORACLE.CONF(5)					    SPAMORACLE.CONF(5)

NAME
       spamoracle.conf - SpamOracle configuration file format

DESCRIPTION
       The spamoracle.conf file is a configuration file governing the
       operation of the spamoracle(1) e-mail classification tool.  By
       default, the configuration file is looked for at
       $HOME/.spamoracle.conf, but an alternate location can be specified
       using the -config flag to spamoracle(1).

       Important note: most of the configuration parameters should not be
       modified lightly, as this may result in completely wrong e-mail
       classification.  Familiarity with Graham's filtering algorithm, as
       described in the paper referenced at the end of this page, is
       required to fully understand the effect of the parameters.

SYNTAX
       The spamoracle.conf file is composed of lines of the form
       variable = value.  Lines starting with a hash sign (#) are treated
       as comments and ignored.  Blank lines are ignored.

       Depending on the type of the variable (see the list of variables
       below), the value part takes one of the following forms:

       string A sequence of characters.  Blanks (spaces, tabs) at the
              beginning and the end of the string are ignored.
              Alternatively, the string can be enclosed in double quotes ("),
              in which case spaces are not trimmed.  Inside quoted strings,
              backslashes (\) and double quotes (") must be escaped with a
              backslash, as in \\ or \".

       boolean
	      Either  on,  yes,	 true, or 1 to activate the boolean option, or
	      off, no, false, or 0 to deactivate it.

       integer
              A decimal integer.

       float  A decimal floating-point number.

       regexp A regular expression in emacs(1) syntax.  The repetition
              operators are *, +, and ?.  Alternation is written \| and
              grouping is written \(...\).  Character classes are written
              between brackets [...] as usual.  A single dot denotes any
              character except newline.  Regular expressions are
              case-insensitive.
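       The value forms above can be sketched as small Python parsing
       functions.  All function names are hypothetical.  The sketch assumes
       that a backslash inside a quoted string escapes whichever character
       follows it, that boolean keywords match exactly as listed, and that
       the emacs regexp constructs listed above map onto Python's re module
       (\| to |, \( \) to ( )); other emacs features are not handled.

```python
import re

def parse_string(raw):
    """Trim unquoted strings; unquote "..." strings and resolve escapes."""
    raw = raw.strip()
    if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
        body, out, i = raw[1:-1], [], 0
        while i < len(body):
            if body[i] == "\\" and i + 1 < len(body):
                out.append(body[i + 1])  # \\ gives \  and  \" gives "
                i += 2
            else:
                out.append(body[i])
                i += 1
        return "".join(out)
    return raw

def parse_boolean(raw):
    word = raw.strip()
    if word in ("on", "yes", "true", "1"):
        return True
    if word in ("off", "no", "false", "0"):
        return False
    raise ValueError(f"not a boolean: {raw!r}")

def parse_integer(raw):
    return int(raw.strip())

def parse_float(raw):
    return float(raw.strip())

def emacs_to_python_regex(pattern):
    """Translate the emacs constructs described above into Python re syntax.

    Only \\| \\( \\) are rewritten as operators; bare | ( ) { } are literal
    in emacs syntax, so they are escaped.  Other emacs-only features
    (for example \\{m,n\\}) are not handled by this sketch.
    """
    out, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            out.append(nxt if nxt in "|()" else "\\" + nxt)
            i += 2
        elif ch in "|(){}":
            out.append("\\" + ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    # Regular expressions in spamoracle.conf are case-insensitive.
    return re.compile("".join(out), re.IGNORECASE)
```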

CONFIGURABLE PARAMETERS
       database_file
              (type string, default value $HOME/.spamoracle.db)
	      The location of the file that contains the database of word fre‐
	      quencies used by spamoracle(1).

       html_retain_tags
	      (type boolean, default value false)
              In HTML-formatted e-mails and attachments, the names of HTML
              tags are normally not treated as words and are ignored for the
              word frequency calculations.  If the html_retain_tags parameter
              is set to true, HTML tags (such as img or bold) are treated as
              words and included in the computation of word frequencies.

       html_tag_attributes
	      (type regexp, default value
	      a/href\|img/src\|img/alt\|frame/src\|font/face\|font/color)
              This regular expression matches pairs of HTML tags and HTML
              attributes written as tag/attribute.  When scanning
              HTML-formatted e-mails and attachments, attributes of HTML tags
              are normally ignored, unless the tag/attribute pair matches the
              regular expression html_tag_attributes.  If the tag/attribute
              pair matches this regexp, the value of the attribute (for
              instance, the URL for the a/href attribute) is scanned for
              words.
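       The effect of html_retain_tags and html_tag_attributes on word
       extraction can be sketched with Python's html.parser module.  This is
       not SpamOracle's tokenizer: the word definition (runs of ASCII
       letters and digits), the use of a full match against "tag/attribute",
       and retaining only start-tag names are assumptions.  The default
       html_tag_attributes value is translated here by hand from emacs
       syntax (\|) to Python syntax (|).

```python
import re
from html.parser import HTMLParser

# Documented default of html_tag_attributes, rewritten in Python re syntax.
DEFAULT_TAG_ATTRIBUTES = re.compile(
    r"a/href|img/src|img/alt|frame/src|font/face|font/color", re.IGNORECASE)

# Hypothetical word definition, for illustration only.
WORD_RE = re.compile(r"[A-Za-z0-9]+")

class WordCollector(HTMLParser):
    def __init__(self, retain_tags=False, tag_attributes=DEFAULT_TAG_ATTRIBUTES):
        super().__init__()
        self.retain_tags = retain_tags        # plays the role of html_retain_tags
        self.tag_attributes = tag_attributes  # plays the role of html_tag_attributes
        self.words = []

    def handle_starttag(self, tag, attrs):
        if self.retain_tags:
            self.words.append(tag)            # tag name counted as a word
        for name, value in attrs:
            # Attribute values are scanned only for matching tag/attribute pairs.
            if value is not None and self.tag_attributes.fullmatch(f"{tag}/{name}"):
                self.words.extend(WORD_RE.findall(value))

    def handle_data(self, data):
        self.words.extend(WORD_RE.findall(data))

def html_words(html, retain_tags=False):
    parser = WordCollector(retain_tags)
    parser.feed(html)
    parser.close()
    return parser.words
```

       With the defaults, the URL inside <a href="..."> contributes words,
       while attributes such as font/size are skipped.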

       mail_headers
	      (type regexp, default value from:\|subject:)
	      A regular expression determining which headers of an e-mail mes‐
	      sage are scanned for words.
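       A minimal sketch of header selection with the default mail_headers
       value, using Python's email module.  It assumes the regexp is matched
       at the start of the header name followed by a colon (which the
       default value from:\|subject: suggests); how SpamOracle actually
       applies the regexp may differ.  The function name scanned_headers is
       hypothetical.

```python
import re
from email import message_from_string

# Default mail_headers value, translated from emacs syntax to Python syntax.
MAIL_HEADERS = re.compile(r"from:|subject:", re.IGNORECASE)

def scanned_headers(raw_message, pattern=MAIL_HEADERS):
    """Return the (name, value) header pairs whose "Name:" matches pattern."""
    msg = message_from_string(raw_message)
    selected = []
    for name, value in msg.items():
        # Assumption: anchored match against the header name plus colon.
        if pattern.match(f"{name}:"):
            selected.append((name, value))
    return selected
```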

       spam_header
	      (type string, default value X-Spam)
	      The name of the header that spamoracle mark adds to incoming  e-
	      mail messages, with the results of the spam/non-spam classifica‐
	      tion.

       attachments_header
	      (type string, default value X-Attachments)
              The name of the header that spamoracle mark adds to incoming
              e-mail messages, with the one-line summary of attachment types,
              names and character sets.  The generation of this header can be
              turned off with the summarize_attachment parameter.

       summarize_attachment
	      (type boolean, default value true)
              If this parameter is set, spamoracle mark generates a one-line
              summary of the attachments of the incoming messages and inserts
              this summary in the message headers.  Setting this parameter to
              false disables the generation of this extra header.

       num_meaningful_words
	      (type integer, default value 15)
              Maximum number of "meaningful" words that are retained for
              computing the spam probability.  During mail analysis,
              spamoracle extracts all words of the message and retains those
              whose spam frequency (frequency of occurrence in spam messages)
              is closest to 1 or to 0.  At most num_meaningful_words such
              "meaningful" words are retained.

       max_repetitions
	      (type integer, default value 2)
	      Maximum  number  of  times  a given word can occur in the set of
	      "meaningful" words retained for computing the spam  probability.
	      The  default  value of 2 means that at most 2 occurrences of the
	      same word will be retained.
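       The selection rule described by num_meaningful_words and
       max_repetitions can be sketched as follows.  The input format (one
       (word, spam frequency) pair per occurrence in the message) and the
       ranking by distance from the neutral value 0.5 are assumptions made
       to express "closest to 1 or to 0"; ties keep message order because
       Python's sort is stable.  The function name is hypothetical.

```python
from collections import Counter

def meaningful_words(word_probs, num_meaningful_words=15, max_repetitions=2):
    """Pick the most extreme (word, spam_frequency) occurrences.

    word_probs holds one pair per occurrence of a word in the message.
    """
    # Frequencies farthest from 0.5 (closest to 0 or 1) come first.
    ranked = sorted(word_probs, key=lambda wp: abs(wp[1] - 0.5), reverse=True)
    kept, seen = [], Counter()
    for word, prob in ranked:
        if len(kept) == num_meaningful_words:
            break
        # Each distinct word may appear at most max_repetitions times.
        if seen[word] < max_repetitions:
            seen[word] += 1
            kept.append((word, prob))
    return kept
```

       With max_repetitions = 2, a word occurring three times with spam
       frequency 0.99 still fills only two of the num_meaningful_words slots.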

       low_freq_limit
	      (type float, default value 0.01)

       high_freq_limit
	      (type float, default value 0.99)
              The spam frequency of a word is computed as the number of
              occurrences in spam divided by the number of occurrences in all
              messages.  This ratio is then clipped to the interval
              [low_freq_limit, high_freq_limit], so that words that are
              extremely rare or extremely common in spam do not bias the
              probability computation too much.  The default values of 0.01
              and 0.99 are adequate for a corpus of a few thousand e-mails.
              For larger corpora (e.g. 10000 e-mails), the values 0.001 and
              0.999 may give better results.
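       The computation just described, taken literally (spam occurrences
       divided by all occurrences, then clipped), looks like this in Python.
       SpamOracle's real formula may weight the counts differently (Graham's
       paper, for instance, normalizes by the number of messages in each
       corpus), and raising an error for a word never seen in the corpus is
       an assumption of this sketch.  The function name is hypothetical.

```python
def spam_frequency(spam_count, good_count, low_freq_limit=0.01, high_freq_limit=0.99):
    """Fraction of a word's occurrences that were in spam, clipped to the limits."""
    total = spam_count + good_count
    if total == 0:
        # Handling of unseen words is outside the scope of this sketch.
        raise ValueError("word never seen in the corpus")
    ratio = spam_count / total
    # Clip to [low_freq_limit, high_freq_limit].
    return min(max(ratio, low_freq_limit), high_freq_limit)
```

       A word seen only in spam gets 0.99 rather than 1.0, so a single
       such word cannot force the combined probability to exactly 1.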

       min_meaningful_words
	      (type integer, default value 5)
              Minimum number of "meaningful" words below which spamoracle mark
              refuses to classify the e-mail and outputs "unknown" status.
              This happens with very short e-mails, or e-mails that consist
              exclusively of links and pictures.

       good_mail_prob
	      (type float, default value 0.2)
              Spam probability below which the e-mail is classified as
              non-spam.

       spam_mail_prob
	      (type float, default value 0.8)
              Spam probability above which the e-mail is classified as spam.
              Messages whose probability falls between good_mail_prob and
              spam_mail_prob are classified as "unknown".
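       How min_meaningful_words, good_mail_prob and spam_mail_prob interact
       can be sketched as below.  The combination step uses the formula from
       Graham's paper, P = abc.../(abc... + (1-a)(1-b)(1-c)...); whether
       SpamOracle combines probabilities in exactly this way is an
       assumption.  The function name and the result labels are
       hypothetical, and the inputs are assumed to be already clipped to
       [low_freq_limit, high_freq_limit] so the denominator is never zero.

```python
from math import prod

def classify(probs, min_meaningful_words=5, good_mail_prob=0.2, spam_mail_prob=0.8):
    """Classify from the spam frequencies of the retained meaningful words."""
    if len(probs) < min_meaningful_words:
        return "unknown", None          # too few meaningful words to decide
    p_spam = prod(probs)
    p_good = prod(1.0 - p for p in probs)
    score = p_spam / (p_spam + p_good)  # Graham-style combined probability
    if score < good_mail_prob:
        return "non-spam", score
    if score > spam_mail_prob:
        return "spam", score
    return "unknown", score             # between the two thresholds
```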

AUTHOR
       Xavier Leroy <Xavier.Leroy@inria.fr>

SEE ALSO
       spamoracle(1)

       http://www.paulgraham.com/spam.html (Paul Graham's seminal paper)

							    SPAMORACLE.CONF(5)