IFILE(1)			 User Commands			      IFILE(1)

NAME
       ifile - core executable for the ifile mail filtering system

SYNOPSIS
       ifile [-b file] [-q|-Q] [-g] [-k] [-o] [-v num] [lexing options] file
             ...
       ifile -c -q|-Q [-T threshold] [-b file] [-g] [-k] [-o] [lexing options]
             file ...
       ifile [-b file] [-d folder] [-i folder|-u folder] [-g] [-k] [-o]
             [-v num] [lexing options] file ...
       ifile -r [-b file]

DESCRIPTION
       ifile is a mail filter client that uses machine learning to classify
       e-mail into folders/mailboxes. The algorithm it uses is called Naive
       Bayes. Basically, Naive Bayes treats each document as an unordered
       collection of words and classifies it by matching the document's word
       distribution to the most closely matching folder/mailbox distribution.
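       As a rough illustration of the idea above, the following Python
       sketch scores a document against per-folder word counts with
       multinomial Naive Bayes. It uses log probabilities and Laplace
       (add-one) smoothing. It is a minimal sketch under those assumptions
       and is not ifile's actual implementation. The names tokenize,
       NaiveBayes, insert, score and the occur flag (which mimics -o) are
       hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text, occur=False):
    """Split text into lowercase alphabetic words and count them.

    With occur=True each word is counted once per document (mimics -o).
    """
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(set(words)) if occur else Counter(words)


class NaiveBayes:
    """Hypothetical multinomial Naive Bayes over per-folder word counts."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # folder -> word -> count
        self.doc_counts = Counter()              # folder -> number of e-mails

    def insert(self, folder, text, occur=False):
        """Add a document's word statistics to folder (compare -i)."""
        self.word_counts[folder].update(tokenize(text, occur))
        self.doc_counts[folder] += 1

    def score(self, text, occur=False):
        """Return {folder: log-probability score}; higher means a better match."""
        doc = tokenize(text, occur)
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for folder, counts in self.word_counts.items():
            total_words = sum(counts.values())
            s = math.log(self.doc_counts[folder] / total_docs)  # folder prior
            for word, n in doc.items():
                # Add-one smoothing keeps unseen words from zeroing the product.
                p = (counts[word] + 1) / (total_words + len(vocab))
                s += n * math.log(p)
            scores[folder] = s
        return scores


nb = NaiveBayes()
nb.insert("spam", "cheap pills buy now cheap offer")
nb.insert("friends", "dinner on friday with the kids")
scores = nb.score("buy cheap pills")
print(max(scores, key=scores.get))  # spam
```

       Summing logarithms instead of multiplying probabilities avoids
       floating-point underflow on long messages.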

OPTIONS
       -b, --db-file=file
	      Location to read/store ifile database.  Default is ~/.idata

       -c, --concise
              Equivalent of "ifile -v 0 | head -1 | cut -f1 -d' '". Must be
              used with -q or -Q.

       -d, --delete=folder
              Delete the statistics for each of the files from the category folder

       -f, --folder-calcs=folder
	      Show the word-probability calculations for folder

       -g, --log-file
	      Create and store debugging information in ~/.ifile.log

       -i, --insert=folder
	      Add the statistics for each of the files to the category folder

       -k, --keep-infrequent
              Keep words that occur infrequently in the database (normally
              they are discarded)

       -l, --query-loocv=folder
              For each of the files, temporarily remove the file from folder,
              perform the query, and then reinsert the file into folder. The
              database is not modified.

       -o, --occur
              Use a document bit-vector representation: count each word only
              once per document.

       -q, --query
	      Output rating scores for each of the files

       -Q, --query-insert
              For each of the files, output rating scores and add its
              statistics to the folder with the highest score

       -T, --threshold=threshold
              When used with both -c and -q, output the two highest-ranking
              categories if their relative score difference (the percentage
              reported with -q alone, divided by 100) is at most
              threshold / 1000, which can be used to detect border cases.
              When used with -q only and any threshold > 0, output the score
              difference as a percentage. For example,
		     ifile -T1 -q foo.txt
	      might result in
		     spam -15570.48640776
		     non-spam -18728.00272369
		     diff[spam,non-spam](%) 9.21
	      If so, then
		     ifile -T93 -q -c foo.txt
	      will result in
		     foo.txt spam,non-spam
	      whereas
		     ifile -T92 -q -c foo.txt
	      will result in
		     foo.txt spam

       -r, --reset-data
	      Erases all currently stored information

       -u, --update=folder
              Same as --insert, except that statistics are added only if folder already exists

       -v, --verbosity=num
              Amount of output while running: 0=silent, 1=quiet, 2=progress,
              3=verbose, 4=debug
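       The -T example above is consistent with a specific formula for the
       percentage: the difference between the two top scores divided by the
       sum of their absolute values. That gives 9.21 for the two scores
       shown. The Python sketch below reproduces the example under that
       assumption. The assumption is inferred from the sample output, not
       taken from the ifile source. The function names are hypothetical.

```python
def diff_fraction(best, second):
    """Relative gap between the two highest (negative, log-scale) scores.

    Assumption: formula inferred from the -T example, not from ifile's source.
    """
    return (best - second) / (abs(best) + abs(second))


def concise_label(scores, threshold):
    """Mimic -c -q -T: one label, or two comma-separated labels for a border case.

    Requires at least two categories in scores.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    first, second = ranked[0], ranked[1]
    if diff_fraction(scores[first], scores[second]) <= threshold / 1000:
        return f"{first},{second}"
    return first


example_scores = {"spam": -15570.48640776, "non-spam": -18728.00272369}
gap = diff_fraction(example_scores["spam"], example_scores["non-spam"])
print(round(100 * gap, 2))                 # 9.21
print(concise_label(example_scores, 93))   # spam,non-spam
print(concise_label(example_scores, 92))   # spam
```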

       Lexing options:

       -a, --alpha-lexer
	      Lex words as sequences of alphabetic characters (default)

       -A, --alpha-only-lexer
              Only lex space-separated character sequences which are composed
              entirely of alphabetic characters

       -h, --strip-header
	      Skip all of the header lines except Subject:, From: and To:

       -m, --max-length=char
              Ignore the portion of the message after the first char
              characters. Use the entire message if char is set to 0. Default
              is 50,000.

       -p, --print-tokens
              Only tokenize and print; do not do any other processing.
              Documents are returned as a list of (word, frequency) pairs.

       -s, --no-stoplist
	      Do not throw out overly frequent (stoplist) words when lexing

       -S, --stemming
              Use the Porter stemming algorithm when lexing documents

       -w, --white-lexer
              Lex words as sequences of space-separated characters
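       The following Python approximation of -a, -w and -A shows how the
       lexers differ. It also applies the -m cutoff and returns the
       (word, frequency) pairs that -p prints. The regular expression, the
       lowercasing and the function name lex are assumptions of this
       sketch. ifile's real lexers may differ in details such as case
       handling, stoplist filtering (-s) and stemming (-S), none of which
       are modeled here.

```python
import re
from collections import Counter


def lex(text, mode="alpha", max_length=50000):
    """Approximate ifile's lexers (hypothetical helper).

    mode="alpha"      (-a): runs of alphabetic characters
    mode="white"      (-w): whitespace-separated character sequences
    mode="alpha-only" (-A): whitespace-separated sequences that are purely alphabetic
    max_length        (-m): lex only the first max_length characters; 0 means all
    """
    if max_length > 0:
        text = text[:max_length]
    if mode == "alpha":
        tokens = re.findall(r"[A-Za-z]+", text)
    elif mode == "white":
        tokens = text.split()
    elif mode == "alpha-only":
        tokens = [t for t in text.split() if t.isalpha()]
    else:
        raise ValueError(f"unknown lexer mode: {mode}")
    # Like -p, return sorted (word, frequency) pairs; lowercasing is an assumption.
    return sorted(Counter(t.lower() for t in tokens).items())


sample = "Buy now: V1agra at 50% off, buy today"
print(lex(sample))                # "V1agra" splits into "v" and "agra"
print(lex(sample, "white"))       # keeps "now:", "v1agra", "50%", "off,"
print(lex(sample, "alpha-only"))  # only "at", "buy" (twice) and "today" survive
```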

       If no files are specified on the command line, ifile reads the message
       to process from standard input.

       -?, --help
	      Give this help list

       --usage
	      Give a short usage message

       -V, --version
	      Print program version

       Mandatory or optional arguments to long options are also mandatory or
       optional for any corresponding short options.

FILES
       ~/.idata
              ifile database (default location). See the FAQ included in the
              ifile package for a description of the database format.

AUTHOR
       Jason Rennie <jrennie@csail.mit.edu> and many others. See the
       ChangeLog for the full list.

EXAMPLES
       Before using ifile, you need to train it. Let's say that you have
       three folders, "spam", "ifile" and "friends", and the following
       directory structure:

	      /--+--spam----+--1
		 |	    +--2
		 |	    +--3
		 |
		 +--ifile---+--1
		 |	    +--2
		 |	    +--3
		 |
		 +--friends-+--1
			    +--2
			    +--3

       The following commands build the ifile database in ~/.idata (use the -b
       option to specify a different location for the database):

	      ifile -h -i spam /spam/*
	      ifile -h -i ifile /ifile/*
	      ifile -h -i friends /friends/*

       The -h option strips off headers besides "Subject:", "From:" and "To:".
       I find that -h improves ifile's performance, but you may find otherwise
       for your personal collection.

       Note that we have made the argument to -i the same as the corresponding
       folder name. This is not necessary. The argument to -i can be any word
       you want to use to identify a category of e-mails, but it must not
       contain whitespace characters (spaces, tabs, newlines, etc.).
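       With many folders, you can script the training step. The following
       Python sketch builds one ifile argument list per category and
       rejects category names that contain whitespace. It runs the commands
       only when asked. The helper names train_command and train_all are
       hypothetical. The sketch assumes the directory layout shown above,
       with one subdirectory per category under a root directory. It uses
       only the -b, -h and -i options documented in this page.

```python
import subprocess
from pathlib import Path


def train_command(category, files, strip_headers=True, db_file=None):
    """Build an ifile argv that inserts files into category (-i)."""
    if not category or any(ch.isspace() for ch in category):
        raise ValueError(f"category must be a single word without whitespace: {category!r}")
    cmd = ["ifile"]
    if db_file is not None:
        cmd += ["-b", str(db_file)]  # alternative database location
    if strip_headers:
        cmd.append("-h")             # keep only Subject:, From: and To: headers
    cmd += ["-i", category]
    cmd += [str(f) for f in files]
    return cmd


def train_all(root, run=False):
    """Build (and optionally run) one training command per subdirectory of root."""
    commands = []
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue
        files = sorted(p for p in folder.iterdir() if p.is_file())
        if files:  # with no files, ifile would read standard input instead
            commands.append(train_command(folder.name, files))
    if run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


print(train_command("spam", ["/spam/1", "/spam/2", "/spam/3"]))
```

       Passing an argument list rather than a shell string avoids quoting
       problems with unusual file names.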

       At this point, your ~/.idata file should look something like this:

	      spam ifile friends
	      662 1020 6451
	      3 3 3
	      jrennie 9 0:3 1:18 2:16
	      mindspring 6 1:7 2:5
	      make 9 0:5 1:3
	      yahoo 9 0:1 1:22 2:2

       The first line is the space-separated list of folders. Their ordering
       specifies a numbering (spam=0, ifile=1, friends=2). The second line is
       a token count for each folder (e.g. 662 tokens observed in the three
       spam messages). The third line is an e-mail count for each folder (e.g.
       3 e-mails for each of spam, ifile and friends). Each following line
       specifies statistics for a word. The format of a line is

	      word age folder:count [folder:count ...]

       where folder is the folder number determined by the ordering of the
       first line. Folders with a count of zero are not listed. So, the line
       beginning with "jrennie" indicates that "jrennie" appeared 3 times in
       "spam" e-mails, 18 times in "ifile" e-mails and 16 times in "friends"
       e-mails. The age is the number of e-mails that have been processed
       since the word was added to the database. Very infrequent words are
       pruned from the database to keep the database size down.
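       The format just described is simple enough to read from a program.
       The following Python sketch parses text in that layout into folder
       names, per-folder token and e-mail counts, and per-word statistics.
       The function parse_idata and the structure it returns are
       hypothetical. The sketch assumes a well-formed file like the example
       above. The FAQ shipped with ifile remains the reference for the
       format.

```python
def parse_idata(text):
    """Parse ifile database text in the layout described above (hypothetical helper)."""
    lines = text.strip().splitlines()
    folders = lines[0].split()                         # line 1: folder names
    token_counts = [int(n) for n in lines[1].split()]  # line 2: tokens per folder
    email_counts = [int(n) for n in lines[2].split()]  # line 3: e-mails per folder
    words = {}
    for line in lines[3:]:
        word, age, *pairs = line.split()
        counts = {name: 0 for name in folders}         # omitted folders have count 0
        for pair in pairs:
            index, count = pair.split(":")
            counts[folders[int(index)]] = int(count)
        words[word] = {"age": int(age), "counts": counts}
    return folders, token_counts, email_counts, words


example = """\
spam ifile friends
662 1020 6451
3 3 3
jrennie 9 0:3 1:18 2:16
mindspring 6 1:7 2:5
make 9 0:5 1:3
yahoo 9 0:1 1:22 2:2
"""

folders, tokens, emails, words = parse_idata(example)
print(words["jrennie"])
# {'age': 9, 'counts': {'spam': 3, 'ifile': 18, 'friends': 16}}
print(words["mindspring"]["counts"]["spam"])  # 0
```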

       Now that you have a database, you might want to filter some e-mails.
       Say you have the following incoming e-mails:

	      /--inbox--+--1
			+--2
			+--3

       To find out what folders ifile thinks these e-mails belong in, run

	      ifile -c -q /inbox/1
	      ifile -c -q /inbox/2
	      ifile -c -q /inbox/3

       Let's say that e-mail 1 is about ifile, 2 is spam and 3 is from a
       friend. Assuming ifile does its job correctly, you'll see output like
       this:

	      /inbox/1 ifile
	      /inbox/2 spam
	      /inbox/3 friends

       With so little training data, ifile is unlikely to get the labels
       correct, but you should get the idea :-)
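       To act on these results from a script, you can parse the concise
       output. Each line holds a file name followed by the suggested folder.
       With -T, a border case lists two comma-separated folders. The Python
       sketch below assumes exactly the layouts shown in the examples in
       this page, with whitespace before the label field. The function name
       parse_concise is hypothetical.

```python
def parse_concise(output):
    """Map each file name to its list of suggested folders (hypothetical helper)."""
    result = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        path, labels = line.rsplit(None, 1)  # the last field holds the label(s)
        result[path] = labels.split(",")     # "spam,non-spam" marks a border case
    return result


out = "/inbox/1 ifile\n/inbox/2 spam\n/inbox/3 friends\n"
for path, labels in parse_concise(out).items():
    if len(labels) > 1:
        print(f"{path}: border case between {labels[0]} and {labels[1]}")
    else:
        print(f"{path}: move to {labels[0]}")
```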

       Now, if you move the e-mails to the folders suggested by ifile, you'll
       want to update the database accordingly. You can do this with the -i
       option, as before. Alternatively, you can simply use -Q in place of -q
       above, which automatically adds each e-mail to the folder ifile
       suggests.

       Now, assume for a moment that e-mail 1 was actually spam. We've added
       it to the "ifile" category and put it in the ifile folder. We need to
       move it to the spam folder and update the ifile database accordingly.
       We can update the database with the following command:

	      ifile -d ifile -i spam /inbox/1

       This deletes the e-mail from "ifile" and adds it to "spam".
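       In terms of per-folder statistics, "-d ifile -i spam" subtracts the
       message's word counts from one category and adds them to another.
       The Python sketch below models this with collections.Counter. It
       assumes the database is nothing more than per-folder word and e-mail
       counts, so word ages and pruning are ignored. The function
       move_message is hypothetical.

```python
from collections import Counter


def move_message(word_counts, email_counts, tokens, src, dst):
    """Model `ifile -d src -i dst file` on in-memory counts (hypothetical).

    word_counts: folder -> Counter of words; email_counts: Counter of folders;
    tokens: Counter of the message's words.
    """
    # -d src: subtract the message's statistics (Counter '-' keeps only positive counts).
    word_counts[src] = word_counts[src] - tokens
    email_counts[src] -= 1
    # -i dst: add the same statistics to the destination category.
    word_counts[dst] = word_counts.get(dst, Counter()) + tokens
    email_counts[dst] += 1


msg = Counter({"cheap": 2, "pills": 1})
stats = {"ifile": Counter({"cheap": 2, "pills": 1, "lexer": 4}), "spam": Counter({"cheap": 3})}
mails = Counter({"ifile": 2, "spam": 1})
move_message(stats, mails, msg, "ifile", "spam")
print(stats["ifile"])  # Counter({'lexer': 4})
print(stats["spam"])   # Counter({'cheap': 5, 'pills': 1})
```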

SEE ALSO
       Examples of how to use ifile together with procmail(1) and metamail(1)
       can be found in the directory /usr/share/doc/ifile/examples.

ifile 1.3.4			 November 2004			      IFILE(1)