recoll.conf man page on DragonFly

Man page or keyword search:  
man Server   44335 pages
apropos Keyword Search (all sections)
Output format
DragonFly logo
[printable version]

RECOLL.CONF(5)							RECOLL.CONF(5)

NAME
       recoll.conf - main personal configuration file for Recoll

DESCRIPTION
       This  file  defines  the	 index	configuration for the Recoll full-text
       search system.

       The  system-wide	 configuration	file  is   normally   located	inside
       /usr/[local]/share/recoll/examples.  Any	 parameter  set	 in the common
       file may be overridden by setting  it  in  the  personal	 configuration
       file, by default: $HOME/.recoll/recoll.conf

       Please  note  while  we	try  to keep this manual page reasonably up to
       date, it will frequently lag the current state  of  the	software.  The
       best  source of information about the configuration are the comments in
       the system-wide configuration file.

       A short extract of the file might look as follows:

	      # Space-separated list of directories to index.
	      topdirs =	 ~/docs /usr/share/doc

	      [~/somedirectory-with-utf8-txt-files]
	      defaultcharset = utf-8

       There are three kinds of lines:

	      ·	     Comment or empty

	      ·	     Parameter affectation

	      ·	     Section definition

       Empty lines or lines beginning with # are ignored.

       Affectation lines are in the form 'name = value'.

       Section lines allow redefining a parameter  for	a  directory  subtree.
       Some  of	 the parameters used for indexing are looked up hierarchically
       from the more to the less specific. Not all parameters can be  meaning‐
       fully redefined, this is specified for each in the next section.

       The  tilde  character  (~) is expanded in file names to the name of the
       user's home directory.

       Where values are lists, white space is used for	separation,  and  ele‐
       ments with embedded spaces can be quoted with double-quotes.

OPTIONS
       topdirs = directories
	      Specifies the list of directories to index (recursively).

       skippedNames = patterns
	      A	 space-separated list of patterns for names of files or direc‐
	      tories that should be completely ignored. The  list  defined  in
	      the default file is:

	      *~ #* bin CVS  Cache caughtspam  tmp

	      The  list can be redefined for subdirectories, but is only actu‐
	      ally changed for the top level ones in topdirs

       skippedPaths = patterns
	      A space-separated list of patterns for paths the indexer	should
	      not descend into. Together with topdirs, this allows pruning the
	      indexed tree to one's content.  daemSkippedPaths can be used  to
	      define a specific value for the real time indexing monitor.

       skippedPathsFnmPathname = 0/1
	      The values in the *skippedPaths variables are matched by default
	      with  fnmatch(3),	 with  the  FNM_PATHNAME  and  FNM_LEADING_DIR
	      flags.  This  means  that '/' characters must be matched explic‐
	      itly. You can set skippedPathsFnmPathname to 0  to  disable  the
	      use   of	 FNM_PATHNAME	(meaning   that	  /*/dir3  will	 match
	      /dir1/dir2/dir3).

       followLinks = boolean
	      Specifies if the indexer	should	follow	symbolic  links	 while
	      walking  the  file tree. The default is to ignore symbolic links
	      to avoid multiple indexing of linked files. No effort is made to
	      avoid  duplication  when this option is set to true. This option
	      can be set individually for each of the topdirs members by using
	      sections. It can not be changed below the topdirs level.

       indexedmimetypes = list
	      Recoll  normally	indexes	 any  file which it knows how to read.
	      This list lets you restrict the indexed mime types to  what  you
	      specify.	If  the variable is unspecified or the list empty (the
	      default), all supported types are processed.

       compressedfilemaxkbs = value
	      Size limit for compressed (.gz or .bz2) files. These need to  be
	      decompressed  in a temporary directory for identification, which
	      can be very wasteful if 'uninteresting' big compressed files are
	      present.	 Negative means no limit, 0 means no processing of any
	      compressed file. Defaults to -1.

       textfilemaxmbs = value
	      Maximum size for text files. Very big text files are often unin‐
	      teresting logs. Set to -1 to disable (default 20MB).

       textfilepagekbs = value
	      If  this	is set to other than -1, text files will be indexed as
	      multiple documents of the given page size. This may be useful if
	      you  do want to index very big text files as it will both reduce
	      memory usage at index time and help with	loading	 data  to  the
	      preview  window. A size of a few megabytes would seem reasonable
	      (default: 1000 : 1MB).

       membermaxkbs = value in kilobytes
	      This defines the maximum size for an archive member (zip, tar or
	      rar  at  the  moment).  Bigger  entries will be skipped. Current
	      default: 50000 (50 MB).

       indexallfilenames = boolean
	      Recoll indexes file names into a special section of the database
	      to  allow	 specific  file	 names searches using wild cards. This
	      parameter decides if file name indexing is  performed  only  for
	      files  with  mime	 types	that  would qualify them for full text
	      indexing, or for all files inside the selected  subtrees,	 inde‐
	      pendent of mime type.

       usesystemfilecommand = boolean
	      Decide  if we use the file -i system command as a final step for
	      determining the mime type for a file (the	 main  procedure  uses
	      suffix associations as defined in the mimemap file). This can be
	      useful for files with suffixless names, but it will  also	 cause
	      the indexing of many bogus "text" files.

       processbeaglequeue = 0/1
	      If  this	is set, process the directory where Beagle Web browser
	      plugins copy visited pages for indexing. Of course, Beagle  MUST
	      NOT be running, else things will behave strangely.

       beaglequeuedir = directorypath
	      The path to the Beagle indexing queue. This is hard-coded in the
	      Beagle plugin as ~/.beagle/ToIndex so there should be no need to
	      change it.

       indexStripChars = 0/1
	      Decide  if we strip characters of diacritics and convert them to
	      lower-case before terms are indexed. If we don't, searches  sen‐
	      sitive  to  case	and diacritics can be performed, but the index
	      will be bigger, and some marginal weirdness may sometimes occur.
	      The  default  is a stripped index (indexStripChars = 1) for now.
	      When using multiple indexes for a search, this parameter must be
	      defined identically for all. Changing the value implies an index
	      reset.

       maxTermExpand = value
	      Maximum expansion count for a  single  term  (e.g.:  when	 using
	      wildcards).  The	default	 of 10000 is reasonable and will avoid
	      queries that appear frozen while the engine is walking the  term
	      list.

       maxXapianClauses = value
	      Maximum  number  of  elementary  clauses	we can add to a single
	      Xapian query. In some cases, the result of term expansion can be
	      multiplicative, and we want to avoid using excessive memory. The
	      default of 100 000 should be both high enough in most cases  and
	      compatible with current typical hardware configurations.

       nonumbers = 0/1
	      If this set to true, no terms will be generated for numbers. For
	      example  "123",  "1.5e6",	 192.168.1.4,  would  not  be  indexed
	      ("value123" would still be). Numbers are often quite interesting
	      to search for, and this should probably not be  set  except  for
	      special  situations,  ie, scientific documents with huge amounts
	      of numbers in them. This can only be set for a whole index,  not
	      for a subtree.

       nocjk = boolean
	      If  this	set to true, specific east asian (Chinese Korean Japa‐
	      nese) characters/word splitting is turned off. This will save  a
	      small  amount of cpu if you have no CJK documents. If your docu‐
	      ment base does include such text but you are not	interested  in
	      searching	 it, setting nocjk may be a significant time and space
	      saver.

       cjkngramlen = value
	      This lets you adjust the size of n-grams used for	 indexing  CJK
	      text.  The  default  value  of 2 is probably appropriate in most
	      cases. A value of 3 would allow more precision and efficiency on
	      longer  words,  but  the	index  will  be approximately twice as
	      large.

       indexstemminglanguages = languages
	      A list of languages for which the stem expansion databases  will
	      be built. See recollindex(1) for possible values.

       defaultcharset = charset
	      The name of the character set used for files that do not contain
	      a character set definition (ie: plain text files). This  can  be
	      redefined for any subdirectory.

       unac_except_trans = list of utf-8 groups
	      This  is a list of characters, encoded in UTF-8, which should be
	      handled specially when converting text to unaccented  lowercase.
	      For  example, in Swedish, the letter "a with diaeresis" has full
	      alphabet citizenship and should not be turned into an a.
	      Each element in the space-separated list has the special charac‐
	      ter as first element and the translation following. The handling
	      of both the lowercase and upper-case  versions  of  a  character
	      should  be  specified, as appartenance to the list will turn-off
	      both standard accent and case processing.
	      Note that the translation is not limited to a single character.
	      This parameter cannot be redefined  for  subdirectories,	it  is
	      global,  because	there is no way to do otherwise when querying.
	      If you have document sets which would need different values, you
	      will have to index and query them separately.

       maildefcharset = charactersetname
	      This  can	 be  used to define the default character set specifi‐
	      cally for email messages which don't specify it. This is	mainly
	      useful  for  readpst  (libpst) dumps, which are utf-8 but do not
	      say so.

       localfields = fieldname = value:...
	      This allows setting fields  for  all  documents  under  a	 given
	      directory.  Typical usage would be to set an "rclaptg" field, to
	      be used in mimeview to select  a	specific  viewer.  If  several
	      fields  are  to  be  set,	 they should be separated with a colon
	      (':') character (which there is currently no way to escape). Ie:
	      localfields=  rclaptg=gnus:other	=  val,	 then select specifier
	      viewer with mimetype|tag=... in mimeview.

       dbdir = directory
	      The name of the Xapian database directory. It will be created if
	      needed when the database is initialized. If this is not an abso‐
	      lute pathname, it will be taken relative	to  the	 configuration
	      directory.

       idxstatusfile = file path
	      The  name	 of the scratch file where the indexer process updates
	      its status.  Default:  idxstatus.txt  inside  the	 configuration
	      directory.

       maxfsoccuppc = percentnumber
	      Maximum  file  system  occupation	 before	 we stop indexing. The
	      value is a percentage, corresponding to what the	"Capacity"  df
	      output  column shows.  The default value is 0, meaning no check‐
	      ing.

       mboxcachedir = directory path
	      The directory where mbox message offsets cache files  are	 held.
	      This is normally $RECOLL_CONFDIR/mboxcache, but it may be useful
	      to share a directory between different configurations.

       mboxcacheminmbs = value in megabytes
	      The minimum mbox file size over  which  we  cache	 the  offsets.
	      There is really no sense in caching offsets for small files. The
	      default is 5 MB.

       webcachedir = directory path
	      This is only used by the	Beagle	web  browser  plugin  indexing
	      code,  and  defines where the cache for visited pages will live.
	      Default: $RECOLL_CONFDIR/webcache

       webcachemaxmbs = value in megabytes
	      This is only used by the	Beagle	web  browser  plugin  indexing
	      code,  and  defines  the	maximum	 size  for the web page cache.
	      Default: 40 MB.

       idxflushmb = megabytes
	      Threshold (megabytes of new text data) where we flush from  mem‐
	      ory to disk index. Setting this can help control memory usage. A
	      value of 0 means no explicit flushing, letting  Xapian  use  its
	      own  default,  which  is	flushing  every	 10000	documents  (or
	      XAPIAN_FLUSH_THRESHOLD), meaning that memory  usage  depends  on
	      average document size. The default value is 10.

       autodiacsens = 0/1
	      IF the index is not stripped, decide if we automatically trigger
	      diacritics sensitivity if the search term has  accented  charac‐
	      ters  (not in unac_except_trans). Else you need to use the query
	      language and the D modifier to specify  diacritics  sensitivity.
	      Default is no.

       autocasesens = 0/1
	      IF the index is not stripped, decide if we automatically trigger
	      character case sensitivity if the	 search	 term  has  upper-case
	      characters  in  any but the first position. Else you need to use
	      the query language and the C modifier to specify	character-case
	      sensitivity. Default is yes.

       loglevel = value
	      Verbosity	 level	for recoll and recollindex. A value of 4 lists
	      quite a lot of debug/information messages. 3 lists only  errors.
	      daemloglevel  can	 be  used to specify a different value for the
	      real-time indexing daemon.

       logfilename = file
	      Where should the messages go. 'stderr' can be used as a  special
	      value.  daemlogfilename can be used to specify a different value
	      for the real-time indexing daemon.

       mondelaypatterns = list of patterns
	      This allows  specify  wildcard  path  patterns  (processed  with
	      fnmatch(3)  with	0 flag), to match files which change too often
	      and for which a delay should  be	observed  before  re-indexing.
	      This is a space-separated list, each entry being a pattern and a
	      time in seconds, separated by a colon. You can use double quotes
	      if a path entry contains white space. Example:

	      mondelaypatterns = *.log:20 "this one has spaces*:10"

       monixinterval = value in seconds
	      Minimum  interval	 (seconds)  for processing the indexing queue.
	      The real time monitor does not process each event when it	 comes
	      in,  but	will  wait  this  time	for the queue to accumulate to
	      diminish overhead and in order to aggregate multiple  events  to
	      the same file. Default 30 S.

       monauxinterval = value in seconds
	      Period (in seconds) at which the real time monitor will regener‐
	      ate the auxiliary databases (spelling, stemming) if needed.  The
	      default is one hour.

       monioniceclass, monioniceclassdata
	      These  allow  defining  the  ionice  class  and data used by the
	      indexer (default class 3, no data).

       filtermaxseconds = value in seconds
	      Maximum filter execution time, after which it is	aborted.  Some
	      postscript programs just loop...

       filtersdir = directory
	      A	 directory  to	search for the external filter scripts used to
	      index some types of files. The  value  should  not  be  changed,
	      except  if  you  want  to modify one of the default scripts. The
	      value can be redefined for any subdirectory.

       iconsdir = directory
	      The name of the directory where recoll  result  list  icons  are
	      stored. You can change this if you want different images.

       idxabsmlen = value
	      Recoll stores an abstract for each indexed file inside the data‐
	      base. The text can come from an actual 'abstract' section in the
	      document	or  will  just be the beginning of the document. It is
	      stored in the index so that  it  can  be	displayed  inside  the
	      result  lists without decoding the original file. The idxabsmlen
	      parameter defines the size of the stored abstract.  The  default
	      value  is	 250 bytes.  The search interface gives you the choice
	      to display this stored text or a	synthetic  abstract  built  by
	      extracting  text	around	the search terms. If you always prefer
	      the synthetic abstract, you can reduce this  value  and  save  a
	      little space.

       aspellLanguage = lang
	      Language definitions to use when creating the aspell dictionary.
	      The value must match a set of aspell language definition	files.
	      You  can	type  "aspell config" to see where these are installed
	      (look for data-dir). The default if the variable is not  set  is
	      to  use  your desktop national language environment to guess the
	      value.

       noaspell = boolean
	      If this is set, the aspell dictionary generation is turned  off.
	      Useful  for cases where you don't need the functionality or when
	      it is unusable because aspell crashes during dictionary  genera‐
	      tion.

       mhmboxquirks = flags
	      This  allows  definining location-related quirks for the mailbox
	      handler. Currently only the tbird flag is defined, and it should
	      be  set  for  directories	 which hold Thunderbird data, as their
	      folder format is weird.

SEE ALSO
       recollindex(1) recoll(1)

			       14 November 2012			RECOLL.CONF(5)
[top]

List of man pages available for DragonFly

Copyright (c) for man pages and the logo by the respective OS vendor.

For those who want to learn more, the polarhome community provides shell access and support.

[legal] [privacy] [GNU] [policy] [cookies] [netiquette] [sponsors] [FAQ]
Tweet
Polarhome, production since 1999.
Member of Polarhome portal.
Based on Fawad Halim's script.
....................................................................
Vote for polarhome
Free Shell Accounts :: the biggest list on the net