mifluz(3)							     mifluz(3)

NAME
       mifluz - C++ library to use and manage inverted indexes

SYNOPSIS
       #include <mifluz.h>

       int main()
       {
	  Configuration* config = WordContext::Initialize();

	  WordList* words = new WordList(*config);

	  ...

	  delete words;

	  WordContext::Finish();
       }

DESCRIPTION
       The  purpose of mifluz is to provide a C++ library to build and query a
       full text inverted index. It is dynamically updatable, scalable (up  to
       1Tb  indexes),  uses  a controlled amount of memory, shares index files
       and memory cache among processes or threads and compresses index	 files
       to  50%	of the raw data. The structure of the index is configurable at
       runtime and allows inclusion  of	 relevance  ranking  information.  The
       query  functions	 do  not  require  loading  all	 the  occurrences of a
       searched term.  They consume very few resources and many	 searches  can
       be run in parallel.

       The  file  management  library used in mifluz is a modified Berkeley DB
       (www.sleepycat.com) version 3.1.14.

CLASSES AND COMMANDS
       Configuration

	      reads the configuration file and manages it in memory.

       WordContext

              reads the configuration and sets up the mifluz context.

       WordCursor

	      abstract class to search and  retrieve  entries  in  a  WordList
	      object.

       WordCursorOne

	      search and retrieve entries in a WordListOne object.

       WordDBInfo
	      inverted index usage environment.

       WordDict

	      manage and use an inverted index dictionary.

       WordKey
	      inverted index key.

       WordKeyInfo
	      information on the key structure of the inverted index.

       WordList

	      abstract class to manage and use an inverted index file.

       WordListOne

	      manage and use an inverted index file.

       WordMonitor
	      monitoring classes activity.

       WordRecord
	      inverted index record.

       WordRecordInfo
	      information on the record structure of the inverted index.

       WordReference
	      inverted index occurrence.

       WordType
              defines a word in terms of allowed characters, length etc.

       htdb_dump

	      dump the content of an inverted index in Berkeley DB fashion

       htdb_load

              load the content of an inverted index in Berkeley DB fashion.

       htdb_stat

	      displays statistics for Berkeley DB environments.

       mifluzdict

              dump the dictionary of an inverted index.

       mifluzdump

	      dump the content of an inverted index.

       mifluzload

	      load the content of an inverted index.

       mifluzsearch
	      search the content of an inverted index.

CONFIGURATION
       The format of the configuration file read by WordContext::Initialize
       is:
       keyword: value
       Comments may be added on lines starting with a #. The default
       configuration file is read from the file pointed to by the
       MIFLUZ_CONFIG environment variable, or ~/.mifluz, or /etc/mifluz.conf,
       in this order. If no configuration file is available, built-in
       defaults are used. Here is an example configuration file:
       wordlist_extend: true
       wordlist_cache_size: 10485760
       wordlist_page_size: 32768
       wordlist_compress: 1
       wordlist_wordrecord_description: NONE
       wordlist_wordkey_description: Word/DocID 32/Flags 8/Location 16
       wordlist_monitor: true
       wordlist_monitor_period: 30
       wordlist_monitor_output: monitor.out,rrd
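
       The keyword: value format shown above can be read with a few lines
       of standard C++. The following is a minimal sketch, not the parser
       used by the Configuration class: parse_config and trim are
       hypothetical names, and treating indented # lines as comments is an
       assumption.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper, not part of mifluz: strips leading and trailing blanks.
static std::string trim(const std::string& s) {
    const char* blanks = " \t\r\n";
    std::string::size_type first = s.find_first_not_of(blanks);
    if (first == std::string::npos) return "";
    std::string::size_type last = s.find_last_not_of(blanks);
    return s.substr(first, last - first + 1);
}

// Hypothetical parser for the "keyword: value" format.
// Blank lines and lines whose first non-blank character is '#' are skipped.
std::map<std::string, std::string> parse_config(std::istream& in) {
    std::map<std::string, std::string> values;
    std::string line;
    while (std::getline(in, line)) {
        std::string stripped = trim(line);
        if (stripped.empty() || stripped[0] == '#') continue;
        std::string::size_type colon = stripped.find(':');
        if (colon == std::string::npos) continue;  // not a keyword: value line
        // Split at the first colon only, so values may contain colons.
        values[trim(stripped.substr(0, colon))] = trim(stripped.substr(colon + 1));
    }
    return values;
}
```

       Passing a std::ifstream opened on the configuration file, or a
       std::istringstream holding the example above, yields a map from
       each keyword to its value as a string.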

       wordlist_allow_numbers {true|false} (default false)
              A digit is considered a valid character within a word if this
              configuration parameter is set to true; otherwise it is an
              error to insert a word containing digits. See the Normalize
              method for more information.

       wordlist_cache_inserts {true|false} (default false)
              If true, all Insert calls are cached in memory. When the
              WordList object is closed or a different access method is
              called, the cached entries are flushed to the inverted index.

       wordlist_cache_max <bytes> (default 0)
	      Maximum  size  of the cumulated cache files generated when doing
	      bulk insertion with the BatchStart() function. When  this	 limit
	      is  reached,  the	 cache	files are all merged into the inverted
	      index.   The  value  0  means  infinite	size   allowed.	   See
	      WordList(3) for the rationale behind cache file handling.

       wordlist_cache_size <bytes> (default 500K)
              Berkeley DB cache size (see Berkeley DB documentation). The
              cache makes a huge difference in performance. It must be at
              least 2% of the expected total data size. Note that if
              compression is activated the data size is eight times larger
              than the actual file size. In this case the cache must be
              scaled to 2% of the data size, not 2% of the file size. See
              Cache tuning in the mifluz guide for more hints. See
              WordList(3) for the rationale behind cache file handling.
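
              The sizing rule above reduces to simple arithmetic. A small
              sketch follows; min_cache_size is a hypothetical helper, not
              a mifluz function, and it takes the factor of eight stated
              above at face value.

```cpp
#include <cstdint>

// Hypothetical helper: minimum cache size in bytes under the 2% rule.
// file_size_bytes is the on-disk index size. When compressed is true the
// data size is taken to be eight times the file size, as stated above.
std::uint64_t min_cache_size(std::uint64_t file_size_bytes, bool compressed) {
    const std::uint64_t data_size =
        compressed ? file_size_bytes * 8 : file_size_bytes;
    return data_size / 50;  // 2% of the data size, rounded down
}
```

              For a compressed index file of 1,000,000,000 bytes this
              gives 160,000,000 bytes, against 20,000,000 bytes for an
              uncompressed file of the same size.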

       wordlist_compress {true|false} (default false)
	      Activate compression of the index. The resulting index is	 eight
	      times smaller than the uncompressed index.

       wordlist_env_dir <directory> (default .)
	      Only  valid  if  wordlist_env_share  set	to  true.  Specify the
	      directory in which the sharable environment will be created. All
	      inverted	indexes specified with a non-absolute pathname will be
	      created relative to this directory.

       wordlist_env_share {true|false} (default false)
              If true, a sharable environment is opened, or created if none
              exists.

       wordlist_env_skip {true|false} (default false)
	      If true no environment is created at all.	 This  must  never  be
	      used  if	a WordList object is created. It may be useful if only
	      WordKey objects are used, for instance.

       wordlist_extend {true|false} (default false)
              If true, maintain a reference count of unique words. The
              Noccurrence method gives access to this count.

       wordlist_locale <locale> (default C)
              Set the locale of the program to locale. See setlocale(3) for
              more information.

       wordlist_lowercase {true|false} (default true)
              If a word contains upper case letters, it is converted to
              lowercase if this configuration parameter is true; otherwise
              it is left untouched.

       wordlist_maximum_word_length <number> (default 25)
	      The maximum length of a word.  See the Normalize method for more
	      information.

       wordlist_minimum_word_length <number> (default 3)
	      The minimum length of a word.  See the Normalize method for more
	      information.

       wordlist_monitor {true|false} (default false)
	      If  true	create a WordMonitor instance to gather statistics and
	      build reports.

       wordlist_monitor_output <file>[,{rrd|readable}] (default stderr)
              Print reports to file instead of the default stderr. If type
              is set to rrd the output is fit for the benchmark-report
              script. Otherwise it is a (hardly :-) readable string.

       wordlist_monitor_period <sec> (default 0)
	      If the value sec is a positive integer, set  a  timer  to	 print
	      reports  every sec seconds. The timer is set using the ALRM sig‐
	      nal and will fail if the calling application already has a  han‐
	      dler on that signal.
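
              An application can check for an existing ALRM handler before
              enabling the periodic report. The sketch below uses the POSIX
              sigaction call to query the current disposition without
              changing it. has_alarm_handler is a hypothetical helper, and
              how mifluz itself detects the conflict is not described here.

```cpp
#include <signal.h>

// Returns true if the process already has a handler installed for SIGALRM.
// Returns false if the disposition is default or ignore, or if the query fails.
bool has_alarm_handler() {
    struct sigaction current;
    // A null new action leaves the disposition unchanged and only reads it.
    if (sigaction(SIGALRM, nullptr, &current) != 0) return false;
    if (current.sa_flags & SA_SIGINFO) return true;  // sa_sigaction is in use
    return current.sa_handler != SIG_DFL && current.sa_handler != SIG_IGN;
}
```

              If this returns true, leaving wordlist_monitor_period at 0
              avoids competing for the signal.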

       wordlist_page_size <bytes> (default 8192)
              Berkeley DB page size (see Berkeley DB documentation).

       wordlist_truncate {true|false} (default true)
              If a word is too long according to
              wordlist_maximum_word_length, it is truncated if this
              configuration parameter is true; otherwise it is considered
              an invalid word.

       wordlist_valid_punctuation [characters] (default none)
	      A	 list  of  punctuation	characters  that may appear in a word.
	      These characters will be removed from the word before  insertion
	      in the index.
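
              The word-related parameters (wordlist_allow_numbers,
              wordlist_lowercase, the minimum and maximum word lengths,
              wordlist_truncate and wordlist_valid_punctuation) interact
              when a word is prepared for insertion. The sketch below shows
              one way they could combine. It is not the library's Normalize
              method: WordRules and normalize_word are hypothetical names,
              and the order of the steps is an assumption.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical settings mirroring the wordlist_* parameters and their defaults.
struct WordRules {
    bool allow_numbers;
    bool lowercase;
    bool truncate;
    std::size_t minimum_length;
    std::size_t maximum_length;
    std::string valid_punctuation;
    WordRules()
        : allow_numbers(false), lowercase(true), truncate(true),
          minimum_length(3), maximum_length(25) {}
};

// Writes the normalized word to out and returns true, or returns false
// (leaving out untouched) if the word is rejected under the rules.
bool normalize_word(const std::string& word, const WordRules& rules,
                    std::string& out) {
    std::string result;
    for (std::string::size_type i = 0; i < word.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(word[i]);
        // Permitted punctuation is removed before insertion.
        if (rules.valid_punctuation.find(word[i]) != std::string::npos) continue;
        // Digits are an error unless explicitly allowed.
        if (std::isdigit(c) && !rules.allow_numbers) return false;
        // Any other character is kept as-is in this sketch.
        result += rules.lowercase ? static_cast<char>(std::tolower(c)) : word[i];
    }
    if (result.size() > rules.maximum_length) {
        if (!rules.truncate) return false;  // too long and truncation is off
        result.resize(rules.maximum_length);
    }
    if (result.size() < rules.minimum_length) return false;  // too short
    out = result;
    return true;
}
```

              With the defaults, "Hello" becomes "hello", while "ab" (too
              short) and "abc123" (contains digits) are rejected.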

       wordlist_verbose <number> (default 0)
	      Set the verbosity level of the WordList class.

	      1 walk logic

	      2 walk logic details

	      3 walk logic lots of details

       wordlist_wordkey_description <desc> (no default)
	      Describe	the  structure of the inverted index key.  In the fol‐
	      lowing explanation of the <desc> format, mandatory words are  in
	      bold and values that must be replaced in italic.

	      Word bits/name bits [/...]

	      The  name	 is an alphanumerical symbolic name for the key field.
	      The bits is the number of bits required  to  store  this	field.
	      Note  that  all values are stored in unsigned integers (unsigned
	      int).  Example:
	      Word 8/Document 16/Location 8
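
              A description string of this form splits naturally on the /
              character into name and bit-count pairs. The following sketch
              illustrates the layout only and is not the parser used by
              WordKeyInfo. KeyField and parse_key_description are
              hypothetical names, and recording 0 for a field written
              without a bit count is an assumption.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One field of the key: its symbolic name and its width in bits.
struct KeyField {
    std::string name;
    unsigned bits;  // 0 when the description gives no bit count
};

// Splits "Name bits/Name bits/..." into a list of fields.
std::vector<KeyField> parse_key_description(const std::string& desc) {
    std::vector<KeyField> fields;
    std::istringstream parts(desc);
    std::string part;
    while (std::getline(parts, part, '/')) {
        std::istringstream item(part);
        KeyField field;
        field.bits = 0;
        if (!(item >> field.name)) continue;  // skip empty segments
        item >> field.bits;                   // stays 0 if no number follows
        fields.push_back(field);
    }
    return fields;
}
```

              Applied to the example above, this yields three fields: Word
              (8 bits), Document (16 bits) and Location (8 bits).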

       wordlist_wordkey_document [field ...] (default none)
	      A white space separated list of field numbers that define a doc‐
	      ument.   The  field  number  list	 must  not  contain  gaps. For
	      instance 1 2 3 is valid but 1 3 4 is not valid.  This configura‐
	      tion parameter is not used by the mifluz library but may be used
	      by a query application to define the semantic of a document.  In
	      response	to  a  query,  the  application	 will return a list of
	      results in which only distinct documents will be shown.
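
              Since the library does not use this parameter, a query
              application has to validate the list itself. A minimal sketch
              of the no-gaps rule follows. is_contiguous_field_list is a
              hypothetical helper; it assumes the numbers are listed in
              ascending order and treats an empty list as invalid.

```cpp
#include <sstream>
#include <string>

// Returns true if the list is non-empty, contains only integers, and each
// number is exactly one more than the one before it.
bool is_contiguous_field_list(const std::string& list) {
    std::istringstream in(list);
    int previous = 0;
    int current = 0;
    if (!(in >> previous)) return false;  // empty or non-numeric list
    while (in >> current) {
        if (current != previous + 1) return false;  // gap or out of order
        previous = current;
    }
    return in.eof();  // false if a non-numeric token stopped the scan
}
```

              The list 1 2 3 passes this check and 1 3 4 does not.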

       wordlist_wordkey_location <field> (default none)
	      A single field number that contains the position of a word in  a
	      given document.  This configuration parameter is not used by the
	      mifluz library but may be used by a query application.

       wordlist_wordrecord_description {NONE|DATA|STR} (no default)
	      NONE: the record is empty

	      DATA: the record contains an integer (unsigned int)

	      STR: the record contains a string (String)

ENVIRONMENT
       MIFLUZ_CONFIG
              File name of the configuration file read by WordContext(3).
              Defaults to ~/.mifluz or /usr/etc/mifluz.conf.
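
       The lookup order (MIFLUZ_CONFIG first, then ~/.mifluz, then a
       system-wide file) can be sketched as follows. This is an
       illustration, not the code in WordContext. config_candidates is a
       hypothetical helper, resolving ~ through the HOME environment
       variable is an assumption, and the system-wide path is passed in
       by the caller because this page names both /etc/mifluz.conf and
       /usr/etc/mifluz.conf.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Returns the configuration file candidates in the order they would be tried.
// system_path is the system-wide fallback chosen by the caller.
std::vector<std::string> config_candidates(const std::string& system_path) {
    std::vector<std::string> paths;
    if (const char* env = std::getenv("MIFLUZ_CONFIG")) {
        paths.push_back(env);  // explicit override comes first
    }
    if (const char* home = std::getenv("HOME")) {
        paths.push_back(std::string(home) + "/.mifluz");  // per-user file
    }
    paths.push_back(system_path);  // system-wide file comes last
    return paths;
}
```

       A caller would open the first candidate that exists and fall back
       to the built-in defaults if none does.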

AUTHORS
       Loic Dachary loic@gnu.org

       The Ht://Dig group http://dev.htdig.org/

SEE ALSO
       htdb_dump(1), htdb_stat(1), htdb_load(1), mifluzdump(1), mifluzload(1),
       mifluzsearch(1),	 mifluzdict(1),	 WordContext(3),  WordList(3),	 Word‐
       Dict(3), WordListOne(3), WordKey(3), WordKeyInfo(3), WordType(3), Word‐
       DBInfo(3), WordRecordInfo(3), WordRecord(3), WordReference(3), WordCur‐
       sor(3), WordCursorOne(3), WordMonitor(3), Configuration(3)

				     local			     mifluz(3)