SWISH-LIBRARY man page on DragonFly

Man page or keyword search:  
man Server   44335 pages
apropos Keyword Search (all sections)
Output format
DragonFly logo
[printable version]

SWISH-LIBRARY(1)	     SWISH-E Documentation	      SWISH-LIBRARY(1)

NAME
       SWISH-LIBRARY - Interface to the Swish-e C library

OVERVIEW
       The C library in an interface to the Swish-e search code.  It provides
       a way to embed Swish-e into your applications.  This API is based on
       Swish-e version 2.3.

       Note: This is a NEW API as of Swish-e version 2.3.  The C language
       interface has changed as has the perl interface to Swish-e.  The new
       Perl interface is the SWISH::API module and is included with the Swish-
       e distribution.	The old SWISHE perl module has been rewritten to work
       with the new API.  The SWISHE perl module is no longer included with
       the Swish-e distribution, but can be downloaded from the Swish-e web
       site.

       The advantage of the library is that the index files or files can be
       opened one time and many queries made on the open index.	 This saves
       the startup time required to fork and run the swish-e binary, and the
       expensive time of opening up the index file.  Some benchmarks have
       shown a three fold increase in speed.

       The downside is that your program now has more code and data in it (the
       index tables can use quite a bit of memory), and if a fatal error hap‐
       pens in swish it will bring down your program.  These are things to
       think about, especially if embedding swish into a web server such as
       Apache where there are many processes serving requests.

       The best way to learn about the library is to look at two files
       included with the Swish-e distribution that make use of the library.

       src/libtest.c
	   This file gives a basic overview of linking a C program with the
	   Swish-e library.  Not all available functions are used in that
	   example, but it should give you a good overview of building a C
	   program with swish-e.

	   To build and run libtest chdir to the src directory and run the
	   commands:

	       $ make libtest
	       $ ./libtest [optional name of index file]

	   You will be prompted for the search words.  The default index used
	   is index.swish-e.  This can be overridden by placing a list of
	   index files in a quote-protected string.

	       $ ./libtest 'index1 index2 index3'

       perl/API.xs
	   The API.xs file is a Perl "xsub" interface to the C library and is
	   part of the SWISH::API Perl module.	This is an object-oriented
	   interface to the Swish-e library and demonstrates how the various
	   search "objects" are created by C calls and how they are destroyed
	   when no longer needed.

Installing the Swish-e library
       The Swish-e library is installed when you run "make install" when
       building Swish-e.  No extra installation steps are required.

       The library consists of a header file "swish-e.h" and a library "lib‐
       swish-e.*" that can either be a static or shared library depending on
       your platform.

Library Overview
       When you first attach to an index file (or index files) you are
       returned a "swish handle".  From the handle you create one or more
       "search objects" which holds the parameters to query the index, such as
       the query string, sort order, search phrase delimiter, limit parameters
       and HTML structure bits.	 The "object" is really just a pointer to a C
       structure, but it's helpful to think of it as an object that data and
       functionality associated with it.

       The search object is used to query the index.  A query returns a
       "results object".  The results object holds the number of hits, the
       parsed query per index, and the result set.  The results object keeps
       track of the current position in the result set.	 You may "seek" to a
       specific record within the result set (useful for displaying a page of
       results).

       Finally, a result object represents a single result from the result
       list.  A result object provides access to the result's properties (such
       as file name, rank, etc.).

       In addition to results, there are functions available to access the
       header values stored in the index file, functions to check and report
       errors, and a few utility functions.

Available Functions
       Below is the list of available function included in the Swish-e C lan‐
       guage API.

       These functions (and typedefs) are defined in the swish-e.h header
       file.  The common objects (e.g. structures) used are:

	   SW_HANDLE  - swish handle that associates with an index file
	   SW_SEARCH  - search "object" that holds search parameters
	   SW_RESULTS - results "object" that holds a result set
	   SW_RESULT  - a single result used for accessing the result's properties
	   SW_FUZZYWORD - used for fuzzy (stemming) word conversion

       Searching

       SW_HANDLE SwishInit(char *IndexFiles);
	   This functions opens and reads the header info of the index files
	   included in IndexFiles string.  The string should contain a space-
	   separated list of index files.

	       SW_HANDLE myhandle;
	       myhandle = SwishInit("file1.idx");

	   Typically you will open a handle at the beginning of your program
	   and use it to make multiple queries on an index.

	   This function will always return a swish handle.  You must check
	   for errors, and on error free the memory used by the handle, or
	   abort.

	   Here's an example of aborting:

	       SW_HANDLE swish_handle;
	       swish_handle = SwishInit("file1.idx file2.idx");
	       if ( SwishError( swish_handle ) )
		   SwishAbortLastError( swish_handle );

	   And here's an example of catching the error:

	       SW_HANDLE swish_handle;
	       swish_handle = SwishInit("file1.idx file2.idx");
	       if ( SwishError( swish_handle ) )
	       {
		   printf("Failed to connect to swish. %s\n", SwishErrorString( swish_handle ) );
		   SwishClose( swish_handle );	/* free the memory used */
		   return 0;
	       }

	   You may have more than one handle active at a time.

	   Swish-e will not tell you if the index file changes on disk (such
	   as after reindexing).  In a persistent environment (e.g. mod_perl)
	   the calling program should check to see if the index file has
	   changed on disk.  A common way to do this is to store the inode
	   number before opening the index file(s), and then stat the file
	   name every so often and reopen the index files if the inode number
	   changes.

       void SwishClose(SW_HANDLE handle);
	   This function closes and frees the memory of a Swish handle.	 Every
	   swish handle should be freed when done searching the index.	Fail‐
	   ing to close the handle will result in a memory leak.

       SW_SEARCH New_Search_Object(SW_HANDLE handle, const char *query);
	   Returns a new search "object".  The search object holds the parame‐
	   ters used for searching an index.  A single search object can be
	   used to query the index multiple times.  The available settings
	   listed below are "sticky" in that they remain set on the search
	   object until change.

       int SwishGetStructure( SW_SEARCH srch );
	   Returns the "structure" flag of the search object passed or 0 if
	   the search object is NULL.

       void SwishPhraseDelimiter( SW_SEARCH srch, char delimiter );
	   Sets the phrase delimiter character.	 The default is double-quotes.

       char SwishGetPhraseDelimiter( SW_SEARCH srch );
	   Returns the phrase delimiter character used in the search object or
	   0 if the search object is NULL.

       void SwishSetStructure( SW_SEARCH srch, int structure );
	   Sets the "structure" flag in the search object.  The structure flag
	   is used to limit searches to parts of HTML files (such as to the
	   title or headers).  The default is to not limit.  This provides the
	   functionality of the -H command line switch.

       void SwishPhraseDelimiter( SW_SEARCH srch, char delimiter );
	   Sets the phrase delimiter character.	 The default is double-quotes.

       void SwishSetSort( SW_SEARCH srch, char *sort );
	   Sets the sort order of the results.	This is the same as the -s
	   switch used with the swish-e binary.

       void SwishSetQuery( SW_SEARCH srch, char *query );
	   Sets the query string in the search object.	This typically is not
	   needed since it can be set when creating the search object or when
	   executing a query.

       void SwishSetSearchLimit( SW_SEARCH srch, char *propertyname, char
       *low, char *hi);
	   Sets the limit parameters for a search.  Provides the same func‐
	   tionality as the -L command line switch.  You may specify a range
	   of property values that search results must be within.  You may
	   call SwishSetSearchLimit() only one time for each property (but can
	   set limits on more than one property at a time).

	   Unlike the other settings on the search object, once you run a
	   query on the search object you must call SwishResetSearchLimit() to
	   change or clear the limit parameters.

       void SwishResetSearchLimit( SW_SEARCH srch );
	   Resets the limits set on a search object set by SwishSetSearch‐
	   Limit().

       void Free_Search_Object( SW_SEARCH srch );
	   Frees the search object.  This must be called when done with the
	   search object.  Generally, you can reuse a search object for multi‐
	   ple queries so typically you would call this right before calling
	   SwishClose().

	   You may free the search object before freeing and generated results
	   objects.

       SW_RESULTS SwishExecute( SW_SEARCH search, const char *query);
	   Searches the index or indexes based on the parameters in the search
	   object.  Returns a results object.  See below for functions to
	   access the data stored in the results object.

	   You should always check for errors after calling SwishExecute().

       SW_RESULTS SwishQuery(SW_HANDLE, const char *words );
	   This is a short-cut function that bypasses the creation of a search
	   object (actually, bypasses the need to create and free a search
	   object).  This only allows passing in a query string; other search
	   parameters cannot be set.  The results are sorted by rank.

	   You should always check for errors after calling SwishQuery().

       Reading Results

       int SwishHits( SW_RESULTS results );
	   Returns the number of results in the results object.

       SWISH_HEADER_VALUE SwishParsedWords( SW_RESULTS, const char *index_name
       );
	   Returns the tokenized query.	 Words are split by WordCharacters and
	   stopwords are removed.  The parsed words are useful for highlight‐
	   ing search terms in your program.

	   The "index_name" is the name of the index supplied in the
	   SwishInit() function call.

	   Returns a SWISH_HEADER_VALUE union of type SWISH_LIST which is a
	   char **.  See src/libtest.c for an example of accessing the strings
	   in this list, but in general you may cast this to a (char **).

       SWISH_HEADER_VALUE SwishRemovedStopwords( SW_RESULTS, const char
       *index_name );
	   Returns a list of stopwords removed from the input query.

	   Returns a SWISH_HEADER_VALUE union of type SWISH_LIST which is a
	   char **.  See src/libtest.c for an example of accessing the strings
	   in this list, but in general you may cast this to a (char **).

       int SwishSeekResult( SW_RESULTS, int position );
	   Sets the current seek position in the list of results, with posi‐
	   tion zero being the first record (unlike -b where one is the first
	   result).

	   Returns the position or a negative number on error.

       SW_RESULT SwishNextResult( SW_RESULTS );
	   Returns the next result, or NULL if not more results are available.

	   The result object returned does not need to be freed after use
	   (unlike the swish handle, search object, and results object).

       const char *SwishResultPropertyStr(SW_RESULT, char *propertyname);
	   This function is mostly useful for testing as it returns odd
	   results on errors.

	   Aborts if called with a NULL SW_RESULT object

	   Returns a string value of the specified property.

	   Returns the empty string "" if the current result does not have the
	   specified property assigned.

	   Returns the string "(null)" on invalid property name (i.e. property
	   name is not defined in the index) and sets an error (see below)
	   indicating the invalid property name.

	   The string returned does not need to be freed, but is only valid
	   for the current result.  If you wish to save the string you must
	   copy it locally.

	   Dates are formatted using the hard-coded format string: "%Y-%m-%d
	   %H:%M:%S" in localtime.

       unsigned long SwishResultPropertyULong(SW_RESULT r, char *property‐
       name);
	   Returns a numeric property as an unsigned long.  Numeric properties
	   are used for both PropertyNamesNumeric and PropertyNamesDate type
	   of properties.  Dates are returned as a unix timestamp as reported
	   by the system when the index was created.

	   Swish-e will abort if called with a NULL SW_RESULT object.  Without
	   the SW_RESULT object swish-e cannot set any error codes.

	   On error returns UMAX_LONG.	This is commonly defined in limits.h.
	   Check SwishError() (see below) for the type of error.

	   If SwishError() returns false (zero) then it simply means that this
	   result does not have any data for the specified property.

	   If SwishError() returns true (non-zero) then either the property‐
	   name specified is invalid, or the property requested is not a
	   numeric (or date) property (e.g. it's a string property).

	   See below on how to fetch the specific error message when SwishEr‐
	   ror() is true.

       PropValue *getResultPropValue (SW_RESULT r, char *propertyname, int ID
       );
	   This is a low-level function to fetch a property regardless of
	   type.  This is likely the best function for accessing properties.

	   Swish-e will abort if called with a NULL SW_RESULT object.  Proper‐
	   tyname is the name of the property.	ID is the id number of the
	   property, if known.	ID is not normally used in the API, but it's
	   purpose is to avoid looking up the property ID for every result
	   displayed.

	   The return PropValue is a structure that contains a flag to indi‐
	   cate the type, and a union that holds the property value.  They
	   flags and structure are defined in swish-e.h.

	   The property must be copied locally and the returned "PropValue"
	   value must be freed by calling freeResultPropValue() to avoid a
	   memory leak.

	   On error returns NULL.  Check SwishError() (see below) for the type
	   of error.

	   If returns NULL but SwishError() returns false (zero) then it sim‐
	   ply means that this result does not have any data for the specified
	   property.

	   If SwishError() returns true (non-zero) then the property name
	   specified is invalid (i.e. not defined for the index).

	   See below on how to fetch the specific error message when SwishEr‐
	   ror() is true.

	   See perl/API.xs for an example on using this function.

       void freeResultPropValue(void)
	   Frees the "PropValue" returned after calling getResultPropValue().

       void Free_Results_Object( SW_RESULTS results );
	   Frees the results object (frees the result set).  This must be
	   called when done reading the results and before calling Swish‐
	   Close().

       Accessing the Index Header Values

       Each index file has associated header values that describe the index.
       These functions provide access to this data.  The header data is
       returned as a union SWISH_HEADER_VALUE, and a pointer to a
       SWISH_HEADER_TYPE is passed in and the returned value indicates the
       type of data that is returned.  See src/libtest.c and perl/API.xs for
       examples.

       const char **SwishHeaderNames( SW_HANDLE );
	   Returns the list of possible header names.  This list is the same
	   for all index files of a given version of Swish-e.  It provides a
	   way to gain access to all headers without having to list them in
	   your program.

       const char **SwishIndexNames( SW_HANDLE );
	   Returns a list of index files opened.  This is just the list of
	   index files specified in the SwishInit() call.  You need the name
	   of the index file to access a specific index's header values.

       SWISH_HEADER_VALUE SwishHeaderValue( SW_HANDLE, const char *index_name,
       const  char *cur_header, SWISH_HEADER_TYPE *type );
	   Fetches the header value for the given index file, and the header
	   name.  The call sets the "type" passed in to the type of value
	   returned.

	   See src/libtest.c and perl/API.xs for examples.

       SWISH_HEADER_VALUE SwishResultIndexValue( SW_RESULT, const char *name,
       SWISH_HEADER_TYPE *type );
	   This is like SwishHeaderValue() above, but instead of supplying an
	   index file name and a swish handle, supply a result object and the
	   header value is fetched from the result's related index file.

       Accessing Property Meta Data

       In addition to the pre-defined standard properties, you have the option
       of adding additional "meta" properties to be indexed and/or added to
       the list of properties returned with each result.  Consult the sections
       on the MetaNames and PropteryNames directives in the CONFIGURATION FILE
       for an explanation of how to do this.

       These functions provide access to the meta data stored in an index.
       You can use them to determine what meta/property information is avail‐
       able for an index including all the pre-defined standard properties.
       See libtest.c for an example.

       SWISH_META_LIST SwishMetaList( SW_HANDLE, const char *index_name );
	   Returns the list of meta entries for the given index file  as a
	   null-terminated array of SW_META objects.  Use the functions below
	   to extract specific fields from the SW_META structure.  Meta's are
	   distinct from properties.

       SWISH_META_LIST SwishPropertyList( SW_HANDLE, const char *index_name );
	   This function is the same as SwishMetaList() but it returns an
	   array of properties as opposed to meta objects.  Property
	   attributes can be extracted in the same was as meta objects using
	   the functions below.

       SWISH_META_LIST SwishResultMetaList( SW_RESULT );
	   This is like SwishMetaList() above but determines the index to use
	   from a result object.

       SWISH_META_LIST SwishResultPropertyList( SW_RESULT );
	   This is like SwishPropertyList() above but like SwishResultMetaL‐
	   ist() uses a result object instead of an index name.

       const char *SwishMetaName( SW_META );
	   Given a SW_META object returned by one of the above, this function
	   will return the meta/property's name.  You can use this name to
	   access a property's value for a given as described above.

       int SwishMetaType( SW_META );
	   Get the data type for the given meta/property. Known types are
	   listed in swish-e.h

       SwishMetaID( SW_META );
	   Get the internal ID number for the given meta/property.  These id's
	   are unique per index file but are not unique per results.

       Checking for Errors

       You should check for errors after all calls.  The last error is stored
       in the swish handle object, and is only valid until the next operation
       (which resets the error flags).

       Currently, some errors are flagged as "critical" errors.	 In these
       cases you should destroy (by calling the SwishClose() function ) the
       current swish handle.  If you have other objects in scope (e.g. a
       search object or results object) destroy those first.

       The types of errors that are critical can be seen in src/error.c.  Cur‐
       rently the list includes:

	   Could not open index file
	   Unknown index file format
	   Index file(s) is empty
	   Index file error
	   Invalid swish handle
	   Invalid results object

       int  SwishError( SW_HANDLE );
	   This returns true if an error condition exists.  It returns the
	   error number, which is a integer less than zero on error.  This
	   should be checked before calling any of the other error functions
	   below.

       const char *SwishErrorString( SW_HANDLE );
	   This returns a general text description of the current error.

       const char *SwishLastErrorMsg( SW_HANDLE );
	   In some cases this will return a string with specifics about the
	   current error.  For example, SwishErrorString() may return "Unknown
	   metaname", but SwishLastErrorMsg() will return a string with the
	   name of the unknown metaname.

       int  SwishCriticalError( SW_HANDLE );
	   Returns true if the current error condition is a critical error.
	   On critical errors you should free up any current objects and call
	   SwishClose() as swish may be in an unstable state.

       void SwishAbortLastError( SW_HANDLE );
	   This is a convenience  function that will format and print the last
	   error message, and then abort the program.

       void set_error_handle( FILE *where );
	   Sets where errors and warnings are printed (when printed by swish).
	   For historical reasons, when swish-e first starts up errors and
	   warnings are sent to stdout.

       void SwishErrorsToStderr( void );
	   A convenience method to send errors to stderr instead of stdout.

       Utility Functions

       const char *SwishWordsByLetter(SWISH * sw, char *indexname, char c);
	   Returns all the words in the index "indexname" that begin with the
	   letter passed in.  Returns NULL if the name of the index file is
	   invalid.

	   This fuction may change in the future since only 8-bit chars can
	   currently be used.

       char * SwsishStemWord( SW_HANDLE sw, char *in_word );
	   Deprecated

	   This can be used to convert a word to its stem.  It uses only the
	   original Porter Stemmer.

       SW_FUZZYWORD SwishFuzzyWord( SW_RESULT r, char *word );
	   Stems "word" based on the fuzzy mode selected during indexing.

	   The fuzzy mode used during indexing is stored in the index file.
	   Since each result is linked to a given index file this method
	   allows stemming a word based on it's index file.

	   One possible use for this is to highlight search terms in a docu‐
	   ment summary, which would be based on a given result.

	   The methods below can be used to access the data returned.  The
	   SW_FUZZYWORD object must be freed when done to avoid a memory leak.

       const char **SwishFuzzyWordList( SW_FUZZYWORD fw );
	   Returns a null terminated list of strings returned by the stemmer.
	   In most cases this will be a single string.

	   Here's an example:

	       SW_FYZZYWORD fuzzy_word = SwishFuzzyWord( result );
	       const char **word_list = SwishFuzzyWordList( fuzzy_word );
	       while ( *word_list )
	       {
		   printf("%s\n", *word_list );
		   word_list++;
	       }
	       SwishFuzzyWordFree( fuzzy_word );

	   If the stemmer does not convert the string (for example attempting
	   to stem numeric data) the word_list will contain the original word.
	   To tell if the stemmer actually stemmed the word check the return
	   value with SwishFuzzyWordError().

       int SwishFuzzyWordError( SW_FUZZYWORD fw );
	   This returns zero if the stemming operation was sucessfull, other‐
	   wise it returns a value indicating the reason the word was not
	   stemmed.  The return values are defined in the swish-e src/stem‐
	   mer.h file.

	   Not all stemmers set this value correctly.  But since SwishFuzzy‐
	   WordList() will return a valid string regardless of the return
	   value, you can often just ignore this setting.  That's what I do.

       int SwishFuzzyWordCount( SW_FUZZYWORD fw );
	   Returns the count of string in the word list available by calling
	   SwishFuzzyWordList().

	   This is normally just one, but in the case of DoubleMetaphone it
	   can be one or two (i.e. DoubleMetaphone can return one or two
	   strings).

       const char *SwishFuzzyMode( SW_RESULT r );
	   Returns the name of the stemmer used for the given result (which is
	   related to an index).

       void SwishFuzzyWordFree( SW_FUZZYWORD fw );
	   Frees the memory used by  the SW_FUZZYWORD.

Bug-Reports
       Please report bug reports to the Swish-e discussion group.  Feel also
       free to improve or enhance this feature.

Author
       Original interface: Aug 2000 Jose Ruiz jmruiz@boe.es

       Updated: Aug 22, 2002 - Bill Moseley

       Interface redesigned for Swish-e version 2.3 Oct 17, 2002 - Bill Mose‐
       ley

Document Info
       $Id: SWISH-LIBRARY.pod 1906 2007-02-07 19:25:16Z moseley $

       .

2.4.7				  2009-04-04		      SWISH-LIBRARY(1)
[top]

List of man pages available for DragonFly

Copyright (c) for man pages and the logo by the respective OS vendor.

For those who want to learn more, the polarhome community provides shell access and support.

[legal] [privacy] [GNU] [policy] [cookies] [netiquette] [sponsors] [FAQ]
Tweet
Polarhome, production since 1999.
Member of Polarhome portal.
Based on Fawad Halim's script.
....................................................................
Vote for polarhome
Free Shell Accounts :: the biggest list on the net