Lingua::Stem::En man page on Fedora

Lingua::Stem::En(3)   User Contributed Perl Documentation  Lingua::Stem::En(3)

NAME
       Lingua::Stem::En - Porter's stemming algorithm for 'generic' English

SYNOPSIS
	   use Lingua::Stem::En;
           my $stems = Lingua::Stem::En::stem({ -words      => $word_list_reference,
                                                -locale     => 'en',
                                                -exceptions => $exceptions_hash,
                                              });

DESCRIPTION
       This routine applies the Porter Stemming Algorithm to its parameters,
       returning the stemmed words.

       It is derived from the C program "stemmer.c" as found in freewais and
       elsewhere, which contains these notes:

	  Purpose:    Implementation of the Porter stemming algorithm documented
		      in: Porter, M.F., "An Algorithm For Suffix Stripping,"
		      Program 14 (3), July 1980, pp. 130-137.
	  Provenance: Written by B. Frakes and C. Cox, 1986.

       I have re-interpreted areas that use Frakes and Cox's "WordSize"
       function. My version may misbehave on short words starting with "y",
       but I can't think of any examples.
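       The "y" caveat comes from how Porter's algorithm classifies letters. A
       letter is a consonant unless it is a, e, i, o or u, or a "y" preceded
       by a consonant, so a leading "y" counts as a consonant. Many rules are
       conditioned on the word's "measure" m, the number of vowel-consonant
       sequences in the form [C](VC)^m[V]. The following is a minimal sketch
       of those two definitions as given in Porter's paper, not code taken
       from this module. The helper names is_consonant and measure are
       hypothetical, and lowercase ASCII input is assumed.

```perl
use strict;
use warnings;

# Porter's consonant test (sketch): a letter is a consonant unless it is
# a, e, i, o or u, or a 'y' that follows a consonant.
sub is_consonant {
    my ($word, $i) = @_;
    my $ch = substr($word, $i, 1);
    return 0 if $ch =~ /[aeiou]/;
    if ($ch eq 'y') {
        # A leading 'y' has no predecessor, so it is treated as a consonant;
        # otherwise 'y' is a vowel exactly when the previous letter is a consonant.
        return $i == 0 ? 1 : !is_consonant($word, $i - 1);
    }
    return 1;
}

# Porter's measure m (sketch): the number of VC pairs in [C](VC)^m[V].
sub measure {
    my ($word) = @_;
    my $pattern = join '',
        map { is_consonant($word, $_) ? 'c' : 'v' } 0 .. length($word) - 1;
    # Squeeze runs of identical letters so the pattern strictly alternates,
    # then count the "vc" transitions.
    $pattern =~ tr/cv//s;
    my $m = () = $pattern =~ /vc/g;
    return $m;
}
```

       For example, measure('tree') is 0, measure('trouble') is 1 and
       measure('troubles') is 2, which agrees with the examples in Porter's
       paper.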

       The step numbers correspond to Frakes and Cox, and are probably in
       Porter's article (which I've not seen). Porter's algorithm still has
       rough spots (e.g. current/currency, -ings words), which I've not
       attempted to cure, although I have added support for the British -ise
       suffix.
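       As an illustration of what a single numbered step does, the sketch
       below implements step 1a of Porter's published algorithm, which
       handles plural endings: "sses" becomes "ss", "ies" becomes "i", "ss"
       is kept, and a final "s" is dropped, with only the longest matching
       suffix applied. It is a standalone example rather than this module's
       code; step_1a is a hypothetical name, and the "1a" label follows
       Porter's paper.

```perl
use strict;
use warnings;

# Sketch of Porter's step 1a (plural endings). The checks run from the
# longest suffix to the shortest, so only the longest match is rewritten.
sub step_1a {
    my ($word) = @_;
    return $word if $word =~ s/sses$/ss/;   # caresses -> caress
    return $word if $word =~ s/ies$/i/;     # ponies   -> poni
    return $word if $word =~ /ss$/;         # caress   -> caress (unchanged)
    $word =~ s/s$//;                        # cats     -> cat
    return $word;
}
```

       For instance, step_1a('ponies') returns 'poni', while 'caress' is
       returned unchanged.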

CHANGES
	1999.06.15 - Changed to '.pm' module, moved into Lingua::Stem namespace,
		     optionalized the export of the 'stem' routine
		     into the caller's namespace, added named parameters

        1999.06.24 - Switched core implementation of the Porter stemmer to
		     the one written by Jim Richardson <jimr@maths.usyd.edu.au>

	2000.08.25 - 2.11 Added stemming cache

	2000.09.14 - 2.12 Fixed *major* :( implementation error of Porter's algorithm
		     Error was entirely my fault - I completely forgot to include
                     rule sets 2, 3, and 4 starting with Lingua::Stem 0.30.
		     -- Benjamin Franz

	2003.09.28 - 2.13 Corrected documentation error pointed out by Simon Cozens.

	2005.11.20 - 2.14 Changed rule declarations to conform to Perl style convention
                     for 'private' subroutines. Changed Exporter invocation to more
		     portable 'require' vice 'use'.

	2006.02.14 - 2.15 Added ability to pass word list by 'handle' for in-place stemming.

        2009.07.27 - 2.16 Documentation fix.

METHODS
       stem({ -words => \@words, -locale => 'en', -exceptions => \%exceptions });
	   Stems a list of passed words using the rules of US English. Returns
	   an anonymous array reference to the stemmed words.

	   Example:

             use Lingua::Stem::En;

             my %exceptions    = ();    # no exceptions in this example
             my @words         = ( 'wordy', 'another' );
             my $stemmed_words = Lingua::Stem::En::stem({ -words      => \@words,
                                                          -locale     => 'en',
                                                          -exceptions => \%exceptions,
                                                        });

           If the first element of the array passed as -words is itself an
           array reference, then the stemming is performed 'in place' on that
           array (modifying the passed array directly instead of copying it to
           a new array).

           This is only useful if you do not need to keep the original list.
           If you do need to keep it, use the normal semantics of having
           'stem' return a new list instead: that is faster than making your
           own copy and then stemming it 'in place', because the primary
           difference between 'in place' and 'by value' stemming is the
           creation of a copy of the original list. If you don't need the
           original list, 'in place' stemming is about 60% faster.

	   Example of 'in place' stemming:

             use Lingua::Stem::En;

             my %exceptions    = ();    # no exceptions in this example
             my $words         = [ 'wordy', 'another' ];
             my $stemmed_words = Lingua::Stem::En::stem({ -words      => [$words],
                                                          -locale     => 'en',
                                                          -exceptions => \%exceptions,
                                                        });

	   The 'in place' mode returns a reference to the original list with
	   the words stemmed.
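           The difference between the two calling conventions is ordinary
           Perl reference semantics. The sketch below shows it with a toy
           stemmer that only strips a trailing "s"; toy_stem_word,
           stem_by_value and stem_in_place are hypothetical names for
           illustration, not part of Lingua::Stem::En, and the toy rule is
           not Porter's algorithm.

```perl
use strict;
use warnings;

# Hypothetical stand-in stemmer: strips one trailing "s".
# It is NOT Porter's algorithm; it only makes the two call styles visible.
sub toy_stem_word {
    my ($word) = @_;
    $word =~ s/s$//;
    return $word;
}

# By value: build and return a new array; the caller's array is untouched.
sub stem_by_value {
    my ($words) = @_;
    return [ map { toy_stem_word($_) } @$words ];
}

# In place: foreach aliases $_ to each element, so assigning to $_
# overwrites the caller's array; the same reference is returned.
sub stem_in_place {
    my ($words) = @_;
    $_ = toy_stem_word($_) for @$words;
    return $words;
}
```

           With my @list = ('cats', 'dogs'), stem_by_value(\@list) returns a
           new array while @list keeps its original words; stem_in_place(\@list)
           rewrites @list itself and returns \@list, so no copy is ever built.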

       stem_caching({ -level => 0|1|2 });
	   Sets the level of stem caching.

	   '0' means 'no caching'. This is the default level.

           '1' means 'cache per run'. This caches stemming results during a
           single call to 'stem'.

           '2' means 'cache indefinitely'. This caches stemming results until
           either the process exits or the 'clear_stem_cache' method is
           called.

       clear_stem_cache;
           Clears the cache of stemmed words.
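           The three caching levels differ only in how long a lookup table
           lives. The following is a minimal sketch under that reading, not
           the module's implementation: every name in it (expensive_stem,
           set_cache_level, clear_cache, cached_stem_list) is hypothetical,
           and the counter exists only to make cache hits observable. With
           the real module, the level is set through stem_caching and the
           cache is emptied with clear_stem_cache, as documented above.

```perl
use strict;
use warnings;

my $cache_level   = 0;   # 0 = no caching, 1 = per call, 2 = until cleared
my %stem_cache;          # persistent table, used only at level 2
my $backend_calls = 0;   # how many times the underlying stemmer ran

# Hypothetical stand-in for the real stemming rules.
sub expensive_stem {
    my ($word) = @_;
    $backend_calls++;
    (my $stem = $word) =~ s/s$//;
    return $stem;
}

sub set_cache_level { $cache_level = shift }
sub clear_cache     { %stem_cache = () }

sub cached_stem_list {
    my ($words) = @_;
    my %run_cache;       # discarded when this call returns (level 1)
    my $cache = $cache_level == 2 ? \%stem_cache
              : $cache_level == 1 ? \%run_cache
              :                     undef;
    my @stems;
    for my $word (@$words) {
        if ($cache) {
            # Compute once per table; later lookups reuse the stored stem.
            $cache->{$word} //= expensive_stem($word);
            push @stems, $cache->{$word};
        } else {
            push @stems, expensive_stem($word);
        }
    }
    return \@stems;
}
```

           The design trade-off is memory: a level 1 table is freed after
           each call, while a level 2 table keeps growing with every new word,
           which is why an explicit clear routine is needed.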

NOTES
       This code is almost entirely derived from the Porter 2.1 module written
       by Jim Richardson.

SEE ALSO
	Lingua::Stem

AUTHOR
	 Jim Richardson, University of Sydney
	 jimr@maths.usyd.edu.au or http://www.maths.usyd.edu.au:8000/jimr.html

	 Integration in Lingua::Stem by
	 Benjamin Franz, FreeRun Technologies,
	 snowhare@nihongo.org or http://www.nihongo.org/snowhare/

COPYRIGHT
       Jim Richardson, University of Sydney
       Benjamin Franz, FreeRun Technologies

       This code is freely available under the same terms as Perl.

BUGS
TODO
perl v5.14.2			  2012-01-14		   Lingua::Stem::En(3)