KinoSearch1::Analysis::Tokenizer man page on Fedora

Man page or keyword search:  
man Server   31170 pages
apropos Keyword Search (all sections)
Output format
Fedora logo
[printable version]

KinoSearch1::Analysis:UsereContributed PerlKinoSearch1::Analysis::Tokenizer(3)

NAME
       KinoSearch1::Analysis::Tokenizer - customizable tokenizing

SYNOPSIS
	   my $whitespace_tokenizer
	       = KinoSearch1::Analysis::Tokenizer->new( token_re => qr/\S+/, );

	   # or...
	   my $word_char_tokenizer
	       = KinoSearch1::Analysis::Tokenizer->new( token_re => qr/\w+/, );

	   # or...
	   my $apostrophising_tokenizer = KinoSearch1::Analysis::Tokenizer->new;

	   # then... once you have a tokenizer, put it into a PolyAnalyzer
	   my $polyanalyzer = KinoSearch1::Analysis::PolyAnalyzer->new(
	       analyzers => [ $lc_normalizer, $word_char_tokenizer, $stemmer ], );

DESCRIPTION
       Generically, "tokenizing" is a process of breaking up a string into an
       array of "tokens".

	   # before:
	   my $string = "three blind mice";

	   # after:
	   @tokens = qw( three blind mice );

       KinoSearch1::Analysis::Tokenizer decides where it should break up the
       text based on the value of "token_re".

	   # before:
	   my $string = "Eats, Shoots and Leaves.";

	   # tokenized by $whitespace_tokenizer
	   @tokens = qw( Eats, Shoots and Leaves. );

	   # tokenized by $word_char_tokenizer
	   @tokens = qw( Eats Shoots and Leaves	  );

METHODS
   new
	   # match "O'Henry" as well as "Henry" and "it's" as well as "it"
	   my $token_re = qr/
		   \b	     # start with a word boundary
		   \w+	     # Match word chars.
		   (?:	     # Group, but don't capture...
		      '\w+   # ... an apostrophe plus word chars.
		   )?	     # Matching the apostrophe group is optional.
		   \b	     # end with a word boundary
	       /xsm;
	   my $tokenizer = KinoSearch1::Analysis::Tokenizer->new(
	       token_re => $token_re, # default: what you see above
	   );

       Constructor.  Takes one hash style parameter.

       ยท   token_re - must be a pre-compiled regular expression matching one
	   token.

COPYRIGHT
       Copyright 2005-2010 Marvin Humphrey

LICENSE, DISCLAIMER, BUGS, etc.
       See KinoSearch1 version 1.01.

perl v5.14.1			  2011-06-2KinoSearch1::Analysis::Tokenizer(3)
[top]

List of man pages available for Fedora

Copyright (c) for man pages and the logo by the respective OS vendor.

For those who want to learn more, the polarhome community provides shell access and support.

[legal] [privacy] [GNU] [policy] [cookies] [netiquette] [sponsors] [FAQ]
Tweet
Polarhome, production since 1999.
Member of Polarhome portal.
Based on Fawad Halim's script.
....................................................................
Vote for polarhome
Free Shell Accounts :: the biggest list on the net