KinoSearch::Docs::Tutorial::Analysis(3)  User Contributed Perl Documentation

NAME
       KinoSearch::Docs::Tutorial::Analysis - How to choose and use Analyzers.

DESCRIPTION
       Try swapping out the PolyAnalyzer in our Schema for a Tokenizer:

	   my $tokenizer = KinoSearch::Analysis::Tokenizer->new;
	   my $type = KinoSearch::Plan::FullTextType->new(
	       analyzer => $tokenizer,
	   );

       Search for "senate", "Senate", and "Senator" before and after making
       the change and re-indexing.

       Under PolyAnalyzer, the results are identical for all three searches,
       but under Tokenizer, searches are case-sensitive, and the result sets
       for "Senate" and "Senator" are distinct.

   PolyAnalyzer
       PolyAnalyzer processes text more aggressively than Tokenizer.  In
       addition to tokenizing, it converts all text to lower case, so that
       searches are case-insensitive, and it applies a "stemming" algorithm
       to reduce related words to a common stem ("senat", in this case).
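
       To see why the result sets differ, here is a minimal pure-Perl
       sketch of an analysis chain.  It does not use KinoSearch at all:
       case_fold, tokenize, crude_stem, and analyze are hypothetical
       stand-ins, and the suffix stripper is a toy that only handles the
       three example words, not the stemming algorithm KinoSearch uses.

```perl
use strict;
use warnings;

# Toy stand-ins for the three stages; these are NOT KinoSearch's code.
sub case_fold { return map { lc $_ } @_ }
sub tokenize  { return map { /(\w+)/g } @_ }

# Hypothetical suffix stripper, only good enough for this example.
sub crude_stem {
    return map { my $t = $_; $t =~ s/(?:e|or|ors)$//; $t } @_;
}

# Feed the output of each stage into the next one.
sub analyze {
    my ( $stages, $text ) = @_;
    my @tokens = ($text);
    for my $stage (@$stages) {
        @tokens = $stage->(@tokens);
    }
    return @tokens;
}

my @tokenizer_only = ( \&tokenize );
my @poly_like      = ( \&case_fold, \&tokenize, \&crude_stem );

for my $query ( 'senate', 'Senate', 'Senator' ) {
    printf "%-8s tokenizer: %-8s poly-like: %s\n",
        $query,
        join( ' ', analyze( \@tokenizer_only, $query ) ),
        join( ' ', analyze( \@poly_like,      $query ) );
}
```

       With only the tokenizer stage, each query keeps its own spelling
       and case, so the three terms never match one another.  With all
       three stages, every query reduces to the same term, which is why
       the searches return identical results.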

       PolyAnalyzer is actually multiple Analyzers wrapped up in a single
       package.  In this case, it's three-in-one, since specifying a
       PolyAnalyzer with "language => 'en'" is equivalent to this snippet:

	   my $case_folder  = KinoSearch::Analysis::CaseFolder->new;
	   my $tokenizer    = KinoSearch::Analysis::Tokenizer->new;
	   my $stemmer      = KinoSearch::Analysis::Stemmer->new( language => 'en' );
	   my $polyanalyzer = KinoSearch::Analysis::PolyAnalyzer->new(
	       analyzers => [ $case_folder, $tokenizer, $stemmer ],
	   );

       You can add or subtract Analyzers from there if you like.  Try adding a
       fourth Analyzer, a Stopalizer for suppressing "stopwords" like "the",
       "if", and "maybe".

	   my $stopalizer = KinoSearch::Analysis::Stopalizer->new(
	       language => 'en',
	   );
	   my $polyanalyzer = KinoSearch::Analysis::PolyAnalyzer->new(
	       analyzers => [ $case_folder, $tokenizer, $stopalizer, $stemmer ],
	   );
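
       As a rough illustration of what a stopword filter does to a token
       stream, here is a pure-Perl sketch.  The three-word stoplist and
       the remove_stopwords helper are assumptions made for this example;
       the stoplist Stopalizer actually uses for English is not shown in
       this tutorial.

```perl
use strict;
use warnings;

# Hypothetical stoplist built from the three words named in the text;
# a real English stoplist would be much longer.
my %is_stopword = map { $_ => 1 } qw( the if maybe );

# Drop every token that appears in the stoplist.
sub remove_stopwords { return grep { !$is_stopword{$_} } @_ }

# Mimic the earlier stages: split into words, then fold to lower case.
my @tokens = map { lc $_ } 'Maybe the Senate will vote' =~ /\w+/g;
my @kept   = remove_stopwords(@tokens);

print "before: @tokens\n";
print "after:  @kept\n";
```

       The sketch filters tokens after case folding and tokenizing,
       mirroring the position of $stopalizer in the analyzers list above.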

       Also, try removing the Stemmer.

	   my $polyanalyzer = KinoSearch::Analysis::PolyAnalyzer->new(
	       analyzers => [ $case_folder, $tokenizer ],
	   );

       The original choice of a stock English PolyAnalyzer probably still
       yields the best results for this document collection, but you get the
       idea: sometimes you want a different Analyzer.

   When the best Analyzer is no Analyzer
       Sometimes you don't want an Analyzer at all.  That was true for our
       "url" field because we didn't need it to be searchable, but it's also
       true for certain types of searchable fields.  For instance, "category"
       fields are often set up to match exactly or not at all, as are fields
       like "last_name" (because you may not want to conflate results for
       "Humphrey" and "Humphries").

       To specify that there should be no analysis performed at all, use
       StringType:

	   my $type = KinoSearch::Plan::StringType->new;
	   $schema->spec_field( name => 'category', type => $type );
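
       The difference between an analyzed and an unanalyzed field can be
       sketched in pure Perl with two toy indexes.  This assumes that an
       unanalyzed field is matched on its whole value as a single term;
       the sample categories and the lookup helper are hypothetical and
       are not part of KinoSearch.

```perl
use strict;
use warnings;

# Hypothetical sample data: document id => value of the "category" field.
my %category_of = (
    1 => 'Civil Rights',
    2 => 'Rights of the Accused',
);

# Index each value twice: once whole, once split into lower-cased words.
my ( %exact, %analyzed );
for my $id ( sort keys %category_of ) {
    my $value = $category_of{$id};
    push @{ $exact{$value} }, $id;
    push @{ $analyzed{ lc $_ } }, $id for $value =~ /\w+/g;
}

# Return the ids of the documents indexed under a term (possibly none).
sub lookup {
    my ( $index, $term ) = @_;
    return @{ $index->{$term} || [] };
}

my @analyzed_hits = lookup( \%analyzed, 'rights' );
my @partial_hits  = lookup( \%exact,    'rights' );
my @whole_hits    = lookup( \%exact,    'Civil Rights' );

print "analyzed 'rights':    @analyzed_hits\n";
print "exact 'rights':       @partial_hits\n";
print "exact 'Civil Rights': @whole_hits\n";
```

       In the analyzed index, a single word pulls in both categories.  In
       the exact index, only the complete value matches, which is the
       "match exactly or not at all" behavior described above.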

   Highlighting up next
       In our next tutorial chapter, KinoSearch::Docs::Tutorial::Highlighter,
       we'll add highlighted excerpts from the "content" field to our search
       results.

COPYRIGHT AND LICENSE
       Copyright 2008-2010 Marvin Humphrey

       This program is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

perl v5.14.1			  2011-KinoSearch::Docs::Tutorial::Analysis(3)