KinoSearch::Docs::Cookbook::CustomQuery man page on Fedora

KinoSearch::Docs::Cookbook::CustomQuery(3)   User Contributed Perl Documentation

NAME
       KinoSearch::Docs::Cookbook::CustomQuery - Sample subclass of Query.

ABSTRACT
       Explore KinoSearch's support for custom query types by creating a
       "PrefixQuery" class to handle trailing wildcards.

	   my $prefix_query = PrefixQuery->new(
	       field	    => 'content',
	       query_string => 'foo*',
	   );
	   my $hits = $searcher->hits( query => $prefix_query );
	   ...

Query, Compiler, and Matcher
       To add support for a new query type, we need three classes: a Query, a
       Compiler, and a Matcher.

       ·   PrefixQuery - a subclass of KinoSearch::Search::Query, and the only
	   class that client code will deal with directly.

       ·   PrefixCompiler - a subclass of KinoSearch::Search::Compiler, whose
	   primary role is to compile a PrefixQuery to a PrefixScorer.

       ·   PrefixScorer - a subclass of KinoSearch::Search::Matcher, which
	   does the heavy lifting: it applies the query to individual
	   documents and assigns a score to each match.

       The PrefixQuery class on its own isn't enough because a Query object's
       role is limited to expressing an abstract specification for the search.
       A Query is basically nothing but metadata; execution is left to the
       Query's companion Compiler and Matcher.

       Here's a simplified sketch illustrating how a Searcher's hits() method
       ties together the three classes.

	   sub hits {
	       my ( $self, $query ) = @_;
	       my $compiler = $query->make_compiler( searcher => $self );
	       my $matcher = $compiler->make_matcher(
		   reader     => $self->get_reader,
		   need_score => 1,
	       );
	       my @hits = $matcher->capture_hits;
	       return \@hits;
	   }
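
       To make the handoff concrete without a real index, here is a toy
       sketch in plain Perl.  ToyQuery, ToyCompiler, ToyMatcher, toy_hits()
       and the %docs hash are hypothetical stand-ins, not KinoSearch classes;
       only the make_compiler() / make_matcher() / capture_hits call sequence
       is taken from the sketch above.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for Query, Compiler and Matcher; none of these
# classes belong to KinoSearch.
package ToyQuery;
sub new { my ( $class, %args ) = @_; return bless {%args}, $class }

sub make_compiler {
    my ( $self, %args ) = @_;
    return ToyCompiler->new( %args, parent => $self );
}

package ToyCompiler;
sub new { my ( $class, %args ) = @_; return bless {%args}, $class }

sub make_matcher {
    my ( $self, %args ) = @_;
    my $prefix = $self->{parent}{prefix};
    my $docs   = $args{reader};    # doc id => text, standing in for a reader
    my @ids = sort { $a <=> $b }
        grep { $docs->{$_} =~ /\b\Q$prefix\E/ } keys %$docs;
    return ToyMatcher->new( doc_ids => \@ids );
}

package ToyMatcher;
sub new { my ( $class, %args ) = @_; return bless {%args}, $class }
sub capture_hits { my $self = shift; return @{ $self->{doc_ids} } }

package main;

# The Query only carries metadata; the Compiler and Matcher do the work.
sub toy_hits {
    my ( $docs, $query ) = @_;
    my $compiler = $query->make_compiler( searcher => 'toy' );
    my $matcher  = $compiler->make_matcher( reader => $docs, need_score => 1 );
    return [ $matcher->capture_hits ];
}

my %docs = ( 1 => 'food for thought', 2 => 'bar', 3 => 'a fool and a foot' );
my $hits = toy_hits( \%docs, ToyQuery->new( prefix => 'foo' ) );
print "matching docs: @$hits\n";
```

       Note that ToyQuery never touches the documents itself, which is the
       division of labor described above.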

   PrefixQuery
       Our PrefixQuery class will have two attributes: a query string and a
       field name.

	   package PrefixQuery;
	   use base qw( KinoSearch::Search::Query );
	   use Carp;
	   use Scalar::Util qw( blessed );

	   # Inside-out member vars and hand-rolled accessors.
	   my %query_string;
	   my %field;
	   sub get_query_string { my $self = shift; return $query_string{$$self} }
	   sub get_field	{ my $self = shift; return $field{$$self} }

       PrefixQuery's constructor collects and validates the attributes.

	   sub new {
	       my ( $class, %args ) = @_;
	       my $query_string = delete $args{query_string};
	       my $field	= delete $args{field};
	       my $self		= $class->SUPER::new(%args);
	       confess("'query_string' param is required")
		   unless defined $query_string;
	       confess("Invalid query_string: '$query_string'")
		   unless $query_string =~ /\*\s*$/;
	       confess("'field' param is required")
		   unless defined $field;
	       $query_string{$$self} = $query_string;
	       $field{$$self}	     = $field;
	       return $self;
	   }
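
       The validation pattern accepts only strings that end in an asterisk,
       optionally followed by whitespace.  The sketch below applies the same
       pattern to a few sample strings; is_valid_prefix_query() is a
       hypothetical helper written for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the same pattern the constructor uses.
sub is_valid_prefix_query {
    my $query_string = shift;
    return 0 unless defined $query_string;
    return $query_string =~ /\*\s*$/ ? 1 : 0;
}

for my $candidate ( 'foo*', 'foo* ', 'foo', 'f*o', '*' ) {
    printf "%-7s %s\n", "'$candidate'",
        is_valid_prefix_query($candidate) ? 'accepted' : 'rejected';
}
```

       A bare '*' passes this check, leaving an empty prefix that every term
       starts with; a stricter constructor might reject it.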

       Since this is an inside-out class, we'll need a destructor:

	   sub DESTROY {
	       my $self = shift;
	       delete $query_string{$$self};
	       delete $field{$$self};
	       $self->SUPER::DESTROY;
	   }
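
       The destructor matters because the attribute hashes are file-scoped
       and outlive any single object, so an entry that is never deleted is
       leaked.  Here is a minimal plain-Perl sketch of the pattern.
       InsideOutDemo is a hypothetical class, and blessing a scalar that
       holds a counter is only an assumption standing in for whatever unique
       value $$self yields on a real KinoSearch object.

```perl
use strict;
use warnings;

package InsideOutDemo;    # hypothetical class, not part of KinoSearch

my %name;                 # attribute storage, keyed by object id
my $next_id = 1;

sub new {
    my ( $class, %args ) = @_;
    my $id   = $next_id++;
    my $self = bless \$id, $class;    # $$self yields the unique id
    $name{$$self} = $args{name};
    return $self;
}

sub get_name   { my $self = shift; return $name{$$self} }
sub live_count { return scalar keys %name }

sub DESTROY {
    my $self = shift;
    delete $name{$$self};    # without this, the entry would leak
}

package main;

{
    my $obj = InsideOutDemo->new( name => 'foo' );
    print "entries while object is alive: ", InsideOutDemo->live_count, "\n";
}
print "entries after object is gone: ", InsideOutDemo->live_count, "\n";
```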

       The equals() method determines whether two Queries are logically
       equivalent:

	   sub equals {
	       my ( $self, $other ) = @_;
	       return 0 unless blessed($other);
	       return 0 unless $other->isa("PrefixQuery");
	       return 0 unless $field{$$self} eq $field{$$other};
	       return 0 unless $query_string{$$self} eq $query_string{$$other};
	       return 1;
	   }

       The last thing we'll need is a make_compiler() factory method which
       returns an instance of a Compiler subclass.

	   sub make_compiler {
	       my $self = shift;
	       return PrefixCompiler->new( @_, parent => $self );
	   }

   PrefixCompiler
       PrefixQuery's make_compiler() method will be called internally at
       search-time by objects which subclass KinoSearch::Search::Searcher --
       such as IndexSearchers.

       A Searcher is associated with a particular collection of documents.
       These documents may all reside in one index, as with IndexSearcher, or
       they may be spread out across multiple indexes on one or more machines,
       as with KinoSearch::Search::PolySearcher.

       Searcher objects have access to certain statistical information about
       the collections they represent; for instance, a Searcher can tell you
       how many documents are in the collection...

	   my $maximum_number_of_docs_in_collection = $searcher->doc_max;

       ... or how many documents a specific term appears in:

	   my $term_appears_in_this_many_docs = $searcher->doc_freq(
	       field => 'content',
	       term  => 'foo',
	   );

       Such information can be used by sophisticated Compiler implementations
       to assign more or less heft to individual queries or sub-queries.
       However, we're not going to bother with weighting for this demo; we'll
       just assign a fixed score of 1.0 to each matching document.
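
       As a rough illustration of how those two statistics could feed a
       weight, the sketch below computes an inverse-document-frequency value.
       The formula is a common textbook form chosen for illustration; it is
       an assumption, not necessarily what any KinoSearch Compiler computes,
       and idf() is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: terms that appear in fewer documents get a
# larger weight.
sub idf {
    my ( $doc_max, $doc_freq ) = @_;
    return 1 + log( $doc_max / ( 1 + $doc_freq ) );
}

my $doc_max = 1000;    # stands in for $searcher->doc_max
for my $doc_freq ( 1, 10, 500 ) {    # stands in for $searcher->doc_freq(...)
    printf "doc_freq %3d => weight %.3f\n", $doc_freq,
        idf( $doc_max, $doc_freq );
}
```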

       We don't need to write a constructor, as it will suffice to inherit
       new() from KinoSearch::Search::Compiler.  The only method we need to
       implement for PrefixCompiler is make_matcher().

	   package PrefixCompiler;
	   use base qw( KinoSearch::Search::Compiler );

	   sub make_matcher {
	       my ( $self, %args ) = @_;
	       my $seg_reader = $args{reader};

	       # Retrieve low-level components LexiconReader and PostingListReader.
	       my $lex_reader
		   = $seg_reader->obtain("KinoSearch::Index::LexiconReader");
	       my $plist_reader
		   = $seg_reader->obtain("KinoSearch::Index::PostingListReader");

	       # Acquire a Lexicon and seek it to our query string.
	       my $substring = $self->get_parent->get_query_string;
	       # Strip the trailing '*' and any whitespace after it.
	       $substring =~ s/\*\s*$//;
	       my $field = $self->get_parent->get_field;
	       my $lexicon = $lex_reader->lexicon( field => $field );
	       return unless $lexicon;
	       $lexicon->seek($substring);

	       # Accumulate PostingLists for each matching term.
	       my @posting_lists;
	       while ( defined( my $term = $lexicon->get_term ) ) {
		   last unless $term =~ /^\Q$substring/;
		   my $posting_list = $plist_reader->posting_list(
		       field => $field,
		       term  => $term,
		   );
		   if ($posting_list) {
		       push @posting_lists, $posting_list;
		   }
		   last unless $lexicon->next;
	       }
	       return unless @posting_lists;

	       return PrefixScorer->new( posting_lists => \@posting_lists );
	   }

       PrefixCompiler gets access to a SegReader object when make_matcher()
       gets called.  From the SegReader and its sub-components LexiconReader
       and PostingListReader, we acquire a Lexicon, scan through the Lexicon's
       unique terms, and acquire a PostingList for each term that matches our
       prefix.

       Each of these PostingList objects represents a set of documents which
       match the query.
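
       The scan works because a Lexicon's terms are sorted: seek to the
       first term at or after the prefix, then walk forward until a term no
       longer starts with it.  The sketch below mimics that over a sorted
       array; seek_index() and prefix_terms() are hypothetical helpers, and
       the array merely stands in for a Lexicon.

```perl
use strict;
use warnings;

# Hypothetical stand-in for seeking a Lexicon: index of the first term
# that sorts at or after $target (binary search over a sorted array).
sub seek_index {
    my ( $terms, $target ) = @_;
    my ( $lo, $hi ) = ( 0, scalar @$terms );
    while ( $lo < $hi ) {
        my $mid = int( ( $lo + $hi ) / 2 );
        if   ( $terms->[$mid] lt $target ) { $lo = $mid + 1 }
        else                               { $hi = $mid }
    }
    return $lo;
}

# Collect every term that starts with the prefix of a query like 'foo*'.
sub prefix_terms {
    my ( $terms, $query_string ) = @_;
    ( my $prefix = $query_string ) =~ s/\*\s*$//;    # strip the wildcard
    my @matches;
    for my $i ( seek_index( $terms, $prefix ) .. $#$terms ) {
        last unless $terms->[$i] =~ /^\Q$prefix/;    # sorted, so stop early
        push @matches, $terms->[$i];
    }
    return @matches;
}

my @lexicon = sort qw( bar fo foo food fool foot fop zebra );
print join( ' ', prefix_terms( \@lexicon, 'foo*' ) ), "\n";
```

       Stopping at the first non-matching term is what keeps the scan cheap:
       terms such as "fop" and "zebra" are never examined beyond the first
       miss.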

   PrefixScorer
       The Matcher subclass is the most involved.

	   package PrefixScorer;
	   use base qw( KinoSearch::Search::Matcher );

	   # Inside-out member vars.
	   my %doc_ids;
	   my %tick;

	   sub new {
	       my ( $class, %args ) = @_;
	       my $posting_lists = delete $args{posting_lists};
	       my $self		 = $class->SUPER::new(%args);

	       # Cheesy but simple way of interleaving PostingList doc sets.
	       my %all_doc_ids;
	       for my $posting_list (@$posting_lists) {
		   while ( my $doc_id = $posting_list->next ) {
		       $all_doc_ids{$doc_id} = undef;
		   }
	       }
	       my @doc_ids = sort { $a <=> $b } keys %all_doc_ids;
	       $doc_ids{$$self} = \@doc_ids;

	       # Track our position within the array of doc ids.
	       $tick{$$self} = -1;

	       return $self;
	   }

	   sub DESTROY {
	       my $self = shift;
	       delete $doc_ids{$$self};
	       delete $tick{$$self};
	       $self->SUPER::DESTROY;
	   }

       The doc ids must be in ascending numeric order, or some will be
       ignored; hence the "sort" above.
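
       The explicit numeric comparison matters because hash keys are
       strings, and a default sort would order them lexically.  The sketch
       below uses plain arrays of doc ids (hypothetical data standing in for
       PostingList objects) to show both the deduplication and the
       difference between the two orderings.

```perl
use strict;
use warnings;

# Arrays of doc ids standing in for PostingList objects (hypothetical data).
my @posting_lists = ( [ 2, 9, 10 ], [ 1, 9, 33 ], [ 10, 100 ] );

my %all_doc_ids;
for my $posting_list (@posting_lists) {
    $all_doc_ids{$_} = undef for @$posting_list;    # hash keys deduplicate
}

my @numeric = sort { $a <=> $b } keys %all_doc_ids;
my @lexical = sort keys %all_doc_ids;    # string order: wrong for doc ids

print "numeric: @numeric\n";
print "lexical: @lexical\n";
```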

       In addition to the constructor and destructor, there are three methods
       that must be overridden.

       next() advances the Matcher to the next valid matching doc.

	   sub next {
	       my $self	    = shift;
	       my $doc_ids = $doc_ids{$$self};
	       my $tick	    = ++$tick{$$self};
	       return 0 if $tick >= scalar @$doc_ids;
	       return $doc_ids->[$tick];
	   }

       get_doc_id() returns the current document id, or 0 if the Matcher is
       exhausted.  (Document numbers start at 1, so 0 is a sentinel.)

	   sub get_doc_id {
	       my $self	    = shift;
	       my $tick	    = $tick{$$self};
	       my $doc_ids = $doc_ids{$$self};
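	       # Return 0 before the first next() as well; a tick of -1 would
	       # otherwise index from the end of the array.
	       return 0 if $tick < 0 || $tick >= scalar @$doc_ids;
	       return $doc_ids->[$tick];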
	   }

       score() conveys the relevance score of the current match.  We'll just
       return a fixed score of 1.0:

	   sub score { 1.0 }
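
       To see how a caller might drive these three methods, here is a
       plain-Perl sketch.  ToyScorer is a hypothetical hash-based class that
       does not subclass Matcher; it copies the tick logic so the loop can
       run without an index.  As an added assumption, its get_doc_id() also
       returns 0 before the first call to next().

```perl
use strict;
use warnings;

package ToyScorer;    # hypothetical; does not subclass Matcher

sub new {
    my ( $class, @doc_ids ) = @_;
    return bless { doc_ids => [@doc_ids], tick => -1 }, $class;
}

sub next {
    my $self = shift;
    my $tick = ++$self->{tick};
    return 0 if $tick >= @{ $self->{doc_ids} };
    return $self->{doc_ids}[$tick];
}

sub get_doc_id {
    my $self = shift;
    my $tick = $self->{tick};
    return 0 if $tick < 0 || $tick >= @{ $self->{doc_ids} };
    return $self->{doc_ids}[$tick];
}

sub score { 1.0 }

package main;

my $scorer = ToyScorer->new( 3, 7, 42 );
my @seen;
while ( my $doc_id = $scorer->next ) {    # the 0 sentinel ends the loop
    push @seen, [ $scorer->get_doc_id, $scorer->score ];
}
print join( ', ', map { "doc $_->[0] scored $_->[1]" } @seen ), "\n";
```

       Because valid document numbers start at 1, the 0 sentinel can double
       as a false value in the loop condition.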

Usage
       To get a basic feel for PrefixQuery, insert the FlatQueryParser module
       described in KinoSearch::Docs::Cookbook::CustomQueryParser (which
       supports PrefixQuery) into the search.cgi sample app.

	   my $parser = FlatQueryParser->new( schema => $searcher->get_schema );
	   my $query  = $parser->parse($q);

       If you're planning on using PrefixQuery in earnest, though, you may
       want to change up analyzers to avoid stemming, because stemming --
       another approach to prefix conflation -- is not perfectly compatible
       with prefix searches.

	   # PolyAnalyzer with no Stemmer.
	   my $analyzer = KinoSearch::Analysis::PolyAnalyzer->new(
	       analyzers => [
		   KinoSearch::Analysis::Tokenizer->new,
		   KinoSearch::Analysis::CaseFolder->new,
	       ],
	   );
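
       The incompatibility shows up when a prefix is longer than the stem
       that was actually indexed.  In the sketch below a hard-coded table
       stands in for a real stemmer; the stems are typical of an English
       Porter-style stemmer but should be read as illustrative assumptions,
       and prefix_matches() is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical stem table standing in for a real stemmer's output.
my %stem_of = ( happiness => 'happi', running => 'run' );

# Return the source words whose indexed form starts with the prefix.
sub prefix_matches {
    my ( $prefix, $index ) = @_;    # $index: source word => indexed term
    my @found = grep { $index->{$_} =~ /^\Q$prefix/ } keys %$index;
    return sort @found;
}

my %stemmed   = %stem_of;
my %unstemmed = map { $_ => $_ } keys %stem_of;

for my $prefix ( 'happin', 'runn' ) {
    my @with    = prefix_matches( $prefix, \%stemmed );
    my @without = prefix_matches( $prefix, \%unstemmed );
    printf "%-8s stemmed index: [%s]  unstemmed index: [%s]\n",
        "$prefix*", "@with", "@without";
}
```

       With the stemmed table, neither prefix finds anything, because the
       indexed terms are shorter than the prefixes being searched for.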

COPYRIGHT AND LICENSE
       Copyright 2008-2010 Marvin Humphrey

       This program is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

perl v5.14.1                      KinoSearch::Docs::Cookbook::CustomQuery(3)