Lingua::EN::Tagger man page on Fedora


Tagger(3)	      User Contributed Perl Documentation	     Tagger(3)

NAME
       Lingua::EN::Tagger - Part-of-speech tagger for English natural language
       processing.

SYNOPSIS
	   use Lingua::EN::Tagger;

	   # Create a parser object
	   my $p = Lingua::EN::Tagger->new;

	   # Add part of speech tags to a text
	   my $tagged_text = $p->add_tags( $text );

	   ...

	   # Get a list of all nouns and noun phrases with occurrence counts
	   my %word_list = $p->get_words( $text );

	   ...

	   # Get a readable version of the tagged text
	   my $readable_text = $p->get_readable( $text );

DESCRIPTION
       The module is a probability-based, corpus-trained tagger that assigns
       POS tags to English text using a lookup dictionary and a set of
       probability values.  Tags are chosen by conditional probability: the
       tagger examines the preceding tag to determine the most likely tag
       for the current word.  Unknown words are classified according to
       their morphology, or can be set to be treated as nouns or other
       parts of speech.
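
       As a rough illustration of tagging by conditional probability, the
       toy sketch below picks, for each word, the tag that maximizes
       P(tag | previous tag) * P(word | tag).  The lexicon, the transition
       table, the tag names, and the tag_words helper are all invented for
       this example; this is not the module's own code or data.

```perl
use strict;
use warnings;

# Toy data, invented for illustration (not taken from the module's lexicon).
# P(word | tag): how likely each tag is to emit the word.
my %lexicon = (
    the  => { det => 1.0 },
    can  => { md => 0.7, nn => 0.2, vb => 0.1 },
    fish => { nn => 0.6, vb => 0.4 },
);

# P(tag | previous tag): the conditional probability the text refers to.
my %transition = (
    start => { det => 0.6, nn => 0.3, md => 0.05, vb => 0.05 },
    det   => { nn => 0.9, vb => 0.05, md => 0.05 },
    nn    => { md => 0.4, vb => 0.4, nn => 0.2 },
    md    => { vb => 0.9, nn => 0.1 },
    vb    => { det => 0.5, nn => 0.5 },
);

# Greedy left-to-right tagging: for each word, choose the tag that
# maximizes P(tag | previous tag) * P(word | tag).
sub tag_words {
    my ( $unknown_tag, @words ) = @_;
    my $prev = 'start';
    my @tags;
    for my $word (@words) {
        my $candidates = $lexicon{ lc $word };
        my $best = $unknown_tag;    # fallback for words not in the lexicon
        if ($candidates) {
            my $best_score = -1;
            for my $tag ( sort keys %$candidates ) {
                my $score = ( $transition{$prev}{$tag} // 0 ) * $candidates->{$tag};
                ( $best, $best_score ) = ( $tag, $score ) if $score > $best_score;
            }
        }
        push @tags, $best;
        $prev = $best;
    }
    return @tags;
}

print join( ' ', tag_words( 'nn', qw(The can can fish) ) ), "\n";
```

       The first argument plays the role of a fixed tag for unknown words;
       the same word ("can") receives different tags depending on the tag
       that precedes it.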

       The tagger also extracts as many nouns and noun phrases as it can,
       using a set of regular expressions.

CONSTRUCTOR
       new %PARAMS
	   Class constructor.  Takes a hash with the following parameters
	   (shown with default values):

	   unknown_word_tag => ''
	       Tag to assign to unknown words

	   stem => 0
	       Stem single words using Lingua::Stem::En

	   weight_noun_phrases => 0
	       When returning occurrence counts for a noun phrase, multiply
	       the value by the number of words in the NP.

	   longest_noun_phrase => 5
	       Will ignore noun phrases longer than this threshold. This
	       affects only the get_words() and get_nouns() methods.

	   relax => 0
	       Relax the Hidden Markov Model.  This may improve accuracy for
	       uncommon words, particularly words used polysemously.

METHODS
       add_tags TEXT
	   Examine the string provided and return it fully tagged (XML
	   style).

       get_words TEXT
	   Given a text string, return as many nouns and noun phrases as
	   possible.  Applies add_tags and involves three stages:

	       * Tag the text
	       * Extract all the maximal noun phrases
	       * Recursively extract all noun phrases from the MNPs
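
	   The last stage, together with the weight_noun_phrases and
	   longest_noun_phrase options, can be pictured with the sketch
	   below.  It counts every contiguous sub-phrase of a maximal noun
	   phrase.  The count_sub_phrases helper is hypothetical, and the
	   module's own rules for which sub-phrases qualify may well be
	   stricter than this.

```perl
use strict;
use warnings;

# Hypothetical helper: count every contiguous sub-phrase of each maximal
# noun phrase (MNP). Options mirror the constructor parameters by name only.
sub count_sub_phrases {
    my ( $mnps, %opt ) = @_;
    my $max_len = $opt{longest_noun_phrase} // 5;
    my %counts;
    for my $mnp (@$mnps) {
        my @words = split ' ', $mnp;
        for my $start ( 0 .. $#words ) {
            for my $end ( $start .. $#words ) {
                my $len = $end - $start + 1;
                next if $len > $max_len;    # skip phrases over the threshold
                my $phrase = join ' ', @words[ $start .. $end ];
                # With weighting on, a phrase of N words counts N times.
                $counts{$phrase} += $opt{weight_noun_phrases} ? $len : 1;
            }
        }
    }
    return %counts;
}

my %counts = count_sub_phrases(
    ['natural language processing'],
    weight_noun_phrases => 1,
);
printf "%-30s %d\n", $_, $counts{$_} for sort keys %counts;
```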

       get_readable TEXT
	   Return an easy-on-the-eyes tagged version of a text string.
	   Applies add_tags and reformats to be easier to read.
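
	   A minimal sketch of this kind of reformatting, assuming that
	   tagged text looks like <tag>word</tag> and that the readable form
	   is word/TAG.  Both formats, the sample tag names, and the
	   to_readable helper are assumptions made for illustration, not the
	   module's code.

```perl
use strict;
use warnings;

# Sample input in the XML style described for add_tags. The tag names
# (det, nn, vbz) are assumed for illustration.
my $tagged = '<det>The</det> <nn>tagger</nn> <vbz>works</vbz>';

# Hypothetical helper: rewrite each <tag>word</tag> pair as word/TAG.
sub to_readable {
    my ($text) = @_;
    $text =~ s{<(\w+)>([^<]+)</\1>}{$2/\U$1}g;
    return $text;
}

print to_readable($tagged), "\n";
```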

       get_sentences TEXT
	   Returns an anonymous array of sentences (without POS tags) from a
	   text.

       get_proper_nouns TAGGED_TEXT
	   Given a POS-tagged text, this method returns a hash of all proper
	   nouns and their occurrence frequencies. The method is greedy and
	   will return multi-word phrases, if possible, so it would find
	   "Linguistic Data Consortium" as a single unit, rather than as
	   three individual proper nouns. This method does not stem the found
	   words.
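
	   The greedy grouping can be sketched as follows: runs of adjacent
	   proper-noun tags are collapsed into one phrase and counted.  The
	   nnp tag name, the sample input, and the count_proper_nouns helper
	   are assumptions for illustration; the real method may apply
	   further rules.

```perl
use strict;
use warnings;

# Sample tagged input; the nnp (proper noun) tag name is assumed here.
my $tagged = '<det>The</det> <nnp>Linguistic</nnp> <nnp>Data</nnp> '
           . '<nnp>Consortium</nnp> <vbz>hosts</vbz> <nnp>Data</nnp>';

# Hypothetical helper: greedily match runs of adjacent proper-noun tags
# and count each run as one phrase.
sub count_proper_nouns {
    my ($text) = @_;
    my %counts;
    while ( $text =~ m{((?:<nnp>[^<]+</nnp>\s*)+)}g ) {
        my $run   = $1;
        my @words = $run =~ m{<nnp>([^<]+)</nnp>}g;
        $counts{ join ' ', @words }++;
    }
    return %counts;
}

my %found = count_proper_nouns($tagged);
print "$_: $found{$_}\n" for sort keys %found;
```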

       get_nouns TAGGED_TEXT
	   Given a POS-tagged text, this method returns all nouns and their
	   occurrence frequencies.

       get_max_noun_phrases TAGGED_TEXT
	   Given a POS-tagged text, this method returns only the maximal noun
	   phrases.  May be called directly, but is also used by
	   get_noun_phrases.

       get_noun_phrases TAGGED_TEXT
	   Similar to get_words, but requires a POS-tagged text as an
	   argument.
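
	   A possible way to combine the TAGGED_TEXT methods is to tag the
	   text once and pass the result to each of them.  The sketch below
	   assumes that these methods return a plain hash in list context,
	   as the SYNOPSIS shows for get_words; the sample text and option
	   values are arbitrary, and the script needs Lingua::EN::Tagger
	   installed.

```perl
use strict;
use warnings;
use Lingua::EN::Tagger;

my $text = 'The Linguistic Data Consortium publishes annotated corpora '
         . 'for natural language processing research.';

# Options as listed under CONSTRUCTOR; values here are arbitrary choices.
my $tagger = Lingua::EN::Tagger->new(
    longest_noun_phrase => 3,
    weight_noun_phrases => 1,
);

# Tag once, then reuse the tagged string for each extraction method.
my $tagged = $tagger->add_tags($text);

my %proper  = $tagger->get_proper_nouns($tagged);
my %nouns   = $tagger->get_nouns($tagged);
my %max_nps = $tagger->get_max_noun_phrases($tagged);
my %all_nps = $tagger->get_noun_phrases($tagged);

# Print noun phrases, most frequent first (ties broken alphabetically).
for my $phrase ( sort { $all_nps{$b} <=> $all_nps{$a} || $a cmp $b } keys %all_nps ) {
    print "$phrase\t$all_nps{$phrase}\n";
}
```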

       install
	   Reads some included corpus data and saves it in a stored hash on
	   the local file system. This is called automatically if the tagger
	   can't find the stored lexicon.

AUTHORS
	   Aaron Coburn <aaron@coburncuadrado.com>

CONTRIBUTORS
	   Maciej Ceglowski <developer@ceglowski.com>
	   Eric Nichols, Nara Institute of Science and Technology

COPYRIGHT AND LICENSE
	   Copyright 2003-2010 Aaron Coburn <aaron@coburncuadrado.com>

	   This program is free software; you can redistribute it and/or modify
	   it under the terms of version 3 of the GNU General Public License as
	   published by the Free Software Foundation.

perl v5.14.2			  2010-05-11			     Tagger(3)