Parse::Lex man page on Fedora

Parse::Lex(3)	      User Contributed Perl Documentation	 Parse::Lex(3)

NAME
       "Parse::Lex" - Generator of lexical analyzers - moving pointer inside
       text

SYNOPSIS
	       require 5.005;

	       use Parse::Lex;
	       @token = (
		 qw(
		    ADDOP    [-+]
		    LEFTP    [\(]
		    RIGHTP   [\)]
		    INTEGER  [1-9][0-9]*
		    NEWLINE  \n

		   ),
		 qw(STRING),   [qw(" (?:[^"]+|"")* ")],
		 qw(ERROR  .*), sub {
		   die qq!can\'t analyze: "$_[1]"!;
		 }
		);

	       Parse::Lex->trace;  # Class method
	       $lexer = Parse::Lex->new(@token);
	       $lexer->from(\*DATA);
	       print "Tokenization of DATA:\n";

	       TOKEN:while (1) {
		 $token = $lexer->next;
		 if (not $lexer->eoi) {
		   print "Line $.\t";
		   print "Type: ", $token->name, "\t";
		   print "Content:->", $token->text, "<-\n";
		 } else {
		   last TOKEN;
		 }
	       }

	       __END__
	       1+2-5
	       "a multiline
	       string with an embedded "" in it"
	       an invalid string with a "" in it"

DESCRIPTION
       The classes "Parse::Lex" and "Parse::CLex" create lexical analyzers.
       They use different analysis techniques:

       1.  "Parse::Lex" steps through the analysis by moving a pointer within
       the character strings to be analyzed (use of "pos()" together with
       "\G"),

       2.  "Parse::CLex" steps through the analysis by consuming the data
       recognized (use of "s///").
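
       The two techniques can be contrasted with a small standalone sketch.
       It does not use "Parse::Lex" or "Parse::CLex" themselves; the rule
       table and the helper names "scan_with_pointer" and "scan_by_consuming"
       are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Illustrative rule table: rules are tried in order, each is [NAME, regex].
my @rules = (
    [ ADDOP   => qr/[-+]/ ],
    [ INTEGER => qr/[1-9][0-9]*/ ],
);

# Technique 1: leave the string intact and advance pos() using \G and /gc.
sub scan_with_pointer {
    my ($text) = @_;
    my @found;
    pos($text) = 0;
    SCAN: while (pos($text) < length($text)) {
        next SCAN if $text =~ /\G[ \t]+/gc;        # skip separators
        for my $rule (@rules) {
            my ($name, $re) = @$rule;
            if ($text =~ /\G($re)/gc) {            # /c keeps pos() on failure
                push @found, "$name=$1";
                next SCAN;
            }
        }
        die "no rule matches at offset " . pos($text) . "\n";
    }
    return @found;
}

# Technique 2: consume the recognized data from the front with s///.
sub scan_by_consuming {
    my ($text) = @_;
    my @found;
    SCAN: while (length($text)) {
        next SCAN if $text =~ s/^[ \t]+//;         # skip separators
        for my $rule (@rules) {
            my ($name, $re) = @$rule;
            if ($text =~ s/^($re)//) {
                push @found, "$name=$1";
                next SCAN;
            }
        }
        die "no rule matches at: $text\n";
    }
    return @found;
}

my @by_pointer = scan_with_pointer("1+2-5");
print "@by_pointer\n";   # INTEGER=1 ADDOP=+ INTEGER=2 ADDOP=- INTEGER=5
```

       Both helpers yield the same tokens; they differ only in whether the
       input string is left intact (pointer) or shortened (consumption).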

       Analyzers of the "Parse::CLex" class do not allow the use of anchoring
       in regular expressions.	In addition, the subclasses of "Parse::Token"
       are not implemented for this type of analyzer.

       A lexical analyzer is specified by means of a list of tokens passed as
       arguments to the "new()" method. Tokens are instances of the
       "Parse::Token" class, which comes with "Parse::Lex". The definition of
       a token usually comprises two arguments: a symbolic name (like
       "INTEGER"), followed by a regular expression. If a sub ref (anonymous
       subroutine) is given as third argument, it is called when the token is
       recognized.  Its arguments are the "Parse::Token" instance and the
       string recognized by the regular expression.  The anonymous
       subroutine's return value is used as the new string contents of the
       "Parse::Token" instance.

       The order in which the lexical analyzer examines the regular
       expressions is determined by the order in which these expressions are
       passed as arguments to the "new()" method. The token returned by the
       lexical analyzer corresponds to the first regular expression which
       matches (this strategy is different from that used by Lex, which
       returns the longest match possible out of all that can be recognized).
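
       The difference between the two strategies can be seen in a short
       standalone sketch. The rule set ("KEYWORD", "IDENT") and the helper
       names are hypothetical and are not part of "Parse::Lex"; the
       longest-match helper is shown only for contrast.

```perl
use strict;
use warnings;

# Hypothetical rule set: KEYWORD is listed before IDENT.
my @rules = (
    [ KEYWORD => qr/if/ ],
    [ IDENT   => qr/[a-z]+/ ],
);

# First-match strategy: the order of the rules decides.
sub first_match {
    my ($text) = @_;
    for my $rule (@rules) {
        my ($name, $re) = @$rule;
        return ($name, $1) if $text =~ /^($re)/;
    }
    return;
}

# Longest-match strategy (Lex-style): every rule is tried and the longest
# match wins; on a tie the earlier rule is kept.
sub longest_match {
    my ($text) = @_;
    my @best;
    for my $rule (@rules) {
        my ($name, $re) = @$rule;
        next unless $text =~ /^($re)/;
        @best = ($name, $1) if !@best || length($1) > length($best[1]);
    }
    return @best;
}

my @first   = first_match('iffy');
my @longest = longest_match('iffy');
print "@first\n";     # KEYWORD if
print "@longest\n";   # IDENT iffy
```

       With a first-match analyzer, more specific rules therefore need to be
       listed before more general ones that could match the same text.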

       The lexical analyzer can recognize tokens which span multiple records.
       If the definition of the token comprises more than one regular
       expression (placed within a reference to an anonymous array), the
       analyzer reads as many records as required to recognize the token (see
       the documentation for the "Parse::Token" class).	 When the start
       pattern is found, the analyzer looks for the end, and if necessary,
       reads more records.  No backtracking is done in case of failure.
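
       The following standalone sketch illustrates the idea of reading
       further records until the end of a token is found. It is not the
       module's implementation: the helper "read_string_token" and the record
       source (a code reference) are assumptions, and the body pattern is a
       simplified variant of the "STRING" token from the SYNOPSIS.

```perl
use strict;
use warnings;

# Recognize a double-quoted string that may span several records.
# $next_record is a code ref returning the next record, or undef at the end.
sub read_string_token {
    my ($next_record) = @_;
    my $buffer = $next_record->();
    return unless defined $buffer && $buffer =~ /^"/;   # start pattern
    while (1) {
        # start + body (non-quotes or doubled quotes) + end
        return $1 if $buffer =~ /^("(?:[^"]|"")*")/;
        # End not found yet: read one more record and try again.
        my $more = $next_record->();
        die "unterminated string\n" unless defined $more;
        $buffer .= $more;
    }
}

my @records = (
    qq{"a multiline\n},
    qq{string with an embedded "" in it"\n},
);
my $string = read_string_token(sub { shift @records });
print "Content:->$string<-\n";
```

       Once the start pattern has matched, the sketch commits to the string
       token: if the input ends first it dies instead of trying other rules.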

       The analyzer can be used to analyze an isolated character string or a
       stream of data coming from a file handle. At the end of the input data
       the analyzer returns a "Parse::Token" instance named "EOI" (End Of
       Input).

   Start Conditions
       You can associate start conditions with the token-recognition rules
       that comprise your lexical analyzer (this is similar to what Flex
       provides).  When start conditions are used, the rule which succeeds is
       no longer necessarily the first rule that matches.

       A token symbol may be preceded by a start condition specifier for the
       associated recognition rule. For example:

	       qw(C1:TERMINAL_1  REGEXP), sub { }, # put the associated action in the sub
	       qw(TERMINAL_2  REGEXP), sub { },    # put the associated action in the sub

       Symbol "TERMINAL_1" will be recognized only if start condition "C1" is
       active.	Start conditions are activated/deactivated using the
       "start(CONDITION_NAME)" and "end(CONDITION_NAME)" methods.

       "start('INITIAL')" resets the analysis automaton.

       Start conditions can be combined using AND/OR operators as follows:

	       C1:SYMBOL      condition C1

	       C1:C2:SYMBOL   condition C1 AND condition C2

	       C1,C2:SYMBOL   condition C1 OR  condition C2

       There are two types of start conditions: inclusive and exclusive, which
       are declared by class methods "inclusive()" and "exclusive()"
       respectively.  With an inclusive start condition, all rules are active
       regardless of whether or not they are qualified with the start
       condition.  With an exclusive start condition, only the rules qualified
       with the start condition are active; all other rules are deactivated.
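
       The rules above can be modelled with a small standalone sketch. It is
       only one reading of the description, not code taken from "Parse::Lex":
       the helper "rule_is_active" is hypothetical, and it assumes that
       colon-separated groups are ANDed, that comma-separated names inside a
       group are ORed, and that "ALL" always counts as active.

```perl
use strict;
use warnings;

# $spec is a rule name as written, e.g. 'expect:FLOAT'.
# %$active holds the currently active conditions,
# %$exclusive the conditions declared exclusive.
sub rule_is_active {
    my ($spec, $active, $exclusive) = @_;
    my @groups = split /:/, $spec;
    pop @groups;                          # the last element is the token name
    if (!@groups) {
        # Unqualified rule: off while any exclusive condition is active.
        for my $cond (keys %$active) {
            return 0 if $active->{$cond} && $exclusive->{$cond};
        }
        return 1;
    }
    for my $group (@groups) {             # groups are ANDed
        my $ok = grep { $_ eq 'ALL' || $active->{$_} } split /,/, $group;
        return 0 unless $ok;              # names within a group are ORed
    }
    return 1;
}

my %exclusive = (expect => 1);
for my $state ({ INITIAL => 1 }, { INITIAL => 1, expect => 1 }) {
    for my $rule ('INT', 'expect:FLOAT') {
        my $status = rule_is_active($rule, $state, \%exclusive) ? 'active' : 'inactive';
        print join(',', sort keys %$state), " / $rule: $status\n";
    }
}
```

       With "expect" declared exclusive, the unqualified "INT" rule is
       switched off as soon as "expect" becomes active; had "expect" been
       inclusive, "INT" would have stayed active alongside "expect:FLOAT".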

       Example (borrowed from the documentation of Flex):

	use Parse::Lex;
	@token = (
		 'EXPECT', 'expect-floats', sub {
		   $lexer->start('expect');
		   $_[1]
		 },
		 'expect:FLOAT', '\d+\.\d+', sub {
		   print "found a float: $_[1]\n";
		   $_[1]
		 },
		 'expect:NEWLINE', '\n', sub {
		   $lexer->end('expect') ;
		   $_[1]
		 },
		 'NEWLINE2', '\n',
		 'INT', '\d+', sub {
		   print "found an integer: $_[1] \n";
		   $_[1]
		 },
		 'DOT', '\.', sub {
		   print "found a dot\n";
		   $_[1]
		 },
		);

	Parse::Lex->exclusive('expect');
	$lexer = Parse::Lex->new(@token);

       The special start condition "ALL" is always satisfied (it is treated
       as active at all times).

   Methods
       analyze EXPR
	   Analyzes "EXPR" and returns a list of pairs consisting of a token
	   name followed by recognized text. "EXPR" can be a character string
	   or a reference to a filehandle.

	   Examples:

	    @tokens = Parse::Lex->new(qw(PLUS [+] NUMBER \d+))->analyze("3+3+3");
	    @tokens = Parse::Lex->new(qw(PLUS [+] NUMBER \d+))->analyze(\*STREAM);
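
	   Because the result is a flat list of name/text pairs, it can be
	   walked two elements at a time. The sketch below uses a hand-written
	   list as a stand-in for the return value of "analyze" (including a
	   final "EOI" pair with empty text, which is an assumption), and the
	   helper "sum_numbers" is hypothetical.

```perl
use strict;
use warnings;

# Hand-written stand-in for the list returned by analyze("3+3+3").
my @tokens = (
    NUMBER => '3', PLUS => '+',
    NUMBER => '3', PLUS => '+',
    NUMBER => '3', EOI  => '',
);

# Walk the flat list as (name, text) pairs and add up the NUMBER tokens.
sub sum_numbers {
    my @pairs = @_;
    my $sum = 0;
    while (my ($name, $text) = splice(@pairs, 0, 2)) {
        $sum += $text if $name eq 'NUMBER';
    }
    return $sum;
}

my $total = sum_numbers(@tokens);
print "$total\n";   # 9
```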

       buffer EXPR
       buffer
	   Returns the contents of the internal buffer of the lexical
	   analyzer.  With an expression as argument, places the result of the
	   expression in the buffer.

	   It is not advisable to change the contents of the buffer directly
	   without also updating the position of the analysis pointer
	   ("pos()") and the recorded length of the buffer ("length()").

       configure(HASH)
	   Instance method that configures a lexical analyzer.  It accepts a
	   list of the following attribute/value pairs:

	   From => EXPR
		     This attribute plays the same role as the "from(EXPR)"
		     method.  "EXPR" can be a filehandle or a character
		     string.

	   Tokens => ARRAY_REF
		     "ARRAY_REF" must contain the list of attribute values
		     specifying the tokens to be recognized (see the
		     documentation for "Parse::Token").

	   Skip => REGEX
		     This attribute plays the same role as the "skip(REGEX)"
		     method. "REGEX" describes the patterns to skip over
		     during the analysis.

		     end EXPR
			 Deactivates condition "EXPR".

		     eoi Returns TRUE when there is no more data to analyze.

		     every SUB
			 Avoids having to write a reading loop in order to
			 analyze a stream of data. "SUB" is an anonymous
			 subroutine executed after the recognition of each
			 token. For example, to lex the string "1+2" you can
			 write:

				 use Parse::Lex;

				 $lexer = Parse::Lex->new(
				   qw(
				      ADDOP [-+]
				      INTEGER \d+
				     ));

				 $lexer->from("1+2");
				 $lexer->every (sub {
				   print $_[0]->name, "\t";
				   print $_[0]->text, "\n";
				 });

			 The first argument of the anonymous subroutine is the
			 "Parse::Token" instance recognized.

		     exclusive LIST
			 Class method declaring the conditions present in LIST
			 to be exclusive.

		     flush
			 If saving of the consumed strings is activated,
			 "flush()" returns and clears the buffer containing
			 the character strings recognized up to now.  This is
			 only useful if "hold()" has been called to activate
			 saving of consumed strings.

		     from EXPR
		     from
			 "from(EXPR)" allows specifying the source of the data
			 to be analyzed. The argument of this method can be a
			 string (or list of strings), or a reference to a
			 filehandle.  If no argument is given, "from()"
			 returns the filehandle if defined, or "undef" if
			 input is a string.  When an argument "EXPR" is used,
			 the return value is the calling lexer object itself.

			 By default it is assumed that data are read from
			 "STDIN".

			 Examples:

				 use IO::File;
				 $handle = IO::File->new("filename", "r")
				   or die "can't open filename: $!";
				 $lexer->from($handle);

				 $lexer->from(\*DATA);
				 $lexer->from('the data to be analyzed');

		     getSub
			 "getSub" returns the anonymous subroutine that
			 performs the lexical analysis.

			 Example:

				 my $token = '';
				 my $sub = $lexer->getSub;
				 while (($token = &$sub()) ne $Token::EOI) {
				   print $token->name, "\t";
				   print $token->text, "\n";
				 }

			    # or

				 my $token = '';
				 local *tokenizer = $lexer->getSub;
				 while (($token = tokenizer()) ne $Token::EOI) {
				   print $token->name, "\t";
				   print $token->text, "\n";
				 }

		     getToken
			 Same as "token()" method.

		     hold EXPR
		     hold
			 Activates/deactivates saving of the consumed strings.
			 The return value is the current setting (TRUE or
			 FALSE).  Can be used as a class method.

			 You can obtain the contents of the buffer using the
			 "flush" method, which also empties the buffer.

		     inclusive LIST
			 Class method declaring the conditions present in LIST
			 to be inclusive.

		     length EXPR
		     length
			 Returns the length of the current record.  "length
			 EXPR" sets the length of the current record.

		     line EXPR
		     line
			 Returns the line number of the current record.	 "line
			 EXPR" sets the value of the line number.  Always
			 returns 1 if a character string is being analyzed.
			 The "readline()" method increments the line number.

		     name EXPR
		     name
			 "name EXPR" lets you give a name to the lexical
			 analyzer.  "name()" returns this name.

		     next
			 Searches for the next token and returns the
			 recognized "Parse::Token" instance. Returns the
			 "Token::EOI" instance at the end of the data.

			 Examples:

				 $lexer = Parse::Lex->new(@token);
				 print $lexer->next->name;   # print the token type
				 print $lexer->next->text;   # print the token content

		     nextis SCALAR_REF
			 Variant of the "next()" method. Tokens are placed in
			 "SCALAR_REF". The method returns 1 as long as the
			 token is not "EOI".

			 Example:

				 while($lexer->nextis(\$token)) {
				    print $token->text();
				 }

		     new LIST
			 Creates and returns a new lexical analyzer. The
			 argument of the method is a list of "Parse::Token"
			 instances, or a list of triplets permitting their
			 creation.  The triplets consist of: the symbolic name
			 of the token, the regular expression necessary for
			 its recognition, and possibly an anonymous subroutine
			 that is called when the token is recognized. For each
			 triplet, an instance of type "Parse::Token" is
			 created in the calling package.

		     offset
			 Returns the number of characters already consumed
			 since the beginning of the analyzed data stream.

		     pos EXPR
		     pos "pos EXPR" sets the position of the beginning of the
			 next token to be recognized in the current line (this
			 doesn't work with analyzers of the "Parse::CLex"
			 class).  "pos()" returns the number of characters
			 already consumed in the current line.

		     readline
			 Reads data from the input specified by the "from()"
			 method. Returns the result of the reading.

			 Example:

				 use Parse::Lex;

				 $lexer = Parse::Lex->new();
				 while (not $lexer->eoi) {
				   print $lexer->readline() # read and print one line
				 }

		     reset
			 Clears the internal buffer of the lexical analyzer
			 and erases all tokens already recognized.

		     restart
			 Reinitializes the analysis automaton. The only active
			 condition becomes the condition "INITIAL".

		     setToken TOKEN
			 Sets the token to "TOKEN". Useful to requalify a
			 token inside the anonymous subroutine associated with
			 this token.

		     skip EXPR
		     skip
			 "EXPR" is a regular expression defining the token
			 separator pattern (by default "[ \t]+"). "skip('')"
			 sets this to no pattern.  With no argument, "skip()"
			 returns the value of the pattern.  "skip()" can be
			 used as a class method.

			 Changing the skip pattern causes recompilation of the
			 lexical analyzer.

			 Example:

			   Parse::Lex->skip('\s*#(?s:.*)|\s+');
			   @tokens = Parse::Lex->new('INTEGER' => '\d+')->analyze(\*DATA);
			   print "@tokens\n"; # print INTEGER 1 INTEGER 2 INTEGER 3 INTEGER 4 EOI
			   __END__
			   1 # first string to skip
			   2
			   3# second string to skip
			   4

		     start EXPR
			 Activates condition EXPR.

		     state EXPR
			 Returns the state of the condition represented by
			 EXPR.

		     token
			 Returns the instance corresponding to the last
			 recognized token. If no token has been recognized,
			 returns the special token named "DEFAULT".

		     tokenClass EXPR
		     tokenClass
			 Sets the class of the tokens created from the list
			 passed as argument to the "new()" method.  If no
			 argument is given, returns the name of the class.  By
			 default the class is "Parse::Token".

		     trace OUTPUT
		     trace
			 Class method which activates trace mode. The
			 activation of trace mode must take place before the
			 creation of the lexical analyzer. The mode can then
			 be deactivated by another call of this method.

			 "OUTPUT" can be a file name or a reference to a
			 filehandle where the trace will be redirected.

ERROR HANDLING
       To handle input that no token recognizes, you can define a catch-all
       token at the end of the list of tokens that make up your lexical
       analyzer.  If this token matches, its anonymous subroutine can act as
       an error handler:

	    qw(ERROR  (?s:.*)), sub {
	      print STDERR "ERROR: buffer content->", $_[0]->lexer->buffer, "<-\n";
	      die qq!can\'t analyze: "$_[1]"!;
	    }

EXAMPLES
       ctokenizer.pl - Scan a stream of data using the "Parse::CLex" class.

       tokenizer.pl - Scan a stream of data using the "Parse::Lex" class.

       every.pl - Use of the "every" method.

       sexp.pl - Interpreter for prefix arithmetic expressions.

       sexpcond.pl - Interpreter for prefix arithmetic expressions, using
       conditions.

BUGS
       Analyzers of the "Parse::CLex" class do not allow the use of regular
       expressions with anchoring.

SEE ALSO
       "Parse::Token", "Parse::LexEvent", "Parse::YYLex".

AUTHOR
       Philippe Verdret. Documentation translated to English by Vladimir
       Alexiev and Ocrat.

ACKNOWLEDGMENTS
       Version 2.0 owes much to suggestions made by Vladimir Alexiev.  Ocrat
       has significantly contributed to improving this documentation.  Thanks
       also to the numerous people who have sent me bug reports and
       occasionally fixes.

REFERENCES
       Friedl, J.E.F. Mastering Regular Expressions. O'Reilly & Associates
       1996.

       Mason, T. & Brown, D. Lex & Yacc. O'Reilly & Associates, Inc. 1990.

       FLEX - A Scanner generator (available at ftp://ftp.ee.lbl.gov/ and
       elsewhere)

COPYRIGHT
       Copyright (c) 1995-1999 Philippe Verdret. All rights reserved.  This
       module is free software; you can redistribute it and/or modify it under
       the same terms as Perl itself.

perl v5.14.0			  2010-03-26			 Parse::Lex(3)