Fink::Text::DelimMatch(3) Fink documentation Fink::Text::DelimMatch(3)
NAME
Fink::Text::DelimMatch - Perl extension to find regexp-delimited
strings with proper nesting
SYNOPSIS
use Fink::Text::DelimMatch;
$mc = new Fink::Text::DelimMatch $startdelim, $enddelim;
$mc->quote('"');
$mc->escape("\\");
$mc->double_escape('"');
$mc->case_sensitive(1);
($prefix, $match, $remainder) = $mc->match($string);
($prefix, $nextmatch, $remainder) = $mc->match();
$middle = $mc->strip_delim($match); # returns $match w/o start and end delim
DESCRIPTION
These routines allow you to match delimited substrings in a buffer.
The delimiters can be specified with any regular expression and the
start and end delimiters need not be the same. If the delimited text
is properly nested, entire nested groups are returned.
In addition, you may specify quoting and escaping characters that
contribute to the recognition of start and end delimiters.
For example, if you specify the start and end delimiters as '\(' and
'\)', respectively, and the double quote character as a quoting
character, and the backslash as an escaping character, then the
delimited substring in this buffer is "(ma(t)c\)h)":
'prefix text "(quoted text)" \(escaped \" text) (ma(t)c\)h) postfix text'
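The way quoting and escaping interact with nesting in the buffer above can be sketched in core Perl. This is only an illustration under simplifying assumptions (fixed single-character delimiters, one quote character, backslash as the only escape); first_group is a hypothetical helper, not part of Fink::Text::DelimMatch and not the algorithm the module uses.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Fink::Text::DelimMatch: finds the first
# balanced "(...)" group, ignoring double-quoted text and backslash escapes.
sub first_group {
    my ($buf) = @_;
    my ($depth, $start, $in_quote) = (0, undef, 0);
    for (my $i = 0; $i < length $buf; $i++) {
        my $c = substr($buf, $i, 1);
        if ($c eq '\\') { $i++; next; }                   # skip the escaped character
        if ($c eq '"')  { $in_quote = !$in_quote; next; } # toggle quoting
        next if $in_quote;                                # delimiters in quotes are inert
        if ($c eq '(') {
            $start = $i if $depth == 0;                   # remember the outermost opener
            $depth++;
        }
        elsif ($c eq ')' && $depth > 0) {
            $depth--;
            return substr($buf, $start, $i - $start + 1) if $depth == 0;
        }
    }
    return;    # no balanced group found
}

my $buf = 'prefix text "(quoted text)" \(escaped \" text) (ma(t)c\)h) postfix text';
print first_group($buf), "\n";    # prints (ma(t)c\)h)
```

The quoted group and the escaped parenthesis are both passed over, so the first group that counts is the nested one near the end of the buffer.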
In order to support this rather complex interface, the matching context
is encapsulated in an object. The object, Fink::Text::DelimMatch, has
the following public methods:
new $start, $end, $escape, $dblesc, $qs1, $qe1, ... $qsn, $qen
Creates a new object. All of the arguments are optional, and can
be set with other methods, but they must be passed in the specified
order: start delimiter, end delimiter, escape characters, double
escape characters, and a set of quote characters.
match $string
In an array context, returns ($pre, $match, $post) where $pre is
the text preceding the first match, $match is the matched text
(including the delimiters), and $post is the rest of the text in
the buffer. In a scalar context, returns $match.
If $string is not provided on subsequent calls, the $post from the
previous match is used, unless keep is false. If keep is false,
the match always fails.
strip_delim $string
Returns $string with the start and end delimiters removed.
delim $start, $end
Set the start and end delimiters. Only one set of delimiters can
be in use at any one time.
Returns the delimiters in use before this call.
quote $startq, $endq
Specifies the start and end quote characters. Multiple quote
character pairs are supported, so this function is additive. To
clear the current settings, pass no arguments, e.g., $mc->quote().
If only $startq is passed, $endq is assumed to be the same.
In matching, quotes occur in pairs. In other words, if (",") and
(',') are both specified as quote pairs and a string beginning with
" is found, it is ended only by another ", not by '.
Returns the quote hash in use before this call.
escape $esc
Specifies a set of escaping characters. This can only be a string
of characters. $esc can be a regexp set or a simple string. If it
is a simple string, it will be translated into the regexp set "[
quotemeta($esc) ]".
Returns the escape characters in use before this call.
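The simple-string translation described above can be reproduced with the core quotemeta function. This is a sketch under the assumption that the resulting set contains no literal spaces; to_regexp_set is a hypothetical name used for illustration, not a method of this module.

```perl
use strict;
use warnings;

# Hypothetical helper: build a regexp set from a plain string of characters,
# mirroring the "[ quotemeta($esc) ]" translation described above.
sub to_regexp_set {
    my ($chars) = @_;
    return '[' . quotemeta($chars) . ']';
}

my $set = to_regexp_set('\\');    # the argument is a single backslash
print "$set\n";                   # prints [\\]
print "escape found\n" if 'a\\b' =~ /$set/;
```

Because quotemeta backslash-escapes every non-word character, a string such as "\\" or "^" is safe to place inside the brackets.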
double_escape $esc
Specifies a set of double-escaping characters, i.e., characters
that are considered escaped if they occur in pairs. For example,
in some languages,
'Don''t you see?'
defines a string containing a single apostrophe.
$esc can only be a string of characters. $esc can be a regexp set
or a simple string. If it is a simple string, it will be
translated into the regexp set "[ quotemeta($esc) ]".
Returns the double-escaping characters in use before this call.
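To make the doubled-character convention concrete, here is a core-Perl sketch that decodes such a literal. It assumes the apostrophe is both the quote and the double-escape character; unquote_doubled is a hypothetical helper and not part of this module.

```perl
use strict;
use warnings;

# Hypothetical helper: strip the surrounding apostrophes and collapse each
# doubled apostrophe into one, as a language with '' escaping would.
sub unquote_doubled {
    my ($literal) = @_;
    my ($body) = $literal =~ /\A'((?:[^']|'')*)'\z/
        or return;                # not a well-formed literal
    $body =~ s/''/'/g;
    return $body;
}

print unquote_doubled(q{'Don''t you see?'}), "\n";    # prints Don't you see?
```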
case_sensitive $bool
Sets case sensitivity to $bool or true if $bool is not specified.
Returns the case sensitivity in use before this call.
keep $bool
Sets keep to $bool or true if $bool is not specified.
Keep, which is true by default, specifies whether or not the
matching context object keeps a local copy of the buffer used in
matching. Keeping a local copy allows repeated matching on the
same buffer, but might be a bad idea if the buffer is a terabyte
long. ;-)
Returns the keep setting in use before this call.
returndelim $bool
Sets returndelim to $bool or true if $bool is not specified.
Returndelim, which is true by default, specifies whether or not the
start and end delimiters are returned with the matching string.
Returns the returndelim setting in use before this call.
error $seterr
Returns the last error that occurred. If $seterr is passed, the
error is set to that value. Some common kinds of bad input are
detected and an error condition is raised. If an error condition
is raised, all matching fails until the error is cleared.
The most common error is a bad regular expression, for example
specifying the start delimiter as "(" instead of "\\(". Remember,
these are regexps!
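A delimiter can be checked before it is handed to the object by compiling it inside eval. The sketch below uses only core Perl; compiles is a hypothetical helper, and the check says nothing about how this module itself detects the error.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $pattern compiles as a Perl regexp.
sub compiles {
    my ($pattern) = @_;
    return defined eval { qr/$pattern/ };
}

# "(" is an unmatched group opener; "\\(" is a literal parenthesis.
for my $delim ('(', '\\(') {
    printf "%-3s %s\n", $delim, compiles($delim) ? 'ok' : 'bad regexp';
}
```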
pre_matched
Returns the prefix text from the last match if keep is true. Sets
an error and returns an empty string if keep is false.
matched
Returns the matched text from the last match if keep is true. Sets
an error and returns an empty string if keep is false.
post_matched
Returns the postfix text from the last match if keep is true. Sets
an error and returns an empty string if keep is false.
debug $bool
Sets debug to $bool or true if $bool is not specified.
If debug is true, informative and progress messages are printed to
STDOUT by some methods.
Returns the debugging setting in use before this call.
dump
For debugging, prints all of the instance variables for a
particular object.
slow $bool
For debugging. Some classes of delimited strings can be located
with much faster algorithms than can be used in the most general
case. If slow is true, the slower, general algorithm is always
used.
For simplicity, and backward compatibility with the previous (limited
release) incarnation of this module, the following functions are also
available directly:
nested_match ($string, $start, $end, $three)
If $three is true, returns ($pre, $match, $post) in an array
context; otherwise, returns ("$pre$match", $post). In a scalar
context, returns "$pre$match".
skip_nested_match ($string, $start, $end, $three)
If $three is true, returns ($pre, $match, $post) in an array
context; otherwise, returns ("$pre$match", $post). In a scalar
context, returns $post.
EXAMPLES
$mc = new Fink::Text::DelimMatch '"';
$match = $mc->match('pre "match" post');        # '"match"'
$mc->delim("\\(", "\\)");
@parts = $mc->match('pre (match) post');        # ('pre ', '(match)', ' post')
@parts = $mc->match('pre (ma(t)ch) post');      # ('pre ', '(ma(t)ch)', ' post')
$mc->quote('"');
$mc->escape("\\");
@parts = $mc->match('pre (ma")"tch) post');     # ('pre ', '(ma")"tch)', ' post')
@parts = $mc->match('pre (ma(t)c\)h\") post');  # ('pre ', '(ma(t)c\)h\")', ' post')
See also test.pl in the distribution.
AUTHOR
Norman Walsh, ndw@nwalsh.com
COPYRIGHT
From the original file: Copyright (C) 1997-2002 Norman Walsh. All
rights reserved. This program is free software; you can redistribute
it and/or modify it under the same terms as Perl itself.
For the changes by Fink: Copyright (C) 2004-2013 The Fink Package
Manager Team. This program is free software; you can redistribute it
and/or modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of the
License, or (at your option) any later version.
WARRANTY
THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
SEE ALSO
perl(1).
Fink 0.36.3.1 2013-12-30 Fink::Text::DelimMatch(3)