Perl6::Bible::S05 man page on Fedora

Perl6::Bible::S05(3)  User Contributed Perl Documentation Perl6::Bible::S05(3)

NAME
       Synopsis_05 - Rules

AUTHORS
       Damian Conway <damian@conway.org> and Allison Randal <al@shadowed.net>

VERSION
	  Maintainer: Patrick Michaud <pmichaud@pobox.com>
	  Date: 24 Jun 2002
	  Last Modified: 25 Feb 2006
	  Number: 5
	  Version: 12

       This document summarizes Apocalypse 5, which is about the new regex
       syntax.  We now try to call them "rules" because they haven't been
       regular expressions for a long time.  (The term "regex" is still
       acceptable.)

New match state and capture variables
       The underlying match state object is now available as the $/ variable,
       which is implicitly lexically scoped.  All access to the current (or
       most recent) match is through this variable, even when it doesn't look
       like it.  The individual capture variables (such as $0, $1, etc.) are
       just elements of $/.

       By the way, the numbered capture variables now start at $0, $1, $2,
       etc. See below.

Unchanged syntactic features
       The following regex features use the same syntax as in Perl 5:

       ·   Capturing: (...)

       ·   Repetition quantifiers: *, +, and ?

       ·   Alternatives:  |

       ·   Backslash escape:  \

       ·   Minimal matching suffix:   ??,  *?,  +?

Modifiers
       ·   The extended syntax ("/x") is no longer required...it's the
	   default.

       ·   There are no "/s" or "/m" modifiers (changes to the meta-characters
	   replace them - see below).

       ·   There is no "/e" evaluation modifier on substitutions; instead use:

		s/pattern/{ code() }/

       ·   Modifiers are now placed as adverbs at the start of a
	   match/substitution:

		m:g:i/\s* (\w*) \s* ,?/;

	   Every modifier must start with its own colon.  The delimiter must
	   be separated from the final modifier by a colon or whitespace if it
	   would be taken as an argument to the preceding modifier.

       ·   The single-character modifiers also have longer versions:

		    :i	      :ignorecase
		    :g	      :global

       ·   The ":c" (or ":continue") modifier causes the pattern to continue
	   scanning from the string's current ".pos":

		m:c/ pattern /	      # start at end of
				      # previous match on $_

	   Note that this does not automatically anchor the pattern to the
	   starting location.  (Use ":p" for that.)  The pattern you supply to
	   "split" has an implicit ":c" modifier.

       ·   The ":p" (or ":pos") modifier causes the pattern to try to match
	   only at the string's current ".pos":

		m:p/ pattern /	      # match at end of
				      # previous match on $_

	   Since this is implicitly anchored to the position, it's suitable
	   for building parsers and lexers.  The pattern you supply to a Perl
	   macro's "is parsed" trait has an implicit ":p" modifier.

	   Note that

		m:c/pattern/

	   is roughly equivalent to

		m:p/.*? pattern/

       ·   The new ":once" modifier replaces the Perl 5 "?...?" syntax:

		m:once/ pattern /    # only matches first time

       ·   [Note: We're still not sure if :w is ultimately going to work
	   exactly as described below.  But this is how it works for now.]

	   The new ":w" (":words") modifier causes whitespace sequences to be
	   replaced by "\s*" or "\s+" subpatterns as defined by the "<?ws>"
	   rule.

		m:w/ next cmd =	  <condition>/

	   Same as:

		m/ <?ws> next <?ws> cmd <?ws> = <?ws> <condition>/

	   which is effectively the same as:

		m/ \s* next \s+ cmd \s* = \s* <condition>/

	   But in the case of

		m:w { (a|\*) (b|\+) }

	   or equivalently,

		m { (a|\*) <?ws> (b|\+) }

	   "<?ws>" can't decide what to do until it sees the data.  It still
	   does the right thing.  If not, define your own "<?ws>" and ":w"
	   will use that.

       ·   New modifiers specify Unicode level:

		m:bytes / .**{2} /	 # match two bytes
		m:codes / .**{2} /	 # match two codepoints
		m:graphs/ .**{2} /	 # match two graphemes
		m:langs / .**{2} /	 # match two language dependent chars

	   There are corresponding pragmas to default to these levels.

       ·   The new ":perl5" modifier allows Perl 5 regex syntax to be used
	   instead:

		m:perl5/(?mi)^[a-z]{1,2}(?=\s)/

	   (It does not go so far as to allow you to put your modifiers at the
	   end.)

       ·   Any integer modifier specifies a count. What kind of count is
	   determined by the character that follows.

       ·   If followed by an "x", it means repetition.  Use :x(4) for the
	   general form.  So

		s:4x { (<?ident>) = (\N+) $$}{$0 => $1};

	   is the same as:

		s:x(4) { (<?ident>) = (\N+) $$}{$0 => $1};

	   which is almost the same as:

		$_.pos = 0;
		s:c{ (<?ident>) = (\N+) $$}{$0 => $1} for 1..4;

	   except that the string is unchanged unless all four matches are
	   found.  However, ranges are allowed, so you can say ":x(1..4)" to
	   change anywhere from one to four matches.

       ·   If the number is followed by an "st", "nd", "rd", or "th", it means
	   find the Nth occurrence.  Use :nth(3) for the general form.  So

		s:3rd/(\d+)/@data[$0]/;

	   is the same as

		s:nth(3)/(\d+)/@data[$0]/;

	   which is the same as:

		m/(\d+)/ && m:c/(\d+)/ && s:c/(\d+)/@data[$0]/;

	   Lists and junctions are allowed: ":nth(1|2|3|5|8|13|21|34|55|89)".

	   So are closures: ":nth{.is_fibonacci}"

       ·   With the new ":ov" (":overlap") modifier, the current rule will
	   match at all possible character positions (including overlapping)
	   and return all matches in a list context, or a disjunction of
	   matches in a scalar context.  The first match at any position is
	   returned.

		$str = "abracadabra";

		if $str ~~ m:overlap/ a (.*) a / {
		    @substrings = $/.matches();	   # bracadabr cadabr dabr br
		}

       ·   With the new ":ex" (":exhaustive") modifier, the current rule will
	   match every possible way (including overlapping) and return all
	   matches in a list context, or a disjunction of matches in a scalar
	   context.

		$str = "abracadabra";

		if $str ~~ m:exhaustive/ a (.*) a / {
		    @substrings = $/.matches();	   # br brac bracad bracadabr
						   # c cad cadabr d dabr br
		}

       ·   The new ":rw" modifier causes this rule to "claim" the current
	   string for modification rather than assuming copy-on-write
	   semantics.  All the bindings in $/ become lvalues into the string,
	   such that if you modify, say, $1, the original string is modified
	   in that location, and the positions of all the other fields are
	   adjusted accordingly (whatever that means).  In the absence of this
	   modifier (especially if it isn't implemented yet, or is never
	   implemented), all pieces of $/ are considered copy-on-write, if not
	   read-only.

       ·   The new ":keepall" modifier causes this rule and all invoked
	   subrules to remember everything, even if the rules themselves don't
	   ask for their subrules to be remembered.  This is for forcing a
	   grammar that throws away whitespace and comments to keep them
	   instead.

       ·   The ":i", ":w", ":perl5", and Unicode-level modifiers can be placed
	   inside the rule (and are lexically scoped):

		m/:w alignment = [:i left|right|cent[er|re]] /

       ·   User-defined modifiers will be possible:

		    m:fuzzy/pattern/;

       ·   User-defined modifiers can also take arguments:

		    m:fuzzy('bare')/pattern/;

       ·   To use parens or brackets for your delimiters, you have to separate
	   them from the modifier:

		    m:fuzzy (pattern);
		    m:fuzzy:(pattern);

	   or you'll end up with:

		    m:fuzzy(fuzzyargs); pattern ;
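
       The ":overlap" and ":exhaustive" examples above can be cross-checked
       with a rough Python analogue.  This is only a sketch: Python's "re"
       module stands in for the Perl 6 engine, and the helper names
       "overlapping" and "exhaustive" are made up for this illustration.
       Overlap retries an anchored match at every start position.  The
       exhaustive version tests every start/end span, which coincides with
       "every possible way" for a simple pattern like this one but is not a
       general backtracking enumerator.

```python
import re

# Perl 6 "." also matches newline, hence re.DOTALL.
PATTERN = re.compile(r"a(.*)a", re.DOTALL)

def overlapping(pattern, text):
    """Like :overlap: the first (greedy) match anchored at each position."""
    results = []
    for start in range(len(text)):
        m = pattern.match(text, start)  # anchored at `start`
        if m:
            results.append(m.group(1))
    return results

def exhaustive(pattern, text):
    """Like :exhaustive for this pattern: every start/end span that matches."""
    results = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            m = pattern.fullmatch(text, start, end)
            if m:
                results.append(m.group(1))
    return results

print(overlapping(PATTERN, "abracadabra"))  # ['bracadabr', 'cadabr', 'dabr', 'br']
print(exhaustive(PATTERN, "abracadabra"))
```

       Both lists agree, in order, with the comments in the Perl 6 examples.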

Changed metacharacters
       ·   A dot "." now matches any character including newline. (The "/s"
	   modifier is gone.)

       ·   "^" and "$" now always match the start/end of a string, like the
	   old "\A" and "\z". (The "/m" modifier is gone.)

       ·   A "$" no longer matches an optional preceding "\n", so it's
	   necessary to say "\n?$" if that's what you mean.

       ·   "\n" now matches a logical (platform-independent) newline, not just
	   "\x0a".

       ·   The "\A", "\Z", and "\z" metacharacters are gone.
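
       For readers coming from Perl 5, the sketch below contrasts the old and
       new anchor and dot behaviour using Python's "re" module, whose "$" and
       "." still follow the Perl 5 defaults.  Python's "\Z" is used as a
       stand-in for the new Perl 6 "$"; that mapping is an analogy, not part
       of this specification.

```python
import re

s = "abc\n"
dollar_perl5 = re.search(r"abc$", s) is not None     # True: $ also matches before a final \n
dollar_perl6 = re.search(r"abc\Z", s) is not None    # False: \Z is the very end only
explicit_nl = re.search(r"abc\n?\Z", s) is not None  # True: the "\n?$" idiom

dot_plain = re.fullmatch(r"a.b", "a\nb") is not None             # False
dot_all = re.fullmatch(r"a.b", "a\nb", re.DOTALL) is not None    # True
```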

New metacharacters
       ·   Because "/x" is default:

	   ·   An unescaped "#" now always introduces a comment.

	   ·   Whitespace is now always metasyntactic, i.e. used only for
	       layout and not matched literally (but see the ":w" modifier
	       described above).

       ·   "^^" and "$$" match line beginnings and endings. (The "/m" modifier
	   is gone.)  They are both zero-width assertions.  "$$" matches before
	   any "\n" (logical newline), and also at the end of the string if
	   the final character was not a "\n".  "^^" always matches the
	   beginning of the string and after any "\n" that is not the final
	   character in the string.

       ·   "." matches an "anything", while "\N" matches an "anything except
	   newline". (The "/s" modifier is gone.)  In particular, "\N" matches
	   neither carriage return nor line feed.

       ·   The new "&" metacharacter separates conjunctive terms.  The
	   patterns on either side must match with the same beginning and end
	   point.  The operator is list associative like "|", has higher
	   precedence than "|", and backtracking makes the right argument vary
	   faster than the left.
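
       The "^^" and "$$" rules above can be restated as sets of positions.
       The sketch below is a minimal Python rendering that assumes a logical
       newline is just "\n" (the real rule is platform independent); the
       function names are hypothetical.  Python's MULTILINE "^" and "$" are
       shown for contrast: unlike "^^" and "$$" as described here, both also
       match at the very end of a string that ends in "\n".

```python
import re

def line_starts(text):
    r"""Positions where ^^ matches: the start of the string, and after
    any "\n" that is not the final character."""
    return [0] + [i + 1 for i, ch in enumerate(text)
                  if ch == "\n" and i + 1 < len(text)]

def line_ends(text):
    r"""Positions where $$ matches: before any "\n", and at the end of the
    string if its final character is not "\n"."""
    ends = [i for i, ch in enumerate(text) if ch == "\n"]
    if text and not text.endswith("\n"):
        ends.append(len(text))
    return ends

sample = "ab\ncd\n"
print(line_starts(sample))  # [0, 3]
print(line_ends(sample))    # [2, 5]

# Python's MULTILINE anchors, for contrast: both also match at offset 6.
print([m.start() for m in re.finditer(r"^", sample, re.M)])  # [0, 3, 6]
print([m.start() for m in re.finditer(r"$", sample, re.M)])  # [2, 5, 6]
```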

Bracket rationalization
       ·   "(...)" still delimits a capturing group. However the ordering of
	   these groups is hierarchical, rather than linear. See "Nested
	   subpattern captures".

       ·   "[...]" is no longer a character class.  It now delimits a non-
	   capturing group.

       ·   "{...}" is no longer a repetition quantifier.  It now delimits an
	   embedded closure.

       ·   You can call Perl code as part of a rule match by using a closure.
	   Embedded code does not usually affect the match--it is only used
	   for side-effects:

		/ (\S+) { print "string not blank\n"; $text = $0; }
		   \s+	{ print "but does contain whitespace\n" }
		/

       ·   It can affect the match if it calls "fail":

		/ (\d+) { $0 < 256 or fail } /

	   Closures are guaranteed to be called at the canonical time even if
	   the optimizer could prove that something after them can't match.
	   (Anything before is fair game, however.)

       ·   The repetition specifier is now "**{...}" for maximal matching,
	   with a corresponding "**{...}?" for minimal matching.  Space is
	   allowed on either side of the asterisks.  The curlies are taken to
	   be a closure returning a number or a range.

		/ value was (\d ** {1..6}?) with ([\w]**{$m..$n}) /

	   It is illegal to return a list, so this easy mistake fails:

		/ [foo]**{1,3} /

	   (At least, it fails in the absence of "use rx :listquantifier",
	   which is likely to be unimplemented in Perl 6.0.0 anyway.)

	   The optimizer will likely optimize away things like "**{1...}" so
	   that the closure is never actually run in that case.  But it's a
	   closure that must be run in the general case, so you can use it to
	   generate a range on the fly based on the earlier matching.  (Of
	   course, bear in mind the closure is run before attempting to match
	   whatever it quantifies.)

       ·   "<...>" are now extensible metasyntax delimiters or "assertions"
	   (i.e. they replace Perl 5's crufty "(?...)" syntax).
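
       A closure that calls "fail" is an ordinary failure point, so the
       engine may backtrack into the quantifier before it rather than give
       up at that position.  The Python sketch below models
       "/ (\d+) { $0 < 256 or fail } /" by hand under that assumption
       (Python's "re" cannot call back into code mid-match);
       "match_with_predicate" is a hypothetical helper.  Under this model
       the input "300" yields "30", which is worth keeping in mind when
       choosing what the quantifier is allowed to give back.

```python
import re

DIGITS = re.compile(r"\d+")

def match_with_predicate(text, predicate):
    r"""Model / (\d+) { predicate($0) or fail } / by hand.

    Assumption: "fail" backtracks into the greedy \d+, so shorter digit
    runs are tried before the scan moves on to the next start position.
    """
    for start in range(len(text)):
        run = DIGITS.match(text, start)
        if run is None:
            continue
        digits = run.group()
        for length in range(len(digits), 0, -1):  # greedy: longest first
            if predicate(digits[:length]):
                return digits[:length]
    return None

def below_256(s):
    return int(s) < 256

print(match_with_predicate("ip 192", below_256))  # 192
print(match_with_predicate("300", below_256))     # 30
```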

Variable (non-)interpolation
       ·   In Perl 6 rules, variables don't interpolate.

       ·   Instead they're passed "raw" to the rule engine, which can then
	   decide how to handle them (more on that below).

       ·   The default way in which the engine handles a scalar is to match it
	   as a "<'...'>" literal (i.e. it does not treat the interpolated
	   string as a subpattern).  In other words, a Perl 6:

		/ $var /

	   is like a Perl 5:

		/ \Q$var\E /

	   (To get rule interpolation, use an assertion; see below.)

       ·   An interpolated array:

		/ @cmds /

	   is matched as if it were an alternation of its elements:

		/ [ @cmds[0] | @cmds[1] | @cmds[2] | ... ] /

	   As with a scalar variable, each element is matched as a literal.

       ·   An interpolated hash matches the longest possible key of the hash
	   as a literal, or fails if no key matches.  (A "" key will match
	   anywhere, provided no longer key matches.)

	   ·   If the corresponding value of the hash element is a closure, it
	       is executed.

	   ·   If it is a string or rule object, it is executed as a subrule.

	   ·   If it has the value 1, nothing special happens beyond the
	       match.

	   ·   Any other value causes the match to fail.
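
       The three interpolation rules map naturally onto patterns built from
       escaped literals.  The Python sketch below is an analogy under stated
       assumptions: "re.escape" plays the role of literal matching, the
       helper names are hypothetical, and the hash case covers only key
       matching, not the special handling of hash values.  Because Python
       tries alternatives in order, sorting keys longest-first is what makes
       the hash version prefer "foreach" over "for".

```python
import re

def literal(value):
    r"""A scalar matches as a literal, like Perl 5's \Q$var\E."""
    return re.escape(value)

def alternation(items):
    """An array matches as an alternation of its elements, each a literal."""
    return "(?:" + "|".join(re.escape(item) for item in items) + ")"

def longest_key(keys):
    """A hash matches its longest possible key, so try longer keys first."""
    ordered = sorted(keys, key=len, reverse=True)
    return "(?:" + "|".join(re.escape(key) for key in ordered) + ")"

keywords = {"for": 1, "foreach": 1, "if": 1}

print(re.search(literal("a.b"), "axb"))                        # None: "." is literal
print(re.match(longest_key(keywords), "foreach $x").group())   # foreach
print(re.match(alternation(keywords), "foreach $x").group())   # for (first listed wins)
```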

Extensible metasyntax ("<...>")
       ·   The first character after "<" determines the behaviour of the
	   assertion.

       ·   A leading alphabetic character means it's a capturing grammatical
	   assertion (i.e. a subrule or a named character class - see below):

		/ <sign>? <mantissa> <exponent>? /

       ·   The special named assertions include:

		/ <before pattern> /	# was /(?=pattern)/
		/ <after pattern> /	# was /(?<pattern)/

		/ <ws> /		# match whitespace by :w rules

		/ <sp> /		# match a space char

	   The "after" assertion implements lookbehind by reversing the syntax
	   tree and looking for things in the opposite order going to the
	   left.  It is illegal to do lookbehind on a pattern that cannot be
	   reversed.

       ·   A leading "?" causes the assertion not to capture what it matches
	   (see "Subrule captures").  For example:

		/ <ident>  <ws>	 /	# $/<ident> and $/<ws> both captured
		/ <?ident> <ws>	 /	# only $/<ws> captured
		/ <?ident> <?ws> /	# nothing captured

       ·   A leading "$" indicates an indirect rule.  The variable must
	   contain either a hard reference to a rule, or a string containing
	   the rule.

       ·   A leading "::" indicates a symbolic indirect rule:

		/ <::($somename)> /

	   The variable must contain the name of a rule.

       ·   A leading "@" matches like a bare array except that each element is
	   treated as a rule (string or hard ref) rather than as a literal.

       ·   A leading "%" matches like a bare hash except that each key is
	   treated as a rule (string or hard ref) rather than as a literal.

       ·   A leading "{" indicates code that produces a rule to be
	   interpolated into the pattern at that point:

		/ (<?ident>)  <{ %cache{$0} //= get_body($0) }> /

	   The closure is guaranteed to be run at the canonical time.

	   An explicit return from the closure binds the result object for
	   this match, ignores the rest of the current rule, and reports
	   success:

		   / (\d) { return $0.sqrt } NotReached /;

	   This has the effect of capturing the square root of the numified
	   string, instead of the string.  The "NotReached" part is not
	   reached.

	   These closures are invoked as anonymous methods on the "Match"
	   object.  See "Match objects" below for more about result objects.

       ·   A leading "&" interpolates the return value of a subroutine call as
	   a rule.  Hence

		<&foo()>

	   is short for

		<{ foo() }>

       ·   In any case of rule interpolation, if the value already happens to
	   be a rule object, it is not recompiled.  If it is a string, the
	   compiled form is cached with the string so that it is not
	   recompiled next time you use it unless the string changes.  (Any
	   external lexical variable names must be rebound each time though.)
	   Rules may not be interpolated with unbalanced bracketing.  An
	   interpolated subrule keeps its own inner $/, so its parentheses
	   never count toward the outer rule's groupings.  (In other words,
	   parenthesis numbering is always lexically scoped.)

       ·   A leading "?{" or "!{" indicates a code assertion:

		/ (\d**{1..3}) <?{ $0 < 256 }> /
		/ (\d**{1..3}) <!{ $0 < 256 }> /

	   Similar to:

		/ (\d**{1..3}) { $0 < 256 or fail } /
		/ (\d**{1..3}) { $0 < 256 and fail } /

	   Unlike closures, code assertions are not guaranteed to be run at
	   the canonical time if the optimizer can prove something later can't
	   match.  So you can sneak in a call to a non-canonical closure that
	   way:

		/^foo .* <?{ do { say "Got here!" } or 1 }> .* bar$/

	   The "do" block is unlikely to run unless the string ends with
	   "bar".

       ·   A leading "(" indicates the start of a result capture:

	       / foo <( \d+ )> bar /

	   is equivalent to:

	       / <after foo> \d+ <before bar> /

	   except that the scan for "foo" can be done in the forward
	   direction, while a lookbehind assertion would presumably scan for
	   \d+ and then match "foo" backwards.  The use of "<(...)>" affects
	   only the meaning of the "result object" and the positions of the
	   beginning and ending of the match.  That is, after the match above,
	   "$()" contains only the digits matched, and ".pos" is pointing to
	   after the digits.  Other captures (named or numbered) are
	   unaffected and may be accessed through $/.

       ·   A leading "[" or "+" indicates an enumerated character class.
	   Ranges in enumerated character classes are indicated with "..".

		/ <[a..z_]>* /
		/ <+[a..z_]>* /

       ·   A leading "-" indicates a complemented character class:

		/ <-[a..z_]> <-alpha> /

       ·   Character classes can be combined (additively or subtractively)
	   within a single set of angle brackets. For example:

		/ <[a..z]-[aeiou]+xdigit> /	 # consonant or hex digit

	   If such a combination starts with a named character class, a
	   leading "+" is required:

		/ <+alpha-[Jj]> /		# J-less alpha

       ·   A leading "'" indicates a literal match (including whitespace):

		/ <'match this exactly (whitespace matters)'> /

       ·   A leading double-quote character indicates a literal match after
	   interpolation:

		/ <"match $THIS exactly (whitespace still matters)"> /

       ·   The special assertion "<.>" matches any logical grapheme (including
	   Unicode combining character sequences):

		/ seekto = <.> /  # Maybe a combined char

	   Same as:

		/ seekto = [:graphs .] /

       ·   A leading "!" indicates a negated meaning (always a zero-width
	   assertion):

		/ <!before _ > /    # We aren't before an _
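
       Two of the assertions above have close Python counterparts, which the
       sketch below uses for illustration.  Character-class arithmetic
       becomes set arithmetic over characters, and "<after ...>" and
       "<before ...>" become lookbehind and lookahead.  The sets are
       restricted to ASCII, an assumption made for brevity (the Perl 6
       classes are Unicode-aware), and the variable names are made up.

```python
import re
import string

# <[a..z]-[aeiou]+xdigit>, read left to right as set arithmetic.
consonant_or_hex = (set(string.ascii_lowercase) - set("aeiou")) | set(string.hexdigits)
CONSONANT_OR_HEX = re.compile("[" + "".join(sorted(consonant_or_hex)) + "]")

# <+alpha-[Jj]>, with "alpha" narrowed to ASCII letters for brevity.
J_LESS_ALPHA = re.compile("[" + "".join(sorted(set(string.ascii_letters) - set("Jj"))) + "]")

# / <after foo> \d+ <before bar> / as lookbehind and lookahead.  The visible
# result matches / foo <( \d+ )> bar /, though the scan strategy differs.
digits_only = re.search(r"(?<=foo)\d+(?=bar)", "foo123bar").group()

print(bool(CONSONANT_OR_HEX.fullmatch("e")))  # True: "e" is a hex digit
print(bool(CONSONANT_OR_HEX.fullmatch("i")))  # False: vowel, not hex
print(digits_only)                            # 123
```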

Backslash reform
       ·   The "\p" and "\P" properties become intrinsic grammar rules ("<prop
	   ...>" and "<!prop ...>").

       ·   The "\L...\E", "\U...\E", and "\Q...\E" sequences are gone.  In the
	   rare cases that need them you can use "<{ lc $rule }>" etc.

       ·   The "\G" sequence is gone.  Use ":p" instead.  (Note, however, that
	   it makes no sense to use ":p" within a pattern, since every
	   internal pattern is implicitly anchored to the current position.
	   You'll have to explicitly compare "<?{ .pos == $oldpos }>" in that
	   case.)

       ·   Backreferences (e.g. "\1", "\2", etc.) are gone; $0, $1, etc. can
	   be used instead, because variables are no longer interpolated.

       ·   New backslash sequences, "\h" and "\v", match horizontal and
	   vertical whitespace respectively, including Unicode.

       ·   "\s" now matches any Unicode whitespace character.

       ·   The new backslash sequence "\N" matches anything except a logical
	   newline; it is the negation of "\n".

       ·   A series of other new capital backslash sequences are also the
	   negation of their lower-case counterparts:

	   ·   "\H" matches anything but horizontal whitespace.

	   ·   "\V" matches anything but vertical whitespace.

	   ·   "\T" matches anything but a tab.

	   ·   "\R" matches anything but a return.

	   ·   "\F" matches anything but a formfeed.

	   ·   "\E" matches anything but an escape.

	   ·   "\X..." matches anything but the specified character (specified
	       in hexadecimal).
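
       With "\G" gone, ":p" is the tool for lexers, as noted under Modifiers.
       Python's "re" draws the same distinction: "search(text, pos)" scans
       forward from "pos", much like ":c", and "match(text, pos)" must match
       exactly at "pos", much like ":p".  The tokenizer below is a minimal
       sketch of that style; the token set and all names are hypothetical.

```python
import re

# Hypothetical token set, for illustration only.
TOKEN = re.compile(r"\s*(?:(?P<NUM>\d+)|(?P<IDENT>[A-Za-z_]\w*)|(?P<OP>[=;+*-]))")

def tokenize(text):
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)       # anchored at pos, like m:p
        if m is None:
            if text[pos:].isspace():
                break                    # only trailing whitespace is left
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

# search() scans forward from pos (like :c); match() must start at pos (like :p).
DIGITS = re.compile(r"\d+")
scan_result = DIGITS.search("ab12", 0)     # finds "12"
anchored_result = DIGITS.match("ab12", 0)  # None: no digits at offset 0

print(tokenize("x = 42;"))
```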

Regexes are rules
       ·   The Perl 5 "qr/pattern/" regex constructor is gone.

       ·   The Perl 6 equivalents are:

		rule { pattern }    # always takes {...} as delimiters
		  rx / pattern /    # can take (almost any) chars as delimiters

	   You may not use whitespace or alphanumerics for delimiters.  Space
	   is optional unless needed to distinguish from modifier arguments or
	   function parens.  So you may use parens as your "rx" delimiters,
	   but only if you interpose a colon or whitespace:

		rx:( pattern )	    # okay
		rx ( pattern )	    # okay
		rx( 1,2,3 )	    # tries to call rx function

       ·   If either form needs modifiers, they go before the opening
	   delimiter:

		$rule = rule :g:w:i { my name is (.*) };
		$rule = rx:g:w:i / my name is (.*) /;

	   Space or colon is necessary after the final modifier if you use any
	   bracketing character for the delimiter.  (Otherwise it would be
	   taken as an argument to the modifier.)

       ·   You may not use colons for the delimiter.  Space is allowed between
	   modifiers:

		$rule = rx :g :w :i / my name is (.*) /;

       ·   The name of the constructor was changed from "qr" because it's no
	   longer an interpolating quote-like operator.  "rx" stands for "rule
	   expression", or occasionally "regex".  ":-)"

       ·   As the syntax indicates, it is now more closely analogous to a "sub
	   {...}" constructor.  In fact, that analogy will run very deep in
	   Perl 6.

       ·   Just as a raw "{...}" is now always a closure (which may still
	   execute immediately in certain contexts and be passed as a
	   reference in others), so too a raw "/.../" is now always a rule
	   (which may still match immediately in certain contexts and be
	   passed as a reference in others).

       ·   Specifically, a "/.../" matches immediately in a value context
	   (void, Boolean, string, or numeric), or when it is an explicit
	   argument of a "~~".  Otherwise it's a rule constructor.  So this:

		$var = /pattern/;

	   no longer does the match and sets $var to the result.  Instead it
	   assigns a rule reference to $var.

       ·   The two cases can always be distinguished using "m{...}" or
	   "rx{...}":

		$var = m{pattern};    # Match rule immediately, assign result
		$var = rx{pattern};   # Assign rule expression itself

       ·   Note that this means that former magically lazy usages like:

		@list = split /pattern/, $str;

	   are now just consequences of the normal semantics.

       ·   It's now also possible to set up a user-defined subroutine that
	   acts like "grep":

		sub my_grep($selector, *@list) {
		    given $selector {
			when Rule  { ... }
			when Code  { ... }
			when Hash  { ... }
			# etc.
		    }
		}

	   Using "{...}" or "/.../" in the scalar context of the first
	   argument causes it to produce a "Code" or "Rule" reference, which
	   the switch statement then selects upon.
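
       Python's "re" module makes a similar split between building a rule
       and running it, and the "my_grep" dispatch can be mimicked by
       inspecting the selector's type.  This is a sketch by analogy only:
       "re.compile" stands in for "rx", a callable for a "Code" reference,
       and the meaning given to a hash selector (keep items that are keys)
       is an assumption, since the text leaves that branch as "...".

```python
import re

rule = re.compile(r"\d+")            # like rx/\d+/: a rule object, no match yet
result = re.search(r"\d+", "abc42")  # like m/\d+/: matches immediately

def my_grep(selector, *items):
    """Select items according to what kind of selector was passed."""
    if isinstance(selector, re.Pattern):       # the "when Rule" branch
        return [x for x in items if selector.search(x)]
    if callable(selector):                     # the "when Code" branch
        return [x for x in items if selector(x)]
    if isinstance(selector, dict):             # the "when Hash" branch
        # Assumption: keep the items that appear as keys of the hash.
        return [x for x in items if x in selector]
    raise TypeError(f"unsupported selector: {type(selector).__name__}")

print(my_grep(rule, "12", "ab", "7"))                  # ['12', '7']
print(my_grep(lambda s: len(s) > 1, "12", "ab", "7"))  # ['12', 'ab']
print(my_grep({"ab": 1}, "12", "ab", "7"))             # ['ab']
```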

Backtracking control
       ·   Backtracking over a single colon causes the rule engine not to
	   retry the preceding atom:

		m:w/ \( <expr> [ , <expr> ]* : \) /

	   (i.e. there's no point trying fewer "<expr>" matches, if there's no
	   closing parenthesis on the horizon)

       ·   Backtracking over a double colon causes the surrounding group of
	   alternations to immediately fail:

		m:w/ [ if :: <expr> <block>
		     | for :: <list> <block>
		     | loop :: <loop_controls>? <block>
		     ]
		/

	   (i.e. there's no point trying to match a different keyword if one
	   was already found but failed).

       ·   Backtracking over a triple colon causes the current rule to fail
	   outright (no matter where in the rule it occurs):

		rule ident {
		      ( [<alpha>|_] \w* ) ::: { fail if %reserved{$0} }
		    | " [<alpha>|_] \w* "
		}

		m:w/ get <ident>? /

	   (i.e. using an unquoted reserved word as an identifier is not
	   permitted)

       ·   Backtracking over a "<commit>" assertion causes the entire match to
	   fail outright, no matter how many subrules down it happens:

		rule subname {
		    ([<alpha>|_] \w*) <commit> { fail if %reserved{$0} }
		}
		m:w/ sub <subname>? <block> /

	   (i.e. using a reserved word as a subroutine name is instantly fatal
	   to the "surrounding" match as well)

       ·   A "<cut>" assertion always matches successfully, and has the side
	   effect of deleting the parts of the string already matched.

       ·   Attempting to backtrack past a "<cut>" causes the complete match to
	   fail (like backtracking past a "<commit>").  This is because there's
	   now no preceding text to backtrack into.

       ·   This is useful for throwing away successfully processed input when
	   matching from an input stream or an iterator of arbitrary length.
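
       The point of ":::" in the "ident" rule is easiest to see by
       simulating what happens without it.  In the hand-written Python
       sketch below (hypothetical names, and the quoted-identifier
       alternative is ignored), a failed reserved-word check lets
       backtracking shorten the identifier, so "if" sneaks through as "i".
       The committed version fails instead.

```python
import re

RESERVED = {"if", "for", "loop"}  # hypothetical reserved-word table
IDENT = re.compile(r"[A-Za-z_]\w*")

def ident_without_commit(text):
    """Greedy identifier, then 'fail if reserved', with backtracking allowed.

    Each failure gives back one more character, so a shorter prefix may
    pass the check.
    """
    m = IDENT.match(text)
    if m is None:
        return None
    for end in range(m.end(), 0, -1):
        if text[:end] not in RESERVED:
            return text[:end]
    return None

def ident_with_commit(text):
    """Same check after ':::': no retries, the rule simply fails."""
    m = IDENT.match(text)
    if m is None or m.group() in RESERVED:
        return None
    return m.group()

print(ident_without_commit("if"))  # i
print(ident_with_commit("if"))     # None
print(ident_with_commit("iffy"))   # iffy
```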

Named Regexes
       ·   The analogy between "sub" and "rule" extends much further.

       ·   Just as you can have anonymous subs and named subs...

       ·   ...so too you can have anonymous rules and named rules:

		rule ident { [<alpha>|_] \w* }

		# and later...

		@ids = grep /<ident>/, @strings;

       ·   As the above example indicates, it's possible to refer to named
	   rules, such as:

		rule serial_number { <[A..Z]> \d**{8} }
		rule type { alpha | beta | production | deprecated | legacy }

	   in other rules as named assertions:

		rule identification { [soft|hard]ware <type> <serial_number> }
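
       Named rules behave much like reusable pattern fragments.  One rough
       way to mimic them in Python is to keep each rule as a pattern string
       and splice it into the others, turning each named assertion into a
       named group.  This is an analogy, not how Perl 6 compiles subrules,
       and the optional whitespace between the parts is an assumption made
       for the example.

```python
import re

# Named rules kept as pattern strings so other rules can splice them in.
SERIAL_NUMBER = r"[A-Z]\d{8}"
TYPE = r"(?:alpha|beta|production|deprecated|legacy)"

# <type> and <serial_number> become named groups, much as a subrule call
# captures into $/<type> and $/<serial_number>.
# Assumption: optional whitespace between the parts.
IDENTIFICATION = re.compile(
    rf"(?:soft|hard)ware\s*(?P<type>{TYPE})\s*(?P<serial_number>{SERIAL_NUMBER})"
)

m = IDENTIFICATION.fullmatch("software beta X12345678")
print(m.group("type"), m.group("serial_number"))  # beta X12345678
```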

Nothing is illegal
       ·   The null pattern is now illegal.

       ·   To match whatever the prior successful rule matched, use:

		/<prior>/

       ·   To match the zero-width string, use:

		/<null>/

	   For example:

		split /<?null>/, $string

	   splits between characters.

       ·   To match a null alternative, use:

		/a|b|c|<?null>/

	   This makes it easier to catch errors like this:

		m:w/ [
		     | if :: <expr> <block>
		     | for :: <list> <block>
		     | loop :: <loop_controls>? <block>
		     ]
		/

       ·   However, it's okay for a non-null syntactic construct to have a
	   degenerate case matching the null string:

		$something = "";
		/a|b|c|$something/;
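
       Python still accepts the null pattern and empty alternatives, which
       makes it a handy illustration of the mistake this section guards
       against.  In the sketch below a stray leading "|" quietly lets the
       empty alternative win, and an explicit zero-width split plays the
       role of "split /<?null>/".  Both are Python analogies, not Perl 6
       code.

```python
import re

# A stray leading "|" leaves an empty first alternative. Python accepts it
# silently, and since alternatives are tried in order, the empty one wins.
stray = re.match(r"(?:|if|for)", "for")
print(repr(stray.group()))  # ''

# An explicit zero-width split between characters, like split /<?null>/.
pieces = re.split(r"(?<=.)(?=.)", "abc", flags=re.DOTALL)
print(pieces)  # ['a', 'b', 'c']
```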

Return values from matches
   Match objects
       ·   A match always returns a "match object", which is also available as
	   $/, an environmental lexical declared in the outer subroutine that
	   is calling the rule.  (A closure lexically embedded in a rule does
	   not redeclare $/, so $/ always refers to the current match, not to
	   any prior submatch done within the closure.)

       ·   Notionally, a match object contains (among other things) a boolean
	   success value, a scalar "result object", an array of ordered
	   submatch objects, and a hash of named submatch objects.  To provide
	   convenient access to these various values, the match object
	   evaluates differently in different contexts:

	   ·   In boolean context it evaluates as true or false (i.e. did the
	       match succeed?):

		    if /pattern/ {...}
		    # or:
		    /pattern/; if $/ {...}

	   ·   In string context it evaluates to the stringified value of its
	       result object, which is usually the entire matched string:

		    print %hash{ "{$text ~~ /<?ident>/}" };
		    # or equivalently:
		    $text ~~ /<?ident>/	 &&  print %hash{~$/};

	       But generally you should say "~$/" if you mean "~$/".

	   ·   In numeric context it evaluates to the numeric value of its
	       result object, which is usually the entire matched string:

		    $sum += /\d+/;
		    # or equivalently:
		    /\d+/; $sum = $sum + $/;

	   ·   When called as a closure, a Match object evaluates to its
	       underlying result object.  Usually this is just the entire
	       match string, but you can override that by calling "return"
	       inside a rule:

		   my $moose = m:{
		       <antler> <body>
		       { return Moose.new( body => $<body>().attach($<antler>) ) }
		       # match succeeds -- ignore the rest of the rule
		   }.();

	       "$()" is a shorthand for "$/.()" or "$/()".  The result object
	       may be of any type, not just a string.

	       You may also capture a subset of the match as the result object
	       using the "<(...)>" construct:

		   "foo123bar" ~~ / foo <( \d+ )> bar /
		   say $();    # says 123

	       In this case the result object is always a string when doing
	       string matching, and a list of one or more elements when doing
	       array matching.

	       Additionally, the "Match" object delegates its "coerce" calls
	       (such as "+$match" and "~$match") to its underlying result
	       object.  The only exception is that "Match" handles boolean
	       coercion itself, which returns whether the match had succeeded.

	       This means that these two work the same:

		   / <moose> { $<moose>.() as Moose } /
		   / <moose> { $<moose>	   as Moose } /

	   ·   When used as an array, a Match object pretends to be an array
	       of all its positional captures.  Hence

		    ($key, $val) = m:w/ (\S+) => (\S+)/;

	       can also be written:

		    $result = m:w/ (\S+) => (\S+)/;
		    ($key, $val) = @$result;

	       To get a single capture into a string, use a subscript:

		    $mystring = "{ m:w/ (\S+) => (\S+)/[0] }";

	       To get all the captures into a string, use a "zen" slice:

		    $mystring = "{ m:w/ (\S+) => (\S+)/[] }";

	       Note that, as a scalar variable, $/ doesn't automatically
	       flatten in list context.  Use "@$/" or "$/[]" to flatten as an
	       array.

	   ·   When used as a hash, a Match object pretends to be a hash of
	       all its named captures.  The keys do not include any sigils, so
	       if you capture to variable "@<foo>" its real name is $/{'foo'}
	       or "$/<foo>".  However, you may still refer to it as "@<foo>"
	       anywhere $/ is visible.  (But it is erroneous to use the same
	       name for two different capture datatypes.)

	       Note that, as a scalar variable, $/ doesn't automatically
	       flatten in list context.  Use "%$/" or "$/{}" to flatten as a
	       hash, or bind it to a variable of the appropriate type.

	   ·   The numbered captures may be treated as named, so "$<0 1 2>" is
	       equivalent to "$/[0,1,2]".  This allows you to write slices of
	       intermixed named and numbered captures.

	   ·   In ordinary code, variables $0, $1, etc. are just aliases into
	       $/[0], $/[1], etc.  Hence they will all be undefined if the
	       last match failed (unless they were explicitly bound in a
	       closure without using the "let" keyword).

       ·   "Match" objects have methods that provide additional information
	   about the match. For example:

		if m/ def <ident> <codeblock> / {
		    say "Found sub def from index $/.from() to index $/.to()";
		}

       ·   All match attempts--successful or not--against any rule, subrule,
	   or subpattern (see below) return an object of class "Match". That
	   is:

		$match_obj = $str ~~ /pattern/;
		say "Matched" if $match_obj;

       ·   This returned object is also automatically assigned to the lexical
	   $/ variable, unless the match statement is inside another rule.
	   That is:

		$str ~~ /pattern/;
		say "Matched" if $/;

       ·   Inside a rule, the $/ variable holds the current rule's incomplete
	   "Match" object (which can be modified via the internal $/).  For
	   example:

	       $str ~~ / foo		    # Match 'foo'
			  { $/ = 'bar' }     # But pretend we matched 'bar'
			/;
	       say $/;			     # says 'bar'

	   This is slightly dangerous, insofar as you might return something
	   that does not behave like a "Match" object to some context that
	   requires one.  Fortunately, you normally just want to return a
	   result object instead:

	       $str ~~ / foo		     # Match 'foo'
			  { return 'bar' }   # But pretend we matched 'bar'
			/;
	       say $();			     # says 'bar'
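
       Python's match objects expose many of the same views, so a rough
       side-by-side may help.  The mapping below is an analogy, not an
       equivalence: Python numbers positional groups from 1, whereas Perl
       6's first capture is $0, and "start()"/"end()" are only approximate
       counterparts of ".from"/".to".

```python
import re

m = re.search(r"(\S+)\s*=>\s*(\S+)", "color => blue")

matched = bool(m)            # boolean view: did the match succeed?
whole = m.group(0)           # string view: the whole match, like ~$/
key, val = m.groups()        # array view, like ($key, $val) = @$/
first = m.group(1)           # Python's group 1 is Perl 6's $0 / $/[0]
span = (m.start(), m.end())  # roughly ($/.from, $/.to)

named = re.search(r"(?P<key>\S+)\s*=>\s*(?P<val>\S+)", "color => blue")
fields = named.groupdict()   # hash view, like %$/

print(key, val, span, fields)
```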

   Subpattern captures
       ·   Any part of a rule that is enclosed in capturing parentheses is
	   called a subpattern. For example:

		   #               subpattern
		   #  _________________/\____________________
		   # |                                       |
		   # |       subpattern subpattern           |
		   # |          __/\__    __/\__             |
		   # |         |      |  |      |            |
		m:w/ (I am the (walrus), ( khoo )**{2} kachoo) /;

       ·   Each subpattern in a rule produces a "Match" object if it is
	   successfully matched.

       ·   Each subpattern's "Match" object is pushed onto the array inside
	   the outer "Match" object belonging to the surrounding scope (known
	   as its parent "Match" object). The surrounding scope may be either
	   the innermost surrounding subpattern (if the subpattern is nested)
	   or else the entire rule itself.

       ·   Like all captures, these assignments to the array are hypothetical,
	   and are undone if the subpattern is backtracked.

       ·   For example, if the following pattern matched successfully:

		   #		    subpat-A
		   #  _________________/\____________________
		   # |					     |
		   # |	       subpat-B	 subpat-C	     |
		   # |		__/\__	  __/\__	     |
		   # |	       |      |	 |	|	     |
		m:w/ (I am the (walrus), ( khoo )**{2} kachoo) /;

	   then the "Match" objects representing the matches made by subpat-B
	   and subpat-C would be successively pushed onto the array inside
	   subpat-A's "Match" object. Then subpat-A's "Match" object would
	   itself be pushed onto the array inside the "Match" object for the
	   entire rule (i.e. onto $/'s array).

       ·   As a result of these semantics, capturing parentheses in Perl 6 are
	   hierarchical, not linear (see "Nested subpattern captures").

   Accessing captured subpatterns
       ·   The array elements of a "Match" object are referred to using either
	   the standard array access notation (e.g. $/[0], $/[1], $/[2], etc.)
	   or else via the corresponding lexically scoped numeric aliases
	   (i.e. $0, $1, $2, etc.). So:

		say "$/[1] was found between $/[0] and $/[2]";

	   is the same as:

		say "$1 was found between $0 and $2";

       ·   Note that, in Perl 6, the numeric capture variables start from $0,
	   not $1, with the numbers corresponding to the element's index
	   inside $/.

       ·   The array elements of the rule's "Match" object (i.e. $/) store
	   individual "Match" objects representing the substrings that were
	   matched and captured by the first, second, third, etc. outermost
	   (i.e. unnested) subpatterns. So these elements can be treated like
	   fully fledged match results. For example:

		if m/ (\d\d\d\d)-(\d\d)-(\d\d) (BCE?|AD|CE)?/ {
		      ($yr, $mon, $day) = $/[0..2];
		      $era = "$3" if $3;		    # stringify/boolify
		      @datepos = ( $0.from() .. $2.to() );  # Call Match methods
		}

   Nested subpattern captures
       ·   Substrings matched by nested subpatterns (i.e. nested capturing
	   parens) are assigned to the array inside the "Match" object of the
	   surrounding (parent) subpattern, not to the array of $/.

       ·   This behaviour is quite different to Perl 5 semantics:

		 # Perl 5...
		 #
		 # $1---------------------  $4---------	 $5------------------
		 # |   $2---------------  | |	       | | $6----  $7------  |
		 # |   |	 $3--	| | |	       | | |	 | |	   | |
		 # |   |	 |   |	| | |	       | | |	 | |	   | |
		m/ ( A (guy|gal|g(\S+)	) ) (sees|calls) ( (the|a) (gal|guy) ) /x;

       ·   In Perl 6, nested parens produce properly nested captures:

		 # Perl 6...
		 #
		 # $0---------------------  $1---------	 $2------------------
		 # |   $0[0]------------  | |	       | | $2[0]-  $2[1]---  |
		 # |   |       $0[0][0] | | |	       | | |	 | |	   | |
		 # |   |	 |   |	| | |	       | | |	 | |	   | |
		m/ ( A (guy|gal|g(\S+)	) ) (sees|calls) ( (the|a) (gal|guy) ) /;
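       Python's "re" module numbers groups the same way Perl 5 does, by the
       position of each opening paren.  That makes it a convenient way to
       check the Perl 5 diagram.  The nested Perl 6 designations can then be
       attached by hand.  The sketch below does this.  The PERL5_TO_PERL6
       table and the captures() helper are hypothetical illustrations
       transcribed from the two diagrams.  They are not part of any Perl 6
       implementation.

```python
import re

# Python's re numbers groups by the position of each opening paren,
# exactly like Perl 5.  re.VERBOSE plays the role of /x, so the spaces in
# the pattern are ignored (and the subject string must not contain any).
PATTERN = re.compile(
    r"( A (guy|gal|g(\S+) ) ) (sees|calls) ( (the|a) (gal|guy) )", re.VERBOSE
)

# Hypothetical lookup table: Perl 5 group number -> Perl 6 index path,
# transcribed from the two diagrams above.
PERL5_TO_PERL6 = {
    1: (0,),       # $1 -> $0
    2: (0, 0),     # $2 -> $0[0]
    3: (0, 0, 0),  # $3 -> $0[0][0]
    4: (1,),       # $4 -> $1
    5: (2,),       # $5 -> $2
    6: (2, 0),     # $6 -> $2[0]
    7: (2, 1),     # $7 -> $2[1]
}


def captures(text):
    """Return (linear, nested) views of the captures, or None on failure."""
    m = PATTERN.fullmatch(text)
    if m is None:
        return None
    linear = {n: m.group(n) for n in range(1, 8)}
    nested = {PERL5_TO_PERL6[n]: s for n, s in linear.items()}
    return linear, nested


linear, nested = captures("Agalcallsaguy")
print(linear[4], nested[(1,)])  # prints: calls calls
```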

   Quantified subpattern captures
       ·   If a subpattern is directly quantified (using any quantifier), it
	   no longer produces a single "Match" object. Instead, it produces a
	   list of "Match" objects corresponding to the sequence of individual
	   matches made by the repeated subpattern.

       ·   Because a quantified subpattern returns a list of "Match" objects,
	   the corresponding array element for the quantified capture will
	   store a reference to a (nested) array, rather than a single "Match"
	   object.  For example:

		if m/ (\w+) \: (\w+ \s+)* / {
		    say "Key:	 $0";	      # Unquantified --> single Match
		    say "Values: { @{$1} }";  # Quantified   --> array of Match
		}
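       By contrast, Perl 5 keeps only the last repetition of a quantified
       group, and so does Python's "re" module.  The sketch below shows that
       behaviour.  It then approximates the Perl 6 list-of-matches behaviour
       by capturing the whole repeated region and re-scanning it.  The helper
       key_and_values() is hypothetical.  The re-scan is only equivalent for
       simple repetitions like this one.

```python
import re

# Perl 5 / Python semantics: a quantified group keeps only its last repetition.
LAST_ONLY = re.compile(r"(\w+):(\w+\s+)*")

# Approximation of the Perl 6 behaviour: capture the whole repeated region,
# then re-scan it so that every repetition yields its own entry.
WHOLE = re.compile(r"(\w+):((?:\w+\s+)*)")
ITEM = re.compile(r"\w+\s+")


def key_and_values(text):
    """Return (key, [value, ...]) with one value per repetition, or None."""
    m = WHOLE.match(text)
    if m is None:
        return None
    return m.group(1), ITEM.findall(m.group(2))


print(repr(LAST_ONLY.match("key:a b c ").group(2)))  # prints: 'c '
print(key_and_values("key:a b c "))  # prints: ('key', ['a ', 'b ', 'c '])
```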

   Indirectly quantified subpattern captures
       ·   A subpattern may sometimes be nested inside a quantified non-
	   capturing structure:

		 #	 non-capturing	     quantifier
		 #  __________/\____________  __/\__
		 # |			    ||	    |
		 # |   $0	  $1	    ||	    |
		 # |  _^_      ___^___	    ||	    |
		 # | |	 |    |	      |	    ||	    |
		m/ [ (\w+) \: (\w+ \h*)* \n ]**{2...} /

	   Non-capturing brackets don't create a separate nested lexical
	   scope, so the two subpatterns inside them are actually still in the
	   rule's top-level scope. Hence their top-level designations: $0 and
	   $1.

       ·   However, because the two subpatterns are inside a quantified
	   structure, $0 and $1 will each contain a reference to an array.
	   The elements of that array will be the submatches returned by the
	   corresponding subpattern on each iteration of the non-capturing
	   brackets. For example:

		my $text = "foo:food fool\nbar:bard barb\n";

			  #   $0--     $1------
			  #   |	  |    |       |
		$text ~~ m/ [ (\w+) \: (\w+ \h*)* \n ]**{2...} /;

		# Because they're in a quantified non-capturing block...
		# $0 contains the equivalent of:
		#
		#	[ Match.new(str=>'foo'), Match.new(str=>'bar') ]
		#
		# and $1 contains the equivalent of:
		#
		#	[ Match.new(str=>'food '),
		#	  Match.new(str=>'fool' ),
		#	  Match.new(str=>'bard '),
		#	  Match.new(str=>'barb' ),
		#	]

       ·   In contrast, if the outer quantified structure is a capturing
	   structure (i.e. a subpattern) then it will introduce a nested
	   lexical scope. That outer quantified structure will then return an
	   array of "Match" objects representing the captures of the inner
	   parens for every iteration (as described above). That is:

		my $text = "foo:food fool\nbar:bard barb\n";

			  # $0-----------------------
			  # |			     |
			  # | $0[0]    $0[1]---	     |
			  # | |	  |    |       |     |
		$text ~~ m/ ( (\w+) \: (\w+ \h*)* \n )**{2...} /;

		# Because it's in a quantified capturing block,
		# $0 contains the equivalent of:
		#
		#	[ Match.new( str=>"foo:food fool\n",
		#		     arr=>[ Match.new(str=>'foo'),
		#			    [
		#				Match.new(str=>'food '),
		#				Match.new(str=>'fool'),
		#			    ]
		#			  ],
		#		   ),
		#	  Match.new( str=>"bar:bard barb\n",
		#		     arr=>[ Match.new(str=>'bar'),
		#			    [
		#				Match.new(str=>'bard '),
		#				Match.new(str=>'barb'),
		#			    ]
		#			  ],
		#		   ),
		#	]
		#
		# and there is no $1

       ·   In other words, quantified non-capturing parens collect their
	   components into handy flattened lists, whereas quantified capturing
	   parens collect their components in a handy hierarchical structure.
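       The two shapes can be modelled in Python.  Because "re" cannot report
       every repetition of a group, this sketch matches one line per iteration
       and assembles the results itself.  Several simplifications apply:

       ·   The function names are hypothetical.

       ·   "[ \t]" stands in for "\h".

       ·   Matching is anchored at the start of the string.

```python
import re

LINE = re.compile(r"(\w+):((?:\w+[ \t]*)*)\n")  # one iteration; [ \t] ~ \h
WORD = re.compile(r"\w+[ \t]*")                 # one (\w+ \h*) repetition


def iterations(text):
    """Yield one re.Match per iteration, starting at position 0."""
    pos = 0
    while (m := LINE.match(text, pos)) is not None:
        yield m
        pos = m.end()


def non_capturing(text):
    """Shape produced by quantified non-capturing brackets: two flat lists."""
    its = list(iterations(text))
    if len(its) < 2:  # **{2...} demands at least two iterations
        return None
    zero = [m.group(1) for m in its]
    one = [w for m in its for w in WORD.findall(m.group(2))]
    return zero, one


def capturing(text):
    """Shape produced by quantified capturing parens: one nested entry each."""
    its = list(iterations(text))
    if len(its) < 2:
        return None
    return [(m.group(0), [m.group(1), WORD.findall(m.group(2))]) for m in its]


text = "foo:food fool\nbar:bard barb\n"
print(non_capturing(text))  # prints: (['foo', 'bar'], ['food ', 'fool', 'bard ', 'barb'])
```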

   Subpattern numbering
       ·   The index of a given subpattern can always be statically
	   determined, but is neither necessarily unique nor always monotonic. The
	   numbering of subpatterns restarts in each lexical scope (either a
	   rule, a subpattern, or the branch of an alternation).

       ·   In particular, the index of capturing parentheses restarts after
	   each "|". Hence:

			     # $0      $1    $2	  $3	$4	     $5
		$tune_up = rx/ (don't) (ray) (me) (for) (solar tea), (d'oh!)
			     # $0      $1      $2    $3	       $4
			     | (every) (green) (BEM) (devours) (faces)
			     /;

	   This means that if the second alternative matches, the "@$/" array
	   will contain "('every', 'green', 'BEM', 'devours', 'faces')",
	   rather than "(undef, undef, undef, undef, undef, undef, 'every',
	   'green', 'BEM', 'devours', 'faces')" (as the same regex would in
	   Perl 5).

       ·   Note that it is still possible to mimic the monotonic Perl 5
	   capture indexing semantics.	See "Numbered scalar aliasing" below
	   for details.
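       Python's "re" also numbers groups across all branches, so it reproduces
       the Perl 5 result quoted above.  The helper perl6_view() and its
       BRANCH_GROUPS table are hypothetical.  The sketch assumes the first
       group of each branch is mandatory, so that group is set exactly when
       its branch matched.

```python
import re

TUNE_UP = re.compile(
    r"(don't) (ray) (me) (for) (solar tea), (d'oh!)"
    r"|(every) (green) (BEM) (devours) (faces)"
)

# Hypothetical table: which Perl 5 group numbers belong to which branch.
BRANCH_GROUPS = [range(1, 7), range(7, 12)]


def perl6_view(m):
    """Captures of the branch that matched, renumbered from 0 as in Perl 6."""
    for groups in BRANCH_GROUPS:
        if m.group(groups[0]) is not None:  # first group is mandatory
            return [m.group(n) for n in groups]
    return []


m = TUNE_UP.fullmatch("every green BEM devours faces")
print(m.groups())     # six Nones, then the five words (Perl 5 numbering)
print(perl6_view(m))  # prints: ['every', 'green', 'BEM', 'devours', 'faces']
```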

   Subrule captures
       ·   Any call to a named "<rule>" within a pattern is known as a
	   subrule.

       ·   Any bracketed construct that is aliased (see Aliasing below) to a
	   named variable is also a subrule.

       ·   For example, this rule contains three subrules:

		 # subrule	 subrule      subrule
		 #  __^__    _______^______    __^__
		 # |	 |  |		   |  |	    |
		m/ <ident>  $<spaces>:=(\s*)  <digit>+ /

       ·   Just like subpatterns, each successfully matched subrule within a
	   rule produces a "Match" object. But, unlike subpatterns, that
	   "Match" object is not assigned to the array inside its parent
	   "Match" object.  Instead, it is assigned to an entry of the hash
	   inside its parent "Match" object. For example:

		 #  .... $/ .....................................
		 # :						 :
		 # :		  .... $/[0] ..................	 :
		 # :		 :			       : :
		 # : $/<ident>	 :	  $/[0]<ident>	       : :
		 # :   __^__	 :	     __^__	       : :
		 # :  |	    |	 :	    |	  |	       : :
		m:w/  <ident> \: ( known as <ident> previously ) /
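       A toy Python model makes the two destinations explicit.  Subpattern
       captures are pushed onto a Match object's array part, and subrule
       captures are stored in its hash part.  The Match class and capture()
       helper below are hypothetical stand-ins.  The structure is built by
       hand rather than by a real matcher.

```python
from dataclasses import dataclass, field


@dataclass
class Match:
    """Hypothetical stand-in for a Perl 6 Match object."""
    text: str
    arr: list = field(default_factory=list)   # subpattern captures
    hash: dict = field(default_factory=dict)  # subrule captures


def capture(parent, child, subrule=None):
    """Push a subpattern onto the array, or store a subrule by name."""
    if subrule is None:
        parent.arr.append(child)
    else:
        parent.hash[subrule] = child
    return child


# Hand-built result of matching "Bob: known as Robert previously" against
#     m:w/ <ident> \: ( known as <ident> previously ) /
top = Match("Bob: known as Robert previously")             # $/
capture(top, Match("Bob"), "ident")                        # $/<ident>
inner = capture(top, Match("known as Robert previously"))  # $/[0]
capture(inner, Match("Robert"), "ident")                   # $/[0]<ident>

print(top.hash["ident"].text, top.arr[0].hash["ident"].text)  # prints: Bob Robert
```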

   Accessing captured subrules
       ·   The hash entries of a "Match" object can be referred to using any
	   of the standard hash access notations ($/{'foo'}, "$/<bar>",
	   "$/«baz»", etc.), or else via corresponding lexically scoped
	   aliases ("$<foo>", $«bar», "$<baz>", etc.)  So the previous example
	   also implies:

		 #    $<ident>		   $0<ident>
		 #     __^__		     __^__
		 #    |	    |		    |	  |
		m:w/  <ident> \: ( known as <ident> previously ) /

       ·   Note that it makes no difference whether a subrule is
	   angle-bracketed ("<ident>") or aliased ("$<ident> := (<alpha>\w*)").
	   The name's the thing.

   Repeated captures of the same subrule
       ·   If a subrule appears two (or more) times in any branch of a lexical
	   scope (i.e. twice within the same subpattern and alternative), or
	   if the subrule is quantified anywhere within a given scope, then
	   its corresponding hash entry is always assigned a reference to an
	   array of "Match" objects, rather than a single "Match" object.

       ·   Successive matches of the same subrule (whether from separate
	   calls, or from a single quantified repetition) append their
	   individual "Match" objects to this array. For example:

		if m:w/ mv <file> <file> / {
		    $from = $<file>[0];
		    $to	  = $<file>[1];
		}

	   Likewise, with a quantified subrule:

		if m:w/ mv <file>**{2} / {
		    $from = $<file>[0];
		    $to	  = $<file>[1];
		}

	   Likewise, with a mixture of both:

		if m:w/ mv <file>+ <file> / {
		    $to	  = pop @{$<file>};
		    @from = @{$<file>};
		}

       ·   However, if a subrule is explicitly renamed (or aliased -- see
	   Aliasing), then only the "final" name counts when deciding whether
	   it is or isn't repeated. For example:

		if m:w/ mv <file> $<dir>:=<file> / {
		    $from = $<file>;  # Only one subrule named <file>, so scalar
		    $to	  = $<dir>;   # The Capture Formerly Known As <file>
		}

	   Likewise, neither of the following constructions causes "<file>" to
	   produce an array of "Match" objects, since none of them has two or
	   more "<file>" subrules in the same lexical scope:

		if m:w/ (keep) <file> | (toss) <file> / {
		    # Each <file> is in a separate alternation, therefore <file>
		    # is not repeated in any one scope, hence $<file> is
		    # not an array ref...
		    $action = $0;
		    $target = $<file>;
		}

		if m:w/ <file> \: (<file>|none) / {
		    # Second <file> nested in subpattern which confers a
		    # different scope...
		    $actual  = $/<file>;
		    $virtual = $/[0]<file> if $/[0]<file>;
		}

       ·   On the other hand, unaliased square brackets don't confer a
	   separate scope (because they don't have an associated "Match"
	   object). So:

		if m:w/ <file> \: [<file>|none] / { # Two <file>s in same scope
		    $actual  = $/<file>[0];
		    $virtual = $/<file>[1] if $/<file>[1];
		}
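       The choice between a single "Match" object and an array is made
       statically, from the pattern.  It does not depend on how many times the
       subrule happened to match.  In this hypothetical Python sketch, the
       caller passes in the set of names that the pattern repeats or
       quantifies in the current scope.  The Match class and bind_subrule()
       are illustrative names only.

```python
from dataclasses import dataclass, field


@dataclass
class Match:
    """Hypothetical stand-in for a Perl 6 Match object."""
    text: str
    hash: dict = field(default_factory=dict)


def bind_subrule(parent, name, child, repeated_names):
    """Store a subrule's Match under its (final, possibly aliased) name.

    Names in repeated_names (repeated or quantified in this scope, as read
    off the pattern) always map to a list, even after a single match.
    """
    if name in repeated_names:
        parent.hash.setdefault(name, []).append(child)
    else:
        parent.hash[name] = child


# m:w/ mv <file> <file> /   : <file> is repeated, so $<file> is an array
both = Match("mv a b")
for f in ("a", "b"):
    bind_subrule(both, "file", Match(f), {"file"})

# m:w/ mv <file> $<dir>:=<file> /   : only the final names count
renamed = Match("mv a b")
bind_subrule(renamed, "file", Match("a"), set())
bind_subrule(renamed, "dir", Match("b"), set())

print([m.text for m in both.hash["file"]], renamed.hash["file"].text)
# prints: ['a', 'b'] a
```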

   Aliasing
       Aliases can be named or numbered. They can be scalar-, array-, or hash-
       like.  And they can be applied to either capturing or non-capturing
       constructs. The following sections highlight special features of the
       semantics of some of those combinations.

       Named scalar aliasing to subpatterns

       ·   If a named scalar alias is applied to a set of capturing parens:

		   #	      ______/capturing parens\_____
		   #	     |				   |
		   #	     |				   |
		m:w/ $<key>:=( (<[A..E]>) (\d**{3..6}) (X?) ) /;

	   then the outer capturing parens no longer capture into the array of
	   $/ (like unaliased parens would). Instead the aliased parens
	   capture into the hash of $/; specifically into the hash element
	   whose key is the alias name.

       ·   So, in the above example, a successful match sets "$<key>" (i.e.
	   "$/<key>"), but not $0 (i.e. not $/[0]).

       ·   More specifically:

	   ·   "$/<key>" will contain the "Match" object that would otherwise
	       have been placed in $/[0].

	   ·   "$/<key>[0]" will contain the A-E letter,

	   ·   "$/<key>[1]" will contain the digits,

	   ·   "$/<key>[2]" will contain the optional X.

       ·   Another way to think about this behaviour is that aliased parens
	   create a kind of lexically scoped named subrule: the contents of
	   the parens are treated as if they were part of a separate subrule
	   whose name is the alias.

       Named scalar aliases applied to non-capturing brackets

       ·   If a named scalar alias is applied to a set of non-capturing
	   brackets:

		   #	      ___/non-capturing brackets\__
		   #	     |				   |
		   #	     |				   |
		m:w/ $<key>:=[ (<[A..E]>) (\d**{3..6}) (X?) ] /;

	   then the corresponding "$/<key>" object contains only the string
	   matched by the non-capturing brackets.

       ·   In particular, the array of the "$/<key>" entry is empty. That's
	   because square brackets do not create a nested lexical scope, so
	   the subpatterns are unnested and hence correspond to $0, $1, and
	   $2, and not to "$/<key>[0]", "$/<key>[1]", and "$/<key>[2]".

       ·   In other words:

	   ·   "$/<key>" will contain the complete substring matched by the
	       square brackets (in a "Match" object, as described above),

	   ·   $0 will contain the A-E letter,

	   ·   $1 will contain the digits,

	   ·   $2 will contain the optional X.

       Named scalar aliasing to subrules

       ·   If a subrule is aliased, it assigns its "Match" object to the hash
	   entry whose key is the name of the alias. And it no longer assigns
	   anything to the hash entry whose key is the subrule name. That is:

		if m/ ID\: $<id>:=<ident> / {
		    say "Identified as $/<id>";	   # $/<ident> is undefined
		}

       ·   Hence aliasing a subrule changes the destination of the subrule's
	   "Match" object. This is particularly useful for differentiating two
	   or more calls to the same subrule in the same scope. For example:

		if m:w/ mv <file>+ $<dir>:=<file> / {
		    @from = @{$<file>};
		    $to	  = $<dir>;
		}

       Numbered scalar aliasing

       ·   If a numbered alias is used instead of a named alias:

		m/ $1:=(<-[:]>*) \:  $0:=<ident> /

	   the behaviour is exactly the same as for a named alias (i.e. the
	   various cases described above), except that the resulting "Match"
	   object is assigned to the corresponding element of the appropriate
	   array, rather than to an element of the hash.

       ·   If any numbered alias is used, the numbering of subsequent
	   unaliased subpatterns in the same scope automatically increments
	   from that alias number (much like enum values increment from the
	   last explicit value). That is:

		 #  ---$1---	-$2-	---$6---    -$7-
		 # |	    |  |    |  |	|  |	|
		m/ $1:=(food)  (bard)  $6:=(bazd)  (quxd) /;

       ·   This "follow-on" behaviour is particularly useful for reinstituting
	   Perl 5 semantics for consecutive subpattern numbering in
	   alternations:

		$tune_up = rx/ (don't) (ray) (me) (for) (solar tea), (d'oh!)
			     | $6:=(every) (green) (BEM) (devours) (faces)
			     #		   $7	   $8	 $9	   $10
			     /;

       ·   It also provides an easy way in Perl 6 to reinstitute the unnested
	   numbering semantics of nested Perl 5 subpatterns:

		 # Perl 5...
		 #		 $1
		 #  _____________/\______________
		 # |	$2	    $3	     $4	 |
		 # |  __/\___	____/\____   /\	 |
		 # | |	     | |	  | |  | |
		m/ ( (<[A..E]>) (\d**{3..6}) (X?) ) /;

		 # Perl 6...
		 #		 $0
		 #  _____________/\______________
		 # |  $0[0]	  $0[1]	   $0[2] |
		 # |  __/\___	____/\____   /\	 |
		 # | |	     | |	  | |  | |
		m/ ( (<[A..E]>) (\d**{3..6}) (X?) ) /;

		 # Perl 6 simulating Perl 5...
		 #		   $1
		 #  _______________/\________________
		 # |	    $2		$3	 $4  |
		 # |	  __/\___   ____/\____	 /\  |
		 # |	 |	 | |	      | |  | |
		m/ $1:=[ (<[A..E]>) (\d**{3..6}) (X?) ] /;

	   The non-capturing brackets don't introduce a scope, so the
	   subpatterns within them are at rule scope, and hence numbered at
	   the top level. Aliasing the square brackets to $1 means that the
	   next subpattern at the same level (i.e. the "(<[A..E]>)") is
	   numbered sequentially (i.e. $2), etc.
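       The follow-on rule amounts to a running counter that any explicit
       numbered alias resets, much like enum values.  The minimal Python
       sketch below reproduces the numbers given in the examples above.
       number_subpatterns() is a hypothetical name.

```python
def number_subpatterns(aliases):
    """Return the capture index of each subpattern in one scope.

    aliases lists the subpatterns left to right: an int for an explicit
    numbered alias ($n:=...), None for an unaliased subpattern.
    """
    numbers = []
    next_index = 0                      # numbering starts at $0
    for alias in aliases:
        index = next_index if alias is None else alias
        numbers.append(index)
        next_index = index + 1          # later subpatterns follow on
    return numbers


print(number_subpatterns([1, None, 6, None]))           # prints: [1, 2, 6, 7]
print(number_subpatterns([6, None, None, None, None]))  # prints: [6, 7, 8, 9, 10]
```

       Likewise, number_subpatterns([1, None, None, None]) gives
       [1, 2, 3, 4], which is the numbering of the "Perl 6 simulating Perl 5"
       example.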

       Scalar aliases applied to quantified constructs

       ·   All of the above semantics apply equally to aliases which are bound
	   to quantified structures.

       ·   The only difference is that, if the aliased construct is a subrule
	   or subpattern, that quantified subrule or subpattern will have
	   returned a list of "Match" objects (as described in "Quantified
	   subpattern captures" and "Repeated captures of the same subrule").
	   So the corresponding array element or hash entry for the alias will
	   contain a reference to an array, instead of a single "Match"
	   object.

       ·   In other words, aliasing and quantification are completely
	   orthogonal.	For example:

		if m/ mv $0:=<file>+ / {
		    # <file>+ returns a list of Match objects,
		    # so $0 contains a reference to an array of Match objects,
		    # one for each successful call to <file>

		    # $/<file> does not exist (it's pre-empted by the alias)
		}

		if m/ mv $<from>:=(\S+ \s+)* / {
		    # Quantified subpattern returns a list of Match objects,
		    # so $/<from> contains a reference to an array of Match
		    # objects, one for each successful match of the subpattern

		    # $0 does not exist (it's pre-empted by the alias)
		}

       ·   Note, however, that a set of quantified non-capturing brackets
	   always returns a single "Match" object which contains only the
	   complete substring that was matched by the full set of repetitions
	   of the brackets (as described in "Named scalar aliases applied to
	   non-capturing brackets"). For example:

		"coffee fifo fumble" ~~ m/ $<effs>:=[f <-[f]>**{1..2} \s*]+ /;

		say $<effs>;	# prints "fee fifo fum"

       Array aliasing

       ·   An alias can also be specified using an array as the alias instead
	   of a scalar.  For example:

		m/ mv @<from>:=[(\S+) \s+]* <dir> /;

       ·   Using the "@<alias>:=" notation instead of a "$<alias>:=" mandates
	   that the corresponding hash entry or array element always receives
	   a reference to an array of "Match" objects, even if the construct
	   being aliased would normally return a single "Match" object.	 This
	   is useful for creating consistent capture semantics across
	   structurally different alternations (by enforcing array captures in
	   all branches):

		m:w/ Mr?s? @<names>:=<ident> W\. @<names>:=<ident>
		   | Mr?s? @<names>:=<ident>
		   /;

		# Aliasing to @<names> means $/<names> is always
		# an array reference, so...

		say @{$/<names>};

       ·   For convenience and consistency, "@<key>" can also be used outside
	   a regex, as a shorthand for "@{ $/<key> }". That is:

		m:w/ Mr?s? @<names>:=<ident> W\. @<names>:=<ident>
		   | Mr?s? @<names>:=<ident>
		   /;

		say @<names>;

       ·   If an array alias is applied to a quantified pair of non-capturing
	   brackets, it captures the substrings matched by each repetition of
	   the brackets into separate elements of the corresponding array.
	   That is:

		m/ mv $<files>:=[ f.. \s* ]* /; # $/<files> assigned a single
						# Match object containing the
						# complete substring matched by
						# the full set of repetitions
						# of the non-capturing brackets

		m/ mv @<files>:=[ f.. \s* ]* /; # $/<files> assigned an array,
						# each element of which is a
						# Match object containing
						# the substring matched by the Nth
						# repetition of the non-
						# capturing bracket match

       ·   If an array alias is applied to a quantified pair of capturing
	   parens (i.e. to a subpattern), then the corresponding hash or array
	   element is assigned a list constructed by concatenating the array
	   values of the "Match" objects returned by each repetition of the
	   subpattern. That is, an array alias on a subpattern flattens and
	   collects all nested subpattern captures within the aliased
	   subpattern. For example:

		if m:w/ $<pairs>:=( (\w+) \: (\N+) )+ / {
		    # Scalar alias, so $/<pairs> is assigned an array
		    # of Match objects, each of which has its own array
		    # of two subcaptures...

		    for @{$<pairs>} -> $pair {
			say "Key: $pair[0]";
			say "Val: $pair[1]";
		    }
		}

		if m:w/ @<pairs>:=( (\w+) \: (\N+) )+ / {
		    # Array alias, so $/<pairs> is assigned a flat array
		    # of Match objects: the two subcaptures from every
		    # repetition of the subpattern, concatenated

		    for @{$<pairs>} -> $key, $val {
			say "Key: $key";
			say "Val: $val";
		    }
		}

       ·   Likewise, if an array alias is applied to a quantified subrule,
	   then the hash or array element corresponding to the alias is
	   assigned a list containing the array values of each "Match" object
	   returned by each repetition of the subrule, all flattened into a
	   single array:

		rule pair :w { (\w+) \: (\N+) \n }

		if m:w/ $<pairs>:=<pair>+ / {
		    # Scalar alias, so $/<pairs> contains an array of
		    # Match objects, each of which is the result of the
		    # <pair> subrule call...

		    for @{$<pairs>} -> $pair {
			say "Key: $pair[0]";
			say "Val: $pair[1]";
		    }
		}

		if m:w/ mv @<pairs>:=<pair>+ / {
		    # Array alias, so $/<pairs> contains an array of
		    # Match objects, all flattened down from the
		    # nested arrays inside the Match objects returned
		    # by each match of the <pair> subrule...

		    for @{$<pairs>} -> $key, $val {
			say "Key: $key";
			say "Val: $val";
		    }
		}

       ·   In other words, an array alias is useful to flatten into a single
	   array any nested captures that might occur within a quantified
	   subpattern or subrule.  Whereas a scalar alias is useful to
	   preserve within a top-level array the internal structure of each
	   repetition.

       ·   It is also possible to use a numbered variable as an array alias.
	   The semantics are exactly as described above, with the sole
	   difference being that the resulting array of "Match" objects is
	   assigned into the appropriate element of the rule's match array,
	   rather than to a key of its match hash. For example:

		if m/ mv  \s+  @0:=((\w+) \s+)+	 $1:=((\W+) (\s*))* / {
		    #	       |		 |
		    #	       |		 |
		    #	       |		  \_ Scalar alias, so $1 gets an
		    #	       |		     array, with each element
		    #	       |		     a Match object containing
		    #	       |		     the two nested captures
		    #	       |
		    #		\___ Array alias, so $0 gets a flattened array of
		    #		     just the (\w+) captures from each repetition

		    @from     = @{$0};	    # Flattened list

		    $to_str   = $1[0][0];   # Nested elems of
		    $to_gap   = $1[0][1];   #	 unflattened list
		}

       ·   Note again that, outside a rule, @0 is simply a shorthand for
	   "@{$0}", so the first assignment above could also have been
	   written:

		    @from = @0;
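       On a quantified subpattern, a scalar alias keeps each repetition's
       sub-array, while an array alias concatenates them.  The Python sketch
       below shows the two shapes for the key/value example.  PAIR is a
       hypothetical approximation of "(\w+) \: (\N+)" under ":w", and
       "re.finditer" stands in for the quantified repetition.

```python
import re

# Rough stand-in for  (\w+) \: (\N+)  under :w (one pair per line).
PAIR = re.compile(r"(\w+)[ \t]*:[ \t]*([^\n]+)")


def scalar_alias(text):
    """$<pairs>:=( ... )+ : one [key, value] sub-array per repetition."""
    return [list(m.groups()) for m in PAIR.finditer(text)]


def array_alias(text):
    """@<pairs>:=( ... )+ : all sub-arrays flattened into one list."""
    return [cap for m in PAIR.finditer(text) for cap in m.groups()]


text = "name: Fred\nage: 42\n"
for key, val in scalar_alias(text):           # like: for @{$<pairs>} -> $pair
    print("Key:", key, "Val:", val)

flat = array_alias(text)
for key, val in zip(flat[0::2], flat[1::2]):  # like: -> $key, $val
    print("Key:", key, "Val:", val)
```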

       Hash aliasing

       ·   An alias can also be specified using a hash as the alias variable,
	   instead of a scalar or an array. For example:

		m/ mv %<location>:=( (<ident>) \: (\N+) )+ /;

       ·   A hash alias causes the corresponding hash or array element in the
	   current scope's "Match" object to be assigned a (nested) hash
	   reference (rather than an array reference or a single "Match"
	   object).

       ·   If a hash alias is applied to a subrule or subpattern then the
	   first nested numeric capture becomes the key of each hash entry and
	   any remaining numeric captures become the values (in an array if
	   there is more than one).

       ·   As with array aliases it is also possible to use a numbered
	   variable as a hash alias. Once again, the only difference is where
	   the resulting "Match" object is stored:

		rule one_to_many {  (\w+) \: (\S+) (\S+) (\S+) }

		if m:w/ %0:=<one_to_many>+ / {
		    # $/[0] contains a hash, in which each key is provided by
		    # the first subcapture within one_to_many, and each
		    # value is a reference to an array containing the
		    # subrule's second, third, and fourth, etc. subcaptures...

		    for %{$/[0]} -> $pair {
			say "One:  $pair.key";
			say "Many: { @{$pair.value} }";
		    }
		}

       ·   Outside the rule, %0 is a shortcut for "%{$0}":

		    for %0 -> $pair {
			say "One:  $pair.key";
			say "Many: { @{$pair.value} }";
		    }
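       The key/value rule for hash aliases can be sketched in Python.
       hash_alias() is a hypothetical helper that takes the positional
       captures of each repetition.  The rules above do not say what happens
       when a repetition has no captures beyond the key, or when a key recurs.
       This sketch makes two assumptions: the first case stores an empty
       list, and in the second case the later repetition wins.

```python
def hash_alias(repetitions):
    """Build the hash for %<alias>:=(...)+ from each repetition's captures.

    The first capture becomes the key; the remaining captures become the
    value, as a single item if there is exactly one, otherwise as a list.
    """
    result = {}
    for captures in repetitions:
        key, *rest = captures
        result[key] = rest[0] if len(rest) == 1 else rest
    return result


print(hash_alias([["home", "Sydney"], ["work", "Melbourne"]]))
# prints: {'home': 'Sydney', 'work': 'Melbourne'}
print(hash_alias([["fruit", "apple", "pear", "plum"]]))
# prints: {'fruit': ['apple', 'pear', 'plum']}
```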

       External aliasing

       ·   Instead of using internal aliases like:

		m/ mv  @<files>:=<ident>+  $<dir>:=<ident> /

	   the name of an ordinary variable can be used as an "external
	   alias", like so:

		m/ mv  @files:=<ident>+	 $dir:=<ident> /

       ·   In this case, the behaviour of each alias is exactly as described
	   in the previous sections, except that the resulting capture(s) are
	   bound directly (but still hypothetically) to the variables of the
	   specified name that exist in the scope in which the rule is declared.

   Capturing from repeated matches
       ·   When an entire rule is successfully matched with repetitions
	   (specified via the ":x" or ":g" flag) or overlaps (specified via
	   the ":ov" or ":ex" flag), it will usually produce a series of
	   distinct matches.

       ·   A successful match under any of these flags still returns a single
	   "Match" object in $/. However, the values of this match object are
	   slightly different from those provided by a non-repeated match:

	   ·   The boolean value of $/ after such matches is true or false,
	       depending on whether the pattern matched.

	   ·   The string value is the substring from the start of the first
	       match to the end of the last match (including any intervening
	       parts of the string that the rule skipped over in order to find
	       later matches).

	   ·   There are no array contents or hash entries.

	   For example:

		if $text ~~ m:w:g/ (\S+:) <rocks> / {
		    say "Full match context is: [$/]";
		}

       ·   The list of individual match objects corresponding to each separate
	   match is also available, via the ".matches" method. For example:

		if $text ~~ m:w:g/ (\S+:) <rocks> / {
		    say "Matched { +$/.matches } times";

		    for $/.matches -> $m {
			say "Match between $m.from() and $m.to()";
			say 'Right on, dude!' if $m[0] eq 'Perl';
			say "Rocks like $m<rocks>";
		    }
		}
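       For the ":g" case, the values described above can be mimicked with
       Python's "re.finditer".  The overall string runs from the start of the
       first match to the end of the last.  The individual matches remain
       available separately.  global_match() is a hypothetical helper, and
       "\d+" is just a convenient test pattern.

```python
import re


def global_match(pattern, text):
    """Return (overall_string, [individual matches]) or None if no match."""
    matches = list(re.finditer(pattern, text))
    if not matches:
        return None                     # boolean value: false
    start, end = matches[0].start(), matches[-1].end()
    return text[start:end], matches     # includes skipped-over text


whole, ms = global_match(r"\d+", "a1b22c333d")
print(whole)                             # prints: 1b22c333
print(len(ms), [m.group() for m in ms])  # prints: 3 ['1', '22', '333']
for m in ms:                             # like: for $/.matches -> $m
    print(f"Match between {m.start()} and {m.end()}")
```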

   ":keepall"
       ·   All rules remember everything if ":keepall" is in effect anywhere
	   in the outer dynamic scope.	In this case everything inside the
	   angles is used as part of the key.  Suppose the earlier example
	   parsed whitespace:

		/ <key> <?ws> <'=>'> <?ws> <value> { %hash{$<key>} = $<value> } /

	   The two instances of "<?ws>" above would store an array of two
	   values accessible as "@<?ws>".  It would also store the literal
	   match into "$<'=\>'>".  Just to make sure nothing is forgotten,
	   under ":keepall" any text or whitespace not otherwise remembered is
	   attached as an extra property on the subsequent node. (The name of
	   that property is "pretext".)

Grammars
       ·   Your private "ident" rule shouldn't clobber someone else's "ident"
	   rule.  So some mechanism is needed to confine rules to a namespace.

       ·   If subs are the model for rules, then modules/classes are the
	   obvious model for aggregating them.	Such collections of rules are
	   generally known as "grammars".

       ·   Just as a class can collect named actions together:

		class Identity {
		    method name { "Name = $.name" }
		    method age	{ "Age	= $.age"  }
		    method addr { "Addr = $.addr" }

		    method desc {
			print &.name(), "\n",
			      &.age(),	"\n",
			      &.addr(), "\n";
		    }

		    # etc.
		}

	   so too a grammar can collect a set of named rules together:

		grammar Identity {
		    rule name :w { Name = (\N+) }
		    rule age  :w { Age	= (\d+) }
		    rule addr :w { Addr = (\N+) }
		    rule desc {
			<name> \n
			<age>  \n
			<addr> \n
		    }

		    # etc.
		}

       ·   Like classes, grammars can inherit:

		grammar Letter {
		    rule text	  { <greet> <body> <close> }

		    rule greet :w { [Hi|Hey|Yo] $<to>:=(\S+?) , $$}

		    rule body	  { <line>+ }

		    rule close :w { Later dude, $<from>:=(.+) }

		    # etc.
		}

		grammar FormalLetter is Letter {

		    rule greet :w { Dear $<to>:=(\S+?) , $$}

		    rule close :w { Yours sincerely, $<from>:=(.+) }

		}

       ·   Just like the methods of a class, the rule definitions of a grammar
	   are inherited (and polymorphic!). So there's no need to respecify
	   "body", "line", etc.

       ·   Perl 6 will come with at least one grammar predefined:

		grammar Perl {	  # Perl's own grammar

		    rule prog { <statement>* }

		    rule statement { <decl>
			      | <loop>
			      | <label> [<cond>|<sideff>|;]
		    }

		    rule decl { <sub> | <class> | <use> }

		    # etc. etc. etc.
		}

       ·   Hence:

		given $source_code {
		    $parsetree = m:keepall/<Perl.prog>/;
		}
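       The class analogy can be made concrete with a Python sketch.  Each
       "rule" is a method returning a regex fragment, and the top-level rule
       is assembled through self.  A subclass that overrides greet and close
       therefore changes what the inherited text rule matches.  This is only
       an analogy.  The class and method names mirror the Letter example
       above, the regex translations are approximate, and nothing here
       reflects how Perl 6 actually compiles grammars.

```python
import re


class Letter:
    # Each method plays the part of one rule; text() composes them via self.
    def greet(self):
        return r"(?:Hi|Hey|Yo)\s+(?P<to>\S+?)\s*,\s*\n"

    def line(self):
        return r"[^\n]*\n"

    def body(self):
        return rf"(?:{self.line()})+"

    def close(self):
        return r"Later\s+dude\s*,\s*(?P<from>.+)"

    def text(self):
        return re.compile(self.greet() + self.body() + self.close())


class FormalLetter(Letter):
    # Only greet and close are overridden; body, line and text are inherited.
    def greet(self):
        return r"Dear\s+(?P<to>\S+?)\s*,\s*\n"

    def close(self):
        return r"Yours\s+sincerely\s*,\s*(?P<from>.+)"


m = FormalLetter().text().fullmatch("Dear Fred,\nThanks.\nYours sincerely, Bob")
print(m["to"], m["from"])  # prints: Fred Bob
```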

Syntactic categories
       For writing your own backslash and assertion rules or macros, you may
       use the following syntactic categories:

	    rule rxbackslash:<w> { ... }    # define your own \w and \W
	    rule rxassertion:<*> { ... }    # define your own <*stuff>
	    macro rxmetachar:<,> { ... }    # define a new metacharacter
	    macro rxmodinternal:<x> { ... } # define your own /:x() stuff/
	    macro rxmodexternal:<x> { ... } # define your own m:x()/stuff/

       As with any such syntactic shenanigans, the declaration must be visible
       in the lexical scope to have any effect.	 It's possible the
       internal/external distinction is just a trait, and that some of those
       things are subs or methods rather than rules or macros.	(The numeric
       rxmods are recognized by fallback macros defined with an empty operator
       name.)

Pragmas
       The "rx" pragma may be used to control various aspects of regex
       compilation and usage not otherwise provided for.

Transliteration
       ·   The "tr///" quote-like operator now also has a method form called
	   "trans()".  Its argument is a list of pairs.	 You can use anything
	   that produces a pair list:

		$str.trans( %mapping.pairs.sort );

	   Use the .= form to do a translation in place:

		$str.=trans( %mapping.pairs.sort );

       ·   The two sides of any pair can be strings interpreted as "tr///"
	   would:

		$str.=trans( 'A..C' => 'a..c', 'XYZ' => 'xyz' );

	   As a degenerate case, each side can be individual characters:

		$str.=trans( 'A'=>'a', 'B'=>'b', 'C'=>'c' );

       ·   The two sides of each pair may also be array references:

		$str.=trans( ['A'..'C'] => ['a'..'c'], <X Y Z> => <x y z> );

       ·   The array version can map one-or-more characters to one-or-more
	   characters:

		$str.=trans( [' ',      '<',    '>',    '&'     ] =>
			     ['&nbsp;', '&lt;', '&gt;', '&amp;' ]);

	   In the case that more than one sequence of input characters
	   matches, the longest one wins.  In the case of two identical
	   sequences the first in order wins.

	   There are also method forms of "m//" and "s///":

		$str.match(//);
		$str.subst(//, "replacement");
		$str.subst(//, {"replacement"});
		$str.=subst(//, "replacement");
		$str.=subst(//, {"replacement"});
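       The matching rule for the array form of "trans()" can be sketched in
       Python.  At each position the longest matching input sequence wins,
       and among identical sequences the earliest pair wins.  trans() here is
       a hypothetical helper.  It handles only literal sequences, not
       "tr///"-style ranges such as 'A..C'.

```python
def trans(s, pairs):
    """Replace sequences in s using an ordered list of (from, to) pairs."""
    out = []
    i = 0
    while i < len(s):
        best = None
        for src, dst in pairs:
            # Strictly longer wins, so among equal lengths the first pair stays.
            if src and s.startswith(src, i) and (best is None or len(src) > len(best[0])):
                best = (src, dst)
        if best is None:
            out.append(s[i])     # no pair matches here: copy one character
            i += 1
        else:
            out.append(best[1])
            i += len(best[0])    # output is never rescanned
    return "".join(out)


pairs = [(" ", "&nbsp;"), ("<", "&lt;"), (">", "&gt;"), ("&", "&amp;")]
print(trans("a <b> & c", pairs))  # prints: a&nbsp;&lt;b&gt;&nbsp;&amp;&nbsp;c
```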

Matching against non-strings
       ·   Anything that can be tied to a string can be matched against a
	   rule. This feature is particularly useful with input streams:

		my $stream is from($fh);       # tie scalar to filehandle

		# and later...

		$stream ~~ m/pattern/;	       # match from stream

	   An array can be matched against a rule.  The special "<,>" rule
	   matches the boundary between elements.  If the array elements are
	   strings, they are concatenated virtually into a single logical
	   string.  If the array elements are tokens or other such objects,
	   the objects must provide appropriate methods for the kinds of rules
	   to match against.  It is an assertion error to match a string-
	   matching assertion against an object that doesn't provide a string
	   view.  However, pure token objects can be parsed as long as the
	   match rule restricts itself to assertions like:

		<.isa(Dog)>
		<.does(Bark)>
		<.can('scratch')>

	   It is permissible to mix tokens and strings in an array as long as
	   they're in different elements.  You may not embed objects in
	   strings, however.

	   To match against each element of an array, use a hyper operator:

		@array».match($rule)

perl v5.14.0			  2011-06-17		  Perl6::Bible::S05(3)