Perl6::Bible::S09 man page on Fedora

Man page or keyword search:  
man Server   31170 pages
apropos Keyword Search (all sections)
Output format
Fedora logo
[printable version]

Perl6::Bible::S09(3)  User Contributed Perl Documentation Perl6::Bible::S09(3)

NAME
       Synopsis_09 - Data Structures

AUTHOR
       Larry Wall <larry@wall.org>

VERSION
	 Maintainer: Larry Wall <larry@wall.org>
	 Date: 13 Sep 2004
	 Last Modified: 22 Nov 2005
	 Number: 9
	 Version: 5

Overview
       This synopsis summarizes the non-existent Apocalypse 9, which discussed
       in detail the design of Perl 6 data structures.	It was primarily a
       discussion of how the existing features of Perl 6 combine to make it
       easier for the PDL folks to write numeric Perl.

Lazy lists
       All list contexts are lazy by default.  They still flatten eventually,
       but only when forced to.	 You have to use unary "**" to get a non-lazy
       flattening list context (that is, to flatten immediately like Perl 5).

Sized types
       Sized low-level types are named most generally by appending the number
       of bits to a generic low-level type name:

	   int1
	   int2
	   int4
	   int8
	   int16
	   int32       (aka int on 32-bit machines)
	   int64       (aka int on 64-bit machines)

	   uint1       (aka bit)
	   uint2
	   uint4
	   uint8       (aka byte)
	   uint16
	   uint32
	   uint64

	   num32
	   num64       (aka num on most architectures)
	   num128

	   complex32
	   complex64   (aka complex on most architectures)
	   complex128

       Complex sizes indicate the size of each "num" component rather than the
       total.  This would extend to tensor typenames as well if they're built-
       in types.  Of course, the typical tensor structure is just reflected in
       the dimensions of the array--but the principle still holds that the
       name is based on the number of bits of the simple base type.

       The unsized types "int" and "num" are based on the architecture's
       normal size for "int" and "double" in whatever version of C the run-
       time system (presumably Parrot) is compiled in.	So "int" typically
       means "int32" or "int64", while "num" usually means "num64", and
       "complex" means two of whatever "num" turns out to be.

       You are, of course, free to use macros or type declarations to
       associate additional names, such as "short" or "single".	 These are not
       provided by default.  An implementation of Perl is not required to
       support 64-bit integer types or 128-bit floating-point types unless the
       underlying architecture supports them.

       And yes, an "int1" can store only -1 or 0.  I'm sure someone'll think
       of a use for it...

Compact structs
       A class whose attributes are all low-level types can behave as a
       struct.	(Access from outside the class is still only through
       accessors, though.)  Whether such a class is actually stored compactly
       is up to the implementation, but it ought to behave that way, at least
       to the extent that it's trivially easy (from the user's perspective) to
       read and write to the equivalent C structure.  That is, when byte-
       stringified, it should look like the C struct, even if that's not how
       it's actually represented inside the class.  (This is to be construed
       as a substitute for at least some of the current uses of
       "pack"/"unpack".)

Compact arrays
       In declarations of the form:

	   my bit @bits;
	   my int @ints;
	   my num @nums;
	   my int4 @nybbles;
	   my buf @buffers;
	   my ref[Array] @ragged2d;
	   my complex128 @longdoublecomplex;

       the presence of a low-level type tells Perl that it is free to
       implement the array with "compact storage", that is, with a chunk of
       memory containing contiguous (or as contiguous as practical) elements
       of the specified type without any fancy object boxing that typically
       applies to undifferentiated scalars.  (Perl tries really hard to make
       these elements look like objects when you treat them like objects--this
       is called autoboxing.)

       The declarations above declare one-dimensional arrays of indeterminate
       length.	Such arrays are autoextending just like ordinary Perl arrays
       (at the price of occasionally copying the block of data to another
       memory location).  For many purposes, though, it's useful to define
       array types of a particular size and shape that, instead of
       autoextending, throw an exception if you try to access outside their
       declared dimensionality.	 Such arrays tend to be faster to allocate and
       access as well.

       A multidimensional array is indexed by a semicolon list, which is
       really a list of pipes in disguise.  Each sublist is a slice/pipe of
       one particular dimension.  So

	   @array[0..10; 42; @x]

       is really short for

	   @array.postcircumfix:<[ ]>( <== 0..10 <== 42 <== @x );

       The compiler is free to optimize to something faster when it is known
       that lazy multidimensional subscripts are not necessary.

       Note that

	   @array[@x,@y]

       is always interpreted as a one-dimensional slice in the outermost
       dimension, which is the same as:

	   @array[@x,@y;]

       or more verbosely:

	   @array.postcircumfix:<[ ]>( <== @x,@y );

       To interpolate an array at the semicolon level rather than the comma
       level, use the "[;]" reduce operator:

	   @array[[;] @x]

       which is equivalent to

	   @array.postcircumfix:<[ ]>( <== @x[0] <== @x[1] <== @x[2]...);

       Alternately, use a multislice array, indicated by a ";" twigil:

	   @array[@;x]

       Multislice arrays have the advantage of being able to collect
       individual pipes and keep them distinct:

	   my @;x;
	   @;x <==  %hash.keys.grep: {/^X/};
	   @;x <== =<>;
	   @;x <== 1...;
	   @;x <== gather { loop { take rand 100 } };

	   %hash{@;x}

       To declare a multidimensional array, you may declare it with a
       signature as if it were a function returning one of its entries:

	   my num @nums (Int);	 # one dimension, @nums[Int]

       or alternately:

	   my @nums (Int --> num);   # one dimension, @nums[Int]

       You can use ranges as types:

	   my @nums (0..2 --> num);   # one dimension, @nums[0..2]
	   my @ints (0..3, 0..1 --> int);   # one dimension, @ints[0..3; 0..1]

       That includes the "upto" range type:

	   my @ints (^4, ^2 --> int);	# one dimension, @ints[0..3; 0..1]

       You can pretend you're programming in Fortran, or awk:

	   my int @ints (1..4, 1..2); # two dimensions, @ints[1..4; 1..2]

       Note that this only influences your view of the array, not the actual
       shape of the array.  If you pass this array to another module, it will
       see it as have a shape of "(0..3, 0..1)" unless it also declares a
       variable to view it differently.

       Alternately, you may declare it using a prototype subscript, but then
       you must remember to use semicolons instead of commas to separate
       dimensions, because each slice represents an enumeration of the
       possible values, so the following are all equivalent:

	   my @ints (0..3, 0..1 --> int);
	   my int @ints (0..3, 0..1);
	   my int @ints[^4;^2];
	   my int @ints[0..3; 0..1];
	   my int @ints[0,1,2,3; 0,1];

       You can pass a multislice for the shape as well:

	   my int @ints[[;]@fooshape];
	   my int @ints[@;fooshape];   # same thing

       Again, the "[;]" list operator interpolates a list into a semicolon
       list, which we do for consistency with subscript notation, not because
       it makes a great deal of sense to allow slices for dimensional specs
       (apart from ranges).  So while the following is okay:

	   my int @ints[0,1,2,3,4];    # same as 0..4

       the following is a semantic error that the compiler should catch:

	   my int @ints[^3,^3,^3];     # oops, comma instead of semicolon

       The shape may be supplied entirely by the object at run-time:

	   my num @nums = Array of num.new(:shape(^3;^3;^3));
	   my num @nums .=new():shape(^3;^3;^3); # same thing

       Any dimension of the array may be specified as ""Int"", in which case
       that dimension will autoextend.	Typically this would be used in the
       final dimension to make a ragged array functionally equivalent to an
       array of arrays:

	   my int @ints[^42; Int];
	   push(@ints[41], getsomeints());

       The shape may also be specified by types rather than sizes:

	   my int @ints[Even; Odd];

       or by both:

	   my int @ints[0..100 where Even; 1..99 where Odd];

       (presuming "Even" and "Odd" are types already constrained to be even or
       odd).

PDL support
       An array @array can be tied to a piddle at declaration time:

	   my num @array[@;mytensorshape] is Piddle;
	   my @array is Piddle(:shape(^2;^2;^2;^2)) of int8;

       Piddles are allowed to assume a type of "num" by default rather than
       the usual simple scalar.	 (And in general, the type info is merely made
       available to the "tie" implementation to do with what it will.  Some
       data structures may ignore the "of" type and just store everything as
       general scalars.	 Too bad...)

       Arrays by default are one dimensional, but may be declared to have any
       dimensionality supported by the implementation.	You may use arrays
       just like scalar references--the main caveat is that you have to use
       binding rather than assignment to set one without copying:

	   @b := @a[0...:by(2)]

       With piddles in particular, this might alias each of the individual
       elements rather than the array as a whole.  So modifications to @b are
       likely to be reflected back into @a.  (But maybe the PDLers will prefer
       a different notation for that.)

       The dimensionality of an array may be declared on the variable, but the
       actual dimensionality of the array depends on how it was created.
       Reconciling these views is a job for the particular array
       implementation.	It's not necessarily the case that the declared
       dimensionality must match the actual dimensionality.  It's quite
       possible that the array variable is deliberately declared with a
       different dimensionality to provide a different "view" on the actual
       value:

	   my int @array[^2;^2] is Puddle .= new(:shape(^4) <== 0,1,2,3);

       Again, reconciling those ideas is up to the implementation, "Puddle" in
       this case.  The traits system is flexible enough to pass any metadata
       required, including ideas about sparseness, raggedness, and various
       forms of non-rectangleness such as triangleness.	 The implementation
       should probably carp about any metadata it doesn't recognize though.
       The implementation is certainly free to reject any object that doesn't
       conform to the variable's shape requirements.

Subscript and slice notation
       A subscript indicates a "slice" of an array.  Each dimension of an
       array is sliced separately, so we say a subscript is a semicolon-
       separated list of slice specifiers, also known as a multislice.	A
       three-dimensional slice might look like this:

	   @x[0..10; 1,0; 1...:by(2)]

       It is up to the implementation of @x to decide how aggressively or
       lazily this subscript is evaluated, and whether the slice entails
       copying.	 (The PDL folks will generally want it to merely produce a
       virtual piddle where the new array aliases its values back into the old
       one.)

       Of course, a single element can be selected merely by providing a
       single index value to each slice list:

	   @x[0;1;42]

The semicolon operator
       At the statement level, a semicolon terminates the current expression.
       Within any kind of bracketing construct, semicolon notionally separates
       slices, the interpretation of which depends on the context.  Such a
       semicolon list always provides list context to each of its sublists.
       The storage of these sublists is hidden in the inner workings of the
       list.  It does not produce a list of lists.

       Single dimensional arrays expect simple slice subscripts, meaning they
       will treat a list subscript as a slice in the single dimension of the
       array.  Multi-dimensional arrays, on the other hand, know how to handle
       multiple slices, one for each dimension.	 You need not specify all the
       dimensions; if you don't, the unspecified dimensions are "wildcarded".
       Supposing you have:

	   my num @nums[^3;^3;^3];

       Then

	   @nums[0..2]

       is the same as

	   @nums[0..2;]

       which is the same as

	   @nums[0,1,2;*;*]

       But you should maybe write the last form anyway just for good
       documentation, unless you don't actually know how many more dimensions
       there are.

       If you wanted that 0..2 range to mean

	   @nums[0;1;2]

       instead, then you need to use the "[;]" reduction operator:

	   @nums[[;] 0..2]

       The zero-dimensional slice:

	   @x[]

       is assumed to want everything, not nothing.  It's particularly handy
       because Perl 6 (unlike Perl 5) won't interpolate a bare array without
       brackets:

	   @x = (1,2,3);
	   say "@x = @x[]";    # prints @x = 1 2 3

       Lists are lazy in Perl 6, and the slice lists are no exception.	In
       particular, things like range objects are not flattened until they need
       to be, if ever.	So a PDL implementation is free to steal the values
       from these ranges and "piddle" around with them:

	   @nums[$min..$max:by(3)]
	   @nums[$min..$max]
	   @nums[$min...:by(3)]
	   @nums[1...:by(2)]	       # the odds
	   @nums[0...:by(2)]	       # the evens

       That's all just the standard Perl 6 notation for ranges.	 Additional
       syntactic relief is always available as long as it's predeclared
       somehow.	 It's possible the range operator could be taught that ":2"
       means :by(2), for instance.  (But I rather dislike the RFC-proposed
       "0:10:2" notation that makes colon mean two different things so close
       together, plus it conflicts with Perl 6's general adverb notation if
       the next thing is alphabetic.  On top of which, we're using ":2" as a
       general radix notation.)

       Another thing that's not going to fly easily is simply dropping out
       terms.  Perl depends rather heavily on knowing when it's expecting a
       term or an operator, and simply leaving out terms before or after a
       binary operator really screws that up.  For instance,

	   0..:by(2)

       parses as

	   0 .. (by => 2)

       rather than

	   0 .. Inf :by(2)

       That why we have postfix "..." to mean "..Inf".	But then if you leave
       out the first argument:

	   ...:by(2)

       you've written the yada-yada-yada operator, which is actually a term
       that will not produce an infinite range for you.	 Don't do that.

       Maybe you should just find some nice Unicode characters for your
       operators...

PDL signatures
       To rewrite a Perl 5 PDL definition like this:

	      pp_def(
		   'inner',
		   Pars => 'a(n); b(n); [o]c(); ', # the signature, see above
		   Code => 'double tmp = 0;
			    loop(n) %{ tmp += $a() * $b(); %}
			    $c() = tmp;' );

       you might want to write a macro that parses something vaguely
       resembling this:

	   role PDL_stuff[::TYPE] {
	       PDLsub inner (@a[$n], @b[$n] --> @c[]) {
		   my TYPE $tmp = 0;
		   for ^$n {
		       $tmp += @a[$_] * @b[$_];
		   }
		   @c[] = tmp;
	       }
	   }

       where that turns into something like this:

	   role PDL_stuff[::TYPE] {
	       multi sub inner (TYPE @a, TYPE @b --> TYPE) {
		   my $n = @a.shape[0];	       # or maybe $n is just a parameter
		   assert($n == @b.shape[0]);  #  and this is already checked by PDL
		   my TYPE $tmp = 0;
		   for ^$n {
		       $tmp += @a[$_] * @b[$_];
		   }
		   return $tmp;
	       }
	   }

       Then any class that "does PDL_stuff[num]" has an "inner()" function
       that can (hopefully) be compiled down to a form useful to the PDL
       threading engine.  Presumably the macro also stores away the PDL
       signature somewhere safe, since the translated code hides that
       information down in procedural code.  Possibly some of the "[n]"
       information can come back into the signature via "where" constraints on
       the types.  This would presumably make multimethod dispatch possible on
       similarly typed arrays with differing constraints.

       (The special destruction problems of Perl 5's PDL should go away with
       Perl 6's GC approach, as long as PDL's objects are registered with
       Parrot correctly.)

Junctions
       A junction is a superposition of data values pretending to be a single
       data value.  Junctions come in four varieties:

	   list op     infix op
	   =======     ========
	   any()       |
	   all()       &
	   one()       ^
	   none()      (no "nor" op defined)

       Note that the infix ops are "list-associative", insofar as

	   $a | $b | $c
	   $a & $b & $c
	   $a ^ $b ^ $c

       mean

	   any($a,$b,$c)
	   all($a,$b,$c)
	   one($a,$b,$c)

       rather than

	   any(any($a,$b),$c)
	   all(all($a,$b),$c)
	   one(one($a,$b),$c)

       Some contexts, such as boolean contexts, have special rules for dealing
       with junctions.	In any scalar context not expecting a junction of
       values, a junction produces automatic parallelization of the algorithm.
       In particular, if a junction is used as an argument to any routine
       (operator, closure, method, etc.), and the scalar parameter you are
       attempting to bind the argument to is inconsistent with the Junction
       type, that routine is "autothreaded", meaning the routine will be
       called automatically as many times as necessary to process the
       individual scalar elements of the junction in parallel.

       The results of these separate calls are then recombined into a single
       junction of the same species as the junctive argument.  If two or more
       arguments are junctive, then the argument that is chosen to be
       "autothreaded" is:

       ·   the left-most conjunction or injunction (if any), or else

       ·   the left-most abjunction or disjunction

       with the tests applied in that order.

       Each of the resulting set of calls is then recursively autothreaded
       until no more junctive arguments remain. That is:

	      substr("camel", 0|1, 2&3)

	   -> all( substr("camel", 0|1, 2),	 # autothread the conjunctive arg
		   substr("camel", 0|1, 3)
		 )

	   -> all( any( substr("camel", 0, 2),	 # autothread the disjunctive arg
			substr("camel", 1, 2),
		      ),
		   any( substr("camel", 0, 3),	 # autothread the disjunctive arg
			substr("camel", 1, 3),
		      )
		 )

	   -> all( any( "ca",			 # evaluate
			"am",
		      ),
		   any( "cam",
			"ame",
		      )

	   -> ("ca"|"am") & ("cam"|"ame")	 # recombine results in junctions

       Junctions passed as part of a container do not cause autothreading
       unless individually pulled out and used as a scalar.  It follows that
       junctions passed as members of a "slurpy" array or hash do not cause
       autothreading on that parameter.	 Only individually declared parameters
       may autothread.	(Note that positional array and hash parameters are in
       fact scalar parameters, though, so you could pass a junction of array
       or hash references.)

Parallelized parameters and autothreading
       Within the scope of a "use autoindex" pragma (or equivalent, such as
       "use PDL" (maybe)), any closure that uses parameters as subscripts is
       also a candidate for autothreading.  For each such parameter, the
       compiler supplies a default value that is a range of all possible
       values that subscript can take on (where "possible" is taken to mean
       the declared shape of a shaped array, or the actual shape of an
       autoextending array).  That is, if you have a closure of the form:

	   -> $x, $y { @foo[$x;$y] }

       then the compiler adds defaults for you, something like:

	   -> $x = @foo.shape[0].range,
	      $y = @foo.shape[1].range { @foo[$x;$y] }

       where each such range is autoiterated for you.

       In the abstract (and often in the concrete), this puts an implicit loop
       around the block of the closure that visits all the possible subscript
       values for that dimension (unless the parameter is actually supplied to
       the closure, in which case that is what is used as the slice
       subscript).

       So to write a typical tensor multiplication:

	   Cijkl = Aij * Bkl

       you can just write this:

	   use autoindex;
	   do { @c[$^i, $^j, $^k, $^l] = @a[$^i, $^j] * @b[$^k, $^l] };

       or equivalently:

	   -> $i, $j, $k, $l { @c[$i, $j, $k, $l] = @a[$i, $j] * @b[$k, $l] }();

       or even:

	   do -> $i, $j, $k, $l {
	       @c[$i, $j, $k, $l] = @a[$i, $j] * @b[$k, $l]
	   }

       That's almost pretty.

       It is erroneous for an unbound parameter to match multiple existing
       array subscripts differently.  (Arrays being created don't count.)

       Note that you could pass any of $i, $j, $k or $l explicitly, or prebind
       them with a ".assuming" method, in which only the unbound parameters
       autothread.

       If you use an unbound array parameter as a semicolon-list interpolator
       (via the "[;]" reduction operator), it functions as a wildcard list of
       subscripts that must match the same everywhere that parameter is used.
       For example,

	   do -> @wild { @b[[;] reverse @wild] = @a[[;] @wild]; };

       produces an array with the dimensions reversed regardless of the
       dimensionality of @a.

       The optimizer is, of course, free to optimize away any implicit loops
       that it can figure out how to do more efficiently without changing the
       semantics.

       See RFC 207 for more ideas on how to use autothreading (though the
       syntax proposed there is rather different).

Hashes
       Everything we've said for arrays applies to hashes as well, except that
       if you're going to limit the keys of one dimension of a hash, you have
       to provide an explicit list of keys to that dimension of the shape, or
       an equivalent range:

	   my num %hash{<a b c d e f>; Str};
	   my num %hash{'a'..'f'; Str};		       # same thing

       To declare a hash that can take any object as a key rather than just a
       string, say something like:

	   my %hash{Any};

       Likewise, you can limit the keys to objects of particular types:

	   my Fight %hash{Dog;Cat};

       The standard Hash is just

	   my Any %hash{Str};

       Note that any type used as a key must be intrinsically immutable, or it
       has to be able to make a copy that functions as an immutable key, or it
       has to have copy-on-write semantics.  It is erroneous to change a key
       object's value within the hash except by deleting it and reinserting
       it.

Autosorted hashes
       The default hash iterator is a property called ".iterator" that can be
       user replaced.  When the hash itself needs an iterator for ".pairs",
       ".keys", ".values", or ".kv", it calls "%hash.iterator()" to start one.
       In scalar context, ".iterator" returns an iterator object.  In list
       context, it returns a lazy list fed by the iterator.  It must be
       possible for a hash to be in more than one iterator at at time, as long
       as the iterator state is stored in a lazy list.	However, there is only
       one implicit iterator (the "each" iterator) that works in scalar
       context to return the next pair.	 [Or maybe not.]

       The downside to making a hash autosort via the iterator is that you'd
       have to store all the keys in sorted order, and resort it when the hash
       changes.	 Alternately, the entire hash could be tied to an ISAM
       implementation (not included (XXX or should it be?)).

       For multidimensional hashes, the key returned by any hash iterator is a
       list of keys, the size of which is the number of declared dimensions of
       the hash.  [XXX but this seems to imply another lookup to find the
       value.  Perhaps the associated value can also be bundled in somehow.]

Autovivification
       Autovivification will only happen if the vivifiable path is used as a
       container, by binding, assigning, or taking a reference. On the other
       hand, value extraction does not autovivify.

       This is as opposed to Perl 5, where autovivification could happen
       unintentionally, even when the code looks like a non-destructive test:

	   my %hash;
	   exists $hash{foo}{bar}; # creates $hash{foo} as an empty hash reference

       In Perl 6 these read-only operations are indeed non-destructive:

	   my %hash;
	   exists $hash{foo}{bar}; # %hash is still empty

       But these ones do autovivify:

	   my %hash;
	   my $val := $hash{foo}{bar};

	   my @array;
	   my $ref = \$array[0][0];

	   my %hash;
	   $hash{foo}{bar} = "foo"; # duh

       This rule applies to dereferencing arrays, hashes, and scalar
       references.

perl v5.14.0			  2006-02-28		  Perl6::Bible::S09(3)
[top]

List of man pages available for Fedora

Copyright (c) for man pages and the logo by the respective OS vendor.

For those who want to learn more, the polarhome community provides shell access and support.

[legal] [privacy] [GNU] [policy] [cookies] [netiquette] [sponsors] [FAQ]
Tweet
Polarhome, production since 1999.
Member of Polarhome portal.
Based on Fawad Halim's script.
....................................................................
Vote for polarhome
Free Shell Accounts :: the biggest list on the net