struct::list(n) Tcl Data Structures struct::list(n)______________________________________________________________________________NAME
struct::list - Procedures for manipulating lists
SYNOPSIS
package require Tcl 8.0
package require struct::list ?1.7?
::struct::list longestCommonSubsequence sequence1 sequence2 ?maxOccurs?
::struct::list longestCommonSubsequence2 sequence1 sequence2 ?maxOc‐
curs?
::struct::list lcsInvert lcsData len1 len2
::struct::list lcsInvert2 lcs1 lcs2 len1 len2
::struct::list lcsInvertMerge lcsData len1 len2
::struct::list lcsInvertMerge2 lcs1 lcs2 len1 len2
::struct::list reverse sequence
::struct::list assign sequence varname ?varname?...
::struct::list flatten ?-full? ?--? sequence
::struct::list map sequence cmdprefix
::struct::list mapfor var sequence script
::struct::list filter sequence cmdprefix
::struct::list filterfor var sequence expr
::struct::list split sequence cmdprefix ?passVar failVar?
::struct::list fold sequence initialvalue cmdprefix
::struct::list shift listvar
::struct::list iota n
::struct::list equal a b
::struct::list repeat size element1 ?element2 element3...?
::struct::list repeatn value size...
::struct::list dbJoin ?-inner|-left|-right|-full? ?-keys varname? {key‐
col table}...
::struct::list dbJoinKeyed ?-inner|-left|-right|-full? ?-keys varname?
table...
::struct::list swap listvar i j
::struct::list firstperm list
::struct::list nextperm perm
::struct::list permutations list
::struct::list foreachperm var list body
_________________________________________________________________DESCRIPTION
The ::struct::list namespace contains several useful commands for pro‐
cessing Tcl lists. Generally speaking, they implement algorithms more
complex or specialized than the ones provided by Tcl itself.
It exports only a single command, struct::list. All functionality pro‐
vided here can be reached through a subcommand of this command.
COMMANDS
::struct::list longestCommonSubsequence sequence1 sequence2 ?maxOccurs?
Returns the longest common subsequence of elements in the two
lists sequence1 and sequence2. If the maxOccurs parameter is
provided, the common subsequence is restricted to elements that
occur no more than maxOccurs times in sequence2.
The return value is a list of two lists of equal length. The
first sublist is of indices into sequence1, and the second sub‐
list is of indices into sequence2. Each corresponding pair of
indices corresponds to equal elements in the sequences; the
sequence returned is the longest possible.
::struct::list longestCommonSubsequence2 sequence1 sequence2 ?maxOc‐
curs?
Returns an approximation to the longest common sequence of ele‐
ments in the two lists sequence1 and sequence2. If the maxOc‐
curs parameter is omitted, the subsequence computed is exactly
the longest common subsequence; otherwise, the longest common
subsequence is approximated by first determining the longest
common sequence of only those elements that occur no more than
maxOccurs times in sequence2, and then using that result to
align the two lists, determining the longest common subsequences
of the sublists between the two elements.
As with longestCommonSubsequence, the return value is a list of
two lists of equal length. The first sublist is of indices into
sequence1, and the second sublist is of indices into sequence2.
Each corresponding pair of indices corresponds to equal elements
in the sequences. The sequence approximates the longest common
subsequence.
::struct::list lcsInvert lcsData len1 len2
This command takes a description of a longest common subsequence
(lcsData), inverts it, and returns the result. Inversion means
here that as the input describes which parts of the two
sequences are identical the output describes the differences
instead.
To be fully defined the lengths of the two sequences have to be
known and are specified through len1 and len2.
The result is a list where each element describes one chunk of
the differences between the two sequences. This description is a
list containing three elements, a type and two pairs of indices
into sequence1 and sequence2 respectively, in this order. The
type can be one of three values:
added Describes an addition. I.e. items which are missing in
sequence1 can be found in sequence2. The pair of indices
into sequence1 describes where the added range had been
expected to be in sequence1. The first index refers to
the item just before the added range, and the second
index refers to the item just after the added range. The
pair of indices into sequence2 describes the range of
items which has been added to it. The first index refers
to the first item in the range, and the second index
refers to the last item in the range.
deleted
Describes a deletion. I.e. items which are in sequence1
are missing from sequence2. The pair of indices into
sequence1 describes the range of items which has been
deleted. The first index refers to the first item in the
range, and the second index refers to the last item in
the range. The pair of indices into sequence2 describes
where the deleted range had been expected to be in
sequence2. The first index refers to the item just before
the deleted range, and the second index refers to the
item just after the deleted range.
changed
Describes a general change. I.e a range of items in
sequence1 has been replaced by a different range of items
in sequence2. The pair of indices into sequence1
describes the range of items which has been replaced. The
first index refers to the first item in the range, and
the second index refers to the last item in the range.
The pair of indices into sequence2 describes the range of
items replacing the original range. Again the first index
refers to the first item in the range, and the second
index refers to the last item in the range.
sequence 1 = {a b r a c a d a b r a}
lcs 1 = {1 2 4 5 8 9 10}
lcs 2 = {0 1 3 4 5 6 7}
sequence 2 = {b r i c a b r a c}
Inversion = {{deleted {0 0} {-1 0}}
{changed {3 3} {2 2}}
{deleted {6 7} {4 5}}
{added {10 11} {8 8}}}
Notes:
· An index of -1 in a deleted chunk refers to just before
the first element of the second sequence.
· Also an index equal to the length of the first sequence
in an added chunk refers to just behind the end of the
sequence.
::struct::list lcsInvert2 lcs1 lcs2 len1 len2
Similar to lcsInvert. Instead of directly taking the result of a
call to longestCommonSubsequence this subcommand expects the
indices for the two sequences in two separate lists.
::struct::list lcsInvertMerge lcsData len1 len2
Similar to lcsInvert. It returns essentially the same structure
as that command, except that it may contain chunks of type
unchanged too.
These new chunks describe the parts which are unchanged between
the two sequences. This means that the result of this command
describes both the changed and unchanged parts of the two
sequences in one structure.
sequence 1 = {a b r a c a d a b r a}
lcs 1 = {1 2 4 5 8 9 10}
lcs 2 = {0 1 3 4 5 6 7}
sequence 2 = {b r i c a b r a c}
Inversion/Merge = {{deleted {0 0} {-1 0}}
{unchanged {1 2} {0 1}}
{changed {3 3} {2 2}}
{unchanged {4 5} {3 4}}
{deleted {6 7} {4 5}}
{unchanged {8 10} {5 7}}
{added {10 11} {8 8}}}
::struct::list lcsInvertMerge2 lcs1 lcs2 len1 len2
Similar to lcsInvertMerge. Instead of directly taking the result
of a call to longestCommonSubsequence this subcommand expects
the indices for the two sequences in two separate lists.
::struct::list reverse sequence
The subcommand takes a single sequence as argument and returns a
new sequence containing the elements of the input sequence in
reverse order.
::struct::list assign sequence varname ?varname?...
The subcommand assigns the first n elements of the input
sequence to the one or more variables whose names were listed
after the sequence, where n is the number of specified vari‐
ables.
If there are more variables specified than there are elements in
the sequence the empty string will be assigned to the superflu‐
ous variables.
If there are more elements in the sequence than variable names
specified the subcommand returns a list containing the unas‐
signed elements. Else an empty list is returned.
tclsh> ::struct::list assign {a b c d e} foo bar
c d e
tclsh> set foo
a
tclsh> set bar
b
::struct::list flatten ?-full? ?--? sequence
The subcommand takes a single sequence and returns a new
sequence where one level of nesting was removed from the input
sequence. In other words, the sublists in the input sequence are
replaced by their elements.
The subcommand will remove any nesting it finds if the option
-full is specified.
tclsh> ::struct::list flatten {1 2 3 {4 5} {6 7} {{8 9}} 10}
1 2 3 4 5 6 7 {8 9} 10
tclsh> ::struct::list flatten -full {1 2 3 {4 5} {6 7} {{8 9}} 10}
1 2 3 4 5 6 7 8 9 10
::struct::list map sequence cmdprefix
The subcommand takes a sequence to operate on and a command pre‐
fix (cmdprefix) specifying an operation, applies the command
prefix to each element of the sequence and returns a sequence
consisting of the results of that application.
The command prefix will be evaluated with a single word appended
to it. The evaluation takes place in the context of the caller
of the subcommand.
tclsh> # squaring all elements in a list
tclsh> proc sqr {x} {expr {$x*$x}}
tclsh> ::struct::list map {1 2 3 4 5} sqr
1 4 9 16 25
tclsh> # Retrieving the second column from a matrix
tclsh> # given as list of lists.
tclsh> proc projection {n list} {::lindex $list $n}
tclsh> ::struct::list map {{a b c} {1 2 3} {d f g}} {projection 1}
b 2 f
::struct::list mapfor var sequence script
The subcommand takes a sequence to operate on and a tcl script,
applies the script to each element of the sequence and returns a
sequence consisting of the results of that application.
The script will be evaluated as is, and has access to the cur‐
rent list element through the specified iteration variable var.
The evaluation takes place in the context of the caller of the
subcommand.
tclsh> # squaring all elements in a list
tclsh> ::struct::list mapfor x {1 2 3 4 5} {
expr {$x * $x}
}
1 4 9 16 25
tclsh> # Retrieving the second column from a matrix
tclsh> # given as list of lists.
tclsh> ::struct::list mapfor x {{a b c} {1 2 3} {d f g}} {
lindex $x 1
}
b 2 f
::struct::list filter sequence cmdprefix
The subcommand takes a sequence to operate on and a command pre‐
fix (cmdprefix) specifying an operation, applies the command
prefix to each element of the sequence and returns a sequence
consisting of all elements of the sequence for which the command
prefix returned true. In other words, this command filters out
all elements of the input sequence which fail the test the cmd‐
prefix represents, and returns the remaining elements.
The command prefix will be evaluated with a single word appended
to it. The evaluation takes place in the context of the caller
of the subcommand.
tclsh> # removing all odd numbers from the input
tclsh> proc even {x} {expr {($x % 2) == 0}}
tclsh> ::struct::list filter {1 2 3 4 5} even
2 4
Note: The filter is a specialized application of fold where the
result is extended with the current item or not, depending o
nthe result of the test.
::struct::list filterfor var sequence expr
The subcommand takes a sequence to operate on and a tcl expres‐
sion (expr) specifying a condition, applies the conditionto each
element of the sequence and returns a sequence consisting of all
elements of the sequence for which the expression returned true.
In other words, this command filters out all elements of the
input sequence which fail the test the condition expr repre‐
sents, and returns the remaining elements.
The expression will be evaluated as is, and has access to the
current list element through the specified iteration variable
var. The evaluation takes place in the context of the caller of
the subcommand.
tclsh> # removing all odd numbers from the input
tclsh> ::struct::list filterfor x {1 2 3 4 5} {($x % 2) == 0}
2 4
::struct::list split sequence cmdprefix ?passVar failVar?
This is a variant of method filter, see above. Instead of
returning just the elements passing the test we get lists of
both passing and failing elements.
If no variable names are specified then the result of the com‐
mand will be a list containing the list of passing elements, and
the list of failing elements, in this order. Otherwise the lists
of passing and failing elements are stored into the two speci‐
fied variables, and the result will be a list containing two
numbers, the number of elements passing the test, and the number
of elements failing, in this order.
The interface to the test is the same as used by filter.
::struct::list fold sequence initialvalue cmdprefix
The subcommand takes a sequence to operate on, an arbitrary
string initial value and a command prefix (cmdprefix) specifying
an operation.
The command prefix will be evaluated with two words appended to
it. The second of these words will always be an element of the
sequence. The evaluation takes place in the context of the call‐
er of the subcommand.
It then reduces the sequence into a single value through
repeated application of the command prefix and returns that
value. This reduction is done by
1 Application of the command to the initial value and the
first element of the list.
2 Application of the command to the result of the last call
and the second element of the list.
i
Application of the command to the result of the
last call and the i'th element of the list.
end
Application of the command to the
result of the last call and the last
element of the list. The result of
this call is returned as the result
of the subcommand.
tclsh> # summing the elements in a list.
tclsh> proc + {a b} {expr {$a + $b}}
tclsh> ::struct::list fold {1 2 3 4 5} 0 +
15
::struct::list shift listvar
The subcommand takes the list con‐
tained in the variable named by
listvar and shifts it down one ele‐
ment. After the call listvar will
contain a list containing the second
to last elements of the input list.
The first element of the ist is
returned as the result of the com‐
mand. Shifting the empty list does
nothing.
::struct::list iota n
The subcommand returns a list con‐
taining the integer numbers in the
range [0,n). The element at index i
of the list contain the number i.
For "n == 0" an empty list will be
returned.
::struct::list equal a b
The subcommand compares the two
lists a and b for equality. In other
words, they have to be of the same
length and have to contain the same
elements in the same order. If an
element is a list the same defini‐
tion of equality applies recur‐
sively.
A boolean value will be returned as
the result of the command. This
value will be true if the two lists
are equal, and false else.
::struct::list repeat size element1 ?ele‐
ment2 element3...?
The subcommand creates a list of
length "size * number of elements"
by repeating size times the sequence
of elements element1 element2 ....
size must be a positive integer,
elementn can be any Tcl value. Note
that repeat 1 arg ... is identical
to list arg ..., though the arg is
required with repeat.
Examples:
tclsh> ::struct::list repeat 3 a
a a a
tclsh> ::struct::list repeat 3 [::struct::list repeat 3 0]
{0 0 0} {0 0 0} {0 0 0}
tclsh> ::struct::list repeat 3 a b c
a b c a b c a b c
tclsh> ::struct::list repeat 3 [::struct::list repeat 2 a] b c
{a a} b c {a a} b c {a a} b c
::struct::list repeatn value size...
The subcommand creates a (nested)
list containing the value in all
positions. The exact size and degree
of nesting is determined by the size
arguments, all of which have to be
integer numbers greater than or
equal to zero.
A single argument size which is a
list of more than one element will
be treated as if more than argument
size was specified.
If only one argument size is present
the returned list will not be
nested, of length size and contain
value in all positions. If more
than one size argument is present
the returned list will be nested,
and of the length specified by the
last size argument given to it. The
elements of that list are defined as
the result of Repeat for the same
arguments, but with the last size
value removed.
An empty list will be returned if no
size arguments are present.
tclsh> ::struct::list repeatn 0 3 4
{0 0 0} {0 0 0} {0 0 0} {0 0 0}
tclsh> ::struct::list repeatn 0 {3 4}
{0 0 0} {0 0 0} {0 0 0} {0 0 0}
tclsh> ::struct::list repeatn 0 {3 4 5}
{{0 0 0} {0 0 0} {0 0 0} {0 0 0}} {{0 0 0} {0 0 0} {0 0 0} {0 0 0}} {{0 0 0} {0 0 0} {0 0 0} {0 0 0}} {{0 0 0} {0 0 0} {0 0 0} {0 0 0}} {{0 0 0} {0 0 0} {0 0 0} {0 0 0}}
::struct::list dbJoin
?-inner|-left|-right|-full? ?-keys varname?
{keycol table}...
The method performs a table join
according to relational algebra. The
execution of any of the possible
outer join operation is triggered by
the presence of either option -left,
-right, or -full. If none of these
options is present a regular inner
join will be performed. This can
also be triggered by specifying
-inner. The various possible join
operations are explained in detail
in section TABLE JOIN.
If the -keys is present its argument
is the name of a variable to store
the full list of found keys into.
Depending on the exact nature of the
input table and the join mode the
output table may not contain all the
keys by default. In such a case the
caller can declare a variable for
this information and then insert it
into the output table on its own, as
she will have more information about
the placement than this command.
What is left to explain is the for‐
mat of the arguments.
The keycol arguments are the indices
of the columns in the tables which
contain the key values to use for
the joining. Each argument applies
to the table following immediately
after it. The columns are counted
from 0, which references the first
column. The table associated with
the column index has to have at
least keycol+1 columns. An error
will be thrown if there are less.
The table arguments represent a ta‐
ble or matrix of rows and columns of
values. We use the same representa‐
tion as generated and consumed by
the methods get rect and set rect of
matrix objects. In other words, each
argument is a list, representing the
whole matrix. Its elements are
lists too, each representing a sin‐
gle rows of the matrix. The elements
of the row-lists are the column val‐
ues.
The table resulting from the join
operation is returned as the result
of the command. We use the same rep‐
resentation as described above for
the input tables.
::struct::list dbJoinKeyed
?-inner|-left|-right|-full? ?-keys varname?
table...
The operations performed by this
method are the same as described
above for dbJoin. The only differ‐
ence is in the specification of the
keys to use. Instead of using column
indices separate from the table here
the keys are provided within the ta‐
ble itself. The row elements in each
table are not the lists of column
values, but a two-element list where
the second element is the regular
list of column values and the first
element is the key to use.
::struct::list swap listvar i j
The subcommand exchanges the ele‐
ments at the indices i and j in the
list stored in the variable named by
listvar. The list is modified in
place, and also returned as the
result of the subcommand.
::struct::list firstperm list
This subcommand returns the lexico‐
graphically first permutation of the
input list.
::struct::list nextperm perm
This subcommand accepts a permuta‐
tion of a set of elements (provided
by perm) and returns the next permu‐
tatation in lexicographic sequence.
The algorithm used here is by Donal
E. Knuth, see section REFERENCES for
details.
::struct::list permutations list
This subcommand returns a list con‐
taining all permutations of the
input list in lexicographic order.
::struct::list foreachperm var list body
This subcommand executes the script
body once for each permutation of
the specified list. The permutations
are visited in lexicographic order,
and the variable var is set to the
permutation for which body is cur‐
rently executed. The result of the
loop command is the empty string.
LONGEST COMMON SUBSEQUENCE AND FILE COMPARISON
The longestCommonSubsequence subcommand
forms the core of a flexible system for
doing differential comparisons of files,
similar to the capability offered by the
Unix command diff. While this procedure is
quite rapid for many tasks of file compari‐
son, its performance degrades severely if
sequence2 contains many equal elements (as,
for instance, when using this procedure to
compare two files, a quarter of whose lines
are blank. This drawback is intrinsic to
the algorithm used (see the Reference for
details).
One approach to dealing with the perfor‐
mance problem that is sometimes effective
in practice is arbitrarily to exclude ele‐
ments that appear more than a certain num‐
ber of times. This number is provided as
the maxOccurs parameter. If frequent lines
are excluded in this manner, they will not
appear in the common subsequence that is
computed; the result will be the longest
common subsequence of infrequent elements.
The procedure longestCommonSubsequence2
implements this heuristic. It functions as
a wrapper around longestCommonSubsequence;
it computes the longest common subsequence
of infrequent elements, and then subdivides
the subsequences that lie between the
matches to approximate the true longest
common subsequence.
TABLE JOIN
This is an operation from relational alge‐
bra for relational databases.
The easiest way to understand the regular
inner join is that it creates the cartesian
product of all the tables involved first
and then keeps only all those rows in the
resulting table for which the values in the
specified key columns are equal to each
other.
Implementing this description naively, i.e.
as described above will generate a huge
intermediate result. To avoid this the
cartesian product and the filtering of row
are done at the same time. What is required
is a fast way to determine if a key is
present in a table. In a true database this
is done through indices. Here we use arrays
internally.
An outer join is an extension of the inner
join for two tables. There are three vari‐
ants of outerjoins, called left, right, and
full outer joins. Their result always con‐
tains all rows from an inner join and then
some additional rows.
[1] For the left outer join the addi‐
tional rows are all rows from the
left table for which there is no key
in the right table. They are joined
to an empty row of the right table
to fit them into the result.
[2] For the right outer join the addi‐
tional rows are all rows from the
right table for which there is no
key in the left table. They are
joined to an empty row of the left
table to fit them into the result.
[3] The full outer join combines both
left and right outer join. In other
words, the additional rows are as
defined for left outer join, and
right outer join, combined.
We extend all the joins from two to n
tables (n > 2) by executing
(...((table1 join table2) join table3) ...) join tableN
Examples for all the joins:
Inner Join
{0 foo} {0 bagel} {0 foo 0 bagel}
{1 snarf} inner join {1 snatz} = {1 snarf 1 snatz}
{2 blue} {3 driver}
Left Outer Join
{0 foo} {0 bagel} {0 foo 0 bagel}
{1 snarf} left outer join {1 snatz} = {1 snarf 1 snatz}
{2 blue} {3 driver} {2 blue {} {}}
Right Outer Join
{0 foo} {0 bagel} {0 foo 0 bagel}
{1 snarf} right outer join {1 snatz} = {1 snarf 1 snatz}
{2 blue} {3 driver} {{} {} 3 driver}
Full Outer Join
{0 foo} {0 bagel} {0 foo 0 bagel}
{1 snarf} full outer join {1 snatz} = {1 snarf 1 snatz}
{2 blue} {3 driver} {2 blue {} {}}
{{} {} 3 driver}
REFERENCES
[1] J. W. Hunt and M. D. McIlroy, "An
algorithm for differential file com‐
parison," Comp. Sci. Tech. Rep. #41,
Bell Telephone Laboratories (1976).
Available on the Web at the second
author's personal site:
http://www.cs.dartmouth.edu/~doug/
[2] Donald E. Knuth, "Fascicle 2b of
'The Art of Computer Programming'
volume 4". Available on the Web at
the author's personal site:
http://www-cs-faculty.stan‐
ford.edu/~knuth/fasc2b.ps.gz.
BUGS, IDEAS, FEEDBACK
This document, and the package it
describes, will undoubtedly contain bugs
and other problems. Please report such in
the category struct :: list of the Tcllib
SF Trackers [http://source‐
forge.net/tracker/?group_id=12883]. Please
also report any ideas for enhancements you
may have for either package and/or documen‐
tation.
KEYWORDS
assign, common, comparison, diff, differen‐
tial, equal, equality, filter, first permu‐
tation, flatten, folding, full outer join,
generate permutations, inner join, join,
left outer join, list, longest common sub‐
sequence, map, next permutation, outer
join, permutation, reduce, repeating, repe‐
tition, reverse, right outer join, subse‐
quence, swapping
CATEGORY
Data structures
COPYRIGHT
Copyright (c) 2003-2005 by Kevin B. Kenny. All rights reserved
Copyright (c) 2003-2008 Andreas Kupries <andreas_kupries@users.sourceforge.net>
struct 1.7 struct::list(n)