Writer(3) User Contributed Perl Documentation Writer(3)
NAME
XML::SAX::Writer - SAX2 Writer
SYNOPSIS
use XML::SAX::Writer;
use XML::SAX::SomeDriver;
my $w = XML::SAX::Writer->new;
my $d = XML::SAX::SomeDriver->new(Handler => $w);
$d->parse('some options...');
DESCRIPTION
Why yet another XML Writer?
A new XML Writer was needed to match the SAX2 effort because quite
naturally no existing writer understood SAX2. My first intention had
been to start patching XML::Handler::YAWriter as it had previously been
my favourite writer in the SAX1 world.
However the more I patched it the more I realised that what I thought
was going to be a simple patch (mostly adding a few event handlers and
changing the attribute syntax) was turning out to be a rewrite due to
various ideas I'd been collecting along the way. Besides, I couldn't
find a way to elegantly make it work with SAX2 without breaking the
SAX1 compatibility which people are probably still using. There are of
course ways to do that, but most require user interaction which is
something I wanted to avoid.
So in the end there was a new writer. I think it's in fact better this
way as it helps keep SAX1 and SAX2 separated.
METHODS
· new(%hash)
This is the constructor for this object. It takes a number of
parameters, all of which are optional.
· -- Output
This parameter can be one of several things. If it is a simple
scalar, it is interpreted as a filename which will be opened for
writing. If it is a scalar reference, output will be appended to
this scalar. If it is an array reference, output will be pushed
onto this array as it is generated. If it is a filehandle, then
output will be sent to this filehandle.
Finally, it is possible to pass an object for this parameter, in
which case it is assumed to be an object that implements the
consumer interface described later in the documentation.
If this parameter is not provided, then output is sent to STDOUT.
· -- Escape
This should be a hash reference where the keys are character
sequences that should be escaped and the values are the escaped
form of the sequence. By default, this module will escape the
ampersand (&), less than (<), greater than (>), double quote ("),
and apostrophe ('). Note that some browsers don't support the
&apos; escape used for apostrophes, so you should be careful
when outputting XHTML.
If you only want to add entries to the Escape hash, you can first
copy the contents of %XML::SAX::Writer::DEFAULT_ESCAPE.
· -- CommentEscape
Comment content often needs to be escaped differently from other
content. This option works exactly as the previous one except that
by default it only escapes the double dash (--) and that the
contents can be copied from %XML::SAX::Writer::COMMENT_ESCAPE.
· -- EncodeFrom
The character set encoding in which incoming data will be provided.
This defaults to UTF-8, which works for US-ASCII as well.
· -- EncodeTo
The character set encoding in which output should be encoded.
Again, this defaults to UTF-8.
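The Escape option described above is a plain hash of sequences and their replacements. The following is a minimal sketch of how such a table can be extended and applied, using only core Perl. The hash %default_escape is a stand-in written from the defaults listed above (an assumption about the shape of %XML::SAX::Writer::DEFAULT_ESCAPE), and make_escaper is a hypothetical helper, not the module's own implementation.

```perl
use strict;
use warnings;

# Stand-in for %XML::SAX::Writer::DEFAULT_ESCAPE, written from the
# defaults described in the text (assumed shape, not copied from the module).
my %default_escape = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&apos;',
);

# Copy the defaults first, then add an entry of our own.
my %escape = ( %default_escape, "\t" => '&#9;' );

# Hypothetical helper: build one regex from the table's keys and return
# a closure that replaces every match in a single pass.
sub make_escaper {
    my ($table) = @_;
    my $alternation = join '|',
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) } keys %$table;   # longest first
    my $regex = qr/($alternation)/;
    return sub {
        my ($text) = @_;
        $text =~ s/$regex/$table->{$1}/g;
        return $text;
    };
}

my $escape_text = make_escaper( \%escape );
print $escape_text->(q{Tom & "Jerry" <b>}), "\n";
```

Because the substitution runs in a single pass, the ampersands introduced by one replacement are not escaped a second time. A table built this way would be handed to the constructor as Escape => \%escape.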
THE CONSUMER INTERFACE
XML::SAX::Writer can receive pluggable consumer objects that will be in
charge of writing out what is formatted by this module. Setting a
Consumer is done by setting the Output option to the object of your
choice instead of to an array, scalar, or file handle as is more
commonly done (internally those in fact map to Consumer classes and are
simply available as options for your convenience).
If you don't understand this, don't worry. You don't need it most of
the time.
That object can be from any class, but must have two methods in its
API. It is also strongly recommended that it inherits from
XML::SAX::Writer::ConsumerInterface so that it will not break if that
interface evolves over time. There are examples at the end of
XML::SAX::Writer's code.
The two methods that it needs to implement are:
· output STRING
(Required)
This is called whenever the Writer wants to output a string
formatted in XML. Encoding conversion, character escaping, and
formatting have already taken place. It's up to the consumer to do
whatever it wants with the string.
· finalize()
(Optional)
This is called once the document has been output in its entirety,
during the end_document event. end_document will in fact return
whatever finalize() returns, and that in turn should be returned by
parse() for whatever parser was invoked. It might be useful if you
need to provide feedback of some sort.
Here's an example of a custom consumer. Note the extra "$" signs in
front of $self; the base class is optimized for the overwhelmingly
common case where only one data member is required and $self is a
reference to that data member.
package MyConsumer;
@ISA = qw( XML::SAX::Writer::ConsumerInterface );
use strict;
sub new {
    my $self = shift->SUPER::new( my $output );
    $$self = ''; # Note the extra '$'
    return $self;
}
sub output {
    my $self = shift;
    $$self .= uc shift;
}
sub get_output {
    my $self = shift;
    return $$self;
}
And here's one way to use it:
my $c = MyConsumer->new;
my $w = XML::SAX::Writer->new( Output => $c );
## ... send events to $w ...
print $c->get_output;
If you need to store more than one data member, pass in an array or
hash reference:
my $self = shift->SUPER::new( {} );
and access it like:
sub output {
    my $self = shift;
    $$self->{Output} .= uc shift;
}
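The MyConsumer example above does not implement the optional finalize() method. Here is a minimal sketch of a consumer that does, written as a standalone class so that it runs without the module installed. The class name CountingConsumer and its fields are hypothetical, and the calls at the bottom are made by hand to stand in for what the Writer would do during a parse. In real use, inheriting from XML::SAX::Writer::ConsumerInterface is still the recommended route.

```perl
use strict;
use warnings;

package CountingConsumer;    # hypothetical class name

sub new {
    my ($class) = @_;
    return bless { chunks => [], length => 0 }, $class;
}

# Required: receives each string after escaping and encoding are done.
sub output {
    my ( $self, $string ) = @_;
    push @{ $self->{chunks} }, $string;
    $self->{length} += length $string;
    return;
}

# Optional: called once at the end of the document. Whatever is returned
# here is what end_document hands back to the caller.
sub finalize {
    my ($self) = @_;
    return {
        text   => join( '', @{ $self->{chunks} } ),
        length => $self->{length},
    };
}

package main;

# Simulate the calls the Writer would make while a document is written.
my $consumer = CountingConsumer->new;
$consumer->output($_) for '<doc>', 'hello', '</doc>';
my $result = $consumer->finalize;
print "$result->{length} characters: $result->{text}\n";
```

Returning a structure from finalize() is one way to provide the kind of feedback mentioned above, since the value is passed back through end_document.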
THE ENCODER INTERFACE
Encoders can be plugged in to allow one to use one's favourite encoder
object. Presently there are two encoders: Iconv and NullEncoder, and
one based on "Encode" ought to be out soon. They need to implement two
methods, and may inherit from XML::SAX::Writer::NullConverter if they
wish to.
new FROM_ENCODING, TO_ENCODING
Creates a new Encoder. The arguments are the chosen encodings.
convert STRING
Converts that string and returns it.
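As an illustration of the two methods above, here is a minimal sketch of an encoder built on the core Encode module. The class name MyEncodeConverter is hypothetical, and the sketch assumes that convert receives and returns strings of octets; how the Writer actually passes data to its encoder is not specified here, so treat this as a starting point rather than a drop-in replacement.

```perl
use strict;
use warnings;
use Encode ();

package MyEncodeConverter;    # hypothetical class name

# new FROM_ENCODING, TO_ENCODING
sub new {
    my ( $class, $from, $to ) = @_;
    return bless { from => $from, to => $to }, $class;
}

# convert STRING: returns a converted copy, leaving the argument untouched.
sub convert {
    my ( $self, $string ) = @_;
    my $octets = $string;     # Encode::from_to converts in place
    Encode::from_to( $octets, $self->{from}, $self->{to} );
    return $octets;
}

package main;

my $encoder = MyEncodeConverter->new( 'iso-8859-1', 'utf-8' );
my $utf8    = $encoder->convert("caf\xE9");    # Latin-1 octets in
printf "%d octets out\n", length $utf8;
```

Copying the argument before calling Encode::from_to keeps the caller's string intact, since that function modifies its first argument.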
CUSTOM OUTPUT
This module is generally used to write XML -- which it does most of the
time -- but just like the rest of SAX it can be used as a generic
framework to output data, the opposite of a non-XML SAX parser.
Of course there's only so much that one can abstract, so depending on
your format this may or may not be useful. If it is, you'll need to
know the following API (and probably to have a look inside
"XML::SAX::Writer::XML", the default Writer).
init
Called before the writing starts, it's a chance for the subclass to
do some initialisation if it needs it.
setConverter
This is used to set the proper converter for character encodings.
The default implementation should suffice but you can override it.
It must set "$self->{Encoder}" to an Encoder object. Subclasses
*should* call it.
setConsumer
Same as above, except that it is for the Consumer object, and that
it must set "$self->{Consumer}".
setEscaperRegex
Will initialise the escaping regex "$self->{EscaperRegex}" based on
what is needed.
escape STRING
Takes a string and escapes it properly.
setCommentEscaperRegex and escapeComment STRING
These work exactly the same as the two above, except that they are
meant to operate on comment contents, which often have different
escaping rules than those that apply to regular content.
TODO
- proper UTF-16 handling
- make the quote character an option. By default it is here ', but
I know that a lot of people (for reasons I don't understand but
won't question :-) prefer to use ". (on most keyboards " is more
typing, on the rest it's often as much typing).
- the formatting options need to be developed.
- test, test, test (and then some tests)
- doc, doc, doc (actually this part is in better shape)
- add support for Perl 5.7's Encode module so that we can use it
instead of Text::Iconv. Encode is more complete and likely to be
better supported overall. This will be done using a pluggable
encoder (so that users can provide their own if they want to)
and detected both in Makefile.PL requirements and in the module
at runtime.
- remove the xml_decl and replace it with intelligent logic, as
discussed on perl-xml
- make the Consumer selecting code available in the API, to avoid
duplicating
- add an Apache output Consumer, triggered by passing $r as Output
CREDITS
Michael Koehne (XML::Handler::YAWriter) for much inspiration and Barrie
Slaymaker for the Consumer pattern idea, the coderef output option and
miscellaneous bugfixes and performance tweaks. Of course the usual
suspects (Kip Hampton and Matt Sergeant) helped in the usual ways.
AUTHOR
Robin Berjon, robin@knowscape.com
COPYRIGHT
Copyright (c) 2001-2006 Robin Berjon and Perl XML project. All rights
reserved. This program is free software; you can redistribute it
and/or modify it under the same terms as Perl itself.
SEE ALSO
XML::SAX::*
perl v5.14.1 2011-07-19 Writer(3)