DTDReader(3) User Contributed Perl Documentation DTDReader(3)
NAME
XML::Simple::DTDReader - Simple XML file reading based on their DTDs
SYNOPSIS
use XML::Simple::DTDReader;
my $ref = XMLin("data.xml");
Or the object oriented way:
require XML::Simple::DTDReader;
my $xsd = XML::Simple::DTDReader->new;
my $ref = $xsd->XMLin("data.xml");
DESCRIPTION
XML::Simple::DTDReader aims to be an XML::Simple drop-in replacement,
but with several aspects of the module controlled by the XML's DTD.
Specifically, array folding and array forcing are inferred from the
DTD.
Currently, only "XMLin" is supported; support for "XMLout" is planned
for later releases.
XMLin()
Parses XML formatted data and returns a reference to a data structure
which contains the same information in a more readily accessible form.
(Skip down to "EXAMPLES" for sample code.) The XML must have a valid
<!DOCTYPE> declaration.
"XMLin()" accepts an optional XML specifier, which can be one of the
following:
A filename
If the filename contains no directory components "XMLin()" will
look for the file in the current directory. Note, the filename '-'
can be used to parse from STDIN. eg:
$ref = XMLin('/etc/params.xml');
undef
If there is no XML specifier, "XMLin()" will check the script
directory for a file with the same name as the script but with the
extension '.xml'. eg:
$ref = XMLin();
A string of XML
A string containing XML (recognized by the presence of '<' and '>'
characters) will be parsed directly. eg:
$ref = XMLin('<opt username="bob" password="flurp" />');
An IO::Handle object
An IO::Handle object will be read to EOF and its contents parsed.
eg:
use IO::File;
$fh = IO::File->new('/etc/params.xml');
$ref = XMLin($fh);
OPTIONS
Currently, none of XML::Simple's many options are supported.
Support for "ContentKey", "ForceContent", "KeepRoot", "SearchPath", and
"ValueAttr" is planned for future releases.
DTD CONFIGURATION
XML::Simple::DTDReader is able to deal with inline and external DTDs.
Inline DTDs take the form:
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE greeting [
<!ELEMENT greeting (#PCDATA)>
]>
<greeting>Hello, world!</greeting>
External DTDs are either "system" DTDs or "public" DTDs. System DTDs
are of the form:
<?xml version="1.0"?>
<!DOCTYPE greeting SYSTEM "hello.dtd">
<greeting>Hello, world!</greeting>
The path in the external system identifier "hello.dtd" is relative to
the path to the XML file in question, or to the current working
directory if the XML does not come from a file, or the path to the file
cannot be determined.
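The lookup rule above can be sketched roughly as follows. This is only an illustration of the rule, not the module's actual code; the helper name "resolve_system_id" and the sample paths are hypothetical.

```perl
use strict;
use warnings;
use Cwd ();
use File::Basename qw(dirname);
use File::Spec;

# Hypothetical helper, not part of XML::Simple::DTDReader.
# $system_id is the identifier from the DOCTYPE, e.g. "hello.dtd".
# $xml_path is the path of the XML file, or undef when the XML came
# from a string or from a handle whose path is unknown.
sub resolve_system_id {
    my ($system_id, $xml_path) = @_;

    # An absolute identifier needs no base directory.
    return $system_id if File::Spec->file_name_is_absolute($system_id);

    # Prefer the directory of the XML file; otherwise fall back to
    # the current working directory.
    my $base = defined $xml_path ? dirname($xml_path) : Cwd::getcwd();
    return File::Spec->catfile($base, $system_id);
}

print resolve_system_id("hello.dtd", "/srv/data/greeting.xml"), "\n";
print resolve_system_id("hello.dtd", undef), "\n";
```

The second call shows the fallback case, where the result depends on the directory the script is run from.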
Public DTDs take the form:
<?xml version="1.0"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.0//EN"
"http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd">
<svg>
<path d="M202,702l1,-3l7,-3l3,1l3,7l-1,3l-7,4l-3,-1l-3,-8z" />
</svg>
Two properties of the DTD are used by XML::Simple::DTDReader when
determining the final structure of the data: repeated elements and ID
attributes. In the DTD, specifications of the form "element+" or
"element*" will lead to the key "element" mapping to an anonymous
array. This is perhaps best illustrated with an example:
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE data [
  <!ELEMENT data (stuff+)>
  <!ELEMENT stuff (name,other*)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT other (#PCDATA)>
]>
<data>
  <stuff>
    <name>Moose</name>
    <other>Value</other>
  </stuff>
  <stuff>
    <name>Thingy</name>
    <other>Value</other>
    <other>Value2</other>
  </stuff>
</data>
...will map to the data structure:
{
    stuff => [
        {
            name => "Moose",
            other => ["Value"],
        },
        {
            name => "Thingy",
            other => ["Value", "Value2"],
        }
    ]
}
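Because "stuff" and "other" are always arrays here, calling code can dereference them without first checking whether a single element came back as a scalar. A minimal sketch, with the structure written out literally so it runs without parsing anything; the "|| []" guard is an assumption, since this document does not say whether "other*" with zero occurrences yields an empty array or a missing key:

```perl
use strict;
use warnings;

# The structure shown above, written out as a literal.
my $ref = {
    stuff => [
        { name => "Moose",  other => ["Value"] },
        { name => "Thingy", other => ["Value", "Value2"] },
    ],
};

# Build one summary line per <stuff> element.
my @lines;
for my $stuff (@{ $ref->{stuff} }) {
    # Assumption: guard against an absent "other" key.
    my @others = @{ $stuff->{other} || [] };
    push @lines, sprintf "%s: %s", $stuff->{name}, join(", ", @others);
}
print "$_\n" for @lines;
```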
The other element of the DTD that impacts the data structure is ID
attributes. In XML, ID attributes are unique across a file, which is a
more general case of Perl's restriction that keys be unique in a hash.
Hence, the presence of attributes of type ID will cause that layer of
the data to be folded into a hash, based on the value of the ID
attribute as the key. This is, again, best illustrated by example:
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE data [
  <!ELEMENT data (stuff+)>
  <!ELEMENT stuff (name)>
  <!ATTLIST stuff attrib ID #REQUIRED>
  <!ELEMENT name (#PCDATA)>
]>
<data>
  <stuff attrib="first">
    <name>Moose</name>
  </stuff>
  <stuff attrib="second">
    <name>Thingy</name>
  </stuff>
</data>
...will lead to the data structure:
{
    stuff => {
        first => {
            name => "Moose",
            attrib => "first"
        },
        second => {
            name => "Thingy",
            attrib => "second"
        }
    }
}
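With the folded form, an element can be fetched directly by its ID value. A small sketch using the structure above as a literal; note that a Perl hash does not preserve document order, so the keys are sorted here to get a stable listing:

```perl
use strict;
use warnings;

# The folded structure shown above, written out as a literal.
my $ref = {
    stuff => {
        first  => { name => "Moose",  attrib => "first" },
        second => { name => "Thingy", attrib => "second" },
    },
};

# Direct lookup by ID value.
my $name = $ref->{stuff}{second}{name};
print "$name\n";

# Hash key order is unspecified, so sort for a stable listing.
my @ids = sort keys %{ $ref->{stuff} };
print "$_ => $ref->{stuff}{$_}{name}\n" for @ids;
```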
XML::Simple::DTDReader recognizes most ELEMENT content types, with the
exception of mixed content (#PCDATA intermixed with elements) and ANY.
Attempts to parse DTDs describing elements with these types will result
in an error.
ERROR HANDLING
XML::Simple::DTDReader is stricter than XML::Simple when parsing
documents; not only must documents be well-formed, they must also
conform to the DTD specified. XML::Simple::DTDReader will die with an
appropriate message if it encounters a parsing or validation error.
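Since failures are reported with die, callers that want to recover can wrap the call in eval. The following is a sketch under that assumption; the wrapper name "try_xmlin" is hypothetical, and the exact error text is not specified by this document. The module is loaded inside the eval, so a missing installation is reported in the same way as a parse failure.

```perl
use strict;
use warnings;

# Hypothetical wrapper: returns ($ref, undef) on success or
# (undef, $error) on failure.
sub try_xmlin {
    my ($source) = @_;
    my $ref = eval {
        # Loading here means a missing module is caught as well.
        require XML::Simple::DTDReader;
        XML::Simple::DTDReader->new->XMLin($source);
    };
    return (undef, $@ || "unknown error") unless defined $ref;
    return ($ref, undef);
}

# <extra/> is not declared in the DTD, so validation should fail.
my $xml = <<'XML';
<?xml version="1.0"?>
<!DOCTYPE greeting [
  <!ELEMENT greeting (#PCDATA)>
]>
<greeting><extra/></greeting>
XML

my ($ref, $err) = try_xmlin($xml);
if (defined $ref) {
    print "parsed\n";
}
else {
    $err =~ s/\s+\z//;    # trim the trailing newline from die
    print "rejected: $err\n";
}
```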
EXAMPLES
See the "t/" directory of the distribution for a number of example XML
files and the Perl data structures they map to.
BUGS
None currently known, but I'm sure there are several.
AUTHOR
Contact Info
Alex Vandiver : alexmv@mit.edu
Copyright
Copyright (C) 2003 Alex Vandiver. All rights reserved. This package
is free software; you can redistribute it and/or modify it under the
same terms as Perl itself.
perl v5.14.1 2005-07-30 DTDReader(3)