HTML_FMT(1) User Contributed Perl Documentation HTML_FMT(1)
NAME
"html_fmt" - Reformat HTML, indented according to structure
SYNOPSIS
html_fmt [uri|file]
EXAMPLE
html_fmt http://perl.org
DESCRIPTION
Given the URI or the name of a file, writes it to "STDOUT" reformatted
and indented according to the HTML structure. Missing start and end
tags are supplied and comments added to indicate this. Text inside
"<pre>" elements is not altered.
html_fmt tries to parse everything that is actually out there on the
Web. In fact, html_fmt will assume any file fed to it was intended as
HTML, and will produce its best guess of the author's intent.
html_fmt supplies missing start and end tags. html_fmt's parser is
extremely liberal in what it accepts. When even this liberal reading of
the standards is not sufficient to make a document into valid HTML,
html_fmt picks characters to treat as noise or "cruft". The parser
ignores cruft in determining the structure of the document.
When html_fmt adds a missing start tag, it precedes the new start tag
with a comment. When html_fmt adds a missing end tag, it follows the
new end tag with a comment. When html_fmt classifies characters as
"cruft", it adds a comment to that effect before the "cruft".
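The comment placement rules above can be sketched as follows. This is a minimal illustration in Python, not html_fmt's own code (html_fmt is a Perl program built on Marpa::HTML). The function name annotate and the (kind, text) item format are hypothetical; only the three comment strings are taken from the sample output in this document.

```python
# Hypothetical sketch of the comment placement rules; not html_fmt's code.
# The three comment strings are copied from the SAMPLE OUTPUT section.
MISSING_START = "<!-- Following start tag is replacement for a missing one -->"
MISSING_END = "<!-- Preceding end tag is replacement for a missing one -->"
CRUFT = "<!-- Next line is cruft -->"


def annotate(items):
    """Turn (kind, text) pairs into output lines with explanatory comments.

    kind is an assumed label: "added_start" or "added_end" for tags the
    parser supplied, "cruft" for ignored characters, anything else for
    material that was present in the input.
    """
    lines = []
    for kind, text in items:
        if kind == "added_start":
            lines.append(MISSING_START)  # comment precedes the new start tag
            lines.append(text)
        elif kind == "added_end":
            lines.append(text)
            lines.append(MISSING_END)  # comment follows the new end tag
        elif kind == "cruft":
            lines.append(CRUFT)  # comment precedes the cruft
            lines.append(text)
        else:
            lines.append(text)  # original material is passed through
    return lines


example = [
    ("added_start", "<td>"),
    ("text", "x"),
    ("cruft", '<head attr="I am cruft">'),
    ("start", "<p>"),
    ("text", "Final graf"),
    ("added_end", "</p>"),
]
result = annotate(example)
print("\n".join(result))
```

The sketch ignores indentation and the special handling of "pre" elements; it only shows on which side of the tag or cruft each comment lands.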
"pre" elements receive special treatment. The contents of "pre"
elements are not reformatted. When missing tags or cruft occur inside
a "pre" element, the comments to that effect are placed before the
"<pre>" start tag.
The argument to html_fmt can be either a URI or a file name. If
it starts with alphanumerics followed by a colon, it is treated as a
URI. Otherwise it is treated as a file name.
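The URI-or-file rule can be illustrated with a short sketch. It is written in Python for illustration only. The regular expression is an assumption that reads "alphanumerics followed by a colon" as one or more ASCII letters or digits and then a colon; the pattern the program actually uses is not given here, and the name classify_argument is hypothetical.

```python
import re

# Assumed reading of the rule: one or more ASCII letters or digits at the
# start of the argument, immediately followed by a colon.
URI_PREFIX = re.compile(r"^[A-Za-z0-9]+:")


def classify_argument(arg):
    """Return "uri" if arg looks like a URI under the assumed rule, else "file"."""
    return "uri" if URI_PREFIX.match(arg) else "file"


print(classify_argument("http://perl.org"))  # scheme "http" followed by a colon
print(classify_argument("index.html"))       # no colon at all
print(classify_argument("./notes:old.html")) # colon not preceded only by alphanumerics
```

Under this reading, an argument such as C:\page.html would also be classified as a URI, because it starts with a letter followed by a colon.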
SAMPLE OUTPUT
Given this input:
<title>Test page<tr>x<head attr="I am cruft"><p>Final graf
html_fmt returns:
<!-- Following start tag is replacement for a missing one -->
<html>
<!-- Following start tag is replacement for a missing one -->
<head>
<title>
Test page
</title>
<!-- Preceding end tag is replacement for a missing one -->
</head>
<!-- Preceding end tag is replacement for a missing one -->
<!-- Following start tag is replacement for a missing one -->
<body>
<!-- Following start tag is replacement for a missing one -->
<table>
<!-- Following start tag is replacement for a missing one -->
<tbody>
<tr>
<!-- Following start tag is replacement for a missing one -->
<td>
x
<!-- Next line is cruft -->
<head attr="I am cruft">
<p>
Final graf
</p>
<!-- Preceding end tag is replacement for a missing one -->
</td>
<!-- Preceding end tag is replacement for a missing one -->
</tr>
<!-- Preceding end tag is replacement for a missing one -->
</tbody>
<!-- Preceding end tag is replacement for a missing one -->
</table>
<!-- Preceding end tag is replacement for a missing one -->
</body>
<!-- Preceding end tag is replacement for a missing one -->
</html>
<!-- Preceding end tag is replacement for a missing one -->
PURPOSE
This program is a demo of a demo. Its purpose is to show how easy it is
to write applications which look at the structure of web pages using
Marpa::HTML. And the purpose of Marpa::HTML is to demonstrate the
power of its parse engine, Marpa. Marpa::HTML was written in a few
days, and its logic is a straightforward, natural expression of the
structure of HTML.
ACKNOWLEDGMENTS
The starting template for this code was HTML::TokeParser, by Gisle Aas.
See also the acknowledgments for Marpa as a whole.
LICENSE AND COPYRIGHT
Copyright 2007-2010 Jeffrey Kegler, all rights reserved. Marpa is free
software under the Perl license. For details see the LICENSE file in
the Marpa distribution.
perl v5.20.2 2015-09-16 HTML_FMT(1)