HTML::PrettyPrinter man page on Fedora

PrettyPrinter(3)      User Contributed Perl Documentation     PrettyPrinter(3)

NAME
	HTML::PrettyPrinter - generate nice HTML files from HTML syntax trees

SYNOPSIS
	 use HTML::TreeBuilder;
	 # generate a HTML syntax tree
	 my $tree = new HTML::TreeBuilder;
	 $tree->parse_file($file_name);
	 # modify the tree if you want

	 use HTML::PrettyPrinter;
	 my $hpp = new HTML::PrettyPrinter ('linelength' => 130,
					    'quote_attr' => 1);
	 # configure
	 $tree->address("0.1.0")->attr('_hpp_indent',0);  # for an individual element
	 $hpp->set_force_nl(1,qw(body head));		  # for tags
	 $hpp->set_force_nl(1,qw(@SECTIONS));		  # as above
	 $hpp->set_nl_inside(0,'default!');		  # for all tags

	 # format the source
	 my $linearray_ref = $hpp->format($tree);
	 print @$linearray_ref;

	 # alternative: print directly to filehandle
	 use FileHandle;
	 my $fh = FileHandle->new(">$file_name2");
	 if (defined $fh) {
	   $hpp->select($fh);
	   $hpp->format($tree);
	   undef $fh;
	   $hpp->select(undef);
	 }

DESCRIPTION
       HTML::PrettyPrinter produces nicely formatted HTML code from an HTML
       syntax tree. It is especially useful if the produced HTML file is to
       be read or edited manually afterwards. Various parameters let you adapt
       the output to different styles and requirements.

       If you don't care what the HTML source looks like as long as it is
       valid and readable by browsers, you should use the as_HTML() method of
       HTML::Element instead of the pretty printer. It is about five times
       faster.

       The pretty printer handles line wrapping, indentation and structuring
       through the way the whitespace in the tree is represented in the
       output.  Furthermore, upper/lowercase markup, markup minimization,
       quoting of attribute values, the encoding of entities and the presence
       of optional end tags are configurable.

       There are two types of parameters to influence the output: individual
       parameters that are set on a per-element and per-tag basis, and common
       parameters that are set only once for each instance of a pretty
       printer.

       In order to facilitate the configuration, a mechanism to handle tag
       groups is provided. Thus, it is possible to modify a parameter for a
       group of tags (e.g. all known block elements) without writing each tag
       name explicitly.  Perhaps the code for tag groups will move to another
       Perl module in the future.

       For HTML::Elements that require special treatment, like <PRE>, <XMP>,
       <SCRIPT>, comments and declarations, the pretty printer falls back to
       the "as_HTML()" method of the HTML elements.

INDIVIDUAL PARAMETERS
       The following individual parameters exist:

       indent n
	   The indent of new lines inside the element is increased by n
	   columns. Default is 2 for all tags.

       skip bool
	   If true, the element and its content are omitted from the output.
	   Default is false.

       nl_before n
	   Number of newlines before the start tag. Default is 0 for inline
	   elements and 1 for other elements.

       nl_inside n
	   Number of newlines between the tags and the contents of an element.
	   Default is 0.

       nl_after n
	   Number of newlines after an element. Default is 0 for inline
	   elements and 1 for other elements.

       force_nl bool
	   Force linebreaks before and after an element even if the HTML tree
	   does not contain whitespace at this place. Default is false for
	   inline elements and true for all other elements. This parameter is
	   superseded if the common parameter allow_forced_nl is set to false.

       endtag bool
	   Print an optional endtag. Default is true.

   Access Methods
       The following access methods exist for each individual parameter.
       Replace parameter by the respective name.

       $hpp->parameter($element)
	   Takes a reference to an HTML element as argument. Returns the value
	   of the parameter for that element. The priority to retrieve the
	   value is:

	   1.  The value of the element's internal attribute "_hpp_parameter".

	   2.  The value specified inside the pretty printer for the tag of
	       the element.

	   3.  The value specified inside the pretty printer for 'default!'.

       $hpp->parameter('tag')
	   Like "parameter($element)", except that only priorities 2 and 3 are
	   evaluated.

       $hpp->set_parameter($value,'tag1','tag2',...)
	   Sets the parameter for each tag in the list to $value.

	   If $value is undefined, the entries for the tags are deleted.

	   Besides individual tags, the list may include tag groups like
	   '@BLOCK' (see below) and 'default!'. Individual tag names are
	   written in lower case; the names of tag groups start with an '@'
	   and are written in upper case letters. Tag groups are expanded
	   during the call of "set_parameter()".  'default!' sets the
	   default value, which is retrieved if no value is defined for the
	   individual element or tag.

       $hpp->set_parameter($value,'all!')
	   Deletes all existing settings for parameter inside the pretty
	   printer and sets the default to $value.
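
       The lookup priority described above can be modelled in a few lines of
       plain Perl. This is only a sketch of the documented order, not the
       module's implementation: the table %indent_for_tag, the helper
       lookup_indent() and the hash-based elements are hypothetical stand-ins
       for the pretty printer's internal state and for HTML::Element objects.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the values stored inside the pretty printer
# for one parameter (here: indent), keyed by tag name.
my %indent_for_tag = ('default!' => 2, 'pre' => 0);

# Hypothetical helper: resolve the indent for an element, which is
# modelled here as a plain hash with a _tag key.
sub lookup_indent {
    my ($element) = @_;

    # 1. the element's own internal attribute wins
    return $element->{_hpp_indent} if defined $element->{_hpp_indent};

    # 2. otherwise the value stored for the element's tag
    my $tag = $element->{_tag};
    return $indent_for_tag{$tag} if defined $indent_for_tag{$tag};

    # 3. otherwise the value stored for 'default!'
    return $indent_for_tag{'default!'};
}

print lookup_indent({ _tag => 'p' }), "\n";                       # 2
print lookup_indent({ _tag => 'pre' }), "\n";                     # 0
print lookup_indent({ _tag => 'pre', _hpp_indent => 4 }), "\n";   # 4
```

       With the real module, step 1 corresponds to setting "_hpp_indent" on
       an element and steps 2 and 3 to calls of "set_indent()".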

COMMON PARAMETERS
       tabify n
	   If non-zero, each n spaces at the beginning of a line are converted
	   into one TAB. Default is 8.

       linelength n
	   The maximum number of characters a line should have. Default is 80.

	   The linelength may be exceeded if there is no proper way to break a
	   line without modifying the content, e.g. inside <PRE> and other
	   special elements or if there is no whitespace.

       min_bool_attr bool
	   Minimize boolean attributes, e.g. print <UL COMPACT> instead of <UL
	   COMPACT=COMPACT>. Default is true.

       quote_attr bool
	   Always quote attribute values. If false, attribute values
	   consisting entirely of letters, digits, periods and hyphens are
	   not put into quotes. Default is false.

       entities string
	   The string contains all characters that are escaped to their entity
	   names.  Default is the bare minimum of "&<>" plus the non-breaking
	   space 'nbsp' (because otherwise it is difficult for the human eye
	   to distinguish it from a normal space in most editors).

       wrap_at_tagend NEVER|AFTER_ATTR|ALWAYS
	   May the pretty printer wrap lines before the closing angle bracket
	   of a start tag?  Supported values are the predefined constants
	   NEVER (allow line wraps at white space only), AFTER_ATTR (allow
	   line wraps only at the end of tags that contain attributes) and
	   ALWAYS (allow line wraps at the end of every start tag). Default
	   is AFTER_ATTR.

       allow_forced_nl bool
	   Allow the addition of white space that is not in the HTML tree.
	   If set to false (the default), the force_nl parameter is ignored.
	   It is recommended to set this parameter to true if the HTML tree
	   was generated with ignore_ignorable_whitespace set to true.

       uppercase bool
	   Use uppercase letters for markup. Default is the value of
	   $HTML::Element::html_uc at the time the constructor is called.

   Access Method
       $hpp->parameter([value])
	   Retrieves and optionally sets the parameter.
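
       To make the effect of the tabify parameter concrete, the following
       plain Perl sketch converts leading spaces the way the description
       above reads. The helper tabify_line() is hypothetical and is not part
       of the module; it only illustrates the rule "each n leading spaces
       become one TAB", with any remainder kept as spaces.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of HTML::PrettyPrinter: replace every
# complete group of $n leading spaces with one TAB. A value of 0 for $n
# leaves the line untouched.
sub tabify_line {
    my ($line, $n) = @_;
    return $line unless $n;

    my ($lead) = $line =~ /^( *)/;           # leading spaces only
    my $tabs   = int(length($lead) / $n);    # number of complete groups
    return ("\t" x $tabs) . substr($line, $tabs * $n);
}

# Ten leading spaces with n = 8 become one TAB plus two spaces.
my $line = tabify_line((' ' x 10) . '<TD>', 8);
print $line eq "\t  <TD>" ? "ok\n" : "not ok\n";
```

       Calling $hpp->tabify(0) on a real pretty printer disables the
       conversion, which is convenient when the output is viewed with a tab
       width other than 8.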

OTHER METHODS
       $hpp = HTML::PrettyPrinter->new(%common_parameters)
	   This class method creates a new HTML::PrettyPrinter and returns it.
	   Key/value pair arguments may be provided to override the default
	   settings of common parameters. There is currently no mechanism to
	   override the default values for individual parameters at
	   construction. Use the "$hpp->set_parameter()" methods instead.

       $hpp->select($fh)
	   Select a FileHandle object for output.

	   If a FileHandle is selected the generated HTML is printed directly
	   to that file. With $hpp->select(undef) you can switch back to the
	   default behaviour.

       $line_array_ref = $hpp->format($tree,[$indent],[$line_array_ref])
	   Format the HTML syntax (sub-) tree.

	   $tree is not restricted to the root of the HTML syntax tree. A
	   reference to any HTML::Element will do.

	   The optional $indent indents the first element by n characters.

	   Return value is the reference to an array with the generated lines.
	   If such a reference is provided as third argument, the lines will
	   be appended to that array. Otherwise a new array will be created.

	   If a FileHandle is selected by a previous call of the
	   "$hpp->select($fh)" method, the lines are printed to the FileHandle
	   object directly.  The array of lines is not changed in this case.

TAG GROUPS
       Tag groups are lists that contain the names of tags and other tag
       groups which are considered as subsets. This reflects the way allowed
       content is specified in HTML DTDs, where e.g. %flow consists of all
       %block and %inline elements and %inline covers several subsets like
       %phrase.

       If you add a tag name to a group A, it will be seen in any group that
       contains group A. Thus, it is easy to maintain groups of tags with
       similar properties (and to configure the HTML pretty printer for these
       tags).

       The names of tag groups are written in upper case letters with a
       leading '@' (e.g. '@BLOCK'). The names of simple tags are written all
       lower case.

   Functions
       All the functions to handle and modify tag groups are included in the
       @EXPORT_OK list of "HTML::PrettyPrinter".

       @tag_groups = list_groups()
	   Returns a list with the names of all defined tag groups.

       @tags = group_expand('tag_or_tag_group0',['tag_or_tag_group1',...])
	   Returns a list of every tag in the tag groups and their subgroups.
	   Each tag is listed once only. The order of the list is not
	   specified.

       @tag_groups = sub_group('tag_group0',['tag_group1',...])
	   Returns a list of every tag group and sub group in the list.	 Each
	   group is listed once only. The order of the list is not specified.

       group_get('@NAME')
	   Return the (unexpanded) contents of a tag group.

       "group_set('@NAME',['tag_or_tag_group0',...])"
	   Set a tag group.

       "group_add('@NAME','tag_or_tag_group0',['tag_or_tag_group1',...])"
	   Add tags and tag groups to a group.

       "group_remove('@NAME','tag_or_tag_group0',['tag_or_tag_group1',...])"
	   Remove tags or tag groups from a group. Subgroups are not expanded.
	   Thus, "group_remove('@A','@B')" will remove '@B' from '@A' if it is
	   included directly. Tags included in '@B' will not be removed from
	   '@A'.  Nor will '@A' be changed if '@B' is included in a subgroup
	   of '@A' but not in '@A' directly.

   Predefined Tag Groups
       There are a couple of predefined tag groups. Use

	 foreach my $tg (list_groups()) {
	   print "'$tg' => qw(".join(',',group_get($tg)).")\n";
	 }

       to get a list.

   Examples for tag groups
       1. create some groups

	     group_set('@A',qw(a1 a2 a3));
	     group_set('@B',qw(b1 b2));
	     group_set('@C',qw(@A @B c1 @D));
	     # @D needs to be defined when @C is expanded
	     group_set('@D',qw(d1 @B));
	     group_set('@E',qw(e1 @D));
	     group_set('@F',qw(f1 @A));

       2. add tags

	     group_add('@A',qw(a4 a5)); # @A contains (a1 a2 a3 a4 a5)
	     group_add('@D',qw(d1));	# @D contains (d1 @B d1)
	     group_add('@F',group_expand('@B'),'@F');
	     # @F contains (f1 @A b1 b2 f1 @F)

       3. evaluate

	     group_expand('@E');     # returns e1, d1, b1, b2
	     sub_group('@E');	     # returns @B, @D
	     sub_group(qw(@E @F));   # returns @A, @B, @D
	     group_get('@F');	     # returns f1, @A, b1, b2, f1, @F

       4. remove tags

	     group_remove('@E','@C');  # @E not changed, because it doesn't
				       # contain @C
	     group_remove('@E','@D');  # @D removed from @E
	     group_remove('@D','d1');  # all d1's are removed. Now @D contains
				       # @B only
	     group_remove('@C','@B');  # @C now contains (@A c1 @D). Thus
	     sub_group('@C');	       # still returns @A, @B, @D,
				       # because @B is included in @D, too

       5. application

	     # set the indent for tags b1, b2, e1, g1 to 0
	     $hpp->set_indent(0,qw(@D @E g1));

	   If the groups @D or @E are modified afterwards, the configuration
	   of the pretty printer is not affected, because "set_indent()" will
	   expand the tag groups.
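
       Why later changes to a group do not affect the printer can be seen in
       a small plain Perl model. This is a sketch under stated assumptions,
       not the module's code: the table %group, the helper expand() and the
       hash %indent are hypothetical stand-ins for the tag group registry,
       for group expansion and for the printer's stored settings. The groups
       mirror the state after example 4.

```perl
use strict;
use warnings;

# Hypothetical group registry, mirroring @B, @D and @E after example 4.
my %group = (
    '@B' => [qw(b1 b2)],
    '@D' => ['@B'],
    '@E' => ['e1'],
);

# Hypothetical recursive expansion: names starting with '@' are groups,
# everything else is a plain tag. Each name is visited once only, which
# also protects against groups that contain themselves.
sub expand {
    my ($seen, @names) = @_;
    my @tags;
    for my $name (@names) {
        next if $seen->{$name}++;
        if ($name =~ /^\@/) {
            push @tags, expand($seen, @{ $group{$name} || [] });
        }
        else {
            push @tags, $name;
        }
    }
    return @tags;
}

# Stand-in for set_indent(0, qw(@D @E g1)): the groups are expanded at
# call time and the value is stored per tag.
my %indent;
$indent{$_} = 0 for expand({}, qw(@D @E g1));

# A later change to @D no longer reaches the stored settings.
push @{ $group{'@D'} }, 'd9';

print join(' ', sort keys %indent), "\n";    # b1 b2 e1 g1
```

       To pick up a changed group in the real module, call the respective
       "set_parameter()" method again after modifying the group.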

EXAMPLE
       Consider the following HTML tree

	   <html> @0
	     <head> @0.0
	       <title> @0.0.0
		 "Demonstrate HTML::PrettyPrinter"
	     <body> @0.1
	       <h1> @0.1.0
		 "Headline"
	       <p align="JUSTIFY"> @0.1.1
		 "Some text in "
		 <b> @0.1.1.1
		   "bold"
		 " and "
		 <i> @0.1.1.3
		   "italics"
		 " and with 'A~X' & 'A~X'."
	       <table align="LEFT" border=0> @0.1.2
		 <tr> @0.1.2.0
		   <td align="RIGHT"> @0.1.2.0.0
		     "top right"
		 <tr> @0.1.2.1
		   <td align="LEFT"> @0.1.2.1.0
		     "bottom left"
	       <hr noshade="NOSHADE" size=5> @0.1.3
	       <address> @0.1.4
		 <a href="mailto:schotten@gmx.de"> @0.1.4.0
		   "Claus Schotten"

       and

	 $hpp = HTML::PrettyPrinter->new('uppercase' => 1);
	 print @{$hpp->format($tree)};

       will print

	 <HTML><HEAD><TITLE>Demonstrate
	       HTML::PrettyPrinter</TITLE></HEAD><BODY><H1>Headline</H1><P
	       ALIGN=JUSTIFY>Some text in <B>bold</B> and
	       <I>italics</I> and with 'A~X' & 'A~X'.</P><TABLE
	       ALIGN=LEFT BORDER=0><TR><TD ALIGN=RIGHT>top
		   right</TD></TR><TR><TD ALIGN=LEFT>bottom
		   left</TD></TR></TABLE><HR NOSHADE SIZE=5
	       ><ADDRESS><A HREF="mailto:schotten@gmx.de"
		 >Claus Schotten</A></ADDRESS></BODY></HTML>

       That doesn't look very nice. What went wrong? By default
       HTML::PrettyPrinter takes a conservative approach to whitespace. It
       will enlarge existing whitespace, but it will not introduce new
       whitespace outside of tags, because that might change the way a browser
       renders the HTML document. However, the HTML tree was constructed with
       "ignore_ignorable_whitespace" turned on.  Thus, there is no whitespace
       between block elements that the pretty printer could format. So the
       pretty printer does line wrapping and indentation only.  E.g. the title
       is in the third level of the tree. Thus, the second line is indented
       six characters. The table cells in the fifth level are indented by ten
       characters. Furthermore, you can see that whitespace is inserted
       after the last attribute of the <A> tag.

       Let's set $hpp->allow_forced_nl(1). Now the force_nl parameters are
       enabled. By default, they are set for all non-inline tags. That creates

	<HTML>
	  <HEAD>
	    <TITLE>Demonstrate HTML::PrettyPrinter</TITLE>
	  </HEAD>
	  <BODY>
	    <H1>Headline</H1>
	    <P ALIGN=JUSTIFY>Some text in <B>bold</B> and
	      <I>italics</I> and with 'A~X' & 'A~X'.</P>
	    <TABLE ALIGN=LEFT BORDER=0>
	      <TR>
		<TD ALIGN=RIGHT>top right</TD>
	      </TR>
	      <TR>
		<TD ALIGN=LEFT>bottom left</TD>
	      </TR>
	    </TABLE>
	    <HR NOSHADE SIZE=5>
	    <ADDRESS><A HREF="mailto:schotten@gmx.de"
		>Claus Schotten</A></ADDRESS>
	  </BODY>
	</HTML>

       Much better, isn't it? Now let's improve the structuring.

	 $hpp->set_nl_before(2,qw(body table));
	 $hpp->set_nl_after(2,qw(table));

       will require two new lines in front of <body> and <table> tags and
       after <table> tags.

	<HTML>
	  <HEAD>
	    <TITLE>Demonstrate HTML::PrettyPrinter</TITLE>
	  </HEAD>

	  <BODY>
	    <H1>Headline</H1>
	    <P ALIGN=JUSTIFY>Some text in <B>bold</B> and
	      <I>italics</I> and with 'A~X' & 'A~X'.</P>

	    <TABLE ALIGN=LEFT BORDER=0>
	      <TR>
		<TD ALIGN=RIGHT>top right</TD>
	      </TR>
	      <TR>
		<TD ALIGN=LEFT>bottom left</TD>
	      </TR>
	    </TABLE>

	    <HR NOSHADE SIZE=5>
	    <ADDRESS><A HREF="mailto:schotten@gmx.de"
		>Claus Schotten</A></ADDRESS>
	  </BODY>
	</HTML>

       Currently the mail address is the only attribute value which is quoted.
       Here the quotes are required by the '@' character. For all other
       attribute values quotes are optional and thus omitted by default.
       $hpp->quote_attr(1); will turn the quotes on.
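
       The quoting rule can be written as a single regular expression. The
       sketch below follows the description of quote_attr (values made up
       entirely of letters, digits, periods and hyphens may stay unquoted);
       the helper needs_quotes() is hypothetical and not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: true if a value must be quoted even when
# quote_attr is false, i.e. if it contains anything other than
# letters, digits, periods and hyphens (or is empty).
sub needs_quotes {
    my ($value) = @_;
    return $value !~ /^[A-Za-z0-9.-]+$/ ? 1 : 0;
}

for my $value ('LEFT', '5', 'mailto:schotten@gmx.de') {
    printf "%-24s %s\n", $value, needs_quotes($value) ? 'quoted' : 'bare';
}
```

       For the attribute values of the example tree this reports 'LEFT' and
       '5' as bare and the mail address as quoted, which matches the output
       shown above.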

       $hpp->set_endtag(0,'all!') turns all optional endtags off.  This
       affects the </p> (and should affect </tr> and </td>, see below).
       Alternatively, we could use $hpp->set_endtag(0,'default!'). That would
       turn the default off, too. But it wouldn't delete settings for
       individual tags that supersede the default.

       $hpp->set_nl_after(3,'head') requires three new lines after the <head>
       element. Because two new lines are already required by the start of
       <body>, only one additional line is added.

       $hpp->set_force_nl(0,'td') will inhibit the introduction of whitespace
       around <td>. Thus, the table cells are now on the same line as the
       table rows.

	 <HTML>
	   <HEAD>
	     <TITLE>Demonstrate HTML::PrettyPrinter</TITLE>
	   </HEAD>

	   <BODY>
	     <H1>Headline</H1>
	     <P ALIGN="JUSTIFY">Some text in <B>bold</B> and
	       <I>italics</I> and with 'A~X' & 'A~X'.

	     <TABLE ALIGN="LEFT" BORDER="0">
	       <TR><TD ALIGN="RIGHT">top right</TD></TR>
	       <TR><TD ALIGN="LEFT">bottom left</TD></TR>
	     </TABLE>

	     <HR NOSHADE SIZE="5">
	     <ADDRESS><A HREF="mailto:schotten@gmx.de"
		 >Claus Schotten</A></ADDRESS>
	   </BODY>
	 </HTML>

       The end tags </td> and </tr> are printed because HTML::Tagset says they
       are mandatory.

	 map {$HTML::Tagset::optionalEndTag{$_}=1} qw(td tr th);

       will fix that.

       The additional new line after </head> doesn't look nice. With
       $hpp->set_nl_after(undef,'head') we will reset the parameter for the
       <head> tag.

       $hpp->entities($hpp->entities().'A~X'); will enforce the entity
       encoding of 'A~X'.

       $hpp->min_bool_attr(0); will inhibit the minimization of the NOSHADE
       attribute of <hr>.

       Let's fiddle with the indentation:

	 $hpp->set_indent(8,'@TEXTBLOCK');
	 $hpp->set_indent(0,'html');

       New lines inside text blocks (here inside <h1>, <p> and <address>) will
       be indented by 8 characters instead of two, whereas the code directly
       under <html> will not be indented.

	<HTML>
	<HEAD>
	  <TITLE>Demonstrate HTML::PrettyPrinter</TITLE>
	</HEAD>

	<BODY>
	  <H1>Headline</H1>
	  <P ALIGN="JUSTIFY">Some text in <B>bold</B> and
		  <I>italics</I> and with 'ä' & 'A~X'.

	  <TABLE ALIGN="LEFT" BORDER="0">
	    <TR><TD ALIGN="RIGHT">top right
	    <TR><TD ALIGN="LEFT">bottom left
	  </TABLE>

	  <HR NOSHADE="NOSHADE" SIZE="5">
	  <ADDRESS><A HREF="mailto:schotten@gmx.de"
		    >Claus Schotten</A></ADDRESS>
	</BODY>
	</HTML>

       $hpp->wrap_at_tagend(HTML::PrettyPrinter::NEVER); will disable the line
       wrap between the attribute and the '>' of the <a> tag. The resulting
       line exceeds the target line length by far, but there is no point left
       where the pretty printer could legally break this line.

       $hpp->set_endtag(1,'tr') will overwrite the default. Thus, the </tr>
       appears in the code whereas the other optional endtags are still
       omitted.

       Finally, we customize some individual elements:

       $tree->address('0.1.1')->attr('_hpp_skip',1)
	   will omit the <p> and its content from the output.

       $tree->address('0.1.2.1.0')->attr('_hpp_force_nl',1)
	   will force new lines around the second <td>, but will not affect
	   the first <td>.

	<HTML>
	<HEAD>
	  <TITLE>Demonstrate HTML::PrettyPrinter</TITLE>
	</HEAD>

	<BODY>
	  <H1>Headline</H1>

	  <TABLE ALIGN="LEFT" BORDER="0">
	    <TR><TD ALIGN="RIGHT">top right</TR>
	    <TR>
	      <TD ALIGN="LEFT">bottom left
	    </TR>
	  </TABLE>

	  <HR NOSHADE="NOSHADE" SIZE="5">
	  <ADDRESS><A
		    HREF="mailto:schotten@gmx.de">Claus Schotten</A></ADDRESS>
	</BODY>
	</HTML>

KNOWN BUGS
       ·   This is early alpha code. The interfaces are subject to change.

       ·   The module is tested with perl 5.005_03 only. It should work with
	   perl 5.004 though.

       ·   The predefined tag groups are incomplete. Several tags need to be
	   added.

       ·   Attribute values from a fixed set given in the DTD (e.g.
	   ALIGN=LEFT|RIGHT etc.) should be converted to upper or lower case
	   depending on the value of the uppercase parameter. Currently, they
	   are printed as given in the HTML tree.

       ·   No optimization for performance was done.

SEE ALSO
       HTML::TreeBuilder, HTML::Element, HTML::Tagset

COPYRIGHT
       Copyright 2000 Claus Schotten  schotten@gmx.de

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

AUTHOR
       Claus Schotten <schotten@gmx.de>

perl v5.14.1			  2011-06-21		      PrettyPrinter(3)