HTML::Defang man page on Fedora


HTML::Defang(3)	      User Contributed Perl Documentation      HTML::Defang(3)

NAME
       HTML::Defang - Cleans HTML as well as CSS of scripting and other
       executable contents, and neutralises XSS attacks.

SYNOPSIS
	 use HTML::Defang;

	 my $InputHtml = "<html><body></body></html>";

	 my $Defang = HTML::Defang->new(
	   context => $Self,
	   fix_mismatched_tags => 1,
	   tags_to_callback => [ qw(br embed img) ],
	   tags_callback => \&DefangTagsCallback,
	   url_callback => \&DefangUrlCallback,
	   css_callback => \&DefangCssCallback,
	   attribs_to_callback => [ qw(border src) ],
	   attribs_callback => \&DefangAttribsCallback
	 );

	 my $SanitizedHtml = $Defang->defang($InputHtml);

	 # Callback for custom handling specific HTML tags
	 sub DefangTagsCallback {
	   my ($Self, $Defang, $OpenAngle, $lcTag, $IsEndTag, $AttributeHash, $CloseAngle, $HtmlR, $OutR) = @_;

	   # Explicitly defang this tag, even though safe
	   return DEFANG_ALWAYS if $lcTag eq 'br';

	   # Explicitly whitelist this tag, even though unsafe
	   return DEFANG_NONE if $lcTag eq 'embed';

	   # I am not sure what to do with this tag, so process as HTML::Defang normally would
	   return DEFANG_DEFAULT if $lcTag eq 'img';

	   # Any other tag: fall back to the default processing
	   return DEFANG_DEFAULT;
	 }

	 # Callback for custom handling URLs in HTML attributes as well as style tag/attribute declarations
	 sub DefangUrlCallback {
	   my ($Self, $Defang, $lcTag, $lcAttrKey, $AttrValR, $AttributeHash, $HtmlR) = @_;

	   # Explicitly allow this URL in tag attributes or stylesheets
	   return DEFANG_NONE if $$AttrValR =~ /safesite\.com/i;

	   # Explicitly defang this URL in tag attributes or stylesheets
	   return DEFANG_ALWAYS if $$AttrValR =~ /evilsite\.com/i;

	   # Otherwise process the URL as HTML::Defang normally would
	   return DEFANG_DEFAULT;
	 }

	 # Callback for custom handling style tags/attributes
	 sub DefangCssCallback {
	   my ($Self, $Defang, $Selectors, $SelectorRules, $Tag, $IsAttr) = @_;
	   my $i = 0;
	   foreach (@$Selectors) {
	     my $SelectorRule = $$SelectorRules[$i];
	     foreach my $KeyValueRules (@$SelectorRule) {
	       foreach my $KeyValueRule (@$KeyValueRules) {
		 my ($Key, $Value) = @$KeyValueRule;

		 # Comment out any '!important' directive
		 $$KeyValueRule[2] = DEFANG_ALWAYS if $Value =~ /!important/i;

		 # Comment out any 'position: fixed' declaration
		 $$KeyValueRule[2] = DEFANG_ALWAYS if $Key =~ /^position$/i && $Value =~ /fixed/i;
	       }
	     }
	     $i++;
	   }
	 }

	 # Callback for custom handling HTML tag attributes
	 sub DefangAttribsCallback {
	   my ($Self, $Defang, $lcTag, $lcAttrKey, $AttrValR, $HtmlR) = @_;

	   # Change all 'border' attribute values to zero.
	   $$AttrValR = '0' if $lcAttrKey eq 'border';

	   # Defang all 'src' attributes
	   return DEFANG_ALWAYS if $lcAttrKey eq 'src';

	   return DEFANG_NONE;
	 }

DESCRIPTION
       This module accepts an input HTML and/or CSS string and removes any
       executable code including scripting, embedded objects, applets, etc.,
       and neutralises any XSS attacks. A whitelist-based approach is used,
       which means only HTML known to be safe is allowed through.
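       As an illustration of the whitelist idea only, the sketch below
       passes through tags found in a small, hypothetical %SafeTags table
       and renames every other tag with the "defang_" prefix mentioned
       under allow_double_defang. The tag list and the rename-only output
       are assumptions; HTML::Defang's real tables and output format differ.

```perl
use strict;
use warnings;

# Hypothetical whitelist for illustration only.
my %SafeTags = map { $_ => 1 } qw(a b br div em i img p span strong);

# Whitelisted tags pass through unchanged; every other tag, known or
# unknown, is renamed with the defang_ prefix.
sub whitelist_tag {
    my ($Tag) = @_;
    return $SafeTags{ lc $Tag } ? $Tag : "defang_$Tag";
}

print whitelist_tag($_), "\n" for qw(p SCRIPT blink);
# prints p, defang_SCRIPT and defang_blink, one per line
```

       Unknown tags are treated exactly like known unsafe ones, which is
       what distinguishes a whitelist from a blacklist.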

       HTML::Defang uses a custom HTML tag parser. The parser has been
       designed and tested to work with nasty real-world HTML, and tries to
       emulate as closely as possible what browsers actually do with
       strange-looking constructs. The test suite has been built from
       examples drawn from a range of sources such as
       http://ha.ckers.org/xss.html and
       http://imfo.ru/csstest/css_hacks/import.php to ensure that as many
       XSS attack scenarios as possible have been dealt with.

       HTML::Defang can make callbacks to client code when it encounters the
       following:

       ·   When a specified tag is parsed

       ·   When a specified attribute is parsed

       ·   When a URL is parsed as part of an HTML attribute, or CSS property
	   value.

       ·   When style data is parsed, as part of an HTML style attribute, or
	   as part of an HTML <style> tag.

       The callbacks include details about the tag or attribute currently
       being parsed, and also receive a scalar reference to the input HTML.
       Querying pos() on the input HTML should indicate how far the module
       has got with parsing. This gives client code flexibility in working
       with HTML::Defang.
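       The m/\G.../gc and pos() mechanics this relies on can be seen in a
       plain Perl toy parser. In this sketch, parse_tags() and the title
       callback are hypothetical stand-ins, not HTML::Defang code: the
       callback receives a reference to the input, reads pos(), and
       consumes extra input with m/\G.../gc so that the outer loop resumes
       after it.

```perl
use strict;
use warnings;

# Toy tag loop in the m/\G.../gc style; the callback gets a scalar
# reference to the input, in the same way $HtmlR is passed.
sub parse_tags {
    my ($Html, $Callback) = @_;
    my @Seen;
    while ($Html =~ m{\G[^<]*<(/?\w+)[^>]*>}gc) {
        my $Tag = lc $1;
        push @Seen, $Tag;
        $Callback->($Tag, \$Html);    # callback may move pos() forward
    }
    return @Seen;
}

# Hypothetical callback: after <title>, consume everything up to and
# including </title>, so the outer loop resumes after it.
my $TitleText;
my $TitleCallback = sub {
    my ($Tag, $HtmlR) = @_;
    return unless $Tag eq 'title';
    print "callback at offset ", pos($$HtmlR), "\n";    # prints 13 here
    $TitleText = $1 if $$HtmlR =~ m{\G(.*?)</title>}gcs;
};

my @Tags = parse_tags('<head><title>Hi</title><meta></head>', $TitleCallback);
print "@Tags | $TitleText\n";    # prints: head title meta /head | Hi
```

       Because the /c modifier keeps pos() on a failed match, a callback
       that does not find what it is looking for leaves the parser's
       position untouched.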

       HTML::Defang can defang whole tags, any attribute in a tag, any URL
       that appears as an attribute or style property, or any CSS
       declaration in a declaration block in a style rule. This makes it
       possible to block just the specific unwanted elements in the content
       (for example, only an offending attribute instead of the whole tag),
       while retaining any safe HTML/CSS.
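       A minimal sketch of attribute-level defanging, assuming a
       hypothetical %SafeAttribs allowlist: only unsafe attribute names are
       renamed with the "defang_" prefix, the tag itself is kept, and names
       already carrying the prefix are left alone (the default described
       under allow_double_defang). This is not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical allowlist of attribute names.
my %SafeAttribs = map { $_ => 1 } qw(alt border height href src title width);

# Rename only unsafe attribute names; the tag and its safe attributes
# are kept. Names already starting with defang_ are not prefixed again.
sub defang_attrib_names {
    my ($Attribs) = @_;    # hash ref: attribute name => value
    my %Out;
    for my $Name (keys %$Attribs) {
        my $Safe = $SafeAttribs{ lc $Name } || $Name =~ /^defang_/i;
        $Out{ $Safe ? $Name : "defang_$Name" } = $Attribs->{$Name};
    }
    return \%Out;
}

my $Result = defang_attrib_names({ src => 'a.png', onerror => 'alert(1)' });
print join(', ', map { $_ . '=' . $Result->{$_} } sort keys %$Result), "\n";
# prints: defang_onerror=alert(1), src=a.png
```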

CONSTRUCTOR
       HTML::Defang->new(%Options)
	   Constructs a new HTML::Defang object. The following options are
	   supported:

	   Options
	       tags_to_callback
		   Array reference of tags for which a callback should be
		   made. If a tag in this array is parsed, the subroutine
		   tags_callback() is invoked.

	       attribs_to_callback
		   Array reference of tag attributes for which a callback
		   should be made. If an attribute in this array is parsed,
		   the subroutine attribs_callback() is invoked.

	       tags_callback
		   Subroutine reference to be invoked when a tag listed in
		   @$tags_to_callback is parsed.

	       attribs_callback
		   Subroutine reference to be invoked when an attribute listed
		   in @$attribs_to_callback is parsed.

	       url_callback
		   Subroutine reference to be invoked when a URL is detected
		   in an HTML tag attribute or a CSS property.

	       css_callback
		   Subroutine reference to be invoked when CSS data is found
		   either as the contents of a 'style' attribute in an HTML
		   tag, or as the contents of a <style> HTML tag.

	       fix_mismatched_tags
		   If set, this property fixes mismatched tags in the HTML
		   input. By default, the tags in the default
		   %mismatched_tags_to_fix hash are fixed. This set of tags
		   can be overridden by passing an array reference
		   $mismatched_tags_to_fix to the constructor. Any opened tag
		   in the set is automatically closed if no corresponding
		   closing tag is found. If an unbalanced closing tag is
		   found, it is commented out.

	       mismatched_tags_to_fix
		   Array reference of tags for which the code would check for
		   matching opening and closing tags. See the property
		   $fix_mismatched_tags.

	       context
		   You can pass an arbitrary scalar as a 'context' value
		   that's then passed as the first parameter to all callback
		   functions. Most commonly this is something like '$Self'.

	       allow_double_defang
		   If this is true, then tag names and attribute names which
		   already begin with the defang string ("defang_" by default)
		   will have an additional copy of the defang string prepended
		   if they are flagged to be defanged by the return value of a
		   callback, or if the tag or attribute name is unknown.

		   The default is to assume that tag names and attribute names
		   beginning with the defang string are already made safe, and
		   need no further modification, even if they are flagged to
		   be defanged by the return value of a callback.  Any tag or
		   attribute modifications made directly by a callback are
		   still performed.

	       Debug
		   If set, prints debugging output.
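	   The behaviour described for fix_mismatched_tags can be sketched
	   with a stack of open tags. In this simplified version, %Tracked
	   stands in for mismatched_tags_to_fix, a closing tag that matches
	   an open tag also closes any tracked tags opened after it, and tag
	   matching uses a rough regex rather than HTML::Defang's parser, so
	   the module's exact output may differ.

```perl
use strict;
use warnings;

# Hypothetical tracked set, standing in for mismatched_tags_to_fix.
my %Tracked = map { $_ => 1 } qw(b div i table td tr);

sub fix_mismatched {
    my ($Html) = @_;
    my @Open;        # stack of currently open tracked tags
    my $Out = '';
    # Copy the text before each tag, then handle the tag itself.
    while ($Html =~ m{\G(.*?)(<(/?)(\w+)[^>]*>)}gcs) {
        my ($Text, $Raw, $IsEnd, $Tag) = ($1, $2, $3, lc $4);
        $Out .= $Text;
        if (!$Tracked{$Tag}) {
            $Out .= $Raw;                     # untracked tag: pass through
        } elsif (!$IsEnd) {
            push @Open, $Tag;                 # remember the opening tag
            $Out .= $Raw;
        } elsif (grep { $_ eq $Tag } @Open) {
            # Close tracked tags opened after this one, then this one.
            while (defined(my $Top = pop @Open)) {
                last if $Top eq $Tag;
                $Out .= "</$Top>";
            }
            $Out .= $Raw;
        } else {
            $Out .= "<!--$Raw-->";            # unbalanced close: comment out
        }
    }
    $Out .= substr($Html, pos($Html) // 0);   # trailing text after last tag
    $Out .= "</$_>" for reverse @Open;        # close anything still open
    return $Out;
}

print fix_mismatched('<div><b>bold</div></i>'), "\n";
# prints <div><b>bold</b></div><!--</i>-->
```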

CALLBACK METHODS
       COMMON PARAMETERS
	   A number of the callbacks share the same parameters. These common
	   parameters are documented here. Certain variables may have specific
	   meanings in certain callbacks, so be sure to check the
	   documentation for that method before referring to this section.

	   $context
	       You can pass an arbitrary scalar as a 'context' value that's
	       then passed as the first parameter to all callback functions.
	       Most commonly this is something like '$Self'.

	   $Defang
	       Current HTML::Defang instance

	   $OpenAngle
	       Opening angle bracket (<) of the current tag.

	   $lcTag
	       Lower case version of the HTML tag that is currently being
	       parsed.

	   $IsEndTag
	       Has the value '/' if the current tag is a closing tag.

	   $AttributeHash
	       A reference to a hash containing the attributes of the current
	       tag and their values. Each value is a scalar reference to the
	       value, rather than just a scalar value. You can add attributes
	       (remember to make the value a scalar ref, e.g.
	       $AttributeHash->{newattr} = \"newval"), delete attributes, or
	       modify attribute values in this hash, and any changes you make
	       will be incorporated into the output HTML stream.

	       The attribute values will have any entity references decoded
	       before being passed to you, and any unsafe values will be
	       re-encoded back into the HTML stream.

	       So for instance, the tag:

		 <div title="&lt;&quot;Hi there &lt;">

	       will have the attribute hash:

		 { title => \q[<"Hi there <] }

	       and will be turned back into the HTML on output:

		 <div title="&lt;&quot;Hi there &lt;">

	   $CloseAngle
	       Anything after the end of the last attribute, including the
	       closing angle bracket (>).

	   $HtmlR
	       A scalar reference to the input HTML. The input HTML is parsed
	       using m/\G$SomeRegex/c constructs, so clients can use
	       m/\G$SomeRegex/c for further processing of the input; this will
	       resume parsing from where HTML::Defang left off. One can also
	       use the pos() function to determine where HTML::Defang left off.
	       Combined with the add_to_output() method, this should give the
	       client reasonable flexibility in processing the input.

	   $OutR
	       A scalar reference to the processed output HTML so far.
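	   The decode-then-re-encode round trip described for $AttributeHash
	   can be sketched as follows. decode_attr() and encode_attr() are
	   hypothetical helpers that handle only a few named entities and
	   numeric references; HTML::Defang's own entity handling is more
	   complete. Note that each hash value is a scalar reference.

```perl
use strict;
use warnings;

# Hypothetical, minimal entity tables: a few named entities plus
# decimal and hex numeric references.
my %Named  = (lt => '<', gt => '>', amp => '&', quot => '"', apos => "'");
my %Encode = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');

sub decode_attr {
    my ($s) = @_;
    $s =~ s{&(?:#(\d+)|#x([0-9a-f]+)|(\w+));}
           {   defined $1             ? chr($1)
             : defined $2             ? chr(hex $2)
             : exists $Named{ lc $3 } ? $Named{ lc $3 }
             :                          "&$3;" }gie;    # unknown: keep as-is
    return $s;
}

sub encode_attr {
    my ($s) = @_;
    $s =~ s/([&<>"])/$Encode{$1}/g;
    return $s;
}

my $Decoded = decode_attr('&lt;&quot;Hi there &#x3C;');
my %AttributeHash = (title => \$Decoded);    # values are scalar refs
my $Title = ${ $AttributeHash{title} };
print "$Title\n";                             # prints: <"Hi there <
print encode_attr($Title), "\n";              # prints: &lt;&quot;Hi there &lt;
```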

       tags_callback($context, $Defang, $OpenAngle, $lcTag, $IsEndTag,
       $AttributeHash, $CloseAngle, $HtmlR, $OutR)
	   If $Defang->{tags_callback} exists, and HTML::Defang has parsed a
	   tag present in $Defang->{tags_to_callback}, the above callback is
	   made to the client code. The return value of this method determines
	   whether the tag is defanged or not. More details below.

	   Return values
	       DEFANG_NONE
		   The current tag will not be defanged.

	       DEFANG_ALWAYS
		   The current tag will be defanged.

	       DEFANG_DEFAULT
		   The current tag will be processed normally by HTML::Defang
		   as if there was no callback method specified.
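	   A sketch of a tags_callback that follows the parameter list above.
	   The DEFANG_* constants here are local stand-ins with made-up
	   values so the snippet runs without the module; real code should
	   use the constants HTML::Defang provides. The policy (defang
	   <marquee>, add rel="nofollow" to links) is hypothetical, and the
	   call at the end simulates the parser with a mock attribute hash.

```perl
use strict;
use warnings;

# Local stand-ins so the snippet runs without HTML::Defang; the real
# module supplies its own DEFANG_* constants.
use constant {
    DEFANG_NONE    => 'none',
    DEFANG_ALWAYS  => 'always',
    DEFANG_DEFAULT => 'default',
};

sub MyTagsCallback {
    my ($Context, $Defang, $OpenAngle, $lcTag, $IsEndTag,
        $AttributeHash, $CloseAngle, $HtmlR, $OutR) = @_;

    # Hypothetical policy: always defang <marquee>.
    return DEFANG_ALWAYS if $lcTag eq 'marquee';

    # Add rel="nofollow" to opening <a> tags; values are scalar refs,
    # here to a fresh variable so the value stays writable.
    if ($lcTag eq 'a' && !$IsEndTag) {
        my $Rel = 'nofollow';
        $AttributeHash->{rel} = \$Rel;
    }
    return DEFANG_DEFAULT;    # everything else: normal processing
}

# Simulated call with a mock attribute hash (no parser involved).
my %Attrs = (href => \'http://example.com/');
my $Ret = MyTagsCallback(undef, undef, '<', 'a', '', \%Attrs, '>', undef, undef);
my $RelVal = ${ $Attrs{rel} };
print "$Ret $RelVal\n";    # prints: default nofollow
```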

       attribs_callback($context, $Defang, $lcTag, $lcAttrKey, $AttrVal,
       $HtmlR, $OutR)
	   If $Defang->{attribs_callback} exists, and HTML::Defang has parsed
	   an attribute present in $Defang->{attribs_to_callback}, the above
	   callback is made to the client code. The return value of this
	   method determines whether the attribute is defanged or not. More
	   details below.

	   Method parameters
	       $lcAttrKey
		   Lower case version of the HTML attribute that is currently
		   being parsed.

	       $AttrVal
		   Reference to the HTML attribute value that is currently
		   being parsed.

		   See $AttributeHash for details of decoding.

	   Return values
	       DEFANG_NONE
		   The current attribute will not be defanged.

	       DEFANG_ALWAYS
		   The current attribute will be defanged.

	       DEFANG_DEFAULT
		   The current attribute will be processed normally by
		   HTML::Defang as if there was no callback method specified.
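	   Because $AttrVal is a reference, a callback can rewrite the value
	   in place. The sketch below clamps hypothetical width/height values
	   and defangs non-numeric ones; the policy is an assumption chosen
	   for illustration, and the DEFANG_* constants are local stand-ins
	   so that the snippet runs on its own.

```perl
use strict;
use warnings;

# Local stand-ins so the snippet runs on its own.
use constant {
    DEFANG_NONE    => 'none',
    DEFANG_ALWAYS  => 'always',
    DEFANG_DEFAULT => 'default',
};

# Hypothetical policy: clamp width/height to at most 800, defang
# non-numeric values, leave other attributes to default handling.
sub MyAttribsCallback {
    my ($Context, $Defang, $lcTag, $lcAttrKey, $AttrValR, $HtmlR, $OutR) = @_;
    return DEFANG_DEFAULT
        unless $lcAttrKey eq 'width' || $lcAttrKey eq 'height';
    my ($N) = $$AttrValR =~ /^\s*(\d+)\s*$/
        or return DEFANG_ALWAYS;          # not a plain number
    $$AttrValR = $N > 800 ? 800 : $N;     # rewrites the caller's value
    return DEFANG_NONE;
}

my $Width = '1200';
my $Ret = MyAttribsCallback(undef, undef, 'img', 'width', \$Width, undef, undef);
print "$Ret $Width\n";    # prints: none 800
```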

       url_callback($context, $Defang, $lcTag, $lcAttrKey, $AttrVal,
       $AttributeHash, $HtmlR, $OutR)
	   If $Defang->{url_callback} exists, and HTML::Defang has parsed a
	   URL, the above callback is made to the client code. The return
	   value of this method determines whether the attribute containing
	   the URL is defanged or not. URL callbacks can be made from <style>
	   tags as well as style attributes, in which case the particular style
	   declaration will be commented out. More details below.

	   Method parameters
	       $lcAttrKey
		   Lower case version of the HTML attribute that is currently
		   being parsed. However if this callback is made as a result
		   of parsing a URL in a style attribute, $lcAttrKey will be
		   set to the string 'style', or to undef if this callback is
		   made as a result of parsing a URL inside a style tag.

	       $AttrVal
		   Reference to the URL value that is currently being parsed.

	       $AttributeHash
		   A reference to a hash containing the attributes of the
		   current tag and their values. Each value is a scalar
		   reference to the value, rather than just a scalar value.
		   You can add attributes (remember to make the value a scalar
		   ref, e.g. $AttributeHash->{newattr} = \"newval"), delete
		   attributes, or modify attribute values in this hash, and
		   any changes you make will be incorporated into the output
		   HTML stream. Will be set to undef if the callback is made
		   due to a URL in a <style> tag or attribute.

	   Return values
	       DEFANG_NONE
		   The current URL will not be defanged.

	       DEFANG_ALWAYS
		   The current URL will be defanged.

	       DEFANG_DEFAULT
		   The current URL will be processed normally by HTML::Defang
		   as if there was no callback method specified.
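	   The SYNOPSIS matches URLs with unanchored patterns on
	   safesite.com, which would also accept a host such as
	   safesite.com.evil.net. A stricter sketch, assuming a hypothetical
	   @AllowedHosts list and a simple regex host extractor (not a full
	   URL parser), compares the host exactly or as a subdomain. The
	   DEFANG_* constants are local stand-ins.

```perl
use strict;
use warnings;

use constant {
    DEFANG_NONE    => 'none',
    DEFANG_ALWAYS  => 'always',
    DEFANG_DEFAULT => 'default',
};

# Hypothetical allowlist: exact host or any subdomain of it.
my @AllowedHosts = qw(safesite.com);

# Rough host extraction for http(s) and scheme-relative URLs, skipping
# any user:password@ part; returns undef for anything else.
sub url_host {
    my ($Url) = @_;
    return unless $Url =~ m{^\s*(?:https?:)?//(?:[^/?#\@]*\@)?([^/?#:\s]+)}i;
    return lc $1;
}

sub MyUrlCallback {
    my ($Context, $Defang, $lcTag, $lcAttrKey, $AttrValR) = @_;   # rest unused
    my $Host = url_host($$AttrValR);
    return DEFANG_DEFAULT unless defined $Host;   # e.g. relative URLs
    for my $Allowed (@AllowedHosts) {
        return DEFANG_NONE
            if $Host eq $Allowed || $Host =~ /\.\Q$Allowed\E\z/;
    }
    return DEFANG_ALWAYS;
}

for my $Url ('http://www.safesite.com/a.png',     # none
             'http://safesite.com.evil.net/',     # always
             'https://safesite.com@evil.net/x') { # always
    print MyUrlCallback(undef, undef, 'img', 'src', \$Url), " $Url\n";
}
```

	   Returning DEFANG_DEFAULT for URLs without a host leaves relative
	   links and unusual schemes to the module's normal processing.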

       css_callback($context, $Defang, $Selectors, $SelectorRules, $lcTag,
       $IsAttr, $OutR)
	   If $Defang->{css_callback} exists, and HTML::Defang has parsed a
	   <style> tag or style attribute, the above callback is made to the
	   client code. The return value of this method determines whether a
	   particular declaration in the style rules is defanged or not. More
	   details below.

	   Method parameters
	       $Selectors
		   Reference to an array containing the selectors in a style
		   tag or attribute.

	       $SelectorRules
		   Reference to an array containing the style declaration
		   blocks of all selectors in a style tag or attribute.
		   Consider the following CSS:

		     a { b:c; d:e}
		     j { k:l; m:n}

		   The declaration blocks will get parsed into the following
		   data structure:

		     [
		       [
			 [ "b", "c", DEFANG_DEFAULT ],
			 [ "d", "e", DEFANG_DEFAULT ]
		       ],
		       [
			 [ "k", "l", DEFANG_DEFAULT ],
			 [ "m", "n", DEFANG_DEFAULT ]
		       ]
		     ]

		   So, generally each property:value pair in a declaration is
		   parsed into an array of the form

		     ["property", "value", X]

		   where X can be DEFANG_NONE, DEFANG_ALWAYS or
		   DEFANG_DEFAULT, with DEFANG_DEFAULT being the default. A
		   client can change this value to tell HTML::Defang whether
		   to defang this property:value pair:

		   DEFANG_NONE - Do not defang

		   DEFANG_ALWAYS - Defang the property:value pair

		   DEFANG_DEFAULT - Process this as if there is no callback
		   specified

	       $IsAttr
		   True if the currently processed item is a style attribute.
		   False if the currently processed item is a style tag.
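	   A sketch of a css_callback that walks the layout shown above (one
	   array of [property, value, flag] triples per selector) and flags
	   "position: fixed" for defanging. The constants are local stand-ins.
	   The SYNOPSIS callback iterates one nesting level deeper than this
	   layout, so check the structure your installed version actually
	   passes before relying on either.

```perl
use strict;
use warnings;

use constant {
    DEFANG_NONE    => 'none',
    DEFANG_ALWAYS  => 'always',
    DEFANG_DEFAULT => 'default',
};

# Flag every "position: fixed" declaration; leave the rest untouched.
sub MyCssCallback {
    my ($Context, $Defang, $Selectors, $SelectorRules, $lcTag, $IsAttr) = @_;
    for my $Block (@$SelectorRules) {        # one block per selector
        for my $Decl (@$Block) {             # [property, value, flag]
            my ($Key, $Value) = @$Decl;
            $Decl->[2] = DEFANG_ALWAYS
                if lc($Key) eq 'position' && $Value =~ /^\s*fixed\b/i;
        }
    }
    return;
}

my $Rules = [
    [ [ 'color', 'red', DEFANG_DEFAULT ], [ 'position', 'fixed', DEFANG_DEFAULT ] ],
];
MyCssCallback(undef, undef, ['div'], $Rules, 'style', 0);
print join(' ', map { $_->[0] . ':' . $_->[2] } @{ $Rules->[0] }), "\n";
# prints: color:default position:always
```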

METHODS
       PUBLIC METHODS
	   defang($InputHtml)
	       Removes any executable code, including scripting, embedded
	       objects, applets, etc., from $InputHtml, and defangs any XSS
	       attacks.

	       Method parameters
		   $InputHtml
		       The input HTML string that needs to be sanitized.

	       Returns the cleaned HTML. If fix_mismatched_tags is set, any
	       tags that appear in @$mismatched_tags_to_fix that are
	       unbalanced are automatically commented out or closed.

	   add_to_output($String)
	       Appends $String to the output after the current parsed tag
	       ends. Can be used by client code in callback methods to add
	       HTML text to the processed output. If the HTML text needs to be
	       defanged, client code can safely call HTML::Defang->defang()
	       recursively from within the callback.

	       Method parameters
		   $String
		       The string that is added after the current parsed tag
		       ends.

       INTERNAL METHODS
	   Generally these methods never need to be called by users of the
	   class, because they'll be called internally as the appropriate tags
	   are encountered, but they may be useful for some users in some
	   cases.

	   defang_script($OutR, $HtmlR, $TagOps, $OpenAngle, $IsEndTag, $Tag,
	   $TagTrail, $Attributes, $CloseAngle)
	       This method is invoked when a <script> tag is parsed. Defangs
	       the <script> opening tag, and any closing tag. Any scripting
	       content is also commented out, so that browsers do not display it.

	       Returns 1 to indicate that the <script> tag must be defanged.

	       Method parameters
		   $OutR
		       A reference to the processed output HTML before the tag
		       that is currently being parsed.

		   $HtmlR
		       A scalar reference to the input HTML.

		   $TagOps
		       Indicates what operation should be done on a tag. Can
		       be undefined, integer or code reference. Undefined
		       indicates an unknown tag to HTML::Defang, 1 indicates a
		       known safe tag, 0 indicates a known unsafe tag, and a
		       code reference indicates a subroutine that should be
		       called to parse the current tag. For example, <style>
		       and <script> tags are parsed by dedicated subroutines.

		   $OpenAngle
		       Opening angle bracket (<) of the current tag.

		   $IsEndTag
		       Has the value '/' if the current tag is a closing tag.

		   $Tag
		       The HTML tag that is currently being parsed.

		   $TagTrail
		       Any whitespace after the tag name, but before the attributes.

		   $Attributes
		       A reference to an array of the attributes and their
		       values, including any surrounding spaces. Each element
		       of the array is added by a 'push' call like the one below:

			 push @$Attributes, [ $AttributeName, $SpaceBeforeEquals, $EqualsAndSubsequentSpace, $QuoteChar, $AttributeValue, $QuoteChar, $SpaceAfterAttributeValue ];

		   $CloseAngle
		       Anything after the end of the last attribute, including
		       the closing angle bracket (>).
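	       To make the $Attributes layout and the $TagOps convention
	       concrete, the sketch below rebuilds attribute text by joining
	       each seven-element entry, and classifies a $TagOps value as
	       described above. attributes_to_string() and describe_tag_ops()
	       are hypothetical helpers, not part of HTML::Defang.

```perl
use strict;
use warnings;

# Join every part of every seven-element attribute entry; undef parts
# (e.g. no quotes or no value) contribute nothing.
sub attributes_to_string {
    my ($Attributes) = @_;
    return join '', map { join '', map { defined($_) ? $_ : '' } @$_ } @$Attributes;
}

# Classify a $TagOps value: undef, true/false integer, or code ref.
sub describe_tag_ops {
    my ($TagOps) = @_;
    return 'unknown tag'   unless defined $TagOps;
    return 'custom parser' if ref $TagOps eq 'CODE';
    return $TagOps ? 'known safe' : 'known unsafe';
}

my $TagTrail = ' ';
my @Attrs = (
    [ 'src', '', '=', '"', 'a.png', '"', ' ' ],
    [ 'alt', ' ', '= ', "'", 'x', "'", '' ],
);
print "<img$TagTrail", attributes_to_string(\@Attrs), ">\n";
# prints: <img src="a.png" alt = 'x'>
print describe_tag_ops($_), "\n" for (undef, 1, 0, sub { 1 });
```

	       Keeping every piece of whitespace and both quote characters in
	       each entry is what lets the original attribute text be
	       reproduced exactly when nothing needs defanging.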

	   defang_style($OutR, $HtmlR, $TagOps, $OpenAngle, $IsEndTag, $Tag,
	   $TagTrail, $Attributes, $CloseAngle, $IsAttr)
	       Builds a list of selectors and declarations from HTML style
	       tags as well as style attributes in HTML tags and calls
	       defang_stylerule() to do the actual defanging.

	       Returns 0 to indicate that style tags must not be defanged.

	       Method parameters
		   $IsAttr
		       Whether we are currently parsing a style attribute or
		       style tag. $IsAttr will be true if we are currently
		       parsing a style attribute.

		   For a description of the other parameters, see the
		   documentation of the defang_script() method.

	   cleanup_style($StyleString)
	       Helper function to clean up CSS data. This function directly
	       operates on the input string without taking a copy.

	       Method parameters
		   $StyleString
		       The input style string that is cleaned.

	   defang_stylerule($SelectorsIn, $StyleRules, $lcTag, $IsAttr,
	   $HtmlR, $OutR)
	       Defangs style data.

	       Method parameters
		   $SelectorsIn
		       An array reference to the selectors in the style
		       tag/attribute contents.

		   $StyleRules
		       An array reference to the declaration blocks in the
		       style tag/attribute contents.

		   $lcTag
		       Lower case version of the HTML tag that is currently
		       being parsed.

		   $IsAttr
		       Whether we are currently parsing a style attribute or
		       style tag. $IsAttr will be true if we are currently
		       parsing a style attribute.

		   $HtmlR
		       A scalar reference to the input HTML.

		   $OutR
		       A scalar reference to the processed output so far.

	   defang_attributes($OutR, $HtmlR, $TagOps, $OpenAngle, $IsEndTag,
	   $Tag, $TagTrail, $Attributes, $CloseAngle)
	       Defangs attributes and tags, and makes the tag, attribute, CSS
	       and URL callbacks.

	       Method parameters
		   For a description of the method parameters, see the
		   documentation of the defang_script() method.

	   cleanup_attribute($AttributeString)
	       Helper function to clean up attributes.

	       Method parameters
		   $AttributeString
		       The value of the attribute.

SEE ALSO
       <http://mailtools.anomy.net/>, <http://htmlcleaner.sourceforge.net/>,
       HTML::StripScripts, HTML::Detoxifier, HTML::Sanitizer, HTML::Scrubber

AUTHOR
       Kurian Jose Aerthail <cpan@kurianja.fastmail.fm>. Thanks to Rob Mueller
       <cpan@robm.fastmail.fm> for initial code, guidance and support and bug
       fixes.

COPYRIGHT AND LICENSE
       Copyright (C) 2003-2010 by Opera Software Australia Pty Ltd

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

perl v5.14.1			  2011-01-03		       HTML::Defang(3)