Test::utf8 man page on Fedora

Test::utf8(3)	      User Contributed Perl Documentation	 Test::utf8(3)

NAME
       Test::utf8 - handy utf8 tests

SYNOPSIS
	 is_valid_string($string);   # check the string is valid
	 is_sane_utf8($string);	     # check not double encoded
	 is_flagged_utf8($string);   # has utf8 flag set
	 is_within_latin_1($string); # but only has latin_1 chars in it

DESCRIPTION
       This module is a collection of tests that are useful when dealing with
       utf8 strings in Perl.

   Validity
       These two tests check whether a string is valid, and whether you've
       probably made a mistake with your string.

       is_valid_string($string, $testname)
	   This passes and returns true if and only if the scalar isn't an
	   invalid string.  In short, it checks that the utf8 flag hasn't been
	   set for a string that isn't a valid utf8 encoding.
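
	   As a minimal sketch of typical usage (the variable names and test
	   names are illustrative, and it assumes that a string without the
	   utf8 flag passes trivially, as the description above implies):

```perl
use strict;
use warnings;
use Encode qw(decode);
use Test::More tests => 2;
use Test::utf8;

# A plain byte string has no utf8 flag, so there is nothing to validate.
my $bytes = "Hello L\x{c3}\x{a9}on";
is_valid_string($bytes, 'unflagged byte string is valid');

# Decoding well-formed UTF-8 gives a flagged, valid character string.
my $chars = decode('UTF-8', $bytes);
is_valid_string($chars, 'decoded string is valid');
```

	   Strings produced by Encode::decode should always pass; the test is
	   mainly of interest for strings whose flag was set by other means.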

       is_sane_utf8($string, $name)
	   This test fails if the string contains something that looks like it
	   might be dodgy utf8, i.e. something that looks like the multi-byte
	   sequence for a latin-1 character but that perl hasn't been
	   instructed to treat as such.  Strings that are not utf8 always
	   automatically pass.

	   Some examples may help:

	     # This will pass as it's a normal latin-1 string
	     is_sane_utf8("Hello L\x{e9}on");

	     # this will fail because the \x{c3}\x{a9} looks like the
	     # utf8 byte sequence for e-acute
	     my $string = "Hello L\x{c3}\x{a9}on";
	     is_sane_utf8($string);

	     # this will pass because the utf8 is correctly interpreted as utf8
	     Encode::_utf8_on($string);
	     is_sane_utf8($string);

	   Obviously this isn't a hundred percent reliable.  The edge case
	   where this will wrongly fail is where you have "\x{c2}" (which is
	   "LATIN CAPITAL LETTER A WITH CIRCUMFLEX") or "\x{c3}" (which is
	   "LATIN CAPITAL LETTER A WITH TILDE") followed by one of the latin-1
	   punctuation symbols.

	     # a capital letter A with circumflex surrounded by smart quotes
	     # this will fail because it'll see the "\x{c2}\x{94}" and think
	     # it's actually the utf8 sequence for the end smart quote
	     is_sane_utf8("\x{93}\x{c2}\x{94}");

	   However, since this rarely comes up, this test is reasonably
	   reliable in most cases.  Still, take care in cases where dynamic
	   data is placed next to latin-1 punctuation, to avoid spurious
	   failures.

	   There are two situations that cause this test to fail.  Either the
	   string contains utf8 byte sequences and hasn't been flagged as utf8
	   (this normally means that you got it from an external source like
	   a C library; when Perl needs to store a string internally as utf8
	   it does its own encoding and flagging transparently), or a utf8
	   flagged string contains characters that themselves look like a
	   utf8 byte sequence.  The test diagnostics tell you which is the
	   case.
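
	   The two situations can be reproduced with the core Encode module.
	   The following is a sketch only: the variable names are illustrative,
	   and because is_sane_utf8 would be expected to fail on both dodgy
	   strings, it is called only on the correctly decoded text.

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use Test::More tests => 3;
use Test::utf8;

my $text = "L\x{e9}on";    # four characters, all within latin-1

# Situation 1: UTF-8 bytes that were never decoded (utf8 flag off).
my $undecoded = encode('UTF-8', $text);    # "L\xc3\xa9on", five bytes
isnt_flagged_utf8($undecoded, 'situation 1: raw bytes are unflagged');

# Situation 2: text encoded twice but decoded once.  The string is
# flagged, yet its characters still spell out a UTF-8 byte sequence.
my $double = decode('UTF-8', encode('UTF-8', $undecoded));
is_flagged_utf8($double, 'situation 2: double-encoded string is flagged');

# Decoding the raw bytes exactly once gives sane text.
my $decoded = decode('UTF-8', $undecoded);
is_sane_utf8($decoded, 'decoded once: sane');
```

	   Both dodgy strings hold the same five characters; they differ only
	   in whether perl has flagged them as utf8.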

   Checking the Range of Characters in a String
       These routines allow you to check the range of characters in a string.
       Note that these routines are blind to the actual encoding perl
       internally uses to store the characters; they just check whether the
       string contains only characters that can be represented in the named
       encoding.

       is_within_ascii
	   Tests that a string only contains characters that are in the ASCII
	   character set.

       is_within_latin_1
	   Tests that a string only contains characters that are in latin-1.
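
	   A short sketch of both range tests (the strings and test names are
	   illustrative).  The final check assumes, as stated above, that the
	   result does not depend on how perl stores the string internally:

```perl
use strict;
use warnings;
use Test::More tests => 4;
use Test::utf8;

my $ascii = "Hello Leon";
is_within_ascii($ascii, 'plain ASCII text');
is_within_latin_1($ascii, 'ASCII is a subset of latin-1');

# e-acute (U+00E9) is outside ASCII but inside latin-1.
my $latin = "Hello L\x{e9}on";
is_within_latin_1($latin, 'e-acute fits in latin-1');

# utf8::upgrade changes perl's internal storage, not the characters,
# so the range test is expected to give the same answer.
utf8::upgrade($latin);
is_within_latin_1($latin, 'same characters after utf8::upgrade');
```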

   Simple utf8 Flag Tests
       Simply check if a scalar is or isn't flagged as utf8 by perl's
       internals.

       is_flagged_utf8($string, $name)
	   Passes if the string is flagged by perl's internals as utf8, fails
	   if it's not.

       isnt_flagged_utf8($string, $name)
	   The opposite of "is_flagged_utf8", passes if and only if the string
	   isn't flagged as utf8 by perl's internals.

	   Note: you can refer to this function as "isn't_flagged_utf8" if you
	   really want to.
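
	   The flag tests look only at perl's internal storage, not at which
	   characters the string contains.  A sketch with illustrative strings
	   and test names, using the core utf8::upgrade function to change the
	   storage:

```perl
use strict;
use warnings;
use Test::More tests => 3;
use Test::utf8;

# A literal whose code points are all below 0x100 is stored as bytes.
my $string = "Hello L\x{e9}on";
isnt_flagged_utf8($string, 'native string starts unflagged');

# utf8::upgrade switches the internal representation and sets the flag
# without changing the characters.
utf8::upgrade($string);
is_flagged_utf8($string, 'flag set after utf8::upgrade');

# A literal with a code point above 0xFF has to be stored as utf8.
my $wide = "snowman \x{2603}";
is_flagged_utf8($wide, 'wide character literal is flagged');
```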

AUTHOR
	 Copyright Mark Fowler 2004.  All rights reserved.

	 This program is free software; you can redistribute it
	 and/or modify it under the same terms as Perl itself.

BUGS
       None known.  Please report any to me via the CPAN RT system.  See
       http://rt.cpan.org/ for more details.

SEE ALSO
       Test::DoubleEncodedEntities for testing for double encoded HTML
       entities.

perl v5.14.1			  2011-06-20			 Test::utf8(3)