uni2ascii(1)							  uni2ascii(1)

NAME
       uni2ascii  -  convert  UTF-8 Unicode to various 7-bit ASCII representa‐
       tions

SYNOPSIS
       uni2ascii [options] (<input file name>)

DESCRIPTION
       uni2ascii converts UTF-8 Unicode to various 7-bit ASCII
       representations. If no format is specified, standard hexadecimal
       format (e.g. 0x00E9) is used. It reads from the standard input and
       writes to the standard output.

       Command line options are:

       -A     List  the	 single character approximations carried out by the -y
	      flag.

       -a <format>
	      Convert to the specified format. Formats	may  be	 specified  by
	      means  of	 the  following	 arbitrary  single character codes, by
	      means of names such as "SGML_decimal", and by  examples  of  the
	      desired format.

	      A	 Generate  hexadecimal numbers with prefix U in angle-brackets
	      (<U00E9>).

	      B Generate \x-escaped hex (e.g. \x00E9)

	      C Generate  \x  escaped  hexadecimal  numbers  in	 braces	 (e.g.
	      \x{00E9}).

	      D Generate decimal HTML numeric character references (e.g.
	      &#233;)

	      E Generate hexadecimal with prefix U (U00E9).

	      F Generate hexadecimal with prefix u (u00E9).

	      G Convert hexadecimal in	single	quotes	with  prefix  X	 (e.g.
	      X'00E9').

	      H Generate hexadecimal HTML numeric character references (e.g.
	      &#xE9;)

	      I Generate hexadecimal UTF-8 with each byte's hex preceded by an
	      =-sign (e.g. =C3=A9). This is the Quoted Printable format
	      defined by RFC 2045.

	      J Generate hexadecimal UTF-8 with each byte's hex preceded by  a
	      %-sign  (e.g.  %C3%A9). This is the URI escape format defined by
	      RFC 2396.

	      K Generate octal UTF-8 with each byte  escaped  by  a  backslash
	      (e.g.  \303\251)

	      L Generate \U-escaped hex outside the BMP, \u-escaped hex within
	      the BMP (U+0000-U+FFFF).

	      M Generate hexadecimal SGML numeric character  references	 (e.g.
	      \#xE9;)

	      N	 Generate  decimal  SGML  numeric  character  references (e.g.
	      \#233;)

	      O Generate octal escapes for the three low bytes in big-endian
	      order (e.g. \000\000\351)

	      P Generate hexadecimal numbers with prefix U+ (e.g. U+00E9)

	      Q Generate character entities (e.g. &eacute;) where possible,
	      otherwise hexadecimal numeric character references.

	      R Generate raw hexadecimal numbers (e.g. 00E9)

	      S Generate hexadecimal escapes for the three low bytes  in  big-
	      endian order (e.g. \x00\x00\xE9)

	      T Generate decimal escapes for the three low bytes in big-endian
	      order (e.g. \d000\d000\d233)

	      U Generate \u-escaped hexadecimal numbers (e.g. \u00E9).

	      V Generate \u-escaped decimal numbers (e.g. \u00233).

	      X Generate standard hexadecimal numbers (e.g. 0x00E9).

	      0 Generate hexadecimal  UTF-8  with  each	 byte's	 hex  enclosed
	      within angle brackets (e.g. <C3><A9>).

	      1 Generate Common Lisp format hexadecimal numbers (e.g. #x00E9).

	      2	 Generate  Perl	 format	 decimal  numbers  with prefix v (e.g.
	      v233).

	      3 Generate hexadecimal numbers with prefix $ (e.g. $00E9).

	      4 Generate Postscript format hexadecimal numbers with prefix 16#
	      (e.g. 16#00E9).

	      5	 Generate  Common  Lisp format hexadecimal numbers with prefix
	      #16r (e.g. #16r00E9).

	      6 Generate ADA format hexadecimal numbers with  prefix  16#  and
	      suffix # (e.g. 16#00E9#).

	      7	 Generate Apache log format hexadecimal UTF-8 with each byte's
	      hex preceded by a backslash-x (e.g.  \xC3\xA9).

	      8 Generate Microsoft OOXML format hexadecimal numbers with  pre‐
	      fix _x and suffix _ (e.g. _x00E9_).

	      9 Generate %\u-escaped hexadecimal numbers (e.g. %\u00E9).
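
              As an illustration only, a few of the formats above can be
              reproduced in Python. This is a minimal sketch, not the code
              uni2ascii uses; the function names escape() and convert() are
              hypothetical, and only formats X, P, U, I, J and K are covered.

```python
def escape(ch, fmt):
    """Render one character in one of a few of the -a formats (sketch)."""
    cp = ord(ch)
    utf8 = ch.encode("utf-8")
    if fmt == "X":                       # standard hexadecimal, 0x00E9
        return "0x%04X" % cp
    if fmt == "P":                       # U+00E9
        return "U+%04X" % cp
    if fmt == "U":                       # \u00E9
        return "\\u%04X" % cp
    if fmt == "I":                       # Quoted Printable, =C3=A9
        return "".join("=%02X" % b for b in utf8)
    if fmt == "J":                       # URI escape, %C3%A9
        return "".join("%%%02X" % b for b in utf8)
    if fmt == "K":                       # octal UTF-8, \303\251
        return "".join("\\%03o" % b for b in utf8)
    raise ValueError("format not covered by this sketch: %r" % fmt)


def convert(text, fmt="X"):
    """Leave ASCII alone and escape everything else."""
    return "".join(ch if ord(ch) < 0x80 else escape(ch, fmt) for ch in text)


print(convert("caf\u00e9", "J"))
```

              Formats I, J and K work on the UTF-8 bytes of the character,
              while X, P and U work on the codepoint itself.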

       -B     Transform to ASCII if possible. This option is equivalent to the
	      combination cdefx.

       -c     Convert circled and parenthesized characters to their unenclosed
	      counterparts.

       -d     Strip  diacritics.  This converts single codepoints representing
	      characters with diacritics to the corresponding ASCII  character
	      and deletes separately encoded diacritics.
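
              The two cases described (precomposed characters and separately
              encoded diacritics) can be approximated with Unicode
              normalization. This is a sketch under the assumption that
              canonical decomposition is close enough; uni2ascii may use its
              own tables, and letters without a decomposition (such as
              U+00F8) are not handled here. strip_diacritics() is a
              hypothetical name.

```python
import unicodedata


def strip_diacritics(text):
    """Decompose, then drop combining marks (approximation of -d)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("caf\u00e9"))    # precomposed e-acute
print(strip_diacritics("cafe\u0301"))   # e followed by combining acute
```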

       -e     Convert  characters  to  their approximate ASCII equivalents, as
	      follows:
              U+0085  next line                                   0x0A  newline
              U+00A0  no break space                              0x20  space
              U+00AB  left-pointing double angle quotation mark   0x22  double quote
              U+00AD  soft hyphen                                 0x2D  minus
              U+00AF  macron                                      0x2D  minus
              U+00B7  middle dot                                  0x2E  period
              U+00BB  right-pointing double angle quotation mark  0x22  double quote
              U+1361  ethiopic word space                         0x20  space
              U+1680  ogham space                                 0x20  space
              U+2000  en quad                                     0x20  space
              U+2001  em quad                                     0x20  space
              U+2002  en space                                    0x20  space
              U+2003  em space                                    0x20  space
              U+2004  three-per-em space                          0x20  space
              U+2005  four-per-em space                           0x20  space
              U+2006  six-per-em space                            0x20  space
              U+2007  figure space                                0x20  space
              U+2008  punctuation space                           0x20  space
              U+2009  thin space                                  0x20  space
              U+200A  hair space                                  0x20  space
              U+200B  zero-width space                            0x20  space
              U+2010  hyphen                                      0x2D  minus
              U+2011  non-breaking hyphen                         0x2D  minus
              U+2012  figure dash                                 0x2D  minus
              U+2013  en dash                                     0x2D  minus
              U+2014  em dash                                     0x2D  minus
              U+2018  left single quotation mark                  0x60  left single quote
              U+2019  right single quotation mark                 0x27  right or neutral single quote
              U+201A  single low-9 quotation mark                 0x60  left single quote
              U+201B  single high-reversed-9 quotation mark       0x60  left single quote
              U+201C  left double quotation mark                  0x22  double quote
              U+201D  right double quotation mark                 0x22  double quote
              U+201E  double low-9 quotation mark                 0x22  double quote
              U+201F  double high-reversed-9 quotation mark       0x22  double quote
              U+2022  bullet                                      0x6F  small letter o
              U+2028  line separator                              0x0A  newline
              U+2033  double prime                                0x22  double quote
              U+2039  single left-pointing angle quotation mark   0x60  left single quote
              U+203A  single right-pointing angle quotation mark  0x27  right or neutral single quote
              U+204E  low asterisk                                0x2A  asterisk
              U+2212  minus sign                                  0x2D  minus
              U+2216  set minus                                   0x5C  backslash
              U+2217  asterisk operator                           0x2A  asterisk
              U+2223  divides                                     0x7C  vertical line
              U+2500  box drawing light horizontal                0x2D  minus
              U+2501  box drawing heavy horizontal                0x2D  minus
              U+2502  box drawing light vertical                  0x7C  vertical line
              U+2503  box drawing heavy vertical                  0x7C  vertical line
              U+2731  heavy asterisk                              0x2A  asterisk
              U+275D  heavy double turned comma quotation mark    0x22  double quote
              U+275E  heavy double comma quotation mark           0x22  double quote
              U+3000  ideographic space                           0x20  space
              U+FE60  small ampersand                             0x26  ampersand
              U+FE61  small asterisk                              0x2A  asterisk
              U+FE62  small plus sign                             0x2B  plus sign
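
              A table of this kind maps naturally onto a one-to-one
              translation. The sketch below copies only a handful of rows
              from the list above and is not the program's own code;
              APPROXIMATIONS and approximate() are hypothetical names.

```python
# A few rows from the -e table: codepoint -> ASCII replacement.
APPROXIMATIONS = {
    0x00A0: " ",   # no break space
    0x2013: "-",   # en dash
    0x2014: "-",   # em dash
    0x2018: "`",   # left single quotation mark
    0x2019: "'",   # right single quotation mark
    0x201C: '"',   # left double quotation mark
    0x201D: '"',   # right double quotation mark
    0x2022: "o",   # bullet
    0x2212: "-",   # minus sign
}


def approximate(text):
    """Replace each listed character with its ASCII approximation."""
    return text.translate(APPROXIMATIONS)


print(approximate("\u201cyes\u201d \u2013 no"))
```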

       -E     List the expansions performed by the -x flag.

       -f     Convert stylistic variants to plain  ASCII.   Stylistic  equiva‐
	      lents  include:  superscript and subscript forms, small capitals
	      (e.g. U+1D04), script forms (e.g. U+212C),  black	 letter	 forms
	      (e.g.  U+212D),  fullwidth  forms (e.g. U+FF01), halfwidth forms
	      (e.g. U+FF7B), and the mathematical alphanumeric	symbols	 (e.g.
	      U+1D400).
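
              Several of these stylistic variants carry a Unicode
              compatibility decomposition, so NFKC normalization gives a
              rough idea of the effect. This is only an approximation and an
              assumption about how one might imitate -f: small capitals such
              as U+1D04 have no such decomposition, and halfwidth katakana
              normalize to fullwidth katakana rather than to ASCII.
              fold_stylistic() is a hypothetical name.

```python
import unicodedata


def fold_stylistic(text):
    """Fold compatibility variants to their plain forms (rough -f analogue)."""
    return unicodedata.normalize("NFKC", text)


print(fold_stylistic("\uFF01"))        # fullwidth exclamation mark
print(fold_stylistic("\U0001D400"))    # mathematical bold capital A
print(fold_stylistic("\u212C"))        # script capital B
```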

       -h     Help. Print the usage message and exit.

       -l     Use lowercase a-f when generating hexadecimal numbers.

       -n     Convert newlines too. By default, they are left alone.

       -P     Pass  through Unicode rather than converting to ASCII escapes if
	      the character is not converted to an ASCII character by a trans‐
	      formation	 such as diacritic stripping. Note that if this option
	      is used the output may not be pure ASCII.

       -p     Pure. Convert characters within the ASCII range, except for
	      space and newline, as well as those above it.

       -q     Quiet. Do not chat unnecessarily while working.

       -s     Convert space characters too. By default, they are left alone.

       -S <Unicode:ASCII>
	      Define a custom substitution. The argument should consist of the
	      Unicode codepoint to be replaced followed by the ASCII  code  of
	      the  character  to be used as replacement, separated by a colon.
	      If no ASCII code follows the colon, the specified Unicode	 char‐
	      acter  will  be deleted.	The code values may be in hexadecimal,
	      octal, or decimal following the usual conventions (to be
	      precise, those of strtoul(3)). This option may be repeated as many
	      times as desired to define multiple substitutions.
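
              The argument syntax can be sketched as follows. The parsing
              assumes the strtoul(3) conventions for base 0 (a 0x prefix for
              hexadecimal, a leading 0 for octal, decimal otherwise);
              parse_number() and parse_substitution() are hypothetical names
              and do not come from uni2ascii.

```python
def parse_number(token):
    """Parse an integer using the base-0 conventions of strtoul(3)."""
    t = token.strip().lower()
    if t.startswith("0x"):
        return int(t[2:], 16)
    if len(t) > 1 and t.startswith("0"):
        return int(t[1:], 8)
    return int(t, 10)


def parse_substitution(arg):
    """Split 'Unicode:ASCII' into (codepoint, replacement or None)."""
    source, _, target = arg.partition(":")
    codepoint = parse_number(source)
    # An empty right-hand side means the character is to be deleted.
    replacement = chr(parse_number(target)) if target else None
    return codepoint, replacement


print(parse_substitution("0x2022:0x2A"))   # bullet -> asterisk
print(parse_substitution("8226:"))         # bullet deleted
```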

       -v     Print program version information and exit.

       -w     Add a space after each converted item.

       -x     Expand certain  characters  to  multicharacter  sequences.   The
	      characters  affected  are	 the  same as those affected by the -y
	      option.
	      U+00A2 CENT SIGN			      -> cent
	      U+00A3 POUND SIGN			      -> pound
	      U+00A5 YEN SIGN			      -> yen
	      U+00A9 COPYRIGHT SYMBOL		      -> (c)
	      U+00AE REGISTERED SYMBOL		      -> (R)
	      U+00BC ONE QUARTER		      -> 1/4
	      U+00BD ONE HALF			      -> 1/2
	      U+00BE THREE QUARTERS		      -> 3/4
	      U+00C6 CAPITAL LETTER ASH		      -> AE
	      U+00DF SMALL LETTER SHARP S	      -> ss
	      U+00E6 SMALL LETTER ASH		      -> ae
	      U+0132 LIGATURE IJ		      -> IJ
	      U+0133 LIGATURE ij		      -> ij
	      U+0152 LIGATURE OE		      -> OE
	      U+0153 LIGATURE oe		      -> oe
	      U+01F1 CAPITAL LETTER DZ		      -> DZ
	      U+01F2 MIXED LETTER Dz		      -> Dz
	      U+01F3 SMALL LETTER DZ		      -> dz
	      U+02A6 SMALL LETTER TS DIGRAPH	      -> ts
	      U+2026 HORIZONTAL ELLIPSIS	      -> ...
	      U+20AC EURO SIGN			      -> euro
	      U+22EF MIDLINE HORIZONTAL ELLIPSIS      -> ...
	      U+2190 LEFTWARDS ARROW		      -> <-
	      U+2192 RIGHTWARDS ARROW		      -> ->
	      U+21D0 LEFTWARDS DOUBLE ARROW	      -> <=
	      U+21D2 RIGHTWARDS DOUBLE ARROW	      -> =>
	      U+FB00 LATIN SMALL LIGATURE FF	      -> ff
	      U+FB01 LATIN SMALL LIGATURE FI	      -> fi
	      U+FB02 LATIN SMALL LIGATURE FL	      -> fl
	      U+FB03 LATIN SMALL LIGATURE FFI	      -> ffi
	      U+FB04 LATIN SMALL LIGATURE FFL	      -> ffl
	      U+FB06 LATIN SMALL LIGATURE ST	      -> st

       -y     Convert certain characters having multi-character expansions to
	      single-character ASCII approximations instead (e.g. to maintain
	      character-positioning). The characters affected are the same as
	      those affected by the -x option.
	      U+00A2 CENT SIGN			      -> c
	      U+00A3 POUND SIGN			      -> #
	      U+00A5 YEN SIGN			      -> Y
	      U+00A9 COPYRIGHT SYMBOL		      -> C
	      U+00AE REGISTERED SYMBOL		      -> R
	      U+00BC ONE QUARTER		      -> -
	      U+00BD ONE HALF			      -> -
	      U+00BE THREE QUARTERS		      -> -
	      U+00C6 CAPITAL LETTER ASH		      -> A
	      U+00DF SMALL LETTER SHARP S	      -> s
	      U+00E6 SMALL LETTER ASH		      -> a
	      U+0132 LIGATURE IJ		      -> I
	      U+0133 LIGATURE ij		      -> i
	      U+0152 LIGATURE OE		      -> O
	      U+0153 LIGATURE oe		      -> o
	      U+01F1 CAPITAL LETTER DZ		      -> D
	      U+01F2 MIXED LETTER Dz		      -> D
	      U+01F3 SMALL LETTER DZ		      -> d
	      U+02A6 SMALL LETTER TS DIGRAPH	      -> t
	      U+2026 HORIZONTAL ELLIPSIS	      -> .
	      U+20AC EURO SIGN			      -> E
	      U+22EF MIDLINE HORIZONTAL ELLIPSIS      -> .
	      U+2190 LEFTWARDS ARROW		      -> <
	      U+2192 RIGHTWARDS ARROW		      -> >
	      U+21D0 LEFTWARDS DOUBLE ARROW	      -> <
	      U+21D2 RIGHTWARDS DOUBLE ARROW	      -> >
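
              The difference between the -x and -y tables can be seen in a
              short sketch that uses four rows taken from each list. The
              names EXPANSIONS, SINGLE and substitute() are hypothetical;
              this is not the program's implementation.

```python
# Four rows from the -x table (multicharacter expansions).
EXPANSIONS = {"\u00A9": "(c)", "\u00BD": "1/2", "\u2026": "...", "\u2192": "->"}
# The same four characters from the -y table (single-character forms).
SINGLE = {"\u00A9": "C", "\u00BD": "-", "\u2026": ".", "\u2192": ">"}


def substitute(text, table):
    """Replace each character found in the table; pass the rest through."""
    return "".join(table.get(ch, ch) for ch in text)


sample = "a \u2192 b\u2026"
print(substitute(sample, EXPANSIONS))
print(substitute(sample, SINGLE))
```

              With the single-character table the output has the same number
              of characters as the input, which is what keeps columns
              aligned.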

       -Z <format>
	      Generate output using the supplied format. The format specified
	      will be used as the format string in a call to printf(3) with a
	      single argument consisting of an unsigned long integer. For
	      example, to obtain the same output as with the U format (-a U),
	      the format would be: \u%04X.
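
              Python's % operator accepts the same conversion shown in the
              example, so the effect of a custom format can be previewed
              with a small sketch. apply_format() is a hypothetical helper,
              and the assumption that ASCII characters pass through
              unchanged reflects the default behaviour described above.

```python
def apply_format(fmt, text):
    """Format each non-ASCII codepoint with a printf-style format string."""
    return "".join(ch if ord(ch) < 0x80 else fmt % ord(ch) for ch in text)


print(apply_format("\\u%04X", "caf\u00e9"))
print(apply_format("<%d>", "caf\u00e9"))
```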

       If conversion of spaces is disabled (as it is by default), space
       characters outside the ASCII range (U+3000 ideographic space, U+1361
       Ethiopic word space, and U+1680 ogham space mark) are still replaced
       with the ASCII space character (0x20) so as to keep the output pure
       7-bit ASCII.

       Note  that  XML	and XHTML numeric character entities are like those of
       HTML with two restrictions. First, in  X(HT)ML  the  terminating	 semi-
       colon  may  not	be omitted.  Second, in X(HT)ML the "x" must be lower-
       case, while in HTML it may be either upper- or  lower-case.  We	always
       generate	 the  terminating  semi-colon and use a lower-case "x", so the
       option dubbed "HTML" produces valid XML and XHTML as well.
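
       The point about XML compatibility can be checked with a short sketch:
       a hexadecimal reference written with a lower-case "x" and a
       terminating semi-colon is accepted by an XML parser. hex_ncr() is a
       hypothetical helper, and the unpadded digit width is an assumption
       made for the example.

```python
import xml.etree.ElementTree as ET


def hex_ncr(ch):
    """Build a hexadecimal numeric character reference with a lower-case x."""
    return "&#x%X;" % ord(ch)


document = "<p>caf" + hex_ncr("\u00e9") + "</p>"
print(document)
print(ET.fromstring(document).text == "caf\u00e9")
```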

EXIT STATUS
       The following values are returned on exit:

       0 SUCCESS
	      The input was successfully converted.

       2 I/O ERROR
	      A system error occurred during input or output.

       3 INFO
	      The user requested information such as the version number or
	      usage synopsis and this has been provided.

       5 BAD OPTION
	      An incorrect option flag was given on the command line.

       8 BAD RECORD
	      Ill-formed UTF-8 was detected in the input.
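
       A calling script might map these values back to their names. The
       sketch below assumes uni2ascii is installed and on the PATH when
       run_uni2ascii() is called; both function names are hypothetical, and
       only the -a U format documented above is passed.

```python
import subprocess

# Exit values as listed in this section.
EXIT_STATUS = {
    0: "SUCCESS",
    2: "I/O ERROR",
    3: "INFO",
    5: "BAD OPTION",
    8: "BAD RECORD",
}


def describe_exit(code):
    """Return the documented name for an exit value."""
    return EXIT_STATUS.get(code, "UNKNOWN (%d)" % code)


def run_uni2ascii(data):
    """Pipe UTF-8 bytes through uni2ascii and report the outcome."""
    proc = subprocess.run(["uni2ascii", "-a", "U"],
                          input=data, capture_output=True)
    return proc.stdout, describe_exit(proc.returncode)


print(describe_exit(8))
```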

SEE ALSO
       ascii2uni(1), Text::Unidecode

AUTHOR
       Bill Poser <billposer@alum.mit.edu>

LICENSE
       GNU General Public License

				  April, 2011			  uni2ascii(1)