uniconv man page on SuSE

UNICONV(1)			LINUX COMMANDS			    UNICONV(1)

NAME
       uniconv - convert text to native formats through unicode

SYNOPSIS
       uniconv	-out  output-file [ -decode input-encoding ] [ -encode output-
       encoding ] [ input-file ] [ -todos ] [ -fromdos ] [ -tomac ] [ -frommac
       ]

DESCRIPTION
       The uniconv program decodes text in one encoding and encodes it in
       another.  The text is a 16-, 8- or 7-bit byte stream.  The converted
       text is sent to the standard output, even for 16-bit encodings, unless
       an output file is specified with the -out option.

       The -decode and -encode options are optional; the default converter is
       utf-8.  The program reads the Unicode map helper files (*.my) from the
       default directory /usr/share/data.  Simple 1-to-1 encodings can be
       added on the fly by adding a my-file, or by setting the yudit.datapath
       property in ~/.yudit/yudit.properties or
       /usr/share/yudit/config/yudit.properties.  By default
       /usr/share/yudit/data is searched.

       My-files can be created by a program called makeumap.  The files can be
       converted between dos/unix/mac line-ending variants with the -fromdos,
       -frommac, -todos and -tomac options.  The default (when none is
       specified) is Unix.
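
       The decode, convert line endings, encode flow described above can be
       sketched in Python.  This is only an illustration: the convert()
       helper and its parameters are hypothetical, it is not how uniconv is
       implemented, and it assumes any DOS or Mac line ending in the input
       is first normalised to Unix.

```python
# Hypothetical helper, not part of uniconv: bytes -> Unicode -> bytes.
def convert(data: bytes, decode: str = "utf-8", encode: str = "utf-8",
            line_ending: str = "\n") -> bytes:
    text = data.decode(decode)
    # Normalise DOS (\r\n) and Mac (\r) line endings to Unix (\n) first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    if line_ending != "\n":
        text = text.replace("\n", line_ending)
    return text.encode(encode)

latin1_dos = "caf\xe9\r\n".encode("iso-8859-1")
print(convert(latin1_dos, decode="iso-8859-1"))   # Latin-1/DOS -> utf-8/Unix
print(convert(b"a\nb\n", line_ending="\r\n"))     # Unix -> DOS line endings
```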

ENCODING
       If you received this program through the Yudit distribution, then as of
       today you can convert between the encodings below.

       utf-8  Yudit  recommends	 this  format  for  international  information
	      exchange.	  ASCII	 text	will  get through  intact, while other
	      unicode characters will get their 8th bit set and the length  of
	      the   code   will depend on how far away they are in the Unicode
	      space.  This is the only transformation format that  can	encode
	      both 16-bit (ucs-2) and 31-bit (ucs-4) unicode.
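
              As a small illustration of how the encoded length grows with
              the code point, the following Python sketch uses the standard
              library codec rather than uniconv itself; the sample
              characters are arbitrary.

```python
samples = ("A", "\u00e9", "\u3042", "\U00010348")
lengths = []
for ch in samples:
    encoded = ch.encode("utf-8")
    lengths.append(len(encoded))
    # ASCII stays a single 7-bit byte; every other byte has bit 8 set.
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")
```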

       utf-8-s
              Hacker's utf-8 format.  It does not give an error message when
              a surrogate pair is decoded, and it can encode a surrogate pair
              'as is'.  This is not a recommended encoding format, although
              it is used to encode/decode clipboard data in order to preserve
              input.
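
              A comparable behaviour exists in Python as the "surrogatepass"
              error handler.  The sketch below uses it as an analogy only; it
              is an assumption that utf-8-s produces the same bytes.

```python
lone = "\ud800"  # an unpaired high surrogate

# Strict utf-8 refuses to encode a surrogate code point.
try:
    lone.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# "surrogatepass" encodes it 'as is' and decodes it back unchanged.
raw = lone.encode("utf-8", errors="surrogatepass")
restored = raw.decode("utf-8", errors="surrogatepass")
print(strict_failed, raw.hex(" "), restored == lone)
```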

       utf-16 Although 16 is bigger than 8, this is still a compromise
              required by OSes like Windows that cannot handle ucs-4.  This
              encoding produces 16-bit unicode streams.  In addition to the
              BMP it can convert 16 planes using the Unicode Surrogate Area.
              This encoding cannot convert anything above U+10FFFF (Plane
              16).  The input byte order is recognized by the first two
              bytes, the BOM (byte-order mark) U+FEFF.  This format is used
              in Windows NT for documents like notepad .txt files.
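
              The surrogate arithmetic and the BOM check can be sketched as
              follows.  The function names are hypothetical and the code
              follows the Unicode definitions, not uniconv's source.

```python
import codecs

def to_surrogates(cp: int) -> tuple:
    """Split a code point in U+10000..U+10FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("outside the range reachable through surrogates")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

def detect_byte_order(data: bytes) -> str:
    """Guess the byte order from a leading U+FEFF byte-order mark."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    return "unknown"

print([hex(u) for u in to_surrogates(0x1F600)])  # ['0xd83d', '0xde00']
print(detect_byte_order(b"\xff\xfeA\x00"))        # utf-16-le
```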

       utf-16-be
	      Big endian utf-16 converter.

       utf-16-le
	      Little endian utf-16 converter.

       utf-7  This is the recommended format for international information
              exchange when only 7 bits can be used.  It can only handle
              16-bit (utf-16) unicode; for ucs-4 (above U+10FFFF) you should
              use the utf-8 encoding.
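
              To see the 7-bit property, the sketch below uses Python's
              built-in utf-7 codec as a stand-in; it is an assumption that
              uniconv emits the same byte sequence.

```python
text = "Gr\u00fc\u00dfe"          # contains two non-ASCII characters
encoded = text.encode("utf-7")

# Every byte of the result fits in 7 bits, and decoding restores the text.
seven_bit = all(b < 0x80 for b in encoded)
print(encoded, seven_bit, encoded.decode("utf-7") == text)
```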

       iso-8859-1
	      This is the ISO 8859-1 character	encoding format. It  is
	      also known as "Latin-1" encoding.

       iso-8859-2
	      This  is	the ISO 8859-2 character encoding format. It is
	      also known as "Central European" encoding.

       iso-8859-5
	      This is the ISO 8859-5 character encoding format.	 It  is
	      also known as "Cyrillic" encoding.

       iso-8859-7
	      This  is	the ISO 8859-7 character encoding format. It is
	      also known as "Greek" encoding.

       iso-8859-9
	      This is the ISO 8859-9 character encoding format.	 It  is
	      also known as "Turkish" encoding.

       koi8-r This  is	the  KOI8-R  character	encoding  format. It is
	      mainly used in Russia.

       cp-1251
	      This is the CP1251 cyrillic character encoding format. It
	      is mainly used in Microsoft Windows and some web sites.

       iso-2022-jp
	      This  is	a  Japanese  character encoding format. It is a
	      7-bit encoding format.

       iso-2022-jp-3
              This is a Japanese character encoding format.  It is a 7-bit
              encoding format.  It is based upon the JIS X 0213 standard.

       euc-jp This is a Japanese character encoding format.  It	 is  an
	      8-bit encoding format.  Mainly used in UNIX systems.

       euc-jp-3
              The official name is EUC-JISX0213 - I just could not read
              this.  This is a Japanese character encoding format.  It is
              an 8-bit encoding format.  It is based upon the JIS X 0213
              standard.

       shift-jis
	      This is a Japanese character encoding format.  It	 is  an
	      8-bit encoding format. Mainly used in MSDOS/Windows.

       shift-jis-3
	      The  official  name  is Shift_JISX0213 - I just could not
	      read this.  This is a Japanese character encoding format.
	      It is an 8-bit encoding format. Mainly used in MSDOS/Win‐
	      dows.

       iso-2022-jp
              This is a Japanese 7-bit character encoding format.  Email
              messages in iso-2022-jp can be decoded and encoded with this
              converter.

       iso-2022-x11
	      This  is a Japanese character  encoding  format.	 It  is
	      also  known as "COMPOUND_TEXT" encoding for the X	 Window
	      System. This is a	 7-bit	encoding  format.   It	can  be
	      derived  from  the  ISO  2022-JP format with some differ‐
	      ences.

       ksc-5601-x11
              This is a Korean character encoding format used by the X
              Window System (COMPOUND_TEXT encoding) to encode Korean (KS X
              1001) and US-ASCII.  This is a 7-bit encoding format compliant
              with the ISO-2022 specification for encoding multiple
              character sets.  Please note that this is DIFFERENT from
              ISO-2022-KR (defined in IETF RFC 1557).

       euc-kr This is an 8-bit multibyte encoding for Korean.  It encodes
              US-ASCII (7-bit) in the single-byte range and characters in KS
              X 1001 (formerly KS C 5601) in the double-byte range with the
              MSB on (8-bit).  It is used in Unix and on the Internet.
              Korean versions of MS-DOS, MacOS and MS-Windows use a
              compatible (in most cases, identical) variant of this
              encoding.
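
              The single-byte and double-byte ranges can be observed with
              Python's euc_kr codec, used here as a stand-in for uniconv;
              the sample string is arbitrary.

```python
# 'A' (US-ASCII) followed by the Hangul syllable U+D55C.
encoded = "A\ud55c".encode("euc_kr")

for b in encoded:
    # ASCII is one byte with the MSB clear; KS X 1001 characters take
    # two bytes, each with the MSB set.
    print(f"{b:#04x}", "MSB set" if b & 0x80 else "7-bit")
```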

       johab  This is a Korean encoding specified in KS X 1001 (KS C
              5601-1992), Annex 3, as a supplementary encoding.  It was
              widely used in Korean MS-DOS until the mid-1990s.  It can
              encode all 11,172 Hangul syllables of modern Korean as well as
              all the special symbols and Hanja (Chinese ideograms used in
              Korea) defined in KS X 1001.

       uhc    A variant of EUC-KR used in Korean MS-Windows 95/98 (a
              proprietary Microsoft encoding, CP949).  Its character
              repertoire includes all modern syllables of Hangul, the Korean
              script, as well as all the special symbols and Hanja (Chinese
              ideograms used in Korea) defined in KS X 1001.

       gb-18030
	      This is a Chinese character encoding format based upon GB
	      18030.   It  encodes  the	 whole	U+0000..U+10FFFF range,
	      while being compatible with gb-2312.
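
              Both properties, gb-2312 compatibility and coverage beyond the
              BMP, can be checked with Python's gb18030 codec.  This is a
              sketch with the standard library, not with uniconv's own map
              files.

```python
bmp_char = "\u4e2d"          # a Chinese character that gb-2312 also covers
astral_char = "\U0001F600"   # a character outside the BMP

same_as_gb2312 = bmp_char.encode("gb18030") == bmp_char.encode("gb2312")
astral_bytes = astral_char.encode("gb18030")

print(same_as_gb2312, len(astral_bytes),
      astral_bytes.decode("gb18030") == astral_char)
```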

       gb-2312-x11
	      This is a Chinese character encoding format based upon GB
	      2312.  It is a 7-bit encoding format.

       gb-2312
	      This is a Chinese character encoding format based upon GB
	      2312.  It is an 8-bit encoding format.

       big-5  This is a Chinese character encoding  format  based  upon
	      BIG5 encoding.  It is an 8-bit encoding format.

       hz     This  is	a  Chinese character encoding format based upon
	      "Hanzi" encoding.	 It is a 7-bit encoding format.

       viscii This is a Vietnamese character encoding format.

       ucs-2-be
              This converts 16-bit unicode (ucs-2) streams.  This format
              handles the big-endian variant.  Yudit does not recommend this
              format.

       ucs-2-le
              This converts 16-bit unicode (ucs-2) streams.  This format
              handles the little-endian variant.  Yudit does not recommend
              this format.

       ucs-2  This converts 16-bit unicode (ucs-2) streams.  The input byte
              order is recognized by the first two bytes, the BOM (byte-order
              mark) U+FEFF.  Yudit does not recommend this format.

       java   This converts \uxxxx character escapes.  When encoding, all
              characters above U+0080 will be escaped with a string like
              '\u0080'.  When decoding, the same format is decoded but, in
              addition, utf-8 format is also recognized, so it can also be
              used to recover data accidentally saved with the wrong
              encoding.  The U+10000..U+10FFFF area is converted to
              surrogates and vice versa.
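
              The encoding direction can be sketched as below.  The
              java_escape() function is hypothetical, and it assumes that
              U+0080 itself is escaped (as the '\u0080' example suggests)
              and that lowercase hex digits are used.

```python
def java_escape(text: str) -> str:
    """Escape non-ASCII characters as \\uxxxx, using surrogate pairs above the BMP."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                      # ASCII passes through
        elif cp < 0x10000:
            out.append(f"\\u{cp:04x}")          # one escape for BMP characters
        else:
            offset = cp - 0x10000               # U+10000..U+10FFFF -> surrogates
            hi = 0xD800 + (offset >> 10)
            lo = 0xDC00 + (offset & 0x3FF)
            out.append(f"\\u{hi:04x}\\u{lo:04x}")
    return "".join(out)

print(java_escape("caf\u00e9 \U0001F600"))  # caf\u00e9 \ud83d\ude00
```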

       java-s This converts \uxxxx character escapes.  When encoding, all
              characters above U+0080 will be escaped with a string like
              '\u0080'.  When decoding, the same format is decoded but, in
              addition, utf-8 format is also recognized, so it can also be
              used to recover data accidentally saved with the wrong
              encoding.  Surrogates are not treated specially during
              conversion, which is why this is not a recommended conversion.

FILES
       ~/.yudit/yudit.properties or /usr/share/yudit/config/yudit.properties
              can have a yudit.datapath property.  This is where the map
              files are kept.  By default /usr/share/yudit/data is searched.

SEE ALSO
	makeumap

AUTHOR
       This  program   was  written by gsinai@yudit.org (Gaspar Sinai),
       Tokyo, 2 January, 2001.

LINUX COMMANDS			  Nov 5 1997			    UNICONV(1)