uniconv man page on OpenSuSE

UNICONV(1)			LINUX COMMANDS			    UNICONV(1)

NAME
       uniconv - convert text to native formats through unicode

SYNOPSIS
       uniconv	-out  output-file [ -decode input-encoding ] [ -encode output-
       encoding ] [ input-file ] [ -todos ] [ -fromdos ] [ -tomac ] [ -frommac
       ]

DESCRIPTION
       The uniconv program decodes scripts in one encoding and encodes them
       in another.  The script is a 16-, 8- or 7-bit byte stream.  The
       converted text is sent to the standard output, even for 16-bit
       encodings, unless an output file is specified with the -out option.
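
       The decode-then-encode flow described above can be sketched in
       Python.  This is only an illustration that uses Python's own codecs;
       the function name convert and its parameters are hypothetical, and it
       is not how uniconv itself is implemented.

```python
def convert(data: bytes, decode: str = "utf-8", encode: str = "utf-8") -> bytes:
    """Decode bytes into Unicode text, then encode that text again."""
    text = data.decode(decode)   # input encoding -> Unicode
    return text.encode(encode)   # Unicode -> output encoding


# "caf\xe9" is "cafe" with an acute accent on the e, stored as ISO 8859-1.
latin1 = b"caf\xe9"
utf8 = convert(latin1, decode="iso-8859-1", encode="utf-8")
print(utf8)
```

       Both parameters default to utf-8, mirroring the default converter
       named in the text.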

       The -decode and -encode options are optional; the default converter is
       utf-8.  The program reads the Unicode map helper files (*.my) from the
       default directory /usr/share/data.  Simple 1-to-1 encodings can be
       added on the fly by adding a my-file, or by setting the yudit.datapath
       property in ~/.yudit/yudit.properties or
       /usr/share/yudit/config/yudit.properties.  By default
       /usr/share/yudit/data is searched.

       My-files can be created by a program called makeumap.  The files can
       be converted between dos/unix/mac line-ending variants with the
       -fromdos, -frommac, -todos and -tomac options.  The default (when
       none is specified) is Unix.
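
       The three line-ending conventions can be sketched as follows.  The
       helper names are hypothetical and the code only illustrates the
       CRLF (dos), LF (unix) and CR (mac) variants; the exact behaviour of
       the uniconv options is not documented here.

```python
def to_unix(text: str) -> str:
    """Normalize DOS (CRLF) and classic Mac (CR) line endings to Unix (LF)."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def to_dos(text: str) -> str:
    """Convert any of the three variants to DOS (CRLF) line endings."""
    return to_unix(text).replace("\n", "\r\n")


def to_mac(text: str) -> str:
    """Convert any of the three variants to classic Mac (CR) line endings."""
    return to_unix(text).replace("\n", "\r")
```

       Normalizing to LF first keeps an input that already uses CRLF from
       being converted twice.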

ENCODING
       If you received this program through the Yudit distribution, then as of
       today you can convert between the encodings below.

       utf-8  Yudit  recommends	 this  format  for  international  information
	      exchange.	  ASCII	 text	will  get through  intact, while other
	      unicode characters will get their 8th bit set and the length  of
	      the   code   will depend on how far away they are in the Unicode
	      space.  This is the only transformation format that  can	encode
	      both 16-bit (ucs-2) and 31-bit (ucs-4) unicode.
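
       The variable length of utf-8 can be observed with Python's built-in
       codec.  This is an illustration of the encoding itself, not of
       uniconv; the sample characters are arbitrary.

```python
# U+0041, U+00E9, U+20AC and U+1F600: progressively further into Unicode.
samples = ["A", "\u00e9", "\u20ac", "\U0001f600"]

for ch in samples:
    encoded = ch.encode("utf-8")
    high_bit = all(b & 0x80 for b in encoded)
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s), 8th bit set: {high_bit}")

lengths = [len(ch.encode("utf-8")) for ch in samples]
```

       ASCII stays a single byte with the 8th bit clear, while every byte of
       a non-ASCII character has the 8th bit set.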

       utf-8-s
	      Hacker's utf-8 format: it does not give an error message when a
	      surrogate pair is decoded, and it can encode a surrogate pair
	      'as is'.  This is not a recommended encoding format, although
	      it is used to encode/decode clipboard data in order to preserve
	      input.
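
       Python offers a comparable relaxation through its "surrogatepass"
       error handler.  The sketch below uses it as an analogy only; it is an
       assumption that utf-8-s behaves similarly, and the variable names are
       illustrative.

```python
lone = "\ud83d"  # a lone high surrogate, not valid in strict UTF-8

try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False  # the strict codec refuses surrogates

# The relaxed handler writes the surrogate as an ordinary 3-byte sequence.
as_is = lone.encode("utf-8", "surrogatepass")
round_trip = as_is.decode("utf-8", "surrogatepass")
```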

       utf-16 Although 16 is bigger than 8, this is still a compromise required
	      by OSes like Windows that cannot handle ucs-4.  This encoding
	      produces 16-bit Unicode streams.  In addition to the BMP it can
	      convert 16 planes using the Unicode Surrogate Area.  This
	      encoding cannot convert anything above U+10FFFF (Plane 16).  The
	      input byte order is recognized by the first two bytes, the BOM
	      (byte-order mark) U+FEFF.  This format is used in Windows NT for
	      documents like notepad .txt files.
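
       The surrogate mechanism and the byte-order mark can be sketched as
       below.  The function names are hypothetical; the arithmetic follows
       the standard UTF-16 definition rather than uniconv's source.

```python
import codecs


def to_surrogates(code_point):
    """Split a code point in U+10000..U+10FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the supplementary planes")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def detect_byte_order(data):
    """Report the byte order announced by a leading U+FEFF, if any."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return "big"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "little"
    return "unknown"
```

       U+10FFFF maps to the last possible pair, which is why nothing above
       Plane 16 can be represented.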

       utf-16-be
	      Big endian utf-16 converter.

       utf-16-le
	      Little endian utf-16 converter.

       utf-7  This is the recommended format for international information
	      exchange when only 7-bit data can be used.  It can only handle
	      16-bit (utf-16) Unicode; for ucs-4 (above U+10FFFF) you should
	      use the utf-8 encoding.
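
       That utf-7 output stays within 7 bits can be checked with Python's
       built-in "utf-7" codec.  This illustrates the encoding only and makes
       no claim about uniconv's exact output; the sample text is arbitrary.

```python
text = "Hi \u263a"  # ASCII followed by U+263A

encoded = text.encode("utf-7")
seven_bit_only = all(b < 0x80 for b in encoded)  # no byte uses the 8th bit
decoded = encoded.decode("utf-7")
```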

       iso-8859-1
	      This  is	the  ISO 8859-1 character  encoding format. It is also
	      known as "Latin-1" encoding.

       iso-8859-2
	      This  is	the ISO 8859-2 character encoding format. It  is  also
	      known as "Central European" encoding.

       iso-8859-5
	      This  is	the  ISO  8859-5 character encoding format. It is also
	      known as "Cyrillic" encoding.

       iso-8859-7
	      This is the ISO 8859-7 character encoding	 format.  It  is  also
	      known as "Greek" encoding.

       iso-8859-9
	      This  is	the  ISO  8859-9 character encoding format. It is also
	      known as "Turkish" encoding.

       koi8-r This is the KOI8-R character encoding format. It is mainly  used
	      in Russia.

       cp-1251
	      This  is	the  CP1251  cyrillic character encoding format. It is
	      mainly used in Microsoft Windows and some web sites.

       iso-2022-jp
	      This is a Japanese character encoding  format.  It  is  a	 7-bit
	      encoding format.

       iso-2022-jp-3
	      This is a Japanese character encoding format.  It is a 7-bit
	      encoding format.  It is based upon the JIS X 0213 standard.

       euc-jp This is a Japanese character encoding format.  It	 is  an	 8-bit
	      encoding format.	Mainly used in UNIX systems.

       euc-jp-3
	      The official name is EUC-JISX0213 - I just could not read this.
	      This is a Japanese character encoding format.  It is an 8-bit
	      encoding format.  It is based upon the JIS X 0213 standard.

       shift-jis
	      This  is	a  Japanese character encoding format.	It is an 8-bit
	      encoding format. Mainly used in MSDOS/Windows.

       shift-jis-3
	      The official name is Shift_JISX0213 -  I	just  could  not  read
	      this.   This  is a Japanese character encoding format.  It is an
	      8-bit encoding format. Mainly used in MSDOS/Windows.

       iso-2022-jp
	      This is a Japanese 7-bit character encoding format.  Email
	      messages in iso-2022-jp can be decoded and encoded with this
	      format.

       iso-2022-x11
	      This  is a Japanese character encoding format.  It is also known
	      as  "COMPOUND_TEXT" encoding for the X  Window System. This is a
	      7-bit encoding format.  It can be derived from the  ISO  2022-JP
	      format with some differences.

       ksc-5601-x11
	      This is a Korean character encoding format used by the X Window
	      System (COMPOUND_TEXT encoding) to encode Korean (KS X 1001)
	      and US-ASCII.  This is a 7-bit encoding format compliant with
	      the ISO-2022 specification for encoding multiple character sets.
	      Please note that this is DIFFERENT from ISO-2022-KR (defined in
	      IETF RFC 1557).

       euc-kr This is an 8-bit multibyte encoding for Korean.  It encodes
	      US-ASCII (7-bit) in the single-byte range and characters in KS X
	      1001 (formerly KS C 5601) in the double-byte range with the MSB
	      on (8-bit).  It is used in Unix and on the Internet.  Korean
	      versions of MS-DOS, MacOS and MS-Windows use a compatible (in
	      most cases, identical) variant of this encoding.
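
       The single-byte and double-byte ranges can be seen with Python's
       built-in "euc-kr" codec.  This is an illustration of the encoding,
       not of uniconv, and the sample characters are arbitrary.

```python
ascii_bytes = "A".encode("euc-kr")
hangul_bytes = "\ud55c".encode("euc-kr")  # U+D55C, a Hangul syllable

# US-ASCII stays in the 7-bit single-byte range.
ascii_is_single = len(ascii_bytes) == 1 and ascii_bytes[0] < 0x80
# A KS X 1001 character takes two bytes, each with the MSB on.
hangul_is_double = len(hangul_bytes) == 2 and all(b & 0x80 for b in hangul_bytes)
```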

       johab  This is a Korean encoding specified in KS X 1001 (KS C
	      5601-1992), Annex 3, as a supplementary encoding.  It was widely
	      used in Korean MS-DOS until the mid-1990s.  It can encode all
	      Hangul syllables (11,172) of modern Korean as well as all the
	      special symbols and Hanja (Chinese ideograms used in Korea)
	      defined in KS X 1001.

       uhc    A variant of EUC-KR used in Korean MS-Windows 95/98 (a
	      proprietary Microsoft encoding, CP949).  Its character repertoire
	      includes all modern syllables of Hangul, the Korean script, as
	      well as all the special symbols and Hanja (Chinese ideograms
	      used in Korea) defined in KS X 1001.

       gb-18030
	      This is a Chinese character encoding format based upon GB 18030.
	      It encodes the whole U+0000..U+10FFFF range, while being compat‐
	      ible with gb-2312.
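
       Both properties, gb-2312 compatibility and full-range coverage, can
       be checked with Python's built-in "gb18030" and "gb2312" codecs.
       This is a sketch of the encoding's behaviour, not of uniconv; the
       sample characters are arbitrary.

```python
hanzi = "\u4e2d"       # U+4E2D, present in GB 2312
emoji = "\U0001f600"   # U+1F600, outside the BMP
last = "\U0010ffff"    # the highest Unicode code point

# Characters covered by GB 2312 keep the same two bytes in GB 18030.
same_as_gb2312 = hanzi.encode("gb18030") == hanzi.encode("gb2312")

# Code points beyond the BMP are encoded as four bytes.
emoji_bytes = emoji.encode("gb18030")
last_round_trip = last.encode("gb18030").decode("gb18030")
```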

       gb-2312-x11
	      This is a Chinese character encoding format based upon GB	 2312.
	      It is a 7-bit encoding format.

       gb-2312
	      This  is a Chinese character encoding format based upon GB 2312.
	      It is an 8-bit encoding format.

       big-5  This is a Chinese character  encoding  format  based  upon  BIG5
	      encoding.	 It is an 8-bit encoding format.

       hz     This  is	a Chinese character encoding format based upon "Hanzi"
	      encoding.	 It is a 7-bit encoding format.

       viscii This is a Vietnamese character encoding format.

       ucs-2-be
	      This converts 16-bit Unicode (ucs-2) streams.  The format
	      handles the big-endian variant.  Yudit does not recommend this
	      format.

       ucs-2-le
	      This converts 16-bit Unicode (ucs-2) streams.  The format
	      handles the little-endian variant.  Yudit does not recommend
	      this format.

       ucs-2  This converts 16-bit Unicode (ucs-2) streams.  The input byte
	      order is recognized by the first two bytes, the BOM (byte-order
	      mark) U+FEFF.  Yudit does not recommend this format.

       java   This converts \uxxxx character escapes.  When encoding, all
	      characters above U+0080 will be escaped with a string like
	      '\u0080'.  When decoding, the same format is decoded but, in
	      addition, the utf-8 format is also recognized, so it can also be
	      used to recover data accidentally saved with the wrong encoding.
	      The U+10000..U+10FFFF area is converted to surrogates and vice
	      versa.
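
       The encoding direction can be sketched as follows.  The function name
       is hypothetical, and two details are assumptions: U+0080 itself is
       escaped (the text's own example is '\u0080') and the hex digits are
       lowercase.  This is not uniconv's implementation.

```python
def java_escape(text: str) -> str:
    """Escape every character at or above U+0080 as a \\uXXXX sequence."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                      # ASCII passes through
        elif cp < 0x10000:
            out.append(f"\\u{cp:04x}")          # one escape for the BMP
        else:
            offset = cp - 0x10000               # split into a surrogate pair
            high = 0xD800 + (offset >> 10)
            low = 0xDC00 + (offset & 0x3FF)
            out.append(f"\\u{high:04x}\\u{low:04x}")
    return "".join(out)
```

       A character in the U+10000..U+10FFFF area therefore becomes two
       escapes, one per surrogate.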

       java-s This converts \uxxxx character escapes.  When encoding, all
	      characters above U+0080 will be escaped with a string like
	      '\u0080'.  When decoding, the same format is decoded but, in
	      addition, the utf-8 format is also recognized, so it can also be
	      used to recover data accidentally saved with the wrong encoding.
	      Surrogates are not treated specially during conversion - this
	      is why it is not a recommended conversion.

FILES
       ~/.yudit/yudit.properties or /usr/share/yudit/config/yudit.properties
	      can have yudit.datapath property. This is where  the  map	 files
	      are kept.	 By default /usr/share/yudit/data is searched.

SEE ALSO
	makeumap

AUTHOR
       This  program  was written by gsinai@yudit.org (Gaspar Sinai), Tokyo, 2
       January, 2001.

LINUX COMMANDS			  Nov 5 1997			    UNICONV(1)