ure man page on DragonFly

Man page or keyword search:  
man Server   44335 pages
apropos Keyword Search (all sections)
Output format
DragonFly logo
[printable version]

UTF(3)									UTF(3)

NAME
       urecomp,	 ureexec, ureerror, urefree - UTF Regular Expression function‐
       ality

SYNOPSIS
       #include <ure.h>

       int urecomp(ure_t *up, char *exp, int cflags);

       int ureexec(ure_t *up, char *string, int matchc, urematch_t *matchv, int eflags, char *collseq);

       int ureerror(int errcode, ure_t *up, char *buf, int size);

       int urefree(ure_t *up);

DESCRIPTION
       The URE routines are utf(3)-aware regular expression routines.  urecomp
       is  used to compile an expression and ureexec is used to match the com‐
       piled expression against a character  string.   Matching	 can  be  done
       using a collation sequence other than English, which is the default. To
       do this, use the collseq argument to the ureexec function to point to a
       UTF  string  which  is the key to the desired collation sequence.  This
       collation sequence must correspond to the utf  representation  of  that
       language	 in the langcoll.utf file.  If this argument is NULL, then the
       environment variable UTFCOLLSEQ will be used to determine the collation
       sequence.   If  this  too  is NULL, then the default collation sequence
       (English) is used.  It is also possible, but not recommended,  to  call
       the urecollseq function directly.

       ureerror	 is used to format an error code which can be returned by ure‐
       comp or ureexec.	 urefree is used to free any space that was  allocated
       by urecomp.

       Character ranges are defined at execution time, not compile time.  Case
       insensitivity is defined at execution time, rather  than	 compile-time,
       which  obviates	the need to recompile expressions when case (in)sensi‐
       tivity is the only difference.

       These routines are by no means quick - the need	to  handle  characters
       which  may  be  more than 8 bits wide, plus the overhead of calculating
       ranges of characters at execution time make this unavoidable.  However,
       functionality  was  the	goal  with  these routines, not sheer blinding
       speed.

FLAGS
       The cflags flag to urecomp is there simply to provide a POSIX-interface
       to  the URE functions.  It can take the URE_ICASE value, meaning ignore
       case sensitivity when matching expressions every time  this  expression
       is used.	 This is not advised - it would be better to ignore this flag,
       and then use the URE_ICASE flag to ureexec, giving  more	 control  over
       case-sensitivity.   Note	 that  extended regular expressions are always
       used (there does not seem to be any point in providing  extended	 func‐
       tionality,  only	 to  provide a way of ignoring it).  In addition, new-
       line matching is always done, and case-sensitivity is best  decided  at
       ureexec time.

       The  eflags  flag  to ureexec can take the following values: URE_ICASE,
       URE_NOTBOL.  URE_ICASE means perform the matching of the expression  in
       a  case-insensitive  manner,  and  uses	the current language collation
       sequence (see below). If none is specified, English is the default.

       URE_NOTBOL is used when the string passed to ureexec should not match a
       '^' metacharacter.

RETURN VALUES
       A  successful  compilation will result in URE_SUCCESS being returned by
       urecomp.	 urecomp  returns  URE_ERR_NULL_ARG  if	 it's  passed  a  null
       expression  to  compile.	  urecomp returns URE_ERR_TOO_BIG if the given
       expression turns out to be too big when compiled (although this	should
       not  happen).   If  urecomp is unable to allocate enough storage on the
       heap to store the compiled  expression,	URE_ERR_OUT_OF_SPACE  will  be
       returned.  Other	 error	codes  are  possible,  depending  on the error
       encountered, usually as part of a badly-formed regular expression.

       ureexec returns URE_SUCCESS if a match was found, and URE_NOMATCH if no
       match  was  found.  Other  error codes are possibly returned, for self-
       explanatory reasons: URE_ERR_NULL_PARAM, URE_ERR_BAD_MAGIC.

       ureerror can be used to get a textual representation of the error  mes‐
       sage.

EXAMPLE
       /* get the file into memory */
       static char *
       fgetfile(FILE *fp, int *size)
       {
	    struct stat	   s;
	    char      *cp;
	    int	      cc;

	    (void) fstat(fileno(fp), &s);
	    *size = s.st_size;
	    cp = (char *) malloc(*size + 1);
	    if (cp == (char *) NULL) {
		 (void) fprintf(stderr, "Memory problems.0);
		 exit(1);
	    }
	    cc = fread(cp, sizeof(char), *size, fp);
	    if (cc != *size) {
		 free(cp);
		 return (char *) NULL;
	    }
	    cp[cc] = 0;
	    return cp;
       }

       /* do a utf regexp search for each file */
       int
       dofile(ure_t *sp, char *f, int eflags, int pname, int plineno, int pline, char *collseq)
       {
	    urematch_t	   matchv[10];
	    char *buf;
	    char *cp;
	    Rune r;
	    char ebuf[BUFSIZ];
	    char done;
	    FILE *fp;
	    int	 ucc;
	    int	 err;
	    int	 i;

	    if ((fp = fopen(f, "r")) == (FILE *) NULL) {
		 return 0;
	    }
	    if ((buf = fgetfile(fp, &ucc)) == (char *) NULL) {
		 return 0;
	    }
	    cp = buf;
	    for (done = 0 ; !done ; ) {
		 switch (err = ureexec(sp, cp, 10, matchv, eflags, collseq)) {
		 case URE_SUCCESS:
		      if (pname) {
			   printf("%s:", f);
		      }
		      if (plineno) {
			   printf("%d:", LineNum(buf, &cp[matchv[0].rm_so]));
		      }
		      if (!pline) {
			   (void) fclose(fp);
			   return 1;
		      }
		      PrintLine(cp, sp, &cp[matchv[0].rm_so], &cp[matchv[0].rm_eo]);
		      cp = utfrune(&cp[matchv[0].rm_eo], '0);
		      if (cp == (char *) NULL) {
			   done = 1;
		      }
		      i = chartorune(&r, cp);
		      cp += i;
		      if (r == 0) {
			   done = 1;
		      }
		      break;
		 case URE_NOMATCH:
		      done = 1;
		      break;
		 default:
		      ureerror(err, sp, ebuf, sizeof(ebuf));
		      (void) fprintf(stderr, "Bad execution: %s0, ebuf);
		      done = 1;
		 }
	    }
	    (void) fclose(fp);
	    free(buf);
	    return 1;
       }

       extern int     optind;
       extern char    *optarg;

       int
       main(int argc, char **argv)
       {
	    ure_t     u;
	    char errmsg[BUFSIZ];
	    char *collseq;
	    int	 plineno;
	    int	 pline;
	    int	 eflags;
	    int	 err;
	    int	 i;

	    eflags = 0;
	    plineno = 0;
	    pline = 1;
	    while ((i = getopt(argc, argv, "a:iln")) != -1) {
		 switch(i) {
		 case 'a':
		      collseq = optarg;
		      break;
		 case 'i':
		      eflags |= URE_ICASE;
		      break;
		 case 'l':
		      pline = 0;
		      break;
		 case 'n':
		      plineno = 1;
		      break;
		 }
	    }
	    if ((err = urecomp(&u, argv[optind], 0)) != URE_SUCCESS) {
		 (void) ureerror(err, &u, errmsg, sizeof(errmsg));
		 (void) fprintf(stderr, "can't compile ure `%s', %s0,
				argv[optind], errmsg);
		 exit(1);
	    }
	    for (i = optind + 1 ; i < argc ; i++) {
		 dofile(&u, argv[i], eflags, (optind < argc - 1), plineno, pline, collseq);
	    }
	    urefree(&u);
	    exit(0);
       }

BUGS
       What software would be complete without bugs?

AUTHOR
       Written	  by	Alistair    Crooks   (agc@amdahl.com,	or   agc@west‐
       ley.demon.co.uk), and based on Henry Spencer's original regular expres‐
       sion  code.  I very much doubt that he would recognise his code now, or
       that he would want to.

									UTF(3)
[top]

List of man pages available for DragonFly

Copyright (c) for man pages and the logo by the respective OS vendor.

For those who want to learn more, the polarhome community provides shell access and support.

[legal] [privacy] [GNU] [policy] [cookies] [netiquette] [sponsors] [FAQ]
Tweet
Polarhome, production since 1999.
Member of Polarhome portal.
Based on Fawad Halim's script.
....................................................................
Vote for polarhome
Free Shell Accounts :: the biggest list on the net