regerror man page on YellowDog


REGCOMP(P)		   POSIX Programmer's Manual		    REGCOMP(P)

NAME
       regcomp, regerror, regexec, regfree - regular expression matching

SYNOPSIS
       #include <regex.h>

       int regcomp(regex_t *restrict preg, const char *restrict pattern,
	      int cflags);
       size_t regerror(int errcode, const regex_t *restrict preg,
	      char *restrict errbuf, size_t errbuf_size);
       int regexec(const regex_t *restrict preg, const char *restrict string,
	      size_t nmatch, regmatch_t pmatch[restrict], int eflags);
       void regfree(regex_t *preg);

DESCRIPTION
       These  functions	 interpret  basic  and extended regular expressions as
       described in the Base Definitions volume of IEEE Std 1003.1-2001, Chap‐
       ter 9, Regular Expressions.

       The regex_t structure is defined in <regex.h> and contains at least the
       following member:

	  Member Type  Member Name  Description
	  size_t       re_nsub	    Number of parenthesized subexpressions.

       The regmatch_t structure is defined in <regex.h> and contains at	 least
       the following members:

	  Member Type Member Name Description
	  regoff_t    rm_so	  Byte offset from start of string to
				  start of substring.
	  regoff_t    rm_eo	  Byte offset from start of string of the
				  first character after the end of sub‐
				  string.

       The regcomp() function shall compile the regular	 expression  contained
       in  the string pointed to by the pattern argument and place the results
       in the structure pointed to by preg.  The cflags argument is  the  bit‐
       wise-inclusive  OR  of  zero  or more of the following flags, which are
       defined in the <regex.h> header:

       REG_EXTENDED
	      Use Extended Regular Expressions.

       REG_ICASE
	      Ignore case in  match.  (See  the	 Base  Definitions  volume  of
	      IEEE Std 1003.1-2001, Chapter 9, Regular Expressions.)

       REG_NOSUB
	      Report only success/fail in regexec().

       REG_NEWLINE
	      Change the handling of <newline>s, as described in the text.

       The  default  regular  expression  type	for pattern is a Basic Regular
       Expression. The application can specify	Extended  Regular  Expressions
       using the REG_EXTENDED cflags flag.

       If  the	REG_NOSUB flag was not set in cflags, then regcomp() shall set
       re_nsub to the number of	 parenthesized	subexpressions	(delimited  by
       "\(\)" in basic regular expressions or "()" in extended regular expres‐
       sions) found in pattern.
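
       As a rough illustration of the paragraphs above, the following sketch
       compiles a pattern and reads back re_nsub. The helper name
       count_subexpressions() is hypothetical and not part of <regex.h>; the
       sketch assumes the caller does not pass REG_NOSUB in cflags.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Compile pattern with the given cflags and return re_nsub, or -1 if
 * regcomp() fails. count_subexpressions() is a hypothetical helper
 * name used only for this illustration.
 */
long
count_subexpressions(const char *pattern, int cflags)
{
    regex_t re;
    long    nsub;

    if (regcomp(&re, pattern, cflags) != 0)
        return -1;      /* The content of re is undefined on failure. */
    nsub = (long) re.re_nsub;
    regfree(&re);
    return nsub;
}
```

       Called with "(a)(b)" and REG_EXTENDED, the parentheses group and two
       subexpressions are counted; with cflags of 0 the same pattern is a
       basic regular expression in which "(" and ")" are ordinary characters,
       so grouping would have to be written as "\(a\)\(b\)".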

       The regexec() function compares the null-terminated string specified by
       string  with the compiled regular expression preg initialized by a pre‐
       vious call to regcomp().	 If it finds a match, regexec()	 shall	return
       0; otherwise, it shall return non-zero indicating either no match or an
       error. The eflags argument is the bitwise-inclusive OR of zero or  more
       of the following flags, which are defined in the <regex.h> header:

       REG_NOTBOL
	      The  first  character  of the string pointed to by string is not
	      the beginning of the line. Therefore, the circumflex character (
	      '^'  ),  when  taken as a special character, shall not match the
	      beginning of string.

       REG_NOTEOL
	      The last character of the string pointed to by string is not the
	      end  of the line. Therefore, the dollar sign ( '$' ), when taken
	      as a special character, shall not match the end of string.

       If nmatch is 0 or REG_NOSUB was set in  the  cflags  argument  to  reg‐
       comp(), then regexec() shall ignore the pmatch argument. Otherwise, the
       application shall ensure that the pmatch argument points	 to  an	 array
       with at least nmatch elements, and regexec() shall fill in the elements
       of that array with offsets of the substrings of string that  correspond
       to the parenthesized subexpressions of pattern: pmatch[i].rm_so shall
       be the byte offset of the beginning and pmatch[i].rm_eo shall be one
       greater than the byte offset of the end of substring i. (Subexpression
       i begins at the ith matched open parenthesis, counting from 1.) Offsets
       in pmatch[0] identify the substring that corresponds to the entire
       regular expression. Unused elements of pmatch up to pmatch[nmatch-1]
       shall be filled with -1. If there are more than nmatch subexpressions
       in pattern (pattern itself counts as a subexpression), then regexec()
       shall  still  do the match, but shall record only the first nmatch sub‐
       strings.
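
       The use of pmatch offsets described above might look like the
       following sketch, which copies the text of one subexpression out of
       the searched string. The helper name copy_group() and the fixed limit
       of three pmatch elements are assumptions made for the example.

```c
#include <regex.h>
#include <string.h>

/*
 * Match the extended regular expression in pattern against string and
 * copy the text of subexpression `group` (0 = whole match) into out.
 * Returns 0 on success; -1 on a compile error, on no match, or if the
 * requested subexpression did not participate in the match.
 * copy_group() is a hypothetical helper, not a <regex.h> function.
 */
int
copy_group(const char *pattern, const char *string, size_t group,
           char *out, size_t outsz)
{
    regex_t    re;
    regmatch_t pm[3];   /* pm[0] = whole match, pm[1..2] = groups. */
    int        rc = -1;

    if (group >= 3 || outsz == 0)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, string, 3, pm, 0) == 0 && pm[group].rm_so != -1) {
        /* rm_eo is one past the last byte, so the length is rm_eo - rm_so. */
        size_t len = (size_t) (pm[group].rm_eo - pm[group].rm_so);

        if (len >= outsz)
            len = outsz - 1;    /* Truncate to fit the caller's buffer. */
        memcpy(out, string + pm[group].rm_so, len);
        out[len] = '\0';
        rc = 0;
    }
    regfree(&re);
    return rc;
}
```

       For instance, with the pattern "([a-z]+)=([0-9]+)" and the string
       "x key=42 y", group 1 would be expected to yield "key" and group 2
       "42".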

       When matching a basic or extended regular expression, any given	paren‐
       thesized	 subexpression	of  pattern  might participate in the match of
       several different substrings of string, or it might not match any  sub‐
       string  even  though  the  pattern  as a whole did match. The following
       rules shall be used to determine which substrings to report  in	pmatch
       when matching regular expressions:

        1. If subexpression i in a regular expression is not contained within
           another subexpression, and it participated in the match several
           times, then the byte offsets in pmatch[i] shall delimit the last
           such match.

        2. If subexpression i is not contained within another subexpression,
           and it did not participate in an otherwise successful match, the
           byte offsets in pmatch[i] shall be -1. A subexpression does not
           participate in the match when:

           a. '*' or "\{\}" appears immediately after the subexpression in a
              basic regular expression, or '*', '?', or "{}" appears
              immediately after the subexpression in an extended regular
              expression, and the subexpression did not match (matched 0
              times); or

           b. '|' is used in an extended regular expression to select this
              subexpression or another, and the other subexpression matched.

        3. If subexpression i is contained within another subexpression j, and
           i is not contained within any other subexpression that is contained
           within j, and a match of subexpression j is reported in pmatch[j],
           then the match or non-match of subexpression i reported in
           pmatch[i] shall be as described in 1. and 2. above, but within the
           substring reported in pmatch[j] rather than the whole string. The
           offsets in pmatch[i] are still relative to the start of string.

        4. If subexpression i is contained in subexpression j, and the byte
           offsets in pmatch[j] are -1, then the byte offsets in pmatch[i]
           shall also be -1.

        5. If subexpression i matched a zero-length string, then both byte
           offsets in pmatch[i] shall be the byte offset of the character or
           null terminator immediately following the zero-length string.
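
       The reporting rules above can be explored with a small helper such as
       the sketch below, which returns the raw offsets recorded for one
       subexpression. The name group_offsets() and the limit of four pmatch
       elements are assumptions made for the example.

```c
#include <regex.h>
#include <stddef.h>

#define MAX_GROUPS 4    /* Illustrative limit: pmatch[0] plus three groups. */

/*
 * Match the extended regular expression in pattern against string and
 * store the offsets recorded for subexpression `group` in *so and *eo.
 * Returns 0 if the pattern as a whole matched, -1 otherwise.
 * group_offsets() is a hypothetical helper name.
 */
int
group_offsets(const char *pattern, const char *string, size_t group,
              long *so, long *eo)
{
    regex_t    re;
    regmatch_t pm[MAX_GROUPS];
    int        rc = -1;

    if (group >= MAX_GROUPS)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, string, MAX_GROUPS, pm, 0) == 0) {
        /* Both offsets are -1 if the group did not participate. */
        *so = (long) pm[group].rm_so;
        *eo = (long) pm[group].rm_eo;
        rc = 0;
    }
    regfree(&re);
    return rc;
}
```

       Under rule 2, matching "(a)|(b)" against "b" should report -1 for
       both offsets of subexpression 1, while subexpression 2 covers bytes 0
       to 1. Under rule 1, matching "(a|b)*" against "ab" should report the
       last iteration of the group, that is, the "b".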

       If, when regexec() is called, the locale is  different  from  when  the
       regular expression was compiled, the result is undefined.

       If  REG_NEWLINE	is  not	 set in cflags, then a <newline> in pattern or
       string shall be treated as an ordinary  character.  If  REG_NEWLINE  is
       set, then <newline> shall be treated as an ordinary character except as
       follows:

	1. A <newline> in string shall not be matched by a  period  outside  a
	   bracket  expression	or by any form of a non-matching list (see the
	   Base Definitions volume of IEEE Std 1003.1-2001, Chapter 9, Regular
	   Expressions).

	2. A  circumflex  (  '^' ) in pattern, when used to specify expression
	   anchoring (see the Base Definitions volume of IEEE Std 1003.1-2001,
	   Section  9.3.8,  BRE	 Expression  Anchoring), shall match the zero-
	   length string immediately after a <newline> in  string,  regardless
	   of the setting of REG_NOTBOL.

	3. A  dollar  sign ( '$' ) in pattern, when used to specify expression
	   anchoring, shall match the zero-length string immediately before  a
	   <newline> in string, regardless of the setting of REG_NOTEOL.
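
       The effect of REG_NEWLINE on anchors and on the period can be seen
       with a sketch like the following. The helper name match_start() is
       hypothetical, and the caller is assumed not to pass REG_NOSUB, since
       the helper reads pmatch.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Compile pattern with cflags and return the byte offset at which the
 * first match in text begins, or -1 on a compile error or no match.
 * cflags must not include REG_NOSUB, because the offsets are read.
 * match_start() is a hypothetical helper name.
 */
long
match_start(const char *pattern, const char *text, int cflags)
{
    regex_t    re;
    regmatch_t pm;
    long       start = -1;

    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    if (regexec(&re, text, 1, &pm, 0) == 0)
        start = (long) pm.rm_so;
    regfree(&re);
    return start;
}
```

       With the text "one\ntwo\n", the pattern "^two$" is not expected to
       match under REG_EXTENDED alone, but should match at byte offset 4 once
       REG_NEWLINE is added. Conversely, "one.two" matches without
       REG_NEWLINE (the period matches the <newline>) and should stop
       matching when the flag is set.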

       The  regfree() function frees any memory allocated by regcomp() associ‐
       ated with preg.

       The following constants are defined as error return values:

       REG_NOMATCH
	      regexec() failed to match.

       REG_BADPAT
	      Invalid regular expression.

       REG_ECOLLATE
	      Invalid collating element referenced.

       REG_ECTYPE
	      Invalid character class type referenced.

       REG_EESCAPE
	      Trailing '\' in pattern.

       REG_ESUBREG
	      Number in "\digit" invalid or in error.

       REG_EBRACK
	      "[]" imbalance.

       REG_EPAREN
	      "\(\)" or "()" imbalance.

       REG_EBRACE
	      "\{\}" imbalance.

       REG_BADBR
	      Content of "\{\}" invalid: not a number, number too large,  more
	      than two numbers, first larger than second.

       REG_ERANGE
	      Invalid endpoint in range expression.

       REG_ESPACE
	      Out of memory.

       REG_BADRPT
              '?', '*', or '+' not preceded by valid regular expression.

       The regerror() function provides a mapping from error codes returned by
       regcomp() and regexec() to unspecified printable strings. It  generates
       a  string corresponding to the value of the errcode argument, which the
       application shall ensure is the last non-zero value  returned  by  reg‐
       comp()  or  regexec()  with  the given value of preg. If errcode is not
       such a value, the content of the generated string is unspecified.

       If preg is a null pointer, but errcode is a value returned by a previous
       call to regexec() or regcomp(), regerror() still generates an
       error string corresponding to the value of errcode, but it might not be
       as detailed under some implementations.

       If the errbuf_size argument is not 0, regerror() shall place the gener‐
       ated string into the buffer of size errbuf_size	bytes  pointed	to  by
       errbuf.	If  the	 string (including the terminating null) cannot fit in
       the buffer, regerror() shall truncate the string and null-terminate the
       result.

       If  errbuf_size	is 0, regerror() shall ignore the errbuf argument, and
       return the size of the buffer needed to hold the generated string.
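
       A caller with a fixed-size buffer can detect truncation by comparing
       the value returned by regerror() with the buffer size, as in this
       sketch. The helper name compile_or_explain() and its return codes are
       assumptions made for the example.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Try to compile pattern. On failure, place the regerror() message in
 * errbuf. Returns 0 if the pattern compiled, 1 if it failed and the
 * whole message fit, 2 if it failed and the message was truncated.
 * errbuf_size must be greater than 0. compile_or_explain() is a
 * hypothetical helper name.
 */
int
compile_or_explain(const char *pattern, int cflags,
                   char *errbuf, size_t errbuf_size)
{
    regex_t re;
    size_t  needed;
    int     code;

    code = regcomp(&re, pattern, cflags);
    if (code == 0) {
        regfree(&re);   /* Only the error path matters in this sketch. */
        return 0;
    }
    /* The return value counts the terminating null byte. */
    needed = regerror(code, &re, errbuf, errbuf_size);
    return (needed > errbuf_size) ? 2 : 1;
}
```

       The exact wording of the message is unspecified, so an application
       should present it to the user rather than parse it.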

       If the preg argument to regexec() or regfree() is not a compiled	 regu‐
       lar  expression	returned by regcomp(), the result is undefined. A preg
       is no longer treated as a compiled regular expression after it is given
       to regfree().

RETURN VALUE
       Upon successful completion, the regcomp() function shall return 0. Oth‐
       erwise, it shall	 return	 an  integer  value  indicating	 an  error  as
       described in <regex.h>, and the content of preg is undefined. If a code
       is returned, the interpretation shall be as given in <regex.h>.

       If regcomp() detects an invalid RE, it may return REG_BADPAT, or it may
       return one of the error codes that more precisely describes the error.

       Upon successful completion, the regexec() function shall return 0. Oth‐
       erwise, it shall return REG_NOMATCH to indicate no match.

       Upon successful completion, the regerror() function  shall  return  the
       number  of  bytes needed to hold the entire generated string, including
       the null termination. If the return value is greater than  errbuf_size,
       the  string  returned in the buffer pointed to by errbuf has been trun‐
       cated.

       The regfree() function shall not return a value.

ERRORS
       No errors are defined.

       The following sections are informative.

EXAMPLES
              #include <regex.h>
              #include <stddef.h>     /* For NULL. */

              /*
               * Match string against the extended regular expression in
               * pattern, treating errors as no match.
               *
               * Return 1 for match, 0 for no match.
               */

              int
              match(const char *string, const char *pattern)
              {
                  int        status;
                  regex_t    re;

                  if (regcomp(&re, pattern, REG_EXTENDED|REG_NOSUB) != 0) {
                      return(0);      /* Compile error: treat as no match. */
                  }
                  status = regexec(&re, string, (size_t) 0, NULL, 0);
                  regfree(&re);
                  if (status != 0) {
                      return(0);      /* No match, or an error. */
                  }
                  return(1);
              }

       The following demonstrates how the REG_NOTBOL flag could be  used  with
       regexec()  to  find  all substrings in a line that match a pattern sup‐
       plied by a user. (For simplicity of  the	 example,  very	 little	 error
       checking is done.)

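              const char *p = buffer;  /* Start of the text not yet searched. */

              (void) regcomp (&re, pattern, 0);
              /* This call to regexec() finds the first match on the line. */
              error = regexec (&re, p, 1, &pm, 0);
              while (error == 0) {  /* While matches found. */
                  /* Substring found between p + pm.rm_so and p + pm.rm_eo. */
                  if (pm.rm_eo == pm.rm_so) {  /* Empty match. */
                      if (p[pm.rm_eo] == '\0')
                          break;
                      pm.rm_eo++;  /* Step one byte forward to make progress. */
                  }
                  p += pm.rm_eo;  /* Offsets are relative to the string searched. */
                  /* This call to regexec() finds the next match. */
                  error = regexec (&re, p, 1, &pm, REG_NOTBOL);
              }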
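
       A self-contained variant of the repeated-search loop is sketched
       below. It keeps an explicit pointer to the unsearched remainder of the
       line, because the offsets in pmatch are relative to the string passed
       to each regexec() call, and it steps one byte past an empty match so
       that the loop terminates. The helper name count_matches() is
       hypothetical, and the single-byte step assumes a single-byte
       character encoding.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Count the non-overlapping matches of the basic regular expression in
 * pattern within line. Returns -1 if the pattern does not compile.
 * count_matches() is a hypothetical helper name.
 */
int
count_matches(const char *pattern, const char *line)
{
    regex_t     re;
    regmatch_t  pm;
    const char *p = line;   /* Start of the text not yet searched. */
    int         eflags = 0; /* The first search starts at the real line start. */
    int         count = 0;

    if (regcomp(&re, pattern, 0) != 0)
        return -1;
    while (regexec(&re, p, 1, &pm, eflags) == 0) {
        count++;
        if (pm.rm_eo == pm.rm_so) {
            /* Empty match: stop at the end, else skip one byte. */
            if (p[pm.rm_eo] == '\0')
                break;
            p += pm.rm_eo + 1;
        } else {
            p += pm.rm_eo;  /* Offsets are relative to p, not to line. */
        }
        eflags = REG_NOTBOL; /* Later searches do not begin the line. */
    }
    regfree(&re);
    return count;
}
```

       Because of REG_NOTBOL, a pattern such as "^ab" is counted only once in
       the line "abab", even though the second search starts exactly at the
       second "ab".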

APPLICATION USAGE
       An application could use:

	      regerror(code,preg,(char *)NULL,(size_t)0)

       to  find	 out how big a buffer is needed for the generated string, mal‐
       loc() a buffer to hold the string, and then call	 regerror()  again  to
       get the string. Alternatively, it could allocate a fixed, static buffer
       that is big enough to hold most strings, and then use malloc() to allo‐
       cate a larger buffer if it finds that this is too small.
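
       The first approach, querying the required size and then allocating,
       might be wrapped as in the sketch below. The helper name
       regerror_alloc() is hypothetical; the caller is responsible for
       passing the result to free().

```c
#include <regex.h>
#include <stdlib.h>

/*
 * Return a newly allocated string holding the regerror() message for
 * code, or NULL if memory could not be allocated. The caller frees the
 * result. regerror_alloc() is a hypothetical helper name.
 */
char *
regerror_alloc(int code, const regex_t *preg)
{
    size_t needed;
    char  *msg;

    /* With a size of 0 the buffer argument is ignored. */
    needed = regerror(code, preg, (char *) NULL, (size_t) 0);
    if (needed == 0)
        return NULL;
    msg = malloc(needed);   /* needed includes the terminating null. */
    if (msg != NULL)
        (void) regerror(code, preg, msg, needed);
    return msg;
}
```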

       To  match  a  pattern as described in the Shell and Utilities volume of
       IEEE Std 1003.1-2001, Section 2.13, Pattern Matching Notation, use  the
       fnmatch() function.

RATIONALE
       The  regexec()  function	 must  fill  in all nmatch elements of pmatch,
       where nmatch and pmatch are supplied by the application, even  if  some
       elements	 of pmatch do not correspond to subexpressions in pattern. The
       application writer should note that there is  probably  no  reason  for
       using a value of nmatch that is larger than preg->re_nsub+1.
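
       Sizing pmatch from re_nsub could be done along the lines of the
       following sketch, which allocates exactly re_nsub+1 elements and
       counts how many subexpressions took part in the match. The helper
       name count_participating() is an assumption made for the example.

```c
#include <regex.h>
#include <stdlib.h>

/*
 * Match the extended regular expression in pattern against string and
 * return how many parenthesized subexpressions participated in the
 * match. Returns -1 on a compile error, allocation failure, or no
 * match. count_participating() is a hypothetical helper name.
 */
long
count_participating(const char *pattern, const char *string)
{
    regex_t     re;
    regmatch_t *pm;
    size_t      n, i;
    long        hits = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    n = re.re_nsub + 1;     /* One element for the whole match, too. */
    pm = malloc(n * sizeof *pm);
    if (pm != NULL && regexec(&re, string, n, pm, 0) == 0) {
        hits = 0;
        for (i = 1; i < n; i++) {
            if (pm[i].rm_so != -1)  /* -1 marks a non-participant. */
                hits++;
        }
    }
    free(pm);               /* free(NULL) is harmless. */
    regfree(&re);
    return hits;
}
```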

       The  REG_NEWLINE	 flag  supports a use of RE matching that is needed in
       some applications like text editors. In	such  applications,  the  user
       supplies	 an  RE asking the application to find a line that matches the
       given expression. An anchor in such an RE anchors at the	 beginning  or
       end  of	any  line.  Such  an  application can pass a sequence of <new‐
       line>-separated lines to regexec() as a single long string and  specify
       REG_NEWLINE  to	regcomp() to get the desired behavior. The application
       must ensure that there are no explicit  <newline>s  in  pattern	if  it
       wants to ensure that any match occurs entirely within a single line.

       The  REG_NEWLINE	 flag  affects the behavior of regexec(), but it is in
       the cflags parameter to regcomp() to allow flexibility  of  implementa‐
       tion.  Some  implementations will want to generate the same compiled RE
       in  regcomp()  regardless  of  the  setting  of	REG_NEWLINE  and  have
       regexec()  handle anchors differently based on the setting of the flag.
       Other implementations will generate different compiled REs based on the
       REG_NEWLINE.

       The  REG_ICASE flag supports the operations taken by the grep -i option
       and the historical implementations of ex and vi.	 Including  this  flag
       will  make  it  easier for application code to be written that does the
       same thing as these utilities.

       The substrings reported in pmatch[] are defined using offsets from  the
       start  of  the  string rather than pointers. Since this is a new inter‐
       face, there should be no impact on historical implementations or appli‐
       cations,	 and  offsets  should  be just as easy to use as pointers. The
       change to offsets was made to facilitate future extensions in which the
       string  to  be searched is presented to regexec() in blocks, allowing a
       string to be searched that is not all in memory at once.

       The type regoff_t is used for the elements of pmatch[] to  ensure  that
       the application can represent either the largest possible array in mem‐
       ory (important for an application conforming to the Shell and Utilities
       volume of IEEE Std 1003.1-2001) or the largest possible file (important
       for an application using the extension where  a	file  is  searched  in
       chunks).

       The  standard  developers rejected the inclusion of a regsub() function
       that would be used to do substitutions for a matched RE. While  such  a
       routine would be useful to some applications, its utility would be much
       more limited than the matching function described here. Both RE parsing
       and  substitution  are possible to implement without support other than
       that required by the ISO C standard, but matching is much more  complex
       than  substituting.  The only difficult part of substitution, given the
       information supplied by regexec(), is finding the next character	 in  a
       string  when  there can be multi-byte characters. That is a much larger
       issue, and one that needs a more general solution.

       The errno variable has not been used for error returns to avoid filling
       the errno name space for this feature.

       The interface is defined so that the matched substrings rm_so and rm_eo
       are in a separate regmatch_t structure  instead	of  in	regex_t.  This
       allows  a  single compiled RE to be used simultaneously in several con‐
       texts; in main() and a signal handler, perhaps, or in multiple  threads
       of  lightweight	processes. (The preg argument to regexec() is declared
       with type const, so the implementation is  not  permitted  to  use  the
       structure to store intermediate results.) It also allows an application
       to request an arbitrary number of substrings from an RE. The number  of
       subexpressions  in  the	RE  is reported in re_nsub in preg.  With this
       change to regexec(), consideration was given to dropping the  REG_NOSUB
       flag since the user can now specify this with a zero nmatch argument to
       regexec().  However, keeping REG_NOSUB allows an implementation to  use
       a different (perhaps more efficient) algorithm if it knows in regcomp()
       that no subexpressions need be reported.	 The  implementation  is  only
       required	 to  fill  in pmatch if nmatch is not zero and if REG_NOSUB is
       not specified. Note that the size_t type, as defined in the ISO C stan‐
       dard,  is  unsigned,  so	 the description of regexec() does not need to
       address negative values of nmatch.

       REG_NOTBOL was added to allow an application to	do  repeated  searches
       for  the	 same  pattern in a line. If the pattern contains a circumflex
       character that should match the beginning of a line, then  the  pattern
       should only match when matched against the beginning of the line. With‐
       out the REG_NOTBOL flag, the application could rewrite  the  expression
       for  subsequent	matches,  but  in  the general case this would require
       parsing the expression. The need for REG_NOTEOL is not as clear; it was
       added for symmetry.

       The  addition  of the regerror() function addresses the historical need
       for conforming application programs to have access to error information
       more  than  "Function  failed to compile/match your RE for unknown rea‐
       sons".

       This interface provides for two different methods of dealing with error
       conditions. The specific error codes (REG_EBRACE, for example), defined
       in <regex.h>, allow an application to recover from an error if it is so
       able. Many applications, especially those that use patterns supplied by
       a user, will not try to deal with specific error cases, but  will  just
       use  regerror()	to obtain a human-readable error message to present to
       the user.

       The regerror() function uses a scheme similar to confstr() to deal with
       the  problem  of	 allocating  memory  to hold the generated string. The
       scheme used by strerror() in the ISO C standard	was  considered	 unac‐
       ceptable since it creates difficulties for multi-threaded applications.

       The  preg argument is provided to regerror() to allow an implementation
       to generate a more descriptive message  than  would  be	possible  with
       errcode alone. An implementation might, for example, save the character
       offset of the offending character of the pattern in a  field  of	 preg,
       and  then include that in the generated message string. The implementa‐
       tion may also ignore preg.

       A REG_FILENAME flag was	considered,  but  omitted.  This  flag	caused
       regexec()  to  match  patterns  as described in the Shell and Utilities
       volume of IEEE Std 1003.1-2001, Section 2.13, Pattern Matching Notation
       instead of REs. This service is now provided by the fnmatch() function.

       Notice	that   there   is  a  difference  in  philosophy  between  the
       ISO POSIX-2:1993 standard and IEEE Std 1003.1-2001 in how to  handle  a
       "bad"  regular expression. The ISO POSIX-2:1993 standard says that many
       bad constructs "produce undefined results", or that "the interpretation
       is undefined". IEEE Std 1003.1-2001, however, says that the interpreta‐
       tion of such REs is unspecified. The term "undefined"  means  that  the
       action by the application is an error, of similar severity to passing a
       bad pointer to a function.

       The regcomp() and regexec() functions are required to accept any	 null-
       terminated string as the pattern argument. If the meaning of the string
       is  "undefined",	 the  behavior	of  the	 function  is	"unspecified".
       IEEE Std 1003.1-2001  does not specify how the functions will interpret
       the pattern; they might return error codes, or they  might  do  pattern
       matching	 in  some  completely  unexpected  way, but they should not do
       something like abort the process.

FUTURE DIRECTIONS
       None.

SEE ALSO
       fnmatch(), glob(), the Shell and Utilities volume of
       IEEE Std 1003.1-2001, Section 2.13, Pattern Matching Notation, the Base
       Definitions volume of IEEE Std 1003.1-2001, Chapter 9, Regular
       Expressions, <regex.h>, <sys/types.h>

COPYRIGHT
       Portions	 of  this text are reprinted and reproduced in electronic form
       from IEEE Std 1003.1, 2003 Edition, Standard for Information Technology
       --  Portable  Operating	System	Interface (POSIX), The Open Group Base
       Specifications Issue 6, Copyright (C) 2001-2003	by  the	 Institute  of
       Electrical  and	Electronics  Engineers, Inc and The Open Group. In the
       event of any discrepancy between this version and the original IEEE and
       The  Open Group Standard, the original IEEE and The Open Group Standard
       is the referee document. The original Standard can be  obtained	online
       at http://www.opengroup.org/unix/online.html .

IEEE/The Open Group		     2003			    REGCOMP(P)