PCRE2SERIALIZE(3)
NAME
PCRE2 - Perl-compatible regular expressions (revised API)
SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS
int32_t pcre2_serialize_decode(pcre2_code **codes,
  int32_t number_of_codes, const uint8_t *bytes,
  pcre2_general_context *gcontext);
int32_t pcre2_serialize_encode(pcre2_code **codes,
  int32_t number_of_codes, uint8_t **serialized_bytes,
  PCRE2_SIZE *serialized_size, pcre2_general_context *gcontext);
void pcre2_serialize_free(uint8_t *bytes);
int32_t pcre2_serialize_get_number_of_codes(const uint8_t *bytes);
If you are running an application that uses a large number of regular
expression patterns, it may be useful to store them in a precompiled
form instead of having to compile them every time the application is
run. However, if you are using the just-in-time optimization feature,
it is not possible to save and reload the JIT data, because it is
position-dependent. In addition, the host on which the patterns are
reloaded must be running the same version of PCRE2, with the same code
unit width, and must also have the same endianness, pointer width and
PCRE2_SIZE type. For example, patterns compiled on a 32-bit system
using PCRE2's 16-bit library cannot be reloaded on a 64-bit system, nor
can they be reloaded using the 8-bit library.
SAVING COMPILED PATTERNS
Before compiled patterns can be saved they must be serialized, that is,
converted to a stream of bytes. A single byte stream may contain any
number of compiled patterns, but they must all use the same character
tables. A single copy of the tables is included in the byte stream (its
size is 1088 bytes). For more details of character tables, see the
section on locale support in the pcre2api documentation.
The function pcre2_serialize_encode() creates a serialized byte stream
from a list of compiled patterns. Its first two arguments specify the
list, being a pointer to a vector of pointers to compiled patterns, and
the length of the vector. The third and fourth arguments point to
variables which are set to point to the created byte stream and its
length, respectively. The final argument is a pointer to a general
context, which can be used to specify custom memory management
functions. If
this argument is NULL, malloc() is used to obtain memory for the byte
stream. The yield of the function is the number of serialized patterns,
or one of the following negative error codes:
  PCRE2_ERROR_BADDATA      the number of patterns is zero or less
  PCRE2_ERROR_BADMAGIC     mismatch of id bytes in one of the patterns
  PCRE2_ERROR_NOMEMORY     memory allocation failed
  PCRE2_ERROR_MIXEDTABLES  the patterns do not all use the same tables
  PCRE2_ERROR_NULL         the 1st, 3rd, or 4th argument is NULL
PCRE2_ERROR_BADMAGIC means either that a pattern's code has been
corrupted, or that a slot in the vector does not point to a compiled
pattern.
Once a set of patterns has been serialized you can save the data in any
appropriate manner. Here is sample code that compiles two patterns and
writes them to a file. It assumes that the variable fd is a FILE * stream
that is open for output. The error checking that should be present in a
real application has been omitted for simplicity.
  int errorcode;
  uint8_t *bytes;
  PCRE2_SIZE erroroffset;
  PCRE2_SIZE bytescount;
  pcre2_code *list_of_codes[2];
  /* Compile the two patterns with default options. */
  list_of_codes[0] = pcre2_compile((PCRE2_SPTR)"first pattern",
    PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);
  list_of_codes[1] = pcre2_compile((PCRE2_SPTR)"second pattern",
    PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);
  /* Serialize both patterns into a single byte stream. */
  errorcode = pcre2_serialize_encode(list_of_codes, 2, &bytes,
    &bytescount, NULL);
  /* Write the byte stream to the already open stream fd. */
  errorcode = fwrite(bytes, 1, bytescount, fd);
Note that the serialized data is binary data that may contain any of
the 256 possible byte values. On systems that make a distinction
between binary and non-binary data, be sure that the file is opened for
binary output.
Serializing a set of patterns leaves the original patterns untouched, so
they can still be used for matching. Their memory must eventually be
freed in the usual way by calling pcre2_code_free(). When you have
finished with the byte stream, it too must be freed by calling
pcre2_serialize_free().
RE-USING PRECOMPILED PATTERNS
In order to re-use a set of saved patterns you must first make the
serialized byte stream available in main memory (for example, by
reading from a file). The management of this memory block is up to the
application. You can use the pcre2_serialize_get_number_of_codes()
function to find out how many compiled patterns are in the serialized
data without actually decoding the patterns:
  uint8_t *bytes = <serialized data>;
  int32_t number_of_codes = pcre2_serialize_get_number_of_codes(bytes);
The pcre2_serialize_decode() function reads a byte stream and recreates
the compiled patterns in new memory blocks, setting pointers to them in
a vector. The first two arguments are a pointer to a suitable vector
and its length, and the third argument points to a byte stream. The
final argument is a pointer to a general context, which can be used to
specify custom memory management functions for the decoded patterns.
If this argument is NULL, malloc() and free() are used. After
deserialization, the byte stream is no longer needed and can be discarded.
  pcre2_code *list_of_codes[2];
  uint8_t *bytes = <serialized data>;
  /* The yield is the number of decoded patterns, or a negative error. */
  int32_t number_of_codes =
    pcre2_serialize_decode(list_of_codes, 2, bytes, NULL);
If the vector is not large enough for all the patterns in the byte
stream, it is filled with those that fit, and the remainder are
ignored. The yield of the function is the number of decoded patterns,
or one of the following negative error codes:
  PCRE2_ERROR_BADDATA   second argument is zero or less
  PCRE2_ERROR_BADMAGIC  mismatch of id bytes in the data
  PCRE2_ERROR_BADMODE   mismatch of variable unit size or PCRE2 version
  PCRE2_ERROR_NOMEMORY  memory allocation failed
  PCRE2_ERROR_NULL      first or third argument is NULL
PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was
compiled on a system with different endianness.
Decoded patterns can be used for matching in the usual way, and must be
freed by calling pcre2_code_free() as normal. A single copy of the
character tables is used by all the decoded patterns. A reference count
is used to arrange for its memory to be automatically freed when the
last pattern is freed.
If a pattern was processed by pcre2_jit_compile() before being
serialized, the JIT data is discarded and so is no longer available after a
save/restore cycle. You can, however, process a restored pattern with
pcre2_jit_compile() if you wish.
AUTHOR
Philip Hazel
University Computing Service
Cambridge, England.
REVISION
Last updated: 20 January 2015
Copyright (c) 1997-2015 University of Cambridge.
PCRE2 10.10 20 January 2015 PCRE2SERIALIZE(3)