btree(3)btree(3)NAMEbtree - btree database access method
SYNOPSIS
#include <sys/types.h> #include <db.h>
DESCRIPTION
The routine dbopen() is the library interface to database files. One of
the supported file formats is btree files. The general description of
the database access methods is in dbopen(3), this reference page
describes only the btree-specific information.
The btree data structure is a sorted, balanced tree structure storing
associated key/data pairs.
The btree access method specific data structure provided to dbopen() is
defined in the <db.h> include file as follows:
typedef struct { u_long flags; u_int cachesize; int maxkeypage; int
minkeypage; u_int psize; int (*compare)(const DBT *key1, const DBT
*key2); size_t (*prefix)(const DBT *key1, const DBT *key2); int lorder;
} BTREEINFO;
The elements of this structure are as follows: The flags value is spec‐
ified by ORing any of the following values: Permits duplicate keys in
the tree; that is, it permits insertion if the key to be inserted
already exists in the tree. The default behavior, as described in
dbopen(3), is to overwrite a matching key when inserting a new key or
to fail if the R_NOOVERWRITE flag is specified. The R_DUP flag is over‐
ridden by the R_NOOVERWRITE flag, and if the R_NOOVERWRITE flag is
specified, attempts to insert duplicate keys into the tree will fail.
If the get() routine is used and the database contains duplicate
keys, the order of retrieval of key/data pairs is undefined;
however, seq() routine calls with the R_CURSOR flag set will
always return the logical ``first'' of any group of duplicate
keys. A suggested maximum size, in bytes, of the memory cache.
This value is only advisory, and the access method will allocate
more memory rather than fail. Caching the most recently used
pages substantially improves access time because every search
examines the root page of the tree. In addition, a moderate
cache can reduce the number of I/O operations significantly
because physical writes are delayed as long as possible. Obvi‐
ously, using a cache increases (but only increases) the likeli‐
hood of corruption or lost data if the system crashes while a
tree is being modified. If cachesize is 0 (no size is speci‐
fied), a default cache is used. The maximum number of keys
which will be stored on any single page. Not currently imple‐
mented. The minimum number of keys that will be stored on any
single page. This value is used to determine which keys will be
stored on overflow pages, that is, if a key or data item is
longer than the pagesize divided by the minkeypage value, it
will be stored on overflow pages instead of in the page itself.
If minkeypage is 0 (no minimum number of keys is specified), a
value of 2 is used. Page size is the size (in bytes) of the
pages used for nodes in the tree. The minimum page size is 512
bytes and the maximum page size is 64K. If psize is 0 (no page
size is specified), a page size is chosen based on the underly‐
ing file system I/O block size. Compare is the key comparison
function. It must return an integer less than, equal to, or
greater than zero if the first key argument is considered to be
respectively less than, equal to, or greater than the second key
argument. The same comparison function must be used on a given
tree every time it is opened. If compare is NULL (no comparison
function is specified), the keys are compared lexically, with
shorter keys considered less than longer keys. Prefix is the
prefix comparison function. If specified, this routine must
return the number of bytes of the second key argument that are
necessary to determine whether it is greater than the first key
argument. If the keys are equal, the key length should be
returned. Note that the usefulness of this routine is very data
dependent, but, in some data sets, it can produce significantly
reduced tree sizes and search times. If prefix is NULL (no pre‐
fix function is specified) and no comparison function is speci‐
fied, a default lexical comparison routine is used. If prefix is
NULL and a comparison routine is specified, no prefix comparison
is done. The byte order for integers in the stored database
metadata. The number should represent the order as an integer;
for example, big endian order would be the number 4,321. If
lorder is 0 (no order is specified), the current host order is
used.
If the file already exists (and the O_TRUNC flag is not specified), the
values specified for the parameters flags, lorder, and psize are
ignored in favor of the values used when the tree was created.
Forward sequential scans of a tree are from the least key to the great‐
est.
Space freed up by deleting key/data pairs from the tree is never
reclaimed, although it is normally made available for reuse. This means
that the btree storage structure is grow-only. The only solutions are
to avoid excessive deletions or to create a fresh tree periodically
from a scan of an existing one.
Searches, insertions, and deletions in a btree will all complete in O
lg base N, where base is the average fill factor. Often, inserting
ordered data into btrees results in a low fill factor. This implementa‐
tion has been modified to make ordered insertion the best case, result‐
ing in a much better than normal page fill factor.
RESTRICTIONS
Only big and little endian byte order is supported.
ERRORS
The btree access method routines may fail and set errno for any of the
errors specified for the library routine dbopen(3).
SEE ALSO
Functions: dbopen(3), hash(3), mpool(3), recno(3)
The Ubiquitous B-tree, Douglas Comer, ACM Comput. Surv. 11, 2 (June
1979), 121-138
Prefix B-trees, Bayer and Unterauer, ACM Transactions on Database Sys‐
tems, Vol. 2, 1 (March 1977), 11-26
The Art of Computer Programming Vol. 3: Sorting and Searching, D.E.
Knuth, 1968, pp 471-480
btree(3)