FILEPRUNE(1)                                                      FILEPRUNE(1)
NAME
fileprune - prune a file set according to a given age distribution
SYNOPSIS
fileprune [-n|-N|-p] [-c count|-s size[k|m|g|t]|-a age[w|m|y]] [-e
base|-g standard deviation|-f] [-t a|m|c] [-FK] file ...
DESCRIPTION
Fileprune deletes files from the specified set, targeting a given
distribution of the files' ages while honoring size, count, and age
constraints. Its main purpose is to keep a set of daily-created backup
files at a manageable size, while still providing reasonable access to
older versions. Specifying a size, file number, or age constraint will
simply remove files starting from the oldest, until the constraint is
met. The distribution specification (exponential, Gaussian (normal),
or Fibonacci) provides finer control of the files to delete, allowing
the retention of recent copies and the increasingly aggressive pruning
of the older files. The retention schedule specifies the age intervals
for which files will be retained. As an example, an exponential
retention schedule for 10 files with a base of 2 will be
    1 2 4 8 16 32 64 128 256 512 1024
The above schedule specifies that for the interval of 65 to 128 days
there should be (at least) one retained file (unless constraints and
options override this setting). Retention schedules are always
calculated and evaluated in integer days. By default fileprune will keep
the oldest file within each day interval allowing files to migrate from
one interval to the next as time goes by. It may also keep additional
files, if the complete file set satisfies the specified constraint.
The algorithm used for pruning does not assume that the files are
uniformly distributed; fileprune will successfully prune file collections
stored at irregular intervals.
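The interval logic described above can be sketched in Python. This is a minimal illustration under stated assumptions, not fileprune's actual implementation. The function names exponential_schedule and keep_oldest_per_interval are hypothetical. Each interval is assumed to run from the day after the previous end point up to its own end point (for example 65 to 128 days), non-integer powers are assumed to be truncated to whole days, and the -c, -s, and -a constraints are ignored.

```python
def exponential_schedule(base, count):
    """Return interval end points in integer days: base**0 .. base**count.

    Assumption: non-integer powers are truncated to whole days.
    """
    return [int(base ** n) for n in range(count + 1)]


def keep_oldest_per_interval(ages, schedule):
    """Pick the oldest file age (in days) inside each schedule interval.

    Assumption: an interval covers ages greater than the previous end
    point and up to (including) its own end point, e.g. 65..128 days.
    """
    kept = []
    previous_end = -1
    for end in schedule:
        candidates = [a for a in ages if previous_end < a <= end]
        if candidates:
            kept.append(max(candidates))  # oldest file in this interval
        previous_end = end
    return kept


schedule = exponential_schedule(2, 10)
print(schedule)  # [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]

# One backup per day for the last 200 days (ages 0..199).
print(keep_oldest_per_interval(range(200), schedule))
# [1, 2, 4, 8, 16, 32, 64, 128, 199]
```

In this sketch the intervals beyond 256 days stay empty because no file is that old yet; as the files age they move into the older intervals.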
OPTIONS
-n Do not delete files; only print file names that would be
deleted.
-N Do not delete files; only print file names that would be
retained.
-p Do not process files. Print the specified schedule for count
elements.
-c count
Keep count files.
-s size
Keep files totaling size bytes. The size argument can be
followed by a k, m, g, or t suffix, in uppercase or lowercase, to
express quantities from kilobytes to terabytes.
-a age Keep files up to the specified age. The age argument can be
followed by a w, m, or y suffix to specify weeks, months, or
years.
-e base
Use an exponential distribution of the specified base b for
pruning. Each successive interval n will end at $b^n$ days. As an
example, a base of 2 will retain 10 files in a period of 1024
days. To determine the base for keeping n files in a period
of d days use the formula $b = e^{\ln d / n}$.
-g sd Use a Gaussian (normal) distribution with the given standard
deviation for the pruning schedule. The height of the curve
with a standard deviation of σ is given by the formula
$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-x^2 / (2\sigma^2)}$
All intervals from a to b are calculated to have the same area
$\int_a^b f(x)\,dx$. The standard deviation is specified in
day units; as a rule of thumb the oldest file retained will
have an age of twice the standard deviation.
-f Use a Fibonacci distribution for the pruning schedule. The
Fibonacci sequence starts with 1, 1, and each subsequent term is
the sum of the two previous ones.
-t a|m|c
For determining a file's age use its access, modification, or
creation time. By default the modification time is used.
-F Force file pruning even if the size or count constraint has not
been exceeded.
-K Keep files scheduled in each pruning interval, even if the size
or count constraint has been exceeded.
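As a worked illustration of the formula given under -e, the following Python sketch computes the base needed to keep n files over a period of d days. The helper name base_for is hypothetical; the arithmetic is simply e raised to (ln d / n), that is, the n-th root of d.

```python
import math


def base_for(days, files):
    """Base b such that b**files == days, i.e. b = e**(ln(days) / files)."""
    return math.exp(math.log(days) / files)


# 10 files spread over 1024 days: the base is 2 (up to rounding),
# matching the example in the -e description.
print(round(base_for(1024, 10), 6))

# 12 files spread over roughly one year.
print(round(base_for(365, 12), 3))
```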
EXAMPLE
ssh remotehost tar cf - /datafiles >backup/`date +'%Y%m%d'`
fileprune -e 2 backup/*
Backup remotehost, storing the result in a file named with today's
timestamp (e.g. 20021219). Prune the files in the backup directory so
that each retained file's age will be double that of its immediately
younger neighbor.
fileprune -g 365 -c 30 *
Keep at most 30 files with their ages following a Gaussian (normal)
distribution with a standard deviation of one year.
fileprune -e 2 -s 5G *
Prune the specified files following an exponential schedule so that no
more than 5GB are occupied. More than one file may be left in an
interval, if the size constraint is met. Alternatively, some old
intervals may be emptied in order to satisfy the size constraint.
fileprune -F -e 2 -s 5G *
As above, but leave no more than one file in each scheduled interval.
fileprune -K -e 2 -s 5G *
As in the first example of the 5G-constrained series, but leave exactly
one file in each interval, even if this will violate the size
constraint.
fileprune -a 1m -f
Delete all files older than one month; use a Fibonacci distribution
for pruning the remaining ones.
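The last example combines an age limit with a Fibonacci schedule. The Python sketch below shows the two ingredients side by side. It is an illustration only: the helper names age_in_days and fibonacci_terms are hypothetical, and the unit lengths (a bare number meaning days, w = 7, m = 30, y = 365 days) are assumptions, since this page does not state the exact conversion fileprune uses.

```python
def age_in_days(spec):
    """Convert an age such as '1m' or '2w' to days.

    Assumptions: a bare number means days, w = 7, m = 30, y = 365 days.
    """
    units = {"w": 7, "m": 30, "y": 365}
    if spec[-1] in units:
        return int(spec[:-1]) * units[spec[-1]]
    return int(spec)


def fibonacci_terms(limit):
    """Fibonacci terms 1, 1, 2, 3, ... that do not exceed limit."""
    terms = []
    a, b = 1, 1
    while a <= limit:
        terms.append(a)
        a, b = b, a + b  # each term is the sum of the two previous ones
    return terms


limit = age_in_days("1m")
print(limit)                   # 30
print(fibonacci_terms(limit))  # [1, 1, 2, 3, 5, 8, 13, 21]
```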
SEE ALSO
newsyslog(8)
AUTHOR
(C) Copyright 2002 Diomidis Spinellis.
BUGS
The Gaussian (normal) distribution is calculated by trying successive
increments of the normal distribution's cumulative distribution function. If the file
number or count is large compared to the specified standard deviation,
the calculation may take an exceedingly long time. To get results in a
reasonable time, day increments are bounded at 10 times the increment
of the previous interval and a total age of 100 years. It is advisable
to first calculate and print the pruning schedule with a command like
fileprune -g 100 -p -c 20
to ensure that the schedule can be calculated.
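The equal-area idea behind the Gaussian schedule can be sketched in Python by stepping one day at a time through the cumulative distribution function. This is not fileprune's algorithm, only an approximation under stated assumptions: the names normal_cdf and gaussian_schedule are hypothetical, ages start at day 0, the area to be divided is taken to be the one between 0 and twice the standard deviation (following the rule of thumb in the -g description), and the search is capped at 100 years (36500 days).

```python
import math


def normal_cdf(x, sigma):
    """Cumulative distribution function of a zero-mean normal."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def gaussian_schedule(sigma, count, max_days=36500):
    """Interval end points (integer days) enclosing roughly equal areas.

    Assumptions: the area between day 0 and 2 * sigma is split into
    count equal parts; end points are rounded up to whole days, so
    they may repeat when count is large relative to sigma.
    """
    total = normal_cdf(2 * sigma, sigma) - 0.5  # area from 0 to 2*sigma
    schedule = []
    day = 0
    for k in range(1, count + 1):
        target = total * k / count
        # Try successive one-day increments until the area from day 0
        # reaches the cumulative target for interval k.
        while normal_cdf(day, sigma) - 0.5 < target and day < max_days:
            day += 1
        schedule.append(day)
    return schedule


# Same parameters as the command above: sigma of 100 days, 20 files.
print(gaussian_schedule(100, 20))
```

The intervals come out narrow for recent ages and widen toward the tail, which is the intended effect of the Gaussian schedule.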
13 October 2003 FILEPRUNE(1)