GREPMAIL(1) User Contributed Perl Documentation GREPMAIL(1)
NAME
grepmail - search mailboxes for mail matching a regular expression
SYNOPSIS
grepmail [--help|--version] [-abBDFhHilLmrRuvVw] [-C <cache-file>]
[-j <status>] [-s <sizespec>] [-d <date-specification>]
[-X <signature-pattern>] [-Y <header-pattern>]
[[-e] <pattern>|-E <expr>|-f <pattern-file>] <files...>
DESCRIPTION
grepmail looks for mail messages containing a pattern, and prints the
resulting messages on standard out.
By default grepmail looks in both header and body for the specified
pattern.
When redirected to a file, the result is another mailbox, which can,
in turn, be handled by standard User Agents, such as elm, or even
used as input for another instance of grepmail.
At least one of -E, -e, -d, -s, or -u must be specified. The pattern
is optional if -d, -s, and/or -u is used. The -e flag is optional if
there is no file whose name is the pattern. The -E option can be used
to specify complex search expressions involving logical operators.
(See below.)
If a mailbox can not be found, grepmail first searches the directory
specified by the MAILDIR environment variable (if one is defined),
then searches the $HOME/mail, $HOME/Mail, and $HOME/Mailbox
directories.
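The lookup order described above can be sketched in Python. This is an
illustration only, not grepmail's own code: candidate_paths and
find_mailbox are hypothetical names, and the assumption that the name is
tried as given before any directory is searched is taken from the
wording "if a mailbox can not be found".

```python
import os

def candidate_paths(name, environ):
    """Return the paths to try for mailbox `name`, in the documented order."""
    paths = [name]  # the name exactly as given on the command line
    maildir = environ.get("MAILDIR")
    if maildir:  # only consulted when the variable is defined
        paths.append(os.path.join(maildir, name))
    home = environ.get("HOME")
    if home:
        for sub in ("mail", "Mail", "Mailbox"):
            paths.append(os.path.join(home, sub, name))
    return paths

def find_mailbox(name, environ=os.environ):
    """Return the first candidate that exists, or None if none does."""
    for path in candidate_paths(name, environ):
        if os.path.exists(path):
            return path
    return None
```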
OPTIONS AND ARGUMENTS
Many of the options and arguments are analogous to those of grep.
pattern
The pattern to search for in the mail message. May be any Perl
regular expression, but should be quoted on the command line to
protect against globbing (shell expansion). To search for more than
one pattern, use the form "(pattern1|pattern2|...)".
Note that complex pattern features such as "(?>...)" require that you
use a version of perl which supports them. You can use the pattern
"()" to indicate that you do not want to match anything. This is
useful if you want to initialize the cache without printing any
output.
mailbox
Mailboxes must be in the traditional UNIX "/bin/mail" mailbox format.
Mailboxes may be compressed with gzip or bzip2, in which case gunzip
or bzip2 must be installed on the system.
If no mailbox is specified, grepmail reads input from stdin, which can
be compressed or not. grepmail's behavior is undefined when ASCII and
binary data are piped together as input.
-a
Use arrival date instead of sent date.
-b
Asserts that the pattern must match in the body of the email.
-B
Print the body but with only minimal ('From ', 'From:', 'Subject:',
'Date:') headers. This flag can be used with -H, in which case it
will print only short headers and no email bodies.
-C
Specifies the location of the cache file. The default is
$HOME/.grepmail-cache.
-D
Enable debug mode, which prints diagnostic messages.
-d
Date specifications must take one of the following forms:
- a date like "today", "yesterday", "5/18/93", "5 days ago", "5
weeks ago",
- OR "before", "after", or "since", followed by a date as defined
above,
- OR "between <date> and <date>", where <date> is defined as above.
Simple date expressions will first be parsed by Date::Parse. If this
fails, grepmail will attempt to parse the date with Date::Manip, if
the module is installed on the system. Use an empty pattern (i.e. -d
"") to find emails without a "Date: ..." line in the header.
Date specifications without times are interpreted as having a time of
midnight of that day (which is the morning), except for "after" and
"since" specifications, which are interpreted as midnight of the
following day. For example, "between today and tomorrow" is the same
as simply "today", and returns emails whose date has the current day.
("now" is interpreted as "today".) The date specification "after July
5th" will return emails whose date is midnight July 6th or later.
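The midnight rules above can be sketched in Python. This is only an
illustration, not grepmail's date handling: date_bounds and its "on"
keyword (standing for a bare date such as "today") are hypothetical, and
treating each range as start-inclusive and end-exclusive is an
assumption.

```python
from datetime import date, datetime, timedelta

def midnight(day):
    """Midnight at the start of the given day."""
    return datetime(day.year, day.month, day.day)

def date_bounds(keyword, day, end_day=None):
    """Return (start, end) for a time-less date spec; None means unbounded."""
    if keyword == "on":  # a bare date covers that whole day
        return midnight(day), midnight(day) + timedelta(days=1)
    if keyword == "before":
        return None, midnight(day)
    if keyword in ("after", "since"):  # midnight of the following day
        return midnight(day) + timedelta(days=1), None
    if keyword == "between":
        return midnight(day), midnight(end_day)
    raise ValueError("unknown keyword: " + keyword)

# "after July 5th" starts at midnight on July 6th.
start, end = date_bounds("after", date(2004, 7, 5))
```

Under these assumptions, "between" a day and the following day yields the
same bounds as the bare day, which mirrors the "between today and
tomorrow" example.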
-E
Specify a complex search expression using logical operators. The
current syntax allows the user to specify search expressions using
Perl syntax. Three values can be used: $email (the entire email
message), $email_header (just the header), or $email_body (just the
body). A search is specified in the form "$email =~ /pattern/", and
multiple searches can be combined using "&&" and "||" for "and" and
"or".
For example, the expression
$email_header =~ /^From: .*\@coppit.org/ && $email =~ /grepmail/i
will find all emails which originate from coppit.org (you must escape
the "@" sign with a backslash), and which contain the keyword
"grepmail" anywhere in the message, in any capitalization.
-E is incompatible with -b, -h, and -e. Support for -i, -M, -S, and -Y
in combination with -E has not yet been implemented.
NOTE: The syntax of search expressions may change in the future. In
particular, support for size, date, and other constraints may be
added. The syntax may also be simplified in order to make expression
formation easier to use (and perhaps at the expense of reduced
functionality).
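For readers more at home in Python than Perl, the example expression
above corresponds roughly to the following sketch. It is not how
grepmail evaluates -E: split_email and matches are hypothetical names,
the header is assumed to end at the first blank line, and "^" is assumed
to match at the start of any header line.

```python
import re

def split_email(email):
    """Split a message into (header, body) at the first blank line."""
    header, _, body = email.partition("\n\n")
    return header, body

def matches(email):
    """Rough analogue of:
    $email_header =~ /^From: .*\\@coppit.org/ && $email =~ /grepmail/i
    """
    header, _body = split_email(email)
    # Python needs no backslash before "@"; the Perl form does.
    from_coppit = re.search(r"^From: .*@coppit.org", header, re.MULTILINE)
    # The second test looks at the whole message, ignoring case.
    mentions_grepmail = re.search(r"grepmail", email, re.IGNORECASE)
    return bool(from_coppit and mentions_grepmail)
```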
-e
Explicitly specify the search pattern. This is useful for specifying
patterns that begin with "-", which would otherwise be interpreted as
a flag.
-f
Obtain patterns from FILE, one per line. The empty file contains
zero patterns, and therefore matches nothing.
-F
Force grepmail to process all files and streams as though they were
mailboxes. (i.e. Skip checks for non-mailbox ASCII files or binary
files that don't look like they are compressed using known schemes.)
-h
Asserts that the pattern must match in the header of the email.
-H
Print the header but not body of matching emails.
-i
Make the search case-insensitive (by analogy to grep -i).
-j
Asserts that the email "Status:" header must contain the given flags.
Order and case are not important, so use -j AR or -j ra to search for
emails which have been read and answered.
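The order- and case-insensitive comparison can be pictured as a set test.
A minimal Python sketch, assuming the requested flags only need to be a
subset of the letters in the "Status:" value (status_matches is a
hypothetical name, not part of grepmail):

```python
def status_matches(status_value, wanted):
    """True if every wanted flag letter appears in the Status: value."""
    have = set(status_value.strip().upper())
    return set(wanted.upper()) <= have

# "-j AR" and "-j ra" ask the same question of a message.
same = status_matches("RA", "AR") == status_matches("RA", "ra")
```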
-l
Output the names of files having an email matching the expression
(by analogy to grep -l).
-L
Follow symbolic links. (Implies -R)
-M
Causes grepmail to ignore non-text MIME attachments. This removes
false positives resulting from binaries encoded as ASCII attachments.
-m
Append "X-Mailfolder: <folder>" to all email headers, indicating
which folder contained the matched email.
-n
Prefix each line with line number information. If multiple files are
specified, the filename will precede the line number. NOTE: When used
in conjunction with -m, the X-Mailfolder header has the same line
number as the next (blank) line.
-q
Quiet mode. Suppress the output of warning messages about non-mailbox
files, directories, etc.
-r
Generate a report of the names of the files containing emails
matching the expression, along with a count of the number of matching
emails.
-R
Causes grepmail to recurse any directories encountered.
-s
Return emails which match the size (in bytes) specified with this
flag. Note that this size includes the length of the header.
Size constraints must take one of the following forms:
- 12345: match size of exactly 12345
- <12345, <=12345, >12345, >=12345: match size less than, less than
or equal to, greater than, or greater than or equal to 12345
- 10000-12345: match size between 10000 and 12345 inclusive
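The three forms of size constraint can be checked with a few lines of
Python. This is a sketch of the documented semantics, not grepmail's
parser; size_matches is a hypothetical name, and sizes are assumed to be
whole numbers of bytes including the header.

```python
import re

def size_matches(spec, size):
    """Test a message size in bytes against a size constraint."""
    m = re.fullmatch(r"(\d+)-(\d+)", spec)
    if m:  # range form, inclusive at both ends
        return int(m.group(1)) <= size <= int(m.group(2))
    m = re.fullmatch(r"(<=|>=|<|>)?(\d+)", spec)
    if not m:
        raise ValueError("bad size constraint: " + spec)
    op, n = m.group(1), int(m.group(2))
    if op == "<":
        return size < n
    if op == "<=":
        return size <= n
    if op == ">":
        return size > n
    if op == ">=":
        return size >= n
    return size == n  # bare number: exact match
```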
-S
Ignore signatures. The signature consists of everything after a line
consisting of "-- ".
-u
Output only unique emails, by analogy to sort -u. Grepmail determines
email uniqueness by the Message-ID header.
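Deduplication by Message-ID amounts to keeping the first message seen
for each identifier. A small Python sketch using the standard email
module follows; unique_emails is a hypothetical name, and keeping every
message that lacks a Message-ID header is an assumption, since the text
above does not say how such messages are treated.

```python
from email import message_from_string

def unique_emails(raw_messages):
    """Keep the first message seen for each Message-ID, in order."""
    seen = set()
    unique = []
    for raw in raw_messages:
        message_id = message_from_string(raw)["Message-ID"]
        if message_id is not None:
            if message_id in seen:
                continue  # duplicate of an earlier message
            seen.add(message_id)
        unique.append(raw)  # assumption: messages without an ID are kept
    return unique
```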
-v
Invert the sense of the search, by analogy to grep -v. This results
in the set of emails printed being the complement of those that would
be printed without the -v switch.
-V
Print the version and exit.
-w
Search for only those lines which contain the pattern as part of a
word group. That is, the start of the pattern must match the start
of a word, and the end of the pattern must match the end of a word.
(Note that the start and end need not be for the same word.)
If you are familiar with Perl regular expressions, this flag simply
puts a "\b" before and after the search pattern.
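The effect of the added "\b" anchors can be tried out in Python, whose
"\b" behaves like Perl's for plain ASCII text. This is a sketch rather
than grepmail's code, and word_pattern is a hypothetical name.

```python
import re

def word_pattern(pattern):
    """Wrap a pattern in word-boundary anchors, as -w is described to do."""
    # With alternation, parenthesize first, e.g. "(pattern1|pattern2)".
    return re.compile(r"\b" + pattern + r"\b")

# The anchors may sit on different words: "mail to" starts at the
# beginning of one word and ends at the end of another.
hit = word_pattern("mail to").search("send mail to me")
miss = word_pattern("mail").search("grepmail")
```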
-X
Specify a regular expression for the signature separator. By default
this pattern is '^-- $'.
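Signature removal with a configurable separator can be sketched as
follows in Python. strip_signature is a hypothetical name, and dropping
the separator line itself along with the signature is an assumption;
lines are assumed to end in "\n".

```python
import re

def strip_signature(body, separator=r"^-- $"):
    """Return the body up to the first line matching the separator."""
    m = re.search(separator, body, re.MULTILINE)
    return body if m is None else body[:m.start()]

# The default separator needs the trailing space: "--" alone is not a match.
trimmed = strip_signature("Hello\n-- \nDavid\n")
```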
-Y
Specify a pattern which indicates specific headers to be searched.
The search will automatically treat headers which span multiple lines
as one long line. This flag implies -h.
In the style of procmail, special strings in the pattern will be
expanded as follows:
If the regular expression contains "^TO:" it will be substituted by
^((Original-)?(Resent-)?(To|Cc|Bcc)|(X-Envelope|Apparently(-Resent)?)-To):
which should match all headers with destination addresses.
If the regular expression contains "^FROM_DAEMON:" it will be
substituted by
(^(Mailing-List:|Precedence:.*(junk|bulk|list)|To: Multiple recipients of |(((Resent-)?(From|Sender)|X-Envelope-From):|>?From )([^>]*[^(.%@a-z0-9])?(Post(ma?(st(e?r)?|n)|office)|(send)?Mail(er)?|daemon|m(mdf|ajordomo)|n?uucp|LIST(SERV|proc)|NETSERV|o(wner|ps)|r(e(quest|sponse)|oot)|b(ounce|bs\.smtp)|echo|mirror|s(erv(ices?|er)|mtp(error)?|ystem)|A(dmin(istrator)?|MMGR|utoanswer))(([^).!:a-z0-9][-_a-z0-9]*)?[%@>\t ][^<)]*(\(.*\).*)?)?
which should catch mails coming from most daemons.
If the regular expression contains "^FROM_MAILER:" it will be
substituted by
(^(((Resent-)?(From|Sender)|X-Envelope-From):|>?From)([^>]*[^(.%@a-z0-9])?(Post(ma(st(er)?|n)|office)|(send)?Mail(er)?|daemon|mmdf|n?uucp|ops|r(esponse|oot)|(bbs\.)?smtp(error)?|s(erv(ices?|er)|ystem)|A(dmin(istrator)?|MMGR))(([^).!:a-z0-9][-_a-z0-9]*)?[%@>\t][^<)]*(\(.*\).*)?)?$([^>]|$))
(a stripped down version of "^FROM_DAEMON:"), which should catch
mails coming from most mailer-daemons.
So, to search for all emails to or from "Andy":
grepmail -Y '(^TO:|^From:)' Andy mailbox
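The combination of header unfolding and "^TO:" expansion can be
illustrated in Python. This sketch is not grepmail's implementation:
unfold, select_headers, and header_search are hypothetical names,
matching header names without regard to case is an assumption, and only
the "^TO:" expansion quoted above is handled.

```python
import re

# Expansion of "^TO:" as quoted in the text above.
TO_EXPANSION = (r"^((Original-)?(Resent-)?(To|Cc|Bcc)"
                r"|(X-Envelope|Apparently(-Resent)?)-To):")

def unfold(header):
    """Join continuation lines so each header becomes one long line."""
    return re.sub(r"\r?\n[ \t]+", " ", header)

def select_headers(header, header_pattern):
    """Return the unfolded header lines selected by the -Y pattern."""
    expanded = header_pattern.replace("^TO:", TO_EXPANSION)
    regex = re.compile(expanded, re.IGNORECASE)  # assumption: case-insensitive
    return [line for line in unfold(header).splitlines() if regex.search(line)]

def header_search(header, header_pattern, pattern):
    """True if the search pattern matches in any selected header line."""
    return any(re.search(pattern, line)
               for line in select_headers(header, header_pattern))
```

With a header whose "Cc:" line wraps onto a second line, the address on
the continuation line is still found, because the search runs on the
unfolded text.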
--help
Print a help message summarizing the usage.
--
All arguments following -- are treated as mail folders.
EXAMPLES
Count the number of emails. ("." matches every email.)
grepmail -r . sent-mail
Get all email between 2000 and 3000 bytes about books
grepmail books -s 2000-3000 sent-mail
Get all email that you mailed yesterday
grepmail -d yesterday sent-mail
Get all email that you mailed before the first Thursday in June 1998
that pertains to research (requires Date::Manip):
grepmail research -d "before 1st thursday in June 1998" sent-mail
Get all email that you mailed before the first of June 1998 that
pertains to research:
grepmail research -d "before 6/1/98" sent-mail
Get all email you received since 8/20/98 that wasn't about research or
your job, ignoring case:
grepmail -iv "(research|job)" -d "since 8/20/98" saved-mail
Get all email about mime but not about Netscape. Constrain the search
to match the body, since most headers contain the text "mime":
grepmail -b mime saved-mail | grepmail Netscape -v
Print a list of all mailboxes containing a message from Rodney.
Constrain the search to the headers, since quoted emails may match the
pattern:
grepmail -hl "^From.*Rodney" saved-mail*
Find all emails with the text "Pilot" in both the header and the body:
grepmail -hb "Pilot" saved-mail*
Print a count of the number of messages about grepmail in all
saved-mail mailboxes:
grepmail -br grepmail saved-mail*
Remove any duplicates from a mailbox:
grepmail -u saved-mail
Convert a Gnus mailbox to mbox format:
grepmail . gnus-mailbox-dir/* > mbox
Search for all emails to or from an address (taking into account
wrapped headers and different header names):
grepmail -Y '(^TO:|^From:)' my@email.address saved-mail
Find all emails from postmasters:
grepmail -Y '^FROM_MAILER:' . saved-mail
FILES
grepmail will not create temporary files while decompressing compressed
archives. The last version to do this was 3.5. While the new design
uses more memory, the code is much simpler, and there is less chance
that email can be read by malicious third parties. Memory usage is
determined by the size of the largest email message in the mailbox.
ENVIRONMENT
The MAILDIR environment variable can be used to specify the default
mail directory. This directory will be searched if the specified
mailbox can not be found directly.
The HOME environment variable is also used to find mailboxes if they
can not be found directly. It is also used to store grepmail state
information such as its cache file.
BUGS AND LIMITATIONS
Patterns containing "$" may cause problems
Currently I look for "$" followed by a non-word character and replace
it with the line ending for the current file (either "\n" or "\r\n").
This may cause problems with complex patterns specified with -E, but
I'm not aware of any.
Mails without bodies cause problems
According to RFC 822, mail messages need not have message bodies.
I've found and removed one bug related to this. I'm not sure if there
are others.
Complex single-point dates not parsed correctly
If you specify a point date like "September 1, 2004", grepmail
creates a date range that includes the entire day of September 1,
2004. If you specify a complex point date such as "today", "1st
Monday in July", or "9/1/2004 at 0:00" grepmail may parse the time
incorrectly.
The reason for this problem is that Date::Manip, as of version 5.42,
forces default values for parsed dates and times. This means that
grepmail has a hard time determining whether the user supplied
certain time/date fields. (e.g. Did Date::Manip provide a default
time of 0:00, or did the user specify it?) grepmail tries to work
around this problem, but the workaround is inherently incomplete in
some rare cases.
File names that look like flags cause problems
In some special circumstances, grepmail will be confused by files
whose names look like flags. In such cases, use the -e flag to
specify the search pattern.
AUTHOR
David Coppit, <david@coppit.org>, http://coppit.org/
SEE ALSO
elm(1), mail(1), grep(1), perl(1), printmail(1), Mail::Internet(3),
procmailrc(5). Crocker, D. H., Standard for the Format of ARPA
Internet Text Messages, RFC 822.
perl v5.20.2 2007-03-01 GREPMAIL(1)