pdfgrep man page on DragonFly

PDFGREP(1)			Pdfgrep Manual			    PDFGREP(1)

NAME
       pdfgrep - search PDF files for a regular expression

SYNOPSIS
       pdfgrep [OPTION...] PATTERN [FILE...]

DESCRIPTION
       Search for PATTERN in each FILE. PATTERN is an extended regular
       expression.

       pdfgrep works much like grep, with one distinction: It operates on
       pages and not on lines.

OPTIONS
       -i, --ignore-case
	   Ignore case distinctions in both the PATTERN and the input files.

       -F, --fixed-strings
	   Interpret PATTERN as a list of fixed strings separated by newlines,
	   any of which is to be matched.

       -P, --perl-regexp
	   Interpret PATTERN as a Perl compatible regular expression (PCRE).
	   See pcresyntax(3) for a quick overview.

       -H, --with-filename
	   Print the file name for each match. This is the default setting
	   when there is more than one file to search.

       -h, --no-filename
	   Suppress the prefixing of file name on output. This is the default
	   setting when there is only one file to search.

       -n, --page-number
	   Prefix each match with the number of the page where it was found.

       -c, --count
	   Suppress normal output. Instead print the number of matches for
	   each input file. Note that unlike grep, multiple matches on the
	   same page will be counted individually.

       -p, --page-count
	   Like -c, but prints the number of matches per page.

       -C, --context NUM
	   Print at most NUM characters of context around each match. The
	   exact number will vary, because pdfgrep tries to respect word
	   boundaries. If NUM is "line", the whole line will be printed. If
	   this option is not set, pdfgrep tries to print lines that are not
	   longer than the terminal width.

       --color WHEN
	   Surround file names, page numbers and matched text with escape
	   sequences to display them in color on the terminal. The default
	   setting is auto. WHEN can be:

	   always
	       Always use colors, even when stdout is not a terminal.

	   never
	       Do not use colors.

	   auto
	       Use colors only when stdout is a terminal.

       -o, --only-matching
	   Print only the matched part of a line without any surrounding
	   context.

       -r, --recursive
	   Recursively search all files (restricted by --include and
	   --exclude) under each directory, following symlinks only if they
	   are on the command line.

       -R, --dereference-recursive
	   Same as -r, but follows all symlinks.

       --exclude=GLOB
	   Skip files whose base name matches GLOB. See glob(7) for wildcards
	   you can use. You can use this option multiple times to exclude more
	   patterns. It takes precedence over --include. Note that includes
	   and excludes apply only to files found via --recursive and not to
	   files named in the argument list.

       --include=GLOB
	   Only search files whose base name matches GLOB. See --exclude for
	   details. The default is *.pdf.

       --password=PASSWORD
	   Use PASSWORD to decrypt the PDF files. Can be specified multiple
	   times; all passwords will be tried on all PDFs.  Note that this
	   password will show up in your command history and the output of
	   ps(1). So please do not use this if the security of PASSWORD is
	   important.

       -m, --max-count NUM
	   Stop reading a file after NUM matches. When the -c or --count
	   option is also used, pdfgrep does not output a count greater than
	   NUM.

       -Z, --null
	   Output a null byte (called NUL in ASCII and '\0' in C) instead of
	   the colon that usually separates a filename from the rest of the
	   line. This option makes the output unambiguous in the presence of
	   colons, spaces or newlines in the filename. It can be used in
	   conjunction with commands such as xargs -0 or perl -0.

       --match-prefix-separator SEP
	   Changes the colon used to separate filename, page number and text
	   in the output to SEP, which can be an arbitrary string. This is
	   useful when filenames contain colons, but only for interactive
	   usage. For scripting, --null should be used.

       --debug
	   Enable debug output. Note: Due to limitations of poppler before
	   version 0.30.0, some debug output is also printed without --debug
	   when using such a poppler version.

       --warn-empty
	   Print a warning to stderr if a PDF contains no searchable text.
	   This is the case for PDFs that consist only of images, for example
	   scanned documents.

       --unac
	   Remove accents and ligatures from both the search pattern and the
	   PDF documents. This is useful if you want to search for a word
	   containing "ae", but the PDF uses the single character "æ" instead.
	   See unac(3) and unaccent(1) for details.

	   This option is experimental and only available if pdfgrep is
	   compiled with unac support.

       -q, --quiet
	   Suppress all normal output to stdout. Errors will be printed and
	   the exit codes will be returned (see below).

       --help
	   Print a short summary of the options.

       -V, --version
	   Show version information.

EXIT STATUS
       Normally, the exit status is 0 if at least one match is found, 1 if no
       match is found and 2 if an error occurred. But if the --quiet or -q
       option is used and a match was found, pdfgrep will return 0 regardless
       of errors.
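
       The exit codes above can be used to branch in a script. The following
       is a minimal sketch, not part of pdfgrep: the helper name pdf_contains
       is hypothetical. Keep in mind that with -q a match yields status 0
       even if errors occurred.

```shell
# Hypothetical helper: report whether FILE contains a match for PATTERN.
# Usage: pdf_contains PATTERN FILE
pdf_contains() {
    rc=0
    # -q suppresses normal output, so only the exit status is used.
    pdfgrep -q "$1" "$2" || rc=$?
    case $rc in
        0) echo "match" ;;
        1) echo "no match" ;;
        *) echo "error (exit status $rc)" >&2 ;;
    esac
    # Pass pdfgrep's exit status on to the caller.
    return "$rc"
}
```

       The helper returns the status of pdfgrep unchanged, so it can be used
       directly in an if statement or combined with && and ||.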

ENVIRONMENT VARIABLES
       The behavior of pdfgrep is affected by the following environment
       variable.

       GREP_COLORS
	   Specifies the colors and other attributes used to highlight various
	   parts of the output. The syntax and values are like GREP_COLORS of
	   grep. See grep(1) for more details. Currently only the capabilities
	   mt, ms, mc, fn, ln and se are used by pdfgrep, where mt, ms and mc
	   have the same effect.
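
       As an illustration, the variable can be set for a single invocation.
       This is a sketch under assumptions: the wrapper name pdfgrep_colored
       and the chosen colors are arbitrary, and ln is assumed to apply to
       the page-number prefix, by analogy with line numbers in grep.

```shell
# Hypothetical wrapper: run pdfgrep with a custom color scheme.
# Values are SGR codes: 01;32 bold green, 35 magenta, 33 yellow, 36 cyan.
#   mt  matched text
#   fn  file names
#   ln  number prefix (assumed here to be the page number)
#   se  separators
pdfgrep_colored() {
    GREP_COLORS='mt=01;32:fn=35:ln=33:se=36' pdfgrep --color always "$@"
}
```

       Because --color always is passed, the escape sequences are emitted
       even when the output is piped, for example into less -R.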

EXAMPLES
       Print the first ten lines matching pattern and print their page number

	       pdfgrep -n --max-count 10 pattern foo.pdf

       Search all .pdf files whose names begin with foo recursively in the
       current directory

	       pdfgrep -r --include "foo*.pdf" pattern
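
       --include and --exclude can be combined. In the following sketch the
       helper name search_reports and the file name patterns are
       hypothetical. A file such as report-2015-draft.pdf matches both globs
       and is skipped, because --exclude takes precedence.

```shell
# Hypothetical helper: search report PDFs below DIR, skipping drafts.
# Usage: search_reports PATTERN DIR
search_reports() {
    # Quote the globs so the shell passes them to pdfgrep unexpanded.
    pdfgrep -r -n \
        --include='report-*.pdf' \
        --exclude='*-draft.pdf' \
        "$1" "$2"
}
```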

       Search all .pdf files that are smaller than 12M recursively in the
       current directory

	       find . -name "*.pdf" -size -12M -print0 | xargs -0 pdfgrep pattern

	   Note that in contrast to the previous examples, this task could not
	   be solved with pdfgrep alone, but the Unix tools find(1) and
	   xargs(1) had to be used. That’s because pdfgrep itself doesn’t
	   include options to exclude files by their size. But as you see, it
	   doesn’t have to!
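
       The same approach works for other criteria that pdfgrep does not
       filter on. The sketch below selects files by modification time and
       uses find's own -exec ... {} + instead of xargs, so pdfgrep is not
       run at all when no file qualifies. The helper name search_recent is
       hypothetical.

```shell
# Hypothetical helper: search PDFs below DIR modified within the last DAYS days.
# Usage: search_recent PATTERN DIR DAYS
search_recent() {
    # -H forces the file name prefix even if find passes only one file.
    find "$2" -type f -name '*.pdf' -mtime "-$3" \
        -exec pdfgrep -H -n "$1" {} +
}
```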

BUGS
   Reporting Bugs
       Bugs can be reported either to the mailing list
       (pdfgrep-users@pdfgrep.org) or to the bug tracker on GitLab
       (https://gitlab.com/pdfgrep/pdfgrep/issues).

   Known Bugs
       pdfgrep prints a single line multiple times if there is more than one
       match in that line. This does not mirror the behavior of grep.

       Also, the current context options don’t have the same semantics as the
       grep ones.

AUTHORS
       pdfgrep is maintained by Hans-Peter Deifel.

       See the AUTHORS file in the source for a full list of contributors.

SEE ALSO
       grep(1), pcre(3), regex(7)

       See pdfgrep’s website https://pdfgrep.org for more information,
       downloads, git repository and more.

Pdfgrep 1.4.1			  09/26/2015			    PDFGREP(1)