phmmer(1)			 HMMER Manual			     phmmer(1)

NAME
       phmmer - search protein sequence(s) against a protein sequence database

SYNOPSIS
       phmmer [options] <seqfile> <seqdb>

DESCRIPTION
       phmmer  is used to search one or more query protein sequences against a
       protein sequence database.  For each query sequence in  <seqfile>,  use
       that  sequence  to  search the target database of sequences in <seqdb>,
       and output ranked lists of the  sequences  with	the  most  significant
       matches to the query.

       The  output  format  is	designed to be human-readable, but is often so
       voluminous that reading it is impractical, and parsing it  is  a	 pain.
       The --tblout and --domtblout options save output in simple tabular for‐
       mats that are concise and easier to parse.  The -o option allows	 redi‐
       recting the main output, including throwing it away in /dev/null.
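
       As a rough illustration of the usage described above, the following
       Python sketch assembles a phmmer command line that discards the main
       output and keeps only the tabular summaries. The helper name
       build_phmmer_cmd and all file names are hypothetical; only the
       options themselves come from this page.

```python
import shlex


def build_phmmer_cmd(seqfile, seqdb, tblout=None, domtblout=None,
                     discard_main=False):
    """Assemble a phmmer argument list (hypothetical helper, not part of HMMER)."""
    cmd = ["phmmer"]
    if discard_main:
        # -o redirects the main human-readable output; /dev/null discards it.
        cmd += ["-o", "/dev/null"]
    if tblout is not None:
        cmd += ["--tblout", tblout]
    if domtblout is not None:
        cmd += ["--domtblout", domtblout]
    # Options come first, then the two positional arguments <seqfile> <seqdb>.
    cmd += [seqfile, seqdb]
    return cmd


if __name__ == "__main__":
    # File names below are placeholders for illustration only.
    cmd = build_phmmer_cmd("query.fa", "targets.fa",
                           tblout="hits.tbl", discard_main=True)
    print(shlex.join(cmd))
    # To actually run the search (requires phmmer on PATH):
    #   import subprocess; subprocess.run(cmd, check=True)
```

       Building the command as a list rather than a single string avoids
       shell quoting problems when file names contain spaces.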

OPTIONS
       -h     Help;  print  a  brief  reminder	of  command line usage and all
	      available options.

OPTIONS FOR CONTROLLING OUTPUT
       -o <f> Direct the main human-readable output to a file <f>  instead  of
	      the default stdout.

       -A <f> Save  a multiple alignment of all significant hits (those satis‐
	      fying inclusion thresholds) to the file <f> in Stockholm format.

       --tblout <f>
	      Save a simple tabular  (space-delimited)	file  summarizing  the
	      per-target  output,  with	 one  data  line per homologous target
	      sequence found.

       --domtblout <f>
              Save a simple tabular (space-delimited) file summarizing the
              per-domain output, with one data line per homologous domain
              detected in a target sequence for each query.

       --acc  Use accessions instead of names in the main output, where avail‐
	      able for profiles and/or sequences.

       --noali
	      Omit  the	 alignment  section  from  the	main  output. This can
	      greatly reduce the output volume.

       --notextw
              Remove the limit on line length in the main output. The default
              is a limit of 120 characters per line, which helps in displaying
              the output cleanly on terminals and in editors, but can truncate
              target description lines.

       --textw <n>
	      Set  the	main  output's line length limit to <n> characters per
	      line. The default is 120.
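
       The --tblout file described above is meant to be parsed. The sketch
       below reads such a file in Python. The column order is an assumption
       taken from the HMMER user guide, not from this page: 18
       whitespace-delimited fields followed by a free-text description,
       with comment lines starting with "#". The field names and the
       function parse_tblout are hypothetical.

```python
from typing import Dict, Iterable, Iterator

# Assumed --tblout column order (see the HMMER user guide); names are ours.
TBLOUT_FIELDS = [
    "target_name", "target_acc", "query_name", "query_acc",
    "full_evalue", "full_score", "full_bias",
    "dom_evalue", "dom_score", "dom_bias",
    "exp", "reg", "clu", "ov", "env", "dom", "rep", "inc",
]


def parse_tblout(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict of string fields per data line, skipping comments."""
    n = len(TBLOUT_FIELDS)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        # Split only the first n fields; the rest is the description,
        # which may itself contain spaces.
        parts = line.split(None, n)
        if len(parts) < n:
            raise ValueError(f"expected at least {n} fields: {line!r}")
        row = dict(zip(TBLOUT_FIELDS, parts))
        row["description"] = parts[n].strip() if len(parts) > n else ""
        yield row


if __name__ == "__main__":
    # "hits.tbl" is a placeholder for a file written with --tblout.
    with open("hits.tbl") as handle:
        for hit in parse_tblout(handle):
            print(hit["target_name"], hit["full_evalue"], hit["full_score"])
```

       Values are left as strings so that the caller decides how to convert
       E-values and scores.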

OPTIONS CONTROLLING SCORING SYSTEM
       The probability model in phmmer is  constructed	by  inferring  residue
       probabilities from a standard 20x20 substitution score matrix, plus two
       additional parameters for position-independent gap open and gap	extend
       probabilities.

       --popen <x>
	      Set  the	gap open probability for a single sequence query model
	      to <x>.  The default is 0.02.  <x> must be >= 0 and < 0.5.

       --pextend <x>
	      Set the gap extend probability for a single sequence query model
	      to <x>.  The default is 0.4.  <x> must be >= 0 and < 1.0.

       --mxfile <mxfile>
	      Obtain  residue  alignment  probabilities	 from the substitution
	      matrix in file <mxfile>.	The default score matrix  is  BLOSUM62
	      (this matrix is internal to HMMER and does not have to be avail‐
	      able as a file).	The format of a substitution  matrix  <mxfile>
	      is  the  standard	 format	 accepted  by  BLAST, FASTA, and other
	      sequence analysis software.
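
       The ranges given above for --popen and --pextend can be checked
       before a search is launched. The following Python sketch does so;
       the helper name check_gap_params is hypothetical, and the defaults
       are the ones stated on this page.

```python
def check_gap_params(popen=0.02, pextend=0.4):
    """Validate gap probabilities and return the matching phmmer arguments."""
    # --popen: <x> must be >= 0 and < 0.5.
    if not (0.0 <= popen < 0.5):
        raise ValueError(f"--popen must be >= 0 and < 0.5, got {popen}")
    # --pextend: <x> must be >= 0 and < 1.0.
    if not (0.0 <= pextend < 1.0):
        raise ValueError(f"--pextend must be >= 0 and < 1.0, got {pextend}")
    return ["--popen", str(popen), "--pextend", str(pextend)]


if __name__ == "__main__":
    print(check_gap_params())            # the documented defaults
    print(check_gap_params(0.01, 0.5))   # a stricter gap-open setting
```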

OPTIONS CONTROLLING REPORTING THRESHOLDS
       Reporting thresholds control which hits are reported  in	 output	 files
       (the main output, --tblout, and --domtblout).  Sequence hits and domain
       hits are ranked by statistical significance  (E-value)  and  output  is
       generated  in  two sections called per-target and per-domain output. In
       per-target output, by default, all sequence hits with an E-value <=  10
       are reported. In the per-domain output, for each target that has passed
       per-target reporting  thresholds,  all  domains	satisfying  per-domain
       reporting  thresholds  are reported. By default, these are domains with
       conditional E-values of <= 10.  The  following  options	allow  you  to
       change  the  default  E-value reporting thresholds, or to use bit score
       thresholds instead.

       -E <x> In the per-target output, report target  sequences  with	an  E-
	      value  of <= <x>.	 The default is 10.0, meaning that on average,
	      about 10 false positives will be reported per query, so you  can
	      see  the top of the noise and decide for yourself if it's really
	      noise.

       -T <x> Instead of thresholding per-target output on E-value, report
              target sequences with a bit score of >= <x>.

       --domE <x>
	      In the per-domain output, for target sequences that have already
              satisfied the per-target reporting threshold, report individual
	      domains  with  a	conditional E-value of <= <x>.	The default is
	      10.0.  A conditional E-value means the expected number of	 addi‐
	      tional  false  positive  domains	in the smaller search space of
	      those comparisons that already satisfied the per-target  report‐
	      ing threshold (and thus must have at least one homologous domain
	      already).

       --domT <x>
              Instead of thresholding per-domain output on E-value, report
              domains with a bit score of >= <x>.
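
       The reporting rules in this section can be summarized as a small
       decision procedure. The Python sketch below is a simplified model of
       that logic, not HMMER's implementation; the function names are
       hypothetical, and the parameters mirror -E, -T, --domE and --domT.

```python
def target_is_reported(evalue, bitscore, E=10.0, T=None):
    """Per-target rule: use the bit score if T is set, else the E-value."""
    if T is not None:
        return bitscore >= T
    return evalue <= E


def domain_is_reported(target_reported, c_evalue, bitscore,
                       domE=10.0, domT=None):
    """Per-domain rule, applied only to targets that were already reported."""
    if not target_reported:
        return False
    if domT is not None:
        return bitscore >= domT
    return c_evalue <= domE


if __name__ == "__main__":
    # Made-up numbers for illustration.
    reported = target_is_reported(evalue=3.5, bitscore=18.2)
    print(reported, domain_is_reported(reported, c_evalue=0.7, bitscore=15.0))
```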

OPTIONS CONTROLLING INCLUSION THRESHOLDS
       Inclusion  thresholds are stricter than reporting thresholds. They con‐
       trol which hits are included in any output multiple alignment  (the  -A
       option) and which domains are marked as significant ("!") as opposed to
       questionable ("?")  in domain output.

       --incE <x>
	      Use an E-value of <= <x> as the per-target inclusion  threshold.
	      The default is 0.01, meaning that on average, about 1 false pos‐
	      itive would be expected in every	100  searches  with  different
	      query sequences.

       --incT <x>
              Instead of using E-values for setting the inclusion threshold,
              use a bit score of >= <x> as the per-target inclusion
              threshold.  By default this option is unset.

       --incdomE <x>
	      Use  a conditional E-value of <= <x> as the per-domain inclusion
	      threshold, in targets that have already  satisfied  the  overall
	      per-target inclusion threshold.  The default is 0.01.

       --incdomT <x>
	      Instead of using E-values, use a bit score of >= <x> as the per-
	      domain inclusion threshold.  By default this option is unset.
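
       As an illustration of how inclusion thresholds relate to the "!" and
       "?" marks, the Python sketch below marks a domain as significant
       only when both the per-target and the per-domain inclusion
       thresholds are met. It covers the E-value thresholds only (the bit
       score variants are omitted), and the function name domain_marker is
       hypothetical.

```python
def domain_marker(target_evalue, dom_c_evalue, incE=0.01, incdomE=0.01):
    """Return "!" for an included (significant) domain, "?" otherwise."""
    # The per-domain threshold only applies in targets that have already
    # satisfied the per-target inclusion threshold.
    target_included = target_evalue <= incE
    if target_included and dom_c_evalue <= incdomE:
        return "!"
    return "?"


if __name__ == "__main__":
    # Made-up E-values for illustration.
    for t_e, d_e in [(1e-20, 1e-18), (1e-20, 0.3), (0.5, 1e-4)]:
        print(t_e, d_e, domain_marker(t_e, d_e))
```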

OPTIONS CONTROLLING THE ACCELERATION PIPELINE
       HMMER3 searches are accelerated in a three-step	filter	pipeline:  the
       MSV  filter, the Viterbi filter, and the Forward filter. The first fil‐
       ter is the fastest and most approximate; the last is the	 full  Forward
       scoring algorithm, slowest but most accurate. There is also a bias fil‐
       ter step between MSV and Viterbi. Targets that pass all	the  steps  in
       the  acceleration  pipeline  are	 then  subjected  to postprocessing --
       domain identification and scoring using the Forward/Backward algorithm.

       Essentially the only free parameters  that  control  HMMER's  heuristic
       filters are the P-value thresholds controlling the expected fraction of
       nonhomologous sequences that pass  the  filters.	 Setting  the  default
       thresholds  higher  will	 pass  a  higher  proportion  of nonhomologous
       sequences, increasing sensitivity at the expense of speed; conversely,
       setting	lower  P-value	thresholds  will  pass	a  smaller proportion,
       decreasing sensitivity and increasing speed. Setting a filter's P-value
       threshold to 1.0 means it will pass all sequences, and effectively
       disables the filter.

       Changing filter thresholds only removes or includes targets  from  con‐
       sideration;  changing  filter  thresholds does not alter bit scores, E-
       values, or alignments, all of which are determined solely  in  postpro‐
       cessing.

       --max  Maximum  sensitivity.   Turn off all filters, including the bias
	      filter, and run full Forward/Backward  postprocessing  on	 every
	      target.  This increases sensitivity slightly, at a large cost in
	      speed.

       --F1 <x>
	      First filter threshold; set the P-value threshold	 for  the  MSV
	      filter  step.   The  default is 0.02, meaning that roughly 2% of
	      the highest scoring nonhomologous targets are expected  to  pass
	      the filter.

       --F2 <x>
	      Second  filter  threshold;  set  the  P-value  threshold for the
	      Viterbi filter step.  The default is 0.001.

       --F3 <x>
	      Third filter threshold; set the P-value threshold for  the  For‐
	      ward filter step.	 The default is 1e-5.

       --nobias
	      Turn  off	 the bias filter. This increases sensitivity somewhat,
	      but can come at a high cost in speed, especially	if  the	 query
	      has  biased  residue  composition (such as a repetitive sequence
	      region, or if it is a membrane protein  with  large  regions  of
	      hydrophobicity). Without the bias filter, too many sequences may
	      pass the filter with biased  queries,  leading  to  slower  than
	      expected	performance  as	 the  computationally  intensive  For‐
	      ward/Backward algorithms shoulder an abnormally heavy load.
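
       The three P-value thresholds can be pictured as a chain of tests
       that a target must pass in order. The Python sketch below is a toy
       model under that assumption: it ignores the bias filter and takes
       the P-values as given rather than computing them. The names
       passes_filters and all_filters_off are hypothetical; the default
       values are those of --F1, --F2 and --F3.

```python
DEFAULT_THRESHOLDS = {"F1": 0.02, "F2": 0.001, "F3": 1e-5}


def passes_filters(pvalues, thresholds=DEFAULT_THRESHOLDS):
    """Return (passed, failed_stage) for P-values keyed by F1, F2 and F3."""
    # Stages run in order: MSV (F1), Viterbi (F2), Forward (F3).
    for stage in ("F1", "F2", "F3"):
        if pvalues[stage] > thresholds[stage]:
            return False, stage
    return True, None


def all_filters_off():
    """A threshold of 1.0 passes every sequence, disabling each filter."""
    return {stage: 1.0 for stage in DEFAULT_THRESHOLDS}


if __name__ == "__main__":
    # Made-up P-values for one target.
    target = {"F1": 0.01, "F2": 0.01, "F3": 1e-6}
    print(passes_filters(target))                     # stopped at Viterbi
    print(passes_filters(target, all_filters_off()))  # nothing is filtered
```

       Consistent with the text above, changing thresholds in this model
       only decides which targets reach postprocessing; it does not touch
       any score.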

OPTIONS CONTROLLING E-VALUE CALIBRATION
       Estimating the location parameters for the expected score distributions
       for  MSV	 filter	 scores,  Viterbi  filter  scores,  and Forward scores
       requires three short random sequence simulations.

       --EmL <n>
	      Sets the sequence length in simulation that estimates the	 loca‐
	      tion parameter mu for MSV filter E-values. Default is 200.

       --EmN <n>
	      Sets  the	 number	 of sequences in simulation that estimates the
	      location parameter mu for MSV filter E-values. Default is 200.

       --EvL <n>
	      Sets the sequence length in simulation that estimates the	 loca‐
	      tion parameter mu for Viterbi filter E-values. Default is 200.

       --EvN <n>
	      Sets  the	 number	 of sequences in simulation that estimates the
	      location parameter mu for Viterbi filter	E-values.  Default  is
	      200.

       --EfL <n>
	      Sets  the sequence length in simulation that estimates the loca‐
	      tion parameter tau for Forward E-values. Default is 100.

       --EfN <n>
	      Sets the number of sequences in simulation  that	estimates  the
	      location parameter tau for Forward E-values. Default is 200.

       --Eft <x>
              Sets the tail mass fraction to fit in the simulation that
              estimates the location parameter tau for Forward E-values.
              Default is 0.04.

OTHER OPTIONS
       --nonull2
	      Turn off the null2 score corrections for biased composition.

       -Z <x> Assert that the total number of targets in your searches is <x>,
	      for the purposes of per-sequence	E-value	 calculations,	rather
	      than the actual number of targets seen.

       --domZ <x>
	      Assert that the total number of targets in your searches is <x>,
	      for the purposes of per-domain conditional E-value calculations,
	      rather  than  the	 number	 of  targets that passed the reporting
	      thresholds.

       --seed <n>
	      Seed the random number generator with <n>, an integer >= 0.   If
	      <n>  is >0, any stochastic simulations will be reproducible; the
	      same command will give the same results.	If <n> is 0, the  ran‐
	      dom number generator is seeded arbitrarily, and stochastic simu‐
	      lations will vary from run to run	 of  the  same	command.   The
	      default seed is 42.

       --qformat <s>
	      Declare  that  the  input	 <seqfile> is in format <s>.  Accepted
	      formats include fasta, embl, genbank, ddbj, uniprot,  stockholm,
	      pfam,  a2m, and afa.  The default is to autodetect the format of
	      the file.

       --tformat <s>
	      Declare that the input <seqdb> is in format <s>.	Accepted  for‐
	      mats  include  fasta,  embl,  genbank, ddbj, uniprot, stockholm,
	      pfam, a2m, and afa.  The default is to autodetect the format  of
	      the file.

       --cpu <n>
	      Set  the	number of parallel worker threads to <n>.  By default,
	      HMMER sets this to the number of CPU cores it  detects  in  your
	      machine  -  that is, it tries to maximize the use of your avail‐
	      able processor cores. Setting <n>	 higher	 than  the  number  of
	      available	 cores	is of little if any value, but you may want to
	      set it to something less. You can also control  this  number  by
	      setting an environment variable, HMMER_NCPU.

	      This  option  is only available if HMMER was compiled with POSIX
	      threads support. This is the  default,  but  it  may  have  been
	      turned  off  at  compile-time  for your site or machine for some
	      reason.

       --stall
              For debugging the MPI master/worker version: pause after start,
              to enable the developer to attach debuggers to the running
              master and worker(s) processes.  Send SIGCONT signal to release
              the pause.  (Under gdb: (gdb) signal SIGCONT) (Only available
              if optional MPI support was enabled at compile-time.)

       --mpi  Run in MPI master/worker mode, using mpirun.  (Only available if
	      optional MPI support was enabled at compile-time.)
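
       The effect of the -Z option above can be sketched numerically,
       assuming the usual relation in which a per-sequence E-value is the
       P-value multiplied by the number of targets searched. The function
       name per_sequence_evalue is hypothetical and the numbers are made
       up.

```python
def per_sequence_evalue(pvalue, targets_seen, Z=None):
    """E-value = P-value x search space size; -Z overrides the size."""
    n = targets_seen if Z is None else Z
    return pvalue * n


if __name__ == "__main__":
    # Same hit and P-value, evaluated against two search space sizes.
    print(per_sequence_evalue(1e-6, targets_seen=20000))
    print(per_sequence_evalue(1e-6, targets_seen=20000, Z=1000000))
```

       Fixing Z this way keeps E-values comparable across searches run
       against databases of different sizes.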

SEE ALSO
       See  hmmer(1)  for  a master man page with a list of all the individual
       man pages for programs in the HMMER package.

       For complete documentation, see the user	 guide	that  came  with  your
       HMMER   distribution   (Userguide.pdf);	or  see	 the  HMMER  web  page
       (@HMMER_URL@).

COPYRIGHT
       @HMMER_COPYRIGHT@
       @HMMER_LICENSE@

       For additional information on copyright and  licensing,	see  the  file
       called  COPYRIGHT  in  your HMMER source distribution, or see the HMMER
       web page (@HMMER_URL@).

AUTHOR
       Eddy/Rivas Laboratory
       Janelia Farm Research Campus
       19700 Helix Drive
       Ashburn VA 20147 USA
       http://eddylab.org

HMMER @HMMER_VERSION@		 @HMMER_DATE@			     phmmer(1)