jackhmmer(1)			 HMMER Manual			  jackhmmer(1)

NAME
       jackhmmer - iteratively search sequence(s) against a protein database

SYNOPSIS
       jackhmmer [options] <seqfile> <seqdb>

DESCRIPTION
       jackhmmer iteratively searches each query sequence in <seqfile> against
       the target sequence(s) in <seqdb>.  The first iteration is identical to
       a  phmmer  search.  For the next iteration, a multiple alignment of the
       query together with all target sequences satisfying  inclusion  thresh‐
       olds  is assembled, a profile is constructed from this alignment (iden‐
       tical to using hmmbuild on the alignment), and profile  search  of  the
       <seqdb> is done (identical to an hmmsearch with the profile).

       The  output  format  is	designed to be human-readable, but is often so
       voluminous that reading it is impractical, and parsing it  is  a	 pain.
       The --tblout and --domtblout options save output in simple tabular for‐
       mats that are concise and easier to parse.  The -o option allows	 redi‐
       recting the main output, including throwing it away in /dev/null.
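
       As a minimal sketch of the output options described above, the
       following Python helper assembles a jackhmmer command line that
       discards the main output and keeps only the tabular summaries. The
       helper name jackhmmer_argv and the file names are hypothetical; only
       the flags themselves come from this page.

```python
import shlex


def jackhmmer_argv(seqfile, seqdb, max_iter=5, tblout=None,
                   domtblout=None, main_output=None):
    """Build an argv list using only options documented on this page."""
    argv = ["jackhmmer", "-N", str(max_iter)]
    if main_output is not None:
        argv += ["-o", main_output]         # redirect the human-readable output
    if tblout is not None:
        argv += ["--tblout", tblout]        # per-sequence tabular summary
    if domtblout is not None:
        argv += ["--domtblout", domtblout]  # per-domain tabular summary
    argv += [seqfile, seqdb]                # positional arguments come last
    return argv


# Hypothetical file names, for illustration only.
argv = jackhmmer_argv("query.fa", "targets.fa", tblout="hits.tbl",
                      domtblout="domains.tbl", main_output="/dev/null")
print(shlex.join(argv))
# To run it (requires an installed HMMER):
#   import subprocess; subprocess.run(argv, check=True)
```

       Building the command as a list rather than a single string avoids
       shell quoting problems when file names contain spaces.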

OPTIONS
       -h     Help;  print  a  brief  reminder	of  command line usage and all
	      available options.

       -N <n> Set the maximum number of iterations to <n>.  The default is  5.
	      If <n> is 1, the result is equivalent to a phmmer search.

OPTIONS CONTROLLING OUTPUT
       By  default,  output for each iteration appears on stdout in a somewhat
       human readable, somewhat parseable format. These	 options  allow	 redi‐
       recting	that  output  or  saving  additional kinds of output to files,
       including checkpoint files for each iteration.

       -o <f> Direct the human-readable output to a file <f>.

       -A <f> After the final iteration, save an annotated multiple  alignment
	      of  all hits satisfying inclusion thresholds (also including the
	      original query) to <f> in Stockholm format.

       --tblout <f>
	      After the	 final	iteration,  save  a  tabular  summary  of  top
	      sequence	hits  to  <f> in a readily parseable, columnar, white‐
	      space-delimited format.

       --domtblout <f>
	      After the final iteration, save a tabular summary of top	domain
	      hits  to <f> in a readily parseable, columnar, whitespace-delim‐
	      ited format.

       --chkhmm <prefix>
	      At the start of each iteration, checkpoint the query HMM, saving
	      it  to  a file named <prefix>-<n>.hmm where <n> is the iteration
	      number (from 1..N).

       --chkali <prefix>
	      At the end of each iteration, checkpoint an alignment of all
	      domains satisfying inclusion thresholds (i.e. what will become
	      the query HMM for the next iteration), saving it to a file named
	      <prefix>-<n>.sto in Stockholm format, where <n> is the iteration
	      number (from 1..N).

       --acc  Use accessions instead of names in the main output, where avail‐
	      able for profiles and/or sequences.

       --noali
	      Omit  the	 alignment  section  from  the	main  output. This can
	      greatly reduce the output volume.

       --notextw
	      Remove the limit on the length of each line in the main output.
	      The default is a limit of 120 characters per line, which helps
	      in displaying the output cleanly on terminals and in editors,
	      but can truncate target sequence description lines.

       --textw <n>
	      Set  the	main  output's line length limit to <n> characters per
	      line. The default is 120.
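
       The checkpoint naming rules for --chkhmm and --chkali can be
       illustrated with a short Python sketch. The function name and the
       prefix "run1" are hypothetical. The list is an upper bound: it
       assumes all N iterations are actually run.

```python
def checkpoint_files(prefix, max_iter=5):
    """List the (HMM, alignment) checkpoint names for iterations 1..N.

    Follows the <prefix>-<n>.hmm and <prefix>-<n>.sto patterns described
    for --chkhmm and --chkali; assumes the same prefix is given to both.
    """
    return [(f"{prefix}-{n}.hmm", f"{prefix}-{n}.sto")
            for n in range(1, max_iter + 1)]


for hmm_name, sto_name in checkpoint_files("run1", max_iter=3):
    print(hmm_name, sto_name)
```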

OPTIONS CONTROLLING SINGLE SEQUENCE SCORING (FIRST ITERATION)
       By default, the first iteration uses a search model constructed from  a
       single query sequence. This model is constructed using a standard 20x20
       substitution matrix  for	 residue  probabilities,  and  two  additional
       parameters  for position-independent gap open and gap extend probabili‐
       ties. These options allow the default single-sequence  scoring  parame‐
       ters to be changed.

       --popen <x>
	      Set  the	gap open probability for a single sequence query model
	      to <x>.  The default is 0.02.  <x> must be >= 0 and < 0.5.

       --pextend <x>
	      Set the gap extend probability for a single sequence query model
	      to <x>.  The default is 0.4.  <x> must be >= 0 and < 1.0.

       --mxfile <mxfile>
	      Obtain  residue  alignment  probabilities	 from the substitution
	      matrix in file <mxfile>.	The default score matrix  is  BLOSUM62
	      (this matrix is internal to HMMER and does not have to be avail‐
	      able as a file).	The format of a substitution  matrix  <mxfile>
	      is  the  standard	 format	 accepted  by  BLAST, FASTA, and other
	      sequence analysis software.
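
       The allowed ranges for --popen and --pextend can be checked before a
       search is launched. The following Python sketch does only that; the
       function name is hypothetical and the check simply mirrors the ranges
       stated above.

```python
def gap_options(popen=0.02, pextend=0.4):
    """Validate gap probabilities and return the matching argv fragment."""
    if not (0.0 <= popen < 0.5):
        raise ValueError("--popen must be >= 0 and < 0.5")
    if not (0.0 <= pextend < 1.0):
        raise ValueError("--pextend must be >= 0 and < 1.0")
    return ["--popen", str(popen), "--pextend", str(pextend)]


print(gap_options())            # the documented defaults
print(gap_options(0.01, 0.5))   # a stricter gap-open probability
```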

OPTIONS CONTROLLING REPORTING THRESHOLDS
       Reporting thresholds control which hits are reported  in	 output	 files
       (the  main  output,  --tblout,  and  --domtblout).   In each iteration,
       sequence hits and domain hits are ranked	 by  statistical  significance
       (E-value) and output is generated in two sections called per-target and
       per-domain output. In per-target output, by default, all sequence  hits
       with  an E-value <= 10 are reported. In the per-domain output, for each
       target that has passed per-target  reporting  thresholds,  all  domains
       satisfying  per-domain  reporting  thresholds are reported. By default,
       these are domains with conditional E-values of  <=  10.	The  following
       options	allow  you to change the default E-value reporting thresholds,
       or to use bit score thresholds instead.

       -E <x> Report sequences with E-values <= <x>  in	 per-sequence  output.
	      The default is 10.0.

       -T <x> Use  a bit score threshold for per-sequence output instead of an
	      E-value  threshold  (any	setting	 of  -E	 is  ignored).	Report
	      sequences with a bit score of >= <x>.  By default this option is
	      unset.

       -Z <x> Declare the total size of the database to be <x> sequences,  for
	      purposes	of  E-value calculation.  Normally E-values are calcu‐
	      lated relative to the size of the database you actually searched
	      (i.e. the number of sequences in <seqdb>). In some cases
	      (for instance, if you've split  your  target  sequence  database
	      into multiple files for parallelization of your search), you may
	      know better what the actual size of your search space is.

       --domE <x>
	      Report domains with conditional E-values <=  <x>	in  per-domain
	      output,  in  addition  to the top-scoring domain per significant
	      sequence hit. The default is 10.0.

       --domT <x>
	      Use a bit score threshold for per-domain output instead of an
	      E-value threshold (any setting of --domE is ignored). Report
	      domains with a bit score of >= <x> in per-domain output, in
	      addition to the top-scoring domain per significant sequence hit.
	      By default this option is unset.

       --domZ <x>
	      Declare the number of significant sequences to be <x> sequences,
	      for  purposes  of conditional E-value calculation for additional
	      domain significance.  Normally conditional E-values  are	calcu‐
	      lated  relative  to the number of sequences passing per-sequence
	      reporting threshold.
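
       To illustrate why -Z matters when a database is split into several
       files, the Python sketch below rescales an E-value obtained from one
       piece to the size of the whole database. It assumes that an E-value
       is a P-value multiplied by the search space size, so that E-values
       scale linearly with Z. The function name and the numbers are
       hypothetical.

```python
def rescale_evalue(evalue, z_searched, z_total):
    """Convert an E-value computed against z_searched sequences into the
    E-value it would have against z_total sequences.

    Assumes E = P * Z, i.e. E-values are linear in database size.
    """
    if z_searched <= 0 or z_total <= 0:
        raise ValueError("database sizes must be positive")
    return evalue * z_total / z_searched


# A hit found in one of four equal pieces of a 1,000,000-sequence database.
e_piece = 0.002
e_full = rescale_evalue(e_piece, z_searched=250_000, z_total=1_000_000)
print(e_full)  # four times larger than e_piece
```

       Passing -Z with the full database size to each partial search should
       make this rescaling unnecessary.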

OPTIONS CONTROLLING INCLUSION THRESHOLDS
       Inclusion thresholds control which hits are included  in	 the  multiple
       alignment  and  profile	constructed for the next search iteration.  By
       default, a sequence must have a per-sequence E-value of <= 0.001 (see
       the --incE option) to be included, and any additional domains in it
       besides the top-scoring one must have a conditional E-value of <= 0.001
       (see the --incdomE option). The difference between reporting thresholds
       and inclusion thresholds is that inclusion thresholds control which
       hits actually get used in the next iteration (or the final output
       multiple alignment if the -A option is used), whereas reporting
       thresholds control what you see in output. Reporting thresholds are
       generally looser, so you can see borderline hits at the top of the
       noise that might be of interest.
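
       The relationship between the two kinds of threshold can be sketched
       in Python. This covers only the per-sequence E-value case with the
       documented defaults (10 for reporting, 0.001 for inclusion). The
       function name and the category labels are hypothetical.

```python
def classify_hit(evalue, report_e=10.0, include_e=0.001):
    """Classify a per-sequence hit by its E-value.

    'included' hits are reported and also feed the next iteration;
    'reported' hits appear in output only; 'dropped' hits are not shown.
    """
    if evalue <= include_e:
        return "included"
    if evalue <= report_e:
        return "reported"
    return "dropped"


for e in (1e-30, 0.001, 0.5, 10.0, 25.0):
    print(e, classify_hit(e))
```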

       --incE <x>
	      Include sequences with E-values <= <x> in	 subsequent  iteration
	      or final alignment output by -A.	The default is 0.001.

       --incT <x>
	      Use  a bit score threshold for per-sequence inclusion instead of
	      an E-value threshold (any setting of --incE is ignored). Include
	      sequences with a bit score of >= <x>.  By default this option is
	      unset.

       --incdomE <x>
	      Include domains with conditional E-values <= <x>	in  subsequent
	      iteration	 or  final  alignment output by -A, in addition to the
	      top-scoring domain per significant sequence hit.	The default is
	      0.001.

       --incdomT <x>
	      Use a bit score threshold for per-domain inclusion instead of an
	      E-value threshold (any setting of --incdomE is ignored). Include
	      domains with a bit score of >= <x>.  By default this option is
	      unset.

OPTIONS CONTROLLING ACCELERATION HEURISTICS
       HMMER3 searches are accelerated in a three-step	filter	pipeline:  the
       MSV  filter, the Viterbi filter, and the Forward filter. The first fil‐
       ter is the fastest and most approximate; the last is the	 full  Forward
       scoring algorithm, slowest but most accurate. There is also a bias fil‐
       ter step between MSV and Viterbi. Targets that pass all	the  steps  in
       the  acceleration  pipeline  are	 then  subjected  to postprocessing --
       domain identification and scoring using the Forward/Backward algorithm.

       Essentially the only free parameters  that  control  HMMER's  heuristic
       filters are the P-value thresholds controlling the expected fraction of
       nonhomologous sequences that pass the filters. Setting the thresholds
       higher than their defaults will pass a higher proportion of
       nonhomologous sequences, increasing sensitivity at the expense of
       speed; conversely, setting lower P-value thresholds will pass a smaller
       proportion, decreasing sensitivity and increasing speed. Setting a
       filter's P-value threshold to 1.0 means it will pass all sequences,
       which effectively disables the filter.

       Changing filter thresholds only removes or includes targets  from  con‐
       sideration;  changing  filter  thresholds does not alter bit scores, E-
       values, or alignments, all of which are determined solely  in  postpro‐
       cessing.

       --max  Maximum  sensitivity.   Turn off all filters, including the bias
	      filter, and run full Forward/Backward  postprocessing  on	 every
	      target.  This increases sensitivity slightly, at a large cost in
	      speed.

       --F1 <x>
	      First filter threshold; set the P-value threshold	 for  the  MSV
	      filter  step.   The  default is 0.02, meaning that roughly 2% of
	      the highest scoring nonhomologous targets are expected  to  pass
	      the filter.

       --F2 <x>
	      Second  filter  threshold;  set  the  P-value  threshold for the
	      Viterbi filter step.  The default is 0.001.

       --F3 <x>
	      Third filter threshold; set the P-value threshold for  the  For‐
	      ward filter step.	 The default is 1e-5.

       --nobias
	      Turn  off	 the bias filter. This increases sensitivity somewhat,
	      but can come at a high cost in speed, especially	if  the	 query
	      has  biased  residue  composition (such as a repetitive sequence
	      region, or if it is a membrane protein  with  large  regions  of
	      hydrophobicity). Without the bias filter, too many sequences may
	      pass the filter with biased  queries,  leading  to  slower  than
	      expected	performance  as	 the  computationally  intensive  For‐
	      ward/Backward algorithms shoulder an abnormally heavy load.
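
       The filter cascade described in this section can be sketched as a
       chain of P-value comparisons. The Python below is a simplified
       illustration, not HMMER's implementation: it ignores the bias filter,
       takes the three P-values as given, and uses hypothetical names. The
       default thresholds are those listed for --F1, --F2, and --F3.

```python
def passes_filters(p_msv, p_vit, p_fwd,
                   f1=0.02, f2=1e-3, f3=1e-5, use_max=False):
    """Return True if a target would reach postprocessing.

    A target must pass each filter in turn; a filter passes a target when
    its P-value is at or below the threshold. use_max mimics --max by
    skipping every filter.
    """
    if use_max:
        return True
    return p_msv <= f1 and p_vit <= f2 and p_fwd <= f3


print(passes_filters(0.01, 5e-4, 1e-6))                 # passes all three
print(passes_filters(0.01, 5e-3, 1e-6))                 # stopped by Viterbi
print(passes_filters(0.5, 0.5, 0.5, use_max=True))      # --max: no filtering
print(passes_filters(0.5, 0.5, 0.5, f1=1.0, f2=1.0, f3=1.0))  # filters disabled
```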

OPTIONS CONTROLLING PROFILE CONSTRUCTION (LATER ITERATIONS)
       These options control how consensus columns  are	 defined  in  multiple
       alignments   when  building  profiles.  By  default,  jackhmmer	always
       includes your original query sequence in the alignment result at	 every
       iteration,  and consensus positions are defined by that query sequence:
       that is, a default jackhmmer profile is always the same length as  your
       original query, at every iteration.

       --fast Define  consensus	 columns as those that have a fraction >= sym‐
	      frac of residues as opposed to gaps. (See below for  the	--sym‐
	      frac  option.) Although this is the default profile construction
	      option elsewhere (in hmmbuild, in particular), it may have unde‐
	      sirable  effects	in  jackhmmer,	because a profile could itera‐
	      tively walk in sequence space away  from	your  original	query,
	      leaving  few  or	no  consensus  columns	corresponding  to  its
	      residues.

       --hand Define consensus columns in next profile using reference annota‐
	      tion  to the multiple alignment.	jackhmmer propagates reference
	      annotation from the previous profile to the multiple  alignment,
	      and thence to the next profile. This is the default.

       --symfrac <x>
	      Define the residue fraction threshold necessary to define a con‐
	      sensus column when using the --fast option. The default is  0.5.
	      The  symbol  fraction  in each column is calculated after taking
	      relative sequence weighting into account, and ignoring gap char‐
	      acters  corresponding  to ends of sequence fragments (as opposed
	      to internal insertions/deletions). Setting this to 0.0 means
	      that every alignment column will be assigned as consensus, which
	      may be useful in some cases. Setting it to 1.0 means that only
	      columns containing no gaps (internal insertions/deletions) will
	      be assigned as consensus.

       --fragthresh <x>
	      We only want to count terminal gaps as deletions if the  aligned
	      sequence	is  known  to  be full-length, not if it is a fragment
	      (for instance, because only part of  it  was  sequenced).	 HMMER
	      uses  a simple rule to infer fragments: if the sequence length L
	      is less than a fraction <x> times the mean  sequence  length  of
	      all the sequences in the alignment, then the sequence is handled
	      as a fragment. The default is 0.5.
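
       The fragment rule and the --fast consensus rule can be illustrated
       on a toy alignment. The Python sketch below is a simplification and
       makes several assumptions: sequence length is taken as the number of
       non-gap characters, all sequences have equal weight, and every gap
       is counted (the real calculation uses relative weights and ignores
       terminal gaps of fragments). All names and the alignment itself are
       hypothetical.

```python
GAP_CHARS = set("-.")


def residue_count(aligned_seq):
    """Number of non-gap characters in one aligned sequence."""
    return sum(1 for ch in aligned_seq if ch not in GAP_CHARS)


def mark_fragments(alignment, fragthresh=0.5):
    """Flag sequences shorter than fragthresh times the mean length."""
    lengths = [residue_count(seq) for seq in alignment]
    mean_len = sum(lengths) / len(lengths)
    return [length < fragthresh * mean_len for length in lengths]


def consensus_columns(alignment, symfrac=0.5):
    """Columns whose residue fraction is >= symfrac (uniform weights)."""
    nseq = len(alignment)
    columns = []
    for j in range(len(alignment[0])):
        residues = sum(1 for seq in alignment if seq[j] not in GAP_CHARS)
        if residues / nseq >= symfrac:
            columns.append(j)
    return columns


toy = ["ACDEFGHIKL",
       "ACDE-GHIKL",
       "--DEF-----"]
print(mark_fragments(toy))                  # only the last one is a fragment
print(consensus_columns(toy, symfrac=0.5))  # every column qualifies
print(consensus_columns(toy, symfrac=0.7))  # only columns 2 and 3 qualify
```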

OPTIONS CONTROLLING RELATIVE WEIGHTS
       Whenever a profile is built from a multiple alignment, HMMER uses an ad
       hoc   sequence	weighting  algorithm  to  downweight  closely  related
       sequences and upweight distantly related ones. This has the  effect  of
       making  models  less  biased by uneven phylogenetic representation. For
       example, two identical sequences would typically each receive half  the
       weight  that  one  sequence would (and this is why jackhmmer isn't con‐
       cerned about always including your  original  query  sequence  in  each
       iteration's alignment, even if it finds it again in the database you're
       searching). These options control which algorithm gets used.

       --wpb  Use  the	Henikoff  position-based  sequence  weighting	scheme
	      [Henikoff	 and  Henikoff, J. Mol. Biol. 243:574, 1994].  This is
	      the default.

       --wgsc Use the Gerstein/Sonnhammer/Chothia  weighting  algorithm	 [Ger‐
	      stein et al, J. Mol. Biol. 235:1067, 1994].

       --wblosum
	      Use  the	same clustering scheme that was used to weight data in
	      calculating BLOSUM substitution matrices [Henikoff and Henikoff,
	      Proc.  Natl.  Acad.  Sci	89:10915, 1992]. Sequences are single-
	      linkage clustered at an identity threshold  (default  0.62;  see
	      --wid)  and  within  each	 cluster of c sequences, each sequence
	      gets relative weight 1/c.

       --wnone
	      No relative weights. All sequences are assigned uniform weight.

       --wid <x>
	      Sets the identity threshold used	by  single-linkage  clustering
	      when  using --wblosum.  Invalid with any other weighting scheme.
	      Default is 0.62.
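
       The --wblosum scheme lends itself to a compact illustration. The
       Python sketch below single-linkage clusters aligned sequences at an
       identity threshold and gives each member of a cluster of c sequences
       the weight 1/c. The pairwise identity definition used here
       (identical pairs divided by columns where both sequences have a
       residue) is an assumption, as are all names and the toy alignment.

```python
GAPS = "-."


def pairwise_identity(a, b):
    """Fractional identity over columns where both sequences have a residue.

    This definition is an assumption made for the sketch.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x not in GAPS and y not in GAPS]
    if not pairs:
        return 0.0
    return sum(1 for x, y in pairs if x == y) / len(pairs)


def blosum_weights(alignment, wid=0.62):
    """Single-linkage cluster at identity >= wid; weight is 1/cluster size."""
    n = len(alignment)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if pairwise_identity(alignment[i], alignment[j]) >= wid:
                parent[find(i)] = find(j)  # merge the two clusters

    sizes = {}
    for i in range(n):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return [1.0 / sizes[find(i)] for i in range(n)]


toy = ["ACDEFGHIKL", "ACDEFGHIKL", "MNPQRSTVWY"]
print(blosum_weights(toy))  # the two identical sequences share one cluster
```

       In this toy case the two identical sequences each receive half the
       weight of the unrelated third sequence, which matches the behaviour
       described for identical sequences at the start of this section.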

OPTIONS CONTROLLING EFFECTIVE SEQUENCE NUMBER
       After relative weights are determined, they are normalized to sum to  a
       total  effective	 sequence  number,  eff_nseq.	This number may be the
       actual number of sequences in the alignment, but it  is	almost	always
       smaller	than  that.   The  default  entropy  weighting method (--eent)
       reduces the effective sequence number to reduce the information content
       (relative entropy, or average expected score on true homologs) per con‐
       sensus position. The target relative entropy is controlled  by  a  two-
       parameter  function,  where  the two parameters are settable with --ere
       and --esigma.

       --eent Adjust effective sequence number to achieve a specific  relative
	      entropy per position (see --ere).	 This is the default.

       --eclust
	      Set  effective  sequence	number to the number of single-linkage
	      clusters at a specific identity  threshold  (see	--eid).	  This
	      option  is  not recommended; it's for experiments evaluating how
	      much better --eent is.

       --enone
	      Turn off effective sequence number determination	and  just  use
	      the  actual number of sequences. One reason you might want to do
	      this is to try to maximize the relative entropy/position of your
	      model, which may be useful for short models.

       --eset <x>
	      Explicitly  set  the effective sequence number for all models to
	      <x>.

       --ere <x>
	      Set  the	minimum	 relative  entropy/position  target  to	  <x>.
	      Requires	--eent.	 Default depends on the sequence alphabet; for
	      protein sequences, it is 0.59 bits/position.

       --esigma <x>
	      Sets the minimum relative entropy contributed by an entire model
	      alignment,  over its whole length. This has the effect of making
	      short models have higher	relative  entropy  per	position  than
	      --ere alone would give. The default is 45.0 bits.

       --eid <x>
	      Sets  the	 fractional  pairwise  identity	 cutoff used by single
	      linkage clustering with the  --eclust  option.  The  default  is
	      0.62.
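
       The interplay between --ere and --esigma can be illustrated with a
       deliberately simplified model. The Python sketch below is not
       HMMER's actual two-parameter function, which is not given on this
       page; it only captures the stated effect that a minimum total
       relative entropy forces short models to a higher per-position
       target. The function name and the formula max(ere, esigma / M) are
       assumptions.

```python
def target_relent_per_position(model_length, ere=0.59, esigma=45.0):
    """Simplified per-position relative entropy target, in bits.

    Assumption: the whole-model minimum esigma is spread evenly over the
    model length, and the larger of that and ere is used.
    """
    if model_length <= 0:
        raise ValueError("model length must be positive")
    return max(ere, esigma / model_length)


for m in (30, 76, 200):
    print(m, round(target_relent_per_position(m), 3))
```

       Under this simplification, models shorter than roughly 76 positions
       would be governed by --esigma and longer models by --ere.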

OPTIONS CONTROLLING E-VALUE CALIBRATION
       Estimating the location parameters for the expected score distributions
       for MSV filter  scores,	Viterbi	 filter	 scores,  and  Forward	scores
       requires three short random sequence simulations.

       --EmL <n>
	      Sets  the sequence length in simulation that estimates the loca‐
	      tion parameter mu for MSV filter E-values. Default is 200.

       --EmN <n>
	      Sets the number of sequences in simulation  that	estimates  the
	      location parameter mu for MSV filter E-values. Default is 200.

       --EvL <n>
	      Sets  the sequence length in simulation that estimates the loca‐
	      tion parameter mu for Viterbi filter E-values. Default is 200.

       --EvN <n>
	      Sets the number of sequences in simulation  that	estimates  the
	      location	parameter  mu  for Viterbi filter E-values. Default is
	      200.

       --EfL <n>
	      Sets the sequence length in simulation that estimates the	 loca‐
	      tion parameter tau for Forward E-values. Default is 100.

       --EfN <n>
	      Sets  the	 number	 of sequences in simulation that estimates the
	      location parameter tau for Forward E-values. Default is 200.

       --Eft <x>
	      Sets the tail mass fraction to fit in the simulation that
	      estimates the location parameter tau for Forward E-values.
	      Default is 0.04.
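
       The idea of estimating a location parameter from a short simulation
       can be sketched as follows. The Python below assumes that filter
       scores follow a Gumbel distribution with a known slope lambda =
       log 2, draws scores directly from that distribution instead of
       scoring random sequences, and recovers mu by maximum likelihood. The
       distributional assumption, the value of mu, and all names are
       illustrative and are not taken from this page.

```python
import math
import random

LAMBDA = math.log(2.0)  # assumed fixed slope for scores measured in bits


def sample_gumbel(mu, lam, rng):
    """Draw one score by inverting the Gumbel CDF exp(-exp(-lam*(x-mu)))."""
    u = max(rng.random(), 1e-300)  # guard against log(0)
    return mu - math.log(-math.log(u)) / lam


def fit_mu(scores, lam):
    """Maximum likelihood estimate of mu when lambda is known."""
    mean_exp = sum(math.exp(-lam * x) for x in scores) / len(scores)
    return -math.log(mean_exp) / lam


rng = random.Random(42)   # fixed seed so the run is reproducible
true_mu = -8.0            # hypothetical value, for illustration only
scores = [sample_gumbel(true_mu, LAMBDA, rng) for _ in range(200)]
mu_hat = fit_mu(scores, LAMBDA)
print(f"true mu = {true_mu}, estimated mu = {mu_hat:.2f}")
```

       The sample size of 200 mirrors the default of --EmN and --EvN.
       Larger simulations would give a tighter estimate at a higher cost.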

OTHER OPTIONS
       --nonull2
	      Turn off the null2 score corrections for biased composition.

       -Z <x> Assert that the total number of targets in your searches is <x>,
	      for  the	purposes  of per-sequence E-value calculations, rather
	      than the actual number of targets seen.

       --domZ <x>
	      Assert that the total number of targets in your searches is <x>,
	      for the purposes of per-domain conditional E-value calculations,
	      rather than the number of	 targets  that	passed	the  reporting
	      thresholds.

       --seed <n>
	      Seed  the random number generator with <n>, an integer >= 0.  If
	      <n> is >0, any stochastic simulations will be reproducible;  the
	      same  command will give the same results.	 If <n> is 0, the ran‐
	      dom number generator is seeded arbitrarily, and stochastic simu‐
	      lations  will  vary  from	 run  to run of the same command.  The
	      default seed is 42.

       --qformat <s>
	      Declare that the input <seqfile> is in format <s>.  Accepted
	      sequence file formats include FASTA, EMBL, GenBank, DDBJ,
	      UniProt, Stockholm, and SELEX. Default is to autodetect the
	      format of the file.

       --tformat <s>
	      Declare that the input <seqdb> is in format <s>.  Accepted
	      sequence file formats include FASTA, EMBL, GenBank, DDBJ,
	      UniProt, Stockholm, and SELEX. Default is to autodetect the
	      format of the file.

       --cpu <n>
	      Set the number of parallel worker threads to <n>.	  By  default,
	      HMMER  sets  this	 to the number of CPU cores it detects in your
	      machine - that is, it tries to maximize the use of  your	avail‐
	      able  processor  cores.  Setting	<n>  higher than the number of
	      available cores is of little if any value, but you may  want  to
	      set  it  to  something less. You can also control this number by
	      setting an environment variable, HMMER_NCPU.

	      This option is only available if HMMER was compiled  with	 POSIX
	      threads  support.	 This  is  the	default,  but it may have been
	      turned off at compile-time for your site	or  machine  for  some
	      reason.

       --stall
	      For debugging the MPI master/worker version: pause after start,
	      to enable the developer to attach debuggers to the running
	      master and worker process(es). Send a SIGCONT signal to release
	      the pause. (Under gdb: (gdb) signal SIGCONT) (Only available if
	      optional MPI support was enabled at compile-time.)

       --mpi  Run in MPI master/worker mode, using mpirun.  (Only available if
	      optional MPI support was enabled at compile-time.)

SEE ALSO
       See hmmer(1) for a master man page with a list of  all  the  individual
       man pages for programs in the HMMER package.

       For  complete  documentation,  see  the	user guide that came with your
       HMMER  distribution  (Userguide.pdf);  or  see  the  HMMER   web	  page
       (@HMMER_URL@).

COPYRIGHT
       @HMMER_COPYRIGHT@
       @HMMER_LICENSE@

       For  additional	information  on	 copyright and licensing, see the file
       called COPYRIGHT in your HMMER source distribution, or  see  the	 HMMER
       web page (@HMMER_URL@).

AUTHOR
       Eddy/Rivas Laboratory
       Janelia Farm Research Campus
       19700 Helix Drive
       Ashburn VA 20147 USA
       http://eddylab.org

HMMER @HMMER_VERSION@		 @HMMER_DATE@			  jackhmmer(1)