hmmbuild man page on DragonFly

hmmbuild(1)			 HMMER Manual			   hmmbuild(1)

NAME
       hmmbuild - construct profile HMM(s) from multiple sequence alignment(s)

SYNOPSIS
       hmmbuild [options] hmmfile msafile

DESCRIPTION
       Build  a	 profile  HMM for each multiple sequence alignment in msafile,
       and save it to a new file hmmfile.

OPTIONS
       -h     Help; print a brief reminder  of	command	 line  usage  and  all
	      available options.

       -n <s> Name the new profile <s>. The default is to use the name of the
	      alignment (if one is present in the msafile) or, failing that,
	      the name of the hmmfile. If msafile contains more than one
	      alignment, -n cannot be used, and every alignment must have a
	      name annotated in the msafile (as in Stockholm #=GF ID
	      annotation).

       -o <f> Direct the summary output to file <f>, rather than to stdout.

       -O <f> After each model is constructed, resave annotated, possibly mod‐
	      ified source alignments to a file <f> in Stockholm format.   The
	      alignments  are annotated with a reference annotation line indi‐
	      cating which columns were assigned as consensus,	and  sequences
	      are annotated with what relative sequence weights were assigned.
	      Some residues of the alignment may have been shifted to accommo‐
	      date  restrictions of the Plan7 profile architecture, which dis‐
	      allows transitions between insert and delete states.

OPTIONS FOR SPECIFYING THE ALPHABET
       The alphabet type (amino, DNA, or RNA) is autodetected by  default,  by
       looking	at  the composition of the msafile.  Autodetection is normally
       quite reliable, but occasionally alphabet type  may  be	ambiguous  and
       autodetection  can fail (for instance, on tiny toy alignments of just a
       few residues). To avoid this, or to increase  robustness	 in  automated
       analysis	 pipelines,  you may specify the alphabet type of msafile with
       these options.

       --amino
	      Specify that all sequences in msafile are proteins.

       --dna  Specify that all sequences in msafile are DNAs.

       --rna  Specify that all sequences in msafile are RNAs.
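
       In an automated pipeline, the alphabet flag can be fixed up front
       instead of relying on autodetection. The following is a minimal
       Python sketch that only assembles an argument list; the helper name
       hmmbuild_argv and the file names are hypothetical, and only flags
       documented on this page are used.

```python
# Sketch: assemble an hmmbuild command line with an explicit alphabet flag.
# hmmbuild_argv and the file names below are hypothetical illustrations.

ALPHABET_FLAGS = {"amino": "--amino", "dna": "--dna", "rna": "--rna"}


def hmmbuild_argv(hmmfile, msafile, alphabet=None):
    """Return an argv list; alphabet=None leaves autodetection on."""
    argv = ["hmmbuild"]
    if alphabet is not None:
        if alphabet not in ALPHABET_FLAGS:
            raise ValueError(f"unknown alphabet: {alphabet!r}")
        argv.append(ALPHABET_FLAGS[alphabet])
    # Positional arguments follow the SYNOPSIS: hmmfile first, then msafile.
    argv += [hmmfile, msafile]
    return argv


if __name__ == "__main__":
    print(" ".join(hmmbuild_argv("globins.hmm", "globins.sto", alphabet="amino")))
```

       On a system with HMMER installed, the resulting list could be handed
       to subprocess.run(argv, check=True).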

OPTIONS CONTROLLING PROFILE CONSTRUCTION
       These options control how consensus columns are defined	in  an	align‐
       ment.

       --fast Define  consensus	 columns as those that have a fraction >= sym‐
	      frac of residues as opposed to gaps. (See below for  the	--sym‐
	      frac option.) This is the default.

       --hand Define consensus columns for the next profile using the
	      reference annotation of the multiple alignment. This lets you
	      mark any columns you like as consensus.

       --symfrac <x>
	      Define the residue fraction threshold necessary to define a con‐
	      sensus column when using the default --fast construction option.
	      The  default  for	 --symfrac is 0.5. The symbol fraction in each
	      column is calculated after taking	 relative  sequence  weighting
	      into  account, and ignoring gap characters corresponding to ends
	      of sequence fragments (as opposed to  internal  insertions/dele‐
	      tions).	Setting	 this to 0.0 means that every alignment column
	      will be assigned as consensus,  which  may  be  useful  in  some
	      cases.  Setting  it  to 1.0 means that only columns that have no
	      gap characters at all will be assigned as consensus.

       --fragthresh <x>
	      We only want to count terminal gaps as deletions if the  aligned
	      sequence	is  known  to  be full-length, not if it is a fragment
	      (for instance, because only part of  it  was  sequenced).	 HMMER
	      uses  a simple rule to infer fragments: if the sequence length L
	      is less than a fraction <x> times the mean  sequence  length  of
	      all the sequences in the alignment, then the sequence is handled
	      as a fragment. The default is 0.5.
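
       The --symfrac and --fragthresh rules above can be sketched together
       in Python. This is an illustration under stated assumptions, not
       HMMER's implementation: every sequence gets equal weight (hmmbuild
       applies relative sequence weights first), "-" and "." are taken to
       be the gap characters, and the function names are hypothetical.

```python
# Sketch of fragment inference (--fragthresh) and consensus column
# selection (--symfrac). Simplification: all sequences are weighted equally.

GAP_CHARS = "-."


def infer_fragments(aligned, fragthresh=0.5):
    """Flag sequences whose residue count is < fragthresh * mean length."""
    lengths = [sum(ch not in GAP_CHARS for ch in seq) for seq in aligned]
    mean_len = sum(lengths) / len(lengths)
    return [length < fragthresh * mean_len for length in lengths]


def residue_span(seq):
    """Return (first, last) column holding a residue, or (0, -1) if none."""
    idx = [i for i, ch in enumerate(seq) if ch not in GAP_CHARS]
    return (idx[0], idx[-1]) if idx else (0, -1)


def consensus_columns(aligned, symfrac=0.5, fragthresh=0.5):
    """Return 0-based indices of columns with residue fraction >= symfrac."""
    is_frag = infer_fragments(aligned, fragthresh)
    spans = [residue_span(seq) for seq in aligned]
    chosen = []
    for col in range(len(aligned[0])):
        residues = 0
        counted = 0
        for seq, frag, (first, last) in zip(aligned, is_frag, spans):
            # Gaps beyond the ends of a fragment are not counted at all.
            if frag and not (first <= col <= last):
                continue
            counted += 1
            if seq[col] not in GAP_CHARS:
                residues += 1
        frac = residues / counted if counted else 0.0
        if frac >= symfrac:
            chosen.append(col)
    return chosen


if __name__ == "__main__":
    toy = [
        "ACDEFGHIK",
        "ACDE-GHIK",
        "AC-E-GHIK",
        "---E-G---",  # 2 residues against a mean of 6.5: treated as a fragment
    ]
    print(infer_fragments(toy))
    print(consensus_columns(toy))
```

       In the toy alignment, the terminal gaps of the last sequence do not
       lower the residue fraction of the flanking columns, because that
       sequence is inferred to be a fragment.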

OPTIONS CONTROLLING RELATIVE WEIGHTS
       HMMER uses an ad hoc sequence weighting algorithm to downweight closely
       related	sequences  and	upweight  distantly related ones. This has the
       effect of making models less biased by uneven phylogenetic  representa‐
       tion. For example, two identical sequences would typically each receive
       half the weight that one sequence would.	 These options	control	 which
       algorithm gets used.

       --wpb  Use   the	 Henikoff  position-based  sequence  weighting	scheme
	      [Henikoff and Henikoff, J. Mol. Biol. 243:574, 1994].   This  is
	      the default.

       --wgsc Use the Gerstein/Sonnhammer/Chothia weighting algorithm
	      [Gerstein et al., J. Mol. Biol. 235:1067, 1994].

       --wblosum
	      Use the same clustering scheme that was used to weight data in
	      calculating BLOSUM substitution matrices [Henikoff and Henikoff,
	      Proc. Natl. Acad. Sci. 89:10915, 1992]. Sequences are
	      single-linkage clustered at an identity threshold (default 0.62;
	      see --wid), and within each cluster of c sequences, each
	      sequence gets relative weight 1/c.

       --wnone
	      No relative weights. All sequences are assigned uniform weight.

       --wid <x>
	      Sets  the	 identity  threshold used by single-linkage clustering
	      when using --wblosum.  Invalid with any other weighting  scheme.
	      Default is 0.62.
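
       The --wblosum scheme (single-linkage clustering, then weight 1/c
       within a cluster of c sequences) can be sketched in Python as
       follows. The pairwise identity definition used here (identical
       residue pairs divided by the columns where both sequences have a
       residue) and the use of >= at the threshold are assumptions made
       for illustration; HMMER's own definitions may differ. The function
       names are hypothetical.

```python
# Sketch of BLOSUM-style relative weights (--wblosum with --wid).
# Assumed identity measure: matches / columns where both have a residue.


def pairwise_identity(a, b, gaps="-."):
    pairs = [(x, y) for x, y in zip(a, b) if x not in gaps and y not in gaps]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def blosum_weights(aligned, wid=0.62):
    """Return one relative weight per sequence: 1/c for a cluster of size c."""
    n = len(aligned)
    parent = list(range(n))  # union-find forest, one tree per cluster

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Single linkage: one pair at or above the threshold merges two clusters.
    for i in range(n):
        for j in range(i + 1, n):
            if pairwise_identity(aligned[i], aligned[j]) >= wid:
                parent[find(i)] = find(j)

    roots = [find(i) for i in range(n)]
    sizes = {r: roots.count(r) for r in set(roots)}
    return [1.0 / sizes[r] for r in roots]


if __name__ == "__main__":
    toy = ["ACDEFGHIKL", "ACDEFGHIKL", "ACDEFGHIKV", "MNPQRSTVWY"]
    print(blosum_weights(toy))
```

       The three near-identical sequences share one cluster and split its
       weight, while the unrelated fourth sequence keeps a full weight.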

OPTIONS CONTROLLING EFFECTIVE SEQUENCE NUMBER
       After  relative weights are determined, they are normalized to sum to a
       total effective sequence number, eff_nseq.   This  number  may  be  the
       actual  number  of  sequences in the alignment, but it is almost always
       smaller than that.   The	 default  entropy  weighting  method  (--eent)
       reduces the effective sequence number to reduce the information content
       (relative entropy, or average expected score on true homologs) per con‐
       sensus  position.  The  target relative entropy is controlled by a two-
       parameter function, where the two parameters are	 settable  with	 --ere
       and --esigma.
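
       The normalization step described above (relative weights rescaled
       to sum to eff_nseq) amounts to a single scaling. A minimal Python
       sketch, with a hypothetical function name:

```python
# Sketch: rescale relative weights so that they sum to eff_nseq.


def normalize_weights(rel_weights, eff_nseq):
    total = sum(rel_weights)
    if total <= 0:
        raise ValueError("relative weights must have a positive sum")
    # Ratios between sequences are preserved; only the total changes.
    return [w * eff_nseq / total for w in rel_weights]


if __name__ == "__main__":
    print(normalize_weights([1 / 3, 1 / 3, 1 / 3, 1.0], eff_nseq=1.5))
```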

       --eent Adjust  effective sequence number to achieve a specific relative
	      entropy per position (see --ere).	 This is the default.

       --eclust
	      Set effective sequence number to the  number  of	single-linkage
	      clusters	at  a  specific	 identity threshold (see --eid).  This
	      option is not recommended; it's for experiments  evaluating  how
	      much better --eent is.

       --enone
	      Turn  off	 effective  sequence number determination and just use
	      the actual number of sequences. One reason you might want to  do
	      this is to try to maximize the relative entropy/position of your
	      model, which may be useful for short models.

       --eset <x>
	      Explicitly set the effective sequence number for all  models  to
	      <x>.

       --ere <x>
	      Set   the	 minimum  relative  entropy/position  target  to  <x>.
	      Requires --eent.	Default depends on the sequence alphabet;  for
	      protein sequences, it is 0.59 bits/position.

       --esigma <x>
	      Sets the minimum relative entropy contributed by an entire model
	      alignment, over its whole length. This has the effect of	making
	      short  models  have  higher  relative  entropy per position than
	      --ere alone would give. The default is 45.0 bits.

       --eid <x>
	      Sets the fractional pairwise identity cutoff used by
	      single-linkage clustering with the --eclust option. The default
	      is 0.62.
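
       This page does not spell out the two-parameter function behind
       --ere and --esigma. One simple way to read their interaction, shown
       here purely as an assumed illustration and not as HMMER's actual
       formula, is a per-position floor of --ere combined with a
       whole-model floor of --esigma spread over the model length M. The
       function name and the exact form are hypothetical.

```python
# Illustrative assumption only: target = max(ere, esigma / M).
# HMMER's real target function is not given on this page and may differ.


def target_relent_per_position(model_length, ere=0.59, esigma=45.0):
    if model_length <= 0:
        raise ValueError("model_length must be positive")
    # Short models are pushed above ere by the whole-model minimum.
    return max(ere, esigma / model_length)


if __name__ == "__main__":
    for m in (50, 200):
        print(m, target_relent_per_position(m))
```

       Under this reading, a short model is held to a higher per-position
       target than --ere alone would give, which matches the qualitative
       behavior described for --esigma.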

OPTIONS CONTROLLING E-VALUE CALIBRATION
       The location parameters for the expected score  distributions  for  MSV
       filter  scores, Viterbi filter scores, and Forward scores require three
       short random sequence simulations.

       --EmL <n>
	      Sets the sequence length in simulation that estimates the	 loca‐
	      tion parameter mu for MSV filter E-values. Default is 200.

       --EmN <n>
	      Sets  the	 number	 of sequences in simulation that estimates the
	      location parameter mu for MSV filter E-values. Default is 200.

       --EvL <n>
	      Sets the sequence length in simulation that estimates the	 loca‐
	      tion parameter mu for Viterbi filter E-values. Default is 200.

       --EvN <n>
	      Sets  the	 number	 of sequences in simulation that estimates the
	      location parameter mu for Viterbi filter	E-values.  Default  is
	      200.

       --EfL <n>
	      Sets  the sequence length in simulation that estimates the loca‐
	      tion parameter tau for Forward E-values. Default is 100.

       --EfN <n>
	      Sets the number of sequences in simulation  that	estimates  the
	      location parameter tau for Forward E-values. Default is 200.

       --Eft <x>
	      Sets the tail mass fraction to fit in the simulation that
	      estimates the location parameter tau for Forward E-values.
	      Default is 0.04.

OTHER OPTIONS
       --mpi  Run as a parallel MPI program. Each alignment is assigned to an
	      MPI worker node for construction. (Therefore, the maximum
	      parallelization cannot exceed the number of alignments in the
	      input msafile.) This is useful when building large profile
	      libraries. This option is only available if optional MPI
	      capability was enabled at compile time.

       --informat <s>
	      Declare that the input msafile is in format <s>. Currently, the
	      only accepted multiple sequence alignment file formats are
	      Stockholm and SELEX. Default is to autodetect the format of the
	      file.

       --seed <n>
	      Seed  the random number generator with <n>, an integer >= 0.  If
	      <n> is nonzero, any stochastic simulations will be reproducible;
	      the  same	 command will give the same results.  If <n> is 0, the
	      random number generator is seeded	 arbitrarily,  and  stochastic
	      simulations  will vary from run to run of the same command.  The
	      default seed is 42.

       --laplace
	      Experimental only: use a Laplace +1 prior in place of the
	      default mixture Dirichlet prior.

       --stall
	      For  debugging  MPI  parallelization:  arrest  program execution
	      immediately after start, and wait for a debugger	to  attach  to
	      the running process and release the arrest.
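
       The --seed convention described above (a nonzero seed gives
       reproducible simulations, 0 gives arbitrary seeding) can be
       mimicked with Python's standard random module. This sketch only
       illustrates the convention; it does not reproduce HMMER's random
       number generator, and make_rng is a hypothetical name.

```python
# Sketch of the --seed convention using Python's own generator.
import random


def make_rng(seed=42):
    """Nonzero seed: reproducible stream. Zero: arbitrarily seeded stream."""
    if seed < 0:
        raise ValueError("seed must be an integer >= 0")
    if seed == 0:
        return random.Random()  # seeded from an OS entropy source or the clock
    return random.Random(seed)


if __name__ == "__main__":
    a = make_rng(42)
    b = make_rng(42)
    print(a.random() == b.random())
```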

SEE ALSO
       See  hmmer(1)  for  a master man page with a list of all the individual
       man pages for programs in the HMMER package.

       For complete documentation, see the user	 guide	that  came  with  your
       HMMER   distribution   (Userguide.pdf);	or  see	 the  HMMER  web  page
       (@HMMER_URL@).

COPYRIGHT
       @HMMER_COPYRIGHT@
       @HMMER_LICENSE@

       For additional information on copyright and  licensing,	see  the  file
       called  COPYRIGHT  in  your HMMER source distribution, or see the HMMER
       web page (@HMMER_URL@).

AUTHOR
       Eddy/Rivas Laboratory
       Janelia Farm Research Campus
       19700 Helix Drive
       Ashburn VA 20147 USA
       http://eddylab.org

HMMER @HMMER_VERSION@		 @HMMER_DATE@			   hmmbuild(1)