FASTA/TFASTA/FASTX/TFASTXv2.0u(1)	     FASTA/TFASTA/FASTX/TFASTXv2.0u(1)

NAME
       fasta - scan a protein or DNA sequence library for similar sequences

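       tfasta - compare a protein sequence to a DNA sequence library,
       translating the DNA sequence library `on-the-fly'.

       tfastx - compare a protein sequence to a DNA sequence library, using
       the forward and reverse three-frame translations of the library and
       allowing for frameshifts.

       fastx - compare a DNA sequence to a protein sequence library,
       translating the DNA sequence in three frames and allowing for
       frameshifts.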

       lfasta - compare two protein or DNA sequences for local similarity  and
       show the local sequence alignments

       plfasta - compare two sequences for local similarity and plot the local
       sequence alignments

SYNOPSIS
       fasta [-a -A -b # -c # -d #  -E # -f # -g # -k # -l file -L  FASTLIBS
       -r STATFILE -m # -o -O file -p # -Q -s SMATRIX -w # -x "# #" -y # -z -1
       ] query-sequence-file library-file [ ktup ]

       fasta [-QaAbcdEfgHiklmnoOprswxyz] query-file @library-name-file

       fasta [-QaAbcdEfgHiklmnoOprswxyz] query-file "%PRMVI"

       fasta [-aAbcdEgHlmnoOprswyx] - interactive mode

       fastx [-aAbcdEfghHlmnoOprswyx] DNA-query-file protein-library [ ktup ]

       tfasta [-aAbcdEfgkmoOprswy3] protein-query-file DNA-library [ ktup ]

       tfastx [-abcdEfghHikmoOprswy3] protein-query-file DNA-library [ ktup ]

       lfasta [-afgmnpswx] sequence-file-1 sequence-file-2 [ ktup ]

       plfasta [-afgkmnpsxv] sequence-file-1 sequence-file-2 [ ktup ]

DESCRIPTION
       fasta is used to compare a protein or DNA sequence to all of the
       entries in a sequence library. For example, fasta can compare a
       protein sequence to all of the sequences in the NBRF PIR protein
       sequence database. fasta automatically decides whether the query
       sequence is DNA or protein by reading the query sequence as protein
       and determining whether the `amino-acid composition' is more than 85%
       A+C+G+T. fasta uses an improved version of the rapid sequence
       comparison algorithm described by Lipman and Pearson (Science (1985)
       227:1435); the improved version is described in Pearson and Lipman,
       Proc. Natl. Acad. Sci. USA (1988) 85:2444. The program can be invoked
       either with command line arguments or in interactive mode. The
       optional third argument, ktup, sets the sensitivity and speed of the
       search (a smaller ktup gives a more sensitive but slower search). If
       ktup=2, similar regions in the two sequences being compared are found
       by looking at pairs of aligned residues; if ktup=1, single aligned
       amino acids are examined. ktup can be set to 2 or 1 for protein
       sequences, or from 1 to 6 for DNA sequences. If ktup is not specified,
       the default is 2 for proteins and 6 for DNA.
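
       The role of ktup can be illustrated with a simplified Python sketch of
       the word-lookup step: every ktup-length word of the query is indexed,
       and each matching word in a library sequence adds a hit to the
       diagonal (library position minus query position) on which it lies.
       Diagonals with many hits mark candidate similar regions. This is an
       illustration only, not the program's own code: the function name
       diagonal_hits is hypothetical, and the later steps (scoring regions,
       joining them into initn, optimization) are omitted.

```python
from collections import Counter, defaultdict


def diagonal_hits(query, library, ktup=2):
    """Count shared ktup-length words on each diagonal.

    The diagonal of a hit is (library position - query position), so an
    ungapped similar region shows up as many hits on one diagonal.
    """
    # Index every ktup-length word of the query by its start positions.
    index = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        index[query[i:i + ktup]].append(i)

    # Scan the library sequence and record the diagonal of each word match.
    hits = Counter()
    for j in range(len(library) - ktup + 1):
        for i in index.get(library[j:j + ktup], ()):
            hits[j - i] += 1
    return hits


if __name__ == "__main__":
    hits = diagonal_hits("ACGTACGT", "TTTTACGTACGT", ktup=2)
    print(hits.most_common(3))  # [(4, 7), (0, 4), (8, 3)]
```

       The exact match of the query sits on diagonal 4 and collects the most
       hits. Lowering ktup produces more, shorter word hits, which is why a
       smaller ktup is more sensitive but slower.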

       fasta compares a query sequence to a sequence library that consists of
       sequence data interspersed with comments (see below). Normally fasta,
       fastx, tfasta, and tfastx search the libraries listed in the file
       pointed to by the environment variable FASTLIBS. The format of this
       file is described in the file FASTA.DOC.

       tfasta compares a protein sequence to a DNA sequence database,
       translating the DNA sequence library in 6 frames `on-the-fly' (3
       frames with the -3 option). The search uses the standard BLOSUM50
       scoring matrix and uses ktup=2 by default. tfasta searches a DNA
       sequence database in the standard text format described below. tfastx,
       like tfasta, compares a protein sequence to a DNA sequence library.
       However, tfastx compares the protein sequence to the forward and
       reverse three-frame translations of the DNA library sequence, allowing
       for frameshifts. fastx compares a DNA sequence to a protein sequence
       database, translating the DNA sequence in three frames and allowing
       frameshifts in the alignment.

       lfasta and plfasta compare two sequences looking for local sequence
       similarities. While fasta, fastx, and tfasta report only the best
       alignment between the query sequence and each library sequence, lfasta
       and plfasta report all of the alignments between the two sequences
       with scores greater than a cut-off value. lfasta shows the actual
       local alignments between the two sequences and their scores, while
       plfasta produces a plot of the alignments that looks similar to a
       `dot-matrix' homology plot. On Unix™ systems, plfasta generates
       PostScript output.

       The fasta programs use a standard text format sequence file. Lines
       beginning with '>' or ';' are considered comments and ignored;
       sequences can be upper or lower case, and blanks, tabs, and
       unrecognizable characters are ignored. fasta expects sequences to use
       the single-letter amino acid codes; see protcodes(5). Library files
       for fasta should have the form shown under EXAMPLES below.
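
       The file format rules above can be sketched as a small Python reader.
       It assumes, following the library example under EXAMPLES, that each
       '>' line begins a new library entry and supplies its description;
       ';' lines are skipped, and only letters are kept from sequence lines
       (treating every non-letter as "unrecognizable" is an assumption). The
       name read_library is hypothetical.

```python
from string import ascii_letters


def read_library(lines):
    """Yield (description, sequence) pairs from FASTA-format text lines."""
    description, residues = None, []
    for line in lines:
        if line.startswith(">"):
            # A '>' comment line starts a new library entry.
            if description is not None or residues:
                yield description or "", "".join(residues)
            description, residues = line[1:].strip(), []
        elif line.startswith(";"):
            continue  # ';' lines are comments and are ignored
        else:
            # Case-insensitive; blanks, tabs, digits, and other non-letters
            # are dropped (assumed definition of "unrecognizable").
            residues.extend(c.upper() for c in line if c in ascii_letters)
    if description is not None or residues:
        yield description or "", "".join(residues)


if __name__ == "__main__":
    text = ">LCBO bovine preprolactin\nwil lsq\n; a comment\n>LCHU human\nMNI\tKG\n"
    for desc, seq in read_library(text.splitlines()):
        print(desc, seq)
```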

OPTIONS
       fasta and the other programs can be directed to change the scoring
       matrix, search parameters, output format, and default search
       directories by entering options on the command line (preceded by a
       `-', or by a `/' under MS-DOS). All of the options should precede the
       file name and ktup arguments. Alternatively, several of these options
       can be set with the environment variables shown in parentheses. The
       options and environment variables are:

       -1     Normally, the top scoring sequences are ranked by the z-score
              based on the opt score. To rank sequences by raw scores, use
              the -z option. With the -1 option, sequences are ranked by the
              z-score based on the init1 score.

       -a     (SHOWALL) Modifies the display of the two sequences in
              alignments. Normally, both sequences are shown only where they
              overlap (SHOWALL=0). If -a is given or the environment variable
              SHOWALL is set to 1, both sequences are shown in their entirety.

       -A     Force  use  of  unlimited Smith-Waterman alignment for DNA FASTA
	      and TFASTA.  By default, the program uses the older (and faster)
	      band-limited  Smith-Waterman  alignment for DNA FASTA and TFASTA
	      alignments.

       -b #   The number of similarity scores to be shown when the -Q option
              is used. This value is usually calculated based on the actual
              scores.

       -c #   (OPTCUT) The initn score threshold above which library sequences
              are optimized (see -o). The OPTCUT value is normally calculated
              based on sequence length.

       -d #   The number of alignments to be shown. Normally, fasta shows the
              same number of alignments as similarity scores. By using fasta
              -Q -b 200 -d 50, one would see the 200 top scoring sequences and
              alignments for the 50 best scores.

       -E #   The expectation value threshold for displaying similarity scores
              and sequence alignments. fasta -Q -E 2.0 would show all library
              sequences with scores expected to occur no more than 2 times by
              chance in a search of the library.

       -f #   Penalty for the first residue in a gap (-12 by default for fasta
	      with proteins, -16 for DNA).

       -g #   Penalty for each additional residue in a gap (-2 by default for
              fasta with proteins, -4 for DNA).
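
              Taken together, -f and -g give a gap of k residues the score
              f + (k - 1) * g. A minimal Python sketch using the protein
              defaults (the function name gap_score is hypothetical):

```python
def gap_score(length, first=-12, additional=-2):
    """Score of a gap `length` residues long: first + (length - 1) * additional.

    Defaults are the protein values of -f and -g; pass first=-16,
    additional=-4 for the DNA defaults.
    """
    if length < 1:
        raise ValueError("a gap must contain at least one residue")
    return first + (length - 1) * additional


if __name__ == "__main__":
    for k in (1, 2, 5):
        print(k, gap_score(k))  # 1 -12, 2 -14, 5 -20
```

              With the DNA defaults, a 3-residue gap scores
              -16 + 2 * (-4) = -24.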

       -h #   (fastx, tfastx only) penalty for a +1 or -1 frameshift.

       -H     Do not display histogram of similarity scores.

       -i     (fasta, fastx) Search with the reverse complement of the query
              DNA sequence. (tfastx) Search only the reverse complement of the
              DNA library sequence.

       -k #   (GAPCUT) Sets the threshold for joining the initial regions when
              calculating the initn score.

       -l file
              (FASTLIBS) The name of the library menu file. Normally this is
              determined by the environment variable FASTLIBS; however, a
              library menu file can also be specified with -l.

       -L     Display more information about the library sequence in the
              alignment.

       -m #   (MARKX) MARKX = 0, 1, 2, 3, 4, or 10. Selects an alternate
              display of matches and mismatches in alignments. MARKX=0 uses
              ":", ".", and " " for identities, conservative replacements,
              and non-conservative replacements, respectively. MARKX=1 uses
              " ", "x", and "X". MARKX=2 does not show the second sequence,
              but uses the second alignment line to display matches with a
              "." for identity, or with the mismatched residue for
              mismatches; MARKX=2 is useful for aligning large numbers of
              similar sequences. MARKX=3 writes out a file of library
              sequences in FASTA format. MARKX=3 should always be used with
              the SHOWALL (-a) option, but this does not completely ensure
              that all of the sequences output will be aligned. MARKX=4
              displays a graph of the alignment of the library sequence with
              respect to the query sequence, so that one can identify the
              regions of the query sequence that are conserved. MARKX=10
              produces a parseable output format.
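
              The MARKX=0, 1, and 2 annotation lines can be sketched in
              Python for two aligned, ungapped segments. This page does not
              define which replacements count as conservative, so the residue
              groups below are a hypothetical stand-in; markx_line and the
              other names are also hypothetical.

```python
# Hypothetical residue groups standing in for "conservative replacement";
# this page does not say how fasta decides this.
CONSERVATIVE_GROUPS = ("ILVM", "FYW", "KRH", "DE", "ST", "NQ")

# (identity, conservative, non-conservative) symbols for MARKX=0 and MARKX=1.
MARKX_SYMBOLS = {0: (":", ".", " "), 1: (" ", "x", "X")}


def is_conservative(a, b):
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def markx_line(top, bottom, markx=0):
    """Return the annotation line for two aligned, ungapped segments.

    For MARKX=0 and 1 this is the line printed between the sequences; for
    MARKX=2 it replaces the second sequence ('.' for identities, otherwise
    the second sequence's residue).
    """
    if len(top) != len(bottom):
        raise ValueError("aligned segments must have the same length")
    if markx == 2:
        return "".join("." if a == b else b for a, b in zip(top, bottom))
    same, similar, different = MARKX_SYMBOLS[markx]
    return "".join(
        same if a == b else similar if is_conservative(a, b) else different
        for a, b in zip(top, bottom)
    )


if __name__ == "__main__":
    for mode in (0, 1, 2):
        print(repr(markx_line("MKVLE", "MRIPD", mode)))
```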

       -n     Forces the query sequence to be treated as a DNA sequence.
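
              Without -n, the query type is guessed as described under
              DESCRIPTION: the query is read as protein and treated as DNA if
              more than 85% of its residues are A, C, G, or T. A minimal
              Python sketch of that rule; ignoring non-letter characters and
              the name looks_like_dna are assumptions.

```python
def looks_like_dna(sequence, threshold=0.85):
    """Guess whether a query is DNA: more than 85% of its letters are A/C/G/T."""
    # Read the sequence as protein: letters only, case-insensitive
    # (ignoring non-letters is an assumption of this sketch).
    residues = [c for c in sequence.upper() if "A" <= c <= "Z"]
    if not residues:
        return False
    acgt = sum(1 for c in residues if c in "ACGT")
    return acgt / len(residues) > threshold


if __name__ == "__main__":
    print(looks_like_dna("acgtacgtacgtacgtacgn"))  # True: 19 of 20 are ACGT
    print(looks_like_dna("MKTAYIAKQR"))            # False
```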

       -O filename
              Send a copy of the results to filename.

       -o     Turns off the default limited optimization that fasta performs
              on all of the sequences in the library with initn scores greater
              than OPTCUT. The sense of this option is the reverse of that in
              previous versions of fasta.

       -Q     Quiet option. This allows fasta and tfasta to search a database
              and report the results without asking any questions. The command
              fasta -Q file library > output can be put in the background or
              run at a later time with the Unix at command. The number of
              similarity scores and alignments displayed with the -Q option
              can be modified with the -b (scores) and -d (alignments) options.

       -r STATFILE
              Causes fasta to write out the sequence identifier, superfamily
              number (if available), and similarity scores to STATFILE for
              every sequence in the library. These results are not sorted.

       -s str (SMATRIX) The filename of an alternative scoring matrix file.
              For protein sequences, BLOSUM50 is used by default; PAM250 can
              be used with the command line option -s 250.

       -v str (LINEVAL) (plfasta only) plfasta and pclfasta can use up to 4
              different line styles to denote the scores of local alignments.
              The scores that correspond to these line styles can be specified
              with the environment variable LINEVAL, or with the -v option. In
              either case, a string with three numbers separated by spaces
              should be given. This string must be surrounded by double
              quotation marks. For example, LINEVAL="200 100 50" tells plfasta
              to use solid lines for local alignments with scores greater than
              200, long dashed lines for scores between 100 and 200, short
              dashed lines for scores between 50 and 100, and dotted lines for
              scores less than 50. The equivalent command line option is:
                   plfasta -v "200 100 50"
              Normally, the values are 200, 100, and 50 for protein sequence
              comparisons and 400, 200, and 100 for DNA sequence comparisons.
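
              The mapping from a LINEVAL string to line styles can be
              sketched in Python. The man page leaves scores that fall
              exactly on a threshold unspecified; this sketch assigns them to
              the lower style, which is an assumption. parse_lineval and
              line_style are hypothetical names.

```python
STYLES = ("solid", "long dashed", "short dashed", "dotted")


def parse_lineval(text):
    """Split a LINEVAL string such as "200 100 50" into three thresholds."""
    high, mid, low = (int(field) for field in text.split())
    if not high > mid > low:
        raise ValueError("LINEVAL thresholds should be in decreasing order")
    return high, mid, low


def line_style(score, lineval="200 100 50"):
    """Pick the plfasta line style for a local alignment score.

    Scores equal to a threshold get the lower style (assumed tie rule).
    """
    high, mid, low = parse_lineval(lineval)
    if score > high:
        return STYLES[0]
    if score > mid:
        return STYLES[1]
    if score > low:
        return STYLES[2]
    return STYLES[3]


if __name__ == "__main__":
    for s in (250, 150, 75, 10):
        print(s, line_style(s))
```

              For DNA comparisons, pass lineval="400 200 100" to use the DNA
              defaults.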

       -w #   (LINLEN) Output line length for sequence alignments (normally
              60; can be set up to 200).

       -x "offset1 offset2"
              Causes fasta/lfasta/plfasta to start numbering the aligned
              sequences with offset1 and offset2, rather than 1 and 1. This is
              particularly useful for showing alignments of promoter regions.

       -y #   Set the band width used for optimization. -y 16 is the default
              for protein when ktup=2 and for all DNA alignments; -y 32 is
              used for protein when ktup=1. For proteins, optimization slows
              comparison 2-fold and is highly recommended.
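
              The difference between band-limited (-y) and unlimited (-A)
              Smith-Waterman alignment can be sketched in Python as a local
              alignment that fills only matrix cells near one diagonal. This
              is not the program's own implementation. Assumptions: the band
              is centered on a caller-supplied diagonal (for example, one
              with many ktup hits) and extends width diagonals to each side;
              the +5/-4 match/mismatch scores are illustrative; gaps are
              scored as described for -f and -g. banded_sw is a hypothetical
              name.

```python
NEG_INF = float("-inf")


def banded_sw(a, b, score, diag=0, width=16, first=-16, additional=-4):
    """Best local alignment score of a and b inside a diagonal band.

    Only cells (i, j) with diag - width <= j - i <= diag + width are filled.
    A gap of k residues scores first + (k - 1) * additional (see -f, -g).
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]        # best score ending at (i, j)
    E = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ... ending with b[j-1] against a gap
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ... ending with a[i-1] against a gap
    best = 0
    for i in range(1, n + 1):
        # Restrict j to the band around the chosen diagonal.
        for j in range(max(1, i + diag - width), min(m, i + diag + width) + 1):
            E[i][j] = max(E[i][j - 1] + additional, H[i][j - 1] + first)
            F[i][j] = max(F[i - 1][j] + additional, H[i - 1][j] + first)
            H[i][j] = max(0,
                          H[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                          E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


def dna(x, y):
    # Illustrative +5/-4 match/mismatch scores (an assumption).
    return 5 if x == y else -4


if __name__ == "__main__":
    query, library = "ACGTACGT", "TTTTACGTACGT"
    print(banded_sw(query, library, dna, diag=0, width=1))  # 25: band misses the match
    print(banded_sw(query, library, dna, diag=4, width=1))  # 40: band centered on it
```

              A narrow band is cheaper to compute but can miss an alignment
              that lies off the chosen diagonal; a band wide enough to cover
              every diagonal gives the unlimited Smith-Waterman score.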

       -z     Do not do the statistical significance calculation. Results are
              ranked by the unnormalized opt, initn, or init1 score.

       -3     (tfasta, tfastx only) Normally tfasta and tfastx translate
              sequences in the DNA sequence library in all six frames. With
              the -3 option, only the three forward frames are searched.
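
              Six-frame translation (three forward frames with -3) can be
              sketched in Python using the standard genetic code. The
              program's own translation table and its handling of ambiguous
              bases are not described here; mapping incomplete codons or
              codons with other characters to 'X', and the function names,
              are assumptions.

```python
# Standard genetic code; codons enumerated with bases in TCAG order,
# first codon position varying slowest (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(dna):
    """Translate one reading frame; unknown codons become 'X', stops '*'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def reverse_complement(dna):
    return dna.upper().translate(COMPLEMENT)[::-1]


def frames(dna, forward_only=False):
    """Return the 3 forward translations, plus the 3 reverse ones unless forward_only."""
    strands = [dna.upper()]
    if not forward_only:
        strands.append(reverse_complement(dna))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]


if __name__ == "__main__":
    for frame in frames("ATGGCCTAA"):
        print(frame)
```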

EXAMPLES
       (1)    fasta musplfm.aa $AABANK

       Compare the amino acid sequence in the file musplfm.aa with the
       complete PIR protein sequence library using ktup = 2. Each "library"
       sequence (there need only be one) should start with a comment line
       that starts with a '>', e.g.

	    >LCBO bovine preprolactin
	    WILLLSQ ...
	    >LCHU human ...
	    ...

       (2)    fasta -a -w 80 musplfm.aa lcbo.aa 1

       Compare the amino acid sequence in the file musplfm.aa with the
       sequences in the file lcbo.aa using ktup = 1. Show both sequences in
       their entirety, with 80 residues on each output line.

       (3)    fasta

       Run the fasta program in interactive mode. The program prompts for the
       file name of the query sequence, lists alternative libraries to be
       searched (if FASTLIBS is set), and prompts for the ktup.

FILES
       This version of fasta prompts for the library file to be searched from
       a list of file names that are saved in the file pointed to by the
       environment variable FASTLIBS. If FASTLIBS = fastgb.list, then the
       file fastgb.list might have the entries:

	    NBRF Protein$0P/u/lib/aabank.lib 0
	    GB Primate$1P@/u/lib/gpri.nam
	    GB Rodent$1R@/u/lib/grod.nam
	    GB Mammal$1M@/u/lib/gmammal.nam

       Each line in this file has 4 fields: (1) the library name, separated
       from the remaining fields by a '$'; (2) a 0 or a 1, indicating a
       protein or DNA library, respectively; (3) a single letter that is used
       to choose the library; (4) the location of the library file itself.
       The library file name can be followed by an optional library format
       specifier. fasta recognizes the following library formats:
       0 - Pearson/FASTA; 1 - GenBank flat file; 2 - NBRF/PIR CODATA;
       3 - EMBL/SWISS-PROT; 4 - IntelliGenetics; 5 - NBRF/PIR VMS. Note that
       the fourth field can start with an '@' character, which indicates that
       the library file is an indirect library file containing a list of
       library files, one per line. An indirect library file might have the
       lines:
	    </usr/slib/genbank	(the directory for the library files)
	    gbpri.seq 1
	    gbrod.seq 1
	    gbmam.seq 1
	    ...
	    gbvrl.seq 1
	    ...
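
       Reading a FASTLIBS menu line and expanding an indirect library file
       can be sketched in Python as follows. Assumptions: the format
       specifier is separated from the file name by whitespace, the second
       field is exactly '0' or '1', and a '<' line in an indirect file sets
       the directory for the file names that follow it. LibraryEntry,
       parse_fastlibs_line, and parse_indirect are hypothetical names.

```python
import posixpath
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Library format codes listed above.
LIBRARY_FORMATS = {
    0: "Pearson/FASTA",
    1: "GenBank flat file",
    2: "NBRF/PIR CODATA",
    3: "EMBL/SWISS-PROT",
    4: "IntelliGenetics",
    5: "NBRF/PIR VMS",
}


@dataclass
class LibraryEntry:
    name: str           # field 1: menu name
    is_dna: bool        # field 2: 0 = protein, 1 = DNA
    letter: str         # field 3: menu selection letter
    path: str           # field 4: library (or indirect) file
    indirect: bool      # True if field 4 began with '@'
    fmt: Optional[int]  # optional format specifier after the file name


def parse_fastlibs_line(line: str) -> LibraryEntry:
    name, rest = line.rstrip("\n").split("$", 1)
    kind, letter, location = rest[0], rest[1], rest[2:]
    indirect = location.startswith("@")
    if indirect:
        location = location[1:]
    fields = location.split()
    fmt = int(fields[1]) if len(fields) > 1 else None
    return LibraryEntry(name, kind == "1", letter, fields[0], indirect, fmt)


def parse_indirect(lines: List[str]) -> List[Tuple[str, Optional[int]]]:
    """Expand an indirect library file into (path, format) pairs."""
    directory, files = "", []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("<"):
            directory = fields[0][1:]  # '<dir' sets the directory for later names
            continue
        fmt = int(fields[1]) if len(fields) > 1 else None
        files.append((posixpath.join(directory, fields[0]), fmt))
    return files


if __name__ == "__main__":
    print(parse_fastlibs_line("NBRF Protein$0P/u/lib/aabank.lib 0"))
    print(parse_indirect(["</usr/slib/genbank", "gbpri.seq 1", "gbrod.seq 1"]))
```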

       You can use your own sequence files for fasta; just be certain to put
       a '>' and comment as the first line before the sequence. Only one
       library file type, the standard NBRF library format, is supported by
       the VAX/VMS programs. lfasta and plfasta do not require the '>' and
       comment line; fasta does.

SEE ALSO
       rdf2(1), protcodes(5), dnacodes(5)

AUTHOR
       Bill Pearson
       wrp@virginia.EDU
