faidx(5)		    Bioinformatics formats		      faidx(5)

NAME
       faidx - an index enabling random access to FASTA files

SYNOPSIS
       file.fa.fai, file.fasta.fai

DESCRIPTION
       Using an fai index file in conjunction with a FASTA file containing
       reference sequences enables efficient access to arbitrary regions
       within those reference sequences.  The index file typically has the
       same filename as the corresponding FASTA file, with .fai appended.

       An fai index file is a text file consisting of  lines  each  with  five
       TAB-delimited columns:

       NAME        Name of this reference sequence
       LENGTH      Total length of this reference sequence, in bases
       OFFSET      Offset within the FASTA file of this sequence's first base
       LINEBASES   The number of bases on each line
       LINEWIDTH   The number of bytes in each line, including the newline
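
       As an illustration of the five-column layout, the following Python
       sketch parses the text of an fai index into records.  The names
       FaiRecord and parse_fai are hypothetical helpers introduced here; they
       are not part of samtools or htslib.

```python
from typing import NamedTuple


class FaiRecord(NamedTuple):
    """One line of an fai index (hypothetical helper type)."""
    name: str        # NAME: name of the reference sequence
    length: int      # LENGTH: total number of bases
    offset: int      # OFFSET: byte offset of the first base
    linebases: int   # LINEBASES: bases on each full sequence line
    linewidth: int   # LINEWIDTH: bytes on each full line, terminator included


def parse_fai(text: str) -> dict[str, FaiRecord]:
    """Parse the text of an fai index into a name -> record mapping."""
    records = {}
    for line in text.splitlines():
        if not line:
            continue  # tolerate blank lines
        name, length, offset, linebases, linewidth = line.split("\t")
        records[name] = FaiRecord(name, int(length), int(offset),
                                  int(linebases), int(linewidth))
    return records


index = parse_fai("one\t66\t5\t30\t31\ntwo\t28\t98\t14\t15\n")
print(index["two"])
```

       The sketch assumes exactly five TAB-delimited columns per line and
       raises ValueError otherwise.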

       The NAME and LENGTH columns contain the same data as would appear in
       the SN and LN fields of a SAM @SQ header for the same reference
       sequence.

       The OFFSET column contains the offset within the FASTA file, in bytes
       starting from zero, of the first base of this reference sequence, i.e.,
       of the character following the newline at the end of the ">" header
       line.  Typically the lines of an fai index file appear in the order in
       which the reference sequences appear in the FASTA file, so .fai files
       are typically sorted according to this column.

       The LINEBASES column contains the number of bases in each of the
       sequence lines that form the body of this reference sequence, apart
       from the final line which may be shorter.  The LINEWIDTH column
       contains the number of bytes in each of the sequence lines (except
       perhaps the final line), thus differing from LINEBASES in that it also
       counts the bytes forming the line terminator.
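
       Taken together, OFFSET, LINEBASES and LINEWIDTH allow the file
       position of any base to be computed arithmetically.  The Python sketch
       below shows one way to do so; the function byte_offset and the 0-based
       position convention are assumptions made for illustration, not an
       interface defined by samtools.

```python
def byte_offset(offset: int, linebases: int, linewidth: int, pos: int) -> int:
    """Return the file offset of the base at 0-based position pos."""
    # Each full line before the target contributes linewidth bytes
    # (bases plus line terminator); the remainder is bases on the target line.
    full_lines, within_line = divmod(pos, linebases)
    return offset + full_lines * linewidth + within_line


# A sequence with OFFSET=5, LINEBASES=30, LINEWIDTH=31 (LF line endings).
print(byte_offset(5, 30, 31, 0))    # first base
print(byte_offset(5, 30, 31, 30))   # first base of the second line
```

       The caller is assumed to check that pos is less than LENGTH; the
       sketch does not do so itself.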

   FASTA Files
       In order to be indexed with samtools faidx, a FASTA file must be a text
       file of the form

	      >name [description...]
	      ATGCATGCATGCATGCATGCATGCATGCAT
	      GCATGCATGCATGCATGCATGCATGCATGC
	      ATGCAT
	      >name [description...]
	      ATGCATGCATGCAT
	      GCATGCATGCATGC
	      [...]

       In particular, each reference sequence must be "well-formatted", i.e.,
       all of its sequence lines must be the same length, apart from the final
       sequence line which may be shorter.  (While this sequence line length
       must be the same within each sequence, it may vary between different
       reference sequences in the same FASTA file.)

       This also means that although the FASTA file may have Unix- or Windows-
       style or other line termination, the newline characters present must be
       consistent, at least within each reference sequence.
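
       One reading of these rules can be sketched in Python as a check on the
       sequence lines of a single reference sequence.  The helper
       is_well_formatted is hypothetical and is not how samtools reports
       errors; it assumes each line is passed as bytes with its line
       terminator still attached.

```python
def is_well_formatted(lines: list[bytes]) -> bool:
    """Check the sequence lines (terminators included) of one sequence."""

    def shape(line: bytes) -> tuple[int, bytes]:
        # Split a line into (number of bases, line terminator bytes).
        body = line.rstrip(b"\r\n")
        return len(body), line[len(body):]

    if len(lines) < 2:
        return True
    first = shape(lines[0])
    # Every line except the last must match the first in length and terminator.
    if any(shape(line) != first for line in lines[1:-1]):
        return False
    # The final line may be shorter, but not longer.
    return shape(lines[-1])[0] <= first[0]


print(is_well_formatted([b"ATGC\n", b"ATGC\n", b"AT\n"]))
print(is_well_formatted([b"ATGC\n", b"AT\n", b"ATGC\n"]))
```

       The first call passes because only the final line is short; the second
       fails because a short line appears in the middle of the sequence.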

       The samtools implementation uses the first word of the ">" header line
       text (i.e., up to the first whitespace character, having skipped any
       initial whitespace after the ">") as the NAME column.
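
       The naming rule can be approximated in Python as follows.  The function
       sequence_name is a hypothetical helper, and Python's notion of
       whitespace is assumed to match that of the samtools implementation for
       ordinary ASCII header lines.

```python
def sequence_name(header_line: str) -> str:
    """Return the first word after '>' of a FASTA header line."""
    if not header_line.startswith(">"):
        raise ValueError("not a FASTA header line")
    # split(None, 1) skips leading whitespace and stops at the next whitespace.
    words = header_line[1:].split(None, 1)
    return words[0] if words else ""


print(sequence_name(">two another chromosome"))
```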

EXAMPLE
       For example, given this FASTA file

	      >one
	      ATGCATGCATGCATGCATGCATGCATGCAT
	      GCATGCATGCATGCATGCATGCATGCATGC
	      ATGCAT
	      >two another chromosome
	      ATGCATGCATGCAT
	      GCATGCATGCATGC

       formatted with Unix-style (LF) line termination, the corresponding  fai
       index would be

	      one   66    5   30   31
	      two   28   98   14   15
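
       With the LF index above, a region can be read without scanning the
       file.  The Python sketch below uses an in-memory copy of the example
       FASTA file; the function fetch and its 0-based, end-exclusive
       coordinates are assumptions made for illustration and do not mirror
       the samtools command-line region syntax.

```python
import io

FASTA = (b">one\n"
         b"ATGCATGCATGCATGCATGCATGCATGCAT\n"
         b"GCATGCATGCATGCATGCATGCATGCATGC\n"
         b"ATGCAT\n"
         b">two another chromosome\n"
         b"ATGCATGCATGCAT\n"
         b"GCATGCATGCATGC\n")


def fetch(fasta, offset, linebases, linewidth, start, end):
    """Read bases [start, end) of one sequence from a seekable binary file."""
    first = offset + (start // linebases) * linewidth + start % linebases
    last = offset + (end // linebases) * linewidth + end % linebases
    fasta.seek(first)
    raw = fasta.read(last - first)
    # Drop any line terminators that fall inside the requested range.
    return raw.replace(b"\n", b"").replace(b"\r", b"")


# Index line for "two": LENGTH=28, OFFSET=98, LINEBASES=14, LINEWIDTH=15.
print(fetch(io.BytesIO(FASTA), 98, 14, 15, 12, 16))
```

       The requested range spans the end of the first sequence line of "two",
       so the result joins the last two bases of that line with the first two
       of the next.  The caller is assumed to keep end within LENGTH.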

       If the FASTA file were formatted with Windows-style (CR-LF) line
       termination, the fai index would be

	      one   66     6   30   32
	      two   28   103   14   16
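
       Both indexes above can be reproduced with a short Python sketch that
       walks the FASTA file line by line.  The function build_fai is a
       hypothetical illustration, not the samtools implementation: it assumes
       a well-formatted input and performs no validation.

```python
def build_fai(data: bytes) -> list[tuple]:
    """Build (NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH) rows from FASTA bytes."""
    rows = []
    current = None  # [name, length, offset, linebases, linewidth]
    pos = 0         # byte offset of the start of the current line
    for line in data.splitlines(keepends=True):
        body = line.rstrip(b"\r\n")
        if line.startswith(b">"):
            if current is not None:
                rows.append(tuple(current))
            words = body[1:].split(None, 1)
            name = words[0].decode("ascii") if words else ""
            # OFFSET is the byte just after the header's line terminator.
            current = [name, 0, pos + len(line), 0, 0]
        elif current is not None and body:
            if current[3] == 0:
                # The first sequence line fixes LINEBASES and LINEWIDTH.
                current[3], current[4] = len(body), len(line)
            current[1] += len(body)
        pos += len(line)
    if current is not None:
        rows.append(tuple(current))
    return rows


fasta_lf = (b">one\n"
            b"ATGCATGCATGCATGCATGCATGCATGCAT\n"
            b"GCATGCATGCATGCATGCATGCATGCATGC\n"
            b"ATGCAT\n"
            b">two another chromosome\n"
            b"ATGCATGCATGCAT\n"
            b"GCATGCATGCATGC\n")
fasta_crlf = fasta_lf.replace(b"\n", b"\r\n")

for row in build_fai(fasta_lf) + build_fai(fasta_crlf):
    print("\t".join(str(field) for field in row))
```

       The only difference between the two runs is the extra CR byte on every
       line, which shifts OFFSET and increases LINEWIDTH by one while leaving
       LENGTH and LINEBASES unchanged.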

SEE ALSO
       samtools(1)

       http://en.wikipedia.org/wiki/FASTA_format
	      Further description of the FASTA format

htslib				  August 2015			      faidx(5)