tabix(1)		     Bioinformatics tools		      tabix(1)

NAME
       bgzip - Block compression/decompression utility

       tabix - Generic indexer for TAB-delimited genome position files

SYNOPSIS
       bgzip [-cdhB] [-b virtualOffset] [-s size] [file]

       tabix  [-0lf]  [-p gff|bed|sam|vcf] [-s seqCol] [-b begCol] [-e endCol]
       [-S lineSkip] [-c metaChar] in.tab.bgz [region1 [region2 [...]]]

DESCRIPTION
       Tabix indexes a TAB-delimited genome position file in.tab.bgz and
       creates an index file (in.tab.bgz.tbi or in.tab.bgz.csi) when no region
       is given on the command line. The input data file must be sorted by
       position and compressed with bgzip, which has a gzip(1)-like interface.
       After indexing, tabix can quickly retrieve data lines overlapping
       regions specified in the format "chr:beginPos-endPos". Fast data
       retrieval also works over a network if a URI is given as the file name;
       in this case the index file is downloaded if it is not present locally.
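
       As an illustration of the region syntax, the following shell sketch
       splits a region string into its three parts. The variable names are
       hypothetical, and the comma stripping simply mirrors the
       comma-separated coordinates used in the EXAMPLE section; it is not
       tabix's own parser.

```shell
# Hypothetical helper variables: split "chr:beginPos-endPos" into its parts.
region='chr1:10,000,000-20,000,000'

chrom=${region%%:*}                               # text before the first ':'
range=$(printf '%s' "${region#*:}" | tr -d ',')   # drop thousands separators
begin=${range%%-*}                                # text before the '-'
end=${range#*-}                                   # text after the '-'

printf '%s\t%s\t%s\n' "$chrom" "$begin" "$end"
```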

INDEXING OPTIONS
       -0, --zero-based
		 Specify that the position in the data file is	0-based	 (e.g.
		 UCSC files) rather than 1-based.

       -b, --begin INT
		 Column of start chromosomal position. [4]

       -c, --comment CHAR
		 Skip lines starting with character CHAR. [#]

       -C, --csi
		 Produce a CSI format index instead of the classical tabix
		 (.tbi) index.

       -e, --end INT
		 Column of end chromosomal position. The end column can be the
		 same as the start column. [5]

       -f, --force
		 Overwrite the index file if it is present.

       -m, --min-shift INT
		 Set the minimal interval size for CSI indices to 2^INT. [14]

       -p, --preset STR
		 Input format for indexing. Valid values are: gff,  bed,  sam,
		 vcf.	This option should not be applied together with any of
		 -s, -b, -e, -c and -0; it is  not  used  for  data  retrieval
		 because this setting is stored in the index file. [gff]

       -s, --sequence INT
		 Column of sequence name. Options -s, -b, -e, -S, -c and -0
		 are all stored in the index file and thus not used in data
		 retrieval. [1]

       -S, --skip-lines INT
		 Skip first INT lines in the data file. [0]
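
       To make the -m default concrete, the shell arithmetic below works out
       the minimal interval size that 2^INT gives for INT = 14. The variable
       names are illustrative only.

```shell
# Default -m value from the option list above; names are illustrative.
min_shift=14
interval=$((1 << min_shift))   # 2^14
echo "$interval"
```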

QUERYING AND OTHER OPTIONS
       -h, --print-header
	      Print also the header/meta lines.

       -H, --only-header
	      Print only the header/meta lines.

       -l, --list-chroms
	      List the sequence names stored in the index file.

       -r, --reheader FILE
	      Replace the header with the content of FILE.

       -R, --regions FILE
	      Restrict to regions listed in FILE. FILE can be a BED file
	      (requires a .bed, .bed.gz or .bed.bgz file name extension) or a
	      TAB-delimited file with CHROM, POS and, optionally, POS_TO
	      columns, where positions are 1-based and inclusive. When this
	      option is in use, the input file may not be sorted.

       -T, --targets FILE
	      Similar to -R but the entire input will be read sequentially and
	      regions not listed in FILE will be skipped.
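
       A TAB-delimited regions file for -R could be prepared as in the sketch
       below. The file names regions.tsv and sorted.gff.gz are assumptions
       made for illustration, and the query is only attempted when tabix and
       the assumed input file are actually present.

```shell
# Hypothetical file name; columns are CHROM, POS and POS_TO,
# TAB-separated, 1-based and inclusive.
printf 'chr1\t10000000\t20000000\n' >  regions.tsv
printf 'chr2\t500\t1500\n'          >> regions.tsv

# Run the query only if tabix and an indexed input (assumed name) exist.
if command -v tabix >/dev/null 2>&1 && [ -f sorted.gff.gz ]; then
    tabix -R regions.tsv sorted.gff.gz
fi
```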

EXAMPLE
       (grep ^"#" in.gff; grep -v ^"#" in.gff | sort -k1,1 -k4,4n) |  bgzip  >
       sorted.gff.gz;

       tabix -p gff sorted.gff.gz;

       tabix sorted.gff.gz chr1:10,000,000-20,000,000;
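
       The sorting step of the pipeline above can be tried on its own, without
       bgzip or tabix. The sketch below uses made-up GFF records and an
       explicit TAB field separator (an addition to the original command)
       to show that header lines stay first while records are ordered by
       sequence name and then numerically by start position.

```shell
# Build a tiny unsorted GFF (made-up records) to illustrate the sort step.
{
    printf '##gff-version 3\n'
    printf 'chr2\tsrc\tgene\t300\t400\t.\t+\t.\tID=g3\n'
    printf 'chr1\tsrc\tgene\t2000\t2500\t.\t+\t.\tID=g2\n'
    printf 'chr1\tsrc\tgene\t150\t900\t.\t-\t.\tID=g1\n'
} > in.gff

# Header lines first, then records ordered by sequence name (column 1)
# and numerically by start position (column 4).
(grep '^#' in.gff; grep -v '^#' in.gff | sort -t "$(printf '\t')" -k1,1 -k4,4n) > sorted.gff

# Show the sequence name and start column of the sorted result.
cut -f 1,4 sorted.gff
```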

NOTES
       It is straightforward to achieve overlap queries using the standard
       B-tree index (with or without binning) implemented in all SQL
       databases, or the R-tree index in PostgreSQL and Oracle. But there are
       still many reasons to use tabix. Firstly, tabix works directly with
       many widely used TAB-delimited formats such as GFF/GTF and BED. We do
       not need to design a database schema or specialized binary formats,
       and data do not need to be duplicated in different formats. Secondly,
       tabix works on compressed data files while most SQL databases do not.
       The GenCode annotation GTF can be compressed down to 4% of its
       original size. Thirdly, tabix is fast. The same indexing algorithm is
       known to work efficiently for an alignment with a few billion short
       reads. SQL databases probably cannot easily handle data at this scale.
       Last but not least, tabix supports remote data retrieval. One can put
       the data file and the index on an FTP or HTTP server, and other users
       or even web services will be able to get a slice without downloading
       the entire file.

AUTHOR
       Tabix was written by Heng Li. The BGZF library was originally
       implemented by Bob Handsaker and modified by Heng Li for remote file
       access and in-memory caching.

SEE ALSO
       samtools(1)

htslib-1.3		       15 December 2015			      tabix(1)