huge-merge.pl man page on DragonFly

HUGE-MERGE(1)	      User Contributed Perl Documentation	 HUGE-MERGE(1)

NAME
       huge-merge.pl - Merge the results of multiple huge-sort generated files
       into a single sorted file.

SYNOPSIS
       huge-merge.pl output-directory

DESCRIPTION
       Efficiently combines the sorted bigram files generated by huge-sort.pl
       into a single sorted file.

       This program is used internally by huge-count.pl.
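
       The merging idea can be illustrated with a minimal Python sketch. It
       is not the code of huge-merge.pl: the function name, the
       (bigram, count) tuples, the "<>" separator in the sample keys, and the
       choice to add the counts of a bigram present in both inputs are all
       assumptions made for illustration only.

```python
def merge_sorted_counts(left, right):
    """Merge two lists of (bigram, count) pairs, each sorted by bigram.

    Assumption: a bigram appears at most once per input list, and a bigram
    found in both lists gets the sum of the two counts.
    """
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, left_count = left[i]
        right_key, right_count = right[j]
        if left_key == right_key:
            # Same bigram in both inputs: combine the counts.
            merged.append((left_key, left_count + right_count))
            i += 1
            j += 1
        elif left_key < right_key:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One input is exhausted; the rest of the other is already sorted.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

       Because both inputs are already sorted, one linear pass is enough, and
       only the current entry of each input has to be compared at any time.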

USAGE
       huge-merge.pl [OPTIONS] SOURCEDIR

INPUT
   Required Arguments:
       SOURCEDIR

       The input to huge-merge.pl should be a single flat directory containing
       multiple plain text files generated by huge-sort.pl. The result is
       written to the source directory as merge.*, where * is a number; the
       final result file is the one with the highest number.
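
       A minimal Python sketch of picking the final result file from a
       directory listing follows. The function name and the sample listing
       are hypothetical, and it assumes the result files are named exactly
       merge.N with N a decimal number. The numbers are compared as integers,
       since a plain string sort would rank merge.2 above merge.10.

```python
import re


def final_merge_file(filenames):
    """Return the merge.N name with the largest N, or None if none match."""
    best_name = None
    best_number = -1
    for name in filenames:
        match = re.fullmatch(r"merge\.(\d+)", name)
        if match and int(match.group(1)) > best_number:
            best_name = name
            best_number = int(match.group(1))
    return best_name


if __name__ == "__main__":
    # Hypothetical listing of SOURCEDIR after a run with --keep.
    listing = ["merge.1", "merge.2", "merge.10", "notes.txt"]
    print(final_merge_file(listing))
```

       In practice the listing would come from something like
       os.listdir(SOURCEDIR).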

   Optional Arguments:
       --keep

       Switching on the --keep option keeps all the intermediate merge files.

   Other Options:

       --help

       Displays the help information.

       --version

       Displays the version information.

BUGS

       huge-merge.pl has a known limitation. When the corpus is very large
       (>16G) and some of the terms in the bigrams are very long (>30 chars),
       the program can run out of memory during the huge-merge.pl step. This
       is because huge-merge.pl uses two hashes to count the frequencies of
       the first and second terms of the bigrams. The memory these two hashes
       need grows with both the length and the number of the terms. For
       normal text, where terms are limited in length and number, the program
       will not exhaust memory.
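
       The two-hash bookkeeping described above can be sketched in Python as
       follows. This only illustrates the idea and is not the program's code:
       the function name and the ((first, second), frequency) input shape are
       assumptions.

```python
from collections import Counter


def marginal_counts(bigrams):
    """Sum bigram frequencies per first term and per second term.

    `bigrams` is an iterable of ((first, second), frequency) pairs.
    """
    first_totals = Counter()
    second_totals = Counter()
    for (first, second), frequency in bigrams:
        first_totals[first] += frequency
        second_totals[second] += frequency
    return first_totals, second_totals
```

       Each table holds one entry per distinct term, with the term itself as
       the key, so memory use rises with both the number of distinct terms
       and their length.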

AUTHOR
       Ying Liu, University of Minnesota, Twin Cities.	liux0395 at umn.edu

       Ted Pedersen, University of Minnesota, Duluth.  tpederse at umn.edu

COPYRIGHT
       Copyright (C) 2009-2011, Ying Liu and Ted Pedersen

       This program is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by the
       Free Software Foundation; either version 2 of the License, or (at your
       option) any later version.  This program is distributed in the hope
       that it will be useful, but WITHOUT ANY WARRANTY; without even the
       implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
       PURPOSE.	 See the GNU General Public License for more details.

       You should have received a copy of the GNU General Public License along
       with this program; if not, write to the Free Software Foundation, Inc.,
       59 Temple Place - Suite 330, Boston, MA	02111-1307, USA.

perl v5.20.2			  2011-03-31			 HUGE-MERGE(1)