HUGE-MERGE(1) User Contributed Perl Documentation HUGE-MERGE(1)
NAME
huge-merge.pl - Merge the results of multiple huge-sort generated files
into a single sorted file.
SYNOPSIS
huge-merge.pl output-directory
DESCRIPTION
Combine the sorted bigram files generated by huge-sort.pl efficiently.
This program is used internally by huge-count.pl.
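As a rough illustration of what merging sorted bigram files involves, the
following Python sketch merges already-sorted inputs and sums the counts
of identical bigrams. It is not the code of huge-merge.pl. The line
format "first<>second<>count", the summing of counts, and the names SEP,
parse and merge_sorted are all assumptions made for this example.

```python
import heapq
from itertools import groupby

SEP = "<>"  # assumed field separator; not confirmed by this page


def parse(line):
    # Split an assumed "first<>second<>count" line into ((first, second), count).
    first, second, count = line.rstrip("\n").split(SEP)[:3]
    return (first, second), int(count)


def merge_sorted(*sources):
    """Merge already-sorted iterables of bigram lines, summing equal bigrams."""
    streams = [map(parse, src) for src in sources]
    # heapq.merge reads the inputs lazily, so no input is loaded in full.
    merged = heapq.merge(*streams, key=lambda item: item[0])
    for bigram, group in groupby(merged, key=lambda item: item[0]):
        yield bigram, sum(count for _, count in group)
```

Each input must already be sorted by (first, second) for the merged
output to be sorted; open file handles can be passed as the sources.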
USAGE
huge-merge.pl [OPTIONS] SOURCEDIR
INPUT
Required Arguments:
SOURCEDIR
Input to huge-merge.pl should be a single flat directory containing
multiple plain text files generated by huge-sort.pl. The result files,
named merge.N where N is a number, are written to the source directory.
The final result is the file with the largest N.
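Because the final result is the merge file with the largest number, a
caller may need to locate it after the run. The Python sketch below shows
one way to do that. The function name final_merge_file is hypothetical,
and the sketch assumes the files are named exactly merge. followed by
digits.

```python
import os
import re


def final_merge_file(source_dir):
    """Return the path of the highest-numbered merge.N file, or None."""
    best_number, best_name = -1, None
    for name in os.listdir(source_dir):
        # Assumed naming: "merge." followed only by digits.
        match = re.fullmatch(r"merge\.(\d+)", name)
        if match and int(match.group(1)) > best_number:
            best_number, best_name = int(match.group(1)), name
    return os.path.join(source_dir, best_name) if best_name else None
```

The numbers are compared as integers rather than as strings, so merge.10
ranks above merge.2.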
Optional Arguments:
--keep
When the --keep option is switched on, all the intermediate merge
files are kept.
Other Options:
--help
Displays the help information.
--version
Displays the version information.
BUGS
huge-merge.pl has a known limitation. When the corpus is very large
(>16G) and some of the terms in the bigrams are very long (>30 chars),
the program can run out of memory at the huge-merge.pl step. This is
because huge-merge.pl uses two hashes to count the frequencies of the
first and second terms of the bigrams. The memory these two hashes need
grows with the length of the terms and with the number of terms. For
normal text, where the length and number of terms are limited, the
software will not run out of memory.
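The two-hash counting described above can be pictured with a small Python
sketch. It is an illustration only, not the code of huge-merge.pl. The
line format "first<>second<>count" and the names SEP and marginal_counts
are assumptions made for this example.

```python
from collections import Counter

SEP = "<>"  # assumed field separator; not confirmed by this page


def marginal_counts(lines):
    """Count how often each term occurs as the first and as the second term."""
    first_counts, second_counts = Counter(), Counter()
    for line in lines:
        first, second, count = line.rstrip("\n").split(SEP)[:3]
        # Each table keeps one key per term it has seen, held in memory.
        first_counts[first] += int(count)
        second_counts[second] += int(count)
    return first_counts, second_counts
```

Both tables stay in memory for the whole pass and hold every term they
have seen as a key, which is why more terms and longer terms both raise
memory use.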
AUTHOR
Ying Liu, University of Minnesota, Twin Cities. liux0395 at umn.edu
Ted Pedersen, University of Minnesota, Duluth. tpederse at umn.edu
COPYRIGHT
Copyright (C) 2009-2011, Ying Liu and Ted Pedersen
This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2 of the License, or (at your
option) any later version. This program is distributed in the hope
that it will be useful, but WITHOUT ANY WARRANTY; without even the
implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
perl v5.20.2 2011-03-31 HUGE-MERGE(1)