TEXT_SIMILARITY(1)    User Contributed Perl Documentation   TEXT_SIMILARITY(1)

NAME
       text_similarity.pl - Measure the pair-wise similarity between files or
       strings

SYNOPSIS
	text_similarity.pl --type Text::Similarity::Overlaps --normalize
				--string '.......this is one' '????this is two'

	text_similarity.pl --type Text::Similarity::Overlaps --no-normalize
				--string '.......this is one' '????this is two'

	text_similarity.pl --type Text::Similarity::Overlaps
				--string 'sir winston churchill' 'Churchill, Winston Sir'

	text_similarity.pl --type Text::Similarity::Overlaps ../GPL.txt ../FDL.txt

	text_similarity.pl --verbose --type Text::Similarity::Overlaps ../GPL.txt ../FDL.txt

	text_similarity.pl --verbose --stoplist stoplist.txt --type Text::Similarity::Overlaps
			       ../GPL.txt ../FDL.txt

	text_similarity.pl [[--verbose] [--stoplist=FILE] [--no-normalize] [--string]
			       --type=TYPE | --help | --version] FILE1 FILE2

DESCRIPTION
       This script is a simple command-line interface to the Text::Similarity
       Perl modules. A method for computing similarity must be specified via
       the --type option, and then that method is used to measure the
       similarity of two strings or two files.

       Text::Similarity::Overlaps measures similarity by counting the number
       of words that overlap (match) between the two inputs, without regard to
       order. So, all of the following strings would have the same pairwise
       similarity (they would each have a raw score of 4 relative to each
       other, meaning that 4 words are overlapping or matching).

	winston churchill was here
	here was winston churchill
	winston was here churchill
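
       The order-insensitive counting described above can be sketched in
       Python as a multiset intersection. This is only an illustration, not
       the module's actual code: the helper name raw_overlap, the lowercasing
       and the whitespace tokenization are assumptions, and the real module
       may tokenize and match differently.

```python
from collections import Counter

def raw_overlap(text1: str, text2: str) -> int:
    """Count words shared by both inputs, ignoring word order.

    Hypothetical helper: lowercasing and whitespace splitting are
    assumptions, not the behaviour of Text::Similarity::Overlaps.
    """
    words1 = Counter(text1.lower().split())
    words2 = Counter(text2.lower().split())
    # Counter & Counter keeps, per word, the smaller of the two counts.
    return sum((words1 & words2).values())

examples = [
    "winston churchill was here",
    "here was winston churchill",
    "winston was here churchill",
]

# Every pairing of the three strings shares the same four words.
scores = [raw_overlap(a, b) for a in examples for b in examples]
print(scores)
```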

       By default, Text::Similarity::Overlaps returns a normalized F-measure
       between 0 and 1. Specify --no-normalize to turn normalization off and
       get the raw score instead. Specify --verbose to also report the other
       overlap-based scores.
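
       As a rough illustration of how a raw overlap count can be turned into
       a score between 0 and 1, the sketch below uses the standard F-measure,
       the harmonic mean of precision and recall. The function name and the
       choice of which input length serves as the denominator for precision
       and for recall are assumptions; consult the module source for its
       exact definitions.

```python
def f_measure(raw_score: int, len1: int, len2: int) -> float:
    """Harmonic mean of precision and recall for an overlap count.

    Assumed definitions (not confirmed by this man page):
      precision = raw_score / number of words in the second input
      recall    = raw_score / number of words in the first input
    """
    if raw_score == 0 or len1 == 0 or len2 == 0:
        return 0.0
    precision = raw_score / len2
    recall = raw_score / len1
    return 2 * precision * recall / (precision + recall)

# Two four-word inputs that share all four words score the maximum.
print(f_measure(4, 4, 4))
# A partial overlap between inputs of different lengths scores lower.
print(f_measure(2, 4, 6))
```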

OPTIONS
       --type=TYPE
	   The type of text similarity measure.	 Valid values include:

	       Text::Similarity::Overlaps

       --stoplist=FILE
	   The name of a file containing stop words. The ./sample directory
	   provides an example of each supported format: one word per line
	   (stoplist.txt) and one regular expression per line
	   (stoplist-nsp.regex). Both formats may be mixed in a single stop
	   word file.
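
	   A stop list that mixes both formats could be handled along the
	   lines of the Python sketch below. It assumes that regular
	   expression entries are written between slashes, as in /\bthe\b/,
	   and that any other non-empty line is a literal word; neither the
	   helper names nor this detection rule come from the script itself.

```python
import re

def load_stoplist(lines):
    """Split stop list entries into literal words and compiled regexes.

    Assumption: an entry wrapped in slashes is a regular expression;
    every other non-empty line is a plain stop word.
    """
    words, regexes = set(), []
    for line in lines:
        entry = line.strip()
        if not entry:
            continue
        if len(entry) > 2 and entry[0] == "/" and entry[-1] == "/":
            regexes.append(re.compile(entry[1:-1]))
        else:
            words.add(entry.lower())
    return words, regexes

def remove_stop_words(tokens, words, regexes):
    """Drop tokens that equal a stop word or match any stop regex."""
    return [
        t for t in tokens
        if t not in words and not any(r.search(t) for r in regexes)
    ]

# One plain entry and one regex entry in the same (hypothetical) stop list.
stoplist = ["was", r"/\bhe(re)?\b/"]
words, regexes = load_stoplist(stoplist)
kept = remove_stop_words("winston churchill was here".split(), words, regexes)
print(kept)
```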

       --no-normalize
	   Do not normalize scores.  Normally, scores are normalized so that
	   they range from 0 to 1.  Using this option will give you a raw
	   score instead.

       --string
	   Input will be provided on the command line as strings, not files.

       --verbose
	   Show all the matches that are found between the files, their length
	   and frequency, as well as precision, recall, F-measure, E-measure,
	   Cosine, and the Dice Coefficient.
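
	   The scores named above can be related to one another with the
	   following Python sketch. It uses common textbook definitions over
	   a raw overlap count and the two input lengths; the function name,
	   the denominators chosen for precision and recall, and taking the
	   E-measure as 1 minus the F-measure are assumptions, so the
	   script's own numbers may be computed differently.

```python
import math

def overlap_scores(raw_score: int, len1: int, len2: int) -> dict:
    """Derive several overlap-based scores from one raw overlap count.

    Textbook-style definitions, assumed rather than taken from the module.
    """
    if len1 <= 0 or len2 <= 0:
        raise ValueError("both inputs must contain at least one word")
    precision = raw_score / len2
    recall = raw_score / len1
    total = precision + recall
    f = 2 * precision * recall / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f,
        "e_measure": 1 - f,
        "cosine": raw_score / math.sqrt(len1 * len2),
        "dice": 2 * raw_score / (len1 + len2),
    }

scores = overlap_scores(2, 4, 6)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

	   Under these particular definitions the Dice coefficient and the
	   F-measure are algebraically equal, since both reduce to twice the
	   raw score divided by the sum of the two lengths.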

       --help
	   Show a detailed help message.

       --version
	   Show version information.

AUTHORS
	Ted Pedersen, University of Minnesota, Duluth
	tpederse at d.umn.edu

	Jason Michelizzi

	Ying Liu, University of Minnesota, Twin Cities
	liux0395 at umn.edu

       Last modified by: $Id: text_similarity.pl,v 1.4 2010/06/10 21:31:24
       liux0395 Exp $

BUGS
       --compfile is not working, seems to cause hang (tdp 3/21/08)

COPYRIGHT AND LICENSE
       Copyright (C) 2004-2010, Jason Michelizzi, Ted Pedersen and Ying Liu

       This program is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by the
       Free Software Foundation; either version 2 of the License, or (at your
       option) any later version.

       This program is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
       General Public License for more details.

       You should have received a copy of the GNU General Public License along
       with this program; if not, write to the Free Software Foundation, Inc.,
       59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

perl v5.20.2			  2011-09-29		    TEXT_SIMILARITY(1)