TREEBANKFREQ(1) User Contributed Perl Documentation TREEBANKFREQ(1)
NAME
treebankFreq.pl - Compute Information Content from Penn Treebank 2
SYNOPSIS
treebankFreq.pl [--outfile=OUTFILE [--stopfile=STOPFILE]
[--wnpath=WNPATH] [--resnik] [--smooth=SCHEME] PATH
| --help | --version]
DESCRIPTION
This program reads the Penn Treebank, Release 2, from the Linguistic
Data Consortium, <http://ldc.upenn.edu>, and computes the frequency
counts for each synset in WordNet. These frequency counts are used by
the Lin, Resnik, and Jiang & Conrath measures of semantic relatedness
to calculate the information content values of concepts. The output is
generated in a format as required by the WordNet::Similarity modules
for computing semantic relatedness.
A more detailed description of how information content is calculated
can be found in rawtextFreq.pl. This program uses exactly the same
techniques as described there.
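As a rough illustration of what "information content" means here, the
Python fragment below applies the standard Resnik definition,
IC(c) = -log(freq(c) / freq(root)), to a tiny hierarchy. It is a sketch
only and is not this program's code: the hierarchy, the counts, and the
names parents, counts, cumulative_counts, and information_content are
all made up for the example.

```python
import math

# Hypothetical toy hierarchy: child -> parent (None marks the root).
parents = {
    "entity": None,
    "animal": "entity",
    "dog": "animal",
    "cat": "animal",
}

# Made-up observed counts for each concept.
counts = {"entity": 0, "animal": 1, "dog": 5, "cat": 2}


def cumulative_counts(parents, counts):
    """Propagate each concept's count to itself and all of its ancestors."""
    total = {c: 0.0 for c in parents}
    for concept, n in counts.items():
        node = concept
        while node is not None:
            total[node] += n
            node = parents[node]
    return total


def information_content(parents, counts):
    """IC(c) = -log(freq(c) / freq(root)); None when freq(c) is zero."""
    total = cumulative_counts(parents, counts)
    root = next(c for c, p in parents.items() if p is None)
    ic = {}
    for concept, f in total.items():
        ic[concept] = -math.log(f / total[root]) if f > 0 else None
    return ic


ic = information_content(parents, counts)
```

In this toy data the rarer concept ("cat") ends up with a higher
information content than the more frequent one ("dog"), and the root
has an information content of zero.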
OPTIONS
--outfile=filename
The name of a file to which output should be written
--stopfile=filename
A file containing a list of stop words that are excluded from the
frequency counts. A sample file can be downloaded from
http://www.d.umn.edu/~tpederse/Group01/WordNet/words.txt
--wnpath=path
Location of the WordNet data files (e.g.,
/usr/local/WordNet-3.0/dict)
--resnik
Use Resnik (1995) frequency counting
--smooth=SCHEME
The smoothing scheme to apply to the computed probabilities. SCHEME
can only be ADD1 at this time
--help
Show a help message
--version
Display version information
PATH
Path to the raw Wall Street Journal portion of the Treebank corpus.
This is usually in the /raw/wsj subdirectory of the Treebank
installation. Thus, you might run this program as
treebankFreq.pl [OPTIONS] /home/sid/treebank/raw/wsj
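The effect of the --resnik and --smooth=ADD1 options can be pictured
with a short Python sketch. It rests on assumptions rather than on this
program's source: that default counting credits every sense of a word
with a full count, that Resnik counting divides each occurrence equally
among the word's senses, and that ADD1 adds one to every synset's count.
The sense inventory and the names count_word, add1, senses_of, and
all_synsets are hypothetical.

```python
from collections import defaultdict


def count_word(freq, senses, resnik=False):
    """Credit one occurrence of a word to its candidate synsets."""
    if not senses:
        return
    # Assumed behaviour: Resnik counting splits the occurrence evenly
    # across senses; otherwise every sense receives a full count.
    share = 1.0 / len(senses) if resnik else 1.0
    for sense in senses:
        freq[sense] += share


def add1(freq, all_synsets):
    """ADD1 smoothing: add one to every synset's count, seen or unseen."""
    return {s: freq.get(s, 0.0) + 1.0 for s in all_synsets}


# Hypothetical sense inventory standing in for WordNet.
senses_of = {"bank": ["bank#n#1", "bank#n#2"], "river": ["river#n#1"]}
all_synsets = ["bank#n#1", "bank#n#2", "river#n#1", "shore#n#1"]

plain_counts = defaultdict(float)
resnik_counts = defaultdict(float)
for word in ["bank", "river", "bank"]:
    count_word(plain_counts, senses_of[word])
    count_word(resnik_counts, senses_of[word], resnik=True)

# Smoothing gives the unseen synset "shore#n#1" a nonzero count.
smoothed = add1(resnik_counts, all_synsets)
```

Without smoothing, a synset that never occurs in the corpus has zero
probability, so its information content is undefined; adding one to
every count avoids that case.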
BUGS
Report to the WordNet::Similarity mailing list:
<http://groups.yahoo.com/group/wn-similarity>
SEE ALSO
WordNet::Similarity
Penn Treebank :
<http://ldc.upenn.edu>
WordNet home page :
<http://wordnet.princeton.edu>
WordNet::Similarity home page :
<http://wn-similarity.sourceforge.net>
AUTHORS
Ted Pedersen, University of Minnesota, Duluth
tpederse at d.umn.edu
Satanjeev Banerjee, Carnegie Mellon University, Pittsburgh
banerjee+ at cs.cmu.edu
Siddharth Patwardhan, University of Utah, Salt Lake City
sidd at cs.utah.edu
COPYRIGHT
Copyright (c) 2005-2008, Ted Pedersen, Satanjeev Banerjee, and
Siddharth Patwardhan
This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2 of the License, or (at your
option) any later version. This program is distributed in the hope
that it will be useful, but WITHOUT ANY WARRANTY; without even the
implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
perl v5.20.2 2015-08-31 TREEBANKFREQ(1)