GO::AnnotationProvider::AnnotationParser man page on Fedora

GO::AnnotationProvider::AnnotationParser(3)  User Contributed Perl Documentation  GO::AnnotationProvider::AnnotationParser(3)

NAME
       GO::AnnotationProvider::AnnotationParser - parses a gene annotation
       file

SYNOPSIS
       GO::AnnotationProvider::AnnotationParser - reads a Gene Ontology gene
       associations file, and provides methods by which to retrieve the GO
       annotations for an annotated entity.  Note that it is case insensitive,
       with some caveats - see the documentation below.

	   my $annotationParser = GO::AnnotationProvider::AnnotationParser->new(annotationFile => "data/gene_association.sgd");

	   my $geneName = "AAT2";

	   # goIdsByName returns a reference to an array, so dereference it before joining
	   print "GO associations for gene: ", join (" ", @{$annotationParser->goIdsByName(name   => $geneName,
											   aspect => 'P')}), "\n";

	   print "Database ID for gene: ", $annotationParser->databaseIdByName($geneName), "\n";

	   print "Database name: ", $annotationParser->databaseName(), "\n";

	   print "Standard name for gene: ", $annotationParser->standardNameByName($geneName), "\n";

	   my @geneNames = $annotationParser->allStandardNames();

	   foreach my $i (0..10) {

	       print "$geneNames[$i]\n";

	   }

DESCRIPTION
       GO::AnnotationProvider::AnnotationParser is a concrete subclass of
       GO::AnnotationProvider, and creates a data structure mapping gene names
       to GO annotations by parsing a file of annotations provided by the Gene
       Ontology Consortium.

       This package provides object methods for retrieving GO annotations that
       have been parsed from a 'gene associations' file, provided by the Gene
       Ontology Consortium.  The format for the file is:

       Lines beginning with a '!' character are comment lines.

	   Column  Cardinality	 Contents
	   ------  -----------	 -------------------------------------------------------------
	       0       1	 Database abbreviation for the source of annotation (e.g. SGD)
	       1       1	 Database identifier of the annotated entity
	       2       1	 Standard name of the annotated entity
	       3       0,1	 NOT (if a gene is specifically NOT annotated to the term)
	       4       1	 GOID of the annotation
	       5       1,n	 Reference(s) for the annotation
	       6       1	 Evidence code for the annotation
	       7       0,n	 With or From (a bit mysterious)
	       8       1	 Aspect of the Annotation (C, F, P)
	       9       0,1	 Name of the product being annotated
	      10       0,n	 Alias(es) of the annotated product
	      11       1	 Type of annotated entity (one of gene, transcript, protein)
	      12       1,2	 Taxonomic id of the organism encoding and/or using the product
	      13       1	 Date of annotation YYYYMMDD
	      14       1	 Assigned_by : The database which made the annotation

       Columns are separated by tabs.  For those entries with a cardinality
       greater than 1, multiple entries are delimited by pipes (|).

       Further details can be found at:

       http://www.geneontology.org/doc/GO.annotation.html#file
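
       The column layout above can be illustrated with a minimal sketch that
       splits one line into named fields.  This is not the module's own
       parsing code; the field names in @columns, the helper
       parseAssociationLine, and every value in the sample line are made up
       for illustration.

```perl
use strict;
use warnings;

# Illustrative labels for columns 0..14; these names are assumptions of this
# sketch, not identifiers used by GO::AnnotationProvider::AnnotationParser.
my @columns = qw(db databaseId standardName not goid references evidence
                 withOrFrom aspect productName aliases entityType taxon
                 date assignedBy);

sub parseAssociationLine {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^!/;       # comment line
    return if $line !~ /\S/;       # blank line

    # A limit of -1 keeps trailing empty columns
    my @fields = split /\t/, $line, -1;

    my %record;
    @record{@columns} = @fields[0 .. $#columns];

    # Columns whose cardinality can exceed 1 are pipe-delimited
    for my $multi (qw(references withOrFrom aliases taxon)) {
        $record{$multi} = [ split /\|/, $record{$multi} // '' ];
    }
    return \%record;
}

# A made-up line, not taken from any real annotation file
my $line = join "\t", 'SGD', 'ID0000001', 'GENE1', '', 'GO:0000001',
    'REF:0000001|REF:0000002', 'IDA', '', 'P', 'Example product',
    'ALIAS1|ALIAS2', 'gene', 'taxon:0000', '20030101', 'SGD';

my $record = parseAssociationLine($line);
print "$record->{standardName} is annotated to $record->{goid} ",
      "with aliases @{ $record->{aliases} }\n";
```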

       The following assumptions about the file are made (and should be true):

	   1.  All aliases appear for all entries of a given annotated product
	   2.  The database identifiers are unique, in that two different
	       entities cannot have the same database id.

TODO
       Also see the TODO list in the parent, GO::AnnotationProvider.

	1.  Add in methods that will allow retrieval of evidence codes with
	    the annotations for a particular entity.

	2.  Add in methods that return all the annotated entities for a
	    particular GOID.

	3.  Add in the ability to request only annotations either including
	    or excluding particular evidence codes.  Such evidence codes
	    could be provided as an anonymous array as the value of a named
	    argument.

	4.  Same as number 3, except allow the retrieval of annotated
	    entities for a particular GOID, based on inclusion or exclusion
	    of certain evidence codes.

	These first four items will require a reworking of how data are
	stored on the backend, and thus the parsing code itself, though it
	should not affect any of the already existing API.

	5.  Instead of 'use'ing Storable, 'require' it instead, only at the
	    point of use, which will mean that AnnotationParser can be
	    happily used in the absence of Storable, just without those
	    functions that need it.

	6.  Extend the ValidateFile class method to check that no entity is
	    annotated to the same node twice, with the same evidence and
	    the same reference.

	7.  An additional checker, that uses an AnnotationProvider in
	    conjunction with an OntologyProvider, would be useful, that
	    checks that some of the annotations themselves are valid, i.e.
	    that no entities are annotated to the 'unknown' node in a
	    particular aspect, and also to another node within that same
	    aspect.  Can annotations be redundant? I.e., if an entity is
	    annotated to a node, and an ancestor of the node, is that
	    annotation redundant?  Does it depend on the evidence codes and
	    references?  Or are such annotations reinforcing?  These things
	    are useful to consider when formulating the confidence which can
	    be attributed to an annotation.

Class Methods
   Usage
       This class method simply prints out a usage statement, along with an
       error message, if one was passed in.

       Usage :

	   GO::AnnotationProvider::AnnotationParser->Usage();

   ValidateFile
       This class method reads an annotation file, and returns a reference to
       an array of errors that are present within the file.  The errors are
       simply strings, each beginning with "Line $lineNo : " where $lineNo is
       the number of the line in the file where the error was found.

       Usage:

	   my $errorsRef = GO::AnnotationProvider::AnnotationParser->ValidateFile(annotationFile => $file);
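
       Because every error string begins with "Line $lineNo : ", the returned
       list can be grouped by line number.  The sketch below assumes only that
       prefix; the helper name errorsByLine and the sample messages are
       hypothetical.

```perl
use strict;
use warnings;

# Group error strings of the form "Line N : message" by their line number.
sub errorsByLine {
    my ($errorsRef) = @_;
    my %byLine;
    for my $error (@{$errorsRef}) {
        if ($error =~ /^Line (\d+) : (.*)$/s) {
            push @{ $byLine{$1} }, $2;
        }
    }
    return \%byLine;
}

# With the real module, $errorsRef would come from:
#   GO::AnnotationProvider::AnnotationParser->ValidateFile(annotationFile => $file);
# The messages here are made up for the sketch.
my $errorsRef = [
    "Line 3 : made-up problem A",
    "Line 3 : made-up problem B",
    "Line 10 : made-up problem C",
];

my $byLine = errorsByLine($errorsRef);
for my $lineNo (sort { $a <=> $b } keys %{$byLine}) {
    print "line $lineNo: ", scalar @{ $byLine->{$lineNo} }, " error(s)\n";
}
```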

Constructor
   new
       This is the constructor for an AnnotationParser object.

       The constructor expects one of two arguments, either an 'annotationFile'
       argument or an 'objectFile' argument.  When instantiated with an
       annotationFile argument, it expects it to correspond to an annotation
       file created by one of the GO consortium members, according to their
       file format.  When instantiated with an objectFile argument, it expects
       to open a previously created annotationParser object that has been
       serialized to disk (see the serializeToDisk method).

       Usage:

	   my $annotationParser = GO::AnnotationProvider::AnnotationParser->new(annotationFile => $file);

	   my $annotationParser = GO::AnnotationProvider::AnnotationParser->new(objectFile => $file);
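
       One way a client might choose between the two arguments is to prefer a
       previously serialized object file when it exists, and to fall back to
       the annotation file otherwise.  This is a sketch of a client-side
       convention, not behaviour of the module; the helper parserArguments and
       the file names are hypothetical.

```perl
use strict;
use warnings;

# Return the named argument to pass to new(): the object file if it already
# exists on disk, otherwise the annotation file.
sub parserArguments {
    my ($annotationFile, $objectFile) = @_;
    return (objectFile => $objectFile)
        if defined $objectFile && -e $objectFile;
    return (annotationFile => $annotationFile);
}

# Hypothetical paths, used only to show the call shape
my %arguments = parserArguments('data/gene_association.sgd',
                                'data/no-such-object-file.obj');
print join(' => ', %arguments), "\n";

# With the real module installed, the result would be used as:
#   my $annotationParser =
#       GO::AnnotationProvider::AnnotationParser->new(%arguments);
```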

Public instance methods
Some methods dealing with ambiguous names
       Because there are many names by which an annotated entity may be
       referred to, that are non-unique, there exists a set of methods for
       determining whether a name is ambiguous, and to what database
       identifiers such ambiguous names may refer.

       Note that the AnnotationParser is now case insensitive, but with some
       caveats.  For instance, you can use 'cdc6' to retrieve data for CDC6.
       However, if one gene has been referred to as abc1, and another
       referred to as ABC1, then these are treated as different, and
       unambiguous.  The text 'Abc1', by contrast, would be considered
       ambiguous, because it could refer to either.  On the other hand, if a
       single gene is referred to as XYZ1 and xyz1, and no other genes have
       that name (in any casing), then Xyz1 would still be considered
       unambiguous.
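
       The rules above can be modelled with a small, self-contained sketch.
       It is only a model of the behaviour described in this section, not the
       module's implementation; the data structure, the helper names
       (recordName, candidateIds, isAmbiguous) and the identifiers are all
       assumptions.

```perl
use strict;
use warnings;

# lc(name) => { exact casing => { databaseId => 1 } }
my %seen;

sub recordName {
    my ($name, $databaseId) = @_;
    $seen{ lc $name }{$name}{$databaseId} = 1;
    return;
}

# An exact-case match wins; otherwise every casing of the name is a candidate.
sub candidateIds {
    my ($name) = @_;
    my $casings = $seen{ lc $name } or return ();
    return sort keys %{ $casings->{$name} } if exists $casings->{$name};

    my %ids;
    for my $idsForCasing (values %{$casings}) {
        $ids{$_} = 1 for keys %{$idsForCasing};
    }
    return sort keys %ids;
}

sub isAmbiguous {
    my ($name) = @_;
    my @ids = candidateIds($name);
    return @ids > 1 ? 1 : 0;
}

recordName('abc1', 'ID_A');   # one gene referred to as abc1
recordName('ABC1', 'ID_B');   # a different gene referred to as ABC1
recordName('XYZ1', 'ID_X');   # a single gene seen in two casings
recordName('xyz1', 'ID_X');

for my $name (qw(abc1 ABC1 Abc1 Xyz1)) {
    print "$name: ", (isAmbiguous($name) ? 'ambiguous' : 'unambiguous'), "\n";
}
```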

   nameIsAmbiguous
       This public method returns a boolean to indicate whether a name is
       ambiguous, i.e. whether the name might map to more than one entity (and
       therefore more than one databaseId).

       NB: API change:

       nameIsAmbiguous is now case insensitive - that is, if there is a name
       that is used twice using different casing, that will be treated as
       ambiguous.  Previous versions would not have treated these as
       ambiguous.  In the case that a name is provided in a certain casing,
       which was encountered only once, then it will be treated as
       unambiguous.  This is the price of wanting a case insensitive
       annotation parser...

       Usage:

	   if ($annotationParser->nameIsAmbiguous($name)){

	       # do something useful....or not....

	   }

   databaseIdsForAmbiguousName
       This public method returns an array of database identifiers for an
       ambiguous name.	If the name is not ambiguous, an empty list will be
       returned.

       NB: API change:

       databaseIdsForAmbiguousName is now case insensitive - that is, if there
       is a name that is used twice using different casing, that will be
       treated as ambiguous.  Previous versions would not have treated these
       as ambiguous.  However, if the name provided is of the exact casing as
       a name that appeared only once with that exact casing, then it is
       treated as unambiguous. This is the price of wanting a case insensitive
       annotation parser...

       Usage:

	   my @databaseIds = $annotationParser->databaseIdsForAmbiguousName($name);
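
       The two methods above combine naturally with databaseIdByName
       (documented below) to resolve any name to a list of database ids
       without risking a fatal error.  The helper databaseIdsForName is a
       hypothetical wrapper, sketched here on the assumption that the methods
       behave as documented.

```perl
use strict;
use warnings;

# Resolve a name to zero, one, or several database ids.
# $annotationParser is expected to be a GO::AnnotationProvider::AnnotationParser.
sub databaseIdsForName {
    my ($annotationParser, $name) = @_;

    # Ambiguous names must not be passed to databaseIdByName, which dies on them
    if ($annotationParser->nameIsAmbiguous($name)) {
        return $annotationParser->databaseIdsForAmbiguousName($name);
    }

    # Unambiguous: undef means the name maps to no databaseId
    my $databaseId = $annotationParser->databaseIdByName($name);
    return defined $databaseId ? ($databaseId) : ();
}

# Usage, given an existing parser object:
#   my @databaseIds = databaseIdsForName($annotationParser, $name);
```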

   ambiguousNames
       This method returns an array of names, which from the annotation file
       have been deemed to be ambiguous.

       Note - even though we have made the annotation parser case insensitive,
       if something appeared in the annotations file as BLAH1 and blah1, we
       would not deem either of these to be ambiguous.	However, if it
       appeared as blah1 twice, referring to two different genes, then blah1
       would be ambiguous.

       Usage:

	   my @ambiguousNames = $annotationParser->ambiguousNames();

Methods for retrieving GO annotations for entities
   goIdsByDatabaseId
       This public method returns a reference to an array of GOIDs that are
       associated with the supplied databaseId for a specific aspect.  If no
       annotations are associated with that databaseId in that aspect, then a
       reference to an empty array will be returned.  If the databaseId is not
       recognized, then undef will be returned. In the case that a databaseId
       is ambiguous (for instance the same databaseId exists but with
       different casings) then if the supplied database id matches the exact
       case of one of those supplied, then that is the one it will be treated
       as.  In the case where the databaseId matches none of the possibilities
       by case, then a fatal error will occur, because the provided databaseId
       was ambiguous.

       Usage:

	   my $goidsRef = $annotationParser->goIdsByDatabaseId(databaseId => $databaseId,
							       aspect	  => <P|F|C>);

   goIdsByStandardName
       This public method returns a reference to an array of GOIDs that are
       associated with the supplied standardName for a specific aspect.	 If no
       annotations are associated with the entity with that standard name in
       that aspect, then a reference to an empty list will be returned.	 If
       the supplied name is not used as a standard name, then undef will be
       returned.  In the case that the supplied standardName is ambiguous (for
       instance the same standardName exists but with different casings) then
       if the supplied standardName matches the exact case of one of those
       supplied, then that is the one it will be treated as.  In the case
       where the standardName matches none of the possibilities by case, then
       a fatal error will occur, because the provided standardName was
       ambiguous.

       Usage:

	   my $goidsRef = $annotationParser->goIdsByStandardName(standardName => $standardName,
								 aspect	      => <P|F|C>);

   goIdsByName
       This public method returns a reference to an array of GO IDs that are
       associated with the supplied name for a specific aspect.	 If there are
       no GO associations for the entity corresponding to the supplied name in
       the provided aspect, then a reference to an empty list will be
       returned.  If the supplied name does not correspond to any entity, then
       undef will be returned.	Because the name can be any of the databaseId,
       the standard name, or any of the aliases, it is possible that the name
       might be ambiguous.  Clients of this object should first test whether
       the name they are using is ambiguous, using the nameIsAmbiguous()
       method, and handle it accordingly.  If an ambiguous name is supplied,
       then it will die.

       NB: API change:

       goIdsByName is now case insensitive - that is, if there is a name that
       is used twice using different casing, that will be treated as
       ambiguous.  Previous versions would not have treated these as
       ambiguous.  This is the price of wanting a case insensitive annotation
       parser.	In the event that a name is provided that is ambiguous because
       of case, if it matches exactly the case of one of the possible matches,
       it will be treated unambiguously.

       Usage:

	   my $goidsRef = $annotationParser->goIdsByName(name	=> $name,
							 aspect => <P|F|C>);
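
       goIdsByName has three non-fatal outcomes (undef, a reference to an
       empty list, a reference to a populated list) plus a fatal one for
       ambiguous names.  The sketch below distinguishes all four; the helper
       describeAnnotations and its message wording are hypothetical.

```perl
use strict;
use warnings;

# Summarize the annotations for a name in one aspect as a string.
# $annotationParser is expected to be a GO::AnnotationProvider::AnnotationParser.
sub describeAnnotations {
    my ($annotationParser, $name, $aspect) = @_;

    # Check first, because goIdsByName dies when given an ambiguous name
    return "$name is ambiguous"
        if $annotationParser->nameIsAmbiguous($name);

    my $goidsRef = $annotationParser->goIdsByName(name   => $name,
                                                  aspect => $aspect);

    return "$name is not recognized" if !defined $goidsRef;
    return "$name has no annotations in aspect $aspect" if !@{$goidsRef};
    return "$name: " . join(' ', @{$goidsRef});
}

# Usage, given an existing parser object:
#   print describeAnnotations($annotationParser, 'AAT2', 'P'), "\n";
```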

Methods for mapping different types of name to each other
   standardNameByDatabaseId
       This method returns the standard name for a database id.

       NB: API change

       standardNameByDatabaseId is now case insensitive - that is, if there is
       a databaseId that is used twice (or more) using different casing, it
       will be treated as ambiguous.  Previous versions would not have treated
       these as ambiguous.  This is the price of wanting a case insensitive
       annotation parser.  In the event that a name is provided that is
       ambiguous because of case, if it matches exactly the case of one of the
       possible matches, it will be treated unambiguously.

       Usage:

	   my $standardName = $annotationParser->standardNameByDatabaseId($databaseId);

   databaseIdByStandardName
       This method returns the database id for a standard name.

       NB: API change

       databaseIdByStandardName is now case insensitive - that is, if there is
       a standard name that is used twice (or more) using different casing, it
       will be treated as ambiguous.  Previous versions would not have treated
       these as ambiguous.  This is the price of wanting a case insensitive
       annotation parser.  In the event that a name is provided that is
       ambiguous because of case, if it matches exactly the case of one of the
       possible matches, it will be treated unambiguously.

       Usage:

	   my $databaseId = $annotationParser->databaseIdByStandardName($standardName);

   databaseIdByName
       This method returns the database id for any identifier for a gene (e.g.
       by databaseId itself, by standard name, or by alias).  If the used name
       is ambiguous, then the program will die.	 Thus clients should call the
       nameIsAmbiguous() method, prior to using this method.  If the name does
       not map to any databaseId, then undef will be returned.

       NB: API change

       databaseIdByName is now case insensitive - that is, if there is a name
       that is used twice using different casing, that will be treated as
       ambiguous.  Previous versions would not have treated these as
       ambiguous.  This is the price of wanting a case insensitive annotation
       parser.	In the event that a name is provided that is ambiguous because
       of case, if it matches exactly the case of one of the possible matches,
       it will be treated unambiguously.

       Usage:

	   my $databaseId = $annotationParser->databaseIdByName($name);

   standardNameByName
       This public method returns the standard name for the gene specified
       by the given name.  Because a name may be ambiguous, the
       nameIsAmbiguous() method should be called first.	 If an ambiguous name
       is supplied, then it will die with an appropriate error message.	 If
       the name does not map to a standard name, then undef will be returned.

       NB: API change

       standardNameByName is now case insensitive - that is, if there is a
       name that is used twice using different casing, that will be treated as
       ambiguous.  Previous versions would not have treated these as
       ambiguous.  This is the price of wanting a case insensitive annotation
       parser.

       Usage:

	   my $standardName = $annotationParser->standardNameByName($name);

Other methods relating to names
   nameIsStandardName
       This method returns a boolean to indicate whether the supplied name is
       used as a standard name.

       NB : API change.

       This is now case insensitive.  If you provide abC1, and ABc1 is a
       standard name, then it will return true.

       Usage :

	   if ($annotationParser->nameIsStandardName($name)){

	       # do something

	   }

   nameIsDatabaseId
       This method returns a boolean to indicate whether the supplied name is
       used as a database id.

       NB : API change.

       This is now case insensitive.  If you provide abC1, and ABc1 is a
       database id, then it will return true.

       Usage :

	   if ($annotationParser->nameIsDatabaseId($name)){

	       # do something

	   }

   nameIsAnnotated
       This method returns a boolean to indicate whether the supplied name has
       any annotations, either when considered as a databaseId, a
       standardName, or an alias.  If an aspect is also supplied, then it
       indicates whether that name has any annotations in that aspect only.

       NB: API change.

       This is now case insensitive.  If you provide abC1, and ABc1 has
       annotation, then it will return true.

       Usage :

	   if ($annotationParser->nameIsAnnotated(name => $name)){

	       # blah

	   }

       or:

	   if ($annotationParser->nameIsAnnotated(name	 => $name,
						  aspect => $aspect)){

	       # blah

	   }

Other public methods
   databaseName
       This method returns the name of the annotating authority from the file
       that was supplied to the constructor.

       Usage :

	   my $databaseName = $annotationParser->databaseName();

   numAnnotatedGenes
       This method returns the number of entities in the annotation file that
       have annotations in the supplied aspect.	 If no aspect is provided,
       then it will return the number of genes with an annotation in at least
       one aspect of GO.

       Usage:

	   my $numAnnotatedGenes = $annotationParser->numAnnotatedGenes();

	   my $numAnnotatedGenes = $annotationParser->numAnnotatedGenes($aspect);
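
       Both call forms can be combined to tabulate counts for every aspect.
       The helper annotatedCountsByAspect is hypothetical, and the 'any' key
       is simply this sketch's label for the no-argument form.

```perl
use strict;
use warnings;

# Collect the number of annotated entities per aspect, plus the overall count.
# $annotationParser is expected to be a GO::AnnotationProvider::AnnotationParser.
sub annotatedCountsByAspect {
    my ($annotationParser) = @_;

    # No aspect: entities annotated in at least one aspect
    my %counts = (any => $annotationParser->numAnnotatedGenes());

    # One call per aspect: process, function, component
    $counts{$_} = $annotationParser->numAnnotatedGenes($_) for qw(P F C);

    return \%counts;
}

# Usage, given an existing parser object:
#   my $counts = annotatedCountsByAspect($annotationParser);
#   print "$_: $counts->{$_}\n" for qw(P F C any);
```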

   allDatabaseIds
       This public method returns an array of all the database identifiers.

       Usage:

	   my @databaseIds = $annotationParser->allDatabaseIds();

   allStandardNames
       This public method returns an array of all standard names.

       Usage:

	   my @standardNames = $annotationParser->allStandardNames();

Methods to do with files
   file
       This method returns the name of the file that was used to instantiate
       the object.

       Usage:

	   my $file = $annotationParser->file;

   serializeToDisk
       This public method saves the current state of the Annotation Parser
       Object to a file, using the Storable package.  The data are saved in
       network order for portability, just in case.  The name of the object
       file is returned.  By default, the name of the original file will be
       used to make the name of the object file (including the full path from
       where the file came), or the client can instead supply their own
       filename.

       Usage:

	   my $fileName = $annotationParser->serializeToDisk;

	   my $fileName = $annotationParser->serializeToDisk(filename => $filename);
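
       For readers unfamiliar with Storable, the round trip can be sketched
       on a plain hash.  Storable's nstore function writes in network byte
       order; whether serializeToDisk calls nstore specifically is an
       assumption here, and the %state contents are made up.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempfile);

# A made-up stand-in for the parser's internal state
my %state = (
    databaseName => 'SGD',
    names        => { aat2 => ['AAT2'] },
);

# Temporary file, removed automatically when the program exits
my ($fh, $objectFile) = tempfile(UNLINK => 1);
close $fh;

# nstore writes the structure in network (portable) byte order
nstore(\%state, $objectFile);

# retrieve reads it back as a reference to an equivalent structure
my $restored = retrieve($objectFile);
print "Restored database name: $restored->{databaseName}\n";
```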

Modifications
       CVS info is listed here:

	# $Author: sherlock $
	# $Date: 2008/05/13 23:06:16 $
	# $Log: AnnotationParser.pm,v $
	# Revision 1.35	 2008/05/13 23:06:16  sherlock
	# updated to fix bug with querying with a name that was unambiguous when
	# taking its casing into account.
	#
	# Revision 1.34	 2007/03/18 03:09:05  sherlock
	# couple of PerlCritic suggested improvements, and an extra check to
	# make sure that the cardinality between standard names and database ids
	# is 1:1
	#
	# Revision 1.33	 2006/07/28 00:02:14  sherlock
	# fixed a couple of typos
	#
	# Revision 1.32	 2004/07/28 17:12:10  sherlock
	# bumped version
	#
	# Revision 1.31	 2004/07/28 17:03:49  sherlock
	# fixed bugs when calling goidsByDatabaseId instead of goIdsByDatabaseId
	# on lines 1592 and 1617 - thanks to lfriedl@cs.umass.edu for spotting this.
	#
	# Revision 1.30	 2003/11/26 18:44:28  sherlock
	# finished making all the changes that were required to make it case
	# insensitive, and modified POD accordingly.  It appears to all work as
	# expected...
	#
	# Revision 1.29	 2003/11/22 00:05:05  sherlock
	# made a very large number of changes to make much of it
	# case-insensitive, such that using CDC6 or cdc6 amounts to the same
	# query, as long as both versions of that name don't exist in the
	# annotations file.  Still needs a little work to allow names that are
	# potentially ambiguous to be not ambiguous, if their casing matches
	# exactly one form of the name that has been seen.  Have started to
	# update test suite to check all the case insensitive stuff, but is not
	# yet finished.
	#
	#

AUTHORS
       Elizabeth Boyle, ell@mit.edu

       Gavin Sherlock,	sherlock@genome.stanford.edu

perl v5.14.1			  2GO::AnnotationProvider::AnnotationParser(3)