Gene Ontology (GO) Browser (v1.0) – Primer

Table of contents

  1.       Introduction
  2.       Anatomy of the GO Browser
  3.       What is GO?
  4.       The Structure of GO terms
  5.       Definition of Terminology (v. important)
  6.       Understanding association and relationship statistics (v. important)
  7.       The functions and what they do
  8.       Supported biological items and how they link
  9.       Behind the scenes
1. Introduction

The GO browser provides an interface for analyzing the qualities and behaviour of multiple biological items using GO terms. Such biological items may be genes, proteins or any other data item with a reference to the GO database.

Currently the GO Browser supports the import of items from databases such as LocusLink, MGC, RefSeq & SwissProt. Later versions will be extended to support the import of other biological items, including SAGE tags.

The browser is not designed for the detailed investigation of individual GO terms. Other tools such as AmiGO (www.godatabase.org) provide a better representation of specific terms. Instead the GO Browser provides a 'macro view' which represents how sets of biological items (such as genes, proteins, etc) relate to GO terms.

This primer document is intended to introduce the user to the basic concepts of the Gene Ontology project. The user will become acquainted with the functionality made available by the GO browser and will gain a detailed understanding of how data is represented in the browser view.

2. Anatomy of the GO Browser

This section includes a group of images to orient the new user to the interface of the GO browser:



Figure 1) An image of the GO browser. The user has dragged these 9 biological items onto the main view of the browser. In this case RefSeq genes, from other windows in DISCOVERYspace, have been imported into the GO browser. The user is about to execute a function; she has selected all of the items in the view and is about to select the function 'Show Items with associations' from the list box.



Figure 2) An image of the GO browser. The view displays the set of terms 'related to' the selected items from figure 1. This was achieved by using the function 'Show related terms'. The browser scores all the terms to represent the strength of their relationships and associations with the previous set of selected items. For example, from the top row one can tell that only 88.89% of the RefSeq items in figure 1 are related to the GO database.

Importing items into the GO browser


Figure 1 shows a GO browser view which has been populated with biological items. Importing items into the browser is done by selecting and dragging items from the data viewer of DISCOVERYspace onto the browser. This action should be familiar to existing users of DISCOVERYspace. Terms and items can be exported from the browser by selecting and dragging them onto the DISCOVERYspace desktop.

About backwards and forwards buttons

Like other browser-like interfaces (for example Internet Explorer, Netscape, etc), the GO browser maintains a history of previous views within this browser session.

The browser history is based upon the fact that each resulting view has a source view which preceded it; thus the progression of the browsing session can be analyzed. As each function is executed the existing source view is stored and the results of the function are displayed in a new resulting view. If the user wishes to backtrack to a source view she may do so via the backwards button. She will be able to return to the resulting view via the forwards button.

If a function is executed from a previous view then a new result view is created from that previous view, and this new result view replaces the existing one. For example, executing a function on view A produces view B. Going back to view A and executing a new function on it produces view C. View B is then no longer recoverable via the backwards or the forwards buttons.
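The history behaviour described above can be sketched in a few lines of Python. This is an illustration only, not the browser's actual implementation; the class name `ViewHistory` and its methods are hypothetical, and views are represented by plain strings.

```python
class ViewHistory:
    """Minimal back/forward history; each executed function adds a result view."""

    def __init__(self, initial_view):
        self._views = [initial_view]  # views in the order they were produced
        self._index = 0               # position of the view currently displayed

    @property
    def current(self):
        return self._views[self._index]

    def execute(self, result_view):
        # Discard any views "ahead" of the current one, then append the new result.
        del self._views[self._index + 1:]
        self._views.append(result_view)
        self._index += 1

    def back(self):
        if self._index > 0:  # does nothing when there is no previous view
            self._index -= 1
        return self.current

    def forward(self):
        if self._index < len(self._views) - 1:  # does nothing when there is no later view
            self._index += 1
        return self.current


history = ViewHistory("A")
history.execute("B")  # a function on view A produces view B
history.back()        # return to view A
history.execute("C")  # a new function on view A produces view C; B is dropped
```

After these calls the history holds only views A and C, which matches the example in the text: view B can no longer be reached with either button.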



Figure 3) A detail from the toolbar of the GO browser. From left to right: the Backwards button, the Forwards button, the Select All button, the Toggle Selection button, the Remove Selection button, the Print image to file button, the New Browser button and, finally, the Select function list box. The user executes functions by making her selection from the available items/terms and then selecting an available function from this list box.

The controls of the toolbar


Figure 3 (above) shows the controls available in the GO browser interface. The next section explains those controls in more detail:

The Backwards button
The backwards button returns the user to the source view of the current view, unless there was no previous view.

The Forwards button
The counterpart to the backwards button. If the user has used the backwards button to review previous information then she may wish to return to subsequent resulting views. The forwards button returns the user to the result view subsequent to the current view, unless there was no subsequent view.

The Select All button
Functions are executed on the selection of rows from the current view. It is often the case that the user will wish to select all records in the current view. Use the Select All button for this purpose.

The Toggle Selection button
This button deselects all of the currently selected rows and selects all deselected ones. If no rows are selected then all rows will be selected and if all are selected then all will be deselected.

The Remove Selected button
All selected rows are removed from the current view. This is done by creating a new view with only the retained items. Thus the original set of rows is still available via the backwards button.

The Print image to file button
This saves a JPEG copy of the current view to file. This will include all of the view and not just the portion viewable via the scrollbar.

The New Browser button
Creates an exact copy of the current browser, including all previous and subsequent views.

The Select Function list box
Functions execute on the selection from the current view. The available functions depend on the type of the current view: terms or biological items. A new view is created to display the results of the function. Thus the previous, source view is always available via the backwards button.

If no items are selected from the current view then the function will still execute but will potentially return no results. Remember to select items before executing a function.

Functions are explained in detail in section 7, The functions and what they do.

3. What is GO?

The GO (Gene Ontology) database does not itself house or define gene product data:

GO is not a database of gene sequences, nor a catalog of gene products. Rather, GO describes how gene products behave in a cellular context.

GO defines a shared vocabulary of terms which are referenced and linked by other biological databases. By referencing GO, disparate biological items can be linked via the terms that they share. The GO database itself contains some references to items in other databases (e.g. PFAM, INTERPRO, etc.). However, the real power lies in other database entities referencing GO.

Of particular utility is the LocusLink database, which heavily annotates its genetic loci records using GO terminology. LocusLink links to and is linked by numerous other databases and thus by inference links those databases to GO. RefSeq, MGC and UNIGENE all reference GO terms via LocusLink. Other databases such as SWISSPROT can be linked to GO via other data sources such as the GOA (GO Annotation) project.

The GO Browser uses such cross-references extensively to link a given biological item to associated GO terms. The exact methodologies used when linking particular classes of item to GO terms are enumerated in section 8, Supported biological items and how they link.

At its core the GO Browser uses data from the Gene Ontology Consortium (www.geneontology.org). The GO website houses extensive documentation about the project and its processes.

4. The Structure of GO Terms

As noted previously, the GO browser is not primarily designed to investigate individual GO terms but to describe how biological items relate to GO terminology. Therefore much of the structure of how terms interrelate has been slightly abstracted away; when viewing potentially thousands of relationships at one time, properties of individual relationships become less important.

Nonetheless, users of the GO browser should have some grasp of how GO terminology is structured in order to have an accurate understanding of the data being displayed.

The GO database specifies a network of biological terms. According to the GO Consortium:

GO terms are organized in structures called directed acyclic graphs (DAGs), which differ from hierarchies in that a 'child' (more specialized term) can have many 'parents' (less specialized terms).

This property, that 'child' terms may have many 'parent' terms, is key to the power of the GO methodology: a term may be defined by reference to many other terms. However, this also means that the network of terms is extremely difficult to visualize (unlike a standard tree graph).

In addition, GO defines two kinds of parent-child relationship: 'is a' relationships and 'part of' relationships. Thus a term can be described by the fact that it 'is a' specialization and/or is a 'part of' other terms:



Figure 4) In the diagram above term 5749 has three parent terms; a 'part of' parent (5746) and two 'is a' parents (45283, 45257).
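As an illustration, the relationships in Figure 4 could be held in a structure like the one below. This is a sketch under assumptions: the edge list, the `parents` and `children` maps and the helper `parents_of` are hypothetical names chosen for this example, and the relationship labels `is_a` and `part_of` are simply string tags. Only the term numbers come from Figure 4.

```python
from collections import defaultdict

# (child, relationship, parent) edges taken from Figure 4.
EDGES = [
    (5749, "part_of", 5746),
    (5749, "is_a", 45283),
    (5749, "is_a", 45257),
]

parents = defaultdict(list)   # child id -> [(relationship, parent id), ...]
children = defaultdict(list)  # parent id -> [(relationship, child id), ...]
for child, relationship, parent in EDGES:
    parents[child].append((relationship, parent))
    children[parent].append((relationship, child))


def parents_of(term, relationship=None):
    """Return the parent ids of `term`, optionally limited to one relationship kind."""
    return sorted(p for r, p in parents[term] if relationship in (None, r))


all_parents = parents_of(5749)              # every parent, whatever the relationship
is_a_parents = parents_of(5749, "is_a")     # only the 'is a' parents
part_of_parents = parents_of(5749, "part_of")
```

Because a term maps to a list of parents rather than to a single parent, the structure is a graph and not a tree, which is the property the GO Consortium quotation describes.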

In the 'macro view' that the GO browser offers, such 'is a' and 'part of' information is mostly obscured. Certain functions – such as Show IS A ancestor terms and Show PART OF ancestor terms – do utilize this information and will be discussed in section 7, The functions and what they do.

At this stage it is worth introducing the terms at the root of the GO hierarchy. The primogenitor, the root parent, of all GO terms is 'Gene_Ontology' (3673). All GO terms are 'part of' this term. The children of 'Gene_Ontology' are 'biological_process' (8150), 'molecular_function' (3674) and 'cellular_component' (5575):

The three organizing principles of GO are molecular function, biological process and cellular component. A gene product has one or more molecular functions and is used in one or more biological processes; it might be associated with one or more cellular components. For example, the gene product cytochrome c can be described by the molecular function term electron transporter activity, the biological process terms oxidative phosphorylation and induction of cell death, and the cellular component terms mitochondrial matrix and mitochondrial inner membrane.

The purpose of this document is not to define the GO project. For a more thorough introduction to the GO structure please refer to the GO Consortium's 'Introduction to Gene Ontology' (http://www.geneontology.org/doc/GO.doc.html).

5. Definition of terminology

GO itself links terms to 'gene products'. The GO browser allows the user to import gene records as well as protein records and, in future versions, SAGE tags. The generic term used to define these importable records is biological items. In addition, the logical algorithms made available to the user are termed functions.

The relationships between GO terms are defined in terms of parents (more general terms) and children (more specialized terms). The browser makes heavy use of the word ancestor to refer to all the parents of all the parents, recursively, of a given term. This concept of ancestor terms allows the browser to display ancestral commonalities between terms. The antonym of ancestor is descendant, which is used to describe the children of the children, recursively.
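The recursive definitions of ancestor and descendant can be sketched as follows. This is an illustration rather than the browser's own code. The three children of 'Gene_Ontology' are those named in section 4; `example_term` is a hypothetical term with two parents, added only to show the multiple-parent case, and the function names are likewise assumptions.

```python
# term -> list of parent terms (relationship kinds are ignored in this sketch)
PARENTS = {
    "Gene_Ontology": [],
    "biological_process": ["Gene_Ontology"],
    "molecular_function": ["Gene_Ontology"],
    "cellular_component": ["Gene_Ontology"],
    # Hypothetical term with two parents, to show the multiple-parent case.
    "example_term": ["biological_process", "cellular_component"],
}


def ancestors(term, parents=PARENTS):
    """All parents of all parents, recursively, of `term`."""
    found = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in found:  # a shared ancestor is only visited once
            found.add(current)
            stack.extend(parents.get(current, []))
    return found


def descendants(term, parents=PARENTS):
    """Every term that has `term` among its ancestors."""
    return {t for t in parents if term in ancestors(t, parents)}


example_ancestors = ancestors("example_term")
root_descendants = descendants("Gene_Ontology")
```

Here `example_ancestors` contains both parents plus the root, and the root is counted once even though it is reachable along two paths.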

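In order to use the GO browser it is vital to understand two key definitions which are used widely throughout the application: associated and related. In the context of the GO browser these words are given particular meanings. As the words are used throughout this primer:

Associated: an item is associated with a term when the item is directly annotated with that term.
Related: an item is related to a term when it is associated with that term or with any descendant of that term. A related term is therefore either an associated term or an ancestor of one.

The GO browser uses this terminology to describe its functionality. Users should be comfortable with the terminology before using the browser.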

6. Understanding association and relationship statistics

When displaying potentially thousands of GO terms related to a given set of biological items it is vital that there be some concept of 'more related' in order to sort the GO terms by relevance. To do this the browser scores each term against the last selected set of items: by how it is associated with the set of items and by how it is related to them. Thus each term is displayed with two scores: Associated(%) and Related(%). These values are also represented in a bar graph; the blue bar is for percentage associated and the red bar for percentage related.



Figure 5) In the image above one can see that the term 'transport' is related to 77.78% of the last selected items, and is associated with 44.44% of them. The term 'synaptic transmission' has no relationships other than direct associations, while 'coated vesicle' has no associations but is related to over half of the set.
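The two percentages can be reproduced with a small worked example. The sketch below assumes that an item is associated with a term when it is annotated with it directly, and related to a term when it is annotated with that term or with one of its descendants. The item names, the term `child_of_transport` and the function `percentages` are hypothetical; the nine-item set is arranged so that the figures quoted for 'transport' in Figure 5 fall out of the arithmetic (4/9 and 7/9).

```python
def percentages(term, selected_items, associations, ancestors):
    """Return (associated %, related %) of `term` against the selected items.

    associations: item -> set of terms the item is directly annotated with
    ancestors:    term -> set of all ancestor terms of that term
    """
    associated = related = 0
    for item in selected_items:
        direct = associations.get(item, set())
        if term in direct:
            associated += 1
        # Related: annotated with the term itself or with one of its descendants.
        if any(term == t or term in ancestors.get(t, set()) for t in direct):
            related += 1
    total = len(selected_items)
    return round(100 * associated / total, 2), round(100 * related / total, 2)


# Hypothetical data: nine items, one of which has no GO annotation at all.
ANCESTORS = {
    "transport": {"Gene_Ontology"},
    "child_of_transport": {"transport", "Gene_Ontology"},
    "other_term": {"Gene_Ontology"},
}
ASSOCIATIONS = {f"item{i}": {"transport"} for i in range(1, 5)}
ASSOCIATIONS.update({f"item{i}": {"child_of_transport"} for i in range(5, 8)})
ASSOCIATIONS["item8"] = {"other_term"}
ASSOCIATIONS["item9"] = set()
ITEMS = sorted(ASSOCIATIONS)

transport_scores = percentages("transport", ITEMS, ASSOCIATIONS, ANCESTORS)
root_scores = percentages("Gene_Ontology", ITEMS, ASSOCIATIONS, ANCESTORS)
```

With this data 'transport' scores 44.44% associated and 77.78% related, and the root term is related to 8 of the 9 items (88.89%) because one item has no annotation.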



Figure 6a) Before the screenshot above was taken, the user dragged a set of REFSEQ items onto the browser. She then isolated 1190 items using the 'Show items with associations' function. In this resulting set she has selected the first 10 items.



Figure 6b) The image above shows the result of the 'Show Associated Terms' function when executed on the selection from figure 6a. All terms which are directly associated with the selected items are displayed. Rows are ordered by percentage of association to the selected items, with the most associated terms first. Notice that the top two terms have relations to items in addition to the ones that they are directly associated with.



Figure 6c) The image above shows the result of the 'Show Related Terms' function when executed on the selection from figure 6a. All terms which are related to the selected items are displayed. Terms are ordered by percentage of selected items they are related to, with the most related terms first. Notice that the root term 'Gene_Ontology' is related to 100% of the items; prior to figure 6a, all items without associations were removed from the set using the 'Show items with associations' function.



Figure 7a) Figures 6a-c provided an example case of selecting multiple items. Notice in this image that only one item has been selected.



Figure 7b) The image above shows the result of the 'Show Related Terms' function when executed on the selection from figure 7a. All terms which are related to the selected item are displayed. Remember, the scoring of terms is based upon the last selected set of items; in this case REFSEQ 27764866 'synaptophysin'. Because there is only one item in the set, all resulting terms are related to that one item. Thus all terms are 100% associated/related.

7. The functions and what they do

The type (item or term) displayed in the current view determines which functions are available. The functions are executed by selecting one of the entries in the Select Function list box. All functions execute on the current row selection. Before reading this section please ensure that you are familiar with the terms as outlined in section 5, Definition of Terminology.

All views of GO terms will be ordered by percentage related unless otherwise noted. All terms are scored by the last selected set of biological items.

Item Functions
Functions available when the current view displays biological items:

Show associated terms
Displays the terms directly associated with the currently selected items. The resulting view will be ordered by percentage association.

Show related terms
Displays the terms related to the currently selected items.

Show items with associations
Displays only those items which have associations with GO terms.

Show items without associations
Displays only those items which do not have associations with GO terms.

Term Functions
Functions available when the current view displays GO terms:

Show associated items
Displays only those items, from the last selected set of items, that have direct associations with the current selection of GO terms.

Show related items
Displays only those items, from the last selected set of items, that are related to the current selection of GO terms.

Show parent terms
Displays all parent terms of the selected GO terms. Both 'is a' and 'part of' parents are returned.

Show ancestor terms
Displays all ancestor terms of the selected GO terms.

Show children with related items
Displays those child terms, of the current GO selection, that are related to the last selected set of items.

Show all children
Displays all child terms of the current GO term selection, regardless of whether they have relationships to the last selected set of items. This function may take some time as it potentially has to requery the database to load child terms without relationships.

Show unassociated items
Displays only those items, from the last selected set of items, that are NOT associated with the current selection of GO terms.

Show unrelated items
Displays only those items, from the last selected set of items, that are NOT related to the current selection of GO terms.

Show IS A ancestor terms
This function returns all ancestors via IS A relationships. Thus the function returns all terms which describe what a selected term is, but NOT what it is part of.

Show PART OF ancestor terms
This function returns the ancestors of the selected terms that are reached via PART OF relationships. Note that ancestors which are reached only via IS A relationships are not displayed.

Therefore, if the function is executed on term A, then term B, which is a parent of term A via an IS A relationship, will not be returned. However, term C, which is a parent of B via a PART OF relationship, will be returned: A is a B, which is part of C, so A is part of C.

As all terms are PART OF the root term 'Gene_Ontology', this function will ALWAYS return at least the root term.
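The difference between the two ancestor functions can be sketched as a graph walk that remembers whether a PART OF relationship has been crossed. This is one plausible reading of the description above, not the browser's confirmed algorithm: it assumes that an IS A ancestor is reached through IS A relationships only, and that a PART OF ancestor is reached through a path containing at least one PART OF relationship. The function name `typed_ancestors` and the sample data are hypothetical.

```python
def typed_ancestors(term, parents):
    """Return (is_a_ancestors, part_of_ancestors) for `term`.

    parents: term -> list of (relationship, parent) pairs, where the
    relationship is the string "is_a" or "part_of".
    """
    is_a, part_of = set(), set()
    stack = [(term, False)]  # (current term, has the path crossed a part_of edge?)
    seen = set()
    while stack:
        current, via_part_of = stack.pop()
        for relationship, parent in parents.get(current, []):
            state = (parent, via_part_of or relationship == "part_of")
            if state in seen:
                continue
            seen.add(state)
            (part_of if state[1] else is_a).add(parent)
            stack.append(state)
    return is_a, part_of


# The A, B, C example from the text, with C assumed to be part of the root term.
PARENTS = {
    "A": [("is_a", "B")],
    "B": [("part_of", "C")],
    "C": [("part_of", "Gene_Ontology")],
}
is_a_ancestors, part_of_ancestors = typed_ancestors("A", PARENTS)
```

For term A this yields B as the only IS A ancestor, and C and the root term as PART OF ancestors. Under this reading a term can appear in both sets if it is reachable along two different kinds of path.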

8. Supported biological items and how they link

This section will list the biological items which are currently supported by the GO browser and will indicate the methodologies used to link them to GO terms:

LocusLink
LocusLink records have extensive GO term annotation and link directly to GO.

SwissProt
SwissProt records are linked to GO via references carried by the GO database. Future versions will utilize references provided by the GOA project.

Interpro
Interpro records are linked to GO via references carried by the GO database.

RefSeq
RefSeq records hold references to LocusLinks. The LocusLinks are then used to link the RefSeq record to GO.

MGC
MGC records hold references to LocusLinks. The LocusLinks are then used to link the MGC record to GO.

Unigene
Unigene records hold references to LocusLinks. The LocusLinks are then used to link the Unigene record to GO.
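The linking rules listed above amount to either a direct lookup or a two-step lookup through LocusLink. The sketch below illustrates that idea only. The lookup tables stand in for database cross-references, and every identifier in them (`LL:1`, `R1`, `GO:a` and so on) is a made-up placeholder, as is the function name `associated_terms`.

```python
# Hypothetical lookup tables standing in for database cross-references.
LOCUSLINK_TO_GO = {"LL:1": {"GO:a", "GO:b"}, "LL:2": {"GO:c"}}
ITEM_TO_LOCUSLINK = {
    ("RefSeq", "R1"): {"LL:1"},
    ("MGC", "M1"): {"LL:1", "LL:2"},
    ("Unigene", "U1"): set(),
}
GO_XREFS = {("SwissProt", "S1"): {"GO:b"}, ("Interpro", "I1"): {"GO:c"}}

VIA_LOCUSLINK = {"RefSeq", "MGC", "Unigene"}  # linked to GO through LocusLink
VIA_GO_XREFS = {"SwissProt", "Interpro"}      # linked by references held in GO


def associated_terms(database, identifier):
    """Return the GO terms directly associated with one biological item."""
    if database == "LocusLink":
        return set(LOCUSLINK_TO_GO.get(identifier, set()))
    if database in VIA_LOCUSLINK:
        terms = set()
        for locus in ITEM_TO_LOCUSLINK.get((database, identifier), set()):
            terms |= LOCUSLINK_TO_GO.get(locus, set())
        return terms
    if database in VIA_GO_XREFS:
        return set(GO_XREFS.get((database, identifier), set()))
    raise ValueError(f"unsupported item type: {database}")


mgc_terms = associated_terms("MGC", "M1")
unigene_terms = associated_terms("Unigene", "U1")
```

An item that references several LocusLinks collects the terms of all of them, and an item with no LocusLink reference ends up with no associations.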

9. Behind the scenes

The GO browser is backed by a database connection from which it retrieves GO terms and links them to biological items. Thus the browser is entirely dependent upon the correct configuration of the database connection and the continuing health of the network connection and the database. In DISCOVERYspace the browser shares the connection with the DISCOVERYdb plugin.

When the user drags and drops biological items onto the browser there is much work going on behind the scenes. First, the items are linked to GO terms (see section 8 above), which involves a database query for each item imported. Then the associated GO terms are themselves loaded, each term involving multiple queries. Finally, all ancestor terms of the associated terms are loaded. Depending on the number of items and terms involved, this can take time. Subsequent functions selected by the user will not execute until these steps have completed.

Once this process is complete all terms related to the imported objects are held in memory; no further database querying is necessary. Functions executed by the user query only the in-memory network of items and terms. The one exception to this is the Show all children function which can reference terms other than those related to the imported items.
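The loading steps described in this section can be sketched as follows. This is a simplified illustration and not the browser's implementation: the two `fetch_` functions are hypothetical stand-ins for database queries, the data is invented, and each term is loaded with a single simulated query although the real browser issues several per term.

```python
# Hypothetical stand-ins for the database: one lookup per item, one per term.
ITEM_TERMS = {"item1": ["t3"], "item2": ["t2", "t3"], "item3": []}
TERM_PARENTS = {"t3": ["t2", "t1"], "t2": ["t1"], "t1": []}

queries = []  # records each simulated database query


def fetch_associations(item):
    queries.append(("item", item))
    return ITEM_TERMS.get(item, [])


def fetch_parents(term):
    queries.append(("term", term))
    return TERM_PARENTS.get(term, [])


def load_network(items):
    """Link items to terms, then load those terms and all of their ancestors."""
    # Step 1: one query per imported item.
    associations = {item: set(fetch_associations(item)) for item in items}
    # Steps 2 and 3: load each associated term, then walk up to its ancestors.
    parents = {}
    pending = [term for terms in associations.values() for term in terms]
    while pending:
        term = pending.pop()
        if term not in parents:  # each term is fetched only once
            parents[term] = list(fetch_parents(term))
            pending.extend(parents[term])
    return associations, parents


associations, parents = load_network(["item1", "item2", "item3"])
```

Once `load_network` returns, the `associations` and `parents` maps can answer every item and term function without further queries, which mirrors the in-memory behaviour described above.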

The browser has been tested with over 5000 imported items and works well at that scale. Be patient when importing large numbers of items, because the database queries will inevitably take time to run.

Any errors occurring will be output to the log file. If the browser becomes unresponsive then please close it and try again. If problems persist then please send a bug report via DISCOVERYspace.


NEIL ROBERTSON 21 NOV 2003 (Revision 1.0)