The SAGE Plugin

DISCOVERY Platform

User's Manual Version 1.0

Gene Expression Bioinformatics Team
Canada's Michael Smith Genome Sciences Centre
BC Cancer Research Centre
BC Cancer Agency
Vancouver, BC, Canada

Copyright © 2003

1. Change Record

Version Date Author Comment
0.1 20-Aug-2003 Scott Zuyderduyn (scottz@bcgsc.ca) Initial draft.
1.0 9-Sept-2003 Chris Fjell (cfjell@bcgsc.ca) Final Draft. Includes some text kindly contributed by Greg Vatcher.

2. Intended Audience

This document describes the functionality and usage of the SAGE Plugin.  This is an independent add-on for the DISCOVERYspace software that provides specific features for serial analysis of gene expression (SAGE) data.  The intended audience is the end-user.

3. Table of Contents

(not yet compiled)

You may see these icons marking text within the documentation:

An additional explanation on a particular task.
An additional explanation on a particular feature.
Non-critical information of general interest.
Information for DISCOVERY Platform administrators.
A tip on how to utilize a particular feature.

4. Getting Started Quickly

If you are already familiar with the DISCOVERYspace software, you can jump right into this document.  If you are new to DISCOVERYspace, it is a good idea to read the DISCOVERYspace User's Manual first, because this manual assumes a working knowledge of the DISCOVERYspace software.

5. Overview

The DISCOVERY Platform is a comprehensive set of software tools to store, visualize, and manipulate genomic data. The Platform consists of several components:

This document is organized around the feature set of the DISCOVERY Platform rather than around specific tasks.  For task-oriented instructions, the user is encouraged to read through Chapter 11 How-To (page 43).

6. Hardware and Software Requirements

6.1 The SAGE Plugin

Minimum

The SAGE Plugin requires a copy of DISCOVERYspace 3.0 to be installed on the user's computer.  Hardware requirements for DISCOVERYspace can be found in the DISCOVERYspace User's Manual.

Recommended

>1 GB RAM if a large number of SAGE libraries (>10) will be analyzed concurrently.

A common question is why DISCOVERYspace requires so much memory, especially for SAGE analyses.  The answer lies in how Java programs run.  Java source code is compiled to an intermediate form (bytecode, stored in .class files) rather than to machine language (the .exe files you normally run on the Windows OS are executable machine language files).  This is because Java is cross-platform: to maintain compatibility across operating systems, the code cannot be compiled to the native language of any one OS.  Instead, the bytecode is interpreted by the Java Virtual Machine (JVM), which provides a layer between the compiled code and the operating system.  The JVM requires processing power to do this, so Java programs typically run a little slower.

For this approach to work, every object in the system carries a lot of associated information, which increases the minimum amount of memory an object occupies.  Without getting into too much detail, the result is that a string of length 10 (a SAGE tag, for example) would normally occupy 10 bytes, whereas a Java string requires about 80 bytes of initial memory plus the size of the string itself, for a total of about 90 bytes.  Thus, at a minimum, about 9 times as much memory is required to store a SAGE library in Java.  Additional memory is also required to optimize further operations on these objects within DISCOVERYspace (linking to other objects, etc.).  The trade-off for these increased hardware requirements is that Java development is typically much quicker, and the software can run on any OS (Windows/Mac/Linux/etc.) that has a JVM available.
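The memory arithmetic above can be sketched in a few lines.  This is an illustration only, written in Python rather than Java.  The 80-byte overhead is the approximate figure quoted above; the function name and the library size of 50,000 tags are hypothetical.

```python
# Rough estimate of the memory needed to hold SAGE tags as Java strings.
# OVERHEAD_BYTES is the approximate per-string figure quoted in the text;
# the library size used below is a made-up example.

OVERHEAD_BYTES = 80   # approximate fixed cost of one Java string object
TAG_LENGTH = 10       # characters in one SAGE tag, one byte each


def estimate_tag_memory(num_tags, tag_length=TAG_LENGTH):
    """Return (raw_bytes, java_bytes) for num_tags tags of tag_length characters."""
    raw = num_tags * tag_length
    java = num_tags * (OVERHEAD_BYTES + tag_length)
    return raw, java


raw, java = estimate_tag_memory(50_000)
print(raw, java, java / raw)  # 500000 4500000 9.0
```

The estimate covers only the tag strings themselves.  It does not include the additional memory DISCOVERYspace uses to link objects together.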

7. Definitions

The following terms are used in this document to refer to DISCOVERY Platform features.  These include application-specific terms to clarify software terms for the general user and administrator.

7.1 Dataset

A specific selection of data from a single datasource.  This is typically the set of items returned by a search against a datasource.

7.2 Datasource

A set of related data from a single source.  For example, LocusLink is a datasource, containing a set of data with relationships between the data fields determined by the administrators of LocusLink.

7.3 Right click (Left click) actions

User actions performed with the left and right mouse buttons.  Other pointing devices may be configured differently: left-click is sometimes called simply select or primary select, while right-click is also called alternate select.  The terms left-click and right-click are used in this document for brevity.  The buttons that perform these actions may also differ between operating systems.

8. Installation

8.1 SAGE Plugin Installation

The SAGE Plugin is installed by adding the distribution files to the plugins/ directory, which resides in the directory where DISCOVERYspace is installed.  For example, if DISCOVERYspace has been installed to C:\Program Files\DISCOVERYspace3, then the SAGE Plugin distribution files would be placed in the C:\Program Files\DISCOVERYspace3\plugins directory.
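The copy step described above can also be scripted.  The following is a minimal sketch and not part of the SAGE Plugin distribution: the function name and the example paths are hypothetical, and it assumes the distribution files sit together in one directory.

```python
import shutil
from pathlib import Path


def install_plugin(distribution_dir, discoveryspace_dir):
    """Copy the files in distribution_dir into <discoveryspace_dir>/plugins.

    Returns the names of the copied files, sorted alphabetically.
    """
    plugins_dir = Path(discoveryspace_dir) / "plugins"
    if not plugins_dir.is_dir():
        # The plugins directory is expected to exist in a DISCOVERYspace install.
        raise FileNotFoundError(f"No plugins directory at {plugins_dir}")
    copied = []
    for item in sorted(Path(distribution_dir).iterdir()):
        if item.is_file():
            shutil.copy2(item, plugins_dir / item.name)
            copied.append(item.name)
    return copied


# Example call (both paths are illustrative):
# install_plugin(r"C:\Downloads\sage-plugin", r"C:\Program Files\DISCOVERYspace3")
```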

8.2 Confirming Installation

When DISCOVERYspace is started, the application looks for installed plugins.  You can confirm that the SAGE Plugin is correctly installed and loaded by noting its appearance on the startup splash screen (Figure 8.2).


Figure 8.2 The startup splash screen will show which plugins have been installed.

9. SAGE Plugin Software Reference

This section describes the different components of the SAGE Plugin.  If you are familiar with the SAGE Plugin and are primarily interested in learning how to do a specific task, you may want to jump to the next section.

9.1 SAGE Plugin Application

On start-up, the main application window will be displayed.  When the SAGE Plugin is installed, several modifications to the interface can be noted.  These are described in detail below:

9.1.1 Status Bar

The SAGE Control Panel toggle button is located at the bottom left of the main frame, on the right side of the status bar (Figure 9.1.1).  Clicking this button toggles the visibility of the SAGE Control Panel, which provides easy access to all saved or defined SAGE library definitions and expression profiles (see SAGE Control Panel).


Figure 9.1.1 The SAGE Control Panel toggle button (depicted with three blue cartoon SAGE tags).

9.1.2 Menu Bar

When the SAGE Plugin is installed, additional options become available on the application's Menu Bar.

9.1.2.1 Tools > SAGE Menu

The Tools > SAGE submenu contains the following menu items:

Menu Item - Description

Define a GEO Library - Opens the widget to define a new GEO library.
View Defined Libraries and Comparisons - Toggles the visibility of the SAGE Control Panel.
Define New Library - Opens the widget to define a new SAGE library.
Define New Library Comparison - Opens the widget to define a new pair-wise expression profile.
Multi-Library Set Manipulations - Opens the "Venn Table" widget for set manipulations of multiple libraries.
Search For Tag In Libraries - Opens the "Search For Tag In Libraries" widget that looks for the expression of a given tag amongst available SAGE libraries.

9.1.3 SAGE Toolbar

The SAGE toolbar provides shortcut access to common operations for SAGE analysis.  This toolbar is only visible when the View > Toolbars > SAGE option is selected.  By default, the SAGE toolbar is not visible.

Name - Description

Define New Library - Opens the widget to define a new SAGE library.
View Comparison Details - View an overview of the mathematical properties of the currently loaded expression profile.
2D Expression Profile - View the current expression profile using a 2D graph.
3D Expression Profile - View the current expression profile using an interactive 3D graph.
Multi-Library Set Manipulations - Opens the "Venn Table" widget for set manipulations of multiple libraries.

Table 9.4 SAGE Toolbar Buttons

9.2 Defining and Loading Libraries

SAGE libraries are loaded using the dialogs accessible from the Tools > SAGE > Define New Library menu or the Tools > SAGE > GEO > Define a GEO Library menu. The following dialog appears when Tools > SAGE > Define New Library is selected (the dialog for GEO Libraries is similar). It lists all of the available libraries, sorted by Identifier. As with all DISCOVERYspace windows, clicking on a column header sorts the table by that column (for example, clicking on the ‘NCBI Taxonomy’ header sorts the table by species).
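The effect of clicking a column header can be pictured as re-sorting the rows on one field.  The sketch below is illustrative only.  The column names come from the dialog, while the library records and the function name are invented.

```python
# Hypothetical library records: the column names follow the dialog,
# the identifiers and species assignments are invented for illustration.
libraries = [
    {"Identifier": "LIB003", "NCBI Taxonomy": "Homo sapiens"},
    {"Identifier": "LIB001", "NCBI Taxonomy": "Mus musculus"},
    {"Identifier": "LIB002", "NCBI Taxonomy": "Homo sapiens"},
]


def sort_by_column(rows, column):
    """Return the rows ordered by the value in the given column."""
    return sorted(rows, key=lambda row: row[column])


by_identifier = sort_by_column(libraries, "Identifier")
by_species = sort_by_column(libraries, "NCBI Taxonomy")

print([row["Identifier"] for row in by_identifier])  # ['LIB001', 'LIB002', 'LIB003']
print([row["Identifier"] for row in by_species])     # ['LIB003', 'LIB002', 'LIB001']
```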

Select libraries by clicking on them. To select multiple libraries, hold down the Ctrl key while clicking. To select a range of libraries, hold down the Shift key while selecting the top and bottom of the range. In the figure below, a lung cancer library and a normal lung library are selected.



Figure 9.2.1 Define SAGE Libraries Dialog

The Add File button allows the user to load SAGE tags from a file. The Add button (enabled when a row is selected in the table) opens the following window, which is used for filtering the SAGE data.

Figure 9.2.2 Library Options Dialog

The Options available here are:


9.3 SAGE Control Panel

To view and continue working with the SAGE data, open the SAGE Control Panel by selecting Tools > SAGE > View Defined Libraries and Comparisons or by clicking the toggle button at the bottom left of the screen.


Figure 9.2.3 SAGE Control Panel

A checked check-box indicates that data has been loaded and calculations performed for the corresponding item. The disk icon indicates that the corresponding data has been saved to file.  Additional details about libraries are available from the Data Fields menu. In the figure, a comparison has already been defined and loaded (described in the following section).

Right-clicking on one or more selected Libraries table rows shows the options


Right-clicking on one or more selected Comparisons table rows shows the options


9.4 Defining Library Comparisons

Comparisons between libraries are performed by first defining the library comparison. From the main menu, select Tools > SAGE > Define New Library Comparison to open the ‘Define Expression Comparison’ window, which lists the previously defined libraries.


Figure 9.4.1 Expression Comparison Dialog


To compare two groups of libraries, select one or more libraries for each axis. Add a library to an axis group by selecting it and clicking ‘Add’ for the appropriate group. Remember to give the axes and title descriptive names so they are understandable in later displays. Selecting OK will add the comparison to the SAGE Control Panel.  The comparison calculations are not performed until the Load action is performed on the comparison.


9.5 Library Comparison Details

This window is displayed when View Comparison Details is selected from the SAGE Control Panel. It shows the Abundance Classes for the two groups of libraries, as well as statistics for differential and similar expression.  Abundance Classes gives the distribution of tag abundance: for each range of copy numbers, it lists how many unique tag sequences fall in that range and how many tags they represent in total.  For example, in the figure below, the group of libraries labelled normal brain contains 699 unique tag sequences that each occur between 10 and 99 times. Together, these sequences account for 15 966 tags.
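As a rough illustration of how abundance classes can be tallied, the sketch below groups tags by copy number and records both the number of unique sequences and the total tag count per class.  The power-of-ten intervals (1 to 9, 10 to 99, and so on) are an assumption based on the 10 to 99 class mentioned above; the tag sequences, counts, and function name are invented.

```python
from collections import defaultdict


def abundance_classes(tag_counts):
    """Map (low, high) copy-number intervals to [unique_tags, total_tags].

    Intervals are assumed to be powers of ten: 1-9, 10-99, 100-999, ...
    """
    classes = defaultdict(lambda: [0, 0])
    for count in tag_counts.values():
        if count < 1:
            continue  # tags that were never observed belong to no class
        digits = len(str(count))
        interval = (10 ** (digits - 1), 10 ** digits - 1)
        classes[interval][0] += 1      # one more unique tag sequence
        classes[interval][1] += count  # copies contributed by that sequence
    return dict(classes)


# Invented example data: tag sequence -> number of copies observed.
counts = {"AAAAAAAAAA": 3, "CCCCCCCCCC": 12, "GGGGGGGGGG": 57, "TTTTTTTTTT": 140}
print(abundance_classes(counts))
# {(1, 9): [1, 3], (10, 99): [2, 69], (100, 999): [1, 140]}
```

In this example, the 10 to 99 class holds 2 unique sequences that together account for 69 tags, which corresponds to the 699 and 15 966 figures in the normal brain example.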

The numbers of differentially expressed tags appear in the table labelled Differentially Expressed, given for confidence levels of 99.9%, 99% and 95%. Finally, the number of Similarly Expressed tags appears at an alpha value of 0.05.
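The relationship between confidence levels and tag counts can be sketched as follows.  This manual does not describe how the plugin computes significance, so the sketch assumes that each tag already has a p-value and that a tag is counted at a confidence level when its p-value is at or below the matching alpha (0.001, 0.01 or 0.05).  The p-values and names are invented.

```python
# Alpha thresholds matching the confidence levels shown in the table.
ALPHAS = {"99.9%": 0.001, "99%": 0.01, "95%": 0.05}


def count_significant(p_values):
    """Count tags whose p-value is at or below each alpha threshold.

    Assumes a cumulative count: a tag significant at 99.9% is also
    counted at 99% and 95%.
    """
    return {
        level: sum(1 for p in p_values.values() if p <= alpha)
        for level, alpha in ALPHAS.items()
    }


# Invented p-values for four hypothetical tags.
p_values = {"tag1": 0.0004, "tag2": 0.006, "tag3": 0.03, "tag4": 0.2}
print(count_significant(p_values))  # {'99.9%': 1, '99%': 2, '95%': 3}
```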


Figure 9.5.1 Library Comparison Details Display

9.6 Library Comparison 2D Profile

Clicking 2D Expression Profile on the SAGE Control Panel shows a graph of the SAGE data. Each dot represents a SAGE tag, and the location of the dot reflects its expression level in each library (or group). Greenish tags are differentially expressed between the groups, and blue tags are not differentially expressed. The confidence contours appear as lines for levels 99.9%, 99% and 95%.


Figure 9.6.2 2D Expression Profile Display


The Select menu contains options for selecting tags: Select Upregulated Datapoints, Select Downregulated Datapoints, Select Insignificant Datapoints, Deselect All Datapoints, Select Datapoints Based On Criteria.

For example, selecting upregulated datapoints results in the following display.


Figure 9.6.3 2D Expression Profile Display With Up-Regulated Tags Selected


The tool bar buttons at the top of the display perform the following actions.

Icon
Action

Select data points

Zoom into selected region

Drag selected data points

Capture graph to image file

Export graph data to file

Print graph

Change to default graph view

Represent counts (number of tags represented by a single data point) by rings around data points:

Differentially expressed

Similarly expressed

Selected data points


To identify selected points, enable Drag Mode (select the Drag selected data points button), then right-click a point and drag it to the background of the workspace. A Data Viewer display similar to the following figure will open. From the Data Viewer, the genes represented by the tags can be identified. (See the DISCOVERY Platform User's Manual for a description of the Data Viewer display.)


Figure 9.6.4 Data Viewer Listing Up-Regulated Tags



Figure 9.6.5 Data Viewer Listing Up-Regulated Tags and Genes


9.7 Multi-Library Set Manipulations

Selecting Multi-Library Set Manipulations from the SAGE Control Panel opens the following display.  The left panel lists the libraries used in this display. Tags may be dragged from Data Viewer tables to the Auxiliary Tags panel, accessible from the corresponding tab. The right panel displays the tags in each library and/or comparisons between tags in the libraries.


Figure 9.7.1 Multi-Library Set Manipulations Display

The tool bar buttons have the following meanings. On the libraries panel:

Icon
Description

New Venn table

Select all

Toggle row selection

Tag count cutoff. Tags with fewer counts than this are ignored in all calculations.

Ignore all tags from this library

Include all tags from this library

Include all tags from this library that are in common

Exclude all tags from this library
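The library options above correspond loosely to ordinary set operations.  The sketch below is one possible reading and not a description of the plugin's internals: it assumes that "include" behaves like a union, "in common" like an intersection, and "exclude" like a set difference, applied after the tag count cutoff.  The libraries and counts are invented.

```python
def tags_above_cutoff(library, cutoff):
    """Return the tags whose count is at least the cutoff.

    Tags with fewer counts than the cutoff are ignored.
    """
    return {tag for tag, count in library.items() if count >= cutoff}


# Invented libraries: tag sequence -> count.
normal = {"AAAAAAAAAA": 5, "CCCCCCCCCC": 1, "GGGGGGGGGG": 8}
tumour = {"AAAAAAAAAA": 2, "TTTTTTTTTT": 9, "CCCCCCCCCC": 4}

a = tags_above_cutoff(normal, cutoff=2)  # CCCCCCCCCC is dropped (count 1)
b = tags_above_cutoff(tumour, cutoff=2)

print(sorted(a | b))  # include both: ['AAAAAAAAAA', 'CCCCCCCCCC', 'GGGGGGGGGG', 'TTTTTTTTTT']
print(sorted(a & b))  # in common:    ['AAAAAAAAAA']
print(sorted(a - b))  # exclude b:    ['GGGGGGGGGG']
```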

On the tags panel:

Icon
Description

Save data to file

Copy data to clipboard

Select all

Toggle row selection

Display mode selection. Options are
Count - tag counts
Frequency - frequency of tags in library
Ratio - ratio of tags between libraries
Ratio, multi ref - ratio of tags between libraries, calculated differently (???)
P Value - p value for tag
P Value, multi ref - p value for tag, calculated differently (???)


Enable value cutoff. Tags with counts below this are ignored and removed from the table.

Cutoff for displayed values

Filter super-singleton tags
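To make the Count, Frequency and Ratio display modes more concrete, the sketch below computes each value for a single tag.  The formulas are assumptions, since this manual does not define them: frequency is taken as the tag count divided by the total tag count of the library, and ratio as the frequency in one library divided by the frequency in another.  The multi ref and P Value modes are not covered.  All data and function names are invented.

```python
def frequency(library, tag):
    """Tag count divided by the total number of tags in the library."""
    total = sum(library.values())
    return library.get(tag, 0) / total if total else 0.0


def ratio(library_a, library_b, tag):
    """Frequency in library_a relative to library_b (None if absent from b)."""
    freq_b = frequency(library_b, tag)
    return frequency(library_a, tag) / freq_b if freq_b else None


# Invented libraries: tag sequence -> count.
lib_a = {"AAAAAAAAAA": 25, "CCCCCCCCCC": 75}    # 100 tags in total
lib_b = {"AAAAAAAAAA": 25, "CCCCCCCCCC": 175}   # 200 tags in total

print(lib_a["AAAAAAAAAA"])                 # Count:     25
print(frequency(lib_a, "AAAAAAAAAA"))      # Frequency: 0.25
print(ratio(lib_a, lib_b, "AAAAAAAAAA"))   # Ratio:     2.0
```

The same count of 25 gives a ratio of 2.0 because library B is twice as large, which is why frequency rather than raw count is used for the comparison in this sketch.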


9.8 Search for Tags in Libraries

The Tools > SAGE > Search For Tag In Libraries menu item opens a new display containing a list of all available SAGE libraries in the left-hand pane (titled Available), and a table showing the abundance of the tags in each library in the right-hand pane (titled Results). Each tag added to the display appears at the top of the Results table. Tags may be added one at a time by entering them in the dialog at the bottom of the left-hand pane (see the figure below) and pressing the button.

Tags may also be added to the Search For Tag display by dragging rows from a Data Viewer display.
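Conceptually, the search looks up one tag in every available library and reports its abundance in each.  The following is a minimal sketch of that lookup, with the newest tag placed at the top of the results as described above.  The library names, tag sequences, counts, and function names are invented.

```python
# Invented libraries: library name -> {tag sequence -> count}.
libraries = {
    "normal lung": {"AAAAAAAAAA": 12, "CCCCCCCCCC": 3},
    "lung cancer": {"AAAAAAAAAA": 40, "GGGGGGGGGG": 7},
}

results = []  # rows of (tag, {library name -> count}), newest first


def search_tag(tag, libraries):
    """Return the count of the tag in each library (0 where it is absent)."""
    return {name: counts.get(tag, 0) for name, counts in libraries.items()}


def add_tag(tag):
    """Search for the tag and put its row at the top of the results."""
    results.insert(0, (tag, search_tag(tag, libraries)))


add_tag("AAAAAAAAAA")
add_tag("GGGGGGGGGG")
for tag, abundance in results:
    print(tag, abundance)
# GGGGGGGGGG {'normal lung': 0, 'lung cancer': 7}
# AAAAAAAAAA {'normal lung': 12, 'lung cancer': 40}
```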

Search For Tag


The menus available from the Search For Tag display are the following: