DISCOVERY Platform

User's Manual Version 1.0

Gene Expression Bioinformatics Team
Canada's Michael Smith Genome Sciences Centre
BC Cancer Research Centre
BC Cancer Agency
Vancouver, BC, Canada

Copyright © 2003

1. Change Record

Version | Date | Author | Comment
0.1 | 05-May-2003 | Chris Fjell (cfjell@bcgsc.ca) | Initial draft.
0.2 | 21-Jul-2003 | Scott Zuyderduyn (scottz@bcgsc.ca) | (none)
1.0 | 11-Sep-2003 | Chris Fjell (cfjell@bcgsc.ca) | Final draft.

2. Intended Audience

This document describes the operation and maintenance of the DISCOVERY Platform software system.  The intended audience consists of end-users and administrators of the DISCOVERY Platform system.

3. Table of Contents

1. Change Record
2. Intended Audience
3. Table of Contents
4. Getting Started Quickly
5. Overview
6. Hardware Requirements
6.1 DISCOVERYspace Client
6.2 DISCOVERYdb Server
7. Definitions
8. Installation
8.1 Client Application Installation
8.2 Database Server Installation
9. DISCOVERY Platform Software Reference
9.1 DISCOVERYspace Application
9.1.1 Status Bar
9.1.2 Menu Bar
9.1.3 Project Toolbar
9.1.4 Window and Desktop Management
9.2 Searching Databases
9.2.1 The Search Dialog
9.2.2 The Data Viewer
Appendix I. Datasources
AI.1 Disease
AI.1.1 Allelic Variant
AI.1.2 OMIM
AI.2 Functional Domain
AI.2.1 Interpro
AI.2.2 Pfam
AI.2.3 Swiss-Prot Feature
AI.2.4 Swiss-Prot Organelle
AI.3 Gene
AI.3.1 Unigene
AI.3.2 Genes with Sequence
AI.3.2.1 Refseq
AI.3.2.3 Mammalian Gene Collection
AI.3.3 Genecards
AI.3.4 InParanoid
AI.3.5 LocusLink
AI.3.6 Wormbase
AI.4 Miscellaneous
AI.4.1 Gene Ontology (GO)
AI.4.2 NCBI Taxonomy
AI.5 Pathway
AI.5.1 Pathways with Images
AI.5.1.1 Biocarta Pathway
AI.5.1.2 KEGG Pathway
AI.6 Protein
AI.6.1 Swiss-Prot
AI.6.2 WormPEP
AI.7 Subcellular Localization
AI.7.1 Subcellular Localization Prediction
AI.7.1.1 MOTT Subcellular Localization Prediction
AI.7.1.2 PSORT Subcellular Localization Prediction

You may see these icons marking text within the documentation:

An additional explanation on a particular task.
An additional explanation on a particular feature.
Non-critical information of general interest.
Information for DISCOVERY Platform administrators.
A tip on how to utilize a particular feature.

4. Getting Started Quickly

If you already have a good idea of the purpose and use of the DISCOVERY Platform, you can jump to Client Application Installation (Section 8.1).  A How-To chapter is planned but has not yet been written.  All users are nevertheless encouraged to read this document completely; not all features of the DISCOVERY Platform are obvious.

5. Overview

The DISCOVERY Platform is a comprehensive set of software tools to store, visualize, and manipulate genomic data.  The Platform consists of several components (Figure 5.1), including the DISCOVERYspace client application and its plugins, the DISCOVERYdb database server, and the bioDatasource tool used to load data into the database.

This document is organized around the feature set of the DISCOVERY Platform rather than around specific tasks.  Task-oriented guidance is planned for a How-To chapter, which has not yet been written.


Figure 5.1 DISCOVERY Platform

6. Hardware Requirements

6.1 DISCOVERYspace Client

Minimum:
Operating system with Java 1.4.1 or later installed (Windows NT/2000/XP or Linux)
Intel Pentium or AMD Athlon/Duron CPU, or equivalent in processing speed
128 MB RAM
Internet connection
100 MB free hard disk space
Keyboard; Mouse

Recommended:
2 GHz Intel Pentium IV
1 GB RAM
30 GB hard disk
32 MB video card (e.g. GeForce-2 MX)
High speed internet connection
Keyboard; Mouse

A common question is why DISCOVERYspace requires so much memory, especially for SAGE analyses.  The answer lies in how Java programs are compiled and run.

Java source code is compiled to an intermediate form (stored in .class files) rather than fully translated into machine language (the .exe files you normally run on the Windows OS are executable machine language files).  Because Java is cross-platform, the code cannot be compiled to the native language of any one operating system.  Instead, the intermediate code is interpreted by the Java Virtual Machine (JVM), which provides a layer between the program and the operating system.  The JVM requires processing power to do this, so Java programs typically run a little slower than native programs.

For this approach to work, every object in the system carries a substantial amount of associated information, which increases the minimum amount of memory an object occupies.  Without getting into too much detail, a string of length 10 (a SAGE tag, for example) would normally occupy 10 bytes; a Java string, however, requires about 80 bytes of initial memory plus the size of the string itself, for a total of about 90 bytes.  Thus roughly 9 times more memory is required, at a minimum, to store a SAGE library in Java.  Additional memory is also needed to optimize further operations on these objects within DISCOVERYspace (linking to other objects, etc.).

The trade-off for these increased hardware requirements is that Java development is typically much quicker, and the software can run on any operating system (Windows, Mac, Linux, etc.) for which a JVM is available.
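
The arithmetic in the note above can be turned into a rough estimator.  The following is a minimal sketch in Python and is not part of the DISCOVERY Platform; the 80-byte overhead and the one-byte-per-character figure are the approximations quoted above, while the function name and the library size in the usage line are illustrative assumptions.

```python
def estimate_tag_memory(num_tags, tag_length=10, overhead_bytes=80, bytes_per_char=1):
    """Roughly estimate the bytes needed to hold SAGE tags as Java strings.

    The defaults are the approximations quoted in this manual,
    not measured values.
    """
    # Size if each tag were stored as plain bytes with no per-object overhead.
    raw_bytes = num_tags * tag_length
    # Size when every tag is a Java string with a fixed per-object overhead.
    java_bytes = num_tags * (overhead_bytes + tag_length * bytes_per_char)
    return raw_bytes, java_bytes


# 50,000 tags is an arbitrary example size, not a figure from this manual.
raw, java = estimate_tag_memory(50_000)
print(f"raw: {raw} bytes, as Java strings: {java} bytes, ratio: {java / raw:.1f}x")
```

Because Java stores characters as 16-bit values, passing bytes_per_char=2 may give a more realistic estimate than the default.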

6.2 DISCOVERYdb Server

DISCOVERYdb requires a single fast machine with ample memory and disk space; the exact requirements depend on your anticipated usage.

7. Definitions

The following terms are used in this document to refer to DISCOVERY Platform features.  They include application-specific terms as well as general software terms, defined here for the benefit of both the general user and the administrator.

7.1 Dataset

A specific selection of data from a single datasource.  This is typically the set of items that match a search against a datasource.

7.2 Datasource

A set of related data from a single source.  For example, LocusLink is a datasource, containing a set of data with relationships between the data fields determined by the administrators of LocusLink.

7.3 Right click (Left click) actions

User actions performed with the left and right mouse buttons.  Other pointing devices may be configured differently: left-click is sometimes called select or primary select, and right-click is sometimes called alternate select.  The terms left-click and right-click are used in this document for brevity.  The buttons that produce a left-click or right-click may also differ between operating systems.

8. Installation

8.1 Client Application Installation

The DISCOVERY Platform client application is DISCOVERYspace.  The functionality of the main application is extended through additional (optional) plugins.

8.1.1 Java JRE

If you haven't already installed the Java Runtime Environment 1.4.1 or higher, obtain and install it from http://java.sun.com.

8.1.2 DISCOVERYspace

Install the application by running the InstallAnywhere installer available from http://sage.bcgsc.ca/intranet/content/projects/ds/index.mhtml.
The following plugins are available for extended functionality:

8.1.3 SAGE Plugin

This is the analysis plugin for serial analysis of gene expression (SAGE) data.  The plugin is currently bundled with the core DISCOVERYspace application (Section 8.1.2).

8.1.4 CGH Plugin

This is the analysis plugin for comparative genomic hybridization experiments. (Availability TBA).

8.1.5 SAGEsoma

This is the plugin for viewing expression data on a karyotype. (Availability TBA).

8.1.6 NLP Plugin

This is a plugin for viewing natural language processing data. (Availability TBA).

8.2 Database Server Installation

The DISCOVERYdb database server must be installed to provide data to the client applications.  To install the DISCOVERYdb database server, perform the following steps:

8.2.1 Install MySQL.

Instructions and software are available from MySQL AB (http://www.mysql.com).
Microsoft Windows Specific
On Windows, MySQL converts all table names to lowercase by default.  Start the server with "-O lower_case_table_names=0" to prevent this (ref. http://www.urbansim.org/docs/greenflash/database_information.html).
The maximum packet size for MySQL is approximately 1 MB by default, which is not large enough to load the larger tables.  Modify the server .cnf file to include the line "set-variable = max_allowed_packet=4M" in the [mysqld] section.  This is the option-file form of the command-line option "--set-variable=max_allowed_packet=4M"; option files omit the leading dashes.

8.2.2 Populate DISCOVERYdb with data.

This is done using the bioDatasource software.  This command line tool converts a flat-file datasource into a relational database. The tool is run using the syntax below:
CommandLineMain [OPTIONS] schemaPath dataPath
where schemaPath is the path to a datasource schema file and dataPath is the path to the source data file.  In some cases (for example, the GENECARDS format), the source data is distributed as a directory structure; in these cases dataPath should point to the root directory of that structure.

Option | Description
-help, -? | Prints help information for command line usage and exits.
-version | Prints version information and exits.
-v | Enables verbose output.
-t | Runs in test mode.  In this mode the database is not used; however, the source data is parsed as normal.  This mode is useful for checking configuration and file format issues.

Table 8.1 bioDatasource parser command line options
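
The syntax and options above can be assembled programmatically, for example to run a test-mode pass before loading data.  The following is a minimal sketch in Python; the helper name and the file paths are hypothetical, and how CommandLineMain is actually launched (for example through a Java command and classpath) is not covered by this manual, so the sketch only builds the argument list.

```python
def build_parser_args(schema_path, data_path, verbose=False, test_mode=False):
    """Assemble arguments in the form: CommandLineMain [OPTIONS] schemaPath dataPath."""
    args = ["CommandLineMain"]
    if verbose:
        args.append("-v")  # enables verbose output (Table 8.1)
    if test_mode:
        args.append("-t")  # parses the source data without using the database (Table 8.1)
    # The two positional arguments always come last, schema first.
    args.extend([schema_path, data_path])
    return args


# Hypothetical paths, for illustration only.
dry_run = build_parser_args("schemas/locuslink.schema", "data/locuslink.txt",
                            verbose=True, test_mode=True)
print(" ".join(dry_run))
```

Running once with test_mode=True and again without it is one way to catch file format problems before the database is touched.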

9. DISCOVERY Platform Software Reference

This section describes the different components of the DISCOVERY Platform.  If you are familiar with the DISCOVERY Platform software and are primarily interested in learning how to do a specific task, you may want to jump to the next section.

9.1 DISCOVERYspace Application

After start-up, the main application window will look similar to Figure 9.1.  The main display areas are visible: the Menu Bar (top of the frame), the Project Toolbar (below the Menu Bar), and the Status Bar (bottom of the frame).  These are described in detail below:

Figure 9.1 The DISCOVERYspace main frame.

9.1.1 Status Bar

Located at the bottom of the main frame, this bar shows the status of the connection to the DISCOVERYdb database server on the left-hand side (the box that says "No connection" in Fig. 9.1), the current memory usage (the white box in Fig. 9.1), and buttons for switching between desktops (each desktop is a different main window).
The software layer that communicates with DISCOVERYdb is customizable at the code level.  This code also contributes the database status bar described above.  This makes it possible to implement a pluggable layer between DISCOVERYspace and a different DBMS, and also to add or remove visual components associated with database communications.  Therefore, customized plugins for a different DBMS may not have this status bar, or its appearance may be different.

9.1.2 Menu Bar

This contains menu items for making detailed changes to the application appearance and database connection, as well as for performing most routine operations in the application.
9.1.2.1 Project Menu
The Project Menu contains the following menu items:

Menu Item | Shortcut | Description
New | Ctrl-N | Starts a new project.
Open | Ctrl-O | Opens an existing project from disk.
Close | (none) | Closes the current project.
Save | Ctrl-S | Saves the current project to disk using the current project filename.  If no filename has been given, the user will be prompted for one.
Save As... | (none) | Saves the current project to disk under a different filename.
Import Data > Data Lists... | (none) | Displays a dialog window to load data from delimited text files.
Import Data > Knowledge XML... | (none) | Displays a dialog window to load data in the DISCOVERYspace native XML format.
Properties | (none) | Displays a dialog window with tabs for altering application preferences (desktop appearance, colours, plugin settings, etc.).
Printer Setup... | Ctrl-Shift-P | Displays a dialog window to change the default properties for printing.
Quit | Ctrl-Q | Exits the application.

Table 9.1 Project Menu Items
The project functionality in DISCOVERYspace allows you to define a project and author, and create a home for any files that you create during a DISCOVERYspace session.  After you open the project again, loading operations will default to your project directory.  This makes your time with the software more efficient, and provides a way to organize different projects and analyses. 
9.1.2.2 View Menu
The View Menu contains the following menu items:

Menu Item | Shortcut | Description
Toolbars > | (none) | This submenu contains a list of available toolbars, with a checkbox to signify whether each toolbar is currently enabled.  DISCOVERYspace plugins can contribute items to this submenu.  If no installed plugins have toolbars, the submenu displays "No additional toolbars available."

Table 9.2 View Menu Items
9.1.2.3 Data Menu
The Data Menu contains items that, when selected, open a dialog box for creating a dataset from the selected datasource.  The Data Menu is laid out as a tree.  For example, one can create a dataset of Human Refseq entries, based on keyword, by clicking Data > Gene > Genes with Sequence > Refseq > Human Refseq.  For a list of available datasources, see Appendix I.
9.1.2.4 Tools Menu
The Tools Menu contains menu items specific to installed plugins.  For example, if the SAGE plugin is installed, this menu will contain an option Tools > SAGE > Search For Tag In Libraries.  See the documentation for the specific plugin of interest for more information. 
9.1.2.5 Help Menu
The Help Menu contains the following menu items:

Menu Item | Shortcut | Description
Help | F1 | Displays on-line help.
About | (none) | Displays information about the application.
Report Bug | (none) | Displays a window for the user to submit a report of a defect, including a copy of the application log.
Request Feature | (none) | Displays a window for the user to submit a feature request, including a copy of the application log.
Show application log | (none) | Displays the application log.

Table 9.3 Help Menu Items
Many components in DISCOVERYspace support the F1 help shortcut key.  While working with a component, press F1 to jump to the relevant section of the documentation.

9.1.3 Project Toolbar

The Project Toolbar provides shortcut access to common operations for managing projects.  All of these options can also be accessed in the Menu Bar > Project menu.

Name | Shortcut | Description
New | Ctrl-N | Starts a new project.
Open | Ctrl-O | Opens an existing project from disk.
Save | Ctrl-S | Saves the current project to disk using the current project filename.  If no filename has been given, the user will be prompted for one.
Save As... | (none) | Saves the current project to disk under a different filename.

Table 9.4 Project Toolbar Buttons

9.1.4 Window and Desktop Management

DISCOVERYspace allows you to have multiple desktops that you can use to organize your work.  At the bottom right of the main application frame, there is a series of icons that you can use to switch desktops (Fig. 9.2).  You can see which desktop you are currently viewing by checking the text printed just above these icons (e.g. "DESKTOP 1" in Fig. 9.2).  It is possible to increase or decrease the number of available desktops by changing the application settings (see ?????).  If more than four desktops are defined, a button with a double arrow (>>) (Fig. 9.2) is displayed.  Clicking it displays a list of the additional desktops, from which you can select one.
Figure 9.2 The bottom right of the application main frame has components for quick access to multiple desktops.
DISCOVERYspace also allows you to move currently visible windows to different desktops.  Windows which can be manipulated in this way have a distinctive look to their top bar (Fig. 9.3).  If you right-click on this top bar, you will get a list of available desktops.  If you select a different desktop, the window will be moved to the desktop selected.  In order to see this window again, you will need to switch your active desktop (see above) to the one the window was moved to.
You're also able to change the title text for these types of windows.  The "Rename Window..." option that appears when you right-click the window's top bar (Fig. 9.3) will result in a dialog where you can change the title text (Fig. 9.4).  This will allow you to describe the information being displayed in the window in a more personalized way.
Figure 9.3 Right-clicking on the top bar of DISCOVERYspace windows allows you access to options that can move your window to a different desktop or rename the title text.
Figure 9.4 The Rename Window dialog allows you to specify a new title for the selected window.
 

9.2 Searching Databases

One of the most powerful features of DISCOVERYspace is the ability to do a wide array of keyword searches on available databases.  On the Menu Bar, the Data menu contains an organized list of searchable databases and datatypes (Fig. 9.5).  Selecting an item from this menu opens the search dialog (Fig. 9.6) described below.
The number of databases and searchable fields is vast in a full implementation of the DISCOVERY Platform.  It's a good idea to keep the contents of Appendix I. Datasources handy to help you devise your searches.
Figure 9.5 The Data menu on the application Menu Bar allows you to search available databases.

9.2.1 The Search Dialog

When you select an item from the Data menu, the Search Dialog will appear (Figs. 9.6-9.7).  This dialog allows you to get data based on keywords in searchable fields.  The dialog contains a Search Field combo box that you can use to select what information you wish to search on (Fig 9.6).  For example, the LocusLink database has accessions, annotations, chromosome number and others that can be searched.


Figure 9.6 The Search Dialog allows you to describe your search.  The Search Field combobox contains the searchable fields for the database of interest.
Use the Search Term text field to enter the keyword you wish to search for.  The Case Sensitive checkbox specifies whether capitalization is respected in the search, and the Exact Match checkbox specifies whether the term must exactly match the value of the field.  Often, additional information about the contents of a particular Search Field is displayed at the bottom of the Search Dialog (e.g. Fig. 9.7 shows "Descriptive text for this entry." to describe the "Annotation" Search Field).
Figure 9.7 The Search Dialog allows you to describe your search.  The Search Term contains the value you want to search for, and the Case Sensitive and Exact Match checkboxes allow you to define the type of search. If the All Fields checkbox is selected, all fields are searched.
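
The effect of the three checkboxes can be illustrated with a short sketch.  This is a minimal Python illustration and not the DISCOVERYspace implementation: it assumes that a search without Exact Match is a substring match, which this manual does not state, and the function names are hypothetical.  The sample values are borrowed from the examples in Appendix I and are combined here purely for illustration.

```python
def field_matches(value, term, case_sensitive=False, exact_match=False):
    """Compare one field value against the search term."""
    if not case_sensitive:
        value, term = value.lower(), term.lower()
    if exact_match:
        return value == term      # the term must equal the whole field value
    return term in value          # assumed behaviour: substring match


def search(records, field, term, case_sensitive=False, exact_match=False, all_fields=False):
    """Return the records whose chosen field (or any field) matches the term."""
    hits = []
    for record in records:
        values = list(record.values()) if all_fields else [record.get(field, "")]
        if any(field_matches(str(v), term, case_sensitive, exact_match) for v in values):
            hits.append(record)
    return hits


# Sample values taken from the Appendix I examples, for illustration only.
records = [
    {"Accession": "1", "Annotation": "alpha-1-B glycoprotein", "Chromosome": "19"},
    {"Accession": "4507652", "Annotation": "thiopurine S-methyltransferase", "Chromosome": "6"},
]
print(search(records, "Annotation", "GLYCOPROTEIN"))
print(search(records, "Annotation", "GLYCOPROTEIN", case_sensitive=True))
```

With both options off, the first call finds the glycoprotein record; with Case Sensitive on, the upper-case term no longer matches.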

9.2.2 The Data Viewer

The Data Viewer is a general widget used to display and manage sets of data (Fig. 9.8).  The Data Viewer is the most commonly used component of DISCOVERYspace, and knowing how to use it effectively is vital to getting the most out of the application.
The Data Viewer is centred around primary data: the data that populates the Data Table on a row-by-row basis (for example, the LocusLink Accession and Annotation entries in Figure 9.8).  Initially, the Data Viewer is populated with two columns of primary data, Accession and Annotation.
Figure 9.8 The Data Viewer is one of the most common components used in DISCOVERYspace.

The Data Viewer is organized into five regions: The Data Table, the Top Bar, Menu Bar, Tool Bar, and Status Bar.

9.2.2.4 Data Table
In addition to primary data, the Data Viewer also displays linked data, which is data related indirectly to the primary data.  For example, Refseq data may be displayed in a Data Viewer that originally resulted from a search against LocusLink data.  Depending on the relationship between the primary and linked data, more than one piece of linked data may be associated with the primary data on a single row in the table.  For this reason, linked data appears in the data table as a drop-down box when a row has more than one linked item.  The number preceding the drop-down box is the number of items it contains (for example, in the first row of the Human Refseq column in Figure 9.9, there are 5 items).

Figure 9.9 A Data Viewer window with primary and linked data.



Figure 9.10 One cell from the data table composed of a drop-down box of linked data.
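
The rule for when a linked-data cell becomes a drop-down box can be sketched as follows.  This is a minimal Python illustration of the behaviour described above, not DISCOVERYspace code; the function name, the dictionary layout, and the placeholder item names are hypothetical, and the treatment of cells with zero or one item as plain text is an assumption.

```python
def linked_cell(items):
    """Describe how one linked-data cell would be presented."""
    items = list(items)
    if len(items) > 1:
        # More than one linked item: a drop-down box preceded by the item count.
        return {"widget": "drop-down", "count": len(items), "items": items}
    # Zero or one linked item: assumed here to be shown as plain text.
    return {"widget": "text", "count": len(items), "items": items}


# Placeholder names standing in for five linked Refseq entries on one row.
cell = linked_cell([f"refseq-item-{i}" for i in range(1, 6)])
print(cell["widget"], cell["count"])
```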

The data table rows can be sorted by clicking on the header of a column in the data table.  Clicking the same header a second time inverts the order.
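
The click-to-sort behaviour can be modelled in a few lines.  The following Python sketch is illustrative only: the class name is hypothetical, and it assumes that the first click sorts in ascending order and that clicking a different column starts again in ascending order, neither of which is stated in this manual.

```python
class SortableTable:
    """A tiny model of a table whose rows are sorted by clicking a column header."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.sort_column = None
        self.descending = False

    def click_header(self, column):
        if column == self.sort_column:
            # A second click on the same header inverts the order.
            self.descending = not self.descending
        else:
            # Assumed: a newly chosen column starts in ascending order.
            self.sort_column = column
            self.descending = False
        self.rows.sort(key=lambda row: row[column], reverse=self.descending)


table = SortableTable([{"Name": "TPMT"}, {"Name": "NAT2"}])
table.click_header("Name")
print([row["Name"] for row in table.rows])
table.click_header("Name")
print([row["Name"] for row in table.rows])
```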

Selected rows may be copied to a new Data Viewer display by dragging the rows while holding down either the left or right/middle mouse button (depending on your pointing device configuration). As well, right-clicking on a column will raise a popup menu to perform the copy. Note that the type of the primary data for the new Data Viewer is determined from the column that originated the copy.

9.2.2.1 Top Bar
The Top Bar shows the title of the current data set.  By default, this is the name of the type of data currently displayed in the Data Viewer, followed by a description of the search that generated the data if the display resulted from a data search.  The Top Bar has some useful features described in section 9.1.4 (Window and Desktop Management).
9.2.2.2 Menu Bar
The Menu Bar contains selections for most of the operations available to the user of the Data Viewer.
Edit Menu
The Edit Menu contains the following items:

Menu Item | Shortcut | Description
Copy | Ctrl-C | Copies the currently selected data rows to the clipboard.
Cut | Ctrl-X | Removes the currently selected data rows and moves them to the clipboard.
Paste | Ctrl-V | Copies the data contained in the clipboard to the current Data Table.
Paste Special... | (none) | (not yet implemented)
Delete | Del | Deletes the currently selected rows.
Select All Rows | Ctrl-A | Selects all rows in the Data Table.
Deselect All Rows | (none) | Deselects all rows in the Data Table.
Select By Keyword... | Ctrl-F | Selects rows in the Data Table based on a keyword search.
Export Data | (none) | Exports the selected rows, or all rows, to disk.

Table 9.5 Edit Menu items
Relationships Menu
The Relationships Menu lists data types that have contextual relationships with the primary data type of the Data Viewer. These items depend on the type of primary data. Selecting an item will create a new column and populate the rows with the appropriate data.

Figure 9.11 The Relationships Menu



Data Fields Menu

Data fields are additional primary data; the fields that are available are determined by the data source.

Figure 9.12 The Data Fields Menu contains additional information that can be displayed for each entry of primary data.

9.2.2.3 Tool Bar

The Tool Bar provides convenient buttons for actions that are otherwise performed through the Edit Menu.  Currently, Delete and Copy buttons are available.

9.2.2.5 Status Bar

The Status Bar at the bottom of the Data Viewer displays the number of selected rows and the total number of rows.

Appendix I. Datasources

The datasources available to the user depend on what the DISCOVERY Platform administrator has made available to the DISCOVERYspace client.  This appendix lists the datasources currently supported in a complete deployment of the DISCOVERY Platform.  A datasource can be searched by selecting the corresponding item from the DISCOVERYspace Menu Bar > Data menu (see section 9.1.2.3); the fields that can be searched are listed as Searchable Fields in the following tables.  Data fields that are not searchable are listed as Additional Fields.

AI.1 Disease

AI.1.1 Allelic Variant

Searchable Fields

Name | Description | Example(s)
Name | The name of the allelic variant. | MYASTHENIC SYNDROME, SLOW-CHANNEL CONGENITAL
Synopsis | A brief synopsis of the allelic variant. | Engel et al. (1996) described a 30-year-old female patient with ocular and limb weakness, scoliosis, and a family history consistent with autosomal dominant myasthenia gravis (601462) in 3 generations. The mutation leading to pathology in this patient was a heterozygous asn217-to-lys substitution in the AChR-alpha subunit. Engel et al. (1996) evaluated the pathogenicity of the mutation by engineering the mutation into the corresponding cDNA of mouse AChR and coexpressing it with the wildtype cDNA in HEK fibroblasts. Receptor function was evaluated using patch-clamp studies and ACh binding was measured. These studies revealed that the mutations resulted in an apparent increased affinity for ACh and prolonged AChR activation episodes rendering the receptor channel leaky.



Additional Fields

Name | Description | Example(s)
Mutation | Notation denoting specific amino acid, etc. changes. | CHRNA1, SER269ILE



AI.1.2 OMIM

Searchable Fields

Name | Description | Example(s)
Accession | The accession of the record. | 10070
Annotation | The annotation for the record. | ABDOMINAL AORTIC ANEURYSM
Alternate Names | Synonyms for the disease. | AAA; AORTIC ANEURYSM, ABDOMINAL; ANEURYSM, ABDOMINAL AORTIC ARTERIOMEGALY, INCLUDED; ANEURYSMS, PERIPHERAL, INCLUDED
Features | Features of the disease. | Unknown Inheritance
Overall Synopsis | A verbose text describing details of the disease. | Tilson and Seashore (1984) reported 50 families in which abdominal aortic aneurysm had occurred in 2 or more first-degree relatives, mainly males. In 29 families, multiple sibs (up to 4) were affected; in 2 families, 3 generations were affected; and in 15 families, persons in 2 generations were affected. Three complex pedigrees were observed: one in which both parents and 3 sons were affected; one in which a man and his paternal uncle were affected; and one in which a man and his father and maternal great-uncle were affected. In the 'one-generation' families, there were 3 with only females affected, including a set of identical twins. (...etc.)
Clinical Synopsis | Clinical descriptions of the disease. | vascular; abdominal aortic aneurysm; generalized dilating diathesis; misc; estimated 11.6-fold increase among persons with an affected first-degree; relative; inheritance; autosomal dominant vs. recessive at an autosomal major locus or multifactorial; col3a1 gene (120180.0004) mutations cause about 2%



AI.2 Functional Domain

AI.2.1 Interpro

Searchable Fields

Name | Description | Example(s)
Accession | The accession of the record. | IPR000981
Annotation | The annotation for the record. | Neurhyp_horm
Entry Type | The type (e.g. domain, family) of the record. | Family
Protein Classification | The classification of the protein. | extracellular; Molecular Function:neurohypophyseal hormone activity



Additional Fields

Name | Description | Example(s)
Number of Matching Proteins | Number of matching proteins found to correspond to this record. | 86



AI.2.2 Pfam

Searchable Fields

Name | Description | Example(s)
Accession | The accession of the record. | PF00004
Annotation | The annotation for the record. | ATPase family associated with various cellular activities (AAA)
Description | Description of the record. | AAA family proteins often perform chaperone-like functions that assist in the assembly, operation, or disassembly of protein complexes [2].



Additional Fields

Name | Description | Example(s)
Identifier | PFAM identifier | AAA
Family Type | The type of functional domain. | Family
Alignment Type | The method used to generate the alignment. | Clustalw



AI.2.3 Swiss-Prot Feature

Searchable Fields

Name | Description | Example(s)
Name | The name of the feature. | CHAIN



AI.2.4 Swiss-Prot Organelle

Searchable Fields

Name | Description | Example(s)
Name | The name of the organelle. | chloroplast



AI.3 Gene

AI.3.1 Unigene

Searchable Fields

Name | Description | Example(s)
Accession | The accession of the record. | 2
Annotation | The annotation for the record. | N-acetyltransferase 2 (arylamine N-acetyltransferase)
Expressed Tissue | Tissue of gene expression | Cell lines; adenocarcinoma; colon; corresponding non cancerous liver tissue; hepatocellular carcinoma; liver
Cytoband | Location of the gene on chromosome | 8p22
Name | Name of the gene | NAT2
Chromosome | Chromosome where gene is found | 8



The following organisms have Unigene data: Arabidopsis, Human, Mosquito, Mouse.


AI.3.2 Genes with Sequence

AI.3.2.1 Refseq

Searchable Fields

Name | Description | Example(s)
Accession | The accession of the record. | 4507652
Annotation | The annotation for the record. | thiopurine S-methyltransferase
Nucleotide Sequence | The nucleotide sequence of this record. | CGGCAACCAGCTGTAAGCGAGGCACGG (...etc)
Alphanumeric Accession | Accession in alphanumeric format | NM_000367
Comment | General comment | PROVISIONAL REFSEQ: This record has not yet been subject to final NCBI review. The reference sequence was derived from S62904.1.
Chromosome | Chromosome where gene is found | 6
Cytoband | Location of the gene on chromosome | 6p22.3
Name | Name of the gene | TPMT
Protein Sequence | The protein sequence of this record. | MDGTRTSLDIEEYSDTEVQKNQVLTLEEWQDKWV (...etc)



Additional Fields

Name | Description | Example(s)
Nucleotide Sequence Length | The length of the nucleotide sequence of this record. | 2742
Gender | Gender of the organism for this gene sequence | (none given)
Circular Sequence Flag | Specifies whether the sequence is circular | false
Addition Date | Date added to Refseq | 2000-10-31
Version | Refseq version | 1
Sequence Classification | Classification of the sequence | Homo sapiens thiopurine S-methyltransferase (TPMT), mRNA.
Protein Sequence Length | The length of the protein sequence of this record. | 246



The following organisms have Refseq data: Fly, Human, Mouse, and Rat.

AI.3.2.3 Mammalian Gene Collection

Searchable Fields

Name | Description | Example(s)
Accession | The accession of the record. | 4361
Annotation | The annotation for the record. | pleckstrin homology, Sec7 and coiled/coil domains 2, isoform 1
Nucleotide Sequence | The nucleotide sequence of this record. | GGCGGCGGTGGCTCCCGGGGCGTTTGAGCGGGCTCAC (...etc)
Tissue | Tissue of gene expression | Lung, small cell carcinoma
Cloning Vector | Vector used to clone the gene | pOTB7
Protein Sequence | Amino acid sequence of the protein | MEDGVYEPPDLTPEERMELENIRRRKQELLVEIQRL (...etc)
I.M.A.G.E. ID | Clone ID in I.M.A.G.E. Consortium data | 3538580



Additional Fields

Name | Description | Example(s)
Nucleotide Sequence Length | The length of the nucleotide sequence of this record. | 1514
Protein Sequence Length | The length of the protein sequence of this record. | 400



The following organisms have MGC data: Human and Mouse.

AI.3.3 Genecards

Searchable Fields

Name | Description | Example(s)
Accession | The accession of the record. | ENIGMA
Annotation | The annotation for the record. | enigma (LIM domain protein)



AI.3.4 InParanoid

Searchable Fields

Name | Description | Example(s)
Accession | The accession of the record. | BGLR_ECOLI
Annotation | The annotation for the record. | Arabidopsis thaliana



AI.3.5 LocusLink

Searchable Fields

Name | Description | Example(s)
Accession | The accession of the record. | 1
Annotation | The annotation for the record. | alpha-1-B glycoprotein
Chromosome | Chromosome where gene is found | 19
Confirmation Status | Whether confirmed or not | true
Function | Function of the gene product | Transcription factor
Locus Type | Type of locus | gene with protein product, function known or inferred
Phenotype | Phenotype | Alzheimer disease, susceptibility to
Product | Product of the gene | alpha-1-B glycoprotein
Curation Status | Curation status | REVIEWED
Description | Description | The protein encoded by this gene is a plasma glycoprotein of unknown function. The protein shows sequence similarity to the variable regions of some immunoglobulin supergene family member proteins.
Name | Name of the gene | A1B; A1BG; ABG; GAB
Variant Summary | Variant Summary | Transcript variant a includes the alternate exon IA, but not exon IB and encodes a distinct N-terminus.



AI.3.6 Wormbase

Searchable Fields

Name | Description | Example(s)
Accession | The accession of the record. | AC3.2
Annotation | The annotation for the record. | contains similarity to Pfam domain: PF00201 (UDP-glucoronosyl and UDP-glucosyl transferases), Score=174.0, E-value=7.7e-49, N=4
CDNA | Name of cDNA | yk822h07.3
Confirmation Type | Confirmation type | EST
Protein | Protein | CE05132
Locus | Locus | sri-20
PCR Product | PCR product | mv_AC3.2
UTR | UTR | 5_UTR:AC3.4



AI.4 Miscellaneous

AI.4.1 Gene Ontology (GO)

Searchable Fields

Name | Description | Example(s)
Accession | The accession of the record. | 4362
Annotation | The annotation for the record. | glutathione reductase (NADPH)
Definition | Definition | Catalysis of the reaction: 2 glutathione + NADP+ = glutathione disulfide + NADPH + H+. definition_
Gene Name | Gene name | ABF2; ADT1_HUMAN
Synonyms | Synonyms | FK506 binding protein; FKBP



AI.4.2 NCBI Taxonomy

Searchable Fields

Name | Description | Example(s)
Accession | The accession of the record. | 9606
Scientific Name | The complete, canonical name of the taxonomy entry. | Homo sapiens

AI.5 Pathway

AI.5.1 Pathways with Images

AI.5.1.1 Biocarta Pathway

Searchable Fields

Name | Description | Example(s)
Accession | The accession of the record. | circadianPathway
Annotation | The annotation for the record. | Circadian Rhythms
Description | Description of the pathway | Organisms from flies to humans have daily circadian rhythms...



AI.5.1.2 KEGG Pathway

Searchable Fields

Name | Description | Example(s)
Accession | The accession of the record. | ana02010
Annotation | The annotation for the record. | ABC transporters, prokaryotic



AI.6 Protein

AI.6.1 Swiss-Prot

Searchable Fields

Name | Description | Example(s)
Accession | The accession of the record. | DPD2_YEAST
Annotation | The annotation for the record. | DNA polymerase delta small subunit (EC 2.7.7.7).
Alternate Accession | Alternate accession | P46957
Comments | Comments | (none given)
Name | Name of protein | HUS2; HYS2; J1427; POL31; SDP5; YJR006W; YJR83.7.



Additional Fields

Name | Description | Example(s)
Keywords | Keywords | ;Transferase;;DNA-directed DNA polymerase;;DNA replication;;Nuclear protein;
Last Update | Last Update of database for this entry | Tue Oct 16 00:00:00 PDT 2001
Protein Sequence | Protein sequence | MDALLTKFNEDRSLQDENLSQPRTR...
Protein Sequence Length | Length of protein sequence | 487



AI.6.2 WormPEP

Searchable Fields

Name | Description | Example(s)
Accession | The accession of the record. | DPD2_YEAST
Annotation | The annotation for the record. | DNA polymerase delta small subunit (EC 2.7.7.7).
Alternate Accession | Alternate accession | P46957
Comments | Comments | (none given)
Name | Name of protein | HUS2; HYS2; J1427; POL31; SDP5; YJR006W; YJR83.7.



AI.7 Subcellular Localization

AI.7.1 Subcellular Localization Prediction

AI.7.1.1 MOTT Subcellular Localization Prediction

(This table allows lookup of localization using indices; it contains a list of localization categories.)

Searchable Fields

Name | Description | Example(s)
Accession | The accession of the record. | 1
Annotation | The annotation for the record. | cytoplasmic



AI.7.1.2 PSORT Subcellular Localization Prediction

(This table allows lookup of localization using indices; it contains a list of localization categories.)

Searchable Fields

Name | Description | Example(s)
Accession | The accession of the record. | 6
Annotation | The annotation for the record. | peroxisomal