Sockeye changes and limitations

What's new in version 1.0

Comparative genomics

  1. 'Navigation'

    A user now has 3D move/zoom controls that are comparable to those in 2D genomic browsers for work with individual data tracks, but have been extended to support multi-track comparative genomics / algorithmics work. Key extensions include regions, multiple-selection operations and behaviours for coexisting tracks.

  2. Gene collections

    As a first stage in implementing several ways of assembling collections of related genes, all Ensembl homologues for a gene can easily be assembled in 3D, using a popup menu from the gene.

  3. Sequence alignment

    Pairs of sequences, or multiple sequences, can be aligned from within Sockeye, using a choice of algorithms. Typically, you define the sequences by marking, then selecting, a region on each of a set of data tracks that show a collection of related genes.

Large genomic regions and 'on-demand' annotations

  1. RAM is needed to store feature data internally and to display 3D features. To permit large regions to be queried with low risk of out-of-memory errors, Sockeye now queries only Ensembl genes by default, and the maximum Java heap size has been increased from 64 MB to 128 MB. To see additional annotation types, switch on their 'show' and 'select' controls in the feature tree, and they will be queried in.

Project management and collaboration

  1. Session save now includes more parts of the current session status.
  2. Tracks can be saved as GFF files that, when imported, are intended to be indistinguishable from a freshly queried Ensembl track.

Java

Sockeye is now using Java 1.4.1_01 and Java3D 1.3.1 beta 1.

Ensembl

  1. All queries, and all navigation operations that trigger an Ensembl query, are timed by an elapsed-time status counter.
  2. Markers are now queried.
  3. Query setup tools now offer the 'end' coordinate for a chromosome or scaffold, as a combo box selection.

File import

  1. Import 'As distribution' - A data file that has a very large number of scored features -- e.g. from TFBS scan algorithms -- can be converted to a smoothed 'distribution' that requires little memory.
  2. The source code has been revised to make it easier for a Java developer to add parsers for file formats other than GFF.
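
As a rough illustration of the 'As distribution' idea in item 1, the sketch below bins scored features into fixed-width windows and smooths the bin totals with a moving average. It is not Sockeye's implementation: the function name, the bin width and the smoothing window are assumptions chosen for the example.

```python
def to_distribution(features, region_start, region_end, bin_size=1000, window=3):
    """Summarise (position, score) features as smoothed per-bin score totals.

    Hypothetical helper: bin_size and window are illustrative defaults,
    not values taken from Sockeye.
    """
    n_bins = (region_end - region_start) // bin_size + 1
    totals = [0.0] * n_bins
    for position, score in features:
        if region_start <= position <= region_end:
            totals[(position - region_start) // bin_size] += score

    # Centred moving average; the window shrinks at the ends of the region.
    half = window // 2
    smoothed = []
    for i in range(n_bins):
        lo = max(0, i - half)
        hi = min(n_bins, i + half + 1)
        smoothed.append(sum(totals[lo:hi]) / (hi - lo))
    return smoothed


# Four scored features in a 3 kb region collapse to three bin values.
features = [(150, 2.0), (900, 1.0), (1200, 3.0), (2500, 6.0)]
profile = to_distribution(features, 0, 2999)
```

The memory needed for the result depends on the number of bins, not on the number of features in the file, which is why a distribution stays small even for very dense input.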

GUI

  1. To make it easier to assemble and work with complex assemblies of comparative datasets, data tracks/sets and features are each displayed in a tree, and the GUI for both trees has been improved.
  2. Ensembl queries and file imports are now handled from a dialog box, rather than from a panel in the main GUI. The main GUI has been simplified.
  3. Sockeye now fits 1024 x 768 screens.
  4. The RAM used by Sockeye is now displayed in real time in the status bar.
  5. The range of sources for Web browser 'views' has been expanded to include NCBI Map View and Wormbase DasView.

Separating overlapped features

  1. A Boolean <collision> feature parameter has been added to user_config.xml. When this is set to true for a feature type, a feature instance that overlaps another of that type will be separated in 3D. Currently, Java3D has an undocumented limited ability to handle multibody overlaps, and this constrains the performance of this functionality in Sockeye.
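
To picture what separating overlapped features involves, the sketch below assigns each feature to the lowest vertical 'level' that is free of earlier overlapping features. This is a generic interval-partitioning technique, shown only for illustration; it is not the Java3D collision mechanism that Sockeye uses, and the function name and level numbering are assumptions.

```python
def assign_levels(features):
    """Assign each (start, end) feature a vertical level so overlaps separate.

    Coordinates are inclusive. Returns a list of levels in input order.
    """
    # Visit features in order of start coordinate.
    order = sorted(range(len(features)), key=lambda i: features[i])
    level_ends = []              # end coordinate of the last feature on each level
    levels = [0] * len(features)
    for i in order:
        start, end = features[i]
        for level, last_end in enumerate(level_ends):
            if start > last_end:         # no overlap with this level
                level_ends[level] = end
                levels[i] = level
                break
        else:
            level_ends.append(end)       # every level is occupied: open a new one
            levels[i] = len(level_ends) - 1
    return levels


# Three mutually overlapping features and one isolated feature.
features = [(100, 200), (150, 300), (180, 250), (400, 500)]
levels = assign_levels(features)
```

Unlike a pairwise collision test, this approach handles a region where several features overlap at once, because each feature is compared against every level already in use.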

Limitations in v1.0

  1. Sequence alignment is an early implementation. It is running at the GSC, but may not yet be available outside the firewall.
  2. OutOfMemory errors (OOME) can occur when too many features are queried in or imported. For now, despite 'on-demand queries', Sockeye still has relatively shallow protection against OOME. You can minimize the risks by balancing the size of the queries against how much installed RAM you have, and adjusting the Java heap settings in the launcher LAX file in the installation directory.
  3. While v16 Ensembl is available from kaka.sanger.ac.uk and from bcgsc.bc.ca, www.ensembl.org/java/download/ states (as of 07aug03) that the 26 Jun 2003 Ensembl Java 'layer' used by Sockeye to access Ensembl servers supports v14, but only partially supports v15, 13, 12 and 11.
  4. Once started, connection attempts and queries to an Ensembl server cannot be interrupted without closing Sockeye.
  5. The range of Ensembl features that Sockeye queries is smaller than that available at Ensembl, UCSC, etc. As semantic zooming is implemented, we will expand the range of Ensembl features available.
  6. For data imported from files, currently only a GFF2 parser is available. Other file formats need to be converted to GFF before they can be imported.
  7. Functionality for working with sequence data remains limited. Only one strand of nucleotides can be displayed. The functionality will be expanded in the future, but, for now, is intended to permit simple view, copy and save operations. Sequences are displayed in overlapping, hard-paged pages. You cannot select a sequence across a page boundary.
  8. If you browse a File Chooser to a nondefault folder, only some File Choosers will remember this path, and only within the current Sockeye session.
  9. While combo boxes list Ensembl species and chromosome or scaffold numbers, the lists are not (or not well) sorted, and the combo boxes are not editable. This is awkward if you want to select a particular scaffold number from a long list.
  10. In the sequence panel the sequence text becomes out of register with the text in the count column at the left.
  11. The online help is not completely up to date.

Known issues in v1.0

  1. When more than one feature instance is overlapped, the Java3D collision machinery that is used to separate the instances behaves unpredictably, as it handles only single-body overlaps. This is an undocumented Java3D limitation.

What's new in version 0.8.4

Ensembl

  1. Sockeye no longer automatically builds drivers and tests connections to XML-listed Ensembl servers on startup. All Ensembl GUI components are contextually enabled. You can ignore servers and server connections if you want to work only with GFF data. If you want to work with Ensembl data, you choose a server from a list and try to connect to it. If you can’t connect to this server, there is no time-out, but a message box eventually appears, and you can try to connect to another server. If some or all servers are unavailable, you can still do GFF work. If no servers can be connected to, or if the XML file has no servers listed, a message box asks you to check your Internet connection.
  2. The Ensembl species list for a server is simpler and offers only one version at a time, which is selected from a list of all versions that Sockeye can handle that are available from the server.
  3. When an Ensembl server is queried, a message box appears for each feature that cannot be retrieved due to database issues.
  4. Features -
    1. Added CpG islands.
    2. Homologue – still disabled; will be available in a future version
  5. The BetweenQuery ‘End’ text field now offers the chromosome/scaffold length. Handling of incorrect Start/End entries is improved.
  6. Moved ‘All species’ and ‘Retrieve sequence’ checkboxes. Combined Server and Species subpanels.
  7. You can now submit the same query any number of times, as a date-time stamp is added to the genomic information that identifies a query/data track.
  8. Internal code for handling features was simplified.
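
Item 7 can be pictured with a minimal sketch: if a track identifier combines the query's genomic information with a date-time stamp, two submissions of the same query receive different identifiers. The label layout and the function name below are hypothetical; Sockeye's actual identifier format is not shown here.

```python
from datetime import datetime


def track_label(species, chromosome, start, end, when=None):
    """Build a track identifier from a query's genomic region and a time stamp.

    Hypothetical layout, for illustration only.
    """
    when = when or datetime.now()
    stamp = when.strftime("%Y-%m-%d %H:%M:%S")
    return f"{species} chr{chromosome}:{start}-{end} [{stamp}]"


# The same region queried twice, 30 seconds apart.
first = track_label("Homo sapiens", "7", 1000000, 1500000,
                    when=datetime(2003, 8, 14, 12, 15, 0))
second = track_label("Homo sapiens", "7", 1000000, 1500000,
                     when=datetime(2003, 8, 14, 12, 15, 30))
```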

GFF

  1. Any 3D data track -- Ensembl, GFF, Ensembl+GFF,... -- can be exported as a GFF file to the ...install_dir/user directory. The exported GFF file will contain only the features that were visible on the platform at the time the track was exported.
  2. The GFF panel shows the Add_To_Existing_Track subpanel even when there are no ExistingTracks. All panel components are contextually enabled.
  3. The FileChooser opens with a mask set to ‘gff’. Within one session, it remembers the last directory a file was opened from and reopens at this directory.
  4. The last file imported is automatically selected, so, when only one file has been imported, you do not have to select it in order to Preview or Load.
  5. The Preview dialog box opens with FeatureNames sorted. All columns are sortable (but only names and small integers work correctly for now). The frame is correctly sized, is not resizable, and has reasonable column widths.
  6. An error in ‘Abs. start coord’ was corrected. Popup menus show relative and absolute coordinates.
  7. An imported file can be deleted from the GFF file list (but its data are not deleted – for this you have to delete a whole data track).
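
The export rule in item 1 (only features visible on the platform are written) can be sketched as a filter applied before formatting. The data layout and names below are assumptions for illustration; the output uses the eight mandatory tab-separated GFF2 fields, with '.' for a missing score and frame.

```python
def export_gff(features, visible_types):
    """Format the features whose type is currently visible as GFF2 lines.

    Each feature is a dict with seqname, source, type, start, end and strand;
    score and frame are written as '.' (not available) in this sketch.
    """
    lines = []
    for f in features:
        if f["type"] not in visible_types:
            continue                      # hidden in 3D, so not exported
        fields = [f["seqname"], f["source"], f["type"],
                  str(f["start"]), str(f["end"]), ".", f["strand"], "."]
        lines.append("\t".join(fields))
    return lines


# Hypothetical track with one visible and one hidden feature type.
features = [
    {"seqname": "7", "source": "ensembl", "type": "gene",
     "start": 1000, "end": 5000, "strand": "+"},
    {"seqname": "7", "source": "ensembl", "type": "repeat",
     "start": 1200, "end": 1300, "strand": "-"},
]
exported = export_gff(features, visible_types={"gene"})
```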

Feature tree, user_config.XML, GFF features

  1. A general hierarchical design is now used for the user_config.xml file, the feature tree that controls the 3D display, and 3D feature appearances. The design supports managing large sets of features and projects.
  2. Users have flexible control over the feature tree and 3D feature appearances by editing the user_config.XML file. A child feature inherits display properties from its parent, so the XML file is much more compact, and can be more easily modified and extended.
  3. user_config.xml is now checked as it is loaded during startup. Format errors are reported in detail, though without line numbers.
  4. From a selected 3D feature, a popup menu item opens the feature tree to the selected feature type. With this, you can easily display all examples of one or more feature types while hiding all other types of a category that has many types, e.g. repeats.
  5. The total number of features for all displayed data tracks is shown. Update error corrected for DeleteAllTracks and trying to display more than <featuresLimit> features.
  6. To permit users to work with larger datasets, the <featuresLimit> on the number of features showable can now be edited in the user_config.xml file, and instructions are given in the online Help for increasing the amount of Java RAM. The default value is 30,000; a safer limit for the default 64 MB of Java RAM is 5000.
  7. Added new <orientation>s for cone.WRL: 3 for upside down, 2 for upright. These are not yet in use for Ensembl features.
  8. Added <text> label parameter for genes.
  9. Added <selected> parameter for default features shown on new data tracks.

Sequences

  1. Sequence functionality is still modest, as resources continue to be focused on annotations and 3D.
  2. The sequence tree panel was brought back into the main Sockeye frame, and is visible from both DataSources and DataTracks panels.

3D

  1. When a new track is added to the platform, a default set of features is shown. The default set is user-editable in the XML file.
  2. Improved handling of the case where a new track would display more features than the limit.
  3. When features are shown or hidden, the display is updated for all feature changes in one redraw, so the display no longer flickers due to repeated redraws for separate feature types.
  4. Any data track on the platform can be displayed in a reversed orientation by checking it in the rightmost column of the track list table.
  5. The left popup menu for Ensembl genes, exons and repeats now shows coordinates and length.
  6. Ensembl EST and CpG popups show start/end coordinates but not length.
  7. GFF features show all GFF fields but do not add a length.
  8. The left popup menu lets a user open the feature tree at the type of a selected feature.
  9. Multitranscripts - A left-click on the floating ball opens an initial implementation of a 2D transcript display. Balls float over 5’ end of a gene. Added multitranscript transcript count in mouseover banner.
  10. The track centerline is shown by default.
  11. Text display
    1. XML-defined on/off, font size
    2. GUI options – show text, long (default) or truncated descriptions, text rotates to face you (default)
    3. gene names - for genes with descriptions
  12. Bugs fixed in number of features displayed: DeleteAllTracks, more than featuresLimit. Added featuresLimit below feature tree.
  13. Initial code was added for detecting overlap between features. Experimental; probably should not be used. Not working completely: currently does not iterate over regions with multiple collision events, and collisions occur between features of different types. Default feature set includes a <collision> boolean. When true, a feature expands vertically in 3D when collisions (overlaps) occur.

Navigation and zooming

  1. As a first stage in implementing a more flexible 3D environment for comparative genomics work, you can mark a region on a data track by dragging the cursor along the platform. From a popup menu, you can then copy this region to a new track. As region-marking can interfere with platform rotations at high zoom-in, a checkbox is available in Options > Graphics settings that disables region marking.

Online Help

  1. Updated. Some longer pages were broken into several shorter pages, each of which deals with a topic area.
  2. Added information on changing JRE RAM. Started Navigation and zooming section.

Experimental – not yet available

  1. Sequence alignment can be run from the GUI.
  2. GFF files with thousands of e.g. progressive similarity results along a genome region can be converted into a distribution feature on import.

Known issues

  1. In the list of imported GFF files, the table cell height may cut off text – esp. on SuSE Linux.
  2. The Ensembl species list should be sorted.
  3. The GFF Panel is too narrow for long GFF filenames, and can’t be resized.
  4. GFF Preview sorting only works for feature names and counts.

New in version 0.8.3

Ensembl

  1. Servers are more easily added and configured – permanently in the user_config.XML file and temporarily in the Options panel.
  2. All Ensembl servers listed in the xml file are tested on startup; all servers that can be connected to are dynamically available from the GUI. User is notified of connect attempts that fail.
  3. All species of version 11 or higher that are available at a server are dynamically available from the GUI. Species are available regardless of whether they have chromosome or scaffold data.
  4. User is notified of query attempts that fail for particular features (e.g. ESTs, repeats, …), and such failures are handled robustly.
  5. Ensembl ESTs and repeats are queried, but only for query regions < 1 million bases, to prevent OutOfMemory errors with the current 64 MB JRE RAM.
  6. ‘Search at’ renamed ‘Search for’. Can do a ‘Search for’ query on all Ensembl species. Removed ‘submit’ button from ‘for’ query.
  7. Bug fixed in link to browser in ‘search for’ query.

GFF

  1. GFF files are validated when imported. Errors are reported by line number. Example GFF files include ‘invalid_similarity.gff’.
  2. GFF files that contain no data are caught and reported. Example GFF files include ‘empty.gff’.
  3. Unknown (to the XML file) GFF features can be imported. They are displayed with a ‘generic’ 3D appearance.
  4. More example files included.
  5. Genbank2gff.pl and .java demonstrated – convert Genbank files to GFF
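
The kind of checking described in items 1 and 2 can be sketched as below: each GFF2 record is tested field by field, problems are reported with their line number, and a file without records is reported as empty. The checks and messages are assumptions based on the GFF2 field layout, not a description of Sockeye's validator.

```python
def validate_gff(lines):
    """Return a list of (line_number, message) problems found in GFF2 text lines."""
    errors = []
    records = 0
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue                      # blank line or comment
        records += 1
        fields = line.split("\t")
        if len(fields) < 8:
            errors.append((number, f"expected at least 8 tab-separated fields, found {len(fields)}"))
            continue
        start, end, score, strand, frame = fields[3:8]
        try:
            if int(start) > int(end):
                errors.append((number, "start is greater than end"))
        except ValueError:
            errors.append((number, "start and end must be integers"))
        if score != ".":
            try:
                float(score)
            except ValueError:
                errors.append((number, "score must be a number or '.'"))
        if strand not in ("+", "-", "."):
            errors.append((number, "strand must be '+', '-' or '.'"))
        if frame not in ("0", "1", "2", "."):
            errors.append((number, "frame must be 0, 1, 2 or '.'"))
    if records == 0:
        # Line number 0 marks a problem with the file as a whole.
        errors.append((0, "file contains no feature records"))
    return errors


# Line 3 has start > end and an invalid strand.
sample = [
    "##gff-version 2\n",
    "chr1\tblast\tsimilarity\t100\t200\t0.9\t+\t.\n",
    "chr1\tblast\tsimilarity\t300\t250\t.\t?\t.\n",
]
problems = validate_gff(sample)
```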

Sequences

  1. The sequence display was moved from a tabbed panel underneath the 3D panel to an experimental separate ‘SDE’ frame with a three-category tree of available sequences.
  2. A query run with ‘Retrieve sequence’ checked loads a query name into the SDE tree.
  3. A sequence selected in the tree can be popped up into a separate frame. Having separate frames will make future sequence operations very flexible for the user.

3D

  1. GraphicsSetting added for track centerline.
  2. Warning in online help that DirectX Java3D does not support antialiasing.
  3. ‘Centre’ items removed from left popup.
  4. Right popups show all fields for a GFF feature.
  5. Newly added data tracks (Ensembl and GFF) are automatically added to the 3D platform. Once some features have been switched on, any new track added to the 3D platform is shown with these features.
  6. Added ContigView button under data track list table.
  7. User now has access to some platform display parameters in user_config.xml.

Installation

  1. Linux installer build process was corrected, eliminating problem that prevented Sockeye from launching on computers lacking the appropriate Java/Java3D installation.
  2. Added downloads without bundled Java; offer both OpenGL and DirectX Windows versions.

General

  1. Horizontal divider removed.
  2. Updated JavaHelp.
  3. Contextual help buttons activated on panels and dialog boxes.
  4. Removed tabbed panels that are not yet functional.

Last modified: 14 August 2003, 12h15