Sockeye Collaborations

Collaborations on both mammalian genomes and the more compact nematode genomes are a central part of the platform development process. They demonstrate that the platform can support work on real-world biological problems.

Collaborations bring the experience of biological and medical researchers into the platform design. They are investments that address both immediate and longer-term issues in important biological and medical problems.


Transcriptional regulation in the adult gut

Determine whether elt-2 is the dominant transcription factor in the adult C. elegans gut.

  • Source data: SAGE tag counts for C. elegans gut cells and for whole worms. Known elt-2 sites in C. elegans and C. briggsae.
  • Compare predictions using elegans and briggsae data separately with predictions using combined elegans-briggsae data.
  • Compare predictions using specific and nonspecific backgrounds.
  • Compare predictions for the top 10, 20, ..., 60 genes/orthologous pairs to see whether predictions degrade as the promoter sequence set includes genes that are less gut-specific.
  • Determine whether transcription factor modules are important in the adult gut.
  • Determine whether the patterns of agreement among motif discovery methods differ between adult gut predictions and muscle-specific promoters.
  • Predict likely transcription factor locations in constructs; these will be high priority targets for future biochemical tests.
  • Current work
    • compare the effectiveness of methods that refine a set of discovered motifs using a run-specific estimate of motif statistical significance
    • determine whether methods that explicitly use expression data to refine a set of discovered motifs are more effective
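
The top 10, 20, ..., 60 comparison above can be sketched as follows. This is a minimal illustration, not the project's actual procedure: it assumes genes are ranked by the ratio of gut to whole-worm SAGE tag counts, and the function names, the pseudocount, and the toy data are all hypothetical.

```python
def gut_specificity(gut_count, whole_count, pseudocount=1.0):
    """Ratio of gut to whole-worm tag counts (assumed specificity score).

    The pseudocount is an assumption that avoids division by zero.
    """
    return (gut_count + pseudocount) / (whole_count + pseudocount)


def nested_top_sets(tag_counts, sizes=(10, 20, 30, 40, 50, 60)):
    """Build nested top-N gene sets ranked by decreasing gut specificity.

    tag_counts maps gene name -> (gut_count, whole_worm_count).
    Sizes larger than the number of genes are skipped.
    """
    ranked = sorted(
        tag_counts,
        key=lambda gene: gut_specificity(*tag_counts[gene]),
        reverse=True,
    )
    return {n: ranked[:n] for n in sizes if n <= len(ranked)}


# Toy data only: 25 made-up genes with steadily decreasing gut specificity.
counts = {f"gene{i:02d}": (100 - i, 10 + i) for i in range(25)}
sets = nested_top_sets(counts)
```

Each set contains the previous one, so running motif discovery on every set shows whether predictions degrade as less gut-specific genes are added.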


Jim McGhee: University of Calgary, CSHL talk
Don Moerman: UBC
Sheldon McKay, Kim Wong: GSC

Scale of work

Using promoters up to 3 kb in length, truncated at the first 'next' coding region, we work with up to ~65 genes or orthologous pairs of genes.
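
The promoter rule can be sketched in code. This is an illustration under stated assumptions: coordinates are 0-based and half-open, `anchor` is the gene boundary on the upstream side, and the function name and arguments are hypothetical rather than taken from the platform.

```python
def promoter_interval(anchor, strand, coding_regions, max_len=3000):
    """Return (start, end) of the promoter, 0-based half-open.

    anchor: for '+' genes, the first base of the gene; for '-' genes,
            the exclusive end coordinate of the gene.
    coding_regions: list of (start, end) intervals on the same sequence.
    Regions that span the anchor itself are ignored in this sketch.
    """
    if strand == "+":
        start = max(0, anchor - max_len)
        for _, region_end in coding_regions:
            # A coding region that ends upstream truncates the promoter.
            if region_end <= anchor:
                start = max(start, region_end)
        return (start, anchor)
    if strand == "-":
        end = anchor + max_len
        for region_start, _ in coding_regions:
            # For '-' genes, upstream means higher coordinates.
            if region_start >= anchor:
                end = min(end, region_start)
        return (anchor, end)
    raise ValueError("strand must be '+' or '-'")


# A neighbouring coding region 1.5 kb upstream shortens the promoter.
truncated = promoter_interval(10000, "+", [(2000, 8500), (10000, 12000)])
# With no upstream neighbour, the full 3 kb is used.
full = promoter_interval(10000, "+", [(10000, 12000)])
# Minus-strand gene ending at 5000, with a neighbour starting at 6200.
minus = promoter_interval(5000, "-", [(4000, 5000), (6200, 7000)])
```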

Main deliverables

To support this work, we created the initial version of the 'large-scale' motif discovery component of the Gene Regulation platform. The platform component includes

  • a) computation on a computer cluster
  • b) a visualizer for assessing results and extracting metapatterns
  • c) simple, flexible process 'glue' that makes these easy to use

It is designed to make it routine and cost-effective to minimize the risks related to choosing which methods and parameter settings to use in motif discovery and, to some extent, 'which background?'. It makes it routine to run many methods and parameter settings in parallel, and it then facilitates extracting 'patterns' in discovered motifs in datasets consisting of up to hundreds of related sequences. Just as important, it supports collaborative bioinformatics-biology teams in which both sides can quickly learn from each other, which makes collaborations more productive. The 'large-scale' platform component has proven effective, and it continues to be developed and integrated more deeply into the overall Gene Regulation platform. Development efforts focus on addressing issues identified in collaborative work.
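
One way to picture running many methods and parameter settings in parallel is as a grid of jobs. The sketch below is hypothetical throughout: the method names, parameters, and `run_job` placeholder are invented for illustration, and a local thread pool stands in for the computer cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical methods and settings; real tools and parameters will differ.
METHODS = {
    "method_a": [{"width": 8}, {"width": 12}],
    "method_b": [{"n_motifs": 3}, {"n_motifs": 5}],
}
BACKGROUNDS = ["specific", "nonspecific"]


def build_jobs(methods, backgrounds):
    """Enumerate every (method, parameter setting, background) combination."""
    jobs = []
    for name, settings in methods.items():
        for params, background in product(settings, backgrounds):
            jobs.append(
                {"method": name, "params": params, "background": background}
            )
    return jobs


def run_job(job):
    """Placeholder runner: returns the job with an empty motif list.

    A real runner would submit the job to the cluster and parse its output.
    """
    return {**job, "motifs": []}


jobs = build_jobs(METHODS, BACKGROUNDS)
with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map preserves job order, so results line up with jobs.
    results = list(pool.map(run_job, jobs))
```

Keeping every result tagged with its method, parameters, and background is what makes it possible to look afterwards for motifs on which several runs agree.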

Transcriptional regulation in muscle cells

As for the C. elegans gut, above, characterize significant motifs in genes identified as muscle-specific by a combination of methods: SAGE, GFP fusions and oligonucleotide microarrays.


Don Moerman, UBC
Sheldon McKay, Kim Wong: GSC

Scale of work

Using promoters up to 3 kb in length, truncated at the first 'next' coding region, we currently work with up to ~110 genes or orthologous pairs of genes.


Transcription factor motifs and modules

Predict regulatory module locations, identify dominant motifs and extract module structural 'rules', given:

  • a) preliminary 'rules' describing the microstructure of modules that should be present in mammalian genomes
  • b) regions or genes identified by biological tests like ChIP or expression data

Given a pattern (a single motif, a dimer, or a more complex pattern), find 'hits' in whole mammalian genomes and annotate each hit with information about the genomic context in which it occurs.
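
For the single-motif case, the scan-and-annotate step might look like the sketch below. It assumes the pattern is written in IUPAC nucleotide codes and that 'genomic context' is reduced to the nearest gene; the function names, the example pattern, and the toy sequence and gene list are illustrative only, and a whole-genome scan would need a more scalable approach.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(pattern):
    """Reverse complement of a sequence or IUPAC pattern."""
    return pattern.translate(COMPLEMENT)[::-1]


def find_hits(sequence, pattern):
    """Return sorted (start, strand, matched_text) tuples for both strands.

    A lookahead is used so that overlapping matches are all reported.
    """
    hits = []
    for strand, pat in (("+", pattern), ("-", reverse_complement(pattern))):
        regex = re.compile("(?=(" + "".join(IUPAC[c] for c in pat) + "))")
        for match in regex.finditer(sequence):
            hits.append((match.start(), strand, match.group(1)))
    return sorted(hits)


def annotate(hits, genes):
    """Attach the nearest gene to each hit; genes are (name, start, end)."""
    annotated = []
    for start, strand, text in hits:
        nearest = min(genes, key=lambda g: max(g[1] - start, start - g[2], 0))
        annotated.append(
            {"start": start, "strand": strand, "site": text, "gene": nearest[0]}
        )
    return annotated


# Toy example with an illustrative GATA-like pattern.
sequence = "CCTGATAAGGCTTATCAGG"
hits = find_hits(sequence, "WGATAR")
annotated = annotate(hits, [("geneA", 0, 5), ("geneB", 15, 19)])
```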


The Hospital for Sick Children (Toronto)

Visualize datasets from tiled microarrays

Develop flexible visualization tools for raw and smoothed results for multiple datasets.
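
As a minimal illustration of 'raw and smoothed' results, the sketch below applies a centred moving average to a series of probe intensities. The choice of smoother, the window size, and the function name are assumptions; the tools may well use a different smoothing method.

```python
def moving_average(values, window=5):
    """Centred moving average; the window shrinks at the ends of the series."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Toy intensities for five consecutive tiles.
raw = [1, 2, 3, 4, 5]
smooth = moving_average(raw, window=3)
```

Plotting `raw` and `smooth` on the same axes, one pair per dataset, gives the kind of side-by-side view described above.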

Page last modified Feb 06, 2007