Posters: Genome Sciences Centre at BCCA Annual Conference 2002
Posters presented at the BC Cancer Agency's Annual Conference 2002 by the Genome Sciences Centre are available for download.
Characterization of inxs, a gene involved in programmed cell death in the developing Drosophila retina
Freeman JD, Ma K, Rusconi JC, Cagan RL, Marra MA, Gorski SM
British Columbia Cancer Agency Genome Sciences Centre
Washington University School of Medicine, St. Louis, MO, USA
Selective programmed cell death (PCD), or apoptosis, plays a critical role in controlling cell populations and in sculpting the shape of developing organs and tissues. PCD is also implicated in disease pathogenesis and is associated with several human diseases, including cancer, neurodegenerative disorders, AIDS and autoimmunity. We are using the Drosophila retinal epithelium to study the molecular mechanisms of PCD during development. The retina consists of 750 identical repeating units called ommatidia. The ommatidia are initially separated by numerous interommatidial cells; some of these differentiate into pigment cells, and the excess cells undergo PCD. Inhibition of PCD leads to supernumerary cells between ommatidia and a consequent disruption of the normally precise ommatidial pattern, which is evident as a rough eye phenotype in the adult. Using a genetic approach, we identified a new gene, inxs ("in excess"), involved in the retinal PCD process. Loss-of-function mutations in inxs dominantly enhance the rough eye phenotypes conferred by mutations in irregular chiasm C-roughest (irreC-rst) and echinus (ec), two genes previously implicated in retinal cell death. On its own, the inxs mutant phenotype includes a rough eye in the adult and a cellular organization in the pupal retina similar to that observed in transgenic animals expressing the baculovirus caspase inhibitor p35. Acridine orange staining and TUNEL labelling confirmed that the excess cells result from reduced cell death. We have identified two additional alleles of inxs; however, flies homozygous for these alleles die before retinal PCD occurs. To examine the retinal cell death pattern for these alleles, we used the FLP/FRT recombination system to induce somatic clones homozygous for inxs. We are also investigating the pattern of cell death in homozygous inxs embryos, using TUNEL labelling together with a marker that distinguishes heterozygotes.
We used deficiency mapping to localize inxs to polytene chromosome interval 64F, a region that does not correspond to any previously characterized cell death gene in Drosophila. Using deficiency breakpoint mapping, we showed that inxs must lie in a 100 kb interval that contains 20 known or predicted genes. To identify inxs among the candidate transcripts, we are using sequencing and real-time RT-PCR. Given the highly conserved nature of PCD, these studies will provide further insight into PCD regulation in other animals, including humans, and open new avenues for investigating this crucial process.
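The interval reasoning behind deficiency breakpoint mapping can be sketched in Python: the gene must lie inside every deficiency that uncovers the mutant phenotype and outside every deficiency that complements it, and the candidate genes are those overlapping what remains. This is a minimal sketch under stated assumptions; the coordinates (in kb), deficiency intervals and gene names are hypothetical placeholders, not the actual 64F breakpoints.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in kb; hypothetical coordinates


def intersect_all(intervals: List[Interval]) -> List[Interval]:
    """Region shared by every deficiency that uncovers the mutation."""
    start = max(s for s, _ in intervals)
    end = min(e for _, e in intervals)
    return [(start, end)] if start < end else []


def subtract(regions: List[Interval], cut: Interval) -> List[Interval]:
    """Remove a complementing deficiency from the candidate regions."""
    cs, ce = cut
    out: List[Interval] = []
    for s, e in regions:
        if ce <= s or cs >= e:  # no overlap: keep the region unchanged
            out.append((s, e))
            continue
        if s < cs:  # part of the region left of the deletion survives
            out.append((s, cs))
        if ce < e:  # part of the region right of the deletion survives
            out.append((ce, e))
    return out


def candidate_regions(uncovering: List[Interval], complementing: List[Interval]) -> List[Interval]:
    regions = intersect_all(uncovering)
    for df in complementing:
        regions = subtract(regions, df)
    return regions


def genes_in(regions: List[Interval], genes: Dict[str, Interval]) -> List[str]:
    """Candidate genes: any annotated gene overlapping a remaining region."""
    return [name for name, (gs, ge) in genes.items()
            if any(gs < r_end and ge > r_start for r_start, r_end in regions)]


if __name__ == "__main__":
    regions = candidate_regions([(0, 300), (150, 400)], [(0, 180), (280, 500)])
    print(regions)  # [(180, 280)]
    print(genes_in(regions, {"geneA": (170, 185), "geneB": (200, 210), "geneC": (290, 300)}))  # ['geneA', 'geneB']
```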
A set of rearrayed BAC clones spanning the human genome
Martin Krzywinski, Ian Bosdet, Duane Smailus, Carrie Mathewson, Natasja Wye, Sarah Barber, Mable Brown-John, Steve Chand, Alison Cloutier, Amara Masson, Michael Mayo, Teika Olson, Wan Lam, Calum MacAuley, Kazutoyo Osoegawa*, Shaying Zhao**, Pieter J. de Jong*, Jacqueline Schein, Steven Jones, Marco Marra
Genome Sciences Centre, BC Cancer Agency, Vancouver, BC, Canada
*Children's Hospital Oakland Research Institute, Oakland, CA, USA
**The Institute for Genomic Research, Rockville, MD, USA
From the human fingerprint map constructed at the Washington University Genome Sequencing Center, we have selected a set of 32,850 BACs that span the human genome. The clone set is intended to serve as a genome-ordered set of probes for FISH and microarray-based BAC CGH experiments. Its comprehensive coverage makes it a valuable resource in both research and clinical contexts for understanding and detecting cancer-related chromosomal and expression alterations.
The clones have been sampled from the RPCI-11/13 (95%) and Caltech-D (5%) libraries, selected to optimize size, map coverage and consistent overlap, and rearrayed into 384-well format. Clone identity has been validated by fingerprinting. After a first-round selection of 29,000 clones, a combination of automated and visual fingerprint inspection identified 2,000 clones whose fingerprints did not match those stored in the physical map. A further 4,500 clones were added to the set to recover as much as possible of the map coverage represented by the unmatched clones. Analysis of the set's sequence coverage (UCSC June 2002 assembly) led to the selection of an additional 1,300 clones to cover gaps larger than 10 kb.
The clone set covers 99.5% of the November 2001 version of the BAC fingerprint map. Using fingerprint-based localization, end sequence data and assembly coordinate data, we found that the set covers 2.80 Gb (99.5%) of the assembled sequence, with 50 Mb of this coverage provided by clones not found in the fingerprint map. Approximately 80% of the assembly is covered at 1X or 2X depth, with 1X and 2X coverage in a 1:1 ratio. The set leaves 550 gaps in sequence coverage, totaling 15 Mb; 55% of these gaps are smaller than 10 kb.
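The 1X/2X depth breakdown and the gap count above are interval computations over clone placements. The sketch below assumes each clone has already been placed as a half-open interval on assembly coordinates (the clones and coordinates here are hypothetical) and sweeps over clone start and end points to tally the bases at each depth and list the uncovered gaps.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on assembly coordinates


def coverage_profile(clones: List[Interval], length: int) -> Tuple[Counter, List[Interval]]:
    """Sweep over clone start/end events; return bp at each depth and the zero-depth gaps."""
    events: Dict[int, int] = defaultdict(int)
    events[0] += 0       # make sure the sweep starts at the sequence start
    events[length] += 0  # ... and ends at the sequence end
    for s, e in clones:
        s, e = max(s, 0), min(e, length)  # clip clones to the sequence
        if s >= e:
            continue
        events[s] += 1
        events[e] -= 1
    points = sorted(events)
    depth_bp: Counter = Counter()
    gaps: List[Interval] = []
    depth = 0
    for left, right in zip(points, points[1:]):
        depth += events[left]           # depth is constant on [left, right)
        depth_bp[depth] += right - left
        if depth == 0:
            gaps.append((left, right))
    return depth_bp, gaps


if __name__ == "__main__":
    depth_bp, gaps = coverage_profile([(0, 40), (30, 70), (80, 100)], 100)
    print(dict(depth_bp))  # {1: 80, 2: 10, 0: 10}
    print(gaps)            # [(70, 80)]
```

The returned gaps could then be filtered by size, for example to find those larger than 10 kb.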
Coverage of telomeres and centromeres has been evaluated using telomere-localized BACs (Human Telomere Sequencing and Mapping Project) and existing CGAP FISH data, which are available for 1,200 of the clones in the set. Of 164 telomeric BAC markers, 53 are in the clone set; the remaining markers overlap their best match in the clone set by an average of 100 kb (22 HindIII fragments).
This first version of the clone set will be publicly distributed through Pieter de Jong. A web-based clone search and data mining portal will be available (http://www.bcgsc.ca). We anticipate that the set will evolve as new versions of the sequence assembly and physical map are released. We are planning to create analogous resources for mouse and rat.
Verification of Drosophila melanogaster sequence assembly using restriction digest BAC fingerprints derived from multiple enzymes
Martin Krzywinski, Jacqueline Schein, Readman Chiu, Ian Bosdet, Carrie Mathewson, Natasja Wye, Sarah Barber, Mable Brown-John, Steve Chand, Alison Cloutier, Amara Masson, Michael Mayo, Teika Olson, Steven Jones, Roger Hoskins*, Susan Celniker*, Gerald Rubin*, and Marco Marra
Genome Sciences Centre, BC Cancer Agency, Vancouver, BC, Canada
*Berkeley Drosophila Genome Project, U.C. Berkeley, Berkeley, CA, USA
The annotated D. melanogaster genomic sequence is currently in its third revision (Release 3) and covers nearly all of the 120 Mb of euchromatic DNA. The sequence assembly is curated by the Berkeley Drosophila Genome Project (BDGP) and comprises data produced at Celera, Genoscope, Lawrence Berkeley National Laboratory, Baylor College of Medicine and the European Drosophila Genome Project (EDGP). Drosophila continues to play a major role as a model for inheritance and gene interaction, and a high-quality assembly is required to ensure the accuracy of sequence-based analysis. To this end, we have developed an automated data analysis pipeline that verifies the sequence assembly using multiple restriction enzyme digests of tiling path BAC clones. Various types of repeat regions produce incorrect but self-consistent sequence assemblies, and such errors are very difficult to spot without an external validation method. The fingerprint verification method offers several benefits: the sequence is verified by an independent laboratory process, the fingerprints are robust in resolving repeat elements, and the data processing pipeline is extensible and can be adapted to any sequence data.
A set of 1,056 tiling path clones spanning the euchromatic portion of the genome was selected, and each clone was independently fingerprinted using 5 restriction enzymes: ApaLI, BamHI, EcoRI, HindIII and XhoI. The enzymes were chosen to maximize coverage of the sequence by fragments in the 1-20 kb size range, which facilitates detection. This combination provides coverage by at least two, three and four optimally sized fragments for 99.9%, 98% and 87% of the sequence, respectively.
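The coverage statistics for a multi-enzyme combination can be estimated with an in-silico digest. The sketch below assumes a linear clone sequence, uses each enzyme's recognition site and top-strand cut position, and counts, for every base, how many enzymes place it in a fragment of 1-20 kb. The random sequence in the usage block is a stand-in, not Drosophila data, and vector sequence at the clone ends is ignored for simplicity.

```python
import random
import re
from typing import Dict, List, Tuple

# Recognition site and top-strand cut offset (e.g. EcoRI cuts G^AATTC).
ENZYMES: Dict[str, Tuple[str, int]] = {
    "ApaLI": ("GTGCAC", 1),
    "BamHI": ("GGATCC", 1),
    "EcoRI": ("GAATTC", 1),
    "HindIII": ("AAGCTT", 1),
    "XhoI": ("CTCGAG", 1),
}


def digest(seq: str, site: str, cut_offset: int) -> List[Tuple[int, int]]:
    """In-silico digest: half-open (start, end) fragments of a linear sequence."""
    # The zero-width lookahead also finds overlapping occurrences of the site.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]


def optimal_depth(seq: str, enzymes: Dict[str, Tuple[str, int]] = ENZYMES,
                  min_size: int = 1_000, max_size: int = 20_000) -> List[int]:
    """For each base, count the enzymes whose fragment covering it is optimally sized."""
    depth = [0] * len(seq)
    for site, offset in enzymes.values():
        for a, b in digest(seq, site, offset):
            if min_size <= b - a <= max_size:
                for i in range(a, b):
                    depth[i] += 1
    return depth


def fraction_at_least(depth: List[int], k: int) -> float:
    """Fraction of bases covered by at least k optimally sized fragments."""
    return sum(d >= k for d in depth) / len(depth)


if __name__ == "__main__":
    random.seed(0)
    clone = "".join(random.choice("ACGT") for _ in range(150_000))  # random stand-in for a BAC
    depth = optimal_depth(clone)
    for k in (2, 3, 4):
        print(f">= {k} optimal fragments: {fraction_at_least(depth, k):.1%}")
```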
An in-silico fingerprint of each clone was derived from the sequence and compared with its experimental counterpart using a Needleman-Wunsch alignment with a 2% fragment size tolerance. Each base of the sequence is assigned a verification depth: the number of experimentally verified in-silico fragments that contain that position. The average verification depth serves as a measure of overall verification. We have devised various figures of merit to identify clones with unverified subsequences and to categorize the discrepancies, and an interactive web-based system has been created to visualize verification coverage.
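One way to picture this comparison is a Needleman-Wunsch alignment of the in-silico and experimental fragment lists, each sorted by size, where a match means the two sizes agree within 2%. The sketch below is an illustration under assumptions, not the pipeline's actual code: the scoring values (match +1, mismatch -1, gap 0) and measuring the tolerance against the larger size are choices made here. Each verified fragment then adds one to the verification depth of every base it spans.

```python
from typing import List, Sequence, Tuple

Fragment = Tuple[int, int]  # half-open (start, end) on the clone sequence


def within(a: float, b: float, tol: float) -> bool:
    """Sizes agree if they differ by at most `tol` of the larger size (assumed convention)."""
    return abs(a - b) <= tol * max(a, b)


def align(silico: Sequence[int], observed: Sequence[float], tol: float = 0.02,
          match: int = 1, mismatch: int = -1, gap: int = 0) -> List[int]:
    """Needleman-Wunsch global alignment of two size-sorted fragment lists.

    Returns indices into `silico` of fragments aligned to an observed band within tolerance.
    """
    n, m = len(silico), len(observed)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap

    def score(i: int, j: int) -> int:
        return match if within(silico[i - 1], observed[j - 1], tol) else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(i, j), F[i - 1][j] + gap, F[i][j - 1] + gap)

    verified: List[int] = []
    i, j = n, m
    while i > 0 and j > 0:  # traceback from the bottom-right cell
        if F[i][j] == F[i - 1][j - 1] + score(i, j):
            if within(silico[i - 1], observed[j - 1], tol):
                verified.append(i - 1)
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + gap:
            i -= 1  # in-silico fragment with no observed band
        else:
            j -= 1  # observed band with no in-silico fragment
    return sorted(verified)


def verification_depth(length: int, digests: List[Tuple[List[Fragment], List[float]]],
                       tol: float = 0.02) -> List[int]:
    """Per-base count of experimentally verified in-silico fragments, summed over enzymes."""
    depth = [0] * length
    for fragments, observed in digests:
        frags = sorted(fragments, key=lambda f: f[1] - f[0], reverse=True)  # largest first, as on a gel
        sizes = [e - s for s, e in frags]
        for k in align(sizes, sorted(observed, reverse=True), tol):
            s, e = frags[k]
            for pos in range(s, e):
                depth[pos] += 1
    return depth


if __name__ == "__main__":
    print(align([5000, 3000, 1000], [5050, 2500, 1010]))  # [0, 2]
    depth = verification_depth(10, [([(0, 6), (6, 10)], [6.0])])
    print(depth, sum(depth) / len(depth))  # average verification depth
```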
We are currently analyzing the tiling clone verification data and identifying potentially genuine inconsistencies between the sequence-derived and experimental restriction maps. The method described here will also be applied to verifying heterochromatic DNA sequence, which is being generated using smaller clones. We anticipate that this fingerprint-based sequence verification methodology can improve the final sequence assembly quality of other organisms such as human, mouse and rat.
Bioinformatic analysis of SAGE expression data and applications to cell death
Erin D. Pleasance, Suganthi Chittaranjan, J.D. Freeman, Richard J. Varhol, Scott D. Zuyderduyn, Marco A. Marra, Sharon M. Gorski, and Steven J.M. Jones
Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC
Serial Analysis of Gene Expression (SAGE) is a method of large-scale gene expression analysis that involves sequencing small segments of mRNA transcripts ("SAGE tags"). Using bioinformatic methods, we assessed the efficacy of SAGE in identifying expressed transcripts, and devised a method for identifying the genes represented by SAGE tags in the Caenorhabditis elegans, Drosophila melanogaster and human genomes. Based on SAGE data, we identified both known and novel genes expressed in a D. melanogaster tissue undergoing programmed cell death.
Tag-to-gene mappings that link SAGE tags to their cognate genes were derived from full-length transcript sets. D. melanogaster and C. elegans "conceptual" transcripts were constructed from genomic annotations, ESTs and cDNAs, and the resulting tag-to-gene mappings were tested against D. melanogaster full-length cDNAs. For the human genome, incomplete sets of full-length transcripts were derived from the MGC and RefSeq sequence databases. The efficacy of SAGE in transcript identification was evaluated by studying the effect of SAGE tag length and anchoring enzyme on the percentage of transcripts that can be identified with SAGE. In addition, tag-to-gene mappings were applied to experimental D. melanogaster data and, coupled with EST and genomic sequence data stored and visualized in an ACEDB database, used to confirm gene predictions and identify novel genes in the D. melanogaster genome.
Analysis of tags extracted from conceptual transcripts suggests that, with the standard SAGE procedure, expression of 8% of D. melanogaster genes and 15% of C. elegans genes cannot be detected unambiguously, either because the tag sequence is shared with another gene or because the transcript lacks an NlaIII anchoring enzyme site. Both increasing tag length by 2-3 bp and using Sau3A instead of NlaIII as the anchoring enzyme increase the potential for transcript detection. In the human genome, preliminary results suggest that an even higher proportion of genes cannot be detected unambiguously by SAGE, and that NlaIII is the preferred anchoring enzyme. Thus, the number of expressed transcripts that can be identified by SAGE varies with the enzyme, tag length and organism, all important factors to consider when designing a SAGE experiment. Analysis of the D. melanogaster SAGE data, comprising over 4,000 distinct tags, identified several hundred previously unpredicted transcripts based on unambiguous matches to ESTs and genomic sequence. This work provides a basis for further study of these genes, some of which may have a role in programmed cell death, a process integral to oncogenesis.
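Tag extraction and the ambiguity analysis can be sketched on toy sequences. Following the standard SAGE scheme, the tag is taken as the 10 bases immediately 3' of the 3'-most NlaIII (CATG) site; a gene counts as undetectable if its transcript yields no tag or a tag shared with another gene. The sketch assumes one transcript per gene and uses made-up sequences; changing `site` to Sau3A's GATC or raising `tag_len` gives the kinds of comparisons described above.

```python
from collections import defaultdict
from typing import Dict, Optional, Set, Tuple

NLAIII = "CATG"  # NlaIII recognition site
SAU3A = "GATC"   # Sau3A recognition site


def extract_tag(transcript: str, site: str = NLAIII, tag_len: int = 10) -> Optional[str]:
    """Tag = the `tag_len` bases immediately 3' of the 3'-most anchoring-enzyme site."""
    pos = transcript.rfind(site)
    if pos == -1:
        return None  # no anchoring site: the transcript cannot be tagged
    start = pos + len(site)
    tag = transcript[start:start + tag_len]
    return tag if len(tag) == tag_len else None  # site too close to the 3' end


def tag_to_gene(transcripts: Dict[str, str], site: str = NLAIII,
                tag_len: int = 10) -> Tuple[Dict[str, Set[str]], Set[str]]:
    """Map each tag to the genes that produce it; also return genes with no tag."""
    mapping: Dict[str, Set[str]] = defaultdict(set)
    untaggable: Set[str] = set()
    for gene, seq in transcripts.items():
        tag = extract_tag(seq.upper(), site, tag_len)
        if tag is None:
            untaggable.add(gene)
        else:
            mapping[tag].add(gene)
    return mapping, untaggable


def undetectable_fraction(transcripts: Dict[str, str], site: str = NLAIII, tag_len: int = 10) -> float:
    """Fraction of genes with no tag or with a tag shared with another gene."""
    mapping, untaggable = tag_to_gene(transcripts, site, tag_len)
    shared = {g for genes in mapping.values() if len(genes) > 1 for g in genes}
    return len(untaggable | shared) / len(transcripts)


if __name__ == "__main__":
    toy = {  # hypothetical one-transcript-per-gene set
        "g1": "AAA" + "CATG" + "G" * 10 + "TT",
        "g2": "TTTT" + "CATG" + "G" * 10 + "AA",
        "g3": "ACGT" * 3,
        "g4": "CATG" + "ACGTACGTAC",
    }
    print(undetectable_fraction(toy, tag_len=10))  # 0.75: g1/g2 share a tag, g3 has no site
    print(undetectable_fraction(toy, tag_len=12))  # 0.5: longer tags separate g1/g2, but g4 is too short
```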
Analyzing Gene Expression with the DISCOVERY Platform
Scott Zuyderduyn, Richard Varhol, Mehrdad Oveisi-Fordoei, Shawn Rusaw, Steven Jones
BC Cancer Agency, Genome Sciences Centre, Vancouver BC, Canada V5Z 4E6
The increasing use of the serial analysis of gene expression (SAGE) method of whole-transcriptome characterization has resulted in the need for sophisticated and comprehensive bioinformatics software applications to aid in analysis.
The DISCOVERY platform, developed at the Genome Sciences Centre, consists of a central database to store raw data from public repositories and large-scale analyses alongside a user-driven, graphically oriented software application.
SAGE libraries generated from a wide variety of tissues in numerous organisms, including human cancers, can be analyzed with a SAGE-specific software extension that brings an experimental focus to this wealth of data.
For example, SAGE data from cancer tissues can be compared with data from a variety of normal tissues to identify cancer-specific genes, whether known or predicted. These genes can then be characterized further by their similarity to other genes, functional domain identification, subcellular localization predictions and their roles in known pathways.
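The source does not specify how DISCOVERY compares libraries, so the following is only a hypothetical illustration of one simple approach: normalize tag counts to tags per million and keep tags that are abundant in the cancer library and at least a chosen fold higher than in every normal library. The thresholds (`min_tpm`, `fold`) and tag names are assumptions; a real analysis would also apply a statistical test.

```python
from typing import Dict, List, Sequence


def per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Normalize raw tag counts to tags per million so libraries of different depth compare."""
    total = sum(counts.values())
    return {tag: 1e6 * c / total for tag, c in counts.items()}


def cancer_specific(cancer: Dict[str, int], normals: Sequence[Dict[str, int]],
                    min_tpm: float = 50.0, fold: float = 5.0) -> List[str]:
    """Tags at least `fold` times more abundant in the cancer library than in any normal library."""
    c = per_million(cancer)
    norm = [per_million(n) for n in normals]
    hits = []
    for tag, value in c.items():
        if value < min_tpm:
            continue  # too rare to call reliably
        highest_normal = max((n.get(tag, 0.0) for n in norm), default=0.0)
        if fold * highest_normal <= value:  # tags absent from all normals always pass
            hits.append(tag)
    return sorted(hits, key=lambda t: c[t], reverse=True)  # most abundant first


if __name__ == "__main__":
    cancer = {"tag_A": 500, "tag_B": 400, "tag_C": 100}  # hypothetical counts
    normals = [{"tag_A": 10, "tag_B": 990}]
    print(cancer_specific(cancer, normals))  # ['tag_A', 'tag_C']
```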
The software works well with more mathematically oriented gene expression analysis applications, allowing the use of clustering algorithms to further isolate genes of interest.
Using natural language processing to find relationships between genes, drugs and cancer
Shawn Rusaw, Chris Bajdik*, Richard Varhol, Scott Zuyderduyn, Vadim Astakhov, Steven Jones
British Columbia Cancer Agency, Genome Sciences Centre, Vancouver, BC, Canada V5Z 4E6
*British Columbia Cancer Agency, Cancer Control Research Program, Vancouver, BC, Canada V5Z 4E6
We are using natural language processing to extract relationships between genes, drugs and various cancer types from the MEDLINE literature. We hope to discover as-yet unidentified relationships that may help in specifying drug targets for cancer research and treatment. Our general approach is as follows:
- Part-of-speech (POS) tag each MEDLINE sentence using the Brill Tagger.
- Extract noun phrases from the POS tagged sentences using a noun phrase chunker.
- Relate these noun phrases to cancer via an in-house thesaurus of cancer synonyms and a keyword search.
- Relate these noun phrases to drugs via NLM's Unified Medical Language System and a keyword search.
- Relate these noun phrases to genes via The Weizmann Institute's GeneCards database and a keyword search.
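Steps 3-5 amount to dictionary lookups followed by co-occurrence counting. The sketch below assumes the Brill tagging and noun-phrase chunking have already produced a list of noun phrases per sentence, and uses tiny hypothetical thesauri in place of the in-house cancer thesaurus, UMLS and GeneCards; it does not call any of those resources. Each pair of concepts of different types found in the same sentence is counted as a candidate relationship.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical miniature thesauri (concept -> lower-case synonyms) standing in for
# the in-house cancer thesaurus, UMLS and GeneCards; they are not real extracts.
THESAURI: Dict[str, Dict[str, Set[str]]] = {
    "cancer": {"lung cancer": {"lung cancer", "lung carcinoma", "nsclc"}},
    "drug": {"gefitinib": {"gefitinib", "iressa"}},
    "gene": {"EGFR": {"egfr", "epidermal growth factor receptor", "erbb1"}},
}


def lookup(phrase: str, thesaurus: Dict[str, Set[str]]) -> Set[str]:
    """Keyword search: a phrase maps to every concept with a synonym inside it."""
    text = phrase.lower()
    return {concept for concept, synonyms in thesaurus.items() if any(s in text for s in synonyms)}


def relate(sentences: List[List[str]]) -> Counter:
    """Count sentence-level co-occurrences between concepts of different types.

    `sentences` holds the noun phrases already produced by the POS tagger and chunker.
    """
    pairs: Counter = Counter()
    for phrases in sentences:
        found: Set[Tuple[str, str]] = set()
        for phrase in phrases:
            for kind, thesaurus in THESAURI.items():
                found |= {(kind, concept) for concept in lookup(phrase, thesaurus)}
        for a, b in combinations(sorted(found), 2):
            if a[0] != b[0]:  # keep only cross-type links (gene-drug, gene-cancer, drug-cancer)
                pairs[(a, b)] += 1
    return pairs


if __name__ == "__main__":
    sentences = [["EGFR mutations", "gefitinib", "non-small cell lung carcinoma"], ["EGFR expression"]]
    for (a, b), n in relate(sentences).items():
        print(a, b, n)
```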
The relationships are stored in SAGEdb, our in-house relational database developed for analysing SAGE expression data. We can use SAGEdb to map interesting up- or down-regulated tags in a SAGE expression library to genes and, through our natural language processing work, map these genes to related genes, drugs and cancer types.
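The tag-to-gene-to-partner lookup maps naturally onto a relational join. The sketch below uses Python's built-in `sqlite3` module with a hypothetical two-table schema; SAGEdb's real schema is not described here, and the tags and links are illustrative placeholders, not curated relationships.

```python
import sqlite3
from typing import List, Sequence, Tuple


def build_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for SAGEdb with a hypothetical two-table schema."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE tag_gene (tag TEXT, gene TEXT);                        -- SAGE tag-to-gene mapping
        CREATE TABLE relation (gene TEXT, partner TEXT, partner_type TEXT); -- NLP-derived links
        """
    )
    con.executemany("INSERT INTO tag_gene VALUES (?, ?)", [("tag_1", "EGFR"), ("tag_2", "TP53")])
    con.executemany(
        "INSERT INTO relation VALUES (?, ?, ?)",
        [("EGFR", "gefitinib", "drug"), ("EGFR", "lung cancer", "cancer"), ("TP53", "breast cancer", "cancer")],
    )
    return con


def partners_for_tags(con: sqlite3.Connection, tags: Sequence[str],
                      partner_type: str) -> List[Tuple[str, str, str]]:
    """For the given (e.g. up-regulated) tags, list (tag, gene, partner) rows of one partner type."""
    placeholders = ",".join("?" * len(tags))
    sql = (
        "SELECT DISTINCT tg.tag, r.gene, r.partner "
        "FROM tag_gene AS tg JOIN relation AS r ON r.gene = tg.gene "
        f"WHERE tg.tag IN ({placeholders}) AND r.partner_type = ? "
        "ORDER BY tg.tag, r.partner"
    )
    return con.execute(sql, [*tags, partner_type]).fetchall()


if __name__ == "__main__":
    con = build_demo_db()
    print(partners_for_tags(con, ["tag_1", "tag_2"], "cancer"))
    print(partners_for_tags(con, ["tag_1"], "drug"))
```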
Reconstruction of transformation events in lung cancer at the gene expression level
Peter Ruzanov, Steven Jones
Lung cancer is one of the leading causes of death in North America, with a high mortality rate and an increasing number of cases. Finding efficient anti-cancer treatments is challenging, mostly because tumorigenesis is a complex multistage process driven by changes in the expression of a large number of genes. Recent technological advances have made large-scale gene expression measurements a reality, although analyzing these data poses another challenge. The objective of this study was to analyze gene expression data obtained with the SAGE (serial analysis of gene expression) method in lung cancer.
To analyze the SAGE data, we used the approach of building phenetic trees, borrowed from systematic biology. In this approach, data are arranged in a tree-shaped structure that reflects the hierarchical relationships between the analyzed parameters. We used it to visualize the relationships between different SAGE libraries and to identify the genes participating in malignant transformation.
To analyze the lung cancer SAGE libraries, we used the chi-square test to determine the statistical significance of changes in each gene's expression. After comparing all possible pairs of SAGE libraries, we built a "distance matrix" that was later used to visualize the relationships between datasets. For gene annotation, we used local copies of public databases such as NCBI UniGene, OMIM and others.
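The pairwise-comparison step can be sketched as a 2x2 Pearson chi-square test per tag (the tag's count versus all other tags, in two libraries), with the distance between two libraries taken as the number of tags whose change is significant at p < 0.05 (critical value 3.841 for 1 degree of freedom). That distance definition is an assumption made for illustration; the source does not state how the matrix entries were computed. The chi-square approximation is also unreliable for very small counts.

```python
from typing import Dict, List, Tuple

CRITICAL_1DF_P05 = 3.841  # chi-square critical value for 1 degree of freedom at p = 0.05


def chi_square_2x2(a: int, b: int, total_a: int, total_b: int) -> float:
    """Pearson chi-square (no continuity correction) for one tag's count in two libraries."""
    observed = [[a, total_a - a], [b, total_b - b]]
    n = total_a + total_b
    row_totals = [total_a, total_b]
    col_totals = [a + b, n - a - b]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (observed[i][j] - expected) ** 2 / expected
    return stat


def distance(lib_a: Dict[str, int], lib_b: Dict[str, int], critical: float = CRITICAL_1DF_P05) -> int:
    """Assumed distance: number of tags whose change between two libraries is significant."""
    total_a, total_b = sum(lib_a.values()), sum(lib_b.values())
    tags = set(lib_a) | set(lib_b)
    return sum(
        chi_square_2x2(lib_a.get(t, 0), lib_b.get(t, 0), total_a, total_b) > critical
        for t in tags
    )


def distance_matrix(libraries: Dict[str, Dict[str, int]]) -> Tuple[List[str], List[List[int]]]:
    """Symmetric matrix of pairwise distances between all libraries."""
    names = sorted(libraries)
    return names, [[distance(libraries[x], libraries[y]) for y in names] for x in names]


if __name__ == "__main__":
    libs = {"normal": {"tag_1": 50, "tag_2": 50}, "tumor": {"tag_1": 10, "tag_2": 90}}  # hypothetical
    print(distance_matrix(libs))  # (['normal', 'tumor'], [[0, 2], [2, 0]])
```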
Under the parsimony approach, the optimal tree structure is the one with the fewest changes among the nodes; it represents the preferred tree, or cladogram. Using this technique, we built a tree with SAGE libraries as nodes. Because the tree is based on the distribution of the events that link its nodes, these events can be reconstructed from the tree. Given what we knew about the biological nature of the samples, the resulting tree was biologically meaningful and could be used for further analysis.
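Scoring a candidate tree under parsimony can be illustrated with Fitch's algorithm, which counts the minimum number of state changes a fixed binary tree requires for each character. In the sketch below, characters are hypothetical per-gene states (for example 1 for up-regulated, 0 otherwise) across made-up libraries. Searching over tree topologies, which a full parsimony analysis also requires, is not shown, and the software actually used is not specified in the source.

```python
from typing import Dict, Hashable, Set, Tuple, Union

# A rooted binary tree: a leaf is a library name, an internal node is a (left, right) pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, Hashable]) -> Tuple[Set[Hashable], int]:
    """Fitch bottom-up pass: (candidate state set, minimum changes) for one character."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one extra change required


def parsimony_score(tree: Tree, characters: Dict[str, Dict[str, Hashable]]) -> int:
    """Total minimum number of changes over all characters (genes); lower is better."""
    return sum(fitch(tree, states)[1] for states in characters.values())


if __name__ == "__main__":
    genes = {  # hypothetical states for normal (N) and carcinoma (C) libraries
        "gene1": {"N1": 0, "N2": 0, "C1": 1, "C2": 1},
        "gene2": {"N1": 0, "N2": 0, "C1": 1, "C2": 0},
    }
    grouped = (("N1", "N2"), ("C1", "C2"))
    mixed = (("N1", "C1"), ("N2", "C2"))
    print(parsimony_score(grouped, genes), parsimony_score(mixed, genes))  # 2 3
```

Under these toy states, the topology that groups normal and carcinoma libraries separately needs fewer changes and would be preferred.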
Because our tree was based on the distribution of expression events, these events could be reconstructed by analyzing groups of nodes and their expression profiles. To reconstruct the transformation events, we extracted the sets of genes characterizing two stages of malignant transformation: first, the transition from normal tissue to carcinoma, and second, the transition from carcinoma to invasive carcinoma. Annotation of the genes in these two lists showed considerable biological relevance, confirming the validity and usability of our approach.
Gene expression technology at the BCCA Genome Sciences Centre
Khattra J, Chan S, Asano J, Pandoh P, Coughlin S, McDonald H, Vatcher G, Schnerch A, Freeman D, Zuyderduyn S, Leung D, Teague K, Jones S, Marra M
The Genome Sciences Centre has established a Gene Expression Laboratory with the purpose of consolidating high throughput transcriptome analysis platforms and related technology development.
Global gene expression profiling methods are currently being applied to collaborative research in cancer genomics (V. Ling, C. Eaves), mouse development (P. Hoodless, E. Simpson, C. Helgason), C. elegans biology (D. Baillie, D. Moerman, D. Riddle, J. McGhee), a variety of embryonic stem cells (K. Humphries, C. Eaves, J. Thomson), transient hypoxia of human tumor cells (R. Durand), and host response to pathogens (R. Brunham, C. Astell).
Our choice and design of experimental platforms addresses the following issues critical to successful gene expression profiling in the wide variety of projects listed above:
- Isolation and analysis of top-quality RNA, including efficient isolation from minute samples.
- Quantitative global gene expression profiling of both known and novel gene transcripts using Affymetrix GeneChips and Serial Analysis of Gene Expression.
- Accurate validation of selected transcripts via quantitative real-time RT-PCR (ABI Prism 7900HT Sequence Detection System).
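For the real-time RT-PCR validation step, a widely used way to express relative expression is the comparative Ct (2^-ΔΔCt) method. The source does not say which quantification scheme is used, so this is an assumption; the sketch also assumes roughly 100% amplification efficiency (a doubling per cycle) and averages replicate Ct values. The function name and Ct values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def relative_expression(target_sample: Sequence[float], reference_sample: Sequence[float],
                        target_control: Sequence[float], reference_control: Sequence[float],
                        base: float = 2.0) -> float:
    """Fold change of a target transcript in a sample relative to a control (2^-ddCt).

    Each argument is a list of replicate Ct values; `reference_*` is an endogenous
    reference transcript used for normalization. base=2.0 assumes perfect doubling per cycle.
    """
    delta_sample = mean(target_sample) - mean(reference_sample)
    delta_control = mean(target_control) - mean(reference_control)
    return base ** -(delta_sample - delta_control)


if __name__ == "__main__":
    # Hypothetical Ct values: the target crosses threshold 3 cycles earlier (after normalization)
    print(relative_expression([22.0], [18.0], [25.0], [18.0]))  # 8.0
```

A value above 1 indicates higher expression in the sample than in the control; its direction could then be compared with the change seen on the GeneChip or SAGE platform.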