Trainee Projects
Marra Lab 2009

Summary of Dr. Marra's trainee projects
Current Trainees
Informatic approaches for identifying and reducing noise in oligonucleotide probe-level intensity data
Trainee: Noushin Farnoud
Changes in chromosomal copy number (CN) in somatic cells are hallmarks of tumour initiation and progression, as well as other developmental abnormalities such as mental retardation. Such changes can be detected using microarray technologies such as Affymetrix® SNP arrays. Analysis of data generated using this high-resolution, whole-genome analysis technique can reveal both CN changes and uniparental disomy. However, despite improvements in technology, the data are noisy, which adversely affects the sensitivity and specificity of CN change identification, particularly for changes affecting only a few consecutive probe-sets on the array.
The accuracy of CN assessment relies on the sensitivity and specificity of the bioinformatics approaches used to analyse the array data. Currently no software package can ensure both high sensitivity and high specificity for CN analysis purposes. We hypothesized and confirmed that one major drawback of available algorithms is that they do not take into account the intensities of individual SNP oligonucleotide probes. Instead, these software packages tend to use a normalized and smoothed form of intensity data obtained by averaging groups of probes (PM probes in the SNP probe-set), and use this value for further analysis of CN change. But many such values are likely to be inaccurate as a result of microarray noise. The resulting low signal-to-noise ratio (SNR) is the most challenging problem in analysing Affymetrix SNP data and is the key factor in generating false positives and false negatives.
Therefore, to improve the sensitivity and specificity of CN analysis methods, my project has initially focused on: 1) identifying possible sources of noise in oligonucleotide arrays, and 2) developing an algorithm to bioinformatically improve the SNR. To characterize the noise, I have modeled the standard deviation of signal log2-ratios for both SNP probe-sets and individual oligo-probes, and estimated the statistical parameters for this model. This analysis showed that SNPs perform relatively consistently between experiments regardless of their CN content. As a result, we concluded that the signal fluctuations must have been generated by variability in the performance of individual oligonucleotide probes in the probe-sets. Based on this finding, the aim of the rest of the project is to develop and implement a novel algorithm that enables monitoring of CN changes using oligo-level data. Progress so far includes designing a framework for this algorithm, implementing the beta version of the software, and testing it on several known regions of copy number variation (CNVs). All of these CNVs, which ranged from 40 kb to 13 Mb in size, have been successfully detected by this algorithm. This is a significant success, given that none of the available software packages were able to find “all” of these CNVs.
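The kind of probe-level variability analysis described above can be illustrated with a minimal sketch. It is not the project's actual model: the intensity values, the function names and the choice of a plain sample standard deviation are all assumptions made for illustration.

```python
import math
from statistics import stdev

def log2_ratios(test, reference):
    """Log2-ratio of test to reference intensity for each probe."""
    return [math.log2(t / r) for t, r in zip(test, reference)]

def per_probe_sd(experiments):
    """Sample standard deviation of each probe's log2-ratio across experiments.

    `experiments` is a list of per-experiment log2-ratio lists of equal length.
    """
    return [stdev(values) for values in zip(*experiments)]

# Hypothetical intensities for three probes of one probe-set in three experiments.
reference = [100.0, 200.0, 400.0]
runs = [
    [100.0, 200.0, 400.0],
    [100.0, 400.0, 400.0],
    [100.0, 100.0, 400.0],
]

ratios = [log2_ratios(run, reference) for run in runs]
sds = per_probe_sd(ratios)

# Index of the probe whose log2-ratio fluctuates most between experiments.
noisy_probe = max(range(len(sds)), key=sds.__getitem__)
```

In this toy data only the second probe varies between experiments, so averaging the probe-set would hide which probe is responsible for the fluctuation.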
Unraveling transcriptional regulatory networks in health and disease
Trainee: Olena Morozova
Transcription factors (TFs) are key regulators of gene expression that account for the coordinated regulation of functionally-related genes. A global view of transcriptional regulation is necessary to understand both normal development and disease pathogenesis. The goal of this project is to reconstruct a comprehensive network of mouse TFs governing aspects of development and organogenesis using data from several high throughput experimental platforms available at the GSC. In particular, we use bioinformatics approaches to mine Serial Analysis of Gene Expression (SAGE) data from the Mouse Atlas project to identify co-expression based associations of TFs during mouse development. We also envision incorporating data from the ongoing ChIP-TS (chromatin immunoprecipitation with tag sequencing) experiments that would shed light onto the genome-wide binding patterns of TFs of interest. Ultimately, we hope to be able to achieve a detailed understanding of transcriptional networks that control development and organogenesis and their misregulation during disease.
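One simple way to express a co-expression based association is a correlation between expression profiles across libraries. The sketch below assumes Pearson correlation and an arbitrary threshold; the TF names, tag counts and the threshold value are hypothetical and are not taken from the Mouse Atlas data or from the project's methods.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (both must vary)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def coexpressed_pairs(profiles, threshold=0.9):
    """Return (tf_a, tf_b, r) for every pair whose correlation is >= threshold."""
    names = sorted(profiles)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(profiles[a], profiles[b])
            if r >= threshold:
                pairs.append((a, b, r))
    return pairs

# Hypothetical tag counts for three TFs across four libraries.
profiles = {
    "TF_A": [1, 2, 3, 4],
    "TF_B": [2, 4, 6, 8],
    "TF_C": [4, 3, 2, 1],
}
pairs = coexpressed_pairs(profiles)
```

Here only TF_A and TF_B rise and fall together, so they form the single reported pair; TF_C is anti-correlated with both and is excluded by the threshold.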
Characterizing tumour evolution using RNA-seq
Trainee: Rodrigo Goya
Next-generation sequencing is actively being used to analyze transcriptomes, obtaining an unbiased representation of the total mRNA present in a cell sample. A variety of data can be obtained from these experiments, ranging from gene expression data to mutation discovery, RNA edits, allele-specific expression, splicing patterns, alternative expression, and more. Initial approaches have focused on using known gene models as a base for analyzing data. While useful, this can limit the discovery of novel events and depends strongly on the sources of evidence selected as annotations. De novo discovery of splicing events is also possible, but processing time is high and large studies can take a long time to complete. We are developing approaches to tackle two standing problems: effectively using known annotations while allowing for novel splice-site predictions in a computationally efficient way, and better predicting expressed gene isoforms when low coverage may not cover a gene in its entirety. In order to better understand the role of alternative splicing and alternative expression in cancer samples, we will apply these methods to a large transcriptome data set from different cancer types and identify patterns particular to types of cancer or tissues. This information will then be analyzed to help differentiate between normal splicing patterns, artifacts and novel events, paving the way for discovery of relevant cancer-specific features that may help target the disease or predict treatment response.
Functional characterization of EZH2 mutations
Trainee: Maria Mendez-Lago
The evolution of 5-FU drug resistance in colorectal cancer
Trainee: Jill Mwenifumbo
In 2009, an estimated 22,000 Canadians were diagnosed with colorectal cancer, and as many as 50% of these patients will have metastatic disease. Although metastasis is the main cause of death from cancer, very few metastasis suppressor or promoter genes have been identified. Our aim is to identify recurrent genomic mutations that are specific to colorectal metastases and likely to be metastasis suppressors or promoters. Our goal is to use second-generation sequencing technology on the genomes of normal, primary tumor, and metastasis tissue from 100 patients with colorectal cancer. Our approach will be to compare the recurrent somatic mutations found in the two groups of patients: those with metastatic colorectal cancer and those whose cancer did not metastasize. The results of our study will allow researchers to address overarching prognostic questions such as ‘Can we identify somatic mutations in primary tumors that predict metastasis?’, clinical questions such as ‘Can we identify somatic mutations in the primary tumors that may direct the decision on whether or not to treat with chemotherapy after stage I and II colorectal tumor resection?’, and pharmaceutical questions such as ‘Can we inform the development of drugs that promote the dormancy of distant micrometastases?’
Past Trainees
Identifying clinically relevant biomarkers in lymphoma using next-generation sequencing
Trainee: Ryan Morin
Lymphomas are a class of cancers that generally derive from blood cells that are present within organs called lymph nodes. Similar to other cancers, lymphoma tumours can be surgically removed. However, patients often relapse after surgery because, inevitably, a small number of cancer cells remain in the body. Diffuse large B‐cell lymphoma (DLBCL) and follicular lymphoma (FL) are the two most common types of lymphoma. Sophisticated techniques that allow one to view the abundance of genes (expression), or the genetic code (DNA sequence), of cancer cells can reveal clinically relevant distinctions between individual cases of DLBCL and FL. This type of grouping is important because, for example, patients with one subgroup of lymphoma known as the ABC variety appear to have an inferior response to current standard therapies compared to those with the more common GCB variety of DLBCL. The signals that define distinct subtypes of cancers are often referred to as biomarkers and their presence or absence can, in some cases, be tested in a clinical setting. Biomarkers can be gene expression or splicing differences that distinguish normal from tumour cells, or may be somatic mutations that are unique to tumour cells. Ryan is focusing his research on using second-generation high-throughput sequencing technologies to identify new biomarkers in cancer cells from a clinically diverse group of lymphoma patients.
Antisense Transcription in Mammalian Development
Trainee: Sorana Morrissy
Elucidating the mechanisms by which gene expression is regulated is one of the main challenges in understanding mammalian development. Antisense (AS) transcription has recently been recognized as one such mechanism, although fewer than 40 sense-antisense (S-AS) transcripts are known to play roles in development. With the availability of extensive EST and fully-sequenced cDNA libraries, the number of S-AS genes has grown to encompass nearly half of the transcriptome. By using serial analysis of gene expression (LongSAGE), a technology that provides quantitative measurements of transcript levels in a given sample, we are now poised to determine whether these transcripts play a role in regulating gene expression. As part of her thesis project, Sorana is analyzing LongSAGE libraries representing ~100 murine tissues and cell types throughout development. Using clustering methods and custom algorithms, she is able to identify AS transcripts expressed with tissue and temporal specificity, and further analyze the expression of the S and AS transcripts for patterns of expression consistent with regulation.
Experimental and bioinformatic approaches for the study of alternative transcript diversity in models of cancer progression
Trainee: Malachi Griffith
Initial studies following from the Human Genome Project have revealed that the apparent number of genes present in a human is much lower than expected for such a remarkably complex organism. Recent studies suggest that in fact it is not the number of genes that gives rise to our complexity but rather the number of functionally distinct versions of a gene that can be encoded from a single gene region. Human genes are composed of DNA sequences called exons separated by long stretches of DNA sequence called introns, which must be removed so that the exons can be assembled into a working copy of the gene. My research focuses on the phenomenon of alternative splicing, in which one gene is assembled from its component pieces in many different ways to produce a multitude of different functional products. In particular, changes in the forms of certain genes may be important in the progression of cancer and account for the differences in the severity of cancers and response to treatment observed among individuals. For example, by studying the alternative splicing of genes in models of cancer we hope to identify promising candidates for vaccine and drug development. Specifically, we have selected a series of colon cancer cell lines representing a cancer that is sensitive to chemotherapy or resistant to one of four commonly used chemotherapy drugs. By studying the differences in gene structure correlated with this change in drug response we may gain insight into why chemotherapy initially seems to work well in some patients but becomes less effective over time.
The use of microarray technology to profile mRNA transcripts generated by alternative splicing is an area of rapid development. To facilitate the use of microarrays to study alternative splicing I created an open source array design platform for alternative expression analysis called ‘ALEXA’ (www.AlexaPlatform.org). This platform allows the design and use of microarrays of arbitrary density and complexity for alternative transcript expression analysis of most EnsEMBL annotated species. Creation of ALEXA arrays involves the extraction, scoring, filtering and annotation of oligonucleotide probes corresponding to exons, introns, exon boundaries and exon-exon junctions. I used this platform to pre-compute designs for ten EnsEMBL annotated species. To evaluate the ALEXA approach I generated a design for the human genome and measured differential expression of alternate isoforms between 5-fluorouracil sensitive and resistant colorectal cancer cell lines. Results generated from ALEXA arrays were compared to those from Affymetrix exon arrays. ALEXA array data was comparable or superior to Affymetrix exon arrays in terms of reproducibility, sensitivity and specificity and provided additional information on the connectivity and boundaries of exons.
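To make the idea of an exon-exon junction probe concrete, the sketch below builds a candidate probe for each ordered pair of exons by joining the end of an upstream exon to the start of a downstream exon. It is an illustration only and is not ALEXA's code: the function names, the toy sequences and the very short flank length are assumptions, and the scoring, filtering and annotation steps described above are omitted.

```python
def junction_probe(upstream_exon, downstream_exon, flank=4):
    """Join the last `flank` bases of one exon to the first `flank` of another.

    Returns None when either exon is too short to supply a full flank.
    """
    if len(upstream_exon) < flank or len(downstream_exon) < flank:
        return None
    return upstream_exon[-flank:] + downstream_exon[:flank]

def all_junction_probes(exons, flank=4):
    """Candidate probes for every exon pair (i, j) with i upstream of j.

    Exons are numbered from 1 in transcript order; skipping exons is allowed,
    so junctions created by exon-skipping events are included.
    """
    probes = {}
    for i in range(len(exons)):
        for j in range(i + 1, len(exons)):
            probe = junction_probe(exons[i], exons[j], flank)
            if probe is not None:
                probes[(i + 1, j + 1)] = probe
    return probes

# Hypothetical exon sequences for a three-exon gene.
exons = ["ACGTACGT", "TTTTGGGG", "CCCCAAAA"]
probes = all_junction_probes(exons)
```

The probe for exons 1 and 3 exists only in a transcript that skips exon 2, which is why junction probes can report on exon connectivity in a way that exon-body probes cannot.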
My continuing objectives are to catalog exon combinations encoded by alternatively spliced transcripts specific to chemotherapy resistance or other forms of cancer progression and to explore in detail the differential expression, transcript structure and function of a set of transcripts with these exon combinations.
Genome analysis of pre- and post-treatment lung cancers from patients in a phase II clinical trial of first-line erlotinib
Trainee: Ian Bosdet
Lung cancer is the leading cause of cancer mortality, causing over 1 million deaths worldwide each year. The majority of patients with lung cancer are diagnosed with advanced disease due to low rates of early detection. Standard treatments for lung cancer include surgery, radiation and chemotherapy. While these treatments have proven to be effective at reducing the tumor size and extending lifespan, 5-year survival rates are only 10-20%. Improved early diagnosis and better guidance in treatment options can make a significant impact on the number of patients living with, and dying from, this disease.
Genetic changes (mutations) in DNA are thought to be responsible for almost all human cancers. An understanding of how these changes cause cancer will require identification of mutations commonly observed in cancer, an explanation of their biological role in cancer initiation and growth and an exploration of their possible applications for diagnosis and novel anti-cancer treatments. Thus, the first step in the process of understanding the biology of lung cancer is to identify genetic mutations that are commonly observed in these tumors.
Recently, several anti-cancer drugs have been developed to more specifically target cancer cells and have fewer toxic side effects than standard chemotherapy. One of these, Tarceva, has been developed to target EGFR, a protein important for the growth of many lung cancers. For unknown reasons, Tarceva is most effective in patients who are female non-smokers of southeast Asian descent and have been diagnosed with lung adenocarcinoma. A genetic characterization of the tumors in these patients may explain why they are sensitive. This knowledge will help to identify other patients who will derive the most benefit from this drug and help in the design of the next generation of therapeutics.
My research asks the following questions:
1. What are the common genetic changes seen in the DNA of lung cancers?
2. Are there genetic changes that are common to lung cancers that are sensitive to Tarceva, and can we use these changes to predict which patients will derive benefit from this drug?
3. Can identification of common genetic changes in lung cancer provide insight into the biological processes that cause lung cancer to start and to grow?
To answer these questions we are applying DNA sequencing to the study of lung cancers isolated from 65 patients enrolled in a clinical trial of Tarceva at the BC Cancer Agency. Tissue samples will be collected from each patient and, using a new generation of sequencing instruments, all the genes expressed in the cancer cells will be analyzed for mutations and expression level. These data will reveal which genes the cancer cells are using and which have mutations. This information can then be correlated with patient treatment outcome to identify genetic changes that affect the response of cancer to Tarceva. Genetic changes related to drug response could then be used in the clinic to predict which patients will benefit from this drug and which will not, improving our ability to select the most effective treatment option for all patients.
Targeting lung cancer genomics: A whole genome approach to predicting response
Trainee: Trevor Pugh
Lung cancer is the leading cause of cancer-related death in the world. Only 16% of patients survive for more than five years and additional strategies are needed to treat this disease. A new drug, Tarceva, has been developed which targets EGFR, a protein that plays a major role in lung cancers. Patients responding to Tarceva are typically female non-smokers of southeast Asian descent with adenocarcinoma. DNA sequence mutations and abnormal additional copies of the EGFR gene have been found in the tumours of many responsive patients, further suggesting a genetic basis for response. However, several responders lack these variants, limiting the use of EGFR status as a diagnostic predictor of drug response. In a search for novel gene changes that accurately predict drug efficacy, we are applying whole genome approaches as part of an ongoing clinical trial of Tarceva as a first-line therapy. Prior to treatment, lung tumour material and blood samples are collected from all patients. To identify abnormal genetic changes in the lung cancer cells, we are employing a whole transcriptome sequencing technique which looks for genetic changes in nearly every gene expressed by these cells at an unprecedented resolution. To test whether these changes are unique to each tumour, mutations in these genes are then tested for in normal DNA isolated from the patients’ blood. We anticipate that these analyses will identify a set of previously unseen mutations unique to treatment-naïve lung tumours. This pre-drug genetic information will then be correlated with post-drug clinical response data to identify which, if any, of these genetic features are associated with successful treatment. Given such knowledge, the potential exists for physicians to accurately screen lung cancer patients and identify only those with the genetic capacity to respond to the drug.
By combating cancer more effectively at the molecular level, responsive patients will receive effective first-line treatment while non-candidates will be expedited to receive alternate therapies.
Characterization of novel small-ORF genes in the transcriptome of human ES cells
Trainee: Jaswinder Khattra
Small proteins are important effectors in a variety of biological processes such as cell signalling, immunity, and cellular metabolism. However, their small size has resulted in this class of biomolecules being neglected in mammalian cDNA collections, which is evident by an artificial discontinuity in annotated protein-coding cDNAs at about the 100 amino acid mark. The objectives of this project comprise discovering novel ORFs from rare mRNA transcripts observed in human ES cell transcriptomes as defined by LongSAGE technology, assessing gene regulatory activity of novel protein-coding transcripts via measures of transcript abundance in undifferentiated ES cells versus embryoid bodies, and exploring physical evidence for the expression of novel short proteins in hES cells and embryoid bodies. Full-length novel transcripts recovered using RACE chemistries have been analysed extensively in the context of genomic features, comparison to transcript databases, gene structure, and predicted protein features. Novel ORFs did not exceed a length of 129 amino acids and lacked hits to well characterized protein domains. Candidates deemed interesting from these bioinformatic analyses and differential expression assays are being cloned for further functional evaluation. In addition, physical fractionation and enumeration of the short proteome of hES cells will be conducted in parallel as an alternate experimental approach for gene discovery. Collectively, these studies will contribute towards a more comprehensive catalogue of the biomolecules present within ES cells and a better understanding of their expression pattern and interactions, all of which are prerequisites for the effective application of stem cell products as therapeutic agents or models of tissue development.
Developing algorithms for microRNA expression profiling using a next-generation sequencing strategy
Trainee: Ryan Morin
The emergence of next-generation sequencing technologies has provided a means to quickly sequence a population of millions of RNA molecules. When applied to small RNA fractions of cells, these sequences represent a quantitative SAGE-like snapshot of microRNA expression. Owing to the huge amount of sequence data and the unique nature of sequence reads, novel software is required to summarize these data into manageable and accessible forms. A database that efficiently stores these sequences has been developed, along with a multitude of programs that perform various annotation and analysis tasks. Ryan's analysis pipeline has been applied to three separate research projects to date. The majority of the analysis has focused on the microRNAs that change during differentiation of human embryonic stem cells. Ongoing research is also revealing many novel microRNA genes and potential key targets of differentially expressed microRNAs.
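The SAGE-like counting idea above can be sketched in a few lines: identical reads are tallied and each tally is scaled to reads per million so that libraries of different depth can be compared. This is a minimal illustration under stated assumptions, not the pipeline described here; the function name, the read sequences and the reads-per-million normalization are all hypothetical choices.

```python
from collections import Counter

def expression_profile(reads):
    """Map each distinct read sequence to (raw count, reads per million)."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {seq: (n, n * 1_000_000 / total) for seq, n in counts.items()}

# Hypothetical small RNA reads: one abundant and one rare sequence.
reads = ["UGAGGUAGUAGG"] * 3 + ["CAUUGCACUUGU"]
profile = expression_profile(reads)
```

A real analysis would also need to collapse length variants of the same microRNA and annotate each sequence against known microRNA genes, which this sketch does not attempt.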
A functional genomic approach identifies novel players of steroid hormone induced programmed cell death in Drosophila
Trainee: Suganthi Chittaranjan
Co-Supervisor: Sharon Gorski
All multicellular organisms begin as a single cell that multiplies and shapes into a fully formed adult. During this process, millions of cells are produced and their fate is determined by survival and death signals. A genetically regulated program known as Programmed Cell Death (PCD) removes obsolete cells. PCD is an important process because errors in PCD can cause a variety of human diseases including cancer. Drosophila is a model organism that shares many features of PCD with humans. During Drosophila pupation, larval salivary glands undergo stage-specific PCD. Discovering new genes involved in the cell death of larval salivary glands, and unveiling their functions, will provide new insights into PCD and potential new markers and therapeutic targets for cancer and other diseases. Using the powerful genomic and bioinformatic tools available in our center, we identified 500 candidate genes that were activated prior to PCD. We determined whether these 500 candidate genes are involved in PCD using a high-throughput technique known as RNA interference (RNAi). This technique allowed us to identify seven new genes that are involved in PCD as well as 19 genes that are involved in cell survival. In addition, extensive characterization of a novel gene which we suspect may play a role in PCD and fatty acid metabolism is underway. Recent evidence suggests that disruptions in fatty acid metabolism can cause cancer. Understanding a common gene that controls both PCD and fatty acid metabolism may ultimately yield a therapeutic target, which could help control cancer and also improve the health of cancer patients.
The fly eye as a tool for cell death gene discovery - cloning and characterizing echinus
Trainee: Ian Bosdet
Co-Supervisor: Sharon Gorski
Programmed cell death (PCD) is a complex biological process in which damaged or unwanted cells in the body die and are removed. It is important during normal development as well as when cells become sick or damaged. Defects in PCD play an important role in many human diseases such as cancer, diabetes and neurodegenerative disorders such as Huntington's disease. In our lab we are using the fruit fly as a model to study PCD because it is known that this process occurs in a very similar way in both flies and in humans. The fruit fly is a commonly-used experimental organism and much is already known about its biology. Many methods exist for studying aspects of its development, genetics and cellular biology. The eye of the fruit fly is a collection of almost 800 individual units that are arrayed in a very specific manner. This array is shaped during fly development by PCD and so changes in the extent of cell death in the eye will change the pattern of eye units. One well-known but previously uncharacterized gene that affects cell death in the eye is called echinus. We are characterizing this gene to determine what role it plays in PCD. It is hoped that this knowledge will further our understanding of the process of PCD, which may ultimately lead to better prevention and treatment of many common diseases.
Common regulators of apoptosis and autophagy-an analysis of known cell death genes in starvation induced autophagy
Trainee: Claire Hou
Co-Supervisor: Sharon Gorski
Macroautophagy (autophagy) is a house-keeping mechanism for the degradation of long-lived proteins and organelles. During autophagy, cytoplasmic components are sequestered into double membrane structures called autophagosomes which then fuse with lysosomes to form autolysosomes, in which degradation occurs. The degraded products can be further recycled for macromolecular synthesis and energy production to sustain cell survival. Recent evidence indicates that regulators of apoptosis, including TRAIL, Bcl-2 and DAPK, can also regulate autophagy. This finding could have important therapeutic implications since the induction of apoptosis is a current strategy of cancer treatment modalities. Modulation of autophagy has similarly been proposed as a therapeutic strategy for cancer. Thus, it is important to understand not only the role of autophagy in cancer, but also the regulatory relationships between apoptosis and autophagy pathways. My study aims to investigate the involvement of cell death genes in starvation-induced autophagy in a Drosophila model system.
Common regulators of apoptosis and autophagy-an analysis of known cell death genes in starvation induced autophagy
Trainee: Dianne Wu
MicroRNAs are known regulators of many conserved biological processes in the cell. They are generated from double-stranded RNAs via the cleavage of stem loops formed from pre-miRNA sequences, in a process similar to part of the RNAi pathway. RNA editing is a post-transcriptional event that occurs frequently in the cell, causing single-nucleotide alterations in RNA transcripts, and also operates on double-stranded RNAs. The overlap between these two regulatory pathways, both of which are only vaguely understood, has high potential to play a crucial role in the regulation of biological processes. RNA editing in microRNAs may alter the targets, efficiency, or activity of the affected microRNAs. We have utilized next-generation sequencing technology to sequence miRNA libraries as an approach to detect miRNA editing events with extremely high sensitivity. High-throughput sequencing data is extremely noisy, which makes it difficult to find single-nucleotide editing events in short miRNA sequences averaging 22 nucleotides in length, but we have employed several methods to recognize miRNA editing events with improved specificity and sensitivity. By examining the molecular composition of a pathway for which the exact biological mechanism is still uncertain and few assumptions can be made, we aim to reveal new insights into the activity and composition of microRNAs in the cell and to develop a novel, robust method for detection of miRNA editing events using high-throughput sequencing technology.
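One basic way to separate candidate editing events from sequencing noise is to require both a minimum read depth and a minimum fraction of reads carrying the same mismatch at a position. The sketch below illustrates that idea only; it is not the method developed in this project, and the function name, thresholds, sequences and the simplification of using only full-length, ungapped reads are all assumptions.

```python
from collections import Counter

def candidate_edits(reference, reads, min_reads=10, min_fraction=0.1):
    """Return (position, reference base, observed base, fraction) tuples.

    Positions are 0-based. Reads whose length differs from the reference are
    skipped, which is a simplification made for this sketch.
    """
    per_position = [Counter() for _ in reference]
    for read in reads:
        if len(read) != len(reference):
            continue
        for pos, base in enumerate(read):
            per_position[pos][base] += 1

    candidates = []
    for pos, counts in enumerate(per_position):
        depth = sum(counts.values())
        if depth < min_reads:
            continue  # too few reads to distinguish editing from noise
        for base, n in counts.items():
            if base != reference[pos] and n / depth >= min_fraction:
                candidates.append((pos, reference[pos], base, n / depth))
    return candidates

# Hypothetical reference and reads: 2 of 10 reads carry A-to-G at position 4.
reference = "ACGTACGT"
reads = ["ACGTACGT"] * 8 + ["ACGTGCGT"] * 2
edits = candidate_edits(reference, reads)
```

Raising `min_fraction` trades sensitivity for specificity, which is the same tension the paragraph above describes for noisy short-read data.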
Follicular Lymphoma
Trainee: Alison Lee
Alison Lee is working with Tesa Severson on a project identifying single nucleotide polymorphisms (SNPs) in follicular lymphoma tumour cells. Her role is to assist in verifying SNPs using a variety of methods, including Illumina sequencing and capillary sequencing. Through this project, Alison hopes to identify novel mutations that play an integral role in causing follicular lymphoma.
Validation of candidate mutations in lymphoma
Trainee: Jasmine Lin
Jasmine is providing research assistance to Dr. Andrew Mungall. Projects underway include the verification of large-scale rearrangements in follicular lymphoma patients. These projects involve PCR, agarose gel electrophoresis and traditional Sanger sequencing technologies, followed by manual analysis of gel and sequence data to determine whether these mutations are derived from germline or somatic cells.
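The germline-versus-somatic decision described above reduces to a comparison between what is seen in a patient's tumour and what is seen in the same patient's normal tissue. The sketch below is an illustration of that logic only: representing each variant as a (chromosome, position, reference, alternate) tuple, the function name and the example coordinates are assumptions, and it presumes both samples were assayed at the same sites.

```python
def classify_variants(tumour, normal):
    """Split tumour variants into somatic and germline sets.

    A variant seen in the tumour but not in matched normal tissue is called
    somatic; a variant seen in both is called germline.
    """
    tumour, normal = set(tumour), set(normal)
    return {"somatic": tumour - normal, "germline": tumour & normal}

# Hypothetical variants as (chromosome, position, reference, alternate).
tumour_variants = [("chr1", 1000, "A", "G"), ("chr2", 2500, "C", "T")]
normal_variants = [("chr2", 2500, "C", "T"), ("chr3", 900, "G", "A")]
calls = classify_variants(tumour_variants, normal_variants)
```

A variant found only in the normal sample, such as the chr3 entry here, is ignored by this sketch because it says nothing about changes acquired by the tumour.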
Retrospective clinical study aimed at improving treatment screening for colorectal cancer patients with respect to mutations of the UMPS gene and the drug 5-fluorouracil
Trainees: Jess Paul, Pierre Cheung (BC Clinical Genomics Network Studentship awardee), and Shaun Drummond
5-fluorouracil (5-FU) is a common chemotherapeutic drug used to combat colorectal cancer, the third most common cancer among Canadians. Though some patients respond positively to 5-FU, others are unaffected or react adversely. In order to metabolize 5-FU into its cancer-fighting form, expression of the UMPS gene is necessary. Therefore mutations in the UMPS gene are believed to confer 5-FU drug resistance to tumour cells.
As part of Malachi Griffith’s work on the characterization of the UMPS locus and its potential relevance to 5-FU resistance, a retrospective clinical study aimed at improving treatment screening for colorectal cancer is in progress. Through collaborations with local hospitals and the Ontario Tumour Bank, 117 pathology samples from colorectal cancer patients were gathered. Sequence and expression data for the UMPS gene isoforms will be analyzed and correlated with the presence of 5-FU treatment and with treatment outcomes. We hope the new insights will help predict treatment response and allow responsible allocation of the drug 5-FU.
Mutation Browser: Synthesizing Data Sources to Explore Non-Hodgkin's Lymphoma
Trainee: Eric Zhao
During my time at the BC Genome Sciences Centre, I have worked towards developing a web browser based user interface as a more intuitive, streamlined way to view next generation sequencing data. Under the guidance of Ryan Morin, I am developing tools to display lymphoma patient data interpreted by a bioinformatics pipeline that includes BWA alignment and variant detection using SNVMix. By accessing information from SIFT (http://sift.jcvi.org), Ensembl (http://www.ensembl.org), and Uniprot (http://www.uniprot.org), variants can be annotated and interactively filtered to potentially facilitate the discovery of key features which call for further investigation.
Detection of novel mutations in brain cancers using next-generation sequencing
Trainee: Sasha Maslova
I have been providing research assistance to Olena Morozova in studying tumours of the nervous system. The current focus of her research is brain tumours, specifically oligodendroglioma and glioblastoma multiforme. Bioinformatics analysis is used to interpret next-generation sequencing data to identify genomic aberrations, such as gene expression changes, point mutations, and larger genomic rearrangements in tumour cells. These studies aim to identify the molecular mechanisms that lead to the cancer phenotype, as well as potential prognostic markers and targets for drug therapy.

