Over the next five years, my colleagues at the GSC and I will be involved in the design, evaluation and implementation of novel genomics approaches to research problems fundamentally important in health and disease. My personal research program, which is designed to complement those of others at the GSC, will utilize state-of-the-art genomics approaches to focus primarily on (1) identification and analysis of genome variation in health and disease models and (2) expression genomics, in which DNA sequencing and microarray approaches will be used to identify, clone and sequence differentially expressed transcripts, including alternatively spliced transcripts produced from the same locus but encoding different proteins. Specific examples of projects in both of my selected areas of emphasis are provided below to illustrate the approaches I will use to study mental retardation, cancers, and embryonic stem cell lines. Much of my work is collaborative, and so involves researchers with biological and disease expertise complementary to my own expertise in genomics.
Biology of Cancer: Insights from genomics analyses of lymphoid neoplasms
This team grant, funded in 2008, includes several members of the BC Cancer Agency. Dr. Marra brings genomics and technology expertise to the group. Drs. Joe Connors, Randy Gascoyne and Doug Horsman bring lymphoma biology expertise and the clinical perspective. The team proposes to apply Illumina sequencing and bioinformatics pipelines to genome-wide analysis of lymphomas, with the objectives of analyzing entire transcriptomes, including microRNAs, and discovering mutated genes, novel transcript structures, and candidate small molecule inhibitors of the proteins we identify.
Genomics applied to the management of high-risk AML/myelodysplastic syndromes
This research project, which started in 2011, is part of Genome BC’s personalized medicine program. It is led by Dr. Aly Karsan, Medical Director of the Cancer Genetics Laboratory at the BC Cancer Agency, and Dr. Marra. Using AML as a model, scientists and clinicians will use advanced genomics technologies to improve the therapeutic stratification of patients, thereby leading to more personalized treatment and, it is hoped, improved outcomes. This study will also assess the costs associated with implementing personalized diagnostics in the health care system, as well as the social benefits and risks and best practices for handling the release of genetic information.
Cancer transcriptome characterization using massively parallel DNA sequencing
This project is part of The Cancer Genome Atlas (TCGA), an NIH-initiated, comprehensive, and coordinated effort to accelerate the understanding of the genetics of cancer by studying more than 20 tumor types and analyzing thousands of samples over a five-year period. Each cancer will undergo detailed genomic characterization that incorporates powerful bioinformatic and data analysis components. The TCGA project will result in the most comprehensive understanding of cancer genomes and will enable researchers to further mine the TCGA data to improve prevention, diagnosis and treatment of cancer. Over the five years, the GSC will characterize more than 10,000 tumor samples from cancer patients, either by microRNA (miRNA) analysis alone or in combination with RNA-Seq analysis. To optimize the interpretation of the biological information, the GSC will also provide support to the Genome Data Analysis Centers for RNA-specific data analysis.
Sequencing for discovery of candidate mutations in lymphoma transcriptomes
In combination, the lymphoid cancers (non-Hodgkin lymphoma, Hodgkin lymphoma, myeloma and chronic lymphocytic leukemia) constitute the fourth most common malignancy in both men and women in North America. Lymphomas typically have characteristic abnormal chromosomes, including translocations, indicating the relevance of mutations to how non-Hodgkin lymphomas (NHLs) develop and behave. This project uses lymphoid neoplasms as the test platform to demonstrate that detailed mutational analysis of a specific, well-characterized set of neoplasms can provide a candidate list of mutation sites specific to and common across lymphoma types. The resulting data set is facilitating the study of clinical behavior, response to treatment, patient outcome and survival, and target pathways for therapeutic agents. To pursue these objectives, we are using state-of-the-art transcriptome and whole-genome sequencing coupled with leading-edge bioinformatics, data management and analysis approaches to probe for genomic abnormalities in malignant cells from 92 Diffuse Large B-cell Lymphoma (DLBCL) cases.
HIV+ associated cancers: genomic and transcriptome characterization
Approximately 40 million people worldwide are living with HIV infection. People infected with HIV have an elevated risk of cancer and mortality, and cancer is a leading cause of death among people with HIV/AIDS. Certain cancers, but not others, occur more frequently in patients with HIV infection. Even though many HIV-associated cancers have a viral etiology, and immunodeficiency is believed to provide a permissive environment for viral oncogenesis, many questions remain about how these tumors form. Surprisingly, these tumors have not been extensively characterized at the molecular level. The goal of this project is to develop a comprehensive database of the molecular changes in Human Immunodeficiency Virus (HIV)-associated cancers (from HIV-infected patients) that will be available to the research community worldwide. The molecular characterization is sequence-based, using second-generation sequencing technology.
Integrated epigenetic maps of human embryonic and adult cells
Dr. Joe Costello from the University of California San Francisco and Dr. Marra co-lead a Reference Epigenome Mapping Center as part of the NIH Roadmap Epigenomics Project. They work cooperatively with other Roadmap Epigenome Mapping Centers and the Data Coordination Center (EDACC) on this project to comprehensively map the epigenomes of select human cells with significant relevance to complex human disease. Their group is focusing on cells from the blood, brain, skin, breast and other tissues. This epigenetic data, along with genetic and expression data, will be integrated using advanced informatics to address fundamental roles of epigenetics in differentiation, maintenance of cell-type identity and gene expression. Their reference epigenomes will enable new disciplines including human population epigenetics, comparative epigenomics, neuroepigenetics, and therapeutic epigenetics for tissue regeneration and reversal of disease.
Stratifying and targeting pediatric medulloblastoma through genomics
Drs. Michael Taylor, David Malkin, and I co-lead a project with the goal of analyzing the genomic information of pediatric medulloblastoma samples, which have been obtained through the international medulloblastoma consortium. mRNA and miRNA expression profiles of 1000 samples, representing all four subgroups (Wnt, Shh, Group 3, and Group 4), will be studied to identify novel subtypes within each subgroup. The resulting subtype-specific expression profiles will support the development of robust biomarkers to classify medulloblastomas more accurately and reliably for treatment in clinical trials. Genomic DNA analysis of 380 high-risk subgroup samples will support the discovery of subgroup-specific somatic mutations in order to inform current clinical trials of targeted therapies, and to identify genes and pathways already targeted in other diseases.
Significant Research Contributions
My most significant contributions to genome science are listed below. Publications have been organized into six groups of technically or scientifically related topic areas.
I. Science, 2009 Apr 24;324(5926):522-528; Genome Biol, 2007 Oct 22;8(10):R224; Science, 2007 Apr 13;316(5822):222-234; Science, 2006 Nov 10;314(5801):941-952; Science, 2006 Sep 15;313(5793):1596-1604; Genome Res, 2006 Jun;16(6):768-775; Proc Natl Acad Sci USA, 2005 Dec 20;102(51):18526-18531; Science, 2005 Jul 15;309(5733):436-442; Nature, 2005 Apr 7;434(7034):724-731; Science, 2005 Feb 25;307(5713):1321-1324; Nature, 2004 Apr 1;428(6982):493-521; Nature, 2003 Jul 10;424(6945):157-164; Nature, 2002 Aug 15;418(6899):743-750; Nature Genet, 2001 Oct;29(2):133-134; Genome Res, 2001 Feb;11(2):274-280; Nature, 2001 Feb 15;409(6822):934-941; Nature, 2001 Feb 15;409(6822):860-921; Genome Res, 1997;7:1072-1084.
These selected publications describe large-scale high throughput DNA sequencing conducted via a hierarchical map-based approach. The papers published in the Feb. 15, 2001 issue of Nature, titled "The Human Genome", describe the construction and use of the human genome map to fuel human genome sequencing. My contribution was to devise and then implement the approaches that led to the construction and use of the map, which served as the centralized coordinating resource for the sequencing effort. I also led map construction efforts in support of the sequencing of the mouse, rat, bovine, and other genomes, as described in these papers.
II. Nature, 2000 Dec 14;408(6814):796-815; Nature, 2000;408:823-826; Cell, 2000;100:377-386; Nature, 1999;402:769-776; Science, 1999;286:2468-2474; Nature Genet, 1999;22:265-270; Nature Genet, 1999;22:271-275.
This series of papers describes the mapping and sequencing of the Arabidopsis thaliana genome. A. thaliana is an important model plant used widely to address issues relevant to plant developmental genetics. I was a key member of the Cold Spring Harbor Sequencing Consortium, first leading the effort to map the A. thaliana genome and subsequently coordinating aspects of the whole-genome sequencing activity.
III. Emerg Infect Dis, 2004 Dec;10(12):2192-2195; Science, 2003 May;300(5624):1399-1404.
The Emerg Infect Dis publication describes the sequencing of avian flu genomes isolated from human patients during an avian flu outbreak. The Science publication describes the rapid generation of the complete and accurate sequence of the SARS-associated coronavirus. The Genome Sciences Centre generated and end-sequenced cDNAs, and then assembled these sequences into the final ~29 kilobase genome sequence. The entire effort took about six days, demonstrating that genome sequencing of a new viral pathogen could be considered a legitimate part of a "rapid response" to an emerging infectious disease. The Science paper has been cited more than 900 times.