Frontiers in Genetics
RNA sequencing (RNAseq) has been widely used to generate bulk gene expression measurements collected from pools of cells. Only relatively recently have single-cell RNAseq (scRNAseq) methods provided opportunities for gene expression analyses at the single-cell level, allowing researchers to study heterogeneous mixtures of cells at unprecedented resolution. Tumors tend to be composed of heterogeneous cellular mixtures and are frequently the subjects of such analyses. Extensive method developments have led to several protocols for scRNAseq but, owing to the small amounts of RNA in single cells, technical constraints have required compromises. For example, the majority of scRNAseq methods are limited to sequencing only the 3′ or 5′ termini of transcripts. Other protocols that facilitate full-length transcript profiling tend to capture only polyadenylated mRNAs and are generally limited to processing only 96 cells at a time. Here, we address these limitations and present a novel protocol that allows for the high-throughput sequencing of full-length, total RNA at single-cell resolution. We demonstrate that our method produced strand-specific sequencing data for both polyadenylated and non-polyadenylated transcripts, enabled the profiling of transcript regions beyond only transcript termini, and yielded data rich enough to allow identification of cell types from heterogeneous biological samples.
Nature Genetics, 2020
Cervical cancer is the most common cancer affecting sub-Saharan African women and is prevalent among HIV-positive (HIV+) individuals. No comprehensive profiling of cancer genomes, transcriptomes or epigenomes has been performed in this population thus far. We characterized 118 tumors from Ugandan patients, of whom 72 were HIV+, and performed extended mutation analysis on an additional 89 tumors. We detected human papillomavirus (HPV)-clade-specific differences in tumor DNA methylation, promoter- and enhancer-associated histone marks, gene expression and pathway dysregulation. Changes in histone modification at HPV integration events were correlated with upregulation of nearby genes and endogenous retroviruses.
Nature Cancer, 2020
Advanced and metastatic tumors with complex treatment histories drive cancer mortality. Here we describe the POG570 cohort, a comprehensive whole-genome, transcriptome and clinical dataset, amenable for exploration of the impacts of therapies on genomic landscapes. Previous exposure to DNA-damaging chemotherapies and mutations affecting DNA repair genes, including POLQ and genes encoding Polζ, were associated with genome-wide, therapy-induced mutagenesis. Exposure to platinum therapies coincided with signatures SBS31 and DSB5 and, when combined with DNA synthesis inhibitors, signature SBS17b. Alterations in ESR1, EGFR, CTNNB1, FGFR1, VEGFA and DPYD were consistent with drug resistance and sensitivity. Recurrent noncoding events were found in regulatory region hotspots of genes including TERT, PLEKHS1, AP2A1 and ADGRG6. Mutation burden and immune signatures corresponded with overall survival and response to immunotherapy. Our data offer a rich resource for investigation of advanced cancers and interpretation of whole-genome and transcriptome sequencing in the context of a cancer clinic.
Cell Reports, 2019
Extra-cranial malignant rhabdoid tumors (MRTs) and cranial atypical teratoid RTs (ATRTs) are heterogeneous pediatric cancers driven primarily by SMARCB1 loss. To understand the genome-wide molecular relationships between MRTs and ATRTs, we analyze multi-omics data from 140 MRTs and 161 ATRTs. We detect similarities between the MYC subgroup of ATRTs (ATRT-MYC) and extra-cranial MRTs, including global DNA hypomethylation and overexpression of HOX genes and genes involved in mesenchymal development, distinguishing them from other ATRT subgroups that express neural-like features. We identify five DNA methylation subgroups associated with anatomical sites and SMARCB1 mutation patterns. Groups 1, 3, and 4 exhibit cytotoxic T cell infiltration and expression of immune checkpoint regulators, consistent with a potential role for immunotherapy in rhabdoid tumor patients.
Proceedings of the National Academy of Sciences of the United States of America, 2019
Glioblastoma multiforme (GBM) is the most deadly brain tumor, and currently lacks effective treatment options. Brain tumor-initiating cells (BTICs) and orthotopic xenografts are widely used in investigating GBM biology and new therapies for this aggressive disease. However, the genomic characteristics and molecular resemblance of these models to GBM tumors remain undetermined. We used massively parallel sequencing technology to decode the genomes and transcriptomes of BTICs and xenografts and their matched tumors in order to delineate the potential impacts of the distinct growth environments. Using data generated from whole-genome sequencing of 201 samples and RNA sequencing of 118 samples, we show that BTICs and xenografts resemble their parental tumor at the genomic level but differ at the mRNA expression and epigenomic levels, likely due to the different growth environment for each sample type. These findings suggest that a comprehensive genomic understanding of in vitro and in vivo GBM model systems is crucial for interpreting data from drug screens, and can help control for biases introduced by cell-culture conditions and the microenvironment in mouse models. We also found that lack of expression in pretreated GBM is linked to hypermutation, which in turn contributes to increased genomic heterogeneity and requires new strategies for GBM treatment.
Cold Spring Harbor Molecular Case Studies, 2019
Effective management of brain and spine tumors relies on a multidisciplinary approach encompassing surgery, radiation, and systemic therapy. In the era of personalized oncology, the latter is complemented by various molecularly targeting agents. Precise identification of cellular targets for these drugs requires comprehensive profiling of the cancer genome coupled with an efficient analytic pipeline, leading to an informed decision on drug selection, prognosis, and confirmation of the original pathological diagnosis. Acquisition of optimal tumor tissue for such analysis is paramount and often presents logistical challenges in neurosurgery. Here, we describe the experience and results of the Personalized OncoGenomics (POG) program with a focus on tumors of the central nervous system (CNS). Patients with recurrent CNS tumors were consented and enrolled into the POG program prior to accrual of tumor and matched blood followed by whole-genome and transcriptome sequencing and processing through the POG bioinformatic pipeline. Sixteen patients were enrolled into POG. In each case, POG analyses identified genomic drivers including novel oncogenic fusions, aberrant pathways, and putative therapeutic targets. POG has highlighted that personalized oncology is truly a multidisciplinary field, one in which neurosurgeons must play a vital role if these programs are to succeed and benefit our patients.
The analysis of cell-free circulating tumor DNA (ctDNA) is potentially a less invasive, more dynamic assessment of cancer progression and treatment response than characterizing solid tumor biopsies. Standard isolation methods require separation of plasma by centrifugation, a time-consuming step that complicates automation. To address these limitations, we present an automatable magnetic bead-based ctDNA isolation method that eliminates centrifugation to purify ctDNA directly from peripheral blood (PB). To develop and test our method, ctDNA from cancer patients was purified from PB and plasma. We found that allelic fractions of somatic single-nucleotide variants from target gene capture libraries were comparable, indicating that the PB ctDNA purification method may be a suitable replacement for the plasma-based protocols currently in use.
PLoS ONE, 2019
Next-generation RNA sequencing (RNA-seq) is a flexible approach with applications that include global quantification of transcript expression, characterization of RNA structure such as splicing patterns, and profiling of expressed mutations. Many RNA-seq protocols require up to microgram quantities of total RNA input to generate high-quality data, and thus remain impractical for the limited starting material typically obtained from rare cell populations, such as those from early developmental stages or from laser micro-dissected clinical samples. Here, we present an assessment of the contemporary ribosomal RNA depletion-based protocols, and identify those that are suitable for inputs as low as 1-10 ng of intact total RNA and 100-500 ng of partially degraded RNA from formalin-fixed paraffin-embedded tissues.
Cold Spring Harbor Molecular Case Studies, 2018
Thyroid-like follicular renal cell carcinoma (TLFRCC) is a rare cancer with few reports of metastatic disease. Little is known regarding genomic characteristics and therapeutic targets. We present the clinical, pathologic, genomic, and transcriptomic analyses of a case of a 27-yr-old male with TLFRCC who presented initially with bone metastases of unknown primary. Genomic DNA from peripheral blood and metastatic tumor samples were sequenced. A transcriptome of 280 million sequence reads was generated from the same tumor sample. Tumor somatic expression profiles were analyzed to detect aberrant expression. Genomic and transcriptomic data sets were integrated to reveal dysregulation in pathways and identify potential therapeutic targets. Integrative genomic analysis with The Cancer Genome Atlas (TCGA) data set revealed the following outliers in gene expression profiles: (81st percentile), (99th percentile), (100th percentile), and (99th and 100th percentiles, respectively), and (86th percentile). The patient received first-line sunitinib to target PDGFRA and PDGFRB and had stable disease for >6 mo, followed by nivolumab upon progression. To the authors' knowledge, this is the first reported case of comprehensive somatic genomic analyses in a patient with metastatic TLFRCC. Somatic analyses provided molecular confirmation of the primary site of cancer and potential therapeutic strategies in a rare disease with little evidence of efficacy on systemic therapy.
Mixed phenotype acute leukaemia (MPAL) is a high-risk subtype of leukaemia with myeloid and lymphoid features, limited genetic characterization, and a lack of consensus regarding appropriate therapy. Here we show that the two principal subtypes of MPAL, T/myeloid (T/M) and B/myeloid (B/M), are genetically distinct. Rearrangement of ZNF384 is common in B/M MPAL, and biallelic WT1 alterations are common in T/M MPAL, which shares genomic features with early T-cell precursor acute lymphoblastic leukaemia. We show that the intratumoral immunophenotypic heterogeneity characteristic of MPAL is independent of somatic genetic variation, that founding lesions arise in primitive haematopoietic progenitors, and that individual phenotypic subpopulations can reconstitute the immunophenotypic diversity in vivo. These findings indicate that the cell of origin and founding lesions, rather than an accumulation of distinct genomic alterations, prime tumour cells for lineage promiscuity. Moreover, these findings position MPAL in the spectrum of immature leukaemias and provide a genetically informed framework for future clinical trials of potential treatments for MPAL.
Journal of Clinical Oncology, 2017
Purpose: Children with acute myeloid leukemia (AML) whose disease is refractory to standard induction chemotherapy or who experience relapse after initial response have dismal outcomes. We sought to comprehensively profile pediatric AML microRNA (miRNA) samples to identify dysregulated genes and assess the utility of miRNAs for improved outcome prediction. Patients and Methods: To identify miRNA biomarkers that are associated with treatment failure, we performed a comprehensive sequence-based characterization of the pediatric AML miRNA landscape. miRNA sequencing was performed on 1,362 samples: 1,303 primary, 22 refractory, and 37 relapse samples. One hundred sixty-four matched samples (127 primary and 37 relapse samples) were analyzed by using RNA sequencing. Results: By using penalized lasso Cox proportional hazards regression, we identified 36 miRNAs whose expression levels at diagnosis were highly associated with event-free survival. Combined expression of the 36 miRNAs was used to create a novel miRNA-based risk classification scheme (AMLmiR36). This new miRNA-based risk classifier identifies those patients who are at high risk (hazard ratio, 2.830; P ≤ .001) or low risk (hazard ratio, 0.323; P ≤ .001) of experiencing treatment failure, independent of conventional karyotype or mutation status. The performance of AMLmiR36 was independently assessed by using 878 patients from two different clinical trials (AAML0531 and AAML1031). Our analysis also revealed that miR-106a-363 was abundantly expressed in relapse and refractory samples, and several candidate targets of miR-106a-5p were involved in oxidative phosphorylation, a process that is suppressed in treatment-resistant leukemic cells. Conclusion: To assess the utility of miRNAs for outcome prediction in patients with pediatric AML, we designed and validated a miRNA-based risk classification scheme. We also hypothesized that the abundant expression of miR-106a could increase treatment resistance via modulation of genes that are involved in oxidative phosphorylation.
We report a comprehensive analysis of 412 muscle-invasive bladder cancers characterized by multiple TCGA analytical platforms. Fifty-eight genes were significantly mutated, and the overall mutational load was associated with APOBEC-signature mutagenesis. Clustering by mutation signature identified a high-mutation subset with 75% 5-year survival. mRNA expression clustering refined prior clustering analyses and identified a poor-survival "neuronal" subtype in which the majority of tumors lacked small cell or neuroendocrine histology. Clustering by mRNA, long non-coding RNA (lncRNA), and miRNA expression converged to identify subsets with differential epithelial-mesenchymal transition status, carcinoma in situ scores, histologic features, and survival. Our analyses identified 5 expression subtypes that may stratify response to different treatments.
Nature Genetics, 2017
Spatial heterogeneity of transcriptional and genetic markers between physically isolated biopsies of a single tumor poses major barriers to the identification of biomarkers and the development of targeted therapies that will be effective against the entire tumor. We analyzed the spatial heterogeneity of multiregional biopsies from 35 patients, using a combination of transcriptomic and genomic profiles. Medulloblastomas (MBs), but not high-grade gliomas (HGGs), demonstrated spatially homogeneous transcriptomes, which allowed for accurate subgrouping of tumors from a single biopsy. Conversely, somatic mutations that affect genes suitable for targeted therapeutics demonstrated high levels of spatial heterogeneity in MB, malignant glioma, and renal cell carcinoma (RCC). Actionable targets found in a single MB biopsy were seldom clonal across the entire tumor, which brings the efficacy of monotherapies against a single target into question. Clinical trials of targeted therapies for MB should first ensure the spatially ubiquitous nature of the target mutation.
Cancer Cell, 2016
Malignant rhabdoid tumors (MRTs) are rare lethal tumors of childhood that most commonly occur in the kidney and brain. MRTs are driven by SMARCB1 loss, but the molecular consequences of SMARCB1 loss in extra-cranial tumors have not been comprehensively described and genomic resources for analyses of extra-cranial MRT are limited. To provide such data, we used whole-genome sequencing, whole-genome bisulfite sequencing, whole transcriptome (RNA-seq) and microRNA sequencing (miRNA-seq), and histone modification profiling to characterize extra-cranial MRTs. Our analyses revealed gene expression and methylation subgroups and focused on dysregulated pathways, including those involved in neural crest development.
Primary triple-negative breast cancers (TNBCs), a tumour type defined by lack of oestrogen receptor, progesterone receptor and ERBB2 gene amplification, represent approximately 16% of all breast cancers. Here we show in 104 TNBC cases that at the time of diagnosis these cancers exhibit a wide and continuous spectrum of genomic evolution, with some having only a handful of coding somatic aberrations in a few pathways, whereas others contain hundreds of coding somatic mutations. High-throughput RNA sequencing (RNA-seq) revealed that only approximately 36% of mutations are expressed. Using deep re-sequencing measurements of allelic abundance for 2,414 somatic mutations, we determine for the first time, to our knowledge, in an epithelial tumour subtype, the relative abundance of clonal frequencies among cases representative of the population. We show that TNBCs vary widely in their clonal frequencies at the time of diagnosis, with the basal subtype of TNBC showing more variation than non-basal TNBC. Although p53 (also known as TP53), PIK3CA and PTEN somatic mutations seem to be clonally dominant compared to other genes, in some tumours their clonal frequencies are incompatible with founder status. Mutations in cytoskeletal, cell shape and motility proteins occurred at lower clonal frequencies, suggesting that they occurred later during tumour progression. Taken together, our results show that understanding the biology and therapeutic responses of patients with TNBC will require the determination of individual tumour clonal genotypes.
Genome Biology, 2010
Adenocarcinomas of the tongue are rare and represent the minority (20 to 25%) of salivary gland tumors affecting the tongue. We investigated the utility of massively parallel sequencing to characterize an adenocarcinoma of the tongue, before and after treatment.
Science, 2003
We sequenced the 29,751-base genome of the severe acute respiratory syndrome (SARS)-associated coronavirus known as the Tor2 isolate. The genome sequence reveals that this coronavirus is only moderately related to other known coronaviruses, including two human coronaviruses, HCoV-OC43 and HCoV-229E. Phylogenetic analysis of the predicted viral proteins indicates that the virus does not closely resemble any of the three previously known groups of coronaviruses. The genome sequence will aid in the diagnosis of SARS virus infection in humans and potential animal hosts (using polymerase chain reaction and immunological tests), in the development of antivirals (including neutralizing antibodies), and in the identification of putative epitopes for vaccine development.