Imprinting is a critical part of normal embryonic development in mammals, controlled by defined parent-of-origin (PofO) differentially methylated regions (DMRs) known as imprinting control regions. Direct nanopore sequencing of DNA provides a means to detect allelic methylation and to overcome the drawbacks of methylation array and short-read technologies. Here, we used publicly available nanopore sequencing data for 12 standard B-lymphocyte cell lines to acquire the genome-wide mapping of imprinted intervals in humans. Using the sequencing data, we were able to phase 95% of the human methylome and detect 94% of the previously well-characterized, imprinted DMRs. In addition, we found 42 novel imprinted DMRs (16 germline and 26 somatic), which were confirmed using whole-genome bisulfite sequencing (WGBS) data. Analysis of WGBS data in mouse (Mus musculus), rhesus monkey (Macaca mulatta), and chimpanzee (Pan troglodytes) suggested that 17 of these imprinted DMRs are conserved. Some of the novel imprinted intervals are within or close to imprinted genes without a known DMR. We also detected subtle parental methylation bias, spanning several kilobases at seven known imprinted clusters. At these blocks, hypermethylation occurs at the gene body of expressed allele(s) with mutually exclusive H3K36me3 and H3K27me3 allelic histone marks. These results expand upon our current knowledge of imprinting and the potential of nanopore sequencing to identify imprinting regions using only parent-offspring trios, as opposed to the large multi-generational pedigrees that have previously been required.
As guidelines, therapies and literature on cancer variants expand, the lack of consensus variant interpretations impedes clinical applications. CIViC is a public-domain, crowd-sourced and adaptable knowledgebase of evidence for the clinical interpretation of variants in cancer, designed to reduce barriers to knowledge sharing and alleviate the variant-interpretation bottleneck.
Cold Spring Harbor Molecular Case Studies
Adrenocortical cancer (ACC) is a rare cancer of the adrenal gland. Several driver mutations have been identified in both primary and metastatic ACCs, but the therapeutic options are still limited. We performed whole-genome and transcriptome sequencing on seven patients with metastatic ACC. Integrative analysis of mutations, RNA expression changes, mutation signature, and homologous recombination deficiency (HRD) analysis was performed. Mutations affecting CTNNB1 and TP53 and frequent loss of heterozygosity (LOH) events were observed in our cohort. Alterations affecting genes involved in cell cycle (RB1, CDKN2A, CDKN2B), DNA repair pathways (MUTYH, BRCA2, ATM, RAD52, MLH1, MSH6), and telomere maintenance (TERF2 and TERT) consisting of somatic and germline mutations, structural variants, and expression outliers were also observed. HRDetect, which aggregates six HRD-associated mutation signatures, identified a subset of cases as HRD. Genomic alterations affecting genes involved in epigenetic regulation were also identified, including structural variants (SWI/SNF genes and histone methyltransferases), and copy gains and concurrent high expression of KDM5A, which may contribute to epigenomic deregulation. Findings from this study highlight HRD and epigenomic pathways as potential therapeutic targets and suggest a subgroup of patients may benefit from a diverse array of molecularly targeted therapies in ACC, a rare disease in urgent need of therapeutic strategies.
Briefings in Bioinformatics
Survival analysis is a technique for identifying prognostic biomarkers and genetic vulnerabilities in cancer studies. Large-scale consortium-based projects have profiled >11 000 adult and >4000 pediatric tumor cases with clinical outcomes and multiomics approaches. This provides a resource for investigating molecular-level cancer etiologies using clinical correlations. Although cancers often arise from multiple genetic vulnerabilities and have deregulated gene sets (GSs), existing survival analysis protocols can report only on individual genes. Additionally, there is no systematic method to connect clinical outcomes with experimental (cell line) data. To address these gaps, we developed cSurvival (https://tau.cmmt.ubc.ca/cSurvival). cSurvival provides a user-adjustable analytical pipeline with a curated, integrated database and offers three main advances: (i) joint analysis with two genomic predictors to identify interacting biomarkers, including new algorithms to identify optimal cutoffs for two continuous predictors; (ii) survival analysis not only at the gene, but also the GS level; and (iii) integration of clinical and experimental cell line studies to generate synergistic biological insights. To demonstrate these advances, we report three case studies. We confirmed findings of autophagy-dependent survival in colorectal cancers and of synergistic negative effects between high expression of SLC7A11 and SLC2A1 on outcomes in several cancers. We further used cSurvival to identify high expression of the Nrf2-antioxidant response element pathway as a main indicator for lung cancer prognosis and for cellular resistance to oxidative stress-inducing drugs. Altogether, these analyses demonstrate cSurvival's ability to support biomarker prognosis and interaction analysis via gene- and GS-level approaches and to integrate clinical and experimental biomedical studies.
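The abstract above does not say which estimator cSurvival uses internally. As a minimal sketch of the basic building block of most survival analyses, assuming right-censored data, the Kaplan-Meier product-limit estimate can be computed as below. The function name and the toy follow-up times are hypothetical and are not taken from cSurvival.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  -- follow-up time for each subject
    events -- 1/True if the event was observed, 0/False if censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # events plus censorings at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Everyone who had an event or was censored at t leaves the risk set.
        at_risk -= leaving[t]
    return curve


# Hypothetical toy cohort: five subjects, two of them censored.
toy_times = [2, 3, 3, 5, 8]
toy_events = [1, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
```

Comparing two such curves (for example, samples above and below an expression cutoff) is the usual next step, typically with a log-rank test from an established statistics package.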
The Journal of Pathology Clinical Research
In this study, we evaluate the impact of whole genome and transcriptome analysis (WGTA) on predictive molecular profiling and histologic diagnosis in a cohort of advanced malignancies. WGTA was used to generate reports including molecular alterations and site/tissue of origin prediction. Two reviewers analyzed genomic reports, clinical history, and tumor pathology. We used National Comprehensive Cancer Network (NCCN) consensus guidelines, Food and Drug Administration (FDA) approvals, and provincially reimbursed treatments to define genomic biomarkers associated with approved targeted therapeutic options (TTOs). Tumor tissue/site of origin was reassessed for most cases using genomic analysis, including a machine learning algorithm (Supervised Cancer Origin Prediction Using Expression [SCOPE]) trained on The Cancer Genome Atlas data. WGTA was performed on 652 cases, including a range of primary tumor types/tumor sites and 15 malignant tumors of uncertain histogenesis (MTUH). At the time WGTA was performed, alterations associated with an approved TTO were identified in 39 (6%) cases; 3 of these were not identified through routine pathology workup. In seven (1%) cases, the pathology workup either failed, was not performed, or gave a different result from the WGTA. Approved TTOs identified by WGTA increased to 103 (16%) when applying 2021 guidelines. The histopathologic diagnosis was reviewed in 389 cases and agreed with the diagnostic consensus after WGTA in 94% of non-MTUH cases (n = 374). The remainder included situations where the morphologic diagnosis was changed based on WGTA and clinical data (0.5%), or where the WGTA was non-contributory (5%). The 15 MTUH were all diagnosed as specific tumor types by WGTA. Tumor board reviews including WGTA agreed with almost all initial predictive molecular profile and histopathologic diagnoses. WGTA was a powerful tool to assign site/tissue of origin in MTUH. 
Current efforts focus on improving therapeutic predictive power and decreasing cost to enhance use of WGTA data as a routine clinical test.
Autism spectrum disorder (ASD) describes a complex and heterogeneous group of neurodevelopmental disorders. Whole genome sequencing continues to shed light on the multifactorial etiology of ASD. Dysregulated transcriptional pathways have been implicated in neurodevelopmental disorders. Emerging evidence suggests that de novo POLR2A variants cause a newly described phenotype called 'Neurodevelopmental Disorder with Hypotonia and Variable Intellectual and Behavioral Abnormalities' (NEDHIB). The variable phenotype manifests with a spectrum of features, primarily early onset hypotonia and delay in developmental milestones. In this study, we investigate a patient with complex ASD involving epilepsy and strabismus. Whole genome sequencing of the proband-parent trio uncovered a novel de novo POLR2A variant (c.1367T>C, p.Val456Ala) in the proband. The variant appears deleterious according to in silico tools. We describe the phenotype in our patient, who is now 31 years old, draw connections between the previously reported phenotypes and further delineate this emerging neurodevelopmental phenotype. This study offers new insights into this neurodevelopmental disorder and, more broadly, the genetic etiology of ASD.
American Journal of Medical Genetics Part A
Microphthalmia, anophthalmia, and coloboma (MAC) are a heterogeneous spectrum of anomalous eye development and degeneration with genetic and environmental etiologies. Structural and copy number variants of chromosome 13 have been implicated in MAC; however, the specific loci involved in disease pathogenesis have not been well-defined. Herein we report a newborn with syndromic degenerative anophthalmia and a complex de novo rearrangement of chromosome 13q. Long-read genome sequencing improved the resolution and clinical interpretation of a duplication-triplication/inversion-duplication (DUP-TRP/INV-DUP) and terminal deletion. Sequence features at the breakpoint junctions suggested microhomology-mediated break-induced replication (MMBIR) of the maternal chromosome as the origin. Comparing this rearrangement to previously reported copy number alterations in 13q, we refine a putative dosage-sensitive critical region for MAC that might provide new insights into its molecular etiology.
Proceedings of the National Academy of Sciences of the United States of America
Pink salmon (Oncorhynchus gorbuscha) adults are the smallest of the five Pacific salmon native to the western Pacific Ocean. Pink salmon are also the most abundant of these species and account for a large proportion of the commercial value of the salmon fishery worldwide. A two-year life history of pink salmon generates temporally isolated populations that spawn either in even-years or odd-years. To uncover the influence of this genetic isolation, reference genome assemblies were generated for each year-class and whole genome re-sequencing data was collected from salmon of both year-classes. The salmon were sampled from six Canadian rivers and one Japanese river. At multiple centromeres we identified peaks of Fst between year-classes that were millions of base-pairs long. The largest Fst peak was also associated with a million base-pair chromosomal polymorphism found in the odd-year genome near a centromere. These Fst peaks may be the result of a centromere drive or a combination of reduced recombination and genetic drift, and they could influence speciation. Other regions of the genome influenced by odd-year and even-year temporal isolation and tentatively under selection were mostly associated with genes related to immune function, organ development/maintenance, and behaviour.
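The Fst peaks described above summarize how strongly allele frequencies differ between the two year-classes. The abstract does not state which estimator was used. As a minimal sketch, assuming a single biallelic site and equal weighting of the two populations, Wright's Fst = (HT - HS) / HT can be computed as below. The function name and the example frequencies are hypothetical.

```python
def wright_fst(p1, p2):
    """Wright's Fst for one biallelic site in two equally weighted populations.

    p1, p2 -- frequency of the same allele in each population (0..1)
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2
    # Expected heterozygosity of the pooled population.
    h_total = 2 * p_bar * (1 - p_bar)
    # Mean expected heterozygosity within each population.
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_total == 0:
        # Site is monomorphic overall, so there is no differentiation to measure.
        return 0.0
    return (h_total - h_within) / h_total


# Hypothetical frequencies: identical, strongly diverged, and fixed differences.
fst_same = wright_fst(0.5, 0.5)
fst_diverged = wright_fst(0.9, 0.1)
fst_fixed = wright_fst(1.0, 0.0)
```

Genome scans usually average such per-site values over sliding windows and use estimators that correct for sample size; this sketch omits both.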
Journal of Community Genetics
Genomic research is driving discovery for future population benefit. Limited evidence exists on immediate patient and health system impacts of research participation. This study uses real-world data and quasi-experimental matching to examine early-stage cost and health impacts of research-based genomic sequencing. British Columbia’s Personalized OncoGenomics (POG) single-arm program applies whole genome and transcriptome analysis (WGTA) to characterize genomic landscapes in advanced cancers. Our cohort includes POG patients enrolled between 2014 and 2015 and 1:1 genetic algorithm–matched usual care controls. We undertake a cost consequence analysis and estimate 1-year effects of WGTA on patient management, patient survival, and health system costs reported in 2015 Canadian dollars. WGTA costs are imputed and forecast using system of equations modeling. We use Kaplan-Meier survival analysis to explore survival differences and inverse probability of censoring weighted linear regression to estimate mean 1-year survival times and costs. Non-parametric bootstrapping simulates sampling distributions and enables scenario analysis, revealing drivers of incremental costs, survival, and net monetary benefit for assumed willingness to pay thresholds. We identified 230 POG patients and 230 matched controls for cohort inclusion. The mean period cost of research-funded WGTA was $26,211 (SD: $14,191). Sequencing costs declined rapidly, with WGTA forecasts hitting $13,741 in 2021. The incremental healthcare system effect (non-research expenditures) was $5203 (95% CI: 75, 10,424) compared to usual care. No overall survival differences were observed, but outcome heterogeneity was present. POG patients receiving WGTA-informed treatment experienced incremental survival gains of 2.49 months (95% CI: 1.32, 3.64). Future cost consequences became favorable as WGTA cost drivers declined and WGTA-informed treatment rates improved to 60%.
Our study demonstrates the ability of real-world data to support evaluations of only-in-research health technologies. We identify situations where precision oncology research initiatives may produce survival benefit at a cost that is within healthcare systems’ willingness to pay. This economic evidence informs the early-stage healthcare impacts of precision oncology research.
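To illustrate how non-parametric bootstrapping can yield an interval for incremental net monetary benefit (NMB = willingness to pay × incremental effect − incremental cost), here is a minimal sketch. It assumes patient-level (cost, effect) pairs and resamples each arm independently, which ignores the 1:1 matching and the censoring adjustment used in the study. All names and all numbers are hypothetical and are not the study's data.

```python
import random
from statistics import mean


def incremental_nmb(treated, control, wtp):
    """Incremental net monetary benefit from lists of (cost, effect) pairs."""
    d_cost = mean(c for c, _ in treated) - mean(c for c, _ in control)
    d_effect = mean(e for _, e in treated) - mean(e for _, e in control)
    return wtp * d_effect - d_cost


def bootstrap_ci(treated, control, wtp, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the incremental NMB."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    stats = []
    for _ in range(n_boot):
        # Resample patients with replacement within each arm.
        t = rng.choices(treated, k=len(treated))
        c = rng.choices(control, k=len(control))
        stats.append(incremental_nmb(t, c, wtp))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical toy data: (1-year cost in dollars, survival in years).
toy_treated = [(30000, 1.0), (28000, 0.8), (35000, 0.9), (26000, 0.7)]
toy_control = [(20000, 0.7), (22000, 0.6), (18000, 0.8), (21000, 0.5)]
toy_wtp = 100000  # assumed willingness to pay per life-year

point_estimate = incremental_nmb(toy_treated, toy_control, toy_wtp)
ci_lower, ci_upper = bootstrap_ci(toy_treated, toy_control, toy_wtp)
```

Repeating the calculation over a range of willingness-to-pay values is one simple way to run the kind of threshold scenario analysis the abstract mentions.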
American Journal of Medical Genetics Part A
Monoallelic pathogenic variants in BICD2 are associated with autosomal dominant Spinal Muscular Atrophy Lower Extremity Predominant 2A and 2B (SMALED2A, SMALED2B). As part of the cellular vesicular transport complex, BICD2 facilitates the flow of constitutive secretory cargoes from the trans-Golgi network, and its dysfunction results in motor neuron loss. The reported phenotypes among patients with SMALED2A and SMALED2B range from a congenital onset disorder of respiratory insufficiency, arthrogryposis, and proximal or distal limb weakness to an adult-onset disorder of limb weakness and contractures. We report an infant with congenital respiratory insufficiency requiring mechanical ventilation, congenital diaphragmatic paralysis, decreased lung volume, and single finger camptodactyly. The infant displayed appropriate antigravity limb movements but had radiological, electrophysiological, and histopathological evidence of myopathy. Exome sequencing and long-read whole-genome sequencing detected a novel de novo BICD2 variant (NM_001003800.1:c.[1543G>A];[=]). This is predicted to encode p.(Glu515Lys); p.Glu515 is located in the coiled-coil 2 mutation hotspot. We hypothesize that this novel phenotype of diaphragmatic paralysis without clear appendicular muscle weakness and contractures of large joints is a presentation of BICD2-related disease.
The Global Alliance for Genomics and Health (GA4GH) aims to accelerate biomedical advances by enabling the responsible sharing of clinical and genomic data through both harmonized data aggregation and federated approaches. The decreasing cost of genomic sequencing (along with other genome-wide molecular assays) and increasing evidence of its clinical utility will soon drive the generation of sequence data from tens of millions of humans, with increasing levels of diversity. In this perspective, we present the GA4GH strategies for addressing the major challenges of this data revolution. We describe the GA4GH organization, which is fueled by the development efforts of eight Work Streams and informed by the needs of 24 Driver Projects and other key stakeholders. We present the GA4GH suite of secure, interoperable technical standards and policy frameworks and review the current status of standards, their relevance to key domains of research and clinical care, and future plans of GA4GH. Broad international participation in building, adopting, and deploying GA4GH standards and frameworks will catalyze an unprecedented effort in data sharing that will be critical to advancing genomic medicine and ensuring that all populations can access its benefits.
Microbiology Resource Announcements
The draft genome sequence of Bacidia gigantensis, a lichenized fungus in the order Lecanorales, was sequenced directly from a herbarium specimen collected from the type locality at Sleeping Giant Provincial Park in Ontario, Canada. Using long-read sequencing on the Oxford Nanopore PromethION platform, we assembled a nearly complete genome sequence.
Journal of Virological Methods
The COVID-19 pandemic has highlighted the need for generic reagents and flexible systems in diagnostic testing. Magnetic bead-based nucleic acid extraction protocols using 96-well plates on open liquid handlers are readily amenable to meet this need. Here, one such approach is rigorously optimized to minimize cross-well contamination while maintaining sensitivity.
Autism Spectrum Disorder (ASD) is the most common neurodevelopmental disorder in children and shows high heritability. However, how inherited variants contribute to ASD in multiplex families remains unclear. Using whole-genome sequencing (WGS) in a family with three affected children, we identified multiple inherited DNA variants in ASD-associated genes and pathways (RELN, SHANK2, DLG1, SCN10A, KMT2C and ASH1L). All are shared among the three children, except ASH1L, which is only present in the most severely affected child. The compound heterozygous variants in RELN, and the maternally inherited variant in SHANK2, are considered to be major risk factors for ASD in this family. Both genes are involved in neuron activities, including synaptic functions and the GABAergic neurotransmission system, which are highly associated with ASD pathogenesis. DLG1 is also involved in synapse functions, and KMT2C and ASH1L are involved in chromatin organization. Our data suggest that multiple inherited rare variants, each with a subthreshold and/or variable effect, may converge to certain pathways and contribute quantitatively and additively, or alternatively act via a 2nd-hit or multiple-hits to render pathogenicity of ASD in this family. Additionally, this multiple-hits model further supports the quantitative trait hypothesis of a complex genetic, multifactorial etiology for the development of ASDs.
American Journal of Medical Genetics, 2021
Prenatal detection of structural variants of uncertain significance, including copy number variants (CNV), challenges genetic counseling and creates ambiguity for expectant parents. In Duchenne muscular dystrophy, variant classification and phenotypic severity of CNVs are currently assessed by familial segregation, prediction of the effect on the reading frame, and precedent data. Delineation of pathogenicity by familial segregation is limited by time and the availability of suitable family members, whereas analytical tools can rapidly delineate potential consequences of variants. We identified a duplication of uncertain significance encompassing a portion of the dystrophin gene (DMD) in an unaffected mother and her male fetus. Using long-read whole genome sequencing and alignment of short reads, we rapidly defined the precise breakpoints of this variant in DMD and could provide timely counseling. The benign nature of the variant was substantiated, more slowly, by familial segregation to a healthy maternal uncle. We find that long-read whole genome sequencing has clinical utility in a prenatal setting for accurate and rapid characterization of structural variants, specifically a duplication involving DMD.
The Journal of Clinical Investigation
A recent report found that rare predicted loss-of-function (pLOF) variants across 13 candidate genes in TLR3- and IRF7-dependent type I IFN pathways explain up to 3.5% of severe COVID-19 cases. We performed whole-exome or whole-genome sequencing of 1,864 COVID-19 cases (713 with severe and 1,151 with mild disease) and 15,033 ancestry-matched population controls across 4 independent COVID-19 biobanks. We tested whether rare pLOF variants in these 13 genes were associated with severe COVID-19. We identified only 1 rare pLOF mutation across these genes among 713 cases with severe COVID-19 and observed no enrichment of pLOFs in severe cases compared to population controls or mild COVID-19 cases. We found no evidence of association of rare LOF variants in the 13 candidate genes with severe COVID-19 outcomes.
Imagining ways to prevent or treat glioblastoma (GBM) has been hindered by a lack of understanding of its pathogenesis. Although overexpression of platelet derived growth factor with two A-chains (PDGF-AA) may be an early event, critical details of the core biology of GBM are lacking. For example, existing PDGF-driven models replicate its microscopic appearance, but not its genomic architecture. Here we report a model that overcomes this barrier to authenticity.
Using a method developed to establish neural stem cell cultures, we investigated the effects of PDGF-AA on subventricular zone (SVZ) cells, one of the putative cells of origin of GBM. We microdissected SVZ tissue from p53-null and wild-type adult mice, cultured cells in media supplemented with PDGF-AA, and assessed cell viability, proliferation, genome stability, and tumorigenicity.
Counterintuitive to its canonical role as a growth factor, we observed abrupt and massive cell death in PDGF-AA: wild-type cells did not survive, whereas a small fraction of null cells evaded apoptosis. Surviving null cells displayed attenuated proliferation accompanied by whole chromosome gains and losses. After approximately 100 days in PDGF-AA, cells suddenly proliferated rapidly, acquired growth factor independence, and became tumorigenic in immune-competent mice. Transformed cells had an oligodendrocyte precursor-like lineage marker profile, were resistant to platelet derived growth factor receptor alpha inhibition, and harbored highly abnormal karyotypes similar to human GBM.
This model associates genome instability in neural progenitor cells with chronic exposure to PDGF-AA and is the first to approximate the genomic landscape of human GBM and the first in which the earliest phases of the disease can be studied directly.
Clinical cancer research : an official journal of the American Association for Cancer Research, 2019
Gene fusions involving neuregulin 1 (NRG1) have been noted in multiple cancer types and have potential therapeutic implications. Although varying results have been reported in other cancer types, the efficacy of the HER-family kinase inhibitor afatinib in the treatment of fusion-positive pancreatic ductal adenocarcinoma is not fully understood.
Cold Spring Harbor molecular case studies, 2019
Pancreatic neuroendocrine neoplasms (PanNENs) represent a minority of pancreatic neoplasms that exhibit variability in prognosis. Ongoing mutational analyses of PanNENs have found recurrent abnormalities in chromatin remodeling genes (e.g., and ), and mTOR pathway genes (e.g., , , and ), some of which have relevance to patients with related familial syndromes. Most recently, grade 3 PanNENs have been divided into two groups based on differentiation, creating a new group of well-differentiated grade 3 neuroendocrine tumors (PanNETs) that have had a limited whole-genome level characterization to date. In a patient with a metastatic well-differentiated grade 3 PanNET, our study utilized whole-genome sequencing of liver metastases for the comparative analysis and detection of single-nucleotide variants, insertions and deletions, structural variants, and copy-number variants, with their biologic relevance confirmed by RNA sequencing. We found that this tumor most notably exhibited a -disrupting fusion, showed a novel fusion, and lacked any somatic variants in , , and .
JAMA network open, 2019
A molecular diagnostic method that incorporates information about the transcriptional status of all genes across multiple tissue types can strengthen confidence in cancer diagnosis.
Nucleic acids research, 2019
Tissues used in pathology laboratories are typically stored in the form of formalin-fixed, paraffin-embedded (FFPE) samples. One important consideration in repurposing FFPE material for next generation sequencing (NGS) analysis is the sequencing artifacts that can arise from the significant damage to nucleic acids due to treatment with formalin, storage at room temperature and extraction. One such class of artifacts consists of chimeric reads that appear to be derived from non-contiguous portions of the genome. Here, we show that a major proportion of such chimeric reads align to both the 'Watson' and 'Crick' strands of the reference genome. We refer to these as strand-split artifact reads (SSARs). This study provides a conceptual framework for the mechanistic basis of the genesis of SSARs and other chimeric artifacts along with supporting experimental evidence, which have led to approaches to reduce the levels of such artifacts. We demonstrate that one of these approaches, involving S1 nuclease-mediated removal of single-stranded fragments and overhangs, also reduces sequence bias, base error rates, and false positive detection of copy number and single nucleotide variants. Finally, we describe an analytical approach for quantifying SSARs from NGS data.
The beluga whale is a cetacean that inhabits arctic and subarctic regions, and is the only living member of the genus Delphinapterus. The genome of the beluga whale was determined using DNA sequencing approaches that employed both microfluidic partitioning library and non-partitioned library construction. The former allowed for the construction of a highly contiguous assembly with a scaffold N50 length of over 19 Mbp and total reconstruction of 2.32 Gbp. To aid our understanding of the functional elements, transcriptome data was also derived from brain, duodenum, heart, lung, spleen, and liver tissue. Assembled sequence and all of the underlying sequence data are available at the National Center for Biotechnology Information (NCBI) under the Bioproject accession number PRJNA360851A.
Bioinformatics (Oxford, England), 2013
White spruce (Picea glauca) is a dominant conifer of the boreal forests of North America, and providing genomics resources for this commercially valuable tree will help improve forest management and conservation efforts. However, sequencing and assembling the large and highly repetitive spruce genome pushes the boundaries of current technology. Here, we describe a whole-genome shotgun sequencing strategy using two Illumina sequencing platforms and an assembly approach using the ABySS software. We report a 20.8 gigabase pair draft genome in 4.9 million scaffolds, with a scaffold N50 of 20,356 bp. We demonstrate how recent improvements in sequencing technology, especially increasing read lengths and paired-end reads from longer fragments, have a major impact on assembly contiguity. We also note that scalable bioinformatics tools are instrumental in providing rapid draft assemblies.
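The scaffold N50 quoted above is a standard contiguity measure: the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. A minimal sketch of that calculation follows. The function name and the toy lengths are hypothetical, and this is not code from ABySS.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that pushes the running total to half the
        # assembly size defines the N50.
        if running >= half_total:
            return length


# Hypothetical toy assembly: seven scaffolds totalling 300 bases.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_lengths)  # 80 + 70 = 150 reaches half of 300, so N50 is 70
```

Because N50 depends only on the length distribution, it says nothing about correctness; a misassembled genome can still score a high N50.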
Genome research, 2009
We created a visualization tool called Circos to facilitate the identification and analysis of similarities and differences arising from comparisons of genomes. Our tool is effective in displaying variation in genome structure and, generally, any other kind of positional relationships between genomic intervals. Such data are routinely produced by sequence alignments, hybridization arrays, genome mapping, and genotyping studies. Circos uses a circular ideogram layout to facilitate the display of relationships between pairs of positions by the use of ribbons, which encode the position, size, and orientation of related genomic elements. Circos is capable of displaying data as scatter, line, and histogram plots, heat maps, tiles, connectors, and text. Bitmap or vector images can be created from GFF-style data inputs and hierarchical configuration files, which can be easily generated by automated tools, making Circos suitable for rapid deployment in data analysis and reporting pipelines.