Bioinformatics (Oxford, England), 2013
White spruce (Picea glauca) is a dominant conifer of the boreal forests of North America, and providing genomics resources for this commercially valuable tree will help improve forest management and conservation efforts. Sequencing and assembling the large and highly repetitive spruce genome, however, pushes the boundaries of current technology. Here, we describe a whole-genome shotgun sequencing strategy using two Illumina sequencing platforms and an assembly approach using the ABySS software. We report a 20.8 gigabase-pair (Gbp) draft genome in 4.9 million scaffolds, with a scaffold N50 of 20,356 bp. We demonstrate how recent improvements in sequencing technology, especially increasing read lengths and paired-end reads from longer fragments, have a major impact on assembly contiguity. We also note that scalable bioinformatics tools are instrumental in providing rapid draft assemblies.
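The scaffold N50 quoted above is a standard contiguity statistic: the length L such that scaffolds of length L or longer contain at least half of the assembled bases. The following is a minimal sketch of that calculation, not the code used by the authors or by ABySS; the function name `n50` and the example lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical scaffold lengths in bp (not data from the paper).
scaffold_lengths = [100, 80, 60, 40, 20]
print(n50(scaffold_lengths))
```

In practice the lengths would be read from an assembly FASTA file; the list here only illustrates the arithmetic.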
The beluga whale is a cetacean that inhabits arctic and subarctic regions, and is the only living member of the genus Delphinapterus. The genome of the beluga whale was determined using DNA sequencing approaches that employed both microfluidic-partitioned and non-partitioned library construction. The former allowed for the construction of a highly contiguous assembly with a scaffold N50 length of over 19 Mbp and a total assembled length of 2.32 Gbp. To aid our understanding of the functional elements, transcriptome data were also derived from brain, duodenum, heart, lung, spleen, and liver tissue. The assembled sequence and all of the underlying sequence data are available at the National Center for Biotechnology Information (NCBI) under the Bioproject accession number PRJNA360851A.
Genome biology, 2010
Adenocarcinomas of the tongue are rare and represent a minority (20 to 25%) of salivary gland tumors affecting the tongue. We investigated the utility of massively parallel sequencing to characterize an adenocarcinoma of the tongue, before and after treatment.
Genome research, 2009
We created a visualization tool called Circos to facilitate the identification and analysis of similarities and differences arising from comparisons of genomes. Our tool is effective in displaying variation in genome structure and, generally, any other kind of positional relationships between genomic intervals. Such data are routinely produced by sequence alignments, hybridization arrays, genome mapping, and genotyping studies. Circos uses a circular ideogram layout to facilitate the display of relationships between pairs of positions by the use of ribbons, which encode the position, size, and orientation of related genomic elements. Circos is capable of displaying data as scatter, line, and histogram plots, heat maps, tiles, connectors, and text. Bitmap or vector images can be created from GFF-style data inputs and hierarchical configuration files, which can be easily generated by automated tools, making Circos suitable for rapid deployment in data analysis and reporting pipelines.
Nucleic acids research, 2019
Tissues used in pathology laboratories are typically stored in the form of formalin-fixed, paraffin-embedded (FFPE) samples. One important consideration in repurposing FFPE material for next-generation sequencing (NGS) analysis is the sequencing artifacts that can arise from the significant damage to nucleic acids caused by treatment with formalin, storage at room temperature, and extraction. One such class of artifacts consists of chimeric reads that appear to be derived from non-contiguous portions of the genome. Here, we show that a major proportion of such chimeric reads align to both the 'Watson' and 'Crick' strands of the reference genome. We refer to these as strand-split artifact reads (SSARs). This study provides a conceptual framework for the mechanistic basis of the genesis of SSARs and other chimeric artifacts, along with supporting experimental evidence, which together have led to approaches to reduce the levels of such artifacts. We demonstrate that one of these approaches, involving S1 nuclease-mediated removal of single-stranded fragments and overhangs, also reduces sequence bias, base error rates, and false-positive detection of copy number and single nucleotide variants. Finally, we describe an analytical approach for quantifying SSARs from NGS data.
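As a rough illustration of the definition above, a read can be flagged as strand-split when its aligned segments map to both strands of the reference. The sketch below is an assumption-laden toy, not the authors' analytical approach: the function name `strand_split_fraction` and the input layout (one `(read_id, strand)` tuple per aligned segment of a single read, not of a mate pair) are hypothetical.

```python
from collections import defaultdict


def strand_split_fraction(segments):
    """Fraction of reads whose aligned segments hit both '+' and '-' strands.

    segments: iterable of (read_id, strand) tuples, one per aligned segment
    of a single read (hypothetical input layout; mates of a pair must be
    given distinct read_ids, since mates normally map to opposite strands).
    """
    strands_by_read = defaultdict(set)
    for read_id, strand in segments:
        strands_by_read[read_id].add(strand)
    if not strands_by_read:
        return 0.0
    split = sum(1 for s in strands_by_read.values() if {"+", "-"} <= s)
    return split / len(strands_by_read)


# Hypothetical segments: only read r1 aligns to both strands.
example = [("r1", "+"), ("r1", "-"), ("r2", "+"),
           ("r3", "+"), ("r3", "+"), ("r4", "-")]
print(strand_split_fraction(example))
```

A real pipeline would extract segments from primary and supplementary alignments in a BAM file and would likely apply mapping-quality and distance filters, which this sketch omits.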
Proceedings of the National Academy of Sciences of the United States of America, 2019
Glioblastoma multiforme (GBM) is the most deadly brain tumor, and currently lacks effective treatment options. Brain tumor-initiating cells (BTICs) and orthotopic xenografts are widely used in investigating GBM biology and new therapies for this aggressive disease. However, the genomic characteristics and molecular resemblance of these models to GBM tumors remain undetermined. We used massively parallel sequencing technology to decode the genomes and transcriptomes of BTICs and xenografts and their matched tumors in order to delineate the potential impacts of the distinct growth environments. Using data generated from whole-genome sequencing of 201 samples and RNA sequencing of 118 samples, we show that BTICs and xenografts resemble their parental tumor at the genomic level but differ at the mRNA expression and epigenomic levels, likely due to the different growth environment for each sample type. These findings suggest that a comprehensive genomic understanding of in vitro and in vivo GBM model systems is crucial for interpreting data from drug screens, and can help control for biases introduced by cell-culture conditions and the microenvironment in mouse models. We also found that lack of expression in pretreated GBM is linked to hypermutation, which in turn contributes to increased genomic heterogeneity and requires new strategies for GBM treatment.
Cold Spring Harbor molecular case studies, 2019
Pancreatic neuroendocrine neoplasms (PanNENs) represent a minority of pancreatic neoplasms that exhibit variability in prognosis. Ongoing mutational analyses of PanNENs have found recurrent abnormalities in chromatin remodeling genes (e.g., and ), and mTOR pathway genes (e.g., , , and ), some of which have relevance to patients with related familial syndromes. Most recently, grade 3 PanNENs have been divided into two groups based on differentiation, creating a new group of well-differentiated grade 3 neuroendocrine tumors (PanNETs) that have had a limited whole-genome level characterization to date. In a patient with a metastatic well-differentiated grade 3 PanNET, our study utilized whole-genome sequencing of liver metastases for the comparative analysis and detection of single-nucleotide variants, insertions and deletions, structural variants, and copy-number variants, with their biologic relevance confirmed by RNA sequencing. We found that this tumor most notably exhibited a -disrupting fusion, showed a novel fusion, and lacked any somatic variants in , , and .
Clinical cancer research : an official journal of the American Association for Cancer Research, 2019
Gene fusions involving neuregulin 1 (NRG1) have been noted in multiple cancer types and have potential therapeutic implications. Although varying results have been reported in other cancer types, the efficacy of the HER-family kinase inhibitor afatinib in the treatment of NRG1 fusion-positive pancreatic ductal adenocarcinoma is not fully understood.
JAMA network open, 2019
A molecular diagnostic method that incorporates information about the transcriptional status of all genes across multiple tissue types can strengthen confidence in cancer diagnosis.