Diffuse large B-cell lymphoma (DLBCL) is a genetically heterogeneous cancer composed of at least 2 molecular subtypes that differ in gene expression and distribution of mutations. Recently, application of genome/exome sequencing and RNA-seq to DLBCL has revealed numerous genes that are recurrent targets of somatic point mutation in this disease. Here we provide a whole-genome-sequencing-based perspective of DLBCL mutational complexity by characterizing 40 de novo DLBCL cases and 13 DLBCL cell lines and combining these data with DNA copy number analysis and RNA-seq from an extended cohort of 96 cases. Our analysis identified widespread genomic rearrangements, including evidence for chromothripsis, as well as the presence of known and novel fusion transcripts. We uncovered new gene targets of recurrent somatic point mutations and genes that are targeted by focal somatic deletions in this disease. We highlight the recurrence of germinal center B-cell-restricted mutations affecting genes that encode the S1P receptor and 2 small GTPases (GNA13 and GNAI2) that together converge on regulation of B-cell homing. We further analyzed our data to approximate the relative temporal order in which some recurrent mutations were acquired and demonstrate that ongoing acquisition of mutations and intratumoral clonal heterogeneity are common features of DLBCL. This study further improves our understanding of the processes and pathways involved in lymphomagenesis, and some of the pathways mutated here may indicate new avenues for therapeutic intervention.

Discoveries from cancer genome sequencing have the potential to translate into advances in cancer prevention, diagnostics, prognostics, treatment, and basic biology. Given the diversity of downstream applications, cancer genome-sequencing studies need to be designed to best fulfil specific aims. Knowledge of second-generation cancer genome-sequencing study design also facilitates assessment of the validity and importance of the rapidly growing number of
published studies. In this Review, we focus on the practical application of second-generation sequencing technology (also known as next-generation sequencing) to cancer genomics, and discuss how aspects of study design and methodological considerations, such as the size and composition of the discovery cohort, can be tailored to serve specific research aims.

Diffuse large B-cell lymphoma (DLBCL) accounts for 30% to 40% of newly diagnosed lymphomas and has an overall cure rate of approximately 60%. Previously, we observed FOXO1 mutations in non-Hodgkin lymphoma patient samples. To explore the effects of FOXO1 mutations, we assessed FOXO1 status in 279 DLBCL patient samples and 22 DLBCL-derived cell lines. FOXO1 mutations were found in 8.6% (24/279) of DLBCL cases: 92.3% (24/26) of mutations were in the first exon, 46.2% (12/26) were recurrent mutations affecting the N-terminal region, and another 38.5% (10/26) affected the Forkhead DNA binding domain. Recurrent mutations in the N-terminal region resulted in diminished T24 phosphorylation, loss of interaction with 14-3-3, and nuclear retention. FOXO1 mutation was associated with decreased overall survival in patients treated with rituximab, cyclophosphamide, doxorubicin, vincristine, and prednisone (P = .037), independent of cell of origin (COO) and the revised International Prognostic Index (R-IPI). This association was particularly evident (P = .003) in patients in the low-risk R-IPI categories. The independent relationship of mutations in FOXO1 to survival, transcending the prognostic influence of the R-IPI and COO, indicates that FOXO1 mutation is a novel prognostic factor that plays an important role in DLBCL pathogenesis.

The drug fluorouracil (5-FU) is a widely used antimetabolite chemotherapy in the treatment of colorectal cancer. The gene uridine monophosphate synthetase (UMPS) is thought to be primarily responsible for conversion of 5-FU to active anticancer metabolites in tumor cells. Mutation or aberrant expression of UMPS may
contribute to 5-FU resistance during treatment. We undertook a characterization of UMPS mRNA isoform expression and sequence variation in 5-FU-resistant cell lines and drug-naive or -exposed primary and metastatic tumors. We observed reciprocal differential expression of two UMPS isoforms in a colorectal cancer cell line with acquired 5-FU resistance relative to the 5-FU-sensitive cell line from which it was derived. A novel isoform arising as a consequence of exon skipping was increased in abundance in resistant cells. The underlying mechanism responsible for this shift in isoform expression was determined to be a heterozygous splice site mutation acquired in the resistant cell line. We developed sequencing and expression assays to specifically detect alternative UMPS isoforms and used these to determine that UMPS was recurrently disrupted by mutations and aberrant splicing in additional 5-FU-resistant colorectal cancer cell lines and colorectal tumors. The observed mutations, aberrant splicing, and downregulation of UMPS represent novel mechanisms for acquired 5-FU resistance in colorectal cancer.

Networks are typically visualized with force-based or spectral layouts. These algorithms lack reproducibility and perceptual uniformity because they do not use a node coordinate system. The layouts can be difficult to interpret and are unsuitable for assessing differences in networks. To address these issues, we introduce hive plots (http://www.hiveplot.com) for generating informative, quantitative, and comparable network layouts. Hive plots depict network structure transparently, are simple to understand, and can be easily tuned to identify patterns of interest. The method is computationally straightforward, scales well, and is amenable to a plugin for existing tools.

Oligodendroglioma is characterized by unique clinical, pathological, and genetic features. Recurrent losses of chromosomes 1p and 19q are strongly associated with this brain cancer, but knowledge of the identity and function of
the genes affected by these alterations is limited. We performed exome sequencing on a discovery set of 16 oligodendrogliomas with 1p/19q co-deletion to identify new molecular features at base-pair resolution. As anticipated, there was a high rate of IDH mutations: all cases had mutations in either IDH1 (14/16) or IDH2 (2/16). In addition, we discovered somatic mutations and insertions/deletions in the CIC gene on chromosome 19q13.2 in 13/16 tumours. These discovery set mutations were validated by deep sequencing of 13 additional tumours, which revealed seven others with CIC mutations, thus bringing the overall mutation rate in oligodendrogliomas in this study to 20/29 (69%). In contrast, deep sequencing of astrocytomas and oligoastrocytomas without 1p/19q loss revealed that CIC alterations were otherwise rare (1/60; 2%). Of the 21 non-synonymous somatic mutations in 20 CIC-mutant oligodendrogliomas, nine were in exon 5 within an annotated DNA-interacting domain and three were in exon 20 within an annotated protein-interacting domain. The remaining nine were found in other exons and frequently included truncations. CIC mutations were highly associated with oligodendroglioma histology, 1p/19q co-deletion, and IDH1/2 mutation (p < 0.001). Although we observed no differences in the clinical outcomes of CIC mutant versus wild-type tumours, in a background of 1p/19q co-deletion, hemizygous CIC mutations are likely important. We hypothesize that the mutant CIC on the single retained 19q allele is linked to the pathogenesis of oligodendrogliomas with IDH mutation. Our detailed study of genetic aberrations in oligodendroglioma suggests a functional interaction between CIC mutation, IDH1/2 mutation, and 1p/19q co-deletion.

Follicular lymphoma (FL) and diffuse large B-cell lymphoma (DLBCL) are the two most common non-Hodgkin lymphomas (NHLs). Here we sequenced tumour and matched normal DNA from 13 DLBCL cases and one FL case to identify genes with mutations in B-cell NHL. We analysed RNA-seq data from
these and another 113 NHLs to identify genes with candidate mutations, and then re-sequenced tumour and matched normal DNA from these cases to confirm 109 genes with multiple somatic mutations. Genes with roles in histone modification were frequent targets of somatic mutation. For example, 32% of DLBCL and 89% of FL cases had somatic mutations in MLL2, which encodes a histone methyltransferase, and 11.4% and 13.4% of DLBCL and FL cases, respectively, had mutations in MEF2B, a calcium-regulated gene that cooperates with CREBBP and EP300 in acetylating histones. Our analysis suggests a previously unappreciated disruption of chromatin biology in lymphomagenesis.

To analyze the relationship between antisense transcription and alternative splicing, we developed a computational approach for the detection of antisense-correlated exon splicing events using Affymetrix exon array data. Our analysis of expression data from 176 lymphoblastoid cell lines revealed that the majority of expressed sense-antisense genes exhibited alternative splicing events that were correlated to the expression of the antisense gene. Most of these events occurred in areas of sense-antisense (SAS) gene overlap, which were significantly enriched in both exons and nucleosome occupancy levels relative to nonoverlapping regions of the same genes. Nucleosome occupancy was highly correlated with Pol II abundance across overlapping regions and with concomitant increases in local alternative exon usage. These results are consistent with an antisense transcription-mediated mechanism of splicing regulation in normal human cells. A comparison of the prevalence of antisense-correlated splicing events between individuals of Mormon versus African descent revealed population-specific events that may indicate the continued evolution of new SAS loci. Furthermore, the presence of antisense transcription was correlated to alternative splicing across multiple metazoan species, suggesting that it may be a conserved mechanism contributing to
splicing regulation.

Next generation sequencing has brought epigenomic studies to the forefront of current research. The power of massively parallel sequencing coupled to innovative molecular and computational techniques has allowed researchers to profile the epigenome at resolutions that were unimaginable only a few years ago. With early proof of concept studies published, the field is now moving into the next phase, where method standardization and rigorous quality control are becoming paramount. In this review, we will describe methodologies that have been developed to profile the epigenome using next generation sequencing platforms. We will discuss these in terms of library preparation, sequence platforms, and analysis techniques.

In alternative expression analysis by sequencing (ALEXA-seq), we developed a method to analyze massively parallel RNA sequence data to catalog transcripts and assess differential and alternative expression of known and predicted mRNA isoforms in cells and tissues. As proof of principle, we used the approach to compare fluorouracil-resistant and -nonresistant human colorectal cancer cell lines. We assessed the sensitivity and specificity of the approach by comparison to exon tiling and splicing microarrays and validated the results with reverse transcription-PCR, quantitative PCR, and Sanger sequencing. We observed global disruption of splicing in fluorouracil-resistant cells, characterized by expression of new mRNA isoforms resulting from exon skipping, alternative splice site usage, and intron retention. Alternative expression annotation databases, source code, a data viewer, and other resources to facilitate analysis are available at http://www.alexaplatform.org/alexa_seq/.

Follicular lymphoma (FL) and the GCB subtype of diffuse large B-cell lymphoma (DLBCL) derive from germinal center B cells. Targeted resequencing studies have revealed mutations in various genes encoding proteins in the NF-kappaB pathway that contribute to the activated
B-cell (ABC) DLBCL subtype, but thus far few GCB-specific mutations have been identified. Here we report recurrent somatic mutations affecting the polycomb-group oncogene EZH2, which encodes a histone methyltransferase responsible for trimethylating Lys27 of histone H3 (H3K27). After the recent discovery of mutations in KDM6A (UTX), which encodes the histone H3K27me3 demethylase UTX, in several cancer types, EZH2 is the second histone methyltransferase gene found to be mutated in cancer. These mutations, which result in the replacement of a single tyrosine in the SET domain of the EZH2 protein (Tyr641), occur in 21.7% of GCB DLBCLs and 7.2% of FLs and are absent from ABC DLBCLs. Our data are consistent with the notion that EZH2 proteins with mutant Tyr641 have reduced enzymatic activity in vitro.

Transcriptome analysis has been a key area of biological inquiry for decades. Over the years, research in the field has progressed from candidate gene-based detection of RNAs using Northern blotting to high-throughput expression profiling driven by the advent of microarrays. Next-generation sequencing technologies have revolutionized transcriptomics by providing opportunities for multidimensional examinations of cellular transcriptomes in which high-throughput expression data are obtained at a single-base resolution.

We created a visualization tool called Circos to facilitate the identification and analysis of similarities and differences arising from comparisons of genomes. Our tool is effective in displaying variation in genome structure and, generally, any other kind of positional relationships between genomic intervals. Such data are routinely produced by sequence alignments, hybridization arrays, genome mapping, and genotyping studies. Circos uses a circular ideogram layout to facilitate the display of relationships between pairs of positions by the use of ribbons, which encode the position, size, and orientation of related genomic elements. Circos is capable of displaying data as scatter, line,
and histogram plots, heat maps, tiles, connectors, and text. Bitmap or vector images can be created from GFF-style data inputs and hierarchical configuration files, which can be easily generated by automated tools, making Circos suitable for rapid deployment in data analysis and reporting pipelines.

We describe a new method, Tag-seq, which employs ultra high-throughput sequencing of 21 base pair cDNA tags for sensitive and cost-effective gene expression profiling. We compared Tag-seq data to LongSAGE data and observed improved representation of several classes of rare transcripts, including transcription factors, antisense transcripts, and intronic sequences, the latter possibly representing novel exons or genes. We observed increases in the diversity, abundance, and dynamic range of such rare transcripts and took advantage of the greater dynamic range of expression to identify, in cancers and normal libraries, altered expression ratios of alternative transcript isoforms. The strand-specific information of Tag-seq reads further allowed us to detect altered expression ratios of sense and antisense (S-AS) transcripts between cancer and normal libraries. S-AS transcripts were enriched in known cancer genes, while transcript isoforms were enriched in miRNA targeting sites. We found that transcript abundance had a stronger GC-bias in LongSAGE than Tag-seq, such that AT-rich tags were less abundant than GC-rich tags in LongSAGE. Tag-seq also performed better in gene discovery, identifying >98% of genes detected by LongSAGE and profiling a distinct subset of the transcriptome characterized by AT-rich genes, which was expressed at levels below those detectable by LongSAGE. Overall, Tag-seq is sensitive to rare transcripts, has less sequence composition bias relative to LongSAGE, and allows differential expression analysis for a greater range of transcripts, including transcripts encoding important regulatory molecules.

Diagnosing constitutional pathogenic copy number variants requires measuring the
submicroscopic segmental chromosomal imbalance. The Affymetrix GeneChip mapping array has been used to measure duplication and deletion of genetic material in DNA samples. This protocol describes the measurement and analysis processes, specifically the computational analyses that are involved in identifying pathogenic copy number variants.

Changes to covalent modifications of DNA and histones can be induced via environmental stimuli such as nutrients, hormones, and drugs. These changes can be both transient and heritable in nature, and provide a framework in which to investigate how environment and lifestyle choices impact disease susceptibility and progression. Furthermore, these modifications are central to chromatin dynamics and, as such, play key roles in many biological processes involving chromatin, such as DNA replication and repair, transcription, and development. In this review, we provide an overview of recent advances in our understanding of the roles that DNA and histone modification play in the onset and progression of human disease.

A new generation of sequencing technologies, from Illumina/Solexa, ABI/SOLiD, 454/Roche, and Helicos, has provided unprecedented opportunities for high-throughput functional genomic research. To date, these technologies have been applied in a variety of contexts, including whole-genome sequencing, targeted resequencing, discovery of transcription factor binding sites, and noncoding RNA expression profiling. This review discusses applications of next-generation sequencing technologies in functional genomics research and highlights the transforming potential these technologies offer.

Large-scale copy number variants (CNVs) have recently been recognized to play a role in human genome variation and disease. Approaches for analysis of CNVs in small samples, such as microdissected tissues, can be confounded by limited amounts of material. To facilitate analyses of such samples, whole genome amplification (WGA) techniques were developed. In this study, we explored
the impact of Phi29 multiple-strand displacement amplification on detection of CNVs using oligonucleotide arrays. We extracted DNA from fresh frozen lymph node samples and used this for amplification and analysis on the Affymetrix Mapping 500k SNP array platform. We demonstrated that the WGA procedure introduces hundreds of potentially confounding CNV artifacts that can obscure detection of bona fide variants. Our analysis indicates that many artifacts are reproducible and may correlate with proximity to chromosome ends and GC content. Pair-wise comparison of amplified products considerably reduced the number of apparent artifacts and partially restored the ability to detect real CNVs. Our results suggest WGA material may be appropriate for copy number analysis when amplified samples are compared to similarly amplified samples, and that only the CNVs with the greatest significance values detected by such comparisons are likely to be representative of the unamplified samples.

Genome rearrangements have long been recognized as hallmarks of human tumors and have been used to diagnose cancer. Techniques used to detect genome rearrangements have evolved from microscopic examinations of chromosomes to the more recent microarray-based approaches. The availability of next-generation sequencing technologies may provide a means for scrutinizing entire cancer genomes and transcriptomes at unparalleled resolution. Here we review the methods that have been used to detect genome rearrangements and discuss the scope and limitations of each approach. We end with a discussion of the potential that next-generation sequencing technologies may offer to the field.

Restriction digest fingerprinting is a common method for characterizing large insert genomic clones, e.g., bacterial artificial chromosome (BAC), P1 artificial chromosome (PAC), and Fosmid clones. This clone fingerprinting method has been widely applied in the construction of clone-based physical maps, which have been used as positional cloning
resources as well as to support directed and genome-wide sequencing efforts. This unit describes a robust, large-scale procedure for generation of agarose gel-based clone fingerprints from BAC clones.

MicroRNAs (miRNAs) are emerging as important, albeit poorly characterized, regulators of biological processes. Key to further elucidation of their roles is the generation of more complete lists of their numbers and expression changes in different cell states. Here we report a new method for surveying the expression of small RNAs, including microRNAs, using Illumina sequencing technology. We also present a set of methods for annotating sequences deriving from known miRNAs, identifying variability in mature miRNA sequences, and identifying sequences belonging to previously unidentified miRNA genes. Application of this approach to RNA from human embryonic stem cells obtained before and after their differentiation into embryoid bodies revealed the sequences and expression levels of 334 known plus 104 novel miRNA genes. One hundred seventy-one known and 23 novel microRNA sequences exhibited significant expression differences between these two developmental states. Owing to the increased number of sequence reads, these libraries represent the deepest miRNA sampling to date, spanning nearly six orders of magnitude of expression. The predicted targets of those miRNAs enriched in either sample shared common features. Included among the high-ranked predicted gene targets are those implicated in differentiation, cell cycle control, programmed cell death, and transcriptional regulation.

To facilitate discovery of novel human embryonic stem cell (ESC) transcripts, we generated 2.5 million LongSAGE tags from 9 human ESC lines. Analysis of this data revealed that ESCs express proportionately more RNA binding proteins compared with terminally differentiated cells, and identified novel ESC transcripts, at least one of which may represent a marker of the pluripotent state.

We describe the details of a serial
analysis of gene expression (SAGE) library construction and analysis platform that has enabled the generation of >298 high-quality SAGE libraries and >30 million SAGE tags, primarily from sub-microgram amounts of total RNA purified from samples acquired by microdissection. Several RNA isolation methods were used to handle the diversity of samples processed, and various measures were applied to minimize ditag PCR carryover contamination. Modifications in the SAGE protocol resulted in improved cloning and DNA sequencing efficiencies. Bioinformatic measures to automatically assess DNA sequencing results were implemented to analyze the integrity of ditag structure, linker or cross-species ditag contamination, and yield of high-quality tags per sequence read. Our analysis of singleton tag errors resulted in a method for correcting such errors to statistically determine tag accuracy. From the libraries generated, we produced an essentially complete mapping of reliable 21-base-pair tags to the mouse reference genome sequence for a meta-library of approximately 5 million tags. Our analyses led us to reject the commonly held notion that duplicate ditags are artifacts. Rather than the usual practice of discarding such tags, we conclude that they should be retained to avoid introducing bias into the results and thereby maintain the quantitative nature of the data, which is a major theoretical advantage of SAGE as a tool for global transcriptional profiling.

The cause of mental retardation in one-third to one-half of all affected individuals is unknown. Microscopically detectable chromosomal abnormalities are the most frequently recognized cause, but gain or loss of chromosomal segments that are too small to be seen by conventional cytogenetic analysis has been found to be another important cause. Array-based methods offer a practical means of performing a high-resolution survey of the entire genome for submicroscopic copy-number variants. We studied 100 children with idiopathic mental
retardation and normal results of standard chromosomal analysis, by use of whole-genome sampling analysis with Affymetrix GeneChip Human Mapping 100K arrays. We found de novo deletions as small as 178 kb in eight cases, de novo duplications as small as 1.1 Mb in two cases, and unsuspected mosaic trisomy 9 in another case. This technology can detect at least twice as many potentially pathogenic de novo copy-number variants as conventional cytogenetic analysis can in people with mental retardation.

We present the results of a simple statistical assay that measures the G+C content sensitivity bias of gene expression experiments without the requirement of a duplicate experiment. We analyse five gene expression profiling methods: Affymetrix GeneChip, Long Serial Analysis of Gene Expression (LongSAGE), LongSAGELite, 'Classic' Massively Parallel Signature Sequencing (MPSS), and 'Signature' MPSS. We demonstrate the methods have systematic and random errors leading to a different G+C content sensitivity. The relationship between this experimental error and the G+C content of the probe set or tag that identifies each gene influences whether the gene is detected and, if detected, the level of gene expression measured. LongSAGE has the least bias, while Signature MPSS shows a strong bias to G+C rich tags and Affymetrix data show different bias depending on the data processing method (MAS 5.0, RMA, or GC-RMA). The bias in the Affymetrix data primarily impacts genes expressed at lower levels. Despite the larger sampling of the MPSS library, SAGE identifies significantly more genes (60% more RefSeq genes in a single comparison).

We analyzed 8.55 million LongSAGE tags generated from 72 libraries. Each LongSAGE library was prepared from a different mouse tissue. Analysis of the data revealed extensive overlap with existing gene data sets and evidence for the existence of approximately 24,000 previously undescribed genomic loci. The visual cortex, pancreas, mammary gland, preimplantation embryo, and placenta
contain the largest number of differentially expressed transcripts, 25% of which are previously undescribed loci.

The Mammalian Gene Collection (MGC) consortium (http://mgc.nci.nih.gov) seeks to establish publicly available collections of full-ORF cDNAs for several organisms of significance to biomedical research, including human. To date, over 15,200 human cDNA clones containing full-length open reading frames (ORFs) have been identified via systematic expressed sequence tag (EST) analysis of a diverse set of cDNA libraries; however, further systematic EST analysis is no longer an efficient method for identifying new cDNAs. As part of our involvement in the MGC program, we have developed a scalable method for targeted recovery of cDNA clones to facilitate recovery of genes absent from the MGC collection. First, cDNA is synthesized from various RNAs, followed by polymerase chain reaction (PCR) amplification of transcripts in 96-well plates using gene-specific primer pairs flanking the ORFs. Amplicons are cloned into a sequencing vector, and full-length sequences are obtained. Sequences are processed and assembled using Phred and Phrap, and analyzed using Consed and a number of bioinformatics methods we have developed. Sequences are compared with the Reference Sequence (RefSeq) database, and validation of sequence discrepancies is attempted using other sequence databases, including dbEST and dbSNP. Clones with identical sequence to RefSeq or containing only validated changes will become part of the MGC human gene collection. Clones containing novel splice variants or polymorphisms have also been identified. Our approach to clone recovery, applied at large scale, has the potential to recover many and possibly most of the genes absent from the MGC collection.

Using the human bacterial artificial chromosome (BAC) fingerprint-based physical map, genome sequence assembly, and BAC end sequences, we have generated a fingerprint-validated set of 32,855 BAC clones spanning the human genome. The clone
set provides coverage for at least 98% of the human fingerprint map, 99% of the current assembled sequence, and has an effective resolving power of 79 kb. We have made the clone set publicly available, anticipating that it will generally facilitate FISH or array-CGH-based identification and characterization of chromosomal alterations relevant to disease.

Fingerprinted clone physical maps have proven useful in various applications, supporting both whole-genome and region-specific DNA sequencing as well as gene cloning studies. Fingerprint maps have been generated for several genomes, including those of human, mouse, rat, the nematodes Caenorhabditis elegans and Caenorhabditis briggsae, Arabidopsis thaliana, and rice. Fingerprint maps of other genomes, including those of fungi, bacteria, poplar, and the cow, are being generated. The increasing use of fingerprint maps in genomic research has spawned a need in the research community for intuitive computer tools that facilitate viewing of the maps and the underlying fingerprint data. In this report, we describe a new Java-based application called iCE (Internet Contig Explorer) that has been designed to provide views of fingerprint maps and associated data. Users can search for and display individual clones, contigs, clone fingerprints, clone insert sizes, and markers. Users can also load into the software lists of particular clones of interest and view their fingerprints. iCE is being used at our Genome Centre to offer up to the research community views of the mouse, rat, bovine, C. briggsae, and several fungal genome bacterial artificial chromosome (BAC) fingerprint maps we have either completed or are currently constructing. We are also using iCE as part of the Rat Genome Sequencing Project to manage our provision of rat BAC clones for sequencing at the Human Genome Sequencing Center at the Baylor College of Medicine.

Here we describe software tools for the automated detection of DNA restriction fragments resolved on agarose fingerprinting gels. We
present a mathematical model for the location and shape of the restriction fragments as a function of fragment size, with model parameters determined empirically from "marker" lanes containing molecular size standards. Automated identification of restriction fragments involves several steps, including: image preprocessing, to put the data in a form consistent with a linear model; marker lane analysis, for determination of the model parameters; and data lane analysis, a procedure for detecting restriction fragment multiplets while simultaneously determining the amplitude curve that describes restriction fragment amplitude as a function of mobility. In validation experiments conducted on fingerprinted and sequenced Bacterial Artificial Chromosome (BAC) clones, sensitivity and specificity of restriction fragment identification exceeded 96% on restriction fragments ranging in size from 600 base pairs (bp) to 30,000 bp. The integrated suite of software tools, written in MATLAB and collectively called BandLeader, is in use at the BC Cancer Agency Genome Sciences Centre (GSC) and the Washington University Genome Sequencing Center, and has been provided to the Wellcome Trust Sanger Institute and the Whitehead Institute. Employed in a production mode at the GSC, BandLeader has been used to perform automated restriction fragment identification for more than 850,000 BAC clones for mouse, rat, bovine, and poplar fingerprint mapping projects.

Programmed cell death (PCD), important in normal animal physiology and disease, can be divided into at least two morphological subtypes, including type I, or apoptosis, and type II, or autophagic cell death. While many molecules involved in apoptosis have been discovered and studied intensively during the past decade, autophagic cell death is not well characterized molecularly. Here we report the first comprehensive identification of molecules associated with autophagic cell death during normal metazoan development in vivo. During Drosophila metamorphosis, the larval
salivary glands undergo autophagic cell death regulated by a hormonally induced transcriptional cascade. To identify and analyze the genes expressed, we examined wild-type patterns of gene expression in three predeath stages of Drosophila salivary glands using serial analysis of gene expression (SAGE) [7]. A total of 1244 transcripts, including genes involved in autophagy, defense response, cytoskeleton remodeling, noncaspase proteolysis, and apoptosis, were expressed differentially prior to salivary gland death. Mutant expression analysis indicated that several of these genes were regulated by E93, a gene required for salivary gland cell death. Our analyses strongly support both the emerging notion that there is overlap with respect to the molecules involved in autophagic cell death and apoptosis, and that there are important differences.

The National Institutes of Health Mammalian Gene Collection (MGC) Program is a multiinstitutional effort to identify and sequence a cDNA clone containing a complete ORF for each human and mouse gene. ESTs were generated from libraries enriched for full-length cDNAs and analyzed to identify candidate full-ORF clones, which then were sequenced to high accuracy. The MGC has currently sequenced and verified the full ORF for a nonredundant set of >9,000 human and >6,000 mouse genes. Candidate full-ORF clones for an additional 7,800 human and 3,500 mouse genes also have been identified. All MGC sequences and clones are available without restriction through public databases and clone distribution networks (see http://mgc.nci.nih.gov).

Gene expression in a developmentally arrested, long-lived dauer population of Caenorhabditis elegans was compared with a nondauer (mixed-stage) population by using serial analysis of gene expression (SAGE). Dauer (152,314) and nondauer (148,324) SAGE tags identified 11,130 of the predicted 19,100 C. elegans genes. Genes implicated previously in longevity were expressed abundantly in the dauer library, and new genes potentially important in
dauer biology were discovered. Two thousand six hundred eighteen genes were detected only in the nondauer population, whereas 2016 genes were detected only in the dauer, showing that dauer larvae show a surprisingly complex gene expression profile. Evidence for differentially expressed gene transcript isoforms was obtained for 162 genes. H1 histones were differentially expressed, raising the possibility of alternative chromatin packaging. The most abundant tag from dauer larvae (20-fold more abundant than in the nondauer profile) corresponds to a new, unpredicted gene we have named tts-1 (transcribed telomere-like sequence), which may interact with telomeres or telomere-associated proteins. Abundant antisense mitochondrial transcripts (2% of all tags) suggest the existence of an antisense-mediated regulatory mechanism in C. elegans mitochondria. In addition to providing a robust tool for gene expression studies, the SAGE approach already has provided the advantage of new gene/transcript discovery in a metazoan.
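
A fold difference between two SAGE libraries, such as the 20-fold figure quoted for the dauer and nondauer profiles, is usually read against library size, since the two libraries contain different numbers of tags. The following is a minimal sketch of that kind of comparison, not the method used in the study. Only the two library totals come from the abstract; the function names, the pseudocount, and any per-tag counts passed in are assumptions made for illustration.

```python
# Library sizes (total SAGE tags) as reported in the abstract above.
DAUER_TOTAL = 152_314
NONDAUER_TOTAL = 148_324


def tags_per_million(count: float, library_total: int) -> float:
    """Scale a raw tag count to tags per million sequenced tags."""
    return count * 1_000_000 / library_total


def fold_change(count_a: float, total_a: int,
                count_b: float, total_b: int,
                pseudocount: float = 0.5) -> float:
    """Ratio of normalized tag abundance in library A over library B.

    The pseudocount (an assumption of this sketch) is added to both raw
    counts so that a tag absent from one library does not cause a
    division by zero.
    """
    a = tags_per_million(count_a + pseudocount, total_a)
    b = tags_per_million(count_b + pseudocount, total_b)
    return a / b


if __name__ == "__main__":
    # Hypothetical counts for a single tag; these are NOT values from the study.
    ratio = fold_change(200, DAUER_TOTAL, 10, NONDAUER_TOTAL)
    print(f"dauer/nondauer fold change: {ratio:.2f}")
```

Because the two libraries are close in size, normalization changes the ratio only slightly here; it matters more when library totals differ widely.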