SSAKE is a de novo Genome Assembler for Short DNA Sequence Reads
SSAKE is a de novo assembler for short DNA sequence reads. It is designed to leverage the information in short sequence reads by assembling them into contigs and scaffolds that can be used to characterize novel sequencing targets. SSAKE is the first published algorithm for genome assembly with short DNA sequences. Its algorithms are at the core of many genomics applications (e.g. VCAKE, QSRA, SHARCGS, SSPACE, JR-Assembler), and their design continues to inspire new-generation assemblers (e.g. JR-Assembler, PNAS 2013). Applications of SSAKE extend beyond genome assembly: the technology has been applied to profiling T-cell metagenomes, targeted de novo assembly and HLA typing, and was key to the discovery of Fusobacterium in colon cancer.
*Best performance is achieved by quality-trimming your reads before assembly (refer to the tools folder and SSAKE.readme/SSAKE.pdf)
Enjoy SSAKE responsibly!
About the author: www.renewarren.ca
SSAKE is written in Perl and runs on Linux. SSAKE cycles through short sequence reads stored in a hash table and progressively searches through a prefix tree for extension candidates. The algorithm has assembled 25 to 250 bp sequence reads from viral, bacterial and fungal genomes. SSAKE is lightweight, robust, and simple to set up and run.
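The read-cycling and prefix-lookup idea described above can be illustrated with a small Python sketch. This is not SSAKE's code: the function names `index_by_prefix` and `extend_right`, the parameter `k`, and the use of a plain dictionary keyed on fixed-length prefixes in place of a true prefix tree are all assumptions made for illustration. The sketch extends a seed in the 3' direction only and ignores reverse-complement reads, read counts and ambiguous extensions.

```python
from collections import defaultdict


def index_by_prefix(reads, k):
    """Group distinct reads by their first k bases (a dict standing in for a prefix tree)."""
    index = defaultdict(list)
    for read in set(reads):
        if len(read) > k:
            index[read[:k]].append(read)
    return index


def extend_right(seed, reads, k):
    """Greedily extend `seed` at its 3' end using reads that overlap it by at least k bases.

    Hypothetical helper: each read is used at most once, and the longest
    overlap is always preferred.
    """
    index = index_by_prefix(reads, k)
    max_len = max(map(len, reads), default=0)
    used = {seed}
    contig = seed
    while True:
        hit = None
        # Try the longest possible overlap first, down to k bases.
        for olen in range(min(len(contig), max_len - 1), k - 1, -1):
            suffix = contig[-olen:]
            for read in index.get(suffix[:k], []):
                # The read must start with the contig suffix and overhang it.
                if read not in used and len(read) > olen and read.startswith(suffix):
                    hit = (read, olen)
                    break
            if hit:
                break
        if hit is None:
            return contig  # no unused read extends the contig any further
        read, olen = hit
        used.add(read)
        contig += read[olen:]  # append only the overhanging bases


reads = ["ACGTAC", "GTACGG", "ACGGTT"]
print(extend_right("ACGTAC", reads, k=3))
```

Keying the lookup on the first `k` bases means each extension attempt inspects only the reads that could possibly overlap, rather than scanning the whole read set.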
Experimental, NGS test data
An experimental, quality-trimmed Illumina MiSeq sequence dataset (PE150, colorectal cancer tumor isolate bacterium C. showae CC57C [PRJNA189774]) is available for testing SSAKE: ftp://ftp.bcgsc.ca/supplementary/SSAKE
To download and assemble, simply execute the following script from the ./test repository included with SSAKE v3.8.2:
If you use the data in your research, please cite:
Warren RL, Freeman DJ, Pleasance S, Watson P, Moore RA, Cochrane K, Allen-Vercoe E, Holt RA. 2013. Co-occurrence of anaerobic bacteria in colorectal carcinomas. Microbiome 1:16
If you use SSAKE in your research, please cite:
Warren RL, Sutton GG, Jones SJM, Holt RA. 2007 (epub 2006 Dec 8). Assembling millions of short DNA sequences using SSAKE. Bioinformatics 23:500
Copyright (c) 2006-2014 Canada's Michael Smith Genome Science Centre. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
Released Apr 25, 2014
This release includes an option (-j) for adjusting the kmer length when running SSAKE in TASR mode (-s).
A recent Illumina MiSeq dataset is available for testing SSAKE's performance: ftp://ftp.bcgsc.ca/supplementary/SSAKE
|Version||Released||Description||License||Status|
|3.8.2||Apr 25, 2014||This release includes an option (-j) for adjusting the kmer length when running SSAKE in TASR mode (-s). A recent Illumina MiSeq dataset is available for testing SSAKE's performance: ftp://ftp.bcgsc.ca/supplementary/SSAKE||GPL||final|
|3.8.1||Dec 24, 2013||Fixed SSAKE for Perl >= 5.16.0, where deprecated getopts.pl has been removed. Thanks to Nicola Soranzo for sending the fix.||GPL||final|
|3.8||May 03, 2011||v3.8+ is 30% faster than the previous release. Additional assembly control has been implemented (-w) that limits the generation of low depth of coverage contigs.||GPL||final|
|3.7||Nov 17, 2010||Version 3.7 has improved support for seed-based assemblies, notably read-space restriction to that of the seed sequence (TASR behavior).||GPL||final|
|3.6||Aug 25, 2010||v3.6 accepts an infinite number of sequence size libraries and offers preliminary support for paired-end Sanger reads.||GPL||final|
|3.5||May 28, 2010||v3.5+ uses mate pairs to help resolve repeats (preventing contig misassemblies) at run time and attempts to force-fill gaps with redundant sequences (improves contiguity and repeat resolution).||GPL||final|
|3.4||Apr 14, 2009||Version 3.4 exploits paired-end reads to explore possible contig merges within scaffolds and allows users to track read position and individual base coverage for reads *fully embedded* within contigs.||GPL||final|
|3.2.1||Mar 31, 2009||Optimized prefix tree implementation for faster assemblies with decreased memory usage.||GPL||final|
|3.2||Dec 07, 2007||SSAKE 3.2 adjusts contig ends to find new extension possibilities. A bug that prevented SSAKE from exploring the entire read space for contig extensions seeded by shorter reads has been FIXED.||GPL||final|
|2.0|| ||SSAKE can now handle error-rich [short sequence] data sets. For each seed sequence or contig being extended, SSAKE looks through the entire overlapping k-mer space and generates a consensus sequence from overhanging bases. It then extends contigs using that consensus, provided the bases it comprises pass user-defined thresholds.||GPL||beta|
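The consensus-based extension described for version 2.0 can be sketched roughly as follows. This is a simplified illustration rather than SSAKE's implementation: the function name `consensus_overhang` and the threshold parameters `min_overlap`, `min_depth` and `min_ratio` are hypothetical stand-ins for the user-defined thresholds mentioned in the release note, and reads that disagree with an accepted base still vote at later positions.

```python
from collections import Counter


def consensus_overhang(contig, reads, min_overlap, min_depth, min_ratio):
    """Return the consensus of bases overhanging the 3' end of `contig`.

    Hypothetical sketch: a base is accepted only if the most frequent base at
    that position is seen at least `min_depth` times and accounts for at
    least `min_ratio` of the votes at that position.
    """
    overhangs = []
    for read in reads:
        # Longest overlap between the contig end and the read start that
        # still leaves at least one overhanging base.
        for olen in range(min(len(contig), len(read) - 1), min_overlap - 1, -1):
            if contig.endswith(read[:olen]):
                overhangs.append(read[olen:])
                break

    extension = []
    pos = 0
    while True:
        # Count the bases proposed at this overhang position.
        votes = Counter(o[pos] for o in overhangs if len(o) > pos)
        if not votes:
            break
        base, count = votes.most_common(1)[0]
        depth = sum(votes.values())
        if count < min_depth or count / depth < min_ratio:
            break  # consensus too weak: stop extending here
        extension.append(base)
        pos += 1
    return "".join(extension)


reads = ["ACGTGA", "CGTGAC", "ACGTGC", "ACGTCA"]
print(consensus_overhang("TTACGT", reads, min_overlap=3, min_depth=2, min_ratio=0.6))
```

Stopping at the first position that fails either threshold keeps a single erroneous read from steering the extension, at the cost of shorter extensions where coverage is thin.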