SSAKE

SSAKE is a genomics application for assembling millions of very short DNA sequences.

Project Description

The Short Sequence Assembly by K-mer search and 3' read Extension (SSAKE) is a genomics application for aggressively assembling millions of short nucleotide sequences by progressively searching for perfect 3'-most k-mers using a DNA prefix tree. SSAKE is designed to help leverage the information from short sequence reads by stringently clustering them into contigs that can be used to characterize novel sequencing targets.

*Best performance is achieved by quality-trimming your reads before assembly.


Enjoy SSAKE responsibly!

 

Summary

SSAKE is written in Perl and runs on Linux. It cycles through short sequence reads stored in a hash table and progressively searches a prefix tree for the longest possible identical overlap between any two sequences. The algorithm has been used to assemble 25-36 bp sequence reads from viral, bacterial and fungal genomes, as well as forty million 25-mers simulated from the whole-genome shotgun (WGS) sequence data of the Sargasso Sea metagenomics project. Considering the number of sequences to assemble, SSAKE is robust and tractable.
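The search for the longest identical overlap can be illustrated with a minimal Python sketch. It is not SSAKE's Perl implementation: a dictionary of read prefixes stands in for the prefix tree, reverse complements and 5' extension are ignored, and the names `build_prefix_index`, `extend_3prime` and `min_overlap` are hypothetical, chosen only for this example.

```python
from collections import defaultdict


def build_prefix_index(reads, min_overlap):
    """Map each proper prefix (length >= min_overlap) to the reads that start with it.

    A plain dictionary is used here as a stand-in for a prefix tree.
    """
    index = defaultdict(list)
    for read in reads:
        for k in range(min_overlap, len(read)):
            index[read[:k]].append(read)
    return index


def extend_3prime(seed, reads, min_overlap=4):
    """Greedily extend `seed` at its 3' end, always taking the longest exact overlap."""
    # De-duplicate while keeping input order so the result is deterministic.
    pool = [r for r in dict.fromkeys(reads) if r != seed]
    unused = set(pool)  # hash-based lookup of reads not yet assembled
    index = build_prefix_index(pool, min_overlap)
    longest = max((len(r) for r in pool), default=0)

    contig = seed
    extended = True
    while extended:
        extended = False
        # Try the longest 3'-most k-mer first, then progressively shorter ones.
        for k in range(min(len(contig), longest - 1), min_overlap - 1, -1):
            candidates = [r for r in index.get(contig[-k:], ()) if r in unused]
            if candidates:
                read = candidates[0]
                contig += read[k:]      # append only the overhanging bases
                unused.discard(read)    # each read is used once
                extended = True
                break
    return contig


reads = ["ATGCGTAC", "GCGTACGT", "GTACGTTA", "ACGTTAGC"]
contig = extend_3prime("ATGCGTAC", reads, min_overlap=4)
print(contig)  # ATGCGTACGTTAGC
```

Trying the longest suffix first keeps the extension stringent: a shorter overlap is considered only when no unused read matches a longer one.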

 

Documentation

René L Warren, Granger G Sutton, Steven JM Jones, Robert A Holt. 2007 (epub 2006 Dec 8). Assembling millions of short DNA sequences using SSAKE. Bioinformatics. 23:500-501.

License

Copyright (c) 2006-2011 Canada's Michael Smith Genome Science Centre. All rights reserved.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

 

Credits

René Warren, Granger Sutton, Steven Jones and Robert Holt

Current Release
SSAKE 3.8.1

Released Dec 24, 2013

Fixed SSAKE for Perl >= 5.16.0, where the deprecated getopts.pl has been removed. Thanks to Nicola Soranzo for sending the fix.

Download (all platforms): ssake_v3-8-1-tar.gz

All Releases

| Version | Released | Description | License | Status |
|---|---|---|---|---|
| 3.8.1 | Dec 24, 2013 | Fixed SSAKE for Perl >= 5.16.0, where the deprecated getopts.pl has been removed. Thanks to Nicola Soranzo for sending the fix. | AFL | final |
| 3.8 | May 03, 2011 | v3.8+ is 30% faster than the previous release. Additional assembly control (-w) has been implemented that limits the generation of low depth of coverage contigs. | GPL | final |
| 3.7 | Nov 17, 2010 | Version 3.7 has improved support for seed-based assemblies, notably read-space restriction to that of the seed sequence (TASR behavior). | GPL | final |
| 3.6 | Aug 25, 2010 | v3.6 accepts an unlimited number of sequence size libraries and offers preliminary support for paired-end Sanger reads. | GPL | final |
| 3.5 | May 28, 2010 | v3.5+ uses mate pairs to help resolve repeats (preventing contig misassemblies) at run time and attempts to force-fill gaps with redundant sequences (improves contiguity and repeat resolution). | GPL | final |
| 3.4 | Apr 14, 2009 | Version 3.4 exploits paired-end reads to explore possible contig merges within scaffolds and allows users to track read position and individual base coverage for reads *fully embedded* within contigs. | GPL | final |
| 3.2.1 | Mar 31, 2009 | Optimized prefix tree implementation for faster assemblies with decreased memory usage. | AFL | final |
| 3.2 | Dec 07, 2007 | SSAKE 3.2 adjusts contig ends to find new extension possibilities. A bug that prevented SSAKE from exploring the entire read space for contig extensions seeded by shorter reads has been fixed. | GPL | final |
| 2.0 | | SSAKE can now handle error-rich [short sequence] data sets. For each seed sequence or contig being extended, SSAKE looks through the entire overlapping k-mer space and generates a consensus sequence from overhanging bases. It then extends contigs using that consensus, provided the bases it comprises pass user-defined thresholds. | GPL | beta |
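
The consensus-based extension described for release 2.0 can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than SSAKE's own code: the function name `consensus_extension` and the two thresholds `min_depth` (reads required to call a base) and `min_ratio` (fraction of reads that must agree) are hypothetical stand-ins for the user-defined thresholds mentioned above.

```python
from collections import Counter


def consensus_extension(overhangs, min_depth=2, min_ratio=0.7):
    """Build a consensus from overhanging bases, stopping at the first weak position.

    `overhangs` holds, for each overlapping read, the bases that extend past
    the current 3' end of the contig. Thresholds are illustrative only.
    """
    consensus = []
    longest = max((len(o) for o in overhangs), default=0)
    for pos in range(longest):
        # Count the bases observed at this position across all overhangs.
        column = Counter(o[pos] for o in overhangs if len(o) > pos)
        base, count = column.most_common(1)[0]
        depth = sum(column.values())
        # Stop extending as soon as a position fails either threshold.
        if depth < min_depth or count / depth < min_ratio:
            break
        consensus.append(base)
    return "".join(consensus)


overhangs = ["ACGT", "ACGA", "ACG", "AC", "TCGT"]
extension = consensus_extension(overhangs)
print(extension)  # ACG
```

In this example the fourth position is covered by three reads of which only two agree, so the extension stops after `ACG` instead of propagating a possible sequencing error.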