Short Sequence Assembly by K-mer search and 3' read Extension (SSAKE)

SSAKE v3.8.2
Rene L. Warren, 2006-2014
email: rwarren at bcgsc.ca

------------------------------------------------------------
Campylobacter showae CC57C (colorectal cancer tumor isolate)
------------------------------------------------------------

About 1M quality-trimmed (../tools/TQSfastq.py -f Assemble_1_R1.fastq -t 30 -c 100 -e 33)
Campylobacter showae CC57C (BioProject/Accession: PRJNA189774/AOTD00000000)
bacterial NGS reads (1 lane, Illumina MiSeq, PE151, 1.8M pairs sequenced)
were assembled with SSAKE v3.8.2 in paired-end mode. The assembly took
10m31s and 3.8GB RAM on a 12-core, 48GB RAM CentOS5 machine (benchmarked
with Syrupy, https://github.com/jeetsukumaran/Syrupy). It yielded 215
contigs with N50=41kbp (151 scaffolds, N50=124kbp) and a reconstruction
of 2.2Mbp.

Quality-trimmed CC57C Illumina MiSeq sequence data are available here:
ftp://ftp.bcgsc.ca/supplementary/SSAKE/CC57C_paired.fa and CC57C_unpaired.fa

------------------------------------------------------------
SSAKE ASSEMBLY PIPELINE:
------------------------------------------------------------

./tools/TQSfastq.py -f Assemble_1_R1.fastq -t 30 -c 100 -e 33
./tools/TQSfastq.py -f Assemble_1_R2.fastq -t 30 -c 100 -e 33
cat Assemble_1_R2.fastq_T30C100E33.trim.fa |perl -ne 'if(/^(\>\@\S+)/){print "$1b\n";}else{print;}' >Assemble_1_R2.fastq_T30C100E33.trimFIX.fa
cat Assemble_1_R1.fastq_T30C100E33.trim.fa |perl -ne 'if(/^(\>\@\S+)/){print "$1a\n";}else{print;}' >Assemble_1_R1.fastq_T30C100E33.trimFIX.fa
./tools/makePairedOutput2UNEQUALfiles.pl Assemble_1_R1.fastq_T30C100E33.trimFIX.fa Assemble_1_R2.fastq_T30C100E33.trimFIX.fa 400
./SSAKE -f CC57C_paired.fa -p 1 -g CC57C_unpaired.fa -m 20 -w 5 -b run2014
./ssake_v3-8-2/tools/getStats.pl run2014.contigs

------------------------------------------------------------
SSAKE CONTIG SEQUENCE STATS
------------------------------------------------------------

Mean (nt),10476.89
Max (nt),119107
Min (nt),200
n,215
Stdev (nt),19021.54
Variance (nt),361818996.83
TrimmedMean (nt),2763.44
Median (nt),412.00
Sum (nt),2252531.00
N20,64969
N50,41436
N80,17465

Size Range,#bases,#sequences
>=100000,119107,1
10000-100000,1951445,63
200-1000,36330,119
1000-10000,145649,32

./ssake_v3-8-2/tools/makeFastaFileFromScaffolds.pl run2014.scaffolds
./ssake_v3-8-2/tools/getStats.pl run2014.scaffolds.fa

------------------------------------------------------------
SSAKE SCAFFOLD SEQUENCE STATS
------------------------------------------------------------

Mean (nt),14931.05
Max (nt),252560
Min (nt),200
n,151
Stdev (nt),42086.22
Variance (nt),1771250106.52
TrimmedMean (nt),349.95
Median (nt),288.00
Sum (nt),2254589.00
N20,227805
N50,124387
N80,59173

Size Range,#bases,#sequences
>=100000,1350704,8
10000-100000,805378,18
200-1000,33684,113
1000-10000,63291,12
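
------------------------------------------------------------
NOTE ON THE N20/N50/N80 VALUES (ILLUSTRATIVE SKETCH)
------------------------------------------------------------

The N20, N50 and N80 values reported above can be illustrated with a short
Python sketch. It uses the conventional definition (the length L such that
sequences of length >= L hold at least the given fraction of all bases) and
is not taken from getStats.pl, which may differ in detail (for example in
any minimum-length cutoff it applies). The function names nxx and
fasta_lengths are hypothetical and exist only for this example.

```python
def fasta_lengths(lines):
    """Yield one sequence length per FASTA record from an iterable of lines."""
    length = None  # None until the first header line is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def nxx(lengths, fraction=0.5):
    """Return the Nxx length under the conventional definition.

    Sequences are sorted longest first and summed until the running total
    reaches `fraction` of all bases; the length reached at that point is
    returned. fraction=0.5 gives N50, 0.2 gives N20, 0.8 gives N80.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return ordered[-1]
```

To try it on an assembly, pass an open FASTA file handle (for example
run2014.contigs) to fasta_lengths, collect the result in a list, and call
nxx on that list with the desired fraction.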