From jkb at sanger.ac.uk Thu Dec 4 04:02:53 2008
From: jkb at sanger.ac.uk (James Bonfield)
Date: Thu Dec 4 04:03:01 2008
Subject: [Ssrformat] illumina2srf tool and io_lib-1.11.5
Message-ID: <20081204120252.GN17529@sanger.ac.uk>

Hello all,

I just discovered a serious bug in illumina2srf while improving
support for the qcal format (or rather, fixing a bug, so that qcal data
is now stored in binary phred-scale format rather than as a fastq string
with +64 added to the solexa-scaled values).

If you use *both* -qf and -qr, then the quality lines from the two fastq
files were pasted together to form a single quality string, but with an
errant newline character in the middle. Consequently the calibrated
confidence values for the reverse read were shifted by one base.

Uncalibrated data was not affected, and neither was qcal format data
(except that until now it still had +64 added to all values).

Using a single -qf file consisting of the concatenated forward and
reverse reads, as used locally at Sanger until a week or two ago,
worked just fine, which is part of the reason I didn't spot this error.

My apologies for the bug. To download the fixed copy see:

https://sourceforge.net/project/showfiles.php?group_id=100316&package_id=108243&release_id=644805

There are some other minor changes in the release too (see the release
notes), but the main qcal and fastq changes to illumina2srf are:

http://staden.cvs.sourceforge.net/viewvc/staden/staden/src/io_lib/progs/solexa2srf.c?r1=1.48&r2=1.45

James

-- 
James Bonfield (jkb@sanger.ac.uk) | Hora aderat briligi. Nunc et Slythia Tova
                                  | Plurima gyrabant gymbolitare vabo;
  A Staden Package developer:     | Et Borogovorum mimzebant undique formae,
https://sf.net/projects/staden/   | Momiferique omnes exgrabure Rathi.

-- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
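
For readers who want to see the failure mode described in the message above, here is a minimal Python sketch. It is not the illumina2srf code (which is C, in solexa2srf.c); the function names buggy_join, join_qualities, solexa_to_phred and decode_solexa_fastq are all hypothetical and exist only for illustration. It shows how leaving the line terminator on the forward quality line shifts every reverse-read value by one position, and how a fastq-style string with a +64 offset could be decoded. The Solexa-to-Phred step uses the commonly published formula; whether illumina2srf applies exactly this conversion is an assumption.

```python
import math


def buggy_join(fwd_line, rev_line):
    # Illustrates the bug: the forward line keeps its trailing newline,
    # so a stray "\n" ends up in the middle of the combined string.
    return fwd_line + rev_line.rstrip("\r\n")


def join_qualities(fwd_line, rev_line):
    # Strip the line terminator from BOTH lines before concatenating.
    return fwd_line.rstrip("\r\n") + rev_line.rstrip("\r\n")


def solexa_to_phred(sq):
    # Commonly published Solexa-to-Phred conversion (an assumption here,
    # not taken from the illumina2srf source).
    return round(10 * math.log10(10 ** (sq / 10) + 1))


def decode_solexa_fastq(qual, offset=64):
    # Each character encodes (solexa-scaled value + offset).
    return [solexa_to_phred(ord(c) - offset) for c in qual]


if __name__ == "__main__":
    fwd, rev = "hhhh\n", "@@@@\n"
    bad = buggy_join(fwd, rev)
    good = join_qualities(fwd, rev)
    # In the buggy string the reverse read starts one position late.
    print(repr(bad), len(bad))
    print(repr(good), len(good))
    print(decode_solexa_fastq(good))
```

With two 4-base reads, the buggy string is 9 characters long and the character at index 4 (where the reverse read should begin) is the newline, which is the one-base shift described above.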