JAGuaR
Junction Alignment to Genome for Repositioning RNA-seq Reads
Current release
No stable release available yet.
Project Description
JAGuaR is a Python package for alignment of whole transcriptome shotgun sequencing (RNA-seq) reads. It includes transcriptome reference sequences suitable for RNA sequencing, built from Ensembl annotation or from annotation in various other formats. Reads ranging from 25-mers to 100-mers or longer can be aligned accurately to exons and split over as many exons as needed. The alignments are then repositioned to genome coordinates.

This is a novel approach compared with the traditional method of aligning to a genome reference concatenated with exon-exon junction sequences. As read lengths get longer, it becomes less feasible to align to a reference built from a single exon-exon junction region. If a read is longer than an exon, it is not possible to create a unique exon-exon junction sequence for it. Typically, reads that align to such a location must be clipped, which makes coverage calculations less accurate and may leave fewer reads to support SNV and indel calls. There are approximately 45,000 exons that are smaller than 75 bp. In all of these situations, JAGuaR provides a reference built on at least 3 exons, allowing a read to completely cover one of these smaller exons and split into two other exons, one on each end.

SNP results from alignments produced with this method have been compared with those from other methods, and their concordance with dbSNP matches or exceeds that of the other methods. This indicates the effectiveness of the approach. The reference and format required for TCGA are currently being implemented. JAGuaR meets the standards, guidelines and best practices for RNA-Seq as set by the ENCODE Consortium (V1.0 – June 2011).
See Poster for more details.
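
The repositioning step described above can be illustrated with a minimal sketch. This is not JAGuaR's own code: the function name reposition, its arguments and the coordinate conventions are hypothetical and chosen only for illustration. It assumes a forward-strand transcript, exons given as 0-based half-open genome intervals, and an ungapped read, and it reports skipped introns with the CIGAR N operation defined in the SAM format.

```python
def reposition(exons, t_start, read_len):
    """Map an ungapped read from transcript to genome coordinates.

    Hypothetical helper for illustration only (not part of JAGuaR).
    exons:    list of (start, end) genome intervals, 0-based half-open,
              sorted ascending, forward strand.
    t_start:  0-based start of the read within the spliced transcript.
    read_len: number of aligned bases.
    Returns (genome_start, cigar); introns appear as N operations.
    """
    if read_len <= 0 or t_start < 0:
        raise ValueError("read_len must be positive and t_start non-negative")

    genome_start = None
    cigar = []
    offset = 0            # transcript coordinate of the current exon's first base
    remaining = read_len  # read bases not yet placed on the genome
    prev_end = None       # genome end of the last exon the read touched

    for ex_start, ex_end in exons:
        ex_len = ex_end - ex_start
        if remaining > 0 and t_start < offset + ex_len:
            # Where the read starts (or continues) inside this exon.
            within = max(t_start - offset, 0)
            if genome_start is None:
                genome_start = ex_start + within
            else:
                # The read ran to the end of the previous exon: skip the intron.
                cigar.append(f"{ex_start - prev_end}N")
            take = min(remaining, ex_len - within)
            cigar.append(f"{take}M")
            remaining -= take
            prev_end = ex_end
        offset += ex_len

    if remaining > 0:
        raise ValueError("read extends past the end of the transcript")
    return genome_start, "".join(cigar)


# A 75 bp read that fully covers a 40 bp middle exon and splits into
# the two flanking exons, one on each end.
exons = [(100, 150), (200, 240), (300, 400)]
print(reposition(exons, 30, 75))  # (130, '20M50N40M60N15M')
```

In this example the 40 bp exon is shorter than the read, so a reference built from a single exon-exon junction could not hold the whole read; a transcript segment spanning three exons can, and the genome-space CIGAR records both introns.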
