Journal
Bioinformatics Advances
Authors
Rashedul Islam, Misha Bilenky, Andrew P Weng, Joseph M Connors, Martin Hirst

Motivation: B-cells display remarkable diversity in producing B-cell receptors through recombination of immunoglobulin V-D-J genes. Somatic hypermutation of immunoglobulin heavy chain variable (IGHV) genes is used as a prognostic marker in B-cell malignancies. Clinically, IGHV mutation status is determined by targeted Sanger sequencing, which is a resource-intensive and low-throughput procedure. Here we describe a bioinformatic pipeline, CRIS (Complete Reconstruction of Immunoglobulin IGHV-D-J Sequences), that uses RNA sequencing (RNA-seq) datasets to reconstruct IGHV-D-J sequences and determine IGHV somatic hypermutation status.

Results: CRIS extracts RNA-seq reads aligned to immunoglobulin (Ig) gene loci, performs assembly of Ig transcripts, and aligns the resulting contigs to reference Ig sequences to enumerate and classify somatic hypermutations in the IGHV gene sequence. By using de novo assembly, CRIS improves on existing tools that infer the B-cell receptor (BCR) repertoire from RNA-seq data using only a portion of the IGHV gene segment. We show that the somatic hypermutation status identified by CRIS using the entire IGHV gene segment is highly concordant with clinical classification in three independent chronic lymphocytic leukemia patient cohorts.
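
The final step described above, enumerating mismatches against a reference IGHV sequence and classifying mutation status, can be illustrated with a minimal sketch. This is not the CRIS implementation: the function names are hypothetical, the inputs are assumed to be already aligned to equal length with `-` marking gaps, and the 98% identity cutoff is the conventional clinical threshold in chronic lymphocytic leukemia, which the text above does not state.

```python
def percent_identity(contig: str, germline: str) -> float:
    """Percent identity over aligned positions, skipping gap columns.

    Assumes both strings come from a pairwise alignment, so they have
    equal length and use '-' for gaps (an assumption of this sketch).
    """
    if len(contig) != len(germline):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    matches = 0
    for c, g in zip(contig.upper(), germline.upper()):
        if c == "-" or g == "-":
            continue  # ignore indel columns; count substitutions only
        compared += 1
        if c == g:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions in the alignment")
    return 100.0 * matches / compared


def classify_ighv(identity: float, cutoff: float = 98.0) -> str:
    """Label a sequence by identity to germline.

    The 98% default is the conventional clinical cutoff, assumed here
    rather than taken from the abstract.
    """
    return "unmutated" if identity >= cutoff else "mutated"


if __name__ == "__main__":
    germline = "ACGT" * 25            # toy 100-base reference, not real IGHV
    contig = "CGT" + germline[3:]     # three substitutions at the start
    identity = percent_identity(contig, germline)
    print(identity, classify_ighv(identity))
```

Skipping gap columns is a simplification; a production pipeline would need an explicit policy for insertions and deletions.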
