ntCard

ntCard: a streaming algorithm for cardinality estimation in genomics data

Project Description

ntCard is a streaming algorithm for estimating the frequencies of k-mers in genomics datasets. At its core, ntCard uses the ntHash algorithm to efficiently compute hash values for streamed sequences. It then samples the calculated hash values to build a reduced-representation multiplicity table that describes the sample distribution. Finally, it uses a statistical model to reconstruct the population distribution from the sample distribution.
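The hash, sample, and reconstruct steps described above can be illustrated with a minimal Python sketch. It is not ntCard's implementation. It swaps in BLAKE2b from the standard library for ntHash, keeps an exact counter for the sampled hashes instead of ntCard's reduced-representation table, and replaces the statistical model with plain scaling by the sampling rate. All function names and the sample_bits parameter are hypothetical and chosen for illustration only.

```python
import hashlib
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def hash64(kmer):
    """64-bit hash of a k-mer. Assumption: BLAKE2b stands in for ntHash here."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sample_kmer_counts(sequences, k, sample_bits):
    """Stream over sequences and count only k-mers whose hash has
    sample_bits leading zero bits (about 1 in 2**sample_bits distinct k-mers)."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) - set("ACGT"):
                continue  # skip k-mers containing N or other ambiguous bases
            h = hash64(canonical(kmer))
            if h >> (64 - sample_bits) == 0:
                counts[h] += 1
    return counts


def estimate_histogram(counts, sample_bits):
    """Scale the sampled multiplicity histogram up to the population.
    Assumption: simple scaling replaces ntCard's statistical model."""
    scale = 1 << sample_bits
    sample_hist = Counter(counts.values())
    return {mult: n * scale for mult, n in sorted(sample_hist.items())}


if __name__ == "__main__":
    reads = ["ACGTACGT"]
    sampled = sample_kmer_counts(reads, k=4, sample_bits=0)
    print(estimate_histogram(sampled, sample_bits=0))
```

Because sampling is decided by the hash of the k-mer, every occurrence of a sampled k-mer is counted, so each sampled k-mer keeps its full multiplicity. With sample_bits set to 0 nothing is discarded and the histogram is exact. Larger values trade accuracy for memory.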

Visit our GitHub repository for the latest version.

 

Publications

  • Hamid Mohamadi, Hamza Khan, and Inanc Birol. ntCard: a streaming algorithm for cardinality estimation in genomics data. Bioinformatics (2017) 33 (9): 1324-1330. doi:10.1093/bioinformatics/btw832

  • Hamid Mohamadi, Justin Chu, Benjamin P Vandervalk, and Inanc Birol. ntHash: recursive nucleotide hashing. Bioinformatics (2016) 32 (22): 3492-3494. doi:10.1093/bioinformatics/btw397

 

Current Release
ntCard 1.0.1

Released Jan 29, 2018

Change license to MIT License. Fix bugs and improve ops.
More about this release…

Download file Get ntCard for all platforms
ntcard-1.0.1.tar.gz

All Releases

Version | Released | Description | Compatibility | Licenses | Status
1.0.1 | Jan 29, 2018 | Change License to MIT License. Fixing bugs and improving ops. More about this release… | | BSD | final
1.0.0 | Jan 11, 2017 | See ntCard GitHub page for details. More about this release… | | GPLv3 for non-commercial usage | final