ntCard

ntCard: a streaming algorithm for cardinality estimation in genomics data

Project Description

ntCard is a streaming algorithm for estimating the frequencies of k-mers in genomics datasets. At its core, ntCard uses the ntHash algorithm to efficiently compute hash values for the k-mers of streamed sequences. It then samples the calculated hash values to build a reduced-representation multiplicity table that describes the sample distribution. Finally, it uses a statistical model to reconstruct the population distribution from the sample distribution.
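The hash, sample, and tabulate steps described above can be illustrated with a minimal Python sketch. It is not ntCard's implementation. It assumes a general-purpose hash (BLAKE2b from the standard library) in place of ntHash, samples by keeping hashes whose low `sample_bits` bits are zero, and replaces the statistical model with a naive scale-up of the distinct count. All function and parameter names are hypothetical.

```python
import hashlib
from collections import Counter

VALID_BASES = set("ACGT")


def kmer_hash(kmer):
    """64-bit hash of a k-mer (stand-in for ntHash, which is not used here)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sample_multiplicities(sequences, k, sample_bits):
    """Stream over sequences and count only k-mers whose hash is sampled.

    A hash is sampled when its low `sample_bits` bits are all zero, so
    roughly 1 in 2**sample_bits distinct k-mers is kept (assumed scheme).
    Returns a table mapping sampled hash -> multiplicity.
    """
    mask = (1 << sample_bits) - 1
    table = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if not set(kmer) <= VALID_BASES:
                continue  # skip k-mers containing N or other symbols
            h = kmer_hash(kmer)
            if h & mask == 0:
                table[h] += 1
    return table


def sample_histogram(table):
    """Sample distribution: multiplicity -> number of sampled k-mers."""
    return Counter(table.values())


def estimate_distinct(table, sample_bits):
    """Naive population estimate: scale the sampled distinct count by the
    inverse sampling rate. ntCard's statistical model is more involved."""
    return len(table) * (1 << sample_bits)
```

With `sample_bits=0` every k-mer is kept, so the table and histogram are exact, which is a convenient way to check the sketch on small inputs. Larger values of `sample_bits` trade accuracy for a smaller table.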

Visit our GitHub repository for the latest version.

Publications

  • Hamid Mohamadi, Hamza Khan, and Inanc Birol. ntCard: a streaming algorithm for cardinality estimation in genomics data. Bioinformatics (2017). doi:10.1093/bioinformatics/btw832

  • Hamid Mohamadi, Justin Chu, Benjamin P Vandervalk, and Inanc Birol. ntHash: recursive nucleotide hashing. Bioinformatics (2016) 32 (22): 3492-3494. doi:10.1093/bioinformatics/btw397

Current Release
ntCard 1.0.0

Released Jan 11, 2017

See ntCard GitHub page for details.
More about this release…

Download ntCard for all platforms:
ntcard-1.0.0.tar.gz

All Releases

Version | Released | Description | Compatibility | Licenses | Status
1.0.0 | Jan 11, 2017 | See ntCard GitHub page for details. More about this release… | | GPLv3 for non-commercial usage | final