BARCODE – A fast Lossless Read Compression tool based on Bloom Filters

BARCODE

:: DESCRIPTION

BARCODE achieves highly efficient compression by using a reference genome, but completely circumvents the need for alignment, affording a great reduction in the time needed to compress.
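
As a rough illustration of how a Bloom filter can stand in for alignment against a reference, the sketch below stores a few reads in a filter and then queries every read-length window of the reference. This is an assumption about the general idea only, not BARCODE's code: the `BloomFilter` class, its sizes, and the toy sequences are all hypothetical, and false positives (which a real scheme has to resolve) are left untreated.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


# Hypothetical toy data: every read is an exact substring of the reference.
reference = "ACGTACGGTCAGTTGACCA"
reads = ["ACGG", "CAGT", "GACC"]
read_len = 4

bloom = BloomFilter()
for read in reads:
    bloom.add(read)

# "Decoding": slide a window over the reference and keep windows the filter accepts.
candidates = {
    reference[i:i + read_len]
    for i in range(len(reference) - read_len + 1)
    if reference[i:i + read_len] in bloom
}
print(sorted(candidates))
```

Every stored read is guaranteed to reappear among the candidates; any extra window that appears is a false positive that would need separate handling.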

:: DEVELOPER

Ron Shamir’s lab

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Windows / Linux / Mac OS X
  • Python

:: DOWNLOAD

  BARCODE

:: MORE INFORMATION

Citation

BMC Bioinformatics. 2014;15 Suppl 9:S7. doi: 10.1186/1471-2105-15-S9-S7. Epub 2014 Sep 10.
Fast lossless compression via cascading Bloom filters.
Rozov R, Shamir R, Halperin E.

HapZipper – Compression Scheme for HapMap Phase III Phased Data

HapZipper

:: DESCRIPTION

HapZipper is a lossless compression tool tailored to compress HapMap data beyond benchmarks defined by generic tools such as gzip, bzip2 and lzma.

:: DEVELOPER

Joel Bader lab

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Linux / Windows
  • JRE

:: DOWNLOAD

 HapZipper

:: MORE INFORMATION

Citation

HapZipper: sharing HapMap populations just got easier.
Chanda P, Elhaik E, Bader JS.
Nucleic Acids Res. 2012 Nov 1;40(20):e159. doi: 10.1093/nar/gks709.

MINCE v0.5.0 – Bucketing-based Reference-free Compression

MINCE v0.5.0

:: DESCRIPTION

MINCE is a technique for encoding collections of short reads so that they can be more effectively compressed via a standard compressor like LZIP.
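
To make the bucketing idea concrete, here is a minimal sketch that groups reads by a shared label before handing them to a general-purpose compressor. It is not MINCE's actual encoding: the label (the lexicographically smallest k-mer of each read), the function names, and the toy reads are assumptions, and Python's standard `lzma` module stands in for lzip.

```python
import lzma
from collections import defaultdict


def min_kmer(read, k=4):
    """Hypothetical bucket label: the smallest k-mer in the read."""
    return min(read[i:i + k] for i in range(len(read) - k + 1))


def bucket_reads(reads, k=4):
    """Reorder reads so that reads sharing a label sit next to each other."""
    buckets = defaultdict(list)
    for read in reads:
        buckets[min_kmer(read, k)].append(read)
    return [read for label in sorted(buckets) for read in buckets[label]]


# Toy reads (all at least k bases long).
reads = ["ACGTTGCA", "GGGTTTAA", "TTGCAACG", "GGGTTTAC", "ACGTTGCC"]

reordered = bucket_reads(reads)
packed = lzma.compress("\n".join(reordered).encode())
restored = lzma.decompress(packed).decode().split("\n")

print(reordered)
print(len(packed), "bytes after compression")
```

The read order changes, so the round trip is lossless only as a collection of reads; placing similar reads next to each other is what gives the downstream compressor more redundancy to exploit.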

:: DEVELOPER

Kingsford Group

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Linux / Mac OS

:: DOWNLOAD

MINCE

:: MORE INFORMATION

Citation

Bioinformatics. 2015 Sep 1;31(17):2770-7. doi: 10.1093/bioinformatics/btv248. Epub 2015 Apr 24.
Data-dependent bucketing improves reference-free compression of sequencing reads.
Patro R, Kingsford C.

Referee – Rapid, Separable Compression for Sequence Alignments

Referee

:: DESCRIPTION

Referee is a command-line tool that losslessly compresses sequence alignment files in SAM format.

:: DEVELOPER

Kingsford Group

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Linux / Mac OS

:: DOWNLOAD

Referee

:: MORE INFORMATION

Citation

Darya Filippova, Carl Kingsford (2015).
Rapid, separable compression enables fast analyses of sequence alignments.
Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics, pages 194-201.

Kpath 0.6.3 – Statistical Reference-based Compression for Short Reads

Kpath 0.6.3

:: DESCRIPTION

Kpath (PathEnc) is reference-based compression software for short-read data sets.

:: DEVELOPER

Kingsford Group

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Linux / Mac OS X
  • Go

:: DOWNLOAD

 Kpath

:: MORE INFORMATION

Citation

Reference-based compression of short-read sequences using path encoding.
Kingsford C, Patro R.
Bioinformatics. 2015 Feb 2. pii: btv071.

CODOC 0.0.2 – Analysis and Compression of Depth of Coverage Signals

CODOC 0.0.2

:: DESCRIPTION

CODOC is a compressed data format and API for coverage data stemming from sequencing experiments.

:: DEVELOPER

Niko Popitsch

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Linux / Windows / Mac OS X
  • Java

:: DOWNLOAD

 CODOC

:: MORE INFORMATION

Citation

Bioinformatics. 2014 May 28. pii: btu362. [Epub ahead of print]
CODOC: Efficient Access, Analysis and Compression of Depth of Coverage Signals.
Popitsch N.

HaMMLET – Fast Bayesian Hidden Markov Model with Wavelet Compression

HaMMLET

:: DESCRIPTION

HaMMLET is a fast Forward-Backward Gibbs sampler for Bayesian inference on Hidden Markov Models (HMM). It uses the Haar wavelet transform to dynamically compress the data based on the current variance sample in each iteration.
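
The Haar wavelet step can be sketched in a few lines. The example below is only an illustration of the transform and of dropping small detail coefficients, not HaMMLET's sampler: the function names are hypothetical, the threshold is a fixed number chosen by hand rather than derived from a sampled variance, and the input length is assumed to be a power of two.

```python
def haar_forward(signal):
    """Haar transform by repeated pairwise averages and half-differences."""
    coeffs = list(signal)
    n = len(coeffs)  # assumed to be a power of two
    while n > 1:
        half = n // 2
        avgs = [(coeffs[2 * i] + coeffs[2 * i + 1]) / 2 for i in range(half)]
        diffs = [(coeffs[2 * i] - coeffs[2 * i + 1]) / 2 for i in range(half)]
        coeffs[:n] = avgs + diffs
        n = half
    return coeffs


def haar_inverse(coeffs):
    """Invert haar_forward: rebuild each pair as (avg + diff, avg - diff)."""
    signal = list(coeffs)
    n = 1
    while n < len(signal):
        merged = []
        for avg, diff in zip(signal[:n], signal[n:2 * n]):
            merged += [avg + diff, avg - diff]
        signal[:2 * n] = merged
        n *= 2
    return signal


def threshold_details(coeffs, threshold):
    """Keep the overall average; zero out detail coefficients below threshold."""
    return [coeffs[0]] + [c if abs(c) >= threshold else 0.0 for c in coeffs[1:]]


clean = [2, 2, 2, 2, 8, 8, 8, 8]
noisy = [2.1, 1.9, 2.0, 2.0, 8.0, 8.2, 7.8, 8.0]

kept = threshold_details(haar_forward(noisy), threshold=0.5)
blocks = haar_inverse(kept)  # noisy signal approximated by two constant blocks

print(haar_forward(clean))
print(blocks)
```

With small detail coefficients removed, neighbouring observations collapse into constant blocks, which is the sense in which the data are compressed; a larger threshold yields fewer, longer blocks.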

:: DEVELOPER

Schliep lab

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Linux
  • GCC

:: DOWNLOAD

 HaMMLET

:: MORE INFORMATION

Citation

Fast Bayesian Inference of Copy Number Variants using Hidden Markov Models with Wavelet Compression.
Wiedenhoeft J, Brugel E, Schliep A.
PLoS Comput Biol. 2016 May 13;12(5):e1004871. doi: 10.1371/journal.pcbi.1004871.

MetaCRAM – Lossless Compression Tool for Metagenomic Reads

MetaCRAM

:: DESCRIPTION

MetaCRAM is a pipeline for taxonomy identification and lossless compression of FASTA-format metagenomic reads. It integrates, in a parallel manner, algorithms for taxonomy identification, read alignment, and assembly, followed by a reference-based compression method.

:: DEVELOPER

Minji Kim

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Windows / Linux / Mac OS X
  • Perl
  • Java

:: DOWNLOAD

  MetaCRAM

:: MORE INFORMATION

Citation

MetaCRAM: an integrated pipeline for metagenomic taxonomy identification and compression.
Kim M, Zhang X, Ligo JG, Farnoud F, Veeravalli VV, Milenkovic O.
BMC Bioinformatics. 2016 Feb 19;17(1):94. doi: 10.1186/s12859-016-0932-x

CAST 1.2.1 – Compression-accelerated BLAST and BLAT

CAST 1.2.1

:: DESCRIPTION

CAST is a set of tools that apply “compressive genomics”: they exploit redundancy in genomic data sets by compressing data in a way that allows direct computation on the compressed data. It includes two prototype implementations of alignment and sequence search algorithms, Compression-accelerated BLAST (CaBLAST) and Compression-accelerated BLAT (CaBLAT).

:: DEVELOPER

Berger Lab

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Linux / Mac OS X
  • C++ Compiler
  • NCBI C++ Toolkit
  • BLAST+
  • BLAT

:: DOWNLOAD

  CAST

:: MORE INFORMATION

Citation

Nat Biotechnol. 2012 Jul 10;30(7):627-30. doi: 10.1038/nbt.2241.
Compressive genomics.
Loh PR, Baym M, Berger B.

ERGC – Referential Genome Compression algorithm

ERGC

:: DESCRIPTION

ERGC (Efficient Referential Genome Compressor) is a genome compression tool. It compresses a target genome using a reference genome.
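
Compressing a target against a reference generally means describing the target as copies from the reference plus the few bases that differ. The sketch below shows that general idea with a greedy k-mer lookup; it is not ERGC's algorithm, and the function names, the operation tuples, and the toy sequences are assumptions made for illustration.

```python
def build_index(reference, k):
    """Map each k-mer to the first position where it occurs in the reference."""
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], i)
    return index


def encode(target, reference, k=4):
    """Greedy encoding as ("copy", pos, length) and ("lit", base) operations."""
    index = build_index(reference, k)
    ops, i = [], 0
    while i < len(target):
        pos = index.get(target[i:i + k])
        if pos is None:
            ops.append(("lit", target[i]))  # no seed match: emit one literal base
            i += 1
            continue
        length = k
        # Extend the seed match for as long as target and reference agree.
        while (i + length < len(target) and pos + length < len(reference)
               and target[i + length] == reference[pos + length]):
            length += 1
        ops.append(("copy", pos, length))
        i += length
    return ops


def decode(ops, reference):
    """Rebuild the target from the operations and the same reference."""
    parts = []
    for op in ops:
        if op[0] == "copy":
            _, pos, length = op
            parts.append(reference[pos:pos + length])
        else:
            parts.append(op[1])
    return "".join(parts)


reference = "ACGTACGTTAGCCGATAGGCTA"
target = "ACGTACGTTAGACGATAGGCTA"  # toy target: one substitution vs. the reference

ops = encode(target, reference)
print(ops)
print(decode(ops, reference) == target)
```

Because decoding needs the identical reference, both sides must hold the same reference sequence; the closer the target is to it, the fewer operations are required.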

:: DEVELOPER

Sanguthevar Rajasekaran

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Linux
  • Java

:: DOWNLOAD

 ERGC

:: MORE INFORMATION

Citation

ERGC: an efficient referential genome compression algorithm.
Saha S, Rajasekaran S.
Bioinformatics. 2015 Nov 1;31(21):3468-75. doi: 10.1093/bioinformatics/btv399