HaMMLET – Fast Bayesian Hidden Markov Model with Wavelet Compression

HaMMLET

:: DESCRIPTION

HaMMLET is a fast Forward-Backward Gibbs sampler for Bayesian inference on Hidden Markov Models (HMM). It uses the Haar wavelet transform to dynamically compress the data based on the current variance sample in each iteration.
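To illustrate the idea of wavelet-based compression mentioned above, here is a minimal sketch of an orthonormal Haar transform followed by a variance-dependent threshold. It is not HaMMLET's implementation: the function names are hypothetical, and the use of the universal threshold `sigma * sqrt(2 ln n)` is an assumption made for illustration.

```python
import math


def haar_transform(signal):
    """Orthonormal Haar transform of a sequence whose length is a power of two.

    Returns [overall approximation, coarsest detail, ..., finest details].
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx = list(signal)
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Prepend so that coarser levels end up before finer ones.
        details = [(a - b) / math.sqrt(2) for a, b in pairs] + details
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx + details


def significant_details(coeffs, sigma):
    """Indices of detail coefficients above an assumed universal threshold."""
    lam = sigma * math.sqrt(2 * math.log(len(coeffs)))
    return [i for i, c in enumerate(coeffs[1:], start=1) if abs(c) > lam]


coeffs = haar_transform([4, 4, 2, 2])
low_noise = significant_details(coeffs, sigma=0.5)   # the jump is kept
high_noise = significant_details(coeffs, sigma=2.0)  # the jump is smoothed away
```

A piecewise-constant signal yields detail coefficients of zero inside each constant block, so a larger sampled variance (larger `sigma`) lets more neighbouring positions be merged into one block.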

:: DEVELOPER

Schliep lab

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Linux
  • GCC

:: DOWNLOAD

 HaMMLET

:: MORE INFORMATION

Citation

Fast Bayesian Inference of Copy Number Variants using Hidden Markov Models with Wavelet Compression.
Wiedenhoeft J, Brugel E, Schliep A.
PLoS Comput Biol. 2016 May 13;12(5):e1004871. doi: 10.1371/journal.pcbi.1004871.

MetaCRAM – Lossless Compression Tool for Metagenomic Reads

MetaCRAM

:: DESCRIPTION

MetaCRAM is a pipeline for taxonomy identification and lossless compression of FASTA-format metagenomic reads. It integrates algorithms for taxonomy identification, read alignment, assembly, and finally a reference-based compression method, and runs them in parallel.

:: DEVELOPER

Minji Kim

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Windows/Linux/MacOsX
  • Perl
  • Java

:: DOWNLOAD

  MetaCRAM

:: MORE INFORMATION

Citation

MetaCRAM: an integrated pipeline for metagenomic taxonomy identification and compression.
Kim M, Zhang X, Ligo JG, Farnoud F, Veeravalli VV, Milenkovic O.
BMC Bioinformatics. 2016 Feb 19;17(1):94. doi: 10.1186/s12859-016-0932-x

CAST 1.2.1 – Compression-accelerated BLAST and BLAT

CAST 1.2.1

:: DESCRIPTION

CAST is a set of tools that apply “compressive genomics”: they exploit redundancy in genomic data sets by compressing data in a way that allows direct computation on the compressed data. It includes Compression-accelerated BLAST (CaBLAST) and Compression-accelerated BLAT (CaBLAT), two prototype implementations of alignment and sequence search algorithms.

:: DEVELOPER

Berger Lab

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Linux / MacOsX
  • C++ Compiler
  • NCBI C++ Toolkit
  • BLAST+
  • BLAT

:: DOWNLOAD

  CAST

:: MORE INFORMATION

Citation

Nat Biotechnol. 2012 Jul 10;30(7):627-30. doi: 10.1038/nbt.2241.
Compressive genomics.
Loh PR, Baym M, Berger B.

ERGC – Referential Genome Compression algorithm

ERGC

:: DESCRIPTION

ERGC (Efficient Referential Genome Compressor) is a genome compression tool. It compresses a target genome using a reference genome.
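The general idea of referential compression can be sketched as follows: the target is encoded as a list of copy operations pointing into the reference, plus literal text where no match is found. This is a toy greedy scheme written for illustration, not ERGC's actual algorithm; the function names, the k-mer index, and the default `k=4` are all assumptions.

```python
def build_index(reference, k):
    """Map each k-mer of the reference to its first position."""
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], i)
    return index


def compress(target, reference, k=4):
    """Encode target as ("copy", pos, length) and ("lit", text) operations."""
    index = build_index(reference, k)
    ops, literal, i = [], [], 0
    while i < len(target):
        pos = index.get(target[i:i + k])
        if pos is None:
            # No seed match here: emit one literal character and move on.
            literal.append(target[i])
            i += 1
            continue
        # Extend the k-mer seed match as far as both sequences agree.
        length = k
        while (i + length < len(target) and pos + length < len(reference)
               and target[i + length] == reference[pos + length]):
            length += 1
        if literal:
            ops.append(("lit", "".join(literal)))
            literal = []
        ops.append(("copy", pos, length))
        i += length
    if literal:
        ops.append(("lit", "".join(literal)))
    return ops


def decompress(ops, reference):
    """Rebuild the target from the operations and the same reference."""
    parts = []
    for op in ops:
        if op[0] == "copy":
            parts.append(reference[op[1]:op[1] + op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)


reference = "ACGTACGTTTGACCA"
target = "ACGTACGTTAGACCA"
ops = compress(target, reference)
restored = decompress(ops, reference)
```

The closer the target is to the reference, the fewer and longer the copy operations become, which is where the compression gain comes from. Decompression requires the same reference genome.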

:: DEVELOPER

Sanguthevar Rajasekaran

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Linux
  • Java

:: DOWNLOAD

 ERGC

:: MORE INFORMATION

Citation

ERGC: an efficient referential genome compression algorithm.
Saha S, Rajasekaran S.
Bioinformatics. 2015 Nov 1;31(21):3468-75. doi: 10.1093/bioinformatics/btv399

DSRC 2.0 RC2 – DNA Sequence Reads Compression

DSRC 2.0 RC2

:: DESCRIPTION

DSRC is an application for compressing files that contain DNA sequencing reads in FASTQ format. Such files can be huge, e.g., a few (or tens of) gigabytes, so there is a clear need for a robust data compression tool. Universal compression programs such as gzip or bzip2 are usually used for this purpose, but a tool specialized for FASTQ data can work better.

:: DEVELOPER

REFRESH Bioinformatics Group

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Linux / MacOSX / Windows
  • C++ Compiler

:: DOWNLOAD

 DSRC

:: MORE INFORMATION

Citation

Bioinformatics. 2014 Apr 18. pii: btu208.
DSRC 2-Industry-oriented compression of FASTQ files.
Roguski L, Deorowicz S.

S. Deorowicz and Sz. Grabowski,
Compression of DNA sequence reads in FASTQ format,
Bioinformatics (2011) 27(6):860–862.

Kolmogorov – Compression-based Classification of Biological Sequences and Structures

Kolmogorov

:: DESCRIPTION

Kolmogorov is a multistep approach to classifying and clustering biological sequences and structures via compression.
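Compression-based classification typically rests on a distance computed from compressed sizes. As a rough sketch, the Normalized Compression Distance (NCD), one common approximation of the Universal Similarity Metric, can be computed with a general-purpose compressor. The choice of `zlib` and the function names here are assumptions for illustration and are not taken from the Kolmogorov package.

```python
import random
import zlib


def csize(data: bytes) -> int:
    """Compressed size in bytes, used as a stand-in for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy, cxy = csize(x), csize(y), csize(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


rng = random.Random(0)
repetitive = b"ACGT" * 200
unrelated = bytes(rng.choice(b"ACGT") for _ in range(800))

d_same = ncd(repetitive, repetitive)  # small: the second copy adds little
d_diff = ncd(repetitive, unrelated)   # larger: the sequences share no structure
```

A matrix of such pairwise distances can then be passed to any standard clustering method.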

:: DEVELOPER

Raffaele Giancarlo

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Linux / MacOSX / Windows
  • Perl
  • BioPerl

:: DOWNLOAD

 Kolmogorov

:: MORE INFORMATION

Citation

BMC Bioinformatics. 2007 Jul 13;8:252.
Compression-based classification of biological sequences and structures via the Universal Similarity Metric: experimental assessment.
Ferragina P, Giancarlo R, Greco V, Manzini G, Valiente G.

HapZipper – Compression Scheme for HapMap Phase III Phased Data

HapZipper

:: DESCRIPTION

HapZipper is a lossless compression tool tailored to compress HapMap data beyond benchmarks defined by generic tools such as gzip, bzip2 and lzma.

:: DEVELOPER

Joel Bader lab

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Linux / Windows
  • JRE

:: DOWNLOAD

 HapZipper

:: MORE INFORMATION

Citation

HapZipper: sharing HapMap populations just got easier.
Chanda P, Elhaik E, Bader JS.
Nucleic Acids Res. 2012 Nov 1;40(20):e159. doi: 10.1093/nar/gks709.

oculus 0.1.2 – Faster Sequence Alignment by Compression

oculus 0.1.2

:: DESCRIPTION

Oculus is a bioinformatic algorithm designed to increase sequence alignment speed for redundant input. It acts as a wrapper around any existing alignment algorithm capable of producing SAM-formatted output.
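The wrapper idea can be sketched as: collapse redundant reads, call the underlying aligner once per distinct sequence, then copy each result back to every read that shares that sequence. The sketch below assumes that "redundant" means identical read sequences; the function names and the stand-in aligner are hypothetical and do not reflect Oculus's actual code or options.

```python
def collapse_reads(reads):
    """Group read names by sequence; reads is an iterable of (name, sequence)."""
    groups = {}
    for name, seq in reads:
        groups.setdefault(seq, []).append(name)
    return groups


def align_with_dedup(reads, align):
    """Run align(sequence) once per distinct sequence and expand the results."""
    results = {}
    for seq, names in collapse_reads(reads).items():
        hit = align(seq)
        for name in names:
            results[name] = hit
    return results


calls = []


def fake_align(seq):
    """Stand-in for a real aligner; records each call and returns a dummy position."""
    calls.append(seq)
    return len(seq)


reads = [("r1", "ACGT"), ("r2", "ACGT"), ("r3", "TTGACA"), ("r4", "ACGT")]
alignments = align_with_dedup(reads, fake_align)
```

With four reads but only two distinct sequences, the aligner is invoked twice, so the saving grows with the redundancy of the input.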

:: DEVELOPER

The Michigan Center for Translational Pathology

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Linux
  • Perl

:: DOWNLOAD

Oculus

:: MORE INFORMATION

Citation

BMC Bioinformatics. 2012 Nov 13;13:297. doi: 10.1186/1471-2105-13-297.
Oculus: faster sequence alignment by streaming read compression.
Veeneman BA, Iyer MK, Chinnaiyan AM.

DELIMINATE – Method for Loss-less Compression of Genomic Sequences

DELIMINATE

:: DESCRIPTION

DELIMINATE is a novel compression algorithm that can rapidly compress genomic sequence data in a loss-less fashion.

:: DEVELOPER

Bio-Sciences R&D Division, TCS Innovation Labs

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Linux / Windows / MacOsX
  • 7-zip compression package

:: DOWNLOAD

 DELIMINATE

:: MORE INFORMATION

Citation

DELIMINATE – A fast and efficient method for loss-less compression of genomic sequences
Mohammed, M.H., Dutta, A., Bose, T., Chadaram, S., and Mande, S.S.
Bioinformatics. 2012 Oct 1;28(19):2527-9

FQC – Efficient Compression, Archival and Dissemination of Fastq Datasets

FQC

:: DESCRIPTION

FQC is a novel FASTQ compression method that provides significantly higher compression gains than GZIP and other specialised FASTQ compressors, and incorporates the features necessary for universal adoption by data repositories and end users.

:: DEVELOPER

Bio-Sciences R&D Division, TCS Innovation Labs

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Linux / Windows / MacOsX
  • 7ZIP

:: DOWNLOAD

 FQC

:: MORE INFORMATION

Citation

FQC: A novel approach for efficient compression, archival, and dissemination of fastq datasets.
Dutta A, Haque MM, Bose T, Reddy CV, Mande SS.
J Bioinform Comput Biol. 2015 Jun;13(3):1541003. doi: 10.1142/S0219720015410036.