Paraclu 9 – Find Clusters in Data attached to Sequences

Paraclu 9

:: DESCRIPTION

Paraclu finds clusters in data attached to sequences. It was first applied to transcription start counts in genome sequences, but it can be applied to other kinds of data as well. Paraclu is intended for exploring the data while imposing minimal prior assumptions, letting the data speak for itself.
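
As a rough illustration of what finding clusters in per-position counts can look like, the sketch below scores a segment as its total count minus a density penalty d times its length, and returns the best-scoring segment. The scoring rule, the function name best_segment, and the sample counts are assumptions made for this example. They are not taken from this listing, and this is not the Paraclu implementation.

```python
from typing import List, Tuple


def best_segment(counts: List[float], d: float) -> Tuple[int, int, float]:
    """Return (start, end, score) of the contiguous segment that maximizes
    sum(counts[start:end + 1]) - d * (end - start + 1).

    Assumed scoring rule for illustration only. Indices are 0-based and
    inclusive.
    """
    if not counts:
        raise ValueError("counts must not be empty")

    best = (0, 0, counts[0] - d)
    cur_start = 0
    cur_score = 0.0
    for i, c in enumerate(counts):
        gain = c - d  # contribution of position i after the density penalty
        if cur_score <= 0:
            # A non-positive running score cannot help: start a new segment.
            cur_start, cur_score = i, gain
        else:
            cur_score += gain
        if cur_score > best[2]:
            best = (cur_start, i, cur_score)
    return best


# Hypothetical transcription start counts along a short stretch of sequence.
counts = [0, 5, 7, 0, 0, 1, 0, 6]
print(best_segment(counts, 2))    # a high penalty favors a small, dense segment
print(best_segment(counts, 0.5))  # a low penalty favors a larger, sparser one
```

Varying d in this way shows how the same data can yield small dense clusters or broad sparse ones, which is the kind of exploration the description refers to.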

:: DEVELOPER

Computational Biology Research Center (CBRC)

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Linux
  • C++ compiler

:: DOWNLOAD

 Paraclu

:: MORE INFORMATION

Citation

MC Frith, E Valen, A Krogh, Y Hayashizaki, P Carninci, A Sandelin,
A code for transcription initiation in mammalian genomes,
Genome Research 2008 18(1):1-12.

CD-HIT 4.8.1 – Cluster Large Protein database at High Sequence Identity Threshold

CD-HIT 4.8.1

:: DESCRIPTION

CD-HIT is a very widely used program for clustering and comparing protein or nucleotide sequences. CD-HIT is very fast and can handle extremely large databases. It helps to significantly reduce the computational and manual effort in many sequence analysis tasks, and aids in understanding the data structure and correcting the bias within a dataset.
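
The idea of clustering at a sequence identity threshold can be sketched as a greedy procedure: sequences are processed from longest to shortest, and each one either joins the first cluster whose representative it matches at or above the threshold, or becomes the representative of a new cluster. The code below is a minimal sketch under stated assumptions, not the CD-HIT implementation. The identity function is a toy position-by-position comparison without alignment, and the names greedy_cluster and identity and the sample sequences are hypothetical.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Toy identity: fraction of matching positions over the shorter sequence.

    Assumption for illustration: positions are compared directly, with no
    alignment step.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n


def greedy_cluster(seqs: Dict[str, str], threshold: float) -> Dict[str, List[str]]:
    """Map each representative name to the names of its cluster members."""
    clusters: Dict[str, List[str]] = {}
    # Longest sequences first; ties are broken by name so the result is stable.
    for name in sorted(seqs, key=lambda k: (-len(seqs[k]), k)):
        for rep in clusters:
            if identity(seqs[rep], seqs[name]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters[name] = [name]
    return clusters


# Hypothetical protein fragments.
seqs = {
    "a": "MKTAYIAKQR",
    "b": "MKTAYIAKQ",
    "c": "GGGGGGGG",
    "d": "MKTAYLAKQR",
}
print(greedy_cluster(seqs, 0.85))
```

Keeping only the representatives of each cluster is what reduces a redundant database to a smaller one.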

CD-HIT Online Version

:: DEVELOPER

Group of Weizhong Li, Godzik Lab

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Linux

:: DOWNLOAD

 CD-HIT

:: MORE INFORMATION

Citation:

Ying Huang, Beifang Niu, Ying Gao, Limin Fu and Weizhong Li.
CD-HIT Suite: a web server for clustering and comparing biological sequences.
Bioinformatics, 2010, 26(5): 680-682

BlastR 2.2 – Searching and Clustering RNA

BlastR 2.2

:: DESCRIPTION

BlastR is a method for searching non-coding RNAs in databases. The strategy relies on the mutual information embedded in di-nucleotides. The authors report that this approach has better sensitivity and specificity than other software with comparable computational cost. The BlastR package is a Perl wrapper for BlastP and is part of the T-Coffee distribution.
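
One way to picture how di-nucleotide information can be handed to a protein search tool such as BlastP is to re-encode each pair of adjacent nucleotides as a single letter from a 16-letter alphabet. The sketch below does this with overlapping pairs. The letter assignment, the use of overlapping pairs, and the function name encode_dinucleotides are assumptions for illustration and are not taken from the BlastR source.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
# Hypothetical alphabet: any 16 distinct amino-acid letters would do here.
LETTERS = "ACDEFGHIKLMNPQRS"

# Map each of the 16 possible di-nucleotides (AA, AC, ..., UU) to one letter.
DINUC_TO_LETTER = {
    a + b: LETTERS[i]
    for i, (a, b) in enumerate(product(NUCLEOTIDES, repeat=2))
}


def encode_dinucleotides(rna: str) -> str:
    """Encode an RNA sequence as a string of letters, one per overlapping
    di-nucleotide. DNA input is accepted by treating T as U.

    Raises KeyError for characters other than A, C, G, U (or T).
    """
    rna = rna.upper().replace("T", "U")
    return "".join(DINUC_TO_LETTER[rna[i:i + 2]] for i in range(len(rna) - 1))


print(encode_dinucleotides("ACGU"))
print(encode_dinucleotides("GGGAAACUCC"))
```

An encoded sequence of this kind has one letter fewer than the input and can be searched with a tool that expects protein-like alphabets, given a suitable substitution matrix.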

:: DEVELOPER

Notredame’s Lab

:: SCREENSHOTS

N/A

:: REQUIREMENTS

:: DOWNLOAD

 BlastR

:: MORE INFORMATION

Citation

Bussotti G, Raineri E, Erb I, Zytnicki M, Wilm A, Beaudoing E, Bucher P, Notredame C.
BlastR—fast and accurate database searches for non-coding RNAs
Nucleic Acids Res. 2011 Sep 1;39(16):6886-95.

pong – Fast Analysis and Visualization of Latent Clusters in Population Genetic data

pong

:: DESCRIPTION

pong is a freely available software package for post-processing output from clustering inference using population genetic data.

:: DEVELOPER

the Ramachandran Lab

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Linux / MacOsX / Windows
  • Python

:: DOWNLOAD

 pong

:: MORE INFORMATION

Citation

pong: fast analysis and visualization of latent clusters in population genetic data.
Behr AA, Liu KZ, Liu-Fang G, Nakka P, Ramachandran S.
Bioinformatics. 2016 Jun 9. pii: btw327.

DistMap 1.0 – A Toolkit for Distributed Short Read Mapping on a Hadoop Cluster

DistMap 1.0

:: DESCRIPTION

DistMap is a user-friendly pipeline designed to map short reads in a MapReduce framework on a local Hadoop cluster.

:: DEVELOPER

Institute of Population Genetics, University of Veterinary Medicine Vienna

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Linux / MacOsX
  • Java
  • Perl
  • Mapper executable
  • MergeSamFiles.jar and SortSam.jar from PICARD (http://picard.sourceforge.net)
  • A working Hadoop cluster

:: DOWNLOAD

 DistMap

:: MORE INFORMATION

Citation:

Pandey RV, Schlötterer C. (2013)
DistMap: a toolkit for distributed short read mapping on a Hadoop cluster.
PLoS One. 8(8):e72614.

SeqGrapheR 0.4.8.5 – Graph-based Visualization of Clusters of DNA Sequence Reads

SeqGrapheR 0.4.8.5

:: DESCRIPTION

The SeqGrapheR package provides an interactive GUI for visualizing DNA sequence clusters.

:: DEVELOPER

Laboratory of Molecular Cytogenetics, Institute of Plant Molecular Biology, Biology Centre ASCR

:: SCREENSHOTS

:: REQUIREMENTS

  • Linux / Windows / MacOsX
  • R package

:: DOWNLOAD

 SeqGrapheR

:: MORE INFORMATION

Citation

Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data
Petr Novák, Pavel Neumann and Jiří Macas
BMC Bioinformatics 2010, 11:378

PATTERN CLUSTERING 20060220 – Cluster a set of DNA patterns

PATTERN CLUSTERING 20060220

:: DESCRIPTION

Pattern clustering is a tool for clustering a set of DNA patterns into a smaller, more representative set of DNA patterns. Most pattern enumeration tools, such as POCO, report overlapping patterns. Therefore, after discovering a set of statistically significant nucleotide patterns, it is useful to cluster the overlapping patterns into representative patterns.

:: DEVELOPER

Liisa Holm’s Bioinformatics Group

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Linux

:: DOWNLOAD

  Pattern clustering

:: MORE INFORMATION

clusterMaker2 0.9.5 – Create and Visualize Cytoscape Clusters

clusterMaker2 0.9.5

:: DESCRIPTION

UCSF clusterMaker is a Cytoscape plugin that unifies different clustering techniques and displays into a single interface. Current clustering algorithms include hierarchical, k-medoid, AutoSOME, and k-means for clustering expression or genetic data; and MCL, transitivity clustering, affinity propagation, MCODE, community clustering (GLAY), SCPS, and AutoSOME for partitioning networks based on similarity or distance values. Hierarchical, k-medoid, AutoSOME, and k-means clusters may be displayed as hierarchical groups of nodes or as heat maps. All of the network partitioning cluster algorithms create collapsible “meta nodes” to allow interactive exploration of the putative family associations within the Cytoscape network, and results may also be shown as a separate network containing only the intra-cluster edges, or with inter-cluster edges added back.

:: DEVELOPER

the Resource for Biocomputing, Visualization, and Informatics (RBVI) at UCSF

:: SCREENSHOTS

clusterMaker

:: REQUIREMENTS

:: DOWNLOAD

 clusterMaker

:: MORE INFORMATION

Citation:

clusterMaker: a multi-algorithm clustering plugin for Cytoscape.
Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE.
BMC Bioinformatics. 2011 Nov 9;12:436. doi: 10.1186/1471-2105-12-436.

clusterGenomics 1.0 – Identifying Clusters in Genomics data by recursive Partitioning

clusterGenomics 1.0

:: DESCRIPTION

clusterGenomics is an R package for identifying clusters in genomics data by recursive partitioning.

:: DEVELOPER

Research Group for Biomedical Informatics (BMI)

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Linux / Windows
  • R

:: DOWNLOAD

 clusterGenomics

:: MORE INFORMATION

Citation:

Stat Appl Genet Mol Biol. 2013 Oct 1;12(5):637-52. doi: 10.1515/sagmb-2013-0016.
Identifying clusters in genomics data by recursive partitioning.
Nilsen G, Borgan O, Liestøl K, Lingjærde OC.

CASSIS / SMIPS 201511 – Prediction of Secondary Metabolite Gene Clusters in Eukaryotic Genomes

CASSIS / SMIPS 201511

:: DESCRIPTION

CASSIS (cluster assignment by islands of sites) is a tool to predict secondary metabolite (SM) gene clusters around a given anchor (or backbone) gene.

SMIPS (secondary metabolites by InterProScan) is a tool to predict secondary metabolite (SM) anchor genes, also called SM backbone genes, in protein sequences.

:: DEVELOPER

CASSIS team

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Windows / Linux
  • Perl

:: DOWNLOAD

 CASSIS / SMIPS

:: MORE INFORMATION

Citation

CASSIS and SMIPS: Promoter-based prediction of secondary metabolite gene clusters in eukaryotic genomes.
Wolf T, Shelest V, Nath N, Shelest E.
Bioinformatics. 2015 Dec 9. pii: btv713.