CD-HIT is a widely used program for clustering and comparing protein or nucleotide sequences. It is very fast and can handle extremely large databases. CD-HIT significantly reduces the computational and manual effort in many sequence analysis tasks, and it aids in understanding the structure of the data and correcting bias within a dataset.
:: MORE INFORMATION
Ying Huang, Beifang Niu, Ying Gao, Limin Fu and Weizhong Li.
CD-HIT Suite: a web server for clustering and comparing biological sequences.
Bioinformatics, 2010, 26: 680-682.