Minhashing-based cluster analysis of sequences

MinHash techniques allow the comparison of large datasets of genomic sequences, which is currently infeasible with alignment-based approaches. During minhashing, the k-mer content of each sequence is first determined, and each k-mer is then passed through a hash function to obtain a hash value. Retaining only the lowest hash values yields a pseudo-random sample of the k-mers, which is called a "sketch" or "MinHash signature". Using only these signatures, the similarity of the original sequences can be estimated rapidly and accurately.
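The steps described above can be illustrated with a minimal Python sketch. This is not the BIONUMERICS implementation: the function names, the choice of hash function (BLAKE2b truncated to 64 bits), and the default k-mer length and sketch size are all assumptions made for illustration. For simplicity the sketch also ignores reverse-complement (canonical) k-mers, which real tools usually take into account.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all k-mers (substrings of length k) in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (hash function chosen for illustration)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq, k=21, size=1000):
    """Build a signature: the `size` lowest distinct k-mer hashes, sorted."""
    hashes = {hash_kmer(km) for km in kmers(seq, k)}
    return sorted(hashes)[:size]


def jaccard_estimate(sketch_a, sketch_b, size=1000):
    """Estimate the Jaccard similarity of two sequences from their sketches.

    Takes the `size` lowest hashes of the merged sketches and counts the
    fraction of them that occur in both sketches.
    """
    set_a, set_b = set(sketch_a), set(sketch_b)
    merged = sorted(set_a | set_b)[:size]
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in set_a and h in set_b)
    return shared / len(merged)
```

Calling `jaccard_estimate(sketch(a), sketch(b))` on two sequences `a` and `b` returns a value between 0 and 1. Both sketches should be built with the same k-mer length and sketch size, since signatures are only comparable when they were sampled in the same way.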

Download demonstration database: 
WGS_demo_database_for_Listeria_monocytogenes

Demonstration database containing data for a set of 51 Listeria monocytogenes isolates. This database uses publicly available next-generation sequencing (NGS) reads from the Sequence Read Archive (SRA). For each isolate, the NGS reads were de novo assembled into genome sequences. wgMLST alleles were called using both the assembly-based and the assembly-free methods.

Note that the downloaded database backup file (.bnbk) can be restored via the Restore database... functionality in the BIONUMERICS startup screen.