Sequences

Nucleic acid or amino acid sequences: from import and assembly, through multiple alignment, to annotation, SNP analysis, primer design, and comparison of complete chromosomes.

MinHash-based cluster analysis of sequences

MinHash techniques allow the comparison of large sets of genomic sequences, a task that is currently infeasible with alignment-based approaches. During minhashing, the k-mer content of each sequence is first determined, and each k-mer is then passed through a hash function to obtain a hash value. Retaining only the lowest hash values amounts to sampling a random subset of the k-mers; this subset is called a “sketch” or “MinHash signature”. Using only these signatures, the similarity of the original sequences can be estimated rapidly and with good accuracy.
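
The steps described above can be illustrated with a minimal Python sketch. It is not the implementation used by any particular tool: the function names, the k-mer length (k=5), the sketch size (16) and the choice of BLAKE2b as hash function are all assumptions made for illustration. For simplicity it also ignores reverse-complement (canonical) k-mers, which tools working on DNA usually take into account.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all k-mers (substrings of length k) in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (BLAKE2b is an illustrative choice)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq, k=5, size=16):
    """Keep the `size` lowest hash values: the MinHash signature."""
    hashes = sorted({hash_kmer(km) for km in kmers(seq, k)})
    return hashes[:size]


def jaccard_estimate(sketch_a, sketch_b, size=16):
    """Estimate the Jaccard similarity of two k-mer sets from their sketches.

    Take the `size` lowest hashes of the merged sketches and count how many
    of them occur in both sketches.
    """
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


# Hypothetical example sequences
seq_a = "ACGTTGCAAGCTTAGGCTAACGTTAGCCATGCAATCGGATCCGTA"
seq_b = "ACGTTGCAAGCTTAGGCTAACGTTAGCCATGGAATCGGATCCGTA"  # one substitution
print(jaccard_estimate(sketch(seq_a), sketch(seq_b)))
```

Only the signatures are compared here, never the sequences themselves, so the cost of a comparison depends on the sketch size rather than on the sequence length. A larger sketch gives a more precise estimate at the price of more storage per sequence.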