Spaced seeds improve k-mer-based metagenomic classification


Břinda, K., Sykulski, M. & Kucherov, G., 2015. Spaced seeds improve k-mer-based metagenomic classification. Bioinformatics, 31(22), pp. 3584–3592.




MOTIVATION: Metagenomics is a powerful approach to studying the genetic content of environmental samples, and it has been strongly promoted by next-generation sequencing technologies. To cope with the massive volumes of data involved in modern metagenomic projects, recent tools rely on the analysis of k-mers shared between the read to be classified and sampled reference genomes.
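The shared-k-mer framework can be illustrated with a minimal Python sketch. It assumes exact lookup of distinct contiguous k-mers and a simple vote that assigns the read to the reference sharing the most k-mers with it; the function names (`kmers`, `build_index`, `classify`) are hypothetical, and this is not the implementation of any particular tool discussed in the paper.

```python
from collections import Counter
from typing import Dict, Optional, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of distinct contiguous k-mers of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map each k-mer to the names of the reference genomes that contain it."""
    index: Dict[str, Set[str]] = {}
    for name, genome in references.items():
        for kmer in kmers(genome, k):
            index.setdefault(kmer, set()).add(name)
    return index


def classify(read: str, index: Dict[str, Set[str]], k: int) -> Optional[str]:
    """Assign the read to the reference sharing the most k-mers with it,
    or return None if no k-mer of the read occurs in any reference."""
    hits: Counter = Counter()
    for kmer in kmers(read, k):
        for name in index.get(kmer, ()):
            hits[name] += 1
    if not hits:
        return None
    return hits.most_common(1)[0][0]
```

For example, with the toy references `{"A": "ACGTACGTGGCA", "B": "TTGACCATGGTA"}` and `k = 4`, the read `"ACGTACGT"` shares all four of its distinct 4-mers with `A` and none with `B`, so it is assigned to `A`. Real classifiers use far larger k, taxonomic trees and compact indexes; the sketch only shows the voting idea.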

RESULTS: Within this general framework, we show that spaced seeds significantly improve classification accuracy compared with traditional contiguous k-mers. We support this thesis through a series of computational experiments, including simulations of large-scale metagenomic projects.
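To illustrate what a spaced seed changes, the following hypothetical sketch extracts spaced k-mers using a seed written as a string of `1` (care positions) and `0` (don't-care positions); a contiguous k-mer is simply a seed of `k` ones. The helper names and the seed `110110` are illustrative assumptions, not the seeds or code evaluated in the study. It compares a contiguous seed `1111` and a spaced seed `110110`, both of weight 4, on a contrived pair of equal-length, ungapped sequences that differ at every third position.

```python
from typing import List


def spaced_kmer(seq: str, start: int, seed: str) -> str:
    """Keep the characters of the window seq[start:start+len(seed)]
    that fall on the seed's care ('1') positions."""
    return "".join(seq[start + i] for i, c in enumerate(seed) if c == "1")


def spaced_kmers(seq: str, seed: str) -> List[str]:
    """Spaced k-mers for every window of length len(seed), left to right.
    A seed of k ones yields ordinary contiguous k-mers."""
    return [spaced_kmer(seq, s, seed) for s in range(len(seq) - len(seed) + 1)]


def aligned_hits(a: str, b: str, seed: str) -> int:
    """Count window offsets where two equal-length sequences, aligned
    without gaps, agree at every care position of the seed."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(spaced_kmers(a, seed), spaced_kmers(b, seed)))
```

Here `aligned_hits("ACGACGACGACG", "ACTACTACTACT", "1111")` gives 0, because every contiguous 4-mer covers a mismatch, while `aligned_hits(..., "110110")` gives 3, because windows starting at offsets 0, 3 and 6 place the seed's don't-care positions exactly on the mismatches. The example is deliberately periodic; it shows why the hit patterns of spaced and contiguous seeds differ, not how large the accuracy gain is in practice.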

AVAILABILITY AND IMPLEMENTATION, SUPPLEMENTARY INFORMATION: Scripts and programs used in this study, as well as supplementary material, are available from


Last updated on 12/09/2018