- Romain Menegaux
- Jean-Philippe Vert
We propose a new model for fast classification of DNA sequences output by next-generation sequencing machines. The model, which we call fastDNA, embeds DNA sequences in a vector space by learning continuous low-dimensional representations of the k-mers they contain. We show on metagenomics benchmarks that it outperforms state-of-the-art methods in terms of accuracy and scalability.
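
To make the idea of embedding a sequence through its k-mers concrete, here is a minimal sketch in plain Python. It is not the fastDNA implementation. It assumes that a read is represented by the average of its k-mer vectors and that a linear layer scores each class; the abstract does not specify either choice. The names `kmers`, `init_embeddings`, `embed`, and `classify`, the values `K = 3` and `DIM = 4`, and the random (untrained) vectors are all hypothetical and chosen only for illustration.

```python
import random
from itertools import product

K = 3    # k-mer length (illustrative value, not taken from the abstract)
DIM = 4  # embedding dimension (illustrative value)


def kmers(seq, k=K):
    """Return the overlapping k-mers of seq, skipping any with non-ACGT characters."""
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return [w for w in windows if set(w) <= set("ACGT")]


def init_embeddings(k=K, dim=DIM, seed=0):
    """Build one random vector per possible k-mer (4**k entries).

    In a trained model these vectors would be learned; here they are random.
    """
    rng = random.Random(seed)
    return {
        "".join(p): [rng.uniform(-0.5, 0.5) for _ in range(dim)]
        for p in product("ACGT", repeat=k)
    }


def embed(seq, table, k=K):
    """Embed a sequence as the mean of its k-mer vectors (assumed pooling rule)."""
    dim = len(next(iter(table.values())))
    found = kmers(seq, k)
    if not found:
        return [0.0] * dim  # no valid k-mer: fall back to the zero vector
    total = [0.0] * dim
    for km in found:
        for j, value in enumerate(table[km]):
            total[j] += value
    return [t / len(found) for t in total]


def classify(seq, table, weights, k=K):
    """Score each class with a linear layer and return the best class index.

    weights holds one row of length DIM per class (hypothetical parameters).
    """
    x = embed(seq, table, k)
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)
```

Under these assumptions, `embed("ACGTAC", init_embeddings())` returns a vector of length `DIM`, and `classify` returns an integer class index. In a real classifier both the k-mer table and the class weights would be fitted jointly on labeled reads rather than drawn at random.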