Flash-KMeans achieves a 200× speedup over FAISS on GPUs by reorganizing data movement in Lloyd's k-means without altering the algorithm itself. It is IO-aware and designed for high-frequency calls inside AI training and inference loops. The implementation runs on NVIDIA H200 hardware.
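For readers unfamiliar with Lloyd's k-means, here is a minimal pure-Python sketch of its two steps (assign, then update). It is not the Flash-KMeans code, and it says nothing certain about how the GPU kernels are written. It only illustrates one plausible reading of "reorganizing data movement": the assignment step scans centroids in blocks and keeps a running minimum per point instead of storing a full point-by-centroid distance table, which leaves the result unchanged. All names (`assign_streaming`, `update`, `lloyd_kmeans`, `block`) are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_streaming(points, centroids, block=2):
    """Assignment step: nearest centroid per point.

    Centroids are scanned in blocks and only a running (distance, index)
    minimum is kept, so no N-by-K distance table is ever stored.
    """
    labels = []
    for p in points:
        best_d, best_k = float("inf"), -1
        for start in range(0, len(centroids), block):
            for k in range(start, min(start + block, len(centroids))):
                d = sq_dist(p, centroids[k])
                if d < best_d:
                    best_d, best_k = d, k
        labels.append(best_k)
    return labels


def update(points, labels, centroids):
    """Update step: each centroid becomes the mean of its assigned points."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p, k in zip(points, labels):
        counts[k] += 1
        for j, x in enumerate(p):
            sums[k][j] += x
    # An empty cluster keeps its previous centroid.
    return [
        [s / counts[k] for s in sums[k]] if counts[k] else list(centroids[k])
        for k in range(len(centroids))
    ]


def lloyd_kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's iterations: assign, update, stop when labels settle."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(iters):
        new_labels = assign_streaming(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
        centroids = update(points, labels, centroids)
    return centroids, labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    cents, labs = lloyd_kmeans(pts, k=2)
    print(cents, labs)
```

Changing `block` alters only the order in which centroids are visited, not the labels returned, which is the sense in which the algorithm is left unaltered in this sketch.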
Summary by ByteBrief