Superkmeans

Indexing large vector embedding collections remains a pain point when setting up vector search infrastructures. Indexes like HNSW or IVF can take hours or even days to construct. To tackle this issue, we have developed and open-sourced SuperKMeans, a super-fast clustering library for high-dimensional vector embeddings that drastically reduces indexing time from hours to mere seconds. In this blog post, I will show how SuperKMeans can index 10 million 1024-dimensional vector embeddings in just 1 minute on a single CPU. Additionally, I will explain the secret sauce behind SuperKMeans’s extremely fast clustering performance. ...