Posts

Indexing 10M vector embeddings in 1 minute on a single CPU

Indexing large collections of vector embeddings remains a pain point when setting up vector search infrastructure. Indexes like HNSW or IVF can take hours or even days to construct. To tackle this issue, we have developed and open-sourced SuperKMeans, a super-fast clustering library for high-dimensional vector embeddings that drastically reduces indexing time from hours to mere seconds. In this blog post, I will show how SuperKMeans can index 10 million 1024-dimensional vector embeddings in just 1 minute on a single CPU. I will also explain the secret sauce behind SuperKMeans’s extremely fast clustering performance. ...

Sub-millisecond similarity search on IVF indexes with PDX

In a previous blog post, I talked about PDX, a data layout that stores vectors transposed, in column-major order. This layout unleashes the true potential of dimension pruning algorithms for similarity search. Since then, we have noticed that PDX fell short in certain settings, such as when retrieving more than 10 neighbors or when targeting recalls below 0.95. In this blog post, I discuss how we addressed these issues and achieved sub-millisecond similarity search on millions of vectors using only vanilla IVF indexes on the PDX layout. This is remarkable, as vanilla IVF indexes are deemed “slow” by many vector database vendors. ...
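For readers unfamiliar with the term, here is a minimal sketch of what a "vanilla" IVF index does: each vector is assigned to the bucket of its nearest centroid, and a query scans only the `nprobe` buckets whose centroids are closest. This is a plain-Python illustration and not the PDX implementation. The function names (`build_ivf`, `search_ivf`) and the `nprobe` parameter name are assumptions made for this example, and the centroids are taken as given (in practice they would typically come from a clustering step such as k-means).

```python
import heapq


def sq_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_ivf(vectors, centroids):
    """Assign each vector id to the bucket of its nearest centroid.

    Hypothetical helper: the centroids are assumed to be precomputed.
    """
    buckets = [[] for _ in centroids]
    for vid, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_l2(vec, centroids[c]))
        buckets[nearest].append(vid)
    return buckets


def search_ivf(vectors, centroids, buckets, query, k, nprobe):
    """Return the ids of the k closest vectors found in the nprobe nearest buckets."""
    # Rank buckets by how close their centroid is to the query.
    order = sorted(range(len(centroids)), key=lambda c: sq_l2(query, centroids[c]))
    # Only vectors in the probed buckets are ever compared to the query.
    candidates = [vid for c in order[:nprobe] for vid in buckets[c]]
    return heapq.nsmallest(k, candidates, key=lambda vid: sq_l2(query, vectors[vid]))


vectors = [[0, 0], [0, 1], [10, 10], [10, 11], [5, 5]]
centroids = [[0, 0], [10, 10]]
buckets = build_ivf(vectors, centroids)

# Probing one bucket misses vector 4, the true nearest neighbor of [6, 6];
# probing both buckets finds it.
approx = search_ivf(vectors, centroids, buckets, [6, 6], k=2, nprobe=1)
exact = search_ivf(vectors, centroids, buckets, [6, 6], k=2, nprobe=2)
```

The two calls at the end show the usual trade-off: a smaller `nprobe` scans fewer vectors but can lower recall, which is why recall targets matter when comparing search speeds.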

Graviton3 vs Graviton4 in Vector Similarity Search

AWS Graviton 3 > Graviton 4 for Vector Similarity Search

If you are doing vector search with a vector library that supports SVE, you should use a Graviton3 machine. It is cheaper, and it will also deliver more raw performance. A few months ago, we started working on a vertical layout for vector similarity search (PDX). While running benchmarks on different microarchitectures and vector systems like FAISS, Milvus, and USearch, one observation puzzled us: Graviton3 performed better than Graviton4 in almost all vector search scenarios, not only in queries per dollar (QP$) but also in queries per second (QPS). This was the case across vector libraries and even in our own implementations of vector search algorithms. Here is one example of the QPS and QP$ of both microarchitectures on queries to an IVF index: ...

What if we store vector embeddings vertically?

By using a columnar layout for vectors, you can speed up vector similarity search thanks to more efficient distance kernels and more effective pruning of dimensions. This entry is a summary of our work, PDX: A Data Layout for Vector Similarity Search. A few months ago, we came across this 20-year-old paper proposing a vertical layout for vectors. That means not storing vectors one after the other but storing the same dimension of different vectors together (see image above). In databases, this is referred to as “columnar storage.” ...
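To make the idea concrete, here is a minimal plain-Python sketch of a vertical layout with a simple form of dimension pruning. It is an illustration under stated assumptions, not the PDX implementation: the helper names (`to_vertical`, `knn_vertical`) are hypothetical, the distance is squared Euclidean, and the pruning rule is the simplest exact one (drop a candidate once its partial distance exceeds a threshold taken from the first k vectors).

```python
import heapq


def to_vertical(vectors):
    """Transpose row-major vectors into one list per dimension (columnar)."""
    return [list(column) for column in zip(*vectors)]


def knn_vertical(columns, query, k):
    """Exact k-NN under squared L2, scanning one dimension at a time.

    Returns a list of (distance, vector_id) pairs, closest first.
    """
    n = len(columns[0])
    k = min(k, n)
    if k <= 0:
        return []
    partial = [0.0] * n

    # Seed: full distances of the first k vectors give a pruning threshold.
    for d, column in enumerate(columns):
        for i in range(k):
            diff = column[i] - query[d]
            partial[i] += diff * diff
    threshold = max(partial[:k])

    # Scan the remaining vectors dimension by dimension. Partial sums of
    # squares only grow, so a candidate above the threshold can be dropped
    # without looking at its remaining dimensions.
    alive = list(range(k, n))
    for d, column in enumerate(columns):
        survivors = []
        for i in alive:
            diff = column[i] - query[d]
            partial[i] += diff * diff
            if partial[i] <= threshold:
                survivors.append(i)
        alive = survivors

    candidates = list(range(k)) + alive
    return heapq.nsmallest(k, ((partial[i], i) for i in candidates))


vectors = [[0, 0], [1, 1], [5, 5], [0.5, 0], [10, 10]]
columns = to_vertical(vectors)
result = knn_vertical(columns, query=[0, 0], k=2)
```

In this toy run, vectors 2 and 4 are discarded after their first dimension alone. The inner loop also walks over values of a single dimension stored next to each other, which is the access pattern a columnar layout is designed for.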