Sub-millisecond similarity search on IVF indexes with PDX

In a previous blog post, I introduced PDX, a data layout that stores vectors in column-major (transposed) order. This layout unleashes the true potential of dimension pruning algorithms for similarity search. Since then, we noticed that PDX fell short in certain settings, such as retrieving more than 10 neighbors or targeting recalls below 0.95. In this blog post, I discuss how we addressed these issues and achieved sub-millisecond similarity search on millions of vectors using only vanilla IVF indexes on the PDX layout. This is remarkable, as many vector database vendors consider vanilla IVF indexes "slow". ...
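
To make "dimension pruning" concrete, here is a minimal sketch in plain Python. It is an assumption for illustration, not the pruning algorithm described in the post: it uses a simple exact rule for squared Euclidean distance. Because partial sums of squared differences can only grow as more dimensions are added, any candidate whose partial distance already exceeds the current k-th best full distance can be dropped without reading its remaining dimensions. The names `to_pdx`, `full_distance`, `pruned_knn`, and the `chunk` parameter are hypothetical.

```python
import heapq


def to_pdx(vectors):
    """Hypothetical helper: transpose row-major vectors so each dimension is one list."""
    dims = len(vectors[0])
    return [[v[d] for v in vectors] for d in range(dims)]


def full_distance(block, query, i):
    """Squared L2 distance of vector i, reading every dimension of the block."""
    return sum((block[d][i] - query[d]) ** 2 for d in range(len(block)))


def pruned_knn(block, query, k, chunk=2):
    """Exact k-NN (squared L2) over a PDX-style block with simple dimension pruning.

    Assumes k >= 1. Returns a list of (distance, index) pairs sorted by distance.
    """
    dims, n = len(block), len(block[0])
    # Max-heap (via negated distances) seeded with the first k vectors, fully scanned.
    heap = [(-full_distance(block, query, i), i) for i in range(min(k, n))]
    heapq.heapify(heap)
    threshold = -heap[0][0]  # k-th best squared distance seen so far
    alive = list(range(k, n))  # candidates that have not been pruned yet
    partial = {i: 0.0 for i in alive}
    # Visit dimensions in chunks; within a chunk, sweep one column over all live vectors.
    for start in range(0, dims, chunk):
        for d in range(start, min(start + chunk, dims)):
            column, q = block[d], query[d]
            for i in alive:
                diff = column[i] - q
                partial[i] += diff * diff
        # Partial sums never decrease, so anything above the threshold cannot enter the top-k.
        alive = [i for i in alive if partial[i] <= threshold]
        if not alive:
            break
    # Survivors have been scanned on every dimension; merge them into the heap.
    for i in alive:
        if partial[i] < -heap[0][0]:
            heapq.heapreplace(heap, (-partial[i], i))
    return sorted((-neg, i) for neg, i in heap)
```

For example, `pruned_knn(to_pdx([[0, 0], [1, 1], [5, 5], [0, 1], [9, 9]]), [0, 0], 2)` keeps vectors 0 and 3. The threshold here stays fixed during the scan for simplicity; tightening it as better candidates appear would prune more aggressively.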

July 24, 2025 · 7 min · 1316 words · Leonardo Kuffo
What if we store vector embeddings vertically?

By using a columnar layout for vectors, you can speed up vector similarity search thanks to more efficient distance kernels and more effective pruning of dimensions. This entry is a summary of our work, PDX: A Data Layout for Vector Similarity Search. A few months ago, we came across a 20-year-old paper proposing a vertical layout for vectors. That means storing the same dimension of different vectors together instead of storing vectors one after the other (see image above). In databases, this is referred to as "columnar storage." ...
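
The difference between the two layouts can be sketched in a few lines of Python. This is an illustrative assumption, not the PDX implementation: `to_pdx`, `distances_horizontal`, and `distances_pdx` are hypothetical names. The horizontal kernel finishes one vector before moving to the next, while the vertical kernel sweeps one dimension across all vectors and keeps a running distance per vector. Both return the same squared Euclidean distances; the speedup the post refers to comes from compiled code, where the vertical inner loop has no dependency between vectors and can be vectorized, which plain Python will not show.

```python
def to_pdx(vectors):
    """Store dimension d of every vector contiguously (column-major)."""
    return [list(column) for column in zip(*vectors)]


def distances_horizontal(vectors, query):
    # Row-major: compute the full distance of one vector before starting the next.
    return [sum((x - q) ** 2 for x, q in zip(v, query)) for v in vectors]


def distances_pdx(block, query):
    # Column-major: process one dimension for all vectors at a time,
    # accumulating a partial squared distance per vector.
    acc = [0.0] * len(block[0])
    for column, q in zip(block, query):
        for i, x in enumerate(column):
            diff = x - q
            acc[i] += diff * diff
    return acc


vectors = [[1, 2, 3], [4, 5, 6], [0, 0, 0]]
block = to_pdx(vectors)  # [[1, 4, 0], [2, 5, 0], [3, 6, 0]]
print(distances_horizontal(vectors, [1, 1, 1]))  # [5, 50, 3]
print(distances_pdx(block, [1, 1, 1]))           # [5.0, 50.0, 3.0]
```

Because the accumulators hold partial distances after every dimension, the vertical kernel is also the natural place to check them against a threshold and stop scanning vectors early.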

March 26, 2025 · 12 min · 2471 words · Leonardo Kuffo