RAG Solutions
IVF Indexing for Product Search: Scaling Beyond Brute-Force Vector Comparisons
Key Takeaway
“IVF (Inverted File) indexing transforms vector similarity search from O(N) to approximately O(sqrt(N)) by clustering embeddings and searching only relevant partitions, enabling sub-100ms product search queries at the cost of minor recall degradation.”
The Brute-Force Bottleneck in Vector Search
Implementing semantic search for e-commerce products in Postgres with the pgvector extension is straightforward: convert product descriptions into embeddings, store them in a vector-type column, and compute cosine distances against incoming query embeddings. This naive approach works initially but becomes a performance liability as the catalog scales.
Every search query triggers a full table scan, computing cosine similarity between the query vector and every single product embedding. For marketplaces with tens of thousands or millions of SKUs, this brute-force computation creates unacceptable query latency. The search operation degrades from milliseconds to seconds, making real-time user experiences impossible and directly impacting conversion rates.
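To make that baseline concrete, here is what the naive version looks like (a minimal sketch, assuming a products table with a pgvector embedding column as defined in the next section); with no index on the column, Postgres has no choice but to scan every row:
-- Naive nearest-neighbor search: no index, so every embedding is compared
SELECT id, name, 1 - (embedding <=> '[0.1, 0.2, ...]') AS similarity
FROM products
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 10;
-- EXPLAIN on this query shows a Seq Scan plus a Sort over all N rows,
-- which is why latency grows linearly with catalog size.
Note that the query itself is identical to the indexed version shown later; only the presence of the IVF index changes how Postgres executes it.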
How IVF Clustering Reduces Search Space
Inverted File Index (IVF) addresses this by partitioning the vector space into clusters before any queries arrive. Using K-means clustering, the indexing phase groups similar product embeddings together based on their vector proximity. Each cluster gets a centroid that represents its center point in the high-dimensional space.
-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Create products table with vector column
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name TEXT,
    description TEXT,
    embedding vector(1536)
);
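-- Note: populate the table before building the IVF index, since ivfflat
-- derives its cluster centroids from the rows present at creation time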
-- Create IVF index with 100 clusters
CREATE INDEX ON products
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- Set number of clusters to probe during search
SET ivfflat.probes = 10;
-- Perform similarity search
SELECT id, name, 1 - (embedding <=> '[0.1, 0.2, ...]') AS similarity
FROM products
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 10;
When a search query arrives, IVF first compares the query embedding against the cluster centroids (a much smaller set) to identify the nearest clusters. It then performs brute-force search only within those selected clusters, dramatically reducing the comparison space: instead of all N products, the query touches roughly nprobe × N/nlist embeddings plus the nlist centroid comparisons, where nlist is the cluster count and nprobe is the number of clusters probed.
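A quick way to confirm this behavior in practice (a sketch; the index name shown is the default Postgres generates for the unnamed index created above) is to check the query plan and verify that the ivfflat index is actually being used:
-- Inspect the plan: an Index Scan using products_embedding_idx means the
-- IVF index is probed; a Seq Scan means the planner fell back to exact
-- brute-force search
EXPLAIN ANALYZE
SELECT id, name
FROM products
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 10;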
Tuning the Recall-Performance Trade-Off
The critical parameters in IVF are the number of clusters (nlist, exposed as lists in pgvector) and the number of clusters probed during search (nprobe, exposed as ivfflat.probes). Setting nlist higher creates smaller, more homogeneous clusters but increases indexing time and centroid comparison overhead. Setting nprobe higher improves recall by searching more clusters but reduces the speed advantage.
In production, selecting nlist = sqrt(N), where N is the dataset size, provides a reasonable starting point. For a 50,000-product catalog, creating 200-250 clusters balances cluster granularity with centroid lookup speed. Setting nprobe to 8-12 clusters typically maintains 95%+ recall while delivering 12-18x speedup over brute-force search.
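As a sketch of what this tuning looks like in pgvector (assuming the products table from earlier, roughly 50,000 rows, and the default generated index name products_embedding_idx):
-- Rebuild the index with lists ≈ sqrt(50,000) ≈ 224
DROP INDEX IF EXISTS products_embedding_idx;
CREATE INDEX products_embedding_idx ON products
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 224);
-- Probe more clusters for the current session to trade speed for recall
SET ivfflat.probes = 10;
-- Or scope a higher setting to a single transaction for sensitive queries
BEGIN;
SET LOCAL ivfflat.probes = 20;
SELECT id, name
FROM products
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 10;
COMMIT;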
Applied Insight: When to Use IVF Indexing
Use IVF indexing when your product catalog exceeds 10,000 embeddings and search latency becomes user-facing. For smaller catalogs under 5,000 products, brute-force search remains fast enough and avoids index maintenance overhead. Monitor your recall metrics closely—if business requirements demand >98% recall for revenue-critical queries, consider more sophisticated indexing like HNSW (Hierarchical Navigable Small World) or product quantization techniques that better preserve search quality at scale.
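One way to monitor that recall in practice (a sketch, not a full evaluation harness): for a sample of representative query vectors, compare the index-backed top-K against the exact top-K obtained by disabling index scans, and track the overlap as recall@K.
-- Approximate top 10 via the IVF index
SET ivfflat.probes = 10;
SELECT id FROM products
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 10;
-- Exact top 10 for the same query: disabling index scans forces a
-- sequential scan, which returns the true nearest neighbors
BEGIN;
SET LOCAL enable_indexscan = off;
SELECT id FROM products
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 10;
COMMIT;
-- recall@10 for this query = size of the overlap between the two id sets / 10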
© 2025 BeautifulCode. All rights reserved.