Vector databases, first contact
A practitioner's first look at vector databases. They are a new architectural primitive for semantic search, but the real challenge isn't storage—it's the embedding model dependency.
The first time I saw a vector database demo, my reaction wasn't excitement. It was suspicion. For twenty years, I’d built systems on the bedrock of deterministic logic: query for an ID, get a row. This new tool felt fuzzy, imprecise. It promised to find "similar" things, but what does that mean to a machine?
My journey started there, not with a vendor pitch, but with a recurring architectural problem. Our systems were getting better at storing information but remained stubbornly literal, unable to grasp the intent behind a user's query. We needed a new primitive.

Beyond the Limits of Lexical Search
For decades, we solved search with SQL's LIKE operator or a dedicated engine like Apache Lucene. These tools are powerful, fast, and mature, indexing documents by breaking them down into keyword tokens. But they operate on a lexical level: the level of words, not meaning. A user searching for "ways to reduce car running costs" would get no results for a perfectly relevant article titled "Tips for Improving Automobile Fuel Efficiency."
To the engine, "car" and "automobile" are just different strings. You can layer on synonym lists and stemmers, but you’re always patching the symptom, not solving the problem. The system doesn't understand intent. This is the wall: as applications need to become smarter, the lexical approach becomes the wrong tool for the job.
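To make the wall concrete, here is a minimal sketch of token-overlap matching in Python. The tokenizer and stopword list are crude stand-ins I made up for illustration; a real analyzer does far more, but it fails on this query for the same reason.

```python
import re

# A tiny, made-up stopword list; real engines ship much longer ones.
STOPWORDS = frozenset({"a", "for", "of", "the", "to"})

def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def shared_terms(query: str, document: str) -> set[str]:
    """Return the non-stopword tokens that appear in both strings."""
    return (tokenize(query) & tokenize(document)) - STOPWORDS

query = "ways to reduce car running costs"
title = "Tips for Improving Automobile Fuel Efficiency"

# No token overlaps, so a purely lexical match finds nothing.
print(shared_terms(query, title))  # set()
```

The two strings describe the same need and share zero tokens. Nothing in the matching step knows that "car" and "automobile" are related.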

The Embedding is the Architecture
The breakthrough for me was realizing the vector database isn't the magic. The magic is the embedding. An embedding is a vector—a long list of numbers—that represents content in a high-dimensional space. The model that generates this vector is trained to place items with similar meanings close to each other.
Suddenly, "car" and "automobile" are no longer just strings. An embedding model maps them to two points that are, mathematically, very close together. The query is no longer "find documents with this word," but "find the 10 points in this space that are closest to the point representing my query." The database, then, is a highly specialized tool for one job: storing these vectors and finding their nearest neighbors at speed. It's an index for meaning.
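Here is a small sketch of that "closest points" query, using cosine similarity as the distance measure. The three-dimensional vectors are hand-made for illustration and are not the output of any real embedding model, which would typically produce hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query_vec, items, k):
    """Return the labels of the k stored vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), label)
              for label, vec in items.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [label for _, label in scored[:k]]

# Toy vectors, invented for this example only.
toy_embeddings = {
    "car":        [0.90, 0.10, 0.05],
    "automobile": [0.88, 0.12, 0.07],
    "banana":     [0.05, 0.95, 0.10],
    "invoice":    [0.10, 0.05, 0.95],
}

# A made-up vector standing in for an embedded query about vehicles.
query_vec = [0.80, 0.20, 0.10]
print(nearest(query_vec, toy_embeddings, k=2))
```

Both vehicle words come back ahead of "banana" and "invoice", even though the query shares no characters with either of them. That is the whole trick, and the database's job is to do this over millions of points instead of four.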

Database, Library, or Extension?
A dedicated vector database isn't the only way to solve this. For many projects, especially early on, a full database is overkill. A library like Meta AI's Faiss can be embedded directly in your application to build an index in-memory. This is simple, fast, and avoids adding another piece of infrastructure.
The trade-off is coupling. As your system grows, you might need to decouple the search index from the application service for independent scaling. That's when a dedicated database starts to make sense. A third option is the hybrid approach: using vector capabilities now built into systems you already run. Projects like pgvector for PostgreSQL or the native k-NN features in OpenSearch allow you to add similarity search to your existing stack. The choice depends on your scale, your team's operational load, and whether the specialized performance of a dedicated system justifies its overhead.
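One way to keep that choice reversible is to hide the index behind a small interface from day one. The sketch below is my own illustration, not any library's API: `VectorIndex`, `InProcessIndex`, and `related_articles` are hypothetical names, and the brute-force search stands in for whatever a real library or service would do.

```python
from typing import Protocol, Sequence

class VectorIndex(Protocol):
    """The only surface the application is allowed to depend on."""
    def add(self, item_id: str, vector: Sequence[float]) -> None: ...
    def search(self, query: Sequence[float], k: int) -> list[str]: ...

class InProcessIndex:
    """Brute-force index held in application memory (the 'library' option)."""
    def __init__(self) -> None:
        self._vectors: dict[str, list[float]] = {}

    def add(self, item_id: str, vector: Sequence[float]) -> None:
        self._vectors[item_id] = list(vector)

    def search(self, query: Sequence[float], k: int) -> list[str]:
        def squared_distance(item_id: str) -> float:
            stored = self._vectors[item_id]
            return sum((q - s) ** 2 for q, s in zip(query, stored))
        # Closest first, by squared Euclidean distance.
        return sorted(self._vectors, key=squared_distance)[:k]

def related_articles(index: VectorIndex, query_vector: Sequence[float],
                     k: int = 3) -> list[str]:
    """Application code: knows the interface, not the backend."""
    return index.search(query_vector, k)

index = InProcessIndex()
index.add("a", [0.0, 0.0])
index.add("b", [1.0, 1.0])
index.add("c", [5.0, 5.0])
print(related_articles(index, [0.9, 0.9], k=2))  # ['b', 'a']
```

If the index later moves to an extension or a dedicated database, a second class implementing the same two methods can wrap the network client, and `related_articles` does not change.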

Trading Certainty for Relevance
A traditional database gives an exact answer. A vector search gives a ranked list of candidates. This is because finding the exact nearest neighbors means comparing the query against every stored vector, and at millions of vectors that linear scan is too slow for the low-latency queries most applications need. Instead, these systems use Approximate Nearest Neighbor (ANN) algorithms.
An algorithm like HNSW, detailed in the paper "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs", builds a layered graph over the vectors and walks it greedily toward the query, so a search touches only a small fraction of the data. As the name implies, the result is an approximation: you sacrifice a small amount of recall for a massive gain in speed. For a deep dive into how these algorithms stack up, Erik Bernhardsson's open-source ANN Benchmarks site is the canonical resource. For most AI work, like providing context for Retrieval-Augmented Generation (RAG), this trade-off is a fantastic bargain. A "good enough" relevant document is infinitely more useful than "no results found."
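The recall-for-speed trade can be measured directly. The sketch below is not HNSW; to stay short it swaps in a much simpler ANN technique, random-hyperplane hashing, and compares it with an exact scan over random unit vectors. The sizes and the number of hyperplanes are arbitrary values I picked for illustration.

```python
import math
import random

def normalize(vec):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def exact_top_k(query, vectors, k):
    """Exact search: score every stored vector, O(n) work per query."""
    order = sorted(range(len(vectors)),
                   key=lambda i: dot(query, vectors[i]), reverse=True)
    return order[:k]

class HyperplaneLSH:
    """Bucket vectors by which side of each random hyperplane they fall on."""
    def __init__(self, dim, n_planes, rng):
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)]
                       for _ in range(n_planes)]
        self.buckets = {}

    def _key(self, vec):
        return tuple(dot(vec, plane) >= 0 for plane in self.planes)

    def add(self, idx, vec):
        self.buckets.setdefault(self._key(vec), []).append(idx)

    def candidates(self, query):
        """Only vectors in the query's own bucket are considered."""
        return self.buckets.get(self._key(query), [])

def approx_top_k(query, vectors, lsh, k):
    """Approximate search: rank only the candidates from one bucket."""
    cand = lsh.candidates(query)
    order = sorted(cand, key=lambda i: dot(query, vectors[i]), reverse=True)
    return order[:k], len(cand)

rng = random.Random(0)  # fixed seed so runs are repeatable
dim, n, k = 16, 2000, 10
vectors = [normalize([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(n)]

lsh = HyperplaneLSH(dim, n_planes=4, rng=rng)
for i, vec in enumerate(vectors):
    lsh.add(i, vec)

query = normalize([rng.gauss(0, 1) for _ in range(dim)])
exact = exact_top_k(query, vectors, k)
approx, scanned = approx_top_k(query, vectors, lsh, k)
recall = len(set(exact) & set(approx)) / k
print(f"scanned {scanned} of {n} vectors, recall@{k} = {recall:.2f}")
```

The approximate search scores only one bucket's worth of vectors, and some of the true top ten usually live in other buckets. Production algorithms like HNSW get much better recall for the work they do, but the shape of the bargain is the same: fewer comparisons, a few missed neighbors.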

The Dependency the Demos Don't Show
A proof-of-concept is easy. Pick a model, embed some documents, and run a query. It works like magic. Production is harder. Your single biggest architectural dependency is the embedding model itself. If you switch to a better model a year later, you can't just repoint your application. The vector space created by the new model is completely incompatible with the old one.
Every single item in your database must be re-embedded and re-indexed, which makes a model upgrade a full-scale data migration. This re-indexing problem is the key operational challenge. It means versioning your models, planning for these migrations, and understanding that your vector store is a living system that co-evolves with the models that populate it. The demos never show you the pain of a full-cluster re-index at 3 a.m.
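In practice, versioning your models can start as one extra field stored next to every vector. The sketch below is a hypothetical shape for that idea, not a feature of any particular database: `StoredVector`, `check_compatible`, `reembed`, and the two fake embedders are all names I invented. It also assumes you kept the source text, or a pointer to it, because without that there is nothing to re-embed.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# An embedder is any function from text to a vector (assumed shape).
Embedder = Callable[[str], list[float]]

@dataclass(frozen=True)
class StoredVector:
    doc_id: str
    text: str                 # source text kept so the record can be re-embedded
    vector: tuple[float, ...]
    model_version: str        # which model produced this vector

class VersionMismatch(ValueError):
    """Raised when a query and a stored vector come from different models."""

def check_compatible(query_model_version: str, stored: StoredVector) -> None:
    """Refuse to compare vectors that live in different vector spaces."""
    if stored.model_version != query_model_version:
        raise VersionMismatch(
            f"{stored.doc_id} was embedded with {stored.model_version}, "
            f"query uses {query_model_version}"
        )

def reembed(records: Iterable[StoredVector], new_embed: Embedder,
            new_version: str) -> list[StoredVector]:
    """The migration: recompute every vector from its source text."""
    return [
        StoredVector(r.doc_id, r.text, tuple(new_embed(r.text)), new_version)
        for r in records
    ]

# Fake embedders standing in for two real models with different dimensions.
def embed_v1(text: str) -> list[float]:
    return [float(len(text))]

def embed_v2(text: str) -> list[float]:
    return [float(len(text)), float(text.count(" "))]

old = [StoredVector("doc-1", "hello world", tuple(embed_v1("hello world")), "v1")]
new = reembed(old, embed_v2, "v2")
print(new[0].model_version, new[0].vector)  # v2 (11.0, 1.0)
```

The version check turns a silent relevance bug into a loud error. The migration function is trivially short here, and that is the misleading part: at production scale, the same loop is the 3 a.m. job.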

A New Primitive, Not a Replacement
My initial suspicion has been replaced by a respect for this tool's proper place. A vector database doesn't replace PostgreSQL or a full-text search engine; it complements them. The modern data stack is about composing these specialized tools, letting the deterministic and the probabilistic work together.
- Start with the right question. Do you need to match exact, structured data (SQL), find specific keywords (full-text search), or understand user intent and meaning (vector search)?
- The embedding model is your biggest commitment. The database is just an index. The model defines the structure of that index. Choose it carefully and plan for its eventual replacement.
- Embrace the trade-off. You are giving up deterministic precision for semantic relevance. This is the right trade for AI features but the wrong one for financial ledgers.
- Consider starting small. A library like Faiss or an extension like pgvector may be all you need. Graduate to a dedicated database when your scaling or decoupling needs demand it.