~/dipjyoti

RAG Is Not Always Vector Search


Ask most engineers to sketch a RAG pipeline and you’ll get the same drawing: chunk the documents, embed them, drop them in a vector database, embed the query, return the nearest neighbours. Vector search has become so synonymous with RAG that the two terms get used interchangeably.

They shouldn’t be. RAG is not always vector search. Semantic similarity is one retrieval method among several — powerful, but reaching for it by default is how you end up with a pipeline that recalls plenty and retrieves the wrong thing. The job of RAG is to put relevant external knowledge in front of the model; vector search is just one way to find it, and often not the best one.

The “Typical” RAG Workflow (and its hidden assumptions)

Let’s start with the standard RAG paradigm that often leads to the vector-search-only misconception:

  1. Ingestion: Your external knowledge (documents, articles, data) is split into smaller chunks. These chunks are then converted into high-dimensional numerical representations called “embeddings” using an embedding model. These embeddings are stored in a vector database.
  2. Query: A user’s query is also converted into an embedding.
  3. Retrieval: The vector database is searched for document chunks whose embeddings are “similar” (closest in the vector space) to the query’s embedding.
  4. Generation: The retrieved chunks are passed as context to the LLM, which then generates a response grounded in this information.
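
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: `embed` is a hypothetical stand-in that counts words instead of calling a real embedding model (so it captures no semantics), and the in-memory `index` list stands in for a vector database. The sample chunks are invented.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, index: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Steps 2 and 3: embed the query, return the k nearest chunks."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# Step 1 (ingestion): chunk, embed, store. The chunks are illustrative.
chunks = [
    "The invoice is due within thirty days",
    "Vector databases store embeddings",
    "Refunds are processed within five days",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# Step 4 (generation): the retrieved chunk becomes context in the prompt.
question = "when is the invoice due"
context = retrieve(question, index, k=1)
prompt = f"Context:\n{context[0]}\n\nQuestion: {question}"
```

Swapping `embed` for a real embedding model and `index` for a vector store would change the quality of the matches but not the shape of the pipeline.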

This workflow is highly effective for finding conceptually similar content, even when exact keywords don’t match. It’s why vector search has gained such prominence.

Beyond the Embedding: When Vector Search Falls Short

However, relying solely on vector search has its limits, and the retrieval methods in the next section exist to address them.

The Broader Spectrum of RAG Retrieval Methods

RAG, at its core, is about augmenting LLMs with relevant external information. How that information is retrieved can vary widely. Here are several powerful alternatives and complements to pure vector search:

1. Hybrid Search (Vector + Keyword/Full-Text)

This is perhaps the most common and effective evolution. Hybrid search combines the strengths of both semantic (vector) search and lexical (keyword/full-text) search.
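
One common way to merge the two result lists is reciprocal rank fusion (RRF), which uses only each document's rank in each list, so keyword and vector scores never need to share a scale. The post does not prescribe a fusion method; RRF is shown here as one reasonable option, and the document IDs are hypothetical.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists; each hit contributes 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical outputs of a keyword search and a vector search.
keyword_hits = ["doc_sku", "doc_faq", "doc_blog"]
vector_hits = ["doc_guide", "doc_sku", "doc_faq"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that appear in both lists rise to the top, which is usually the behaviour you want from a hybrid retriever. The constant `k = 60` is a conventional default, not a tuned value.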

2. Knowledge Graphs

For highly structured or interconnected knowledge, knowledge graphs offer a powerful alternative to flat document chunks.
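
As a rough sketch of what graph-based retrieval looks like, the snippet below stores facts as subject-relation-object triples and collects everything reachable from an entity within a fixed number of hops. The entities and relations are invented for illustration, and a real system would more likely use a graph database than a Python dictionary.

```python
from collections import defaultdict

# Hypothetical facts as (subject, relation, object) triples.
triples = [
    ("Acme", "acquired", "Widgets Inc"),
    ("Widgets Inc", "headquartered_in", "Berlin"),
    ("Acme", "founded_by", "J. Doe"),
]

graph: dict[str, list[tuple[str, str]]] = defaultdict(list)
for subj, rel, obj in triples:
    graph[subj].append((rel, obj))

def neighbourhood(entity: str, hops: int = 2) -> list[str]:
    """Collect facts reachable from `entity` within `hops` edges (breadth-first)."""
    facts: list[str] = []
    frontier = [entity]
    seen = {entity}
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for rel, obj in graph.get(node, []):
                facts.append(f"{node} {rel} {obj}")
                if obj not in seen:
                    seen.add(obj)
                    next_frontier.append(obj)
        frontier = next_frontier
    return facts

# Context for a two-hop question such as
# "Where is the company that Acme acquired headquartered?"
context = neighbourhood("Acme", hops=2)
```

The returned facts can be passed to the LLM as context. The second hop is what lets the model connect Acme to Berlin, a link that no single flat chunk states directly.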

3. Rule-Based Retrieval and Metadata Filtering

Sometimes, simple rules or metadata can be the most efficient retrieval mechanism.
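
A minimal sketch of metadata filtering, assuming each document carries a department and a publication date. The `Doc` fields and sample records are hypothetical. No embeddings are involved at all.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Doc:
    text: str
    department: str
    published: date

# Illustrative records; real metadata would come from the indexing pipeline.
docs = [
    Doc("2023 travel policy", "hr", date(2023, 1, 10)),
    Doc("2024 travel policy", "hr", date(2024, 1, 8)),
    Doc("2024 security handbook", "it", date(2024, 3, 2)),
]

def filter_docs(docs: list[Doc], department: str, published_after: date) -> list[Doc]:
    """Keep only documents from one department published after a cut-off date."""
    return [
        d for d in docs
        if d.department == department and d.published > published_after
    ]

hits = filter_docs(docs, department="hr", published_after=date(2023, 12, 31))
```

For a question like “what is the current HR travel policy?”, a filter of this kind excludes the outdated 2023 policy outright, something a similarity score alone cannot guarantee. The same filter can also be applied before a vector search to narrow the candidate set.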

4. Agentic RAG / Multi-step Reasoning

This advanced approach involves breaking down complex queries into sub-queries and using different retrieval strategies for each step.
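
The sketch below shows the control flow only. In a real agentic system an LLM would usually do the decomposition and routing; here both are replaced by simple, hypothetical rules, and the two retrievers are stubs that just label the sub-query they receive.

```python
import re
from typing import Callable

def keyword_lookup(sub_query: str) -> str:
    """Stub for an exact-match retriever."""
    return f"[keyword] {sub_query}"

def vector_lookup(sub_query: str) -> str:
    """Stub for a semantic retriever."""
    return f"[vector] {sub_query}"

def decompose(query: str) -> list[str]:
    """Stand-in for an LLM planner: split the query on ' and ' or ' then '."""
    parts = re.split(r"\s+(?:and|then)\s+", query.strip().rstrip("?"))
    return [p for p in parts if p]

def route(sub_query: str) -> Callable[[str], str]:
    """Send identifier-like or quoted terms to keyword search, the rest to vectors."""
    if re.search(r'[A-Z]{2,}-\d+|"', sub_query):
        return keyword_lookup
    return vector_lookup

def run_plan(query: str) -> list[str]:
    """Retrieve for each sub-query with the strategy chosen for it."""
    return [route(sub_query)(sub_query) for sub_query in decompose(query)]

results = run_plan("What does ERR-4012 mean and how do I reset my password?")
```

The error code is routed to exact matching while the how-to question goes to semantic search, which is the point of the approach: each step gets the retrieval strategy that suits it.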

5. Summarization as Retrieval

Instead of retrieving entire documents or chunks, the “retrieval” step can involve generating a concise summary of relevant information.
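
To make the idea concrete, here is a deliberately simple extractive version: it keeps the sentences that share the most words with the query and returns them in their original order. This is an assumption-laden stand-in. In practice the summary would more likely be produced by an LLM, and the function name and sample text are hypothetical.

```python
import re

def summarize_for_query(document: str, query: str, max_sentences: int = 2) -> str:
    """Return up to `max_sentences` sentences that overlap most with the query."""
    query_terms = set(re.findall(r"\w+", query.lower()))
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    scored = [
        (len(query_terms & set(re.findall(r"\w+", s.lower()))), i, s)
        for i, s in enumerate(sentences)
    ]
    # Best overlap first; ties broken by position in the document.
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:max_sentences]
    # Restore document order and drop sentences with no overlap at all.
    kept = [s for score, _, s in sorted(top, key=lambda t: t[1]) if score > 0]
    return " ".join(kept)

document = (
    "The warranty covers parts for two years. "
    "Shipping is free over fifty dollars. "
    "Labour is covered by the warranty for one year."
)
summary = summarize_for_query(document, "what does the warranty cover")
```

The LLM then receives the two warranty sentences instead of the whole document, which keeps the context short and on topic.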

Illustrative Workflow: RAG with Hybrid Search and Re-ranking

Let’s visualize a more comprehensive RAG workflow that moves beyond just vector search:

graph TD
    A[User Query] --> B[Query Transformation & Routing]
    B --> C[Keyword Search BM25]
    B --> D[Semantic Search Vector Database]
    C --> E[Initial Document Candidates]
    D --> E
    E --> F[Re-ranking Cross-Encoder]
    F --> G[Top-K Relevant Chunks]
    G --> H[LLM Augmented with Context]
    H --> I[Generated Response]

    subgraph "Indexing Pipeline"
        J[Documents] --> K[Chunking & Metadata Extraction]
        K --> L[Embedding Generation]
        K --> M[Keyword Index Creation]
    end

    L -.-> D
    M -.-> C

Workflow Breakdown:

  1. User Query: The user asks a question.
  2. Query Transformation & Routing: An initial LLM or a rule-based system might refine the user’s query for better searchability or route it to specific retrieval modules based on its nature.
  3. Keyword Search: A full-text search engine such as Elasticsearch (which is built on Lucene) retrieves documents based on keyword matches, typically ranked with BM25.
  4. Semantic Search (Vector Database): Concurrently, the query is embedded, and a vector database performs a similarity search to find semantically related document chunks.
  5. Initial Document Candidates: Results from both keyword and semantic searches are combined, forming a broader set of potential candidates.
  6. Re-ranking: A more computationally intensive model (often a cross-encoder) takes each candidate chunk and the original query, and re-ranks them based on a deeper understanding of their relevance. This step is crucial for boosting precision.
  7. Top-K Relevant Chunks: The highest-ranked chunks are selected.
  8. LLM (Augmented with Context): These top-K chunks are then provided as context to the LLM.
  9. Generated Response: The LLM synthesizes the information and generates a grounded response.
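
Steps 3 to 7 can be sketched end to end in plain Python. Everything here is a simplified assumption: `bm25_scores` is a compact BM25 implementation rather than a search engine, `vector_scores` uses character-trigram overlap as a crude stand-in for embedding similarity, and `rerank_score` is a placeholder for a cross-encoder. The documents are invented.

```python
import math
from collections import Counter

def tokenize(text: str) -> list[str]:
    return text.lower().split()

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    avg_len = sum(len(t) for t in tokenized) / n
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores

def trigrams(text: str) -> set[str]:
    s = text.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}

def vector_scores(query: str, docs: list[str]) -> list[float]:
    """Toy similarity (trigram Jaccard) standing in for embedding search."""
    q = trigrams(query)
    return [len(q & trigrams(d)) / len(q | trigrams(d)) for d in docs]

def top_ids(scores: list[float], k: int) -> list[int]:
    """Indices of the k best-scoring documents, ignoring zero scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked[:k] if scores[i] > 0]

def rerank_score(query: str, chunk: str) -> float:
    """Placeholder for a cross-encoder: scores the query and chunk together."""
    q_tokens = tokenize(query)
    text = chunk.lower()
    return sum(tok in text for tok in q_tokens) / len(q_tokens)

def hybrid_retrieve(query: str, docs: list[str], per_source: int = 2, top_k: int = 1) -> list[str]:
    keyword_ids = top_ids(bm25_scores(query, docs), per_source)
    vector_ids = top_ids(vector_scores(query, docs), per_source)
    candidate_ids = list(dict.fromkeys(keyword_ids + vector_ids))  # union, de-duplicated
    reranked = sorted(candidate_ids, key=lambda i: rerank_score(query, docs[i]), reverse=True)
    return [docs[i] for i in reranked[:top_k]]

docs = [
    "Error code E42 means the printer is out of paper",
    "Resetting the printer restores factory settings",
    "Our office is closed on public holidays",
]
top_chunks = hybrid_retrieve("what does E42 mean", docs)
```

The re-ranker only ever sees the small candidate set, which is why an expensive model is affordable at that stage but not as the first-pass retriever.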

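Indexing Pipeline (Preparation Phase):

  1. Documents: The source material that needs to be searchable.
  2. Chunking & Metadata Extraction: Documents are split into chunks, and metadata is extracted alongside them.
  3. Embedding Generation: Each chunk is embedded, and the embeddings feed the vector database used for semantic search.
  4. Keyword Index Creation: The same chunks are indexed for full-text search, feeding the keyword (BM25) path.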

Conclusion

Vector search propelled RAG into the spotlight, but it is one tool among several. RAG’s real strength is that it can integrate different retrieval strategies. Combining hybrid search, knowledge graphs, rule-based retrieval, and agentic approaches, each where it fits, produces more robust and accurate RAG applications. So, the next time you design a RAG system, ask yourself: “Is vector search truly the only way to retrieve this information, or can I draw on a broader set of retrieval techniques?” The answer might surprise you.

