Hybrid Search

What It Is

Hybrid search combines two fundamentally different retrieval approaches — dense vector search (semantic similarity) and sparse keyword search (lexical matching like BM25) — then merges their results using a fusion algorithm. This captures both semantic meaning and exact keyword matches, consistently outperforming either method used alone.

The Problem It Solves

Dense retrieval (embeddings) captures meaning but misses exact terms. A query for “error code E-4012” will not match a document about that error code if the embedding space was not trained on that specific identifier. Sparse retrieval (BM25) captures exact terms but misses semantics. “How to fix authentication issues” will not match a document titled “Resolving login failures.”

Using either approach alone leaves a class of queries poorly served:

Dense only: Misses exact identifiers, product names, error codes, and specialized terminology.
Sparse only: Misses paraphrases, synonyms, and conceptual similarity.

Hybrid search addresses both weaknesses by retrieving candidates from both systems and fusing the results.

How It Works

flowchart TD
    A["Query"] --> B["Dense retrieval (embeddings)"]
    A --> C["Sparse retrieval (BM25)"]
    B --> D["Top-K dense results"]
    C --> E["Top-K sparse results"]
    D --> F["Reciprocal Rank Fusion (RRF)"]
    E --> F
    F --> G["Optional reranking"]
    G --> H["Merged top-K results"]

The query is sent to both retrieval systems in parallel.
Dense retrieval embeds the query and performs approximate nearest neighbor search against the document embeddings.
Sparse retrieval runs BM25 (or similar) against the keyword index.
Results from both systems are merged using a fusion algorithm. The most common approach is Reciprocal Rank Fusion (RRF), which combines rankings without requiring score normalization.
The fused top-K results are returned, optionally followed by a reranking step.

Reciprocal Rank Fusion formula:

$$\text{RRF}(d) = \sum_{r \in R} \frac{1}{k + \text{rank}_r(d)}$$

Where $R$ is the set of result lists, $\text{rank}_r(d)$ is the rank of document $d$ in list $r$, and $k$ is a constant (typically 60) that controls how much weight is given to top-ranked results.

When to Use It

Your document corpus contains both natural language content and structured identifiers (codes, product names, technical terms).
You need robust retrieval across diverse query types — from keyword lookups to conceptual questions.
You are building a RAG system and want the highest baseline retrieval quality before investing in more complex approaches.
Your evaluation shows that neither dense nor sparse retrieval alone achieves acceptable recall.

When not to Use It

Your queries and documents are pure natural language with no specialized terminology. Dense retrieval alone may be sufficient, and the added complexity of hybrid search provides marginal improvement.
Your corpus is extremely small (under 1,000 documents). At small scale, the difference between retrieval methods is negligible, and a simple embedding search is adequate.
You cannot operationally maintain two search indices. Hybrid search requires both a vector store and a keyword index, doubling indexing infrastructure.
Latency constraints are extremely tight. Running two searches in parallel and fusing results adds overhead compared to a single retrieval path.

Trade-offs

Infrastructure complexity — Two search systems to maintain, index, and keep in sync. Any document update must be reflected in both indices.
Tuning the fusion — The relative weight of dense vs. sparse results affects quality. The optimal ratio varies by domain and query distribution. RRF is a good default but may not be optimal.
Increased indexing cost — Documents must be both embedded (compute-intensive) and tokenized for BM25 (I/O-intensive). Indexing takes longer and uses more storage.
Diminishing returns with reranking — If you add a reranking step after hybrid search, the reranker often compensates for weaknesses in individual retrieval methods, reducing the marginal benefit of hybrid over dense-only + reranker.

Implementation Example

from dataclasses import dataclass
import math


@dataclass
class SearchResult:
    doc_id: str
    content: str
    score: float


def bm25_search(query: str, corpus: dict[str, str], k: int = 10) -> list[SearchResult]:
    query_terms = query.lower().split()
    scores: dict[str, float] = {}
    n = len(corpus)
    avg_dl = sum(len(doc.split()) for doc in corpus.values()) / max(n, 1)

    df: dict[str, int] = {}
    for doc in corpus.values():
        seen = set(doc.lower().split())
        for term in seen:
            df[term] = df.get(term, 0) + 1

    k1, b = 1.5, 0.75

    for doc_id, doc in corpus.items():
        doc_terms = doc.lower().split()
        dl = len(doc_terms)
        tf: dict[str, int] = {}
        for t in doc_terms:
            tf[t] = tf.get(t, 0) + 1

        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            term_df = df.get(term, 0)
            idf = math.log((n - term_df + 0.5) / (term_df + 0.5) + 1)
            term_tf = tf[term]
            numerator = term_tf * (k1 + 1)
            denominator = term_tf + k1 * (1 - b + b * dl / avg_dl)
            score += idf * numerator / denominator

        if score > 0:
            scores[doc_id] = score

    ranked = sorted(scores.items(), key=lambda x: x[1], reverse=True)[:k]
    return [SearchResult(doc_id=d, content=corpus[d], score=s) for d, s in ranked]


def dense_search(
    query_embedding: list[float],
    doc_embeddings: dict[str, list[float]],
    corpus: dict[str, str],
    k: int = 10,
) -> list[SearchResult]:
    import numpy as np

    q = np.array(query_embedding)
    scores = {}
    for doc_id, emb in doc_embeddings.items():
        d = np.array(emb)
        cos_sim = float(np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d) + 1e-10))
        scores[doc_id] = cos_sim

    ranked = sorted(scores.items(), key=lambda x: x[1], reverse=True)[:k]
    return [SearchResult(doc_id=d, content=corpus[d], score=s) for d, s in ranked]


def reciprocal_rank_fusion(
    *result_lists: list[SearchResult],
    k: int = 60,
    top_n: int = 10,
) -> list[SearchResult]:
    rrf_scores: dict[str, float] = {}
    content_map: dict[str, str] = {}

    for results in result_lists:
        for rank, result in enumerate(results):
            rrf_scores[result.doc_id] = rrf_scores.get(result.doc_id, 0.0) + (
                1.0 / (k + rank + 1)
            )
            content_map[result.doc_id] = result.content

    ranked = sorted(rrf_scores.items(), key=lambda x: x[1], reverse=True)[:top_n]
    return [
        SearchResult(doc_id=d, content=content_map[d], score=s)
        for d, s in ranked
    ]


def hybrid_search(
    query: str,
    query_embedding: list[float],
    corpus: dict[str, str],
    doc_embeddings: dict[str, list[float]],
    k: int = 10,
) -> list[SearchResult]:
    sparse_results = bm25_search(query, corpus, k=k)
    dense_results = dense_search(query_embedding, doc_embeddings, corpus, k=k)
    return reciprocal_rank_fusion(sparse_results, dense_results, top_n=k)

For production systems, use a database that natively supports hybrid search (Weaviate, Qdrant, Elasticsearch with vector search) instead of running separate systems.

Tool Landscape

Tool	Type	Notes
Weaviate	Vector database	Native hybrid search with configurable dense/sparse weighting
Qdrant	Vector database	Supports sparse vectors alongside dense, enabling hybrid retrieval
Elasticsearch	Search engine	kNN vector search + BM25 in a single query
Pinecone	Managed vector DB	Sparse-dense hybrid search with integrated dot-product fusion
LanceDB	Embedded vector DB	Full-text search + vector search with reranking support

Classic RAG Pattern — Hybrid search is an upgrade to the retrieval component in classic RAG.
Reranking Pattern — Typically applied after hybrid search for further precision improvement.
Chunking Strategy Pattern — Chunk size affects both dense and sparse retrieval quality.
GraphRAG Pattern — For relational queries where neither hybrid nor dense search is sufficient.