Hybrid Search in RAG Pipelines: Why It Matters

Written by Ivan Kotev | Feb 21, 2025 4:04:43 PM

What is Hybrid Search?

Hybrid search is a powerful search methodology that combines multiple search techniques—such as keyword-based (lexical) search and semantic (vector) search—to deliver more accurate and relevant search results. By leveraging the strengths of both approaches, hybrid search enhances information retrieval, making it an essential component in AI-driven search applications, including Retrieval-Augmented Generation (RAG) pipelines.

Why Use Hybrid Search?

Using a single search approach may not fully capture a user’s intent. Each search method has its own strengths and weaknesses:

Lexical (Keyword-Based) Search: Excels at precise keyword matching and structured queries, ensuring exact term-based retrieval. However, it lacks an understanding of meaning and may miss relevant results expressed differently.

Semantic (Vector-Based) Search: Understands intent and context through natural language processing, identifying conceptually similar content. However, it may return results that are contextually related but not strictly relevant to the query.

Hybrid search balances these strengths by retrieving both exact matches and meaning-based results, increasing both precision and recall.

Real-World Applications (see Appendix F for more)

E-Commerce: A user searching for "affordable black sneakers" benefits from keyword matches (exact product descriptions) and semantic understanding (retrieving discounted options even without "affordable" in the description).

Customer Support: A search for "can't log in to my account" retrieves keyword-matching troubleshooting articles and semantically relevant guides on account recovery.

Healthcare: A doctor searching for "treatment for chronic headaches with nausea" finds keyword-matching studies and semantically related research on migraines or medication side effects.

Hybrid search enhances search accuracy, improving user experience across various domains.

Hybrid Search in RAG Pipelines

RAG models rely on a retrieval mechanism to fetch the most relevant documents before generating responses. Hybrid search optimizes this process by ensuring that retrieved documents are not only lexically similar but also semantically meaningful. This dual approach significantly improves response accuracy and relevance, as the model has access to high-quality contextual data.

For example, in a customer support chatbot, if a user asks, “How do I reset my internet router?”, a keyword search might retrieve FAQs containing exact matches for “reset” and “router”, while a semantic search could identify articles about troubleshooting internet connectivity. By combining both, hybrid search ensures the chatbot provides the most precise and contextually complete answer.

What is RRF?

Reciprocal Rank Fusion (RRF) is a method used to combine the results of multiple search or ranking algorithms into a single, unified ranking list. RRF ensures results are fairly represented by assigning higher scores to results that appear near the top of multiple rankings. 

The RRF formula is defined as:

RRF(d) = \sum_{r \in R} \frac{1}{k + \mathrm{rank}_r(d)}

where:

  • R: the set of ranking sources (e.g., different search algorithms)
  • rank_r(d): the rank of document d in ranking list r
  • k: a smoothing constant that controls the impact of the rank

RRF Example

Input data:

  Document   rank1(d)   rank2(d)
  D1         1          2
  D2         2          1
  D3         3          3

Compute ranking scores with k = 60:

  Document   Ranking list 1   Ranking list 2   RRF(d)
  D1         0.01639          0.01613          0.03252
  D2         0.01613          0.01639          0.03252
  D3         0.01587          0.01587          0.03175
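
These numbers can be reproduced directly in PostgreSQL. The query below is a quick sanity check of the table above; the document names and ranks come from the example, nothing else is assumed:

SELECT d.doc,
       ROUND(1.0 / (60 + d.rank1) + 1.0 / (60 + d.rank2), 5) AS rrf_score
FROM (VALUES ('D1', 1, 2),
             ('D2', 2, 1),
             ('D3', 3, 3)) AS d(doc, rank1, rank2)
ORDER BY rrf_score DESC;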

Hybrid search in PostgreSQL

Let's implement hybrid search in PostgreSQL using tsvector for keyword search and pgvector for semantic search.

We will start by creating a documents table to store the searchable documents.

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS documents
(
    id          INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    content     TEXT NOT NULL,
    embedding   VECTOR(1024) NOT NULL,
    content_fts TSVECTOR GENERATED ALWAYS AS (to_tsvector('english', content)) STORED NOT NULL
);

CREATE INDEX IF NOT EXISTS idx_documents_content_fts
    ON documents USING GIN (content_fts);

CREATE INDEX IF NOT EXISTS idx_documents_embedding_hnsw
    ON documents USING HNSW (embedding vector_cosine_ops);

 

The table contains 4 columns:

  • id - auto-generated unique ID. It is used by RRF to match records from the different search methods
  • content - the text to be searched over
  • embedding - vector column that stores the vector generated by the embedding model. It is used for semantic search and is set to 1024 dimensions in this example, but can be adjusted to match the dimensionality of other embedding models
  • content_fts - auto-generated tsvector column designed for efficient full-text search. It holds the preprocessed text, broken into lexemes (normalized tokens such as words or stems) for fast searching and ranking

We also created 2 indexes to speed up querying:

  • For full-text search, a Generalized Inverted (GIN) index is used, a type of index designed for handling composite data types such as those stored in a tsvector.
  • For semantic vector search, an HNSW index is used, a high-performance structure for approximate nearest neighbor (ANN) search. It efficiently finds the closest vectors to a given query vector in high-dimensional vector spaces. In this example, the index is configured with the vector_cosine_ops operator class, since the cosine distance (<=>) operator is used later in queries.
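
With the schema in place, a row can be loaded as follows. This is only a sketch: the content is made up, and the embedding is filled with random numbers purely so the statement runs as-is; in a real pipeline the 1024 values come from your embedding model, and content_fts is generated automatically.

-- Demo only: random embedding so the INSERT is runnable without a model.
INSERT INTO documents (content, embedding)
SELECT 'Freshly roasted coffee beans from Ethiopia',
       (SELECT array_agg(random())::vector
        FROM generate_series(1, 1024));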

 

Finally, we will create the hybrid search function, which applies RRF to combine the two result sets:

CREATE OR REPLACE FUNCTION hybrid_search(
    query_text text,
    query_embedding vector(1024),
    results_limit int,
    full_text_weight float DEFAULT 1.0,
    semantic_weight float DEFAULT 1.0,
    rrf_k int DEFAULT 60
)
RETURNS TABLE (id int, content text)
LANGUAGE sql
AS $$
WITH tsquery AS (
    SELECT websearch_to_tsquery('english', query_text) AS query
),
-- Rank matching documents by full-text relevance (cover density, normalization = 16)
full_text AS (
    SELECT id,
           row_number() OVER (ORDER BY ts_rank_cd(content_fts, query, 16) DESC) AS rank
    FROM documents, tsquery
    WHERE content_fts @@ query
),
-- Rank documents by cosine distance to the query embedding
semantic AS (
    SELECT id,
           row_number() OVER (ORDER BY embedding <=> query_embedding) AS rank
    FROM documents
)
SELECT documents.id, documents.content
FROM full_text
FULL OUTER JOIN semantic
    ON full_text.id = semantic.id
JOIN documents
    ON COALESCE(full_text.id, semantic.id) = documents.id
-- Weighted RRF: a document missing from one ranking contributes 0 via COALESCE
ORDER BY
    COALESCE(1.0 / (rrf_k + full_text.rank), 0.0) * full_text_weight +
    COALESCE(1.0 / (rrf_k + semantic.rank), 0.0) * semantic_weight DESC
LIMIT results_limit;
$$;

 

A few things to consider:

  • Input parameters. The first 3 are required: query_text, query_embedding and results_limit; the rest are optional and used to fine-tune the rank fusion:
    • query_text - the user's query text
    • query_embedding - the vector representation of the user's query, generated by the embedding model. It uses 1024 dimensions here, but can be adjusted to match other embedding models. It must match the dimensionality and the embedding model used for the documents table
    • results_limit - the number of records returned in the LIMIT clause
    • full_text_weight and semantic_weight - change the weight of each search method in the final ranking score
    • rrf_k - smoothing constant, default 60. For more details, see Appendix A: Choose the right smoothing factor k in RRF.
  • Return type: the function returns the matching documents with the highest combined scores from the 2 search methods.
  • Common Table Expressions: the full-text and semantic search logic is encapsulated in CTEs:
    • The full-text search CTE uses the following built-in PostgreSQL full-text search functions:
      • websearch_to_tsquery: converts query text using web-search-like syntax. Depending on the specific requirements and user input, it can be substituted with to_tsquery, phraseto_tsquery, or plainto_tsquery.
      • ts_rank_cd: ranks search results by relevance to the search query using a cover density ranking method. It considers not only the occurrence of the query terms but also their proximity in the text: it gives a higher score when query terms are closely grouped together, enhancing phrase- and proximity-based relevance, and reduces the score for documents where query terms are widely dispersed, even if they occur frequently. This works well for short text chunks, which are common in RAG systems where documents are divided into smaller sections. The function accepts a normalization parameter that specifies how a document's length should impact its rank. A value of 16 divides the rank by 1 + the logarithm of the number of unique words in the document, prioritizing documents more focused on the search query terms. This option is very similar to normalization = 8, but penalizes documents slightly less, as the penalty grows more slowly. The range of returned values is 0.0 to 1.0.
    • The semantic search CTE uses the cosine distance (<=>) operator, which measures the similarity between embeddings by calculating the cosine of the angle between two vectors, ignoring their magnitudes. It must match the vector operator class used in the index on the documents table. The cosine distance is derived from the cosine similarity, which measures the cosine of the angle between two vectors in a multi-dimensional space:

Cosine Distance = 1 − Cosine Similarity. 

Cosine Distance returns values from 0 to 2:

  • 0 - When the vectors are identical (cosine similarity = 1).
  • 1 - When the vectors are orthogonal (cosine similarity = 0).
  • 2 - When the vectors are completely opposite (cosine similarity = -1)
  • Reciprocal Rank Fusion: combines the results from full-text search and semantic search. In the example we use an extension of the standard RRF method named Weighted Reciprocal Rank Fusion. WRRF improves RRF by introducing weights for the different ranking sources. Instead of treating all input rankers equally, WRRF allows adjusting their importance:

WRRF(d) = \sum_{r \in R} \frac{w_r}{k + \mathrm{rank}_r(d)}

where:

  • w_r: the weight assigned to ranking list r

Finally, to use the hybrid_search function we can execute the following query:

SELECT * FROM hybrid_search('coffee', '[...]', 10, 1.0, 1.0, 60);
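
PostgreSQL also supports named arguments, which makes the optional weights easier to read. For example, to favor semantic matches twice as much as lexical ones ('[...]' again stands in for a real 1024-dimensional query embedding):

SELECT *
FROM hybrid_search(
    query_text      => 'coffee',
    query_embedding => '[...]',  -- placeholder for the query embedding
    results_limit   => 10,
    semantic_weight => 2.0
);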

Multilingual Hybrid search

Many full-text search engines support multiple languages through language-specific analyzers for stemming, stop words, and tokenization. In PostgreSQL we use tsvector to store searchable text in a structured format (lexemes + positions) and tsquery to represent search terms and conditions. Both accept a language parameter that enables language-specific text processing, and the two parameters must match for search results to be correct.
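
As a sketch, one common way to support several languages is a separate generated column per language configuration. The German column below is illustrative and not part of the schema defined earlier:

-- Illustrative only: a second tsvector column using the 'german' configuration.
ALTER TABLE documents
    ADD COLUMN content_fts_de TSVECTOR
    GENERATED ALWAYS AS (to_tsvector('german', content)) STORED;

-- The tsquery must use the same configuration as the column it searches:
SELECT id, content
FROM documents
WHERE content_fts_de @@ websearch_to_tsquery('german', 'Kaffee');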

Unlike keyword matching, semantic search focuses on the meaning of the text. It uses embeddings (vector representations) to map text into high-dimensional spaces where similar meanings are close together, regardless of exact wording. Using multilingual embedding models allows semantic search to work across different languages without the need for separate language-specific columns, i.e. the document embeddings can be in a different language than the query embedding. However, the performance and quality of multilingual support is not always equal across languages and can vary.

Appendixes

Appendix A: Choose the right smoothing factor k in RRF

The choice of k depends on the specific use case and dataset; some typical values are:

  • k = 60 (most common one):
    • Balances the impact of ranks in most search scenarios
    • Ensures a reasonable distribution of influence across higher and lower ranks
  • k = 10:
    • Emphasizes results from the top ranks
    • Suitable for applications where only the top few results are critical (high-precision tasks)
  • k = 100:
    • Reduces the dominance of top-ranked results and spreads influence more evenly across ranks
    • Useful when combining results from highly diverse models where lower-ranked results may still have value
  • k = 1:
    • Maximizes the emphasis on top-ranked results
    • Used in scenarios where the quality of the top results is highly trusted and should dominate the final ranking

Generally, if users typically focus on only the top few results, smaller k is more appropriate, whereas for more generalized queries where users explore additional results, larger k values are preferable. Additionally, if rankings from different models vary significantly, a larger k helps ensure greater diversity in the results.
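
To see this effect numerically, the query below (plain arithmetic, no schema required) shows how the score gap between rank 1 and rank 2 shrinks as k grows:

SELECT k,
       ROUND(1.0 / (k + 1) - 1.0 / (k + 2), 5) AS gap_rank1_vs_rank2
FROM (VALUES (1), (10), (60), (100)) AS t(k);
-- k = 1   -> 0.16667 (top rank dominates)
-- k = 60  -> 0.00026 (influence spread across ranks)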

Appendix B: Choose the right distance function

For vector similarity search in PostgreSQL we use pgvector with cosine distance. However, this is not the only distance function supported; pgvector offers several others, so let's discuss why we chose cosine distance for hybrid search:

  • <-> - L2 distance: Euclidean distance measures the straight-line distance between two points in an n-dimensional space, so vectors with different magnitudes might be far apart even if they are directionally similar. Because semantic embeddings have hundreds or thousands of dimensions, Euclidean distance can become less reliable and is not a good option for semantic search
  • <#> - inner product: measures how aligned two vectors are without normalizing their magnitude, so it can give unfairly high scores to longer vectors regardless of their actual semantic similarity (pgvector's <#> returns the negative inner product, so smaller values mean more similar). If vectors are normalized, then results are equivalent to the cosine distance method
  • <=> - cosine distance: measures the angle between two vectors, ignoring their magnitude. It focuses only on direction and works well in high-dimensional spaces
  • <~> - Hamming distance: measures the number of positions where two binary vectors differ, and cannot be used for semantic embeddings, which are continuous vectors
  • <%> - Jaccard distance: measures the dissimilarity between two sets and cannot be used over continuous vectors
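
The difference between these operators is easy to see on toy vectors. Below, the two vectors point in the same direction but differ in magnitude, so cosine distance treats them as identical while L2 distance does not (requires the pgvector extension):

SELECT '[1,2,3]'::vector <=> '[2,4,6]'::vector AS cosine_distance,    -- 0 (same direction)
       '[1,2,3]'::vector <-> '[2,4,6]'::vector AS l2_distance,        -- ~3.74
       '[1,2,3]'::vector <#> '[2,4,6]'::vector AS neg_inner_product;  -- -28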

Appendix C: Choose the right rank function

Different search methods use different scoring mechanisms:

  • Full-text search scores depend on term frequency and document length, ranging from 0.0 to 1.0.
  • Semantic search scores are cosine distance values, ranging from 0 to 2.

Since the score distributions could vary greatly between these methods, directly combining raw scores would be problematic. Instead, using ranks ensures that we are comparing the relative importance of documents within each method rather than their absolute scores. 

In the current example we use PostgreSQL’s ROW_NUMBER() function to assign a unique integer value to each row in the 2 result sets. Rank-based fusion makes the method resilient to outliers or score scaling issues, so if one retrieval model assigns much higher scores than another, simply summing scores would unfairly bias the final ranking toward one method. Using rank values avoids this issue because it normalizes the impact of each ranking method.

However, RRF can also lose information; consider the following example:

  Document   FTS Score   FTS Rank
  D1         0.7         1
  D2         0.3         2
  D3         0.25        3

Document D1 has a much higher lexical score than the other documents, but this advantage is lost when considering only ranks. One way to minimize the impact of lower-scored documents is to apply a threshold a document must meet to be included in the search results. However, the main issue persists - ranking discards the magnitude of the differences between document scores.

There are several strategies to deal with this issue (a sketch of the first follows this list):

  • Min-Max Normalized Score Fusion: rescale each method's scores into [0, 1] before combining them:

    \text{normalized}(s) = \frac{s - \min(S)}{\max(S) - \min(S)}

  • Z-Score Normalization: standardize scores by how far they deviate from the mean:

    z(s) = \frac{s - \mu}{\sigma}

    • \mu: mean of scores
    • \sigma: standard deviation of scores
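
As a rough sketch of the first strategy, the query below min-max normalizes the full-text scores into [0, 1]; the query term 'coffee' is illustrative only. A complete fusion would compute an analogous normalized score on the semantic side and sum the two, similar to the weighted sum in hybrid_search:

-- Sketch: min-max normalization of full-text scores. NULLIF guards
-- against division by zero when all scores are equal.
WITH full_text AS (
    SELECT id,
           ts_rank_cd(content_fts,
                      websearch_to_tsquery('english', 'coffee'), 16) AS score
    FROM documents
    WHERE content_fts @@ websearch_to_tsquery('english', 'coffee')
)
SELECT id,
       (score - MIN(score) OVER ()) /
       NULLIF(MAX(score) OVER () - MIN(score) OVER (), 0) AS normalized_score
FROM full_text
ORDER BY normalized_score DESC;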

Appendix D: Search comparison

Hybrid search combines lexical search and semantic search to offer the best of both worlds. Below are some examples illustrating how hybrid search outperforms lexical search or pure semantic search alone:

1. Handling Exact Keyword Matches & Synonyms

  • Lexical search alone: Prioritizes exact keyword matches but struggles with synonyms or different word forms
  • Semantic search alone: Finds related meanings but may miss documents with precise keyword matches
  • Hybrid search: Ensures exact keyword matches while also incorporating semantic similarity

Example Query:
"Electric car environmental benefits"

  • Lexical search might rank a document higher if it contains "electric car" multiple times, even if it doesn't discuss environmental benefits deeply.
  • Semantic search might retrieve documents discussing "sustainability of electric vehicles" or "green energy cars" but miss those that specifically mention "electric car" as a key term.
  • Hybrid search ranks results that include both exact keyword matches ("electric car") and semantically related content (e.g., "environmental impact of electric vehicles").

2. Handling Out-of-Vocabulary (OOV) Terms

  • Lexical search: Cannot retrieve relevant documents if the query contains a new term or uncommon phrase.
  • Semantic search: May still understand the meaning but could miss documents without explicit matches.
  • Hybrid search: Can leverage semantic understanding while ensuring keyword presence.

Example Query:
"Quantum computing algorithms for optimization"

  • Lexical search may fail if "quantum computing algorithms for optimization" was not a common phrase when the documents were indexed.
  • Semantic search could still retrieve documents on topics like "quantum algorithms for problem solving" or "optimization in quantum computing," even if they don't mention the exact phrase.
  • Hybrid search finds documents that explicitly mention "quantum computing algorithms for optimization" as well as those discussing similar topics semantically, like "quantum algorithms for optimization problems" or "quantum computing for algorithmic improvement."

3. Managing Ambiguity in Queries

  • Lexical search: May retrieve irrelevant results if the query term has multiple meanings.
  • Semantic search: May infer intent incorrectly and omit useful keyword-based matches.
  • Hybrid search: Balances both, retrieving keyword-strong and contextually relevant results.

Example Query:
"Python tutorials for beginners"

  • Lexical search might return results about the snake if "Python" is common in those documents.
  • Semantic search might retrieve machine learning content because "Python" is often associated with AI/ML.
  • Hybrid search surfaces documents explicitly mentioning "Python programming" while ranking those explaining beginner concepts highest.

4. Improving Recall for Long-Tail Queries

  • Lexical search: Fails when there’s no exact match.
  • Semantic search: Struggles with domain-specific jargon.
  • Hybrid search: Retrieves documents that match in some form—either keywords or meaning.

Example Query:
"Legal implications of AI-generated contracts"

  • Lexical search might struggle if legal cases use different phrasing (e.g., "contracts drafted by artificial intelligence").
  • Semantic search might retrieve general AI ethics papers rather than legal discussions.
  • Hybrid search finds results with both legal terminology (BM25) and semantically similar discussions on AI contracts (semantic search).

5. Improving Search in Multilingual or Noisy Data

  • Lexical search: Struggles with translated documents or typos.
  • Semantic search: Does well with meaning but can omit keyword-strong matches.
  • Hybrid search: Handles both spelling variations and semantic meaning.

Example Query:
"Effects of metaverse on social interaction"

  • Lexical search might miss content if "metaverse" is spelled differently (e.g., "meta-verse" or in another language).
  • Semantic search might retrieve results about "virtual worlds and human relationships" but not necessarily those mentioning "metaverse."
  • Hybrid search can match documents with both exact and conceptually related terms.

Conclusion: Why Hybrid Search is Better

  Feature                                Lexical Search   Semantic Search   Hybrid Search
  Exact keyword match                    ✅ Yes           ❌ No             ✅ Yes
  Synonym & paraphrase understanding     ❌ No            ✅ Yes            ✅ Yes
  Handles ambiguous queries              ❌ No            ✅ Yes            ✅ Yes
  Works with unseen terms (OOV)          ❌ No            ✅ Yes            ✅ Yes
  Handles long-tail or complex queries   ❌ No            ✅ Yes            ✅ Yes
  Works well across languages & typos    ❌ No            ✅ Yes            ✅ Yes

Hybrid search outperforms both BM25 and semantic search by:

  • Ensuring exact keyword matches (BM25).
  • Capturing semantic meaning even when keywords differ (semantic search).
  • Balancing precision (BM25) and recall (semantic search).
  • Handling new terms, typos, and multilingual searches effectively.

Appendix E: Cosine Distance vs. Euclidean Distance

1. Cosine Similarity

Cosine distance is derived from cosine similarity, which measures the cosine of the angle between two vectors in a multi-dimensional space. The formula for cosine similarity between two vectors A and B is:

\text{cosine\_similarity}(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|}

where:

  • A · B is the dot product of the vectors
  • \|A\| and \|B\| are the magnitudes (norms) of the vectors

Cosine distance is then computed as:

\text{cosine\_distance}(A, B) = 1 - \text{cosine\_similarity}(A, B)

This means that if two vectors are perfectly aligned, the cosine distance is 0; if they are orthogonal, it is 1; and if they are completely opposite, it is 2.
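
A tiny worked example makes the relationship concrete (the vectors are chosen purely for illustration):

\text{For } A = (1, 0),\; B = (1, 1):\quad
\text{cosine\_similarity}(A, B) = \frac{1 \cdot 1 + 0 \cdot 1}{1 \cdot \sqrt{2}} = \frac{1}{\sqrt{2}} \approx 0.707
\quad\Rightarrow\quad
\text{cosine\_distance}(A, B) = 1 - 0.707 = 0.293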

2. Euclidean Distance

Euclidean distance is a measure of the straight-line distance between two points in space. It is computed as:

d(A, B) = \sqrt{\sum_{i=1}^{n} (A_i - B_i)^2}

where:

  • A and B are n-dimensional vectors.

Euclidean distance considers the absolute differences between vector components and is sensitive to magnitude, so vectors with different magnitudes might be far apart even if they are directionally similar.
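
For contrast, consider a pair of vectors that point the same way but differ in magnitude (again purely illustrative):

\text{For } A = (1, 1),\; B = (10, 10):\quad
d(A, B) = \sqrt{(10 - 1)^2 + (10 - 1)^2} = \sqrt{162} \approx 12.73,
\quad \text{while } \text{cosine\_distance}(A, B) = 0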

Why Use Cosine Distance in RAG?

In RAG, cosine distance is often preferred over Euclidean distance for several reasons:

  1. Focus on Direction, Not Magnitude
    • In vector embeddings (like from BERT, OpenAI embeddings, or other NLP models), the semantic meaning of words is captured more in the direction of the vector rather than its magnitude.
    • Cosine distance only measures the angle between vectors, ensuring that two similar meaning vectors are close regardless of their size.
  2. Normalization Eliminates Bias from Vector Length
    • Many embeddings are not normalized, and their magnitude can vary based on factors like sentence length or model training dynamics.
    • Euclidean distance is magnitude-sensitive, meaning longer vectors (with higher norm values) may appear farther apart even if they are semantically similar.
    • Cosine distance normalizes the vectors, ensuring that only semantic similarity matters.
  3. Works Well with High-Dimensional Data

    • NLP embeddings are typically high-dimensional (e.g., 768 for BERT, 1536 for OpenAI's embeddings).
    • Euclidean distance suffers from the “curse of dimensionality”, where distances become less meaningful as dimensions increase.
    • Cosine distance remains effective in high-dimensional spaces.
  4. Better for Sparse Vectors (Common in NLP)
    • Many NLP applications use sparse or TF-IDF representations where many elements are zero.
    • Cosine similarity/distance is more stable in such cases, whereas Euclidean distance can be skewed.

When to Use Euclidean vs. Cosine Distance?

  Scenario                                              Use Cosine Distance   Use Euclidean Distance
  Text similarity (embeddings, RAG, NLP)                ✅ Preferred          ❌ Not ideal
  High-dimensional data (e.g., 768D embeddings)         ✅ Works well         ❌ Loses meaning
  Sparse representations (TF-IDF, word vectors)         ✅ Handles well       ❌ Can be misleading
  Geometric distances (e.g., real-world coordinates)    ❌ Not suitable       ✅ Best choice
  Dense, small-dimensional data (e.g., 3D, 5D points)   ❌ Not necessary      ✅ Works well

Appendix F: Hybrid Search use cases

Here are some use cases for hybrid search that demonstrate its versatility and power across various domains:

1. E-Commerce

  • Scenario: A customer searches for "affordable black sneakers."
  • Hybrid Search in Action:
    • Keyword Search: Matches exact product descriptions (e.g., "black sneakers").
    • Semantic Search: Understands intent and context (e.g., "affordable" might retrieve discounted or low-cost products even if the word "affordable" isn’t in the product description).
  • Benefit: Combines precision with contextual understanding, improving product discovery.

2. Customer Support

  • Scenario: A customer searches a knowledge base for help with "can't log in to my account."
  • Hybrid Search in Action:
    • Keyword Search: Matches exact phrases from troubleshooting articles.
    • Semantic Search: Retrieves contextually relevant articles (e.g., guides related to account recovery, even if the words "log in" aren’t explicitly mentioned).
  • Benefit: Faster, more relevant results lead to better customer satisfaction.

3. Healthcare

  • Scenario: A doctor searches for "treatment for chronic headaches with nausea" in a medical research database.
  • Hybrid Search in Action:
    • Keyword Search: Matches documents containing specific medical terms like "chronic headaches."
    • Semantic Search: Identifies studies on related topics, such as migraines or side effects of medications, by understanding medical concepts and context.
  • Benefit: Helps professionals make more informed decisions by retrieving comprehensive and nuanced information.

4. Legal and Compliance

  • Scenario: A lawyer searches for case precedents related to "intellectual property disputes in digital media."
  • Hybrid Search in Action:
    • Keyword Search: Finds cases with the exact phrase "intellectual property disputes."
    • Semantic Search: Retrieves contextually similar cases, even if they use synonymous terms like "copyright disputes" or "IP conflicts."
  • Benefit: Increases the chances of finding relevant legal precedents, even when terminology varies.

5. Academic Research

  • Scenario: A student or researcher looks for papers on "renewable energy in developing countries."
  • Hybrid Search in Action:
    • Keyword Search: Matches papers with exact phrases like "renewable energy."
    • Semantic Search: Understands related topics, such as "solar power adoption" or "green technology in low-income nations."
  • Benefit: Improves the breadth and depth of research material retrieved.

6. Human Resources and Recruiting

  • Scenario: A recruiter searches for "software engineers experienced in Python and AI."
  • Hybrid Search in Action:
    • Keyword Search: Matches résumés with exact skills ("Python," "AI").
    • Semantic Search: Finds candidates with related experience (e.g., "machine learning engineer" or "data scientist") even if the specific keywords aren’t listed.
  • Benefit: Surfaces better-matched candidates and expands the talent pool.

7. Media and Entertainment

  • Scenario: A user searches a streaming platform for "thrilling sci-fi movies."
  • Hybrid Search in Action:
    • Keyword Search: Matches titles or descriptions containing "sci-fi" or "thrilling."
    • Semantic Search: Recommends related movies based on plot, genre, or user sentiment (e.g., movies like "Inception" or "Interstellar").
  • Benefit: Enhances user experience by combining exact matches with contextually similar content.

8. Financial Services

  • Scenario: A customer searches for "low-interest personal loans for small businesses."
  • Hybrid Search in Action:
    • Keyword Search: Matches offerings labeled "low-interest personal loans."
    • Semantic Search: Retrieves related options, such as "small business loans" or "microloans with competitive rates."
  • Benefit: Improves financial product discovery for users with complex needs.

9. Knowledge Management in Organizations

  • Scenario: Employees search for "project management tools for agile teams" in an internal document repository.
  • Hybrid Search in Action:
    • Keyword Search: Matches documents containing "agile teams" and "project management tools."
    • Semantic Search: Identifies relevant resources, such as case studies, team workflows, or related tools, even if the specific keywords are absent.
  • Benefit: Boosts productivity by surfacing relevant knowledge efficiently.

10. Retail and Inventory Management

  • Scenario: A store manager searches for "products under $50 related to summer camping."
  • Hybrid Search in Action:
    • Keyword Search: Matches products explicitly tagged with "summer camping."
    • Semantic Search: Suggests related items like "portable coolers" or "lightweight tents," even if the exact query terms aren’t included.
  • Benefit: Facilitates smarter inventory searches and better recommendations.

Hybrid search shines in scenarios where exact matches alone may not suffice, providing a balanced and enriched user experience.