RAG Quality Depends on Ranking: Rerankers as a Way to Reread Search Results
For / Key Points
For: People using RAG who feel that answers are slightly off, and readers who are hearing the word reranker for the first time.
Key Points:
- RAG quality problems are often caused by search ranking, not the generative model
- A strong RAG pipeline separates "retrieve broadly" from "reread carefully"
- Ettin Reranker, published in May 2026, is a new option that is easy to test in existing RAG systems
When a RAG answer fails, the first suspect is often the generative model. "Maybe a smarter LLM would fix it" is a natural reaction. In practice, that is often the wrong place to start.
The question for this article is: when RAG is wrong, how should you inspect ranking before blaming the generator?
Who Is Really at Fault When RAG Gives the Wrong Answer?
Upgrading the generator does not always improve a RAG system.
The reason is simple: the LLM can only read the documents it receives. A person cannot answer correctly if they are handed the wrong reference material. The same is true for an LLM. If retrieval sends documents that do not answer the question, the final answer will drift.
That means RAG quality is not decided only at generation time. It is largely decided earlier, when the system chooses which documents to send to the LLM.
Once you see the problem this way, the improvement target changes. Before switching to a larger model, inspect the top five retrieved documents.
Separate "Retrieve Broadly" from "Reread Carefully"
A hiring analogy makes it easier to see why retrieval is often split into two stages.
If 1,000 people apply, a company may shortlist 100 by resumes and interview 5. It does not deeply interview all 1,000 people because that would be too slow. The first stage is broad and cheap. The second stage is slower and more careful.
RAG retrieval follows the same pattern. When a corpus has tens of thousands of documents, reading every document carefully is too expensive. So the system first retrieves roughly similar candidates, then rereads only those candidates to decide the final ranking.
The component that retrieves broadly is the embedder. The component that rereads carefully is the reranker.
| Component | Good At | Weak At |
|---|---|---|
| embedder | Quickly collecting candidates from many documents | Capturing fine differences in meaning |
| reranker | Rereading candidates and fixing order | Running over every document (too expensive) |
Apartment hunting is another useful analogy. You search broadly with conditions, then visit a few places before deciding. Some things only become clear after a second look.
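The two-stage pattern in this section can be sketched in a few lines. This is a minimal illustration, not a production pipeline: `two_stage_search`, `word_overlap`, and `overlap_ratio` are hypothetical names, and the two scorers are toy stand-ins passed in as plain functions so that a real embedder or reranker could fill the same roles.

```python
from typing import Callable, List, Sequence

Scorer = Callable[[str, str], float]  # (query, document) -> relevance score


def two_stage_search(
    query: str,
    corpus: Sequence[str],
    cheap_score: Scorer,
    careful_score: Scorer,
    n_candidates: int = 100,
    n_final: int = 5,
) -> List[str]:
    """Retrieve broadly with a cheap scorer, then reread the shortlist carefully."""
    # Stage 1 (embedder role): score every document cheaply, keep a broad shortlist.
    shortlist = sorted(corpus, key=lambda doc: cheap_score(query, doc), reverse=True)
    shortlist = shortlist[:n_candidates]
    # Stage 2 (reranker role): rescore only the shortlist with the slower scorer.
    reranked = sorted(shortlist, key=lambda doc: careful_score(query, doc), reverse=True)
    return reranked[:n_final]


def word_overlap(query: str, doc: str) -> float:
    """Toy cheap scorer (assumption): number of words shared by query and document."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def overlap_ratio(query: str, doc: str) -> float:
    """Toy careful scorer (assumption): shared words divided by document length."""
    words = doc.lower().split()
    return word_overlap(query, doc) / len(words) if words else 0.0


corpus = [
    "how to boot a Mac",
    "Mac will not boot after an update",
    "printer driver installation guide",
    "boot camp setup on a Mac with an external drive",
]
top = two_stage_search(
    "Mac will not boot", corpus, word_overlap, overlap_ratio,
    n_candidates=3, n_final=2,
)
print(top)
```

Note that stage 2 never sees a document that stage 1 dropped, so `n_candidates` has to be large enough for the correct document to survive the first cut.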
Why an Embedder Alone Is Not Enough
An embedder turns the query and each document into vectors independently. Then it treats nearby vectors as similar.
This is fast. But the judgment is compressed into one distance score.
The problem is that two texts can sit close together in vector space while expressing different intents. "Mac will not boot" and "how to boot a Mac" are close in topic and wording. But the user wants troubleshooting, not a startup manual.
Negation, prerequisites, product versions, and document granularity make this harder. If the top 10 results feel "similar but not answering," you are probably seeing the limit of embedding retrieval.
A reranker addresses this by reading the query and document together. An embedder sees the two texts separately and compares distance afterward. A reranker reads them side by side and asks whether the document actually answers the question.
It sees more information. It also costs more to run.
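The difference between the two interfaces can be shown with a toy sketch. Nothing here is a real model: `embed` is a bag-of-words stand-in for an embedder, and `pair_score` is a hand-written rule standing in for a reranker, with `FIX_WORDS` as a hypothetical cue list. The only point is structural: the first scorer encodes each text independently and compares afterward, while the second receives both texts at once.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def embedder_score(query: str, doc: str) -> float:
    # Each text is encoded on its own; they only meet in the final comparison.
    return cosine(embed(query), embed(doc))


FIX_WORDS = ("troubleshoot", "fix", "repair")  # hypothetical cue list


def pair_score(query: str, doc: str) -> float:
    """Toy stand-in for a reranker: one function sees both texts at once."""
    reports_failure = "not" in query.lower().split()
    offers_fix = any(word in doc.lower() for word in FIX_WORDS)
    # A joint check that independently built vectors cannot express:
    # a failure-style query should be matched with a fix-style document.
    bonus = 0.5 if reports_failure and offers_fix else 0.0
    return embedder_score(query, doc) + bonus


query = "Mac will not boot"
manual = "how to boot a Mac"
guide = "Troubleshooting a Mac that fails to start"

for name, score in (("embedder", embedder_score), ("pair scorer", pair_score)):
    print(name, round(score(query, manual), 3), round(score(query, guide), 3))
```

With these toy scorers, the independent comparison prefers the startup manual because it shares more words with the query, while the joint scorer prefers the troubleshooting guide. A real reranker learns this kind of interaction from data rather than from a hand-written rule.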
Where to Start When RAG Quality Feels Weak
Start with the search results, not the generated answer.
Ask these questions in order.
- Is the correct document included in the top N results?
  - If not, the problem is likely retrieval scope or the embedder
  - If yes, but it appears too low, the problem is ranking
  - If yes, and it is already near the top, inspect the prompt and generation step
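The checklist above can be turned into a small triage helper. This is a sketch under stated assumptions: `diagnose` is a hypothetical function, and the `top_n` and `near_top` thresholds are illustrative defaults that you would tune to your own pipeline.

```python
from collections import Counter
from typing import Optional


def diagnose(rank: Optional[int], top_n: int = 20, near_top: int = 3) -> str:
    """Map the 1-based rank of the correct document to the layer to inspect.

    rank is None when the correct document was not retrieved at all.
    """
    if rank is None or rank > top_n:
        return "retrieval"   # not in the top N: retrieval scope or embedder
    if rank > near_top:
        return "ranking"     # found, but too low: a reranker may help
    return "generation"      # already near the top: check prompt and generation


# Hypothetical ranks of the correct document for four test queries.
ranks = {"q1": None, "q2": 12, "q3": 1, "q4": 7}
summary = Counter(diagnose(rank) for rank in ranks.values())
print(summary)
```

Counting the labels across a batch of queries shows which layer accounts for most failures, which is more informative than inspecting a single bad answer.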
Without this split, it is easy to waste time on the wrong fix. RAG problems live in three layers: retrieval, ranking, and generation. Different layers need different interventions.
A reranker is useful in the second case: the correct document was found, but ranked too low. Changing the embedder may still leave rough ordering problems. Adding a reranker lets the system correct the order after broad retrieval.
You do not need to find the perfect embedder first. It is simpler to retrieve loosely, then improve precision in the narrowing stage.
Ettin Reranker as a Concrete Option
Rerankers are not new, but Hugging Face published the Ettin Reranker family on 2026-05-19, adding a practical new option [1]. It includes six sizes from 17M to 1B parameters, all under Apache 2.0.
Three sizes are worth watching.
| Model size | Official Positioning | Likely Use |
|---|---|---|
| 17M | Smaller than older MiniLM rerankers while improving quality | CPU inference and low latency |
| 150M | Strong mid-sized model under 600M parameters | Normal GPU-backed operation |
| 1B | Nearly matches the 1.54B teacher model | Quality-first or overnight batch reranking |
The small models matter in practice. Many teams hesitate because reranking sounds expensive. The 17M and 32M models lower that first testing cost.
You can add a reranker to an existing RAG system and start small. If it helps, move to a larger model later.
The training data is also public, which opens a path to domain-specific retraining [2]. This gives teams an option between using the public model as-is and building everything from scratch.
What to Watch in Japanese RAG
Do not treat English benchmark strength as proof of Japanese RAG quality.
Ettin Reranker is evaluated mainly on the English MTEB benchmark in the official article [1]. If your documents are Japanese FAQs, meeting notes, contracts, or internal policies, you still need local evaluation.
A practical test can be small. Prepare 30 to 100 query-document examples from your own data. Have a human mark the expected ranking, then compare top-N accuracy with and without the reranker. The set does not need to be perfect to reveal the trend.
Behavior can differ by language and document type. Public benchmarks show expectations. Your own data shows whether the change works in your environment.
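A local comparison of this kind needs very little code. The sketch below assumes you already have, for each query, the document order before and after reranking plus the human-marked relevant documents. The function name `top_n_accuracy` and all query and document IDs are placeholders invented for illustration, not real results.

```python
from typing import Dict, List, Set


def top_n_accuracy(
    rankings: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    n: int = 5,
) -> float:
    """Fraction of queries with at least one relevant document in the top n."""
    if not rankings:
        return 0.0
    hits = sum(
        1 for query_id, ranked in rankings.items()
        if relevant[query_id] & set(ranked[:n])
    )
    return hits / len(rankings)


# Placeholder data: human-marked relevant documents per query.
relevant = {"q1": {"d3"}, "q2": {"d7"}, "q3": {"d2"}}
# Document order from the embedder alone.
baseline = {"q1": ["d1", "d2", "d3"], "q2": ["d7", "d8", "d9"], "q3": ["d5", "d6", "d2"]}
# Order of the same candidates after reranking.
reranked = {"q1": ["d3", "d1", "d2"], "q2": ["d7", "d9", "d8"], "q3": ["d2", "d5", "d6"]}

for n in (1, 3):
    before = top_n_accuracy(baseline, relevant, n)
    after = top_n_accuracy(reranked, relevant, n)
    print(f"top-{n}: baseline={before:.2f} reranked={after:.2f}")
```

Reporting the metric at more than one cutoff is useful: if accuracy is already high at a large N but low at a small N, the correct documents are being retrieved and the gap is in ordering.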
Summary
When improving RAG, fixing search ranking is often more effective than changing the generative model. The basic pattern is to retrieve broadly with an embedder and reread candidates with a reranker.
Ettin Reranker is a new way to start that second stage. It offers Apache 2.0 licensing, six sizes, public data, and Sentence Transformers compatibility, making it easy to compare against an existing RAG baseline.
The next step is to measure how many of the top five documents actually answer the question. Once that number is visible, you can tell whether the problem is retrieval, ranking, or generation. That makes the next fix much clearer.