The hallucination problem is a retrieval problem

Shahed Daoud · April 22, 2026 · 6 min read

The first time a partner watches a model invent a citation, the reaction is predictable. The model is broken. The technology is not ready. The vendor was selling vapor.

The reaction is wrong in a specific and important way. In almost every case we have investigated, the model did not invent the citation in a vacuum. It invented it because the system around it asked the wrong question, or showed the model the wrong documents, or showed it nothing at all and asked it to pretend otherwise.

Hallucination is a retrieval problem.

What the model is actually doing

A language model, asked a legal question with no documents attached, will produce something that looks like a legal answer. It does this because the question pattern is in its training distribution. It is not lying; it is completing.

The fix is not "a better model." The fix is to never put the model in that position. A practitioner asking the model about a deposition should be answered from the deposition transcript that sits in the firm's document store — not from the model's recollection of depositions in general.

The discipline is to refuse to answer when the retrieval system cannot produce a source. A model that says "I do not have the document required to answer this" is worth ten that confidently produce a plausible-sounding wrong answer.
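
As a minimal sketch of that discipline, the gate below calls the model only when retrieval has returned at least one source. The names `Passage`, `answer_or_refuse`, `retrieve`, and `generate` are hypothetical placeholders for whatever search and model calls a firm actually uses; nothing here reflects a specific product.

```python
from dataclasses import dataclass
from typing import Callable, List

REFUSAL = "I do not have the document required to answer this."


@dataclass
class Passage:
    document: str   # name of the source document
    paragraph: int  # paragraph number within that document
    text: str


def answer_or_refuse(
    question: str,
    retrieve: Callable[[str], List[Passage]],       # hypothetical search call
    generate: Callable[[str, List[Passage]], str],  # hypothetical model call
) -> str:
    """Call the model only when retrieval produced at least one source."""
    passages = retrieve(question)
    if not passages:
        # No source, no answer: the model is never asked to fill the gap.
        return REFUSAL
    return generate(question, passages)
```

The design choice is that the refusal happens in ordinary code, before the model is invoked, rather than relying on a prompt instruction the model may or may not follow.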

What "retrieval" actually means

Most firms have heard the term "retrieval-augmented generation" and assume it is a solved problem. It is not. There are three places retrieval breaks, and each one is where the real work of an engagement lies.

Index scope. What documents does the system search? If the answer is "everything the firm has ever filed," the retrieval will be too noisy and the model will be too tempted to fill gaps. If the answer is "the documents related to this matter," the noise drops, the citations resolve, and the failure mode changes from invention to "I do not have it."

Ethical-wall awareness. A practitioner working on Matter A cannot be shown documents from Matter B, even if Matter B has the better answer. Retrieval that does not enforce the wall is worse than no retrieval — it produces *confident*, *sourced*, *unauthorised* responses that look defensible and are not.
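
One way to picture matter scope and wall enforcement at the index layer is a filter that runs before any ranking, so out-of-scope paragraphs never become candidates. This is a sketch under stated assumptions: `IndexedParagraph`, `visible_paragraphs`, `search`, and the `cleared_matters` set are hypothetical, and the keyword overlap stands in for whatever ranking a real system would use.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class IndexedParagraph:
    matter_id: str
    document: str
    paragraph: int
    text: str


def visible_paragraphs(
    index: List[IndexedParagraph],
    active_matter: str,
    cleared_matters: FrozenSet[str],
) -> List[IndexedParagraph]:
    """Return only paragraphs from the active matter, and only if the
    practitioner is cleared for it. Everything else is dropped here."""
    if active_matter not in cleared_matters:
        return []  # wall check: not cleared means nothing is searchable
    return [p for p in index if p.matter_id == active_matter]


def search(
    index: List[IndexedParagraph],
    query: str,
    active_matter: str,
    cleared_matters: FrozenSet[str],
) -> List[IndexedParagraph]:
    """Filter first, then match. Keyword overlap is a placeholder ranker."""
    candidates = visible_paragraphs(index, active_matter, cleared_matters)
    terms = set(query.lower().split())
    return [p for p in candidates if terms & set(p.text.lower().split())]
```

Because the filter runs before matching, a better answer sitting in another matter cannot surface, and an empty result leads to "I do not have it" rather than to a borrowed document.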

Provenance. Every claim the model makes has to resolve to a paragraph in a named document. Not a hand-wave to the corpus. Not a footnote that says "based on the firm's case files." A paragraph, a page, a date. If the citation does not resolve, the answer does not ship.
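
The provenance rule can be sketched as a check that runs before an answer is released. The assumptions are loud ones: `Citation`, `resolves`, `unresolved`, and `can_ship` are hypothetical names, the document store is modelled as a plain dictionary of paragraphs, and only the paragraph is checked (page and date would be verified the same way in a fuller version).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Citation:
    document: str   # named document the claim points to
    paragraph: int  # 1-based paragraph number
    quote: str      # text the claim relies on


def resolves(citation: Citation, store: Dict[str, List[str]]) -> bool:
    """True only if the quoted text appears in the cited paragraph."""
    paragraphs = store.get(citation.document)
    if paragraphs is None:
        return False  # the named document does not exist
    if not 1 <= citation.paragraph <= len(paragraphs):
        return False  # the paragraph number is out of range
    # An empty quote would match any paragraph, so reject it explicitly.
    return bool(citation.quote) and citation.quote in paragraphs[citation.paragraph - 1]


def unresolved(citations: List[Citation], store: Dict[str, List[str]]) -> List[Citation]:
    """List every citation that fails to resolve, for the reviewer."""
    return [c for c in citations if not resolves(c, store)]


def can_ship(citations: List[Citation], store: Dict[str, List[str]]) -> bool:
    """An answer ships only if it has citations and all of them resolve."""
    return bool(citations) and not unresolved(citations, store)
```

A usage note: an answer with no citations at all is blocked just like one with a bad citation, which keeps "based on the firm's case files" from passing as a source.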

What this looks like in production

A retrieval-augmented system that works has three properties an honest practitioner can describe in writing:

It refuses to answer when retrieval returns no source. The refusal is the feature.
It cites at the paragraph level, not the document level. A reviewer can verify each claim in under a minute.
It enforces matter scope and ethical walls at the index layer, not the prompt layer. The model never sees the documents it cannot use.

A firm that builds retrieval to that bar will find that "hallucination" becomes a rare, narrow, and diagnosable event. Not a daily occurrence to be managed with vendor apologies.

The hallucination headlines are real. The mistake is treating them as a property of language models, instead of a property of the systems around them. The engineering work that closes the gap is unglamorous, specific, and well within reach. It is also the only honest answer to the question every partner is asking right now: can we trust this?
