A document QA agent is only useful when it can show its work

The fastest way to make a document chatbot impressive is to make it sound confident.

The fastest way to make it useful is to make it accountable.

That is why the document QA agent was built around citations first. The goal was not just to answer questions about uploaded PDFs or text files. It was to show which chunks each answer came from, keep the retrieval path understandable, and let someone run the project without paid API keys.

The pipeline

The app has a simple shape:

upload a document
split it into chunks
store a bounded corpus in memory
build a lightweight inverted index
retrieve likely chunks for a question
generate an answer with citations attached
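
As a rough illustration of the middle steps (chunking, the bounded in-memory corpus, the inverted index, and retrieval), here is a minimal sketch. The names `Chunk`, `Corpus`, `split_into_chunks`, the word-count chunking rule, and the term-overlap scoring are all assumptions made for this example, not the project's actual code.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    index: int  # position of the chunk within its document
    text: str


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric tokens; punctuation is dropped.
    return re.findall(r"[a-z0-9]+", text.lower())


def split_into_chunks(doc_id: str, text: str, max_words: int = 40) -> list[Chunk]:
    # Hypothetical chunking rule: fixed-size windows of whitespace-separated words.
    words = text.split()
    return [
        Chunk(doc_id, i // max_words, " ".join(words[i:i + max_words]))
        for i in range(0, len(words), max_words)
    ]


class Corpus:
    """In-memory chunk store with a size cap and a term -> chunk-id index."""

    def __init__(self, max_chunks: int = 1000) -> None:
        self.max_chunks = max_chunks
        self.chunks: list[Chunk] = []
        self.index: dict[str, set[int]] = defaultdict(set)

    def add(self, chunks: list[Chunk]) -> None:
        # Reject the upload instead of growing without bound.
        if len(self.chunks) + len(chunks) > self.max_chunks:
            raise ValueError("corpus is full")
        for chunk in chunks:
            chunk_id = len(self.chunks)
            self.chunks.append(chunk)
            for term in set(tokenize(chunk.text)):
                self.index[term].add(chunk_id)

    def retrieve(self, question: str, top_k: int = 3) -> list[Chunk]:
        # Score each chunk by how many distinct question terms it contains.
        scores: dict[int, int] = defaultdict(int)
        for term in set(tokenize(question)):
            for chunk_id in self.index.get(term, ()):
                scores[chunk_id] += 1
        # Highest score first; ties broken by insertion order for determinism.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [self.chunks[chunk_id] for chunk_id, _ in ranked[:top_k]]
```

Because every retrieved `Chunk` carries its `doc_id` and `index`, the answer step can cite sources without a second lookup. Raw term overlap is only a placeholder for scoring; a weighting scheme such as TF-IDF or BM25 could replace it without changing the interface.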

There is an OpenAI-backed path, but the default mode is a deterministic mock generator. Because the mock returns the same output for the same input and needs no API key, the demo is easier to review and easier to test.
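
To make "deterministic mock generator" concrete, here is a minimal sketch of what such a generator could look like. The `Evidence`, `Answer`, `Generator`, and `MockGenerator` names and the citation format are assumptions for illustration. The real project may structure this differently, and the OpenAI-backed path is left out on purpose.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Evidence:
    doc_id: str
    chunk_index: int
    text: str


@dataclass(frozen=True)
class Answer:
    text: str
    citations: list[str]


class Generator(Protocol):
    # Any backend (mock or model-backed) is assumed to share this shape.
    def generate(self, question: str, evidence: list[Evidence]) -> Answer: ...


class MockGenerator:
    """Deterministic stand-in: same question and evidence, same answer."""

    def generate(self, question: str, evidence: list[Evidence]) -> Answer:
        if not evidence:
            # Say so plainly rather than inventing an uncited answer.
            return Answer("No supporting passages were found.", [])
        # Hypothetical citation format: "<doc_id>#chunk-<index>".
        citations = [f"{e.doc_id}#chunk-{e.chunk_index}" for e in evidence]
        # Quote the top-ranked chunk instead of generating new text.
        top = evidence[0]
        return Answer(f"Based on {citations[0]}: {top.text}", citations)
```

Since the mock uses no randomness and makes no network calls, a test can assert on the exact answer text and citation list.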

Why mock mode is not a shortcut

Mock mode proves that the product surface, retrieval contract, citation builder, and tests work even when no external model is available.

A reviewer can see the important system behavior immediately: documents come in, chunks are created, retrieval returns evidence, and the answer points back to source material. The model can improve later. The accountability pattern should already be there.
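
One way to pin that accountability pattern down is a small check that runs in tests no matter which generator produced the answer. This is a sketch under assumptions: `CitedAnswer` and `check_citations` are hypothetical names, and citations are assumed to be plain string identifiers for retrieved chunks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CitedAnswer:
    text: str
    citations: tuple[str, ...]


def check_citations(answer: CitedAnswer, retrieved_ids: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the contract holds."""
    problems: list[str] = []
    if not answer.citations:
        problems.append("answer has no citations")
    for citation in answer.citations:
        # A citation must point at a chunk that retrieval actually returned.
        if citation not in retrieved_ids:
            problems.append(f"citation {citation} was not in the retrieved set")
    return problems
```

A check like this stays valid when the mock is swapped for a real model, which is the point: the contract belongs to the system, not to the generator.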

This is the part I keep coming back to with retrieval-augmented generation (RAG) systems. Retrieval quality and citation design determine whether anyone can trust the output. A confident hallucination with no citations is worse than a hedged answer with good source attribution. The system should make it easy to verify the answer, not just easy to accept it.