Building an Agentic RAG Pipeline for Legal Question-Answering

It's hard to keep up with changes in the law, especially when the underlying documents are too long to read.
Why not have an AI read all of them for you, and simply prompt it to control for the other side's (or your own) political bias (or lack thereof!)?

Overall Performance & Quality (317 runs)

Latency Waterfall Breakdown (ms)

Total Latency Distribution

Min / Max: 22.7s / 135.6s

Median: 39.7s

Pipeline Explained

Query Expansion

Expands the user's query to cover complementary angles, which enriches retrieval (IR) and makes the final answer more thorough.
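As a rough illustration of the expansion step, here is a minimal sketch. The prompt wording, the `expand_query` function, and the `generate` callable (any function that sends a prompt to an LLM and returns its text) are assumptions made for this example, not the pipeline's actual code.

```python
import re
from typing import Callable, List

# Hypothetical prompt; the real pipeline's wording is not shown in this post.
EXPANSION_PROMPT = (
    "Rewrite the legal question below from {n} complementary angles "
    "(for example: governing statute, exceptions, enforcement). "
    "Return one rewritten question per line.\n\nQuestion: {query}"
)

# Matches a leading bullet ("-", "*", "•") or a list number such as "1." or "2)".
_LIST_MARKER = re.compile(r"^\s*(?:[-*\u2022]|\d+[.)])\s*")


def expand_query(query: str, generate: Callable[[str], str], n: int = 3) -> List[str]:
    """Return the original query followed by up to n de-duplicated rewrites."""
    raw = generate(EXPANSION_PROMPT.format(n=n, query=query))
    variants = [query]
    seen = {query.strip().lower()}
    for line in raw.splitlines():
        candidate = _LIST_MARKER.sub("", line).strip()
        key = candidate.lower()
        if candidate and key not in seen:
            seen.add(key)
            variants.append(candidate)
    # Keep the original query first so retrieval always covers it.
    return variants[: n + 1]
```

Each returned variant can then be sent to retrieval separately, and the results pooled before reranking.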

Retrieval

BM25 sparse retrieval, then dense retrieval with Qwen3-Embedding-0.6B:F16, and reranking with Qwen3-Reranker-0.6B on an RTX 4070 Super.
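The three retrieval stages can be sketched in plain Python as follows. This is a toy version under stated assumptions: BM25 is the standard Okapi formula with whitespace tokenization, `embed` and `rerank` are hypothetical callables standing in for the Qwen3 embedding and reranker models, and taking the union of the sparse and dense candidates before reranking is a guess at how the stages are combined.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def bm25_scores(query: str, docs: Sequence[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of every document for the query (docs must be non-empty)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * c for a, c in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(c * c for c in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k(scores: Sequence[float], k: int) -> List[int]:
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def hybrid_retrieve(
    query: str,
    docs: Sequence[str],
    embed: Callable[[str], Sequence[float]],  # hypothetical embedding model wrapper
    rerank: Callable[[str, str], float],      # hypothetical reranker wrapper
    k: int = 2,
) -> List[int]:
    """Return document indices: sparse and dense shortlists merged, then reranked."""
    sparse = top_k(bm25_scores(query, docs), k)
    q_vec = embed(query)
    dense = top_k([cosine(q_vec, embed(d)) for d in docs], k)
    candidates = list(dict.fromkeys(sparse + dense))  # ordered union, no duplicates
    return sorted(candidates, key=lambda i: rerank(query, docs[i]), reverse=True)
```

In a real deployment the document embeddings would be computed once and stored in an index rather than recomputed per query as this sketch does.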

Synthesis

One-shot agentic answer for the top result, then batch calls for the remaining reranked docs, followed by a final merge step. Implemented with ChatOllama using gemma3:4b-it-q4_K_M on an RTX 3060.
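The draft, batch, and merge flow might look roughly like the sketch below. The `synthesize` function, the prompt texts, and the `batch_size` parameter are illustrative assumptions, and "batch" is interpreted here as grouping several documents into one LLM call, which may differ from the actual implementation. The `llm` argument is any callable mapping a prompt string to a response string.

```python
from typing import Callable, List, Sequence


def synthesize(
    question: str,
    ranked_docs: Sequence[str],
    llm: Callable[[str], str],
    batch_size: int = 4,  # hypothetical setting, not taken from the pipeline
) -> str:
    """Draft from the top document, collect notes from the rest, then merge."""
    if not ranked_docs:
        raise ValueError("ranked_docs must contain at least one document")

    # Step 1: one-shot draft grounded in the top-ranked document only.
    draft = llm(
        "Answer the question using only this source.\n\n"
        f"Question: {question}\n\nSource:\n{ranked_docs[0]}"
    )

    # Step 2: one call per batch of the remaining reranked documents.
    rest = list(ranked_docs[1:])
    notes: List[str] = []
    for start in range(0, len(rest), batch_size):
        sources = "\n\n---\n\n".join(rest[start : start + batch_size])
        notes.append(
            llm(
                "List the facts in these sources that bear on the question.\n\n"
                f"Question: {question}\n\nSources:\n{sources}"
            )
        )

    # Step 3: merge the draft with the batch notes; skip if there is nothing to merge.
    if not notes:
        return draft
    joined_notes = "\n\n".join(notes)
    return llm(
        "Revise the draft answer using the supporting notes. "
        "Keep only claims backed by the draft or the notes.\n\n"
        f"Question: {question}\n\nDraft:\n{draft}\n\nNotes:\n{joined_notes}"
    )
```

With LangChain's Ollama integration, `llm` could plausibly be `lambda prompt: chat.invoke(prompt).content`, where `chat = ChatOllama(model="gemma3:4b-it-q4_K_M")` is imported from `langchain_ollama`.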


Best Answers Showcase

Question: At what age am I protected from age discrimination in employment?

(Score: 5/5 Correct, 5/5 Faithful)

TL;DR: Age discrimination protection under federal law begins at age 40...

High-Consistency Citations

  • [100.0%] 29 U.S. Code § 623
  • [100.0%] 29 U.S. Code § 621
  • [100.0%] 42 U.S. Code § 6103

Recall Success Rate (p=0.9)

  • BM25 Success Rate: 100.00%
  • Hybrid Success Rate: 100.00%

IR Percentage for 80% Final Used
IR Percentage Hit for 100% Used

Ben Truong
ML Engineer

I build ML/GenAI apps for the cloud and private clusters.
