SecureAgentRAG
BYOK demo
Privacy-first, multi-agent Retrieval-Augmented Generation. Four production patterns most RAG demos skip: RBAC at the vector-DB layer, sensitivity-based inference routing, an NLI citation-faithfulness gate, and a SHA-256 hash-chained audit log.
Live at secureagentrag-web.vercel.app on a fully free stack: Next.js 16 + Vercel Edge → FastAPI on Hugging Face Spaces → Qdrant Cloud + Groq Free Tier. $0/month, no credit card, and a link you can paste to a recruiter that opens on any device.
Your document never mixes with anyone else's: it lands in a private, session-scoped vector collection and is auto-deleted within 24h.
🔒 RBAC at the vector layer
Qdrant payload filters enforce org_id + sensitivity_level + roles on every search — dense and sparse share the same filter, so the cross-tenant bypass class is structurally impossible.
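A minimal sketch of what such a filter could look like, written as a plain dict in the JSON shape Qdrant accepts for payload filters (`must`, `match`, `range`). The function names, the `SENSITIVITY_RANK` mapping, and the choice to store `sensitivity_level` as an integer are assumptions for illustration, not the project's actual code. The small evaluator exists only so the filter's behaviour can be checked without a running Qdrant instance.

```python
from typing import Any

# Hypothetical numeric ordering of sensitivity levels (an assumption).
SENSITIVITY_RANK = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


def build_rbac_filter(org_id: str, clearance: str, roles: list[str]) -> dict[str, Any]:
    """Build a Qdrant-style payload filter (JSON shape) for one caller."""
    return {
        "must": [
            # Tenant isolation: only chunks from the caller's organisation.
            {"key": "org_id", "match": {"value": org_id}},
            # Clearance: chunk sensitivity must not exceed the caller's level.
            {"key": "sensitivity_level", "range": {"lte": SENSITIVITY_RANK[clearance]}},
            # Roles: the chunk must be readable by at least one caller role.
            {"key": "roles", "match": {"any": roles}},
        ]
    }


def payload_matches(payload: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Local evaluator for the three condition types used above (for testing)."""
    for cond in flt["must"]:
        value = payload.get(cond["key"])
        if "range" in cond:
            if value is None or value > cond["range"]["lte"]:
                return False
        elif "value" in cond["match"]:
            if value != cond["match"]["value"]:
                return False
        else:  # match-any: the payload list must share an element with the allowed set
            if not set(value or []) & set(cond["match"]["any"]):
                return False
    return True
```

Building the filter once and passing the same object to both the dense and the sparse query is what keeps the two retrieval paths from drifting apart.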
🛡️ Sensitivity-based routing
HIGH-classified data never leaves local Ollama in self-hosted mode. The public demo flags HIGH-on-cloud explicitly with a sensitivity badge so the visitor is informed.
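The routing rule can be sketched as a small pure function. The `Route` type, the backend labels, and the assumption that non-HIGH data goes to a cloud model in both modes are illustrative; the page only states the behaviour for HIGH-classified data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    backend: str      # hypothetical labels: "ollama-local" or "cloud"
    show_badge: bool  # whether the UI should flag HIGH data sent to a cloud model


def route_inference(sensitivity: str, self_hosted: bool) -> Route:
    """Pick an inference backend from the data's sensitivity and the deployment mode."""
    if sensitivity == "HIGH":
        if self_hosted:
            # Self-hosted: HIGH data stays on the local Ollama model.
            return Route(backend="ollama-local", show_badge=False)
        # Hosted demo: no local model, so flag the exposure to the visitor.
        return Route(backend="cloud", show_badge=True)
    # Assumption: lower sensitivity levels may use a cloud model in either mode.
    return Route(backend="cloud", show_badge=False)
```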
🧠 NLI faithfulness gate
Each cited sentence is re-checked for entailment against its source chunk. A citation marker alone does not show that the claim is entailed, so the gate closes that gap. Off in BYOK mode to stay within the Groq RPM budget; live in self-hosted mode.
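A rough sketch of such a gate, assuming citations appear as `[n]` markers before the sentence terminator and that one supporting chunk is enough for a sentence to count as faithful. The `entails` callable stands in for a real NLI model; `overlap_entails` below is only a word-overlap placeholder so the example runs without any model, and the `0.8` threshold is arbitrary.

```python
import re
from typing import Callable

# Matches citation markers such as [1] or [12].
CITATION = re.compile(r"\[(\d+)\]")


def split_sentences(answer: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]


def overlap_entails(premise: str, hypothesis: str) -> float:
    """Placeholder scorer (NOT a real NLI model): share of hypothesis words in the premise."""
    words = re.findall(r"\w+", hypothesis.lower())
    premise_words = set(re.findall(r"\w+", premise.lower()))
    return sum(w in premise_words for w in words) / len(words) if words else 0.0


def faithfulness(
    answer: str,
    chunks: dict[int, str],
    entails: Callable[[str, str], float],
    threshold: float = 0.8,
) -> float:
    """Return the fraction of cited sentences entailed by at least one cited chunk."""
    checked = supported = 0
    for sentence in split_sentences(answer):
        ids = [int(m) for m in CITATION.findall(sentence)]
        if not ids:
            continue  # uncited sentences are outside the gate's scope
        checked += 1
        claim = CITATION.sub("", sentence).strip()
        if any(i in chunks and entails(chunks[i], claim) >= threshold for i in ids):
            supported += 1
    return supported / checked if checked else 1.0
```

In a real deployment the `entails` argument would wrap an NLI classifier's entailment probability; the gate logic around it stays the same.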
📜 Tamper-evident audit chain
Every operation lands in a SHA-256 hash-chained JSONL log. Visitors can download their session's chain and re-verify integrity offline.
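Offline re-verification can be sketched with the standard library alone. The record layout (`event`, `prev_hash`, `hash`), the all-zero genesis value, and the canonical JSON serialisation are assumptions for illustration; the project's actual log schema may differ.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed "previous hash" for the first record


def _digest(prev_hash: str, event: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON form of the event."""
    body = json.dumps(event, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(lines: list[str], event: dict) -> None:
    """Append one JSONL record whose hash covers the previous record's hash."""
    prev_hash = json.loads(lines[-1])["hash"] if lines else GENESIS
    record = {"event": event, "prev_hash": prev_hash, "hash": _digest(prev_hash, event)}
    lines.append(json.dumps(record, sort_keys=True, ensure_ascii=False))


def verify_chain(lines: list[str]) -> bool:
    """Recompute every hash in order; any edit, insertion, or reordering breaks the chain."""
    prev_hash = GENESIS
    for line in lines:
        record = json.loads(line)
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["event"]):
            return False
        prev_hash = record["hash"]
    return True
```

Because each record's hash depends on the one before it, changing an early entry invalidates that entry and every later one. Note that dropping records from the end is not detected unless the latest hash is also recorded somewhere outside the log.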
🇪🇬 افهم عقدك (Understand Your Contract): Arabic-first RAG
Ask in Arabic about an employment contract, a lease, or tax registration, and get a source-cited answer drawn from illustrative Egyptian documents, with the same privacy and access-control safeguards.
BGE-M3 multilingual embeddings + an Arabic-aware chunker + an Arabic-terminator faithfulness gate mean Arabic questions retrieve, cite, and answer end-to-end — privacy-first, $0.
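One piece of this, recognising Arabic sentence terminators before sentences reach the faithfulness gate, can be sketched as follows. The terminator set is an assumption: it adds the Arabic question mark (U+061F) and the Arabic full stop (U+06D4) to the Latin ones, which may not match the project's actual list.

```python
import re

# Latin terminators plus the Arabic question mark (U+061F) and Arabic full stop (U+06D4).
TERMINATORS = ".!?\u061F\u06D4"

# Split on whitespace that directly follows any terminator.
_SPLIT = re.compile(rf"(?<=[{re.escape(TERMINATORS)}])\s+")


def split_sentences(text: str) -> list[str]:
    """Split mixed Arabic/Latin text into sentences, keeping each terminator attached."""
    return [s.strip() for s in _SPLIT.split(text) if s.strip()]
```

Without the Arabic question mark in the set, an Arabic question and the sentence after it would be checked as a single claim.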
How it works
1. You pick a persona (engineer / compliance / executive). Its RBAC roles and clearance are applied to every Qdrant search.
2. You ask a question. Nine LangGraph nodes run end-to-end with token-by-token SSE streaming.
3. The UI shows the proof: trace pills for every node, citation chips with source / page / score, the NLI faithfulness percentage, the query rewrite if it fired, and a downloadable audit log.
4. Switch personas and re-ask: some chunks vanish from the citations panel. That's RBAC at the Qdrant payload layer.
5. Bring your own LLM key (Groq / OpenAI / Anthropic / Ollama) and skip the owner-key throttle. Keys live in browser localStorage only and are never persisted server-side.
6. Upload your own docs (PDF / TXT / MD, up to 5 MB each, 5 files per session). Each upload lands in a session-scoped Qdrant collection, and its search results are fused with those from the base corpus via reciprocal rank fusion.
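The fusion in the last step can be sketched in a few lines. This is a generic reciprocal rank fusion, not the project's code: the function name is hypothetical, `k=60` is a commonly used default rather than a value stated on this page, and ties are broken alphabetically only to keep the output deterministic.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked ID lists; each appearance adds 1 / (k + rank) to an ID's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Example: results from the base corpus and from a session-scoped collection.
base_hits = ["b1", "b2", "b3"]
session_hits = ["s1", "b2"]
fused = reciprocal_rank_fusion([base_hits, session_hits])
```

Because RRF uses only ranks, it needs no score normalisation between the two collections, which is convenient when their similarity scores are not directly comparable.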
By the numbers
- 706 tests passing
- 40 ADRs
- 36k Python LOC
- $0 monthly cost
🛠️ Run your own copy
The whole stack is forkable and $0. Duplicate the Space, deploy the frontend, point it at your own Qdrant + Groq keys.