SecureAgentRAG

BYOK demo

Privacy-first, multi-agent Retrieval-Augmented Generation. Four production patterns most RAG demos skip: RBAC at the vector-DB layer, sensitivity-based inference routing, an NLI citation-faithfulness gate, and a SHA-256 hash-chained audit log.

Live at secureagentrag-web.vercel.app on a fully free stack: Next.js 16 + Vercel Edge → FastAPI on Hugging Face Spaces → Qdrant Cloud + Groq Free Tier. $0/month, no credit card, and the link can be pasted to a recruiter and opened from any device.

📄 Upload a doc, get a cited answer · 🚀 Try the demo corpus · 📑 Extract to JSON · ⭐ Source on GitHub · 📊 Live status

Your document never mixes with anyone else's: it lands in a private, session-scoped vector collection and is auto-deleted within 24h.

🔒 RBAC at the vector layer

Qdrant payload filters enforce org_id + sensitivity_level + roles on every search. Dense and sparse retrieval share the same filter, so the cross-tenant bypass class is structurally impossible.
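A minimal sketch of what such a filter could look like, written as the JSON shape that Qdrant's filtering API accepts (`must`, `match`, `range`). The field names come from the text above; storing `sensitivity_level` as an integer, the helper names, and the local `payload_allowed` check are assumptions for illustration, not the project's actual code. In a real deployment Qdrant evaluates the filter server-side.

```python
from typing import Any


def build_rbac_filter(org_id: str, clearance: int, roles: list[str]) -> dict[str, Any]:
    """Build one Qdrant-style payload filter, reused for dense and sparse search."""
    return {
        "must": [
            {"key": "org_id", "match": {"value": org_id}},
            # Assumption: sensitivity_level is stored as an integer rank.
            {"key": "sensitivity_level", "range": {"lte": clearance}},
            {"key": "roles", "match": {"any": roles}},
        ]
    }


def payload_allowed(payload: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Local stand-in for the server-side check, handy for unit tests."""
    for cond in flt["must"]:
        value = payload.get(cond["key"])
        match = cond.get("match", {})
        if "value" in match and value != match["value"]:
            return False
        if "any" in match:
            values = value if isinstance(value, list) else [value]
            if not set(values) & set(match["any"]):
                return False
        if "range" in cond and (value is None or value > cond["range"]["lte"]):
            return False
    return True


# The same filter object would be passed to both the dense and the sparse query.
rbac_filter = build_rbac_filter("acme", clearance=1, roles=["engineer"])
```

Building the filter once and handing the same object to both retrieval paths is what keeps one path from drifting looser than the other.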

🛡️ Sensitivity-based routing

HIGH-classified data never leaves local Ollama in self-hosted mode. The public demo flags HIGH-on-cloud explicitly with a sensitivity badge so the visitor is informed.
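The routing rule above can be sketched as a small pure function. Only the HIGH level, local Ollama, and the sensitivity badge come from the text; the LOW and MEDIUM levels, the backend labels, and the choice to send non-HIGH data to the cloud are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    LOW = 1      # assumed level
    MEDIUM = 2   # assumed level
    HIGH = 3


@dataclass(frozen=True)
class Route:
    backend: str                   # hypothetical labels: "ollama-local" or "cloud"
    show_sensitivity_badge: bool


def route_inference(level: Sensitivity, self_hosted: bool) -> Route:
    """Pick an inference backend from the data's sensitivity level."""
    if level is Sensitivity.HIGH:
        if self_hosted:
            # HIGH data stays on the local model in self-hosted mode.
            return Route("ollama-local", show_sensitivity_badge=False)
        # The hosted demo has no local model, so it flags HIGH-on-cloud.
        return Route("cloud", show_sensitivity_badge=True)
    return Route("cloud", show_sensitivity_badge=False)
```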

🧠 NLI faithfulness gate

Each cited sentence is re-checked for entailment against its source chunk. A citation marker does not prove the claim is entailed, so the gate checks entailment directly. Off in BYOK mode to conserve the Groq RPM budget; live in self-hosted mode.
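A rough sketch of how such a gate could be wired, assuming citations appear as `[1]`-style markers and that an NLI model is available behind an `entails(premise, hypothesis)` callable. Both are assumptions, and `toy_entails` is a word-overlap placeholder standing in for a real NLI model.

```python
import re
from typing import Callable

CITATION = re.compile(r"\[(\d+)\]")          # assumed marker format: [1], [2], ...
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def faithfulness_score(
    answer: str,
    chunks: dict[int, str],
    entails: Callable[[str, str], bool],
) -> float:
    """Fraction of cited sentences entailed by every chunk they cite."""
    checked = supported = 0
    for sentence in SENTENCE_END.split(answer.strip()):
        cited_ids = [int(m) for m in CITATION.findall(sentence)]
        if not cited_ids:
            continue  # uncited sentences are outside this gate
        claim = CITATION.sub("", sentence).strip()
        checked += 1
        if all(cid in chunks and entails(chunks[cid], claim) for cid in cited_ids):
            supported += 1
    return supported / checked if checked else 1.0


def toy_entails(premise: str, hypothesis: str) -> bool:
    """Placeholder only: true when every hypothesis word occurs in the premise."""
    words = lambda text: set(re.findall(r"\w+", text.lower()))
    return words(hypothesis) <= words(premise)
```

With chunk 1 stating a 30-day notice period and chunk 2 stating monthly rent, an answer that cites chunk 2 for "rent is due weekly" carries a marker but fails the entailment check, which is the case the gate exists to catch.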

📜 Tamper-evident audit chain

Every operation lands in a SHA-256 hash-chained JSONL log. Visitors can download their session's chain and re-verify integrity offline.
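For readers who want to see the idea concretely, here is a minimal sketch of a SHA-256 hash chain over JSONL entries, with an offline verifier. The entry fields (`event`, `prev_hash`, `hash`), the all-zero genesis value, and the canonical JSON settings are assumptions; the project's real log schema may differ.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first entry's prev_hash


def _digest(event: dict, prev_hash: str) -> str:
    # Canonical JSON (sorted keys, fixed separators) keeps the hash reproducible.
    body = json.dumps({"event": event, "prev_hash": prev_hash},
                      sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(chain: list[dict], event: dict) -> dict:
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {"event": event, "prev_hash": prev_hash, "hash": _digest(event, prev_hash)}
    chain.append(entry)
    return entry


def verify_chain(jsonl_lines: list[str]) -> bool:
    """Re-derive every hash from a downloaded JSONL log; any edit breaks the chain."""
    prev_hash = GENESIS
    for line in jsonl_lines:
        entry = json.loads(line)
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(entry["event"], entry["prev_hash"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Because each hash covers the previous entry's hash, changing or dropping an entry anywhere before the last one invalidates everything after it. Truncating the tail of the log is not detectable by the chain alone.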

🇪🇬 Understand Your Contract (افهم عقدك): Arabic-first RAG

Ask in Arabic about an employment contract, a lease, or tax registration, and get a source-cited answer drawn from illustrative Egyptian documents, with the same privacy and access controls.

BGE-M3 multilingual embeddings, an Arabic-aware chunker, and a faithfulness gate that recognizes Arabic sentence terminators mean Arabic questions are retrieved, cited, and answered end-to-end, privacy-first and at $0.
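As an illustration of why Arabic terminators matter, the sketch below splits mixed Arabic and English text into sentences. The terminator set (Latin `.`, `!`, `?` plus the Arabic question mark U+061F and Arabic full stop U+06D4) is an assumption about what an Arabic-aware splitter might include, not the project's actual list.

```python
import re

# Latin terminators plus Arabic question mark (U+061F) and Arabic full stop (U+06D4).
TERMINATORS = ".!?\u061F\u06D4"
_SPLIT = re.compile(rf"(?<=[{re.escape(TERMINATORS)}])\s+")


def split_sentences(text: str) -> list[str]:
    """Split on whitespace that follows a Latin or Arabic sentence terminator."""
    return [part.strip() for part in _SPLIT.split(text) if part.strip()]
```

A splitter that only knows `.`, `!`, and `?` would merge an Arabic question with the sentence after it, so the gate would then check two claims as if they were one.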

Try it in Arabic →

How it works

1. You pick a persona (engineer / compliance / executive), and its RBAC roles and clearance are applied to every Qdrant search.
2. You ask a question, and 9 LangGraph nodes run end-to-end with token-by-token SSE streaming.
3. The UI shows the proof: trace pills for every node, citation chips with source / page / score, an NLI faithfulness percentage, the query rewrite if it fired, and a downloadable audit log.
4. Switch personas and re-ask: some chunks vanish from the citations panel. That's RBAC at the Qdrant payload layer.
5. Bring your own LLM key (Groq / OpenAI / Anthropic / Ollama) and skip the owner-key throttle. Keys live in browser localStorage only and are never persisted server-side.
6. Upload your own docs (PDF / TXT / MD up to 5 MB · 5 files / session). Each upload lands in a session-scoped Qdrant collection that is fused with the base corpus via reciprocal rank fusion.
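The fusion in step 6 can be illustrated with a standalone reciprocal rank fusion (RRF) function. Each document scores the sum of 1 / (k + rank) over the rankings it appears in. The constant k = 60 is the commonly used default, and the chunk ids below are made up; whether the project computes RRF itself or delegates it to Qdrant is not stated, so treat this as a sketch of the technique only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked id lists; higher fused score means better."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending, then by id so ties are deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results from the base corpus and from a session-scoped collection.
base_hits = ["base-a", "shared-b", "base-c"]
session_hits = ["shared-b", "session-d"]
fused = reciprocal_rank_fusion([base_hits, session_hits])
```

RRF only looks at ranks, so scores from the two collections never need to be on the same scale. A chunk that ranks well in both lists, like `shared-b` here, rises to the top.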

📚 Corpus browser

See the 18 demo docs (incl. 8 Arabic): filename · sensitivity · roles · chunk counts.

👤 Persona inspector

Clearance level + roles + synth style for each RBAC preset.

📊 Stack status

Live health of HF Space + Qdrant Cloud + Groq, polled every 30 s.

By the numbers

706 tests passing

ADRs

36k Python LOC

$0 monthly cost

🛠️ Run your own copy

The whole stack is forkable and $0. Duplicate the Space, deploy the frontend, point it at your own Qdrant + Groq keys.

🤗 Duplicate the Space · ▲ Deploy frontend to Vercel