Stack status
Live health probes for the four moving parts of the SecureAgentRAG BYOK production stack. Polled every 30 s from your browser.
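As a rough illustration of the 30 s browser-side polling described above, the loop could be sketched as follows. This is a minimal sketch, not the page's actual code: the names `PROBES`, `classify`, `probeOnce`, and `startPolling`, the 5 s "degraded" threshold, and the 60 s timeout are all assumptions. Only the four URLs and the 30 s interval come from this page. Treating every non-2xx response as "down" is a simplification.

```typescript
type ProbeState = "up" | "degraded" | "down";

interface Probe {
  name: string;
  url: string;
}

// The four endpoints listed on this page.
const PROBES: Probe[] = [
  { name: "Vercel Edge", url: "/" },
  { name: "HF Space /healthz", url: "https://LeomordKaly-secureagentrag-api.hf.space/healthz" },
  { name: "HF Space /readyz", url: "https://LeomordKaly-secureagentrag-api.hf.space/readyz" },
  { name: "Vercel Edge /api/chat", url: "/api/chat" },
];

const POLL_INTERVAL_MS = 30_000; // "Polled every 30 s"
const SLOW_THRESHOLD_MS = 5_000; // assumption: slower than this counts as degraded

// Map a probe outcome to a display state.
function classify(ok: boolean, elapsedMs: number): ProbeState {
  if (!ok) return "down";
  return elapsedMs > SLOW_THRESHOLD_MS ? "degraded" : "up";
}

// Minimal fetch-like signature so a stub can be injected in tests.
type Fetcher = (url: string, init?: { signal?: AbortSignal }) => Promise<{ ok: boolean }>;

// Probe one endpoint, aborting after timeoutMs (assumed 60 s to cover a cold start).
async function probeOnce(probe: Probe, fetcher: Fetcher, timeoutMs = 60_000): Promise<ProbeState> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  const started = Date.now();
  try {
    const res = await fetcher(probe.url, { signal: controller.signal });
    return classify(res.ok, Date.now() - started);
  } catch {
    return "down"; // network error or timeout
  } finally {
    clearTimeout(timer);
  }
}

// Probe everything immediately, then every 30 s. Returns a stop function.
function startPolling(
  fetcher: Fetcher,
  onResult: (name: string, state: ProbeState) => void,
): () => void {
  const tick = () => {
    for (const p of PROBES) {
      void probeOnce(p, fetcher).then((state) => onResult(p.name, state));
    }
  };
  tick();
  const id = setInterval(tick, POLL_INTERVAL_MS);
  return () => clearInterval(id);
}
```

In a browser, `startPolling(fetch, (name, state) => console.log(name, state))` would start the loop, and calling the returned function stops it.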
Vercel Edge
This Next.js 16 frontend, served from the Vercel Edge network.
/
HF Space · /healthz
FastAPI backend liveness probe. A cold start can take 30–60 s on the first hit.
https://LeomordKaly-secureagentrag-api.hf.space/healthz
HF Space · /readyz
Readiness probe: checks the backend plus its reachability to Qdrant Cloud and the Groq /models endpoint (BYOK-aware).
https://LeomordKaly-secureagentrag-api.hf.space/readyz
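A readiness probe like this one typically folds several dependency checks into a single verdict. The sketch below shows one way that aggregation could work. The `DependencyCheck` and `Readiness` shapes and the `summarizeReadiness` helper are hypothetical, since the actual /readyz response format is not shown here.

```typescript
// Hypothetical result of checking one dependency (e.g. Qdrant Cloud or Groq /models).
interface DependencyCheck {
  name: string;
  reachable: boolean;
}

// Hypothetical aggregate verdict.
interface Readiness {
  ready: boolean;
  failing: string[];
}

// Ready only if at least one check ran and every check succeeded.
function summarizeReadiness(checks: DependencyCheck[]): Readiness {
  const failing = checks.filter((c) => !c.reachable).map((c) => c.name);
  return { ready: checks.length > 0 && failing.length === 0, failing };
}

// Example: the backend and Qdrant are reachable, Groq is not.
const example = summarizeReadiness([
  { name: "backend", reachable: true },
  { name: "qdrant", reachable: true },
  { name: "groq-models", reachable: false },
]);
```

Listing the failing dependencies by name, rather than returning a bare boolean, makes a red readiness card easier to act on.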
Vercel Edge · /api/chat
Edge proxy that forwards chat requests to the HF Space.
/api/chat
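For readers curious what an edge proxy of this kind might look like, here is a minimal sketch. It is not the deployed handler: the upstream path `/chat` and the helpers `upstreamUrl` and `forwardHeaders` are assumptions, and only the Space hostname comes from this page. In a Next.js App Router route file, the handler would be exported as `POST`.

```typescript
const UPSTREAM_BASE = "https://LeomordKaly-secureagentrag-api.hf.space";
const UPSTREAM_CHAT_PATH = "/chat"; // hypothetical: the real backend path is not shown here

// Resolve the upstream URL from a base and a path.
function upstreamUrl(base: string, path: string): string {
  return new URL(path, base).toString();
}

// Forward only the content type; everything else is dropped in this sketch.
function forwardHeaders(incoming: Headers): Headers {
  const out = new Headers();
  const contentType = incoming.get("content-type");
  if (contentType) out.set("content-type", contentType);
  return out;
}

// Forward the request body to the backend and relay its response.
async function POST(req: Request): Promise<Response> {
  const upstream = await fetch(upstreamUrl(UPSTREAM_BASE, UPSTREAM_CHAT_PATH), {
    method: "POST",
    headers: forwardHeaders(req.headers),
    body: await req.text(),
  });
  return new Response(upstream.body, {
    status: upstream.status,
    headers: {
      "content-type": upstream.headers.get("content-type") ?? "application/json",
    },
  });
}
```

Passing `upstream.body` straight through keeps streamed responses streaming, which matters if the backend sends tokens incrementally.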
Answer quality (nightly Ragas eval)
context precision · faithfulness · answer relevancy
The committed baseline from the labelled golden set: proof, not claims.
Live demo activity since the Space last woke: 0 questions answered · 0 documents grounded.
This hosted demo vs self-hosted
| Feature | Hosted $0 demo | Self-hosted |
|---|---|---|
| HIGH-sensitivity routing | Cloud (Groq) — labelled with a sensitivity badge | Local Ollama only — never leaves the host |
| NLI faithfulness gate | Off (Groq RPM budget) | On — per-sentence entailment |
| LLM-as-judge grader / evaluator | Bypassed (cost) | On |
| Guardrails escalation | Regex only | Regex → LlamaGuard (S1–S14) |
| Owner-key throttle | 10 req/IP/hour (BYOK unlocks) | N/A — your own keys |
| Uploads + audit | Session-scoped, 24h purge, /tmp audit | Durable, your infra |
Everything switched off in the demo is a cost/latency choice on the free tier, not a missing capability; see BYOK_PRIVACY_TRADEOFFS.md.
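To make the owner-key throttle row concrete, the following sketch shows one way a 10 req/IP/hour limit with a BYOK bypass could be enforced. It is an illustration under assumptions, not the demo's implementation: the class name `OwnerKeyThrottle`, the sliding-window strategy, and the in-memory store are all hypothetical. Only the limit, the per-IP scope, and the BYOK unlock come from the table.

```typescript
const LIMIT = 10; // requests per IP
const WINDOW_MS = 60 * 60 * 1000; // one hour

class OwnerKeyThrottle {
  // Timestamps of recent owner-key requests, keyed by client IP.
  private hits = new Map<string, number[]>();

  // Returns true if the request may proceed.
  allow(ip: string, hasOwnKey: boolean, now: number = Date.now()): boolean {
    // Requests that bring their own key do not spend the owner's budget.
    if (hasOwnKey) return true;

    // Keep only the timestamps still inside the one-hour window.
    const recent = (this.hits.get(ip) ?? []).filter((t) => now - t < WINDOW_MS);
    if (recent.length >= LIMIT) {
      this.hits.set(ip, recent);
      return false;
    }
    recent.push(now);
    this.hits.set(ip, recent);
    return true;
  }
}

// Example: the eleventh owner-key request from one IP within the hour is rejected.
const throttle = new OwnerKeyThrottle();
const results: boolean[] = [];
for (let i = 0; i < 11; i++) {
  results.push(throttle.allow("203.0.113.7", false, 1_000 + i));
}
```

An in-memory map resets whenever the process restarts and is not shared across instances, so a real deployment would likely need a shared store.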