
Stack status

Live health probes for the four moving parts of the SecureAgentRAG BYOK production stack. Polled every 30 s from your browser.
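A minimal sketch of how a browser-side poller for these four probes might look. Only the four targets and the 30 s interval come from this page; the type names, the 10 s timeout, and the rule that maps HTTP statuses to up/degraded/down are assumptions, not the page's actual code.

```typescript
// Sketch only: names, timeout, and state rules are assumptions.
type ProbeState = "up" | "degraded" | "down";

interface ProbeTarget {
  label: string;
  url: string;
}

// The four targets as listed on this page.
const TARGETS: ProbeTarget[] = [
  { label: "Vercel Edge", url: "/" },
  { label: "HF Space · /healthz", url: "https://LeomordKaly-secureagentrag-api.hf.space/healthz" },
  { label: "HF Space · /readyz", url: "https://LeomordKaly-secureagentrag-api.hf.space/readyz" },
  { label: "Vercel Edge · /api/chat", url: "/api/chat" },
];

const POLL_INTERVAL_MS = 30_000; // "polled every 30 s"
const PROBE_TIMEOUT_MS = 10_000; // assumed per-request timeout

// Narrow fetch-like signature so a fake can be injected in tests.
type FetchLike = (url: string, init?: { signal?: AbortSignal }) => Promise<{ status: number }>;

// Map an HTTP status (null = network error or timeout) to a display state.
function classify(status: number | null): ProbeState {
  if (status === null || status >= 500) return "down";
  if (status >= 200 && status < 300) return "up";
  return "degraded"; // reachable, but not a clean 2xx
}

async function probe(
  target: ProbeTarget,
  fetchImpl: FetchLike = (url, init) => fetch(url, init),
): Promise<ProbeState> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), PROBE_TIMEOUT_MS);
  try {
    const res = await fetchImpl(target.url, { signal: controller.signal });
    return classify(res.status);
  } catch {
    return classify(null);
  } finally {
    clearTimeout(timer);
  }
}

// Probe all targets in parallel now, then on every interval tick.
// Returns a function that stops the polling.
function startPolling(
  onUpdate: (states: Record<string, ProbeState>) => void,
  fetchImpl?: FetchLike,
): () => void {
  const tick = async () => {
    const states = await Promise.all(TARGETS.map((t) => probe(t, fetchImpl)));
    onUpdate(Object.fromEntries(TARGETS.map((t, i) => [t.label, states[i]])));
  };
  void tick();
  const id = setInterval(tick, POLL_INTERVAL_MS);
  return () => clearInterval(id);
}
```

In a React component, `startPolling` would typically be called from an effect and its return value used as the cleanup. Under the assumed rule, a probe that answers with a non-2xx status below 500 shows as degraded; how the real page treats such responses is not stated here.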


Vercel Edge

This Next.js 16 frontend, served from the Vercel Edge network.

/

HF Space · /healthz

FastAPI backend liveness probe. A cold start can take 30–60 s on the first hit.

https://LeomordKaly-secureagentrag-api.hf.space/healthz
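Because a sleeping Space can take 30–60 s to answer its first request, a client may want to retry the liveness probe for a while before reporting the backend as down. The sketch below is one way to do that with capped exponential backoff; the function names, the 2 s base delay, the 15 s cap, and the 60 s budget are all illustrative assumptions.

```typescript
// Sketch only: delays and budget are assumed values, not taken from the backend.

// Build a capped exponential delay schedule whose total stays within budgetMs.
function backoffSchedule(budgetMs: number, baseMs = 2_000, capMs = 15_000): number[] {
  const delays: number[] = [];
  let total = 0;
  let next = baseMs;
  while (total + next <= budgetMs) {
    delays.push(next);
    total += next;
    next = Math.min(next * 2, capMs);
  }
  return delays;
}

// Call check() immediately, then after each scheduled delay, until it
// resolves true or the schedule is exhausted. sleep is injectable for tests.
async function waitUntilLive(
  check: () => Promise<boolean>,
  budgetMs = 60_000,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<boolean> {
  if (await check()) return true;
  for (const delay of backoffSchedule(budgetMs)) {
    await sleep(delay);
    if (await check()) return true;
  }
  return false;
}
```

Here `check` would wrap a request to `/healthz` and resolve to `true` on a 2xx response. The budget only bounds the sleeping time, so slow individual requests can push the total wait past it.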

HF Space · /readyz

Backend + Qdrant Cloud + Groq /models reachability (BYOK-aware).

https://LeomordKaly-secureagentrag-api.hf.space/readyz

Vercel Edge · /api/chat

Edge proxy that forwards chat requests to the HF Space.

/api/chat
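For illustration, an Edge proxy of this kind can be written as a Next.js route handler that relays the request body to the backend and passes the response through. This is a sketch under stated assumptions, not the project's actual handler: the backend path `/chat`, the header name `x-byok-key`, and the header allow-list are hypothetical.

```typescript
// Sketch only: the backend path and forwarded header names are hypothetical.
export const runtime = "edge"; // Next.js route segment config: run on the Edge runtime

// Host taken from this page; the "/chat" path is an assumption.
const BACKEND_CHAT_URL = "https://LeomordKaly-secureagentrag-api.hf.space/chat";

// Only these request headers are forwarded; "x-byok-key" is a made-up BYOK header.
const FORWARDED_HEADERS = ["content-type", "x-byok-key"];

// Copy allow-listed headers and drop everything else (cookies, auth, etc.).
function pickHeaders(incoming: Headers): Headers {
  const out = new Headers();
  for (const name of FORWARDED_HEADERS) {
    const value = incoming.get(name);
    if (value !== null) out.set(name, value);
  }
  return out;
}

export async function POST(req: Request): Promise<Response> {
  try {
    const upstream = await fetch(BACKEND_CHAT_URL, {
      method: "POST",
      headers: pickHeaders(req.headers),
      body: await req.text(),
    });
    // Relay the backend's status and body (streamed as-is) to the caller.
    return new Response(upstream.body, {
      status: upstream.status,
      headers: {
        "content-type": upstream.headers.get("content-type") ?? "application/json",
      },
    });
  } catch {
    // The backend could not be reached at all (for example, DNS or network failure).
    return new Response(JSON.stringify({ error: "backend unreachable" }), {
      status: 502,
      headers: { "content-type": "application/json" },
    });
  }
}
```

An explicit allow-list keeps browser cookies and unrelated headers from leaking to the backend, which matters when user-supplied keys travel through the proxy.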

Answer quality (nightly Ragas eval)

context precision

faithfulness

answer relevancy

These scores are the committed baseline from the labelled golden set: proof, not claims. Live demo activity since the Space last woke: 0 questions answered · 0 documents grounded.

This hosted demo vs self-hosted

| Feature | Hosted $0 demo | Self-hosted |
| --- | --- | --- |
| HIGH-sensitivity routing | Cloud (Groq), labelled with a sensitivity badge | Local Ollama only, never leaves the host |
| NLI faithfulness gate | Off (Groq RPM budget) | On (per-sentence entailment) |
| LLM-as-judge grader / evaluator | Bypassed (cost) | On |
| Guardrails escalation | Regex only | Regex → LlamaGuard (S1–S14) |
| Owner-key throttle | 10 req/IP/hour (BYOK unlocks) | N/A (your own keys) |
| Uploads + audit | Session-scoped, 24h purge, /tmp audit | Durable, your infra |
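The owner-key throttle row can be illustrated with a small in-memory limiter: 10 requests per IP per hour, skipped entirely when the caller brings their own key. This is a sketch, not the demo's implementation; the class name and the choice of a sliding window (rather than a fixed hourly bucket) are assumptions.

```typescript
// Sketch only: a sliding-window limiter; the real demo's algorithm is not stated.
class OwnerKeyThrottle {
  // Timestamps (ms) of recent owner-key requests, keyed by client IP.
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit = 10, windowMs = 60 * 60 * 1000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request may proceed. "now" is injectable for tests.
  allow(ip: string, hasOwnKey: boolean, now: number = Date.now()): boolean {
    // BYOK requests do not spend the owner's quota, so they are never throttled.
    if (hasOwnKey) return true;

    // Keep only hits that still fall inside the window.
    const recent = (this.hits.get(ip) ?? []).filter((t) => now - t < this.windowMs);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.hits.set(ip, recent);
    return allowed;
  }
}

// Usage: one shared instance per server process.
const throttle = new OwnerKeyThrottle();
const canProceed = throttle.allow("203.0.113.7", false);
console.log(canProceed ? "forward to LLM" : "429: add your own key to continue");
```

State held in process memory is lost whenever the host restarts, which may be acceptable for a free-tier demo; a durable deployment would more likely keep the counters in a shared store.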

Everything off in the demo is a cost/latency choice on the free tier, not a missing capability — see BYOK_PRIVACY_TRADEOFFS.md.