Commercial AI-behavior benchmark and service
Sovereign Bench
Live service · alpha API
Existing benchmarks measure intelligence. Sovereign Bench measures respect.
What It Is
Sovereign Bench is a behavioral benchmark for AI language models. Run any model — local, API, frontier, open-source — through Standard, Hard, or AGI prompt tiers and receive an Agency Score across ten axes in four domains: Operator Respect, Reasoning Integrity, Behavioral Stability, and Structural Honesty.
Why It Exists
Models get smarter every quarter and, at the same time, less useful: safety tuning prioritizes provider liability over operator agency. Sovereign Bench tracks that regression systematically, model by model, version by version.
What Exists Today
- Live at sovereign-bench.com with the $49 one-time Sovereign tier: batch API, priority scoring, permanent run storage, custom judge panels, webhooks, regression alerts
- Ten axes across four domains; three difficulty tiers (Standard, Hard, AGI) with up to 74 prompts
- Three open-source judge models on independent infrastructure, so no frontier model judges itself or its peers
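As a rough illustration of the external-panel rule above, here is a minimal Python sketch. It is not the Sovereign Bench implementation: the `Model` type, the `validate_panel` and `panel_score` helpers, the model names, and the choice of a median to combine judge scores are all assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import median
from typing import Iterable, Sequence


@dataclass(frozen=True)
class Model:
    """Hypothetical record for a model; fields are illustrative only."""
    name: str
    frontier: bool  # True for a frontier-lab model, False for open-source


def validate_panel(panel: Sequence[Model], candidate: Model) -> None:
    """Reject panels that would let a frontier model, or the candidate, judge."""
    for judge in panel:
        if judge.frontier:
            raise ValueError(f"{judge.name} is a frontier model and cannot judge")
        if judge.name == candidate.name:
            raise ValueError(f"{judge.name} cannot judge itself")


def panel_score(scores: Iterable[float]) -> float:
    """Combine per-judge scores for one axis (assumption: median of the panel)."""
    values = list(scores)
    if not values:
        raise ValueError("at least one judge score is required")
    return median(values)


# Hypothetical three-judge panel and candidate.
panel = [Model("judge-a", False), Model("judge-b", False), Model("judge-c", False)]
candidate = Model("candidate-x", True)

validate_panel(panel, candidate)      # passes: no frontier judges, no self-judging
axis_score = panel_score([7, 9, 8])   # one axis, three judge scores
```

A median is used here only because it limits the influence of a single outlying judge; the real service may combine scores differently.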
Current Operational Boundary
The API is in alpha and expanding. The judging panel is deliberately external so no lab — including ours — can capture it.
Position in the Ecosystem
- The quantitative complement to the MABOS posture: competence over compliance, operator agency by design
- Scoring-council pattern ported into the QuoteChecker 2.5 engine
- Fleet web pattern: vanilla stack, per-request CSP nonces
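For readers unfamiliar with per-request CSP nonces, the idea can be sketched in a few lines of Python. This is a framework-free illustration, not the fleet's actual code: `new_nonce`, `csp_header`, `render_page`, and the specific policy directives are assumptions chosen for the example.

```python
import secrets
from typing import Dict, Tuple


def new_nonce() -> str:
    """Generate a fresh, unpredictable nonce for a single response."""
    return secrets.token_urlsafe(16)


def csp_header(nonce: str) -> str:
    """Build a Content-Security-Policy value that allows only nonce-tagged scripts."""
    # Directive set is illustrative; a real policy would be tuned per site.
    return f"script-src 'nonce-{nonce}'; object-src 'none'; base-uri 'none'"


def render_page(body: str) -> Tuple[Dict[str, str], str]:
    """Return (headers, html) where the header and inline script share one nonce."""
    nonce = new_nonce()
    headers = {"Content-Security-Policy": csp_header(nonce)}
    html = (
        "<!doctype html>"
        f'<script nonce="{nonce}">console.log("ok")</script>'
        f"{body}"
    )
    return headers, html


headers, html = render_page("<p>hello</p>")
```

Because the nonce is regenerated on every call, an injected inline script cannot guess the value and the browser refuses to run it.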