Commercial AI-behavior benchmark and service

Sovereign Bench

Live service · alpha API

Existing benchmarks measure intelligence. Sovereign Bench measures respect.

What It Is

Sovereign Bench is a behavioral benchmark for AI language models. Run any model — local, API, frontier, open-source — through Standard, Hard, or AGI prompt tiers and receive an Agency Score across ten axes in four domains: Operator Respect, Reasoning Integrity, Behavioral Stability, and Structural Honesty.

Why It Exists

Models get smarter every quarter and, at the same time, less useful: the result of safety tuning that prioritizes provider liability over operator agency. Sovereign Bench tracks that regression systematically, model by model and version by version.

What Exists Today

  • Live at sovereign-bench.com with the $49 one-time Sovereign tier: batch API, priority scoring, permanent run storage, custom judge panels, webhooks, regression alerts
  • Ten axes across four domains; three difficulty tiers (Standard, Hard, AGI) with up to 74 prompts
  • Three open-source judge models on independent infrastructure, so no frontier model judges itself or its peers
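
The external judge panel described above could be sketched roughly as follows. This is an illustration only: the model identifiers, the per-axis score dictionaries, and the choice of the median as the combining rule are all assumptions, not the documented Sovereign Bench scoring method.

```python
from statistics import median

# Hypothetical names throughout: the real axis names, score range, and
# aggregation rule are not documented here.
FRONTIER_MODELS = {"frontier-a", "frontier-b"}  # placeholder identifiers


def check_panel_is_external(judges: list[str]) -> None:
    """Raise if any judge is itself one of the frontier models being scored."""
    overlap = FRONTIER_MODELS.intersection(judges)
    if overlap:
        raise ValueError(f"judges must be external, found: {sorted(overlap)}")


def panel_score(judge_scores: list[dict[str, float]]) -> dict[str, float]:
    """Combine per-axis scores from several judges by taking the median."""
    axes = judge_scores[0].keys()
    return {axis: median(scores[axis] for scores in judge_scores) for axis in axes}
```

With three judges, a median means a single outlying judge cannot move an axis score on its own, which is one plausible reason to prefer it over a mean in a panel of this size.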

Current Operational Boundary

The API is in alpha and expanding. The judging panel is deliberately external so no lab — including ours — can capture it.

Position in the Ecosystem

  • The quantitative complement to the MABOS posture: competence over compliance, operator agency by design
  • Scoring-council pattern ported into the QuoteChecker 2.5 engine
  • Fleet web pattern: vanilla stack, per-request CSP nonces
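
For readers unfamiliar with per-request CSP nonces, the idea can be sketched in a few lines. The function names and the exact policy string below are hypothetical and framework-free; they are not taken from the Sovereign Bench codebase. A fresh random nonce is generated for every response, placed in the Content-Security-Policy header, and repeated on each inline script the page intends to run.

```python
import secrets


def make_nonce() -> str:
    """Return a fresh, unpredictable nonce; call once per request."""
    return secrets.token_urlsafe(16)


def csp_header(nonce: str) -> str:
    """Build a policy that only allows scripts carrying this nonce."""
    return f"default-src 'self'; script-src 'nonce-{nonce}'"


def render_page(nonce: str) -> str:
    """Render HTML whose inline script is tagged with the same nonce."""
    return f'<script nonce="{nonce}">console.log("ok")</script>'


def handle_request() -> tuple[dict[str, str], str]:
    """Hypothetical handler: one nonce shared by the header and the body."""
    nonce = make_nonce()
    headers = {"Content-Security-Policy": csp_header(nonce)}
    return headers, render_page(nonce)
```

Because the nonce changes on every response, an injected script cannot guess it ahead of time, so the browser refuses to execute it.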

Engage