Ceridwen.ai

Benchmarking & Evaluation Engineer

Full-Time  |  Remote (US)  |  Equity + Salary Upon Revenue

The Role

You are the immune system. You catch failures before they reach users, before they reach the board, and before anyone spends three months inventing plausible percentages. You build the frameworks that prove MABOS works, or prove it doesn't, which is more valuable.

What You'll Do

  • Design and implement comprehensive evaluation frameworks for all MABOS subsystems
  • Build automated testing pipelines that catch hallucination, drift, and degradation (see the drift-check sketch after this list)
  • Develop domain-specific benchmarks for cognitive architecture performance
  • Create monitoring systems that detect failure modes in production
  • Publish evaluation methodology and results to maintain transparency
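
As a flavor of the pipeline work above, here is a minimal drift-check sketch in Python. It uses a two-sample Kolmogorov-Smirnov test to flag when a scalar eval metric has shifted away from its baseline distribution; the metric values, function names, and 0.05 threshold are illustrative assumptions, not MABOS internals.

    # Minimal drift-check sketch. `baseline` and `current` are samples of a
    # scalar eval metric (e.g., per-response quality scores); the names and
    # the 0.05 threshold are illustrative, not MABOS-specific.
    from scipy.stats import ks_2samp

    def detect_drift(baseline, current, alpha=0.05):
        """Flag drift when the samples are unlikely to share a distribution."""
        statistic, p_value = ks_2samp(baseline, current)
        return p_value < alpha  # small p-value: distributions have diverged

    baseline = [0.91, 0.88, 0.93, 0.90, 0.89, 0.92, 0.87, 0.94]
    current = [0.71, 0.68, 0.74, 0.70, 0.69, 0.73, 0.66, 0.72]
    if detect_drift(baseline, current):
        print("ALERT: metric distribution drifted from baseline")

In production this check would run per metric, per subsystem, against a rolling baseline window rather than a fixed list.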

Requirements

  • MS or PhD in computer science, statistics, or an adjacent field
  • 5+ years building evaluation and testing frameworks for ML/AI systems
  • Statistical literacy. You know the difference between a metric that matters and a metric that flatters (see the sketch after this list)
  • Failure-mode expertise. You identify and characterize failures in production AI systems
  • You have personally caught a critical failure that no one else noticed, and you can tell us about it
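
To make the "matters versus flatters" distinction concrete, here is a small illustrative sketch with invented numbers: when 95% of responses are fine, raw accuracy flatters a model that misses most failures, while recall on the rare failure class tells the real story.

    # Illustrative only: invented labels showing how accuracy can flatter
    # while recall on the rare failure class reveals the problem.
    from sklearn.metrics import accuracy_score, recall_score

    y_true = [0] * 95 + [1] * 5            # 1 = failure (rare), 0 = fine
    y_pred = [0] * 95 + [1, 0, 0, 0, 0]    # model catches 1 of 5 failures

    print(f"accuracy: {accuracy_score(y_true, y_pred):.2f}")       # 0.96, flatters
    print(f"failure recall: {recall_score(y_true, y_pred):.2f}")   # 0.20, matters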

Compensation

All positions include equity in Ceridwen.ai. Salaries will be determined by role scope, experience, and what you bring to the table. We will not insult you with a lowball offer, and we expect you not to waste our time with inflated expectations disconnected from contribution.

The Builder Clause

We don't care where you went to school. We don't care if you went to school. Our founder is self-taught, started coding at 13, and built a 602,000-line cognitive architecture without a CS degree.

Meet the qualifications, or show us what you've built. Either path works. Both paths demand excellence.

Apply

Ready to move? Send a short note with relevant proof of work: shipped projects, metrics you moved, or a draft plan for your first 30 days.