Ceridwen.ai

DevOps / Site Reliability Engineer

Full-Time | Remote (US) | Equity + Salary Upon Revenue


The Role

You keep things alive. MABOS Cloud must maintain the same reliability standard as the local architecture — production-grade, government-grade, built to run unattended. When something breaks at 2am, you fix it. When something almost breaks at 2pm, you fix it before anyone notices.

What You'll Do

Build and maintain CI/CD pipelines for MABOS across local and cloud deployments
Design monitoring, alerting, and incident response systems
Implement automated recovery and self-healing infrastructure
Manage deployment rollouts across millions of local installations
Define and enforce SLOs that reflect 100-year reliability targets

Requirements

5+ years in SRE or DevOps roles at scale
Expert-level Kubernetes, Terraform, and infrastructure-as-code
GPU workload orchestration experience
You’ve written postmortems that actually prevented the next incident
You understand that “it works on my machine” is not a deployment strategy

Compensation

All positions include equity in Ceridwen.ai. Salaries have not yet been set; they will be determined by role scope, experience, and what you bring to the table. We will not insult you with a lowball offer, and we expect you not to waste our time with inflated expectations disconnected from contribution.

The Builder Clause

We don't care where you went to school. We don't care if you went to school. Our founder is self-taught, started coding at 13, and built a 602,000-line cognitive architecture without a CS degree.

Meet the qualifications, or show us what you've built. Either path works. Both paths demand excellence.

Apply

Ready to move? Send a short note with relevant proof of work: projects you've shipped, metrics you've moved, or a draft of your first 30 days.
