Ceridwen.ai
DevOps / Site Reliability Engineer
Full-Time | Remote (US) | Equity + Salary Upon Revenue
The Role
You keep things alive. MABOS Cloud must maintain the same reliability standard as the local architecture — production-grade, government-grade, built to run unattended. When something breaks at 2am, you fix it. When something almost breaks at 2pm, you fix it before anyone notices.
What You'll Do
- Build and maintain CI/CD pipelines for MABOS across local and cloud deployments
- Design monitoring, alerting, and incident response systems
- Implement automated recovery and self-healing infrastructure
- Manage deployment rollouts across millions of local installations
- Define and enforce SLOs that reflect 100-year reliability targets
Requirements
- 5+ years in SRE or DevOps roles at scale
- Expert-level Kubernetes, Terraform, and other infrastructure-as-code tooling
- GPU workload orchestration experience
- You’ve written postmortems that actually prevented the next incident
- You understand that “it works on my machine” is not a deployment strategy
Compensation
All positions include equity in Ceridwen.ai. Salaries have not been set yet and will depend on role scope, experience, and what you bring to the table. We will not insult you with a lowball offer, and we expect you not to waste our time with inflated expectations disconnected from contribution.
The Builder Clause
We don't care where you went to school. We don't care if you went to school. Our founder is self-taught, started coding at 13, and built a 602,000-line cognitive architecture without a CS degree.
Meet the qualifications, or show us what you've built. Either path works. Both paths demand excellence.
Apply
Ready to move? Send a short note with relevant proof of work: shipped projects, metrics you moved, or a draft plan for your first 30 days.