Coding is lightning fast, everything else is slow
The SDLC used to look like this:
Now it looks like this:
The coding part feels like a highway since coding agents have access to cheap verification via linters, tests, CI, and the like. They require very little input from humans to realize they're wrong, adjust trajectory, and get to correct.
Mitchell Hashimoto calls this harness engineering: "give the agent fast, high quality tools to automatically tell it when it is wrong."
This is not true for the stretch between PR to production. Production is complicated and agents need a LOT more hand holding from engineers to be productive here. After what feels like uninhibited speed in the coding phase, this stretch feels like a dirt road.
It needs what coding already has: a harness that gives fast, automatic feedback so the work can move on its own. That is the missing piece for extending the highway all the way to production.
Cleric is that production harness. It carries each change as far as its confidence goes, and asks for your call on the rest.
What does a production harness look like?
A coding agent's harness verifies code against a spec. A production harness verifies a change against its intent. It needs to do so for every change, all the way from your IDE into production. Three properties of production make this a hard problem:
- Unbounded. For coding agents, understanding is scoped to the codebase. With production there is no limit. It could be the code, the infra, an external dependency, or your users. It differs every time.
- Dynamic. When you create a worktree of your repository to work on a feature, you're effectively giving the agent a static version of your code. That's not possible in production. It's changing every minute, if not every second. It's a bit like Schrödinger's cat: every time you peek at it, it's changed a little bit.
- Transient. Issues in code are reproducible. Finding how to reproduce one is the tough part, but it's possible. In production, an issue could be a complete one-off, or only happen under certain extreme conditions. A re-run or a replay isn't always possible.
The investigation portion of Cleric was the easy part. Cleric's production harness does a few things to make it easy to verify and improve a change going to production:
- Proposes theories on what could have caused the issue, before any evidence is collected, to keep the agent grounded on dynamic terrain.
- Learns in the background to ground the agent in the idiosyncrasies of your specific environment.
- Maintains a model of "normal": the payments service p95 response is 200ms, so 500ms is not normal.
- Cross-references multiple findings across multiple sources to raise confidence. The assertion "service A's errors are caused by a change in service B" yields a trace showing the service B call and timestamp, plus the actual code change in service B.
- Verifies post facto. Production feedback is slow. You have to follow a change for days, sometimes weeks, to know whether it worked, or whether something unintended crept in that needs addressing. Cleric performs this work automatically in the background.
Where this is going
Cleric started as an AI SRE: find an issue, fix it, verify the fix. We're now pointing the same harness at the whole SDLC, tracking every change as it moves from your IDE into production, making sure it did what you meant, and fixing it when it didn't.
We're opening early access to interested teams who are all-in on coding agents.
Let Cleric take your changes to production.
For teams all-in on coding agents. Leave your work email and we'll be in touch to set up access.
You're on the list.
We'll reach out to set up your access.
Trusted by AI-native engineering teams