How Cleric verifies its own answers without a human in the loop
Code has unit tests. Self-driving has miles per disengagement. Production investigation has no equivalent, which is why every AI investigation tool needs an engineer to grade its answers. Cleric reconstructs that signal from environment state after the fact, and feeds the verdict back into the model that produced it.
Most agents stop at the answer
When an event arrives, a typical agent reads the data it can see, proposes a fix, ships it, and stops there. It never finds out whether the fix held, whether the incident recurred, or whether an engineer overrode the call, so it has no feedback signal and nothing to learn from.
Cleric models how your team already solves problems
The Decision Model is built unsupervised from the traces your engineers and agents already leave behind: Slack threads, PRs, tool calls, corrections. It learns which alerts your team treats as noise, which services depend on what, and how a similar incident was resolved last time, so every investigation reasons against the specifics of your stack.
Cleric grades its own answers from the environment
After every investigation, a separate engine triangulates across alert recurrence, downstream stability, metric recovery, and engineer overrides. None of those signals is conclusive on its own, and combining them is what we built. The verdict becomes ground truth for that investigation, written back to the Decision Model.
Two things unlock once you can measure correctness
The agent takes on more work because it isn't bottlenecked on human review, and the system has a feedback signal it can train on: accuracy per problem type, updated every investigation. Those scores tell you where Cleric is reliable and where to keep an engineer in the loop.
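As a rough illustration of how per-problem-type accuracy could decide where an engineer stays in the loop, here is a minimal sketch. The `AccuracyTracker` class, the `threshold` and `min_samples` parameters, and the problem-type labels are hypothetical and chosen for illustration; they are not Cleric's API.

```python
from collections import defaultdict


class AccuracyTracker:
    """Running accuracy per problem type (hypothetical illustration)."""

    def __init__(self):
        # problem_type -> [correct_count, total_count]
        self._counts = defaultdict(lambda: [0, 0])

    def record(self, problem_type, correct):
        """Record one verified verdict for a problem type."""
        counts = self._counts[problem_type]
        if correct:
            counts[0] += 1
        counts[1] += 1

    def accuracy(self, problem_type):
        """Return accuracy in [0, 1], or None if nothing was recorded."""
        if problem_type not in self._counts:
            return None
        correct, total = self._counts[problem_type]
        return correct / total

    def needs_engineer(self, problem_type, threshold=0.9, min_samples=20):
        """Keep a human in the loop until there is enough evidence
        and accuracy clears the (assumed) threshold."""
        correct, total = self._counts.get(problem_type, (0, 0))
        if total < min_samples:
            return True
        return correct / total < threshold
```

With these assumed defaults, a problem type with 19 correct verdicts out of 20 would be cleared for autonomous handling, while a problem type with no history would still route to an engineer.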
Five components close the loop
- 01 Investigation engine: On every event, forms hypotheses against your environment and proposes a root cause.
- 02 Decision Model: What the investigation queries, built unsupervised from the traces your engineers and agents leave behind.
- 03 Verification engine: Grades the answer from the environment, without a human in the loop.
- 04 Calibration engine: Turns verified outcomes into better strategies through replay and self-play.
- 05 Discovery engine: Maps your environment so the Decision Model reasons against your stack, not a generic one.
Investigation engine
On every event, the investigation engine forms hypotheses, queries whatever it needs from your production environment to test them, and proposes a root cause.
Its quality is determined by the Decision Model underneath it rather than by the reasoning model itself, which is why the same investigation on a different stack produces different answers: the priors it pulls are different.
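To make the role of priors concrete, here is a toy sketch in which the same candidate hypotheses are ranked differently on two stacks. The function `rank_hypotheses`, the hypothesis labels, and the prior values are all invented for illustration and say nothing about how Cleric represents priors internally.

```python
def rank_hypotheses(candidates, priors, default_prior=0.01):
    """Order candidate root causes by an environment-specific prior.

    candidates: list of hypothesis labels (hypothetical).
    priors: dict mapping label -> prior weight learned for one stack.
    Candidates with no known prior get a small default weight.
    """
    return sorted(
        candidates,
        key=lambda c: priors.get(c, default_prior),
        reverse=True,
    )


candidates = ["bad_deploy", "db_saturation", "noisy_alert"]

# Assumed priors for two different environments.
stack_a = {"bad_deploy": 0.6, "db_saturation": 0.3}
stack_b = {"db_saturation": 0.7, "bad_deploy": 0.1}

first_on_a = rank_hypotheses(candidates, stack_a)[0]
first_on_b = rank_hypotheses(candidates, stack_b)[0]
```

The ranking logic is identical in both calls; only the priors differ, so the first hypothesis tested differs too.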
Decision Model
The Decision Model holds what Cleric knows about your environment and what has worked in it.
It contains three structures: a world model of your services and their dependencies, strategies distilled from real investigations, and verified outcomes scored by problem type. None of it is configured by hand; Cleric learns each layer from what your engineers and agents are already doing.
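One way to picture the three structures is as plain data types. This is a sketch under assumptions: the class names, field names, and the `strategies_for` helper are hypothetical, and the real Decision Model is learned rather than populated by hand as it is here.

```python
from dataclasses import dataclass, field


@dataclass
class WorldModel:
    # service name -> set of services it depends on
    dependencies: dict[str, set[str]] = field(default_factory=dict)


@dataclass
class Strategy:
    # An investigation approach distilled from past incidents.
    name: str
    problem_type: str
    steps: list[str]


@dataclass
class VerifiedOutcome:
    # The verdict on one finished investigation.
    investigation_id: str
    problem_type: str
    correct: bool


@dataclass
class DecisionModel:
    world: WorldModel = field(default_factory=WorldModel)
    strategies: list[Strategy] = field(default_factory=list)
    outcomes: list[VerifiedOutcome] = field(default_factory=list)

    def strategies_for(self, problem_type):
        """Return the strategies recorded for one problem type."""
        return [s for s in self.strategies if s.problem_type == problem_type]
```

Keeping the world model, the strategies, and the verified outcomes as separate layers means each can be updated on its own schedule.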
Verification engine
After every investigation, the verification engine grades the answer from the environment.
It triangulates across alert recurrence, downstream stability, metric recovery, and engineer overrides. None of those signals is conclusive on its own, and combining them well is what we built: understanding their failure modes, weighting by context, and calibrating across environments. The verdict becomes ground truth for that investigation, and feeds calibration.
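The triangulation idea can be sketched as a weighted vote over the four signals. This is a deliberately simplified illustration: the `Signals` type, the fixed weights, and the threshold are assumptions, and static weights stand in for the context-dependent weighting described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signals:
    # None means the signal was not observable for this investigation.
    alert_recurred: Optional[bool]
    downstream_stable: Optional[bool]
    metrics_recovered: Optional[bool]
    engineer_overrode: Optional[bool]


# Hypothetical fixed weights; a real system would weight by context.
DEFAULT_WEIGHTS = {
    "alert_recurred": 1.0,
    "downstream_stable": 1.0,
    "metrics_recovered": 1.0,
    "engineer_overrode": 2.0,
}


def _vote(value, supports_when):
    """Map an observation to +1 (supports the answer), -1 (contradicts it),
    or None when the signal is missing."""
    if value is None:
        return None
    return 1 if value == supports_when else -1


def grade(signals, weights=None, threshold=0.5):
    """Combine the observed signals into (verdict, score), score in [-1, 1]."""
    weights = weights or DEFAULT_WEIGHTS
    votes = {
        "alert_recurred": _vote(signals.alert_recurred, False),
        "downstream_stable": _vote(signals.downstream_stable, True),
        "metrics_recovered": _vote(signals.metrics_recovered, True),
        "engineer_overrode": _vote(signals.engineer_overrode, False),
    }
    observed = {k: v for k, v in votes.items() if v is not None}
    if not observed:
        return "inconclusive", 0.0
    total = sum(weights[k] for k in observed)
    score = sum(weights[k] * v for k, v in observed.items()) / total
    if score >= threshold:
        return "correct", score
    if score <= -threshold:
        return "incorrect", score
    return "inconclusive", score
```

Under these assumed weights, three supportive signals plus an engineer override land in the inconclusive band, which reflects the point that no single signal settles the verdict on its own.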
Calibration engine
Calibration converts verified outcomes into better strategies.
It replays history, runs self-play to test whether alternative paths would have caught a known issue, and distills patterns from runbooks, threads, and the agent's own decision traces. Strategies that consistently produce correct diagnoses get reinforced; the ones that don't are dropped.
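Replay against history can be sketched as scoring each strategy on past incidents whose root cause has already been verified. The `replay` function, the strategy callables, the incident fields, and the `drop_below` cutoff are hypothetical; self-play and pattern distillation are not modeled here.

```python
def replay(strategies, history, drop_below=0.5):
    """Score strategies against verified history.

    strategies: dict of name -> callable(incident) -> diagnosis label.
    history: list of (incident, verified_cause) pairs.
    Returns (kept, dropped): kept maps name -> hit rate, dropped is a list
    of names whose hit rate fell below the (assumed) cutoff.
    """
    if not history:
        # Nothing to replay, so there is no evidence to drop anything.
        return {name: None for name in strategies}, []

    kept, dropped = {}, []
    for name, diagnose in strategies.items():
        hits = sum(1 for incident, cause in history if diagnose(incident) == cause)
        rate = hits / len(history)
        if rate < drop_below:
            dropped.append(name)
        else:
            kept[name] = rate
    return kept, dropped


# Toy history: each incident is a dict with one invented field.
history = [
    ({"recent_deploy": True}, "bad_deploy"),
    ({"recent_deploy": False}, "db_saturation"),
    ({"recent_deploy": True}, "bad_deploy"),
]

strategies = {
    "blame_deploy_if_recent": lambda i: "bad_deploy" if i["recent_deploy"] else "db_saturation",
    "always_blame_db": lambda i: "db_saturation",
}

kept, dropped = replay(strategies, history)
```

In this toy run the strategy that checks for a recent deploy matches every verified cause and is kept, while the one that always blames the database matches one incident in three and is dropped.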
Discovery engine
Discovery keeps the Decision Model accurate as your environment changes.
It maps services, dependencies, deploy patterns, observability conventions, and code ownership. As infrastructure changes, the world model re-maps, so investigations always reason against the current state of your stack rather than a snapshot.
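Re-mapping can be pictured as replacing the stored dependency graph with the freshly observed one and recording what changed. This sketch covers only service dependencies; the `remap` function and the service names are hypothetical, and deploy patterns, observability conventions, and ownership are left out.

```python
def remap(current, observed):
    """Replace the stored dependency map with the observed one.

    current, observed: dict of service -> set of services it depends on.
    Returns (new_model, added_edges, removed_edges), where each edge is a
    (service, dependency) tuple and the edge lists are sorted.
    """
    def edges(model):
        return {(svc, dep) for svc, deps in model.items() for dep in deps}

    old_edges, new_edges = edges(current), edges(observed)
    # Copy the sets so later changes to `observed` do not leak into the model.
    new_model = {svc: set(deps) for svc, deps in observed.items()}
    return new_model, sorted(new_edges - old_edges), sorted(old_edges - new_edges)


current = {"checkout": {"payments", "inventory"}}
observed = {"checkout": {"payments", "pricing"}, "pricing": set()}

model, added, removed = remap(current, observed)
```

Here `checkout` gains a dependency on `pricing` and loses the one on `inventory`, so a later investigation would no longer follow the stale edge.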
See it run on your real incidents
Cleric publishes accuracy per problem type, so you know where to let it act and where to keep an engineer in the loop.