August 14, 2025

What's Good for AI Agents is Good for Engineers

Engineers and AI agents work best under the same conditions. They need clean codebases, fast feedback, and clear observability.
By Shahram Anver

We found that the engineering practices that make our system legible to an AI agent are the same ones that reduce toil for our own developers.

This isn't a coincidence; it's a direct fix for the operational load that kills developer velocity. Our best engineers were constantly being pulled from feature work to put out fires, leading to endless context-switching and burnout.

We built a better, more sustainable environment for our team by enforcing the discipline an AI needs: clear interfaces, minimal abstractions, and fast feedback loops. It turns out, what's good for the agent is good for the engineer.

The Three Loops of Development

Every dev process runs through three concentric feedback loops:

  • Inner Loop: From your IDE to running code locally (seconds to minutes)
  • Outer Loop: Pushing code to CI, getting feedback, and fixing issues (minutes to hours)
  • Production Loop: Deploying to production, monitoring, and responding to real-world behavior (hours to days)


The cost and complexity of fixing issues explode as you move outward. A syntax error in your IDE takes seconds to fix. A production incident might require hotfixes, rollbacks, and war rooms.

Until GitHub Copilot showed up in 2022, each loop was entirely human-driven. We wrote code character-by-character, spelunked through CI logs manually, and got paged at 3 AM to dig through dashboards. AI is now rewiring each of these loops.

The Foundation: Good Hygiene for Humans and Agents

An agent getting lost in a codebase that's seven abstractions deep is facing the same problem as a new hire. A cryptic alert is useless to both.

So we built our process around a few key principles:

  • Keep all context in one place. We use a monorepo where all of our code (UI, backend, agents, and scripts) lives. This makes it easier for humans to navigate and gives AI agents the complete context they need without having to fetch information from disparate sources.
  • Minimize cognitive overhead with a flat architecture. Each engineer owns a specific domain, which forces us to maintain clean boundaries. This results in what we call a “flat” codebase, where we intentionally keep the layers of indirection and abstraction low. An engineer shouldn't have to dig through a deep call stack to understand what a piece of code does; this makes the system easier for anyone to reason about, from a new hire to an AI agent.
  • Treat tests as the primary feedback loop. A change is only "good" if there's a clear, automated signal confirming it. For us, that signal is a comprehensive test suite that acts as a reward function. If the tests pass, the change is valid. We use CI as an enforcement mechanism to run these tests before merging, but the core principle is that the same suite can be run locally, giving developers a tight, immediate feedback loop on their own machine.
  • Make system behavior self-evident. Our environments are instrumented with the metrics, logs, and traces needed to make debugging obvious and fast. Instead of guessing, both engineers and agents can directly query the system to get a clear picture of its current state and past behavior.
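
To make that last point concrete, here is a minimal sketch of what "directly querying the system" can look like, assuming a Prometheus endpoint. The URL, job label, and metric are placeholders for illustration, not a description of our actual stack.

```python
# mem_check.py -- query the monitoring system directly instead of guessing.
# The endpoint and job label are placeholders; adapt to your environment.
import requests

PROMETHEUS_URL = "http://prometheus.internal:9090"


def memory_bytes(job: str) -> float:
    # Instant query against the Prometheus HTTP API.
    resp = requests.get(
        f"{PROMETHEUS_URL}/api/v1/query",
        params={"query": f'sum(process_resident_memory_bytes{{job="{job}"}})'},
        timeout=10,
    )
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    return float(result[0]["value"][1]) if result else 0.0


if __name__ == "__main__":
    used = memory_bytes("api-server")
    print(f"api-server resident memory: {used / 1e9:.2f} GB")
```

A human on call, an AI agent, or a CI job can all run the same query and get the same answer, which is the whole point.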

None of this is revolutionary, and that’s the point. For years, we’ve treated these principles as ideals. AI agents treat them as requirements. It turns out the secret to building a great environment for AI is to first build a great environment for your engineers.

The Inner Loop: Getting Past the Boilerplate

In the inner loop, what some call “vibe coding” is becoming standard. And no, this isn't an excuse to turn off your brain and let the AI write spaghetti code. What gets merged is ultimately your responsibility. An AI still needs clear direction, and the best way to provide that is with a well-defined reward function: a solid suite of tests.

But building that test suite first requires you to do the actual engineering work: have a clear product vision, design how new code fits into the existing architecture, and then capture that expected behavior in your tests. Once that framework is in place, then you can prompt your way to passing them.
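
As a hypothetical illustration of what "capture the expected behavior first" looks like, here is a small test file written before any implementation exists. The module and class names are made up for the example; the tests are the reward function, and an agent (or a human) iterates until they pass.

```python
# test_rate_limiter.py -- written before the implementation exists.
# These tests define the expected behavior; the agent's job is to make them pass.
from rate_limiter import SlidingWindowLimiter  # hypothetical module under test


def test_allows_requests_under_the_limit():
    limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60)
    assert all(limiter.allow("user-1") for _ in range(3))


def test_rejects_requests_over_the_limit():
    limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60)
    for _ in range(3):
        limiter.allow("user-1")
    assert limiter.allow("user-1") is False


def test_limits_are_tracked_per_caller():
    limiter = SlidingWindowLimiter(max_requests=1, window_seconds=60)
    limiter.allow("user-1")
    assert limiter.allow("user-2") is True
```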

We use tools like Claude Code and Gemini extensively for this. Sure, the code gets written faster, but the real win is in reinvesting that time in up-front thinking, so that the final product is simpler and more maintainable. Our agents run tests and end-to-end checks as we code, creating the same rapid feedback that helps us stay in flow.

The Outer Loop: AI for Broken Builds

A CI build breaks. The developer stops what they're doing, sighs, and starts digging through logs. 

We have Cleric do that first pass automatically. Instead of just a red build, the developer gets a diagnosis of what broke and why.

This turns a 10-minute manual investigation into a 1-minute review. You immediately know why it failed and what to do next. That means fewer interruptions and faster fixes.

Code review is also a great use case in the outer loop. The hardest part of code review isn't catching bugs; it's enforcing architectural principles and reducing accidental complexity. To solve this, we created a simple vibes test in our CI.

Any engineer can add rules to a markdown file in our monorepo called code_requirements.md. It contains simple, plain-English guardrails like:

## Rules

- Do not add new docstrings to code that does not already have them

- The long-term vision for the ResponseHandlers is for them to be event-driven. Don't add methods that assume synchronous execution.

Our "vibes test" compares every code diff against these requirements, giving engineers a way to check for architectural alignment on demand. Anyone can run it locally while coding to get immediate feedback, ensuring their changes are on the right track long before a pull request is ever opened. It’s a simple way to codify our conventions and keep the codebase consistent, making it easier for both humans and agents to build in the right direction.

The Production Loop: Taming Production Alerts

The production loop is the most stressful. Production alerts, especially false alarms, are a constant source of anxiety. So instead of our on-call engineer dropping everything to investigate a scary-looking memory alert, we have Cleric do the first analysis.

In one recent case, we got a memory alert from a customer’s instance. Cleric determined it was normal post-deployment behavior and confirmed the system was stable. Our engineer scanned the report in 30 seconds, confirmed it was a false alarm, and got back to coding.

For real incidents, the AI doesn't just alert; it investigates. Instead of a cryptic error, engineers get a hypothesis of what could have gone wrong, backed by evidence.

The fix stays in the engineer’s hands, but the detective work disappears.

So What's the Point? Smaller, Better Teams.

The old ideal of staying small and lean is more achievable than ever. Using AI as a force multiplier lets us build a team with a high signal-to-noise ratio. This lets us operate as a focused group of domain experts who can dedicate their time to high-value problems, not operational noise.

Platform teams can focus on architecture, not alert triage. Application engineers can focus on complex business logic and innovation. Both are supported by an AI assistant that handles the rote investigation.

The future of engineering is a human-agent collaboration that closes the loop between idea, code, and impact. It gives engineers what they crave most: uninterrupted time to actually build.

Ready to give your on-call a head start?

Start for free, or talk to us about a plan built for your team’s scale and security needs.
Book a Demo