The Engine Room: How We Use AI to Code, Build, and Ship Real Software

The gap between "we use AI" and actually using AI
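In 2022, GitHub published a controlled experiment in which developers using GitHub Copilot completed a coding task 55% faster than developers working without it. That number gets quoted constantly in agency marketing. What almost nobody explains is what that finding does not cover: the experiment measured how quickly developers finished one well-defined task, not the long-term quality or maintainability of the code they produced. In our experience, the speed gain only lasts when developers actively review and correct AI output. When they accept suggestions uncritically, errors slip through and the time saved is spent later on fixes.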
This is the gap between agencies that say "we use AI" and teams that have actually built a repeatable workflow around it. The tool is not the differentiator. The protocol is.
Here is exactly how AI fits into our development process — what it does well, where it fails, and the specific guardrails we enforce to keep it from quietly breaking things.
What AI is genuinely good at in a codebase
AI coding tools — we work primarily with Claude Code, Cursor, and GitHub Copilot depending on the task — excel at a specific category of work: high-volume, low-ambiguity generation. This includes:
- Boilerplate and scaffolding. Setting up a new API route, creating a component with standard props, writing repetitive CRUD operations. Tasks where the pattern is clear and the stakes of a small error are low.
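- Test generation. Given a function signature and a few example inputs, AI can draft a broad test suite, including edge cases a developer under deadline pressure might skip, in a fraction of the time it takes to write one by hand. Every expected value still needs human review, because a generated test can lock in a bug as easily as catch one. This is one of the highest-ROI applications we have found.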
- Documentation and inline comments. Explaining what a complex function does, generating JSDoc, writing migration notes. Tasks that developers deprioritize under deadline pressure.
- First-draft implementations. Translating a well-scoped requirement into working code. The output usually needs revision, but having a concrete starting point cuts time significantly.
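To make the test-generation point concrete, here is the shape of suite we mean, sketched in Python with the standard `unittest` module. The `slugify` function and its cases are hypothetical, chosen only for illustration: from two or three seed examples, an assistant can expand a table like this, and a reviewer can then check each expected value line by line.

```python
import re
import unittest


def slugify(title: str) -> str:
    """Hypothetical function under test: a lowercase, hyphen-separated slug."""
    # Replace every run of non-alphanumeric characters with one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    # Drop hyphens left at either end by leading or trailing punctuation.
    return slug.strip("-")


class SlugifyTests(unittest.TestCase):
    # (input, expected) pairs: the kind of table an assistant can expand
    # from a few examples, and a reviewer can verify one line at a time.
    CASES = [
        ("Hello World", "hello-world"),
        ("  Leading and trailing  ", "leading-and-trailing"),
        ("Multiple   spaces", "multiple-spaces"),
        ("Punctuation, too!", "punctuation-too"),
        ("Already-a-slug", "already-a-slug"),
        ("", ""),
        ("!!!", ""),
    ]

    def test_cases(self):
        for title, expected in self.CASES:
            # subTest reports each failing case separately instead of stopping.
            with self.subTest(title=title):
                self.assertEqual(slugify(title), expected)
```

Saved as a module, the suite runs with `python -m unittest`. Keeping one case per line is a deliberate choice: a wrong expectation stands out in review instead of hiding inside a long test body.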
The common thread is that AI performs best when the problem is well-defined and the output is easy to verify. It performs worst when the problem is ambiguous, when it requires understanding the full history of a codebase, or when a subtle logical error could propagate silently.
Where AI reliably fails — and why agencies don't talk about it
AI coding tools have a specific failure mode that is easy to miss in demos but shows up constantly in production: they are confidently wrong. The output looks correct, passes a surface-level review, and only breaks under a specific condition you did not think to test for.
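A constructed illustration of "confidently wrong" (not taken from a client project): the hypothetical `chunk_buggy` below is the kind of draft that reads correctly and passes the obvious test, yet silently drops data whenever the list length is not a multiple of the chunk size. The corrected `chunk` steps over the full length instead.

```python
def chunk_buggy(items: list, size: int) -> list:
    # Plausible-looking draft: the range stops early, so a trailing partial
    # chunk is silently dropped when len(items) is not a multiple of size.
    return [items[i:i + size] for i in range(0, len(items) - size + 1, size)]


def chunk(items: list, size: int) -> list:
    """Split items into consecutive lists of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    # Stepping over the full length keeps the final, shorter chunk.
    return [items[i:i + size] for i in range(0, len(items), size)]


# The happy-path check a quick review might run: both versions agree.
assert chunk_buggy([1, 2, 3, 4], 2) == chunk([1, 2, 3, 4], 2) == [[1, 2], [3, 4]]

# The condition nobody thought to test: a length that does not divide evenly.
print(chunk_buggy([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4]]  (the 5 is lost)
print(chunk([1, 2, 3, 4, 5], 2))        # [[1, 2], [3, 4], [5]]
```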
The most common failure patterns we have seen:
- Context collapse. AI tools have a limited context window. On a large codebase, they often generate code that is locally correct but globally inconsistent — it conflicts with a pattern established elsewhere that the model did not see.
- Hallucinated dependencies. AI will sometimes import a function that does not exist, reference an API that has been deprecated or removed, or invent a library method that was never real. These look plausible when written and often fail only later, at build time or, in dynamically typed code, at runtime.
- Refactor drift. When asked to fix a bug, AI frequently "improves" surrounding code it was not asked to touch. These unsolicited changes are a common source of regressions.
- Infinite debugging loops. When an AI-generated fix does not work, asking the AI to fix its own fix often produces an increasingly convoluted solution. Without a human to step back and rethink the approach, the codebase accumulates layers of compensating patches.
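One inexpensive guard against hallucinated dependencies is to check every referenced name before review even starts. The sketch below assumes a Python codebase and a list of dotted references already extracted from the AI-generated code; `missing_references` is a hypothetical helper, not a feature of any tool named above. It uses only the standard `importlib` module, and because resolving a name actually imports its module, it should run in the same trusted environment the code would run in. In a statically typed codebase, the compiler or type checker covers much of this.

```python
import importlib
import importlib.util


def missing_references(references: list[str]) -> list[str]:
    """Return the dotted names in `references` that do not resolve.

    Each reference looks like "module" or "module.attr[.attr...]".
    Resolving a name imports its module, so use a trusted environment.
    """
    missing = []
    for ref in references:
        parts = ref.split(".")
        module, attrs = None, []
        # Find the longest dotted prefix that is an importable module.
        for i in range(len(parts), 0, -1):
            name = ".".join(parts[:i])
            try:
                spec = importlib.util.find_spec(name)
            except (ImportError, ValueError):
                # Raised when a parent is missing or is not a package.
                spec = None
            if spec is not None:
                module, attrs = importlib.import_module(name), parts[i:]
                break
        if module is None:
            missing.append(ref)  # no importable module at all
            continue
        # Walk the remaining attributes; any gap means the name was invented.
        obj = module
        for attr in attrs:
            if not hasattr(obj, attr):
                missing.append(ref)
                break
            obj = getattr(obj, attr)
    return missing


if __name__ == "__main__":
    print(missing_references(["json.dumps", "os.path.join", "json.pretty_print"]))
```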
None of these are arguments against using AI. They are arguments for having a senior engineer in the loop at every step.
The Anchor Fix protocol
The single most useful practice we have developed for AI-assisted development is what we call the Anchor Fix. Before rolling out any architectural change across a codebase, we force the AI to implement the change on one isolated endpoint, screen, or module first.
We then run that single implementation through its full test suite, check the logs, and verify behavior manually. If it passes, we use that implementation as the anchor — the reference pattern — and apply it to the rest of the codebase. If it fails, we catch the failure at the smallest possible scope before it propagates.
This approach adds a small amount of time to each implementation cycle. It saves a large amount of time when something breaks, because the blast radius is contained and the root cause is obvious.
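The Anchor Fix sequence can be sketched as a small driver. This is a minimal illustration under stated assumptions, not our internal tooling: `anchor_rollout`, `apply_change`, and `verify` are hypothetical names, and `verify` stands in for the full test run, log check, and manual verification described above.

```python
from typing import Callable


def anchor_rollout(
    targets: list[str],
    apply_change: Callable[[str], None],
    verify: Callable[[str], bool],
) -> dict:
    """Change one anchor target, verify it, then roll the change out to the rest.

    Hypothetical names: targets are endpoints, screens, or modules;
    apply_change performs the AI-assisted edit on one target; verify runs
    tests, log checks, and manual sign-off, returning True on success.
    """
    if not targets:
        raise ValueError("need at least one target")
    anchor, rest = targets[0], targets[1:]

    # Step 1: apply the change to the anchor only and verify it in isolation.
    apply_change(anchor)
    if not verify(anchor):
        # The failure is caught at the smallest scope; no other target was touched.
        return {"anchor": anchor, "status": "anchor_failed",
                "applied": [anchor], "failed": [anchor]}

    # Step 2: the verified anchor is the reference pattern for every other target.
    applied, failed = [anchor], []
    for target in rest:
        apply_change(target)
        applied.append(target)
        if not verify(target):
            failed.append(target)
            break  # stop at the first failure instead of compounding it
    status = "stopped" if failed else "rolled_out"
    return {"anchor": anchor, "status": status, "applied": applied, "failed": failed}


if __name__ == "__main__":
    touched: list[str] = []
    result = anchor_rollout(
        ["/api/orders", "/api/users", "/api/invoices"],  # hypothetical endpoints
        apply_change=touched.append,
        verify=lambda target: True,  # stand-in for a real verification step
    )
    print(result["status"], touched)
```

Stopping at the first failed target, rather than finishing the rollout and sorting out failures afterward, is what keeps the blast radius small.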
The three rules we enforce on every AI interaction
Beyond the Anchor Fix, we operate with three standing rules that apply to every AI coding session:
1. No blind refactoring
When debugging a specific issue, AI is explicitly instructed not to apply formatting improvements, rename variables, or restructure unrelated code. Every change in a debugging session must be traceable to the symptom being fixed. This keeps the diff clean and makes rollbacks predictable.
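Part of this rule can be checked mechanically. The sketch below uses Python's standard `difflib` to flag changed lines that differ only in whitespace, a common sign of unrequested reformatting in a debugging diff. `formatting_only_changes` is a hypothetical helper and a heuristic: it does not catch renames or restructuring, and in whitespace-sensitive code such as Python indentation, a whitespace change can alter behavior, so flagged lines still need a human look.

```python
import difflib


def formatting_only_changes(before: str, after: str) -> list[tuple[str, str]]:
    """Return (old, new) line pairs whose only difference is whitespace."""
    old_lines = before.splitlines()
    new_lines = after.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    flagged = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only same-sized "replace" blocks can be compared line for line.
        if tag != "replace" or (i2 - i1) != (j2 - j1):
            continue
        for old, new in zip(old_lines[i1:i2], new_lines[j1:j2]):
            # Identical once all whitespace is removed: likely a pure reformat.
            if "".join(old.split()) == "".join(new.split()):
                flagged.append((old, new))
    return flagged


if __name__ == "__main__":
    before = "def total(a, b):\n    return a+b\n"
    after = "def total(a, b):\n    return a + b\n"
    for old, new in formatting_only_changes(before, after):
        print(f"formatting-only: {old!r} -> {new!r}")
```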
2. Map before you touch
Before the AI modifies any complex logic, it must produce a written list of every file and function it intends to change and why. If the proposed change list is vague or unexpectedly long, the prompt is rejected and rewritten. An AI that cannot explain what it is about to do is not ready to do it.
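A change plan is easier to enforce when it has a fixed shape. Below is a minimal sketch, assuming the AI's plan is captured as a list of `PlannedChange` records; the record fields, the five-file limit, and the four-word minimum for a reason are hypothetical thresholds for illustration, not fixed numbers from our process. `review_plan` rejects vague or oversized plans, and `out_of_plan` compares the files the AI actually changed against what it said it would change.

```python
from dataclasses import dataclass


@dataclass
class PlannedChange:
    """One entry in the change list the AI must produce before editing."""
    path: str      # file the AI intends to modify
    function: str  # function or symbol inside that file
    reason: str    # why the change is needed for this task


def review_plan(plan: list[PlannedChange], max_files: int = 5,
                min_reason_words: int = 4) -> list[str]:
    """Return reasons to reject the plan; an empty list means it can proceed."""
    problems = []
    if not plan:
        problems.append("plan is empty")
    files = {change.path for change in plan}
    if len(files) > max_files:
        problems.append(f"touches {len(files)} files (limit {max_files})")
    for change in plan:
        # A reason shorter than min_reason_words (e.g. "cleanup") counts as vague.
        if len(change.reason.split()) < min_reason_words:
            problems.append(
                f"{change.path}:{change.function} has a vague reason: {change.reason!r}")
    return problems


def out_of_plan(plan: list[PlannedChange], changed_files: list[str]) -> list[str]:
    """Files the AI actually changed that never appeared in its plan."""
    planned = {change.path for change in plan}
    return sorted(set(changed_files) - planned)


if __name__ == "__main__":
    plan = [PlannedChange("api/orders.py", "create_order",
                          "reject orders with a zero quantity")]
    print(review_plan(plan))
    print(out_of_plan(plan, ["api/orders.py", "utils/format.py"]))
```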
3. Restore points before risk
Before any risky architectural change, a restore point is committed to version control. If an AI-generated block causes a regression, we roll back to the restore point and approach the problem differently — we do not ask the AI to fix what the AI broke. That loop wastes hours and produces unstable code.
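A restore point can be as simple as a commit plus a tag. The sketch below builds the standard Git commands for creating one and rolling back to it. The `restore/<label>` tag naming is a hypothetical convention, and `run` only prints the commands unless `dry_run=False` is passed explicitly, so nothing touches a repository by accident.

```python
import subprocess


def restore_point_commands(label: str) -> list[list[str]]:
    """Git commands that record a restore point before a risky change."""
    tag = f"restore/{label}"  # label must be a valid, unused Git ref name
    return [
        ["git", "add", "--all"],
        # --allow-empty lets the commit succeed even if the tree is already clean.
        ["git", "commit", "--allow-empty", "-m", f"Restore point before: {label}"],
        ["git", "tag", tag],
    ]


def rollback_commands(label: str) -> list[list[str]]:
    """Git commands that return the current branch to the restore point."""
    # reset --hard discards every commit and uncommitted edit after the tag.
    return [["git", "reset", "--hard", f"restore/{label}"]]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print the commands, or execute them and stop at the first failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    label = "auth-refactor"  # hypothetical label
    run(restore_point_commands(label))
    run(rollback_commands(label))
```

`git reset --hard` rewrites the local branch and discards uncommitted work, so it suits a feature branch that has not been shared. Once a branch has been pushed and others may have pulled it, `git revert` is the safer way to undo the change.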
What this means for your project timeline
The practical effect of this workflow is that we can deliver a production-ready MVP in a fraction of the time a traditional agency would quote — without the instability that comes from treating AI as a shortcut rather than a tool.
The speed gain is real. GitHub's research and our own project data both support it. But the gain is only durable when a senior engineer owns the output, enforces the guardrails, and makes the judgment calls that AI cannot.
When you work with a team that has built a real AI workflow, the deliverable is not just faster code. It is faster code that holds up when your user base grows, when requirements change, and when the next developer opens the repo for the first time.
Questions to ask any agency that claims to use AI
If you are evaluating development partners, these questions will separate teams with a real workflow from teams using AI as a marketing claim:
- What specific tools do you use, and at which stages of development?
- How do you handle AI-generated regressions?
- Does a senior engineer review every AI-generated change before it is merged?
- What is your rollback procedure when an AI change breaks something in production?
- Can you show me a recent project where AI accelerated delivery and explain exactly how?
An agency with a real process will answer these without hesitation. An agency using AI as a buzzword will give you vague answers about "leveraging cutting-edge tools."