The framework applied to real problems
Real examples of Plan, Implement, Review applied to AI-delegated work. Each one shows a specific failure mode and how structured human review caught what implementation alone would have missed.
Security migration — landed in under an hour with proper planning
A multi-tenant auth migration scoped at one to two weeks. The plan absorbed the complexity. The review caught three issues implementation missed.
The problem. A multi-tenant application needed to replace its entire authentication system — the identity provider, the session model, the tenant resolution chain, and every route that touched user identity. The migration was security-sensitive: a mistake in tenant separation would expose one client's data to another. Under a conventional approach, this would be scoped at one to two weeks of engineering with manual QA.
What the operating model changed. The plan absorbed the complexity before implementation started. Every coupling between the old and new auth system was mapped in text. Non-negotiables were defined: tenant isolation could not degrade, session handling had to be verified against every entry point, and no route could be left on the old identity model. Acceptance criteria were written as tests before a single line of implementation began. Implementation compressed once the brief was stable — the core migration landed in a fraction of the conventional estimate, not because anyone typed faster, but because the specification removed the ambiguity that normally slows engineering down.
What changed. Verification was repeatable. Instead of a developer clicking through flows and remembering what worked, every critical path had an automated check. Failures were concrete and specific, not "I think this might be broken." The review checkpoint caught three issues that implementation alone missed — including a subtle tenant-scoping bug that would only have surfaced in production under a specific authentication flow.
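What an acceptance criterion written as a test can look like: the sketch below is illustrative only (the `Session`, `Repository`, and invoice names are assumptions, not the actual system), but it captures the non-negotiable from the brief — a request authenticated for one tenant must never resolve another tenant's data.

```typescript
// Hypothetical tenant-isolation acceptance check. All names are illustrative.
type Session = { userId: string; tenantId: string };
type Row = { id: string; tenantId: string };

interface Repository {
  findInvoice(session: Session, invoiceId: string): Row | null;
}

// A minimal in-memory repository that scopes every lookup to the session tenant.
function makeRepository(rows: Row[]): Repository {
  return {
    findInvoice(session, invoiceId) {
      const row = rows.find((r) => r.id === invoiceId);
      // Tenant isolation: a row belonging to another tenant is treated as not found.
      return row && row.tenantId === session.tenantId ? row : null;
    },
  };
}

const repo = makeRepository([
  { id: "inv-1", tenantId: "tenant-a" },
  { id: "inv-2", tenantId: "tenant-b" },
]);

const sessionA: Session = { userId: "u1", tenantId: "tenant-a" };
console.log(repo.findInvoice(sessionA, "inv-1") !== null); // own tenant: found
console.log(repo.findInvoice(sessionA, "inv-2") === null); // cross-tenant: rejected
```

A check like this is cheap to run on every change, which is what makes the verification repeatable rather than dependent on someone remembering to click through the flow.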
System refactoring — AI agent re-scoped itself beyond the agreed brief
An AI execution agent drifted back to cancelled workstreams. Human review against the agreed plan caught it instantly — preserving three rounds of deliberate scope reduction.
The problem. A complex system refactoring had been scoped, reviewed, and deliberately simplified over three rounds of planning. The original scope called for migrating every consumer of a legacy data model to a new one. After careful review, most consumers were already working correctly — the existing display layer was the right interface, not technical debt. The plan was reduced from eleven workstreams to three, with the cancelled items explicitly documented.
What the operating model changed. The implementation was delegated to an AI execution agent against the simplified brief. Partway through, the agent drifted back to the original broader scope. It had identified code using the old model and pattern-matched "old model equals needs migration" — a reasonable inference in isolation, but one that directly contradicted three rounds of deliberate planning. Three cancelled workstreams were quietly re-queued as planned work. The review checkpoint caught it immediately. The reviewer checked the agent's work queue against the agreed plan and saw scope that had been explicitly removed. The correction was instant: these are cancelled, not planned. The display layer is working as designed.
What changed. Without the structured review, the agent would have spent hours migrating code that was already correct — introducing risk, burning time, and creating a false sense of progress. This is a pattern inherent to AI delegation. The agent optimises for completeness rather than the brief. It sees something that looks wrong and fixes it, regardless of whether the plan agreed it was wrong. Only a review checkpoint anchored to the original plan catches the drift before it compounds. Human review of AI execution is not optional — it is the control surface.
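The review checkpoint here is mechanical enough to sketch: diff the agent's work queue against the agreed plan and flag anything the plan cancelled or never scoped. The workstream names below are hypothetical, but the shape of the check is the point.

```typescript
// Illustrative scope-drift check: compare an agent's queue to the agreed plan.
type Plan = { planned: Set<string>; cancelled: Set<string> };

function reviewQueue(plan: Plan, queue: string[]): { ok: string[]; drift: string[] } {
  const ok: string[] = [];
  const drift: string[] = [];
  for (const item of queue) {
    // Anything cancelled, or absent from the plan entirely, counts as drift.
    if (plan.planned.has(item) && !plan.cancelled.has(item)) ok.push(item);
    else drift.push(item);
  }
  return { ok, drift };
}

const plan: Plan = {
  planned: new Set(["migrate-auth-routes", "update-session-model", "tenant-resolver"]),
  cancelled: new Set(["migrate-display-layer", "rewrite-consumers"]),
};

// The agent has quietly re-queued a cancelled workstream.
const agentQueue = ["migrate-auth-routes", "migrate-display-layer", "tenant-resolver"];
const result = reviewQueue(plan, agentQueue);
console.log(result.drift); // the re-queued cancelled workstream
```

The value is not the code, which is trivial, but the discipline: the plan has to exist as an explicit artefact, cancelled items included, for the comparison to be possible at all.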
Performance fix — one bug became fourteen through structured review
An AI sub-agent scanned for siblings of a single performance issue. Found fourteen instances. Ranked them by business impact. The pattern definition became an automated CI gate.
The problem. A performance audit identified a slow database query in one function. The conventional response would be to fix that function, verify the fix, and move on. The function was part of a repository layer containing dozens of similar functions, all written in the same period, all following the same conventions.
What the operating model changed. Instead of fixing the single instance, a fresh-context AI sub-agent was tasked with scanning the entire repository layer for the same anti-pattern. The sub-agent identified fourteen functions with uncached database calls that should have been wrapped in the application's caching layer. It ranked them by business impact: root-level fetches that blocked downstream queries were prioritised over leaf functions. It correctly excluded functions that looked similar but were genuinely different — single-record lookups that didn't benefit from caching, and functions where the call pattern made caching counterproductive. The human reviewer validated the ranked results, confirmed which to fix, and approved the sequencing. The sub-agent's pattern definition was then promoted into an automated check wired into the continuous integration pipeline.
What changed. A single bug became a systemic fix. Fourteen functions were corrected in priority order rather than discovered one at a time through future performance regressions. The structured scan also produced a reusable prevention mechanism — a clear pattern definition checked automatically on every future code change. Reactive discovery became proactive prevention.
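To make "pattern definition promoted into a CI gate" concrete, here is a deliberately naive sketch. The `db.query` and `withCache` names are assumptions, and a real gate would parse an AST rather than use regular expressions, but the mechanism is the same: codify the anti-pattern once, then check every change against it automatically.

```typescript
// Naive sketch of a CI gate: flag repository functions that call the database
// directly without going through the caching wrapper. Names are assumptions.
const UNCACHED_CALL = /\bdb\.query\(/; // direct database call
const CACHED_WRAPPER = /\bwithCache\(/; // required caching layer

function findUncachedFunctions(source: string): string[] {
  const offenders: string[] = [];
  // Matches simple single-level function bodies; a real gate would use an AST.
  const fnPattern = /function\s+(\w+)\s*\([^)]*\)\s*\{[^}]*\}/g;
  for (const match of source.matchAll(fnPattern)) {
    const [body, name] = [match[0], match[1]];
    if (UNCACHED_CALL.test(body) && !CACHED_WRAPPER.test(body)) offenders.push(name);
  }
  return offenders;
}

const sample = `
function listProjects() { return db.query("select * from projects"); }
function getProject(id) { return withCache("p:" + id, () => db.query("...")); }
`;
console.log(findUncachedFunctions(sample)); // flags listProjects only
```

Once a check like this runs on every pull request, the anti-pattern cannot silently re-enter the codebase — which is the shift from reactive discovery to proactive prevention described above.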
Tenant security — a validation that had never actually worked
A security check on every route looked correct and passed code review. An AI audit found it was decorative — it had never rejected a single invalid request.
The problem. A multi-tenant application had a security validation on every public-facing route. The validation checked whether an incoming request referenced a legitimate tenant before allowing access. The code had been in place since the route layer was built. It had passed code review. It looked correct. It had never once rejected an invalid request.
What the operating model changed. An AI agent was tasked with a structured audit of specific subsystems — tenancy perimeter, authentication boundaries, and route handlers. The agent performed static analysis against the actual execution path, not a surface-level code read. It discovered that the validation function was called without an await keyword. In the language used, this meant the function returned a Promise object rather than the resolved result. A Promise object is always truthy. The validation therefore passed every input, every time, regardless of whether the tenant existed. The check was decorative — present in the code, visible in review, and completely inert at runtime. The same audit pass flagged additional issues: a proxy that failed open on database errors, fifteen unscoped tenant-enrichment joins, and an admin route with no authentication check.
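The mechanics of that bug class are easy to reproduce. The sketch below is a reconstruction, not the actual application code (the function and tenant names are invented): without `await`, the conditional tests a Promise object, which is always truthy, so the validation passes every input.

```typescript
// Reconstruction of the missing-await bug class. Names are illustrative.
async function isValidTenant(tenantId: string): Promise<boolean> {
  // Stand-in for a database lookup; only "acme" exists.
  return tenantId === "acme";
}

async function handleRequestBroken(tenantId: string): Promise<string> {
  // BUG: no await — isValidTenant(...) returns a Promise, which is truthy,
  // so the negation is always false and the rejection branch never runs.
  if (!isValidTenant(tenantId)) return "rejected";
  return "allowed";
}

async function handleRequestFixed(tenantId: string): Promise<string> {
  // The one-word fix: await resolves the boolean before the check.
  if (!(await isValidTenant(tenantId))) return "rejected";
  return "allowed";
}

(async () => {
  console.log(await handleRequestBroken("no-such-tenant")); // "allowed" — inert check
  console.log(await handleRequestFixed("no-such-tenant")); // "rejected"
})();
```

Both versions look almost identical in review, which is exactly why appearance-based reading missed it and behaviour-based validation did not.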
What changed. The fix was one word. The discovery required a review methodology that didn't trust appearance. A human reviewer reading the code would see a function call, see it used in a conditional, and move on — the pattern looks correct. Only a review that asks "does this actually reject bad input?" rather than "does this look like it would reject bad input?" catches the failure. Code review checks intent. Structured validation checks behaviour. The AI agent's value was systematic coverage — it checked every route handler against the same criteria, without fatigue. The human's value was defining what to check and deciding what the findings meant.
Want to apply this framework to your business?
Book a briefing, commission a due diligence, or start with a conversation about where Plan, Implement, Review creates the most value in your operations.