Stop Paying for Mediocre Code Reviews – Build Exceptional Ones Yourself
- Oleksandr Kuzminskyi
- November 22, 2025
If you write or review infrastructure code (Terraform, AWS IaC, CI/CD pipelines, automation scripts), you’ve likely felt the pain points in this story. Maybe you’ve tried commercial AI review tools and found them shallow. Maybe your team struggles with inconsistent reviews. Or maybe you’re scaling quickly and need a way to enforce standards without slowing development down.
This article is for engineers, DevOps teams, and technical leaders who want deeper, context-aware code reviews, without adding another paid SaaS to their stack.
I never planned to build an AI code reviewer.
Like most infrastructure engineers, I had my workflow dialed in: write code, test thoroughly, investigate bugs, reproduce issues in tests before fixing them. It worked. I wasn’t looking for AI assistance; I was getting things done.
Then I stumbled upon a GitHub app that promised automated code reviews. My reaction was neutral skepticism. It had opinions: sometimes wrong, sometimes useful. But every so often, it surfaced something genuinely new: a security issue I’d overlooked, a more efficient approach I hadn’t considered. Small wins that slowly eroded my skepticism.
And then, without warning, the company behind the service shut down. Poof. Gone.
We’ve all been there: a tool you rely on suddenly disappears. It’s frustrating, but it made me realize how much I’d grown to appreciate having that second set of eyes on my code.
The Great Disappointment
Friends and colleagues suggested alternatives. “Try X.” “Y is supposed to be the market leader.”
So I did. Premium service. Glowing reviews. ~$20 per seat per month. I expected something great.
Here’s what the review looked like:
Review Summary:
* Added four `aws_caller_identity` data sources: production, sandbox, staging, qa, each bound to provider aliases.
* Removed `data.aws_iam_roles.infrahouse-registration` and a duplicate `data.aws_caller_identity.production`
* Replaced hard-coded ARNs with dynamic ARNs built from `data.aws_caller_identity`
This isn’t insight. It’s narration: information already visible in the git diff. No evaluation of whether the change is good or bad. No security implications. No best-practice recommendations. Just description.
I started calling it the “grass is green” level of insight: accurate, but not useful.
And yes, it literally ended one review with:
Here's a little poem for you
About the code you grew
I’m not exaggerating. This is what $20 per seat per month buys you: a service that narrates your git diff and occasionally writes poetry.
Meanwhile, I was working on our terraform-aws-lambda-monitored module and decided to experiment. I created a Claude Code agent to review the module comprehensively.
The difference was night and day. Here’s an actual comparison:
Commercial Service ($20/seat/month):
"Consider using more descriptive variable names"
"Add validation to your inputs"
Our Claude Agent ($0.50/review):
Line 47: CRITICAL – IAM policy uses 's3:*' which violates least privilege.
Scope to specific actions: s3:GetObject, s3:PutObject on arn:aws:s3:::${var.bucket_name}/*
Line 122: Lambda timeout of 900s exceeds API Gateway limit of 30s.
This will cause 504 errors. Either reduce timeout or implement an async pattern with Step Functions.
And here’s the real kicker: it caught issues I actually missed. In one PR, the agent flagged:
## Before Merge Checklist
✅ Code Review Complete – This review
⚠️ Verify Production Exclusion – Confirm production doesn't need this access
✅ Terraform Plan Review – Ensure plan shows expected changes
That warning made me pause. I looked again, and indeed, the PR granted a role access to secrets, but I’d forgotten to include the production environment. Without that catch, we would’ve shipped to production with missing permissions. The agent didn’t just review code; it understood the implications across environments.
The Problem That Changed Everything
As good as it was, my agent had one fatal flaw: it reviewed everything, every single time.
Imagine fixing a one-line typo in a README and waiting 10 minutes while the agent re-analyzes 2,000 lines of Terraform, reminding you of the same IAM issue it flagged last week.
Three problems emerged:
- Cost: $3–$5 per review, even when 99% of the code hadn’t changed
- Speed: 5–10 minutes reviewing untouched legacy code
- Signal-to-noise: Repetitive feedback on code you didn’t modify
The solution was obvious; commercial tools already do this:
- Remember previous findings
- Focus only on what changed
- Track which issues were fixed
So I built the same capability into my Claude agent:
First review: “You have 3 security issues in this PR”
After fixes: “✅ 2 issues fixed • ⚠️ 1 still present • 🆕 1 new issue found”
Developers immediately saw progress, not repetition. They knew exactly what was fixed and what still needed attention.
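Under the hood, this fixed/still-present/new bookkeeping is essentially set arithmetic over issue identities. Here is a minimal sketch of that logic (hypothetical helper functions, not the actual implementation, which performs the comparison in natural language via the prompt):

```python
def categorize_issues(previous: set[str], current: set[str]) -> dict[str, set[str]]:
    """Classify issues between two reviews by simple set arithmetic.

    An "issue" is any stable identifier, e.g. "rule-id:file:approximate-line".
    """
    return {
        "fixed": previous - current,          # flagged before, gone now
        "still_present": previous & current,  # flagged in both reviews
        "new": current - previous,            # first seen in this review
    }


def progress_summary(previous: set[str], current: set[str]) -> str:
    """Render the one-line progress header shown at the top of the PR comment."""
    c = categorize_issues(previous, current)
    return (f"✅ {len(c['fixed'])} fixed • "
            f"⚠️ {len(c['still_present'])} still present • "
            f"🆕 {len(c['new'])} new")
```

In practice the agent matches issues fuzzily across reviews; the set model just makes the intent explicit.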
Building the Solution: Technical Deep Dive
Architecture Overview
We built the solution around three principles:
- GitHub Actions for execution – No new SaaS subscription
- PR comments as storage – Review history lives with the code
- Pay-per-use LLM calls – You control when to spend
The architecture looks like this:
Pull Request Created/Updated
↓
GitHub Actions Triggered
↓
Fetch Previous Review ←-- PR Comment
↓
Generate Diff of Changes
↓
Claude Code Reviews Diff
↓
Compare with Previous Review
↓
Post/Update PR Comment --→ Single Source of Truth
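In code terms, the diagram boils down to five steps. The sketch below uses hypothetical injected callables (none of these names come from the actual workflow, where each step is a shell command in a GitHub Actions job) to show the control flow:

```python
def run_incremental_review(fetch_previous, generate_diff, review, compare, post_comment):
    """Orchestrate one review cycle; each pipeline step is an injected callable."""
    previous = fetch_previous()   # previous review from the PR comment, or None
    diff = generate_diff()        # diff of the PR's changes
    current = review(diff)        # LLM reviews only the diff
    if previous is not None:
        # Mark each prior issue as fixed, still present, or new
        current = compare(previous, current)
    post_comment(current)         # single source of truth on the PR
    return current
```

Injecting the steps is purely illustrative; it keeps the flow testable without GitHub or an LLM in the loop.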
Incremental Review Flow
# First Review
npx @anthropic-ai/claude-code --print \
"Review the changes in pr-changes.diff"
# Follow-up Review
npx @anthropic-ai/claude-code --print \
"Compare previous review with current changes.
Mark issues as:
- '✅ FIXED'
- '⚠️ STILL PRESENT'
- '🆕 NEW'"
Key Technical Decisions
1. Why PR comments instead of artifacts? Workflow artifacts are clumsy to retrieve from a later run, but PR comments are persistent, visible, and exactly where developers expect reviews to be.
2. Concurrency control. Cancelling in-progress runs stops multiple reviews from overlapping:
concurrency:
group: terraform-review-${{ github.event.pull_request.number }}
cancel-in-progress: true
3. Progressive prompt engineering. The follow-up prompt compares the previous review to the new one, marking issues as fixed, still present, or new, and generating a progress summary developers love.
The Custom Agent: Teaching Claude Your Standards
The real power comes from the custom agent. Our terraform-module-reviewer.md encodes our standards:
- Provider v5 and v6 compatibility
- InfraHouse naming conventions
- Required encryption for CloudWatch and SNS
- Lambda timeout vs API Gateway constraints
- Testing requirements across runtimes and architectures
Your organization can encode its own rules:
# Your Company's Terraform Standards
CRITICAL REQUIREMENTS:
- All S3 buckets must have encryption
- IAM policies must follow least privilege
- Must support both us-east-1 and eu-west-1
- All resources must include cost allocation tags
The agent becomes institutional knowledge, applied consistently.
Real-World Results
Example progression from a real PR:
Review #1:
12 issues found:
- 3 CRITICAL
- 5 HIGH
- 4 MEDIUM
Review #2:
Progress: 8 fixed • 3 still present • 1 new
⚠️ STILL PRESENT: Missing variable validation for lambda_timeout
🆕 NEW: Typo in variable description (line 45)
Review #3:
Progress: 11 fixed • 1 still present
Only lambda_timeout validation remains.
Seeing a PR move toward “all green” is incredibly motivating.
Cost Reality Check
Let’s talk money, and freedom.
Traditional services:
- $20/seat/month
- $200/month for a 10-person team
- $2,400/year for lukewarm reviews
Our approach:
- $0.10–$3.00 per PR
- ~100 PRs/month ≈ $100
- ~50% cheaper
- Far higher quality
And consider the real cost: senior engineering time. A thorough manual review takes 30–60 minutes ($50–$200 of engineering time). Our AI review takes seconds and costs a couple of dollars.
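The savings claim is easy to verify. Plugging in the article's figures (the ~$1 average per review is an assumption, sitting inside the stated $0.10–$3.00 range):

```python
# Commercial service: per-seat pricing for a 10-person team
seats, seat_price = 10, 20.0
commercial_monthly = seats * seat_price

# Our approach: pay per review, assuming ~$1 average per PR
prs_per_month = 100
avg_review_cost = 1.00
diy_monthly = prs_per_month * avg_review_cost

savings = 1 - diy_monthly / commercial_monthly
print(f"${commercial_monthly:.0f}/mo vs ${diy_monthly:.0f}/mo -> {savings:.0%} cheaper")
# prints: $200/mo vs $100/mo -> 50% cheaper
```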
The economics speak for themselves.
Lessons Learned
- AI augments; it doesn’t replace. Let the agent catch the obvious; humans focus on architecture.
- False positives happen. It’s a reviewer, not an enforcer.
- Document your standards. Future you will be grateful.
- Start simple. Iterate. The first version reviewed everything; then came incremental reviews, then concurrency control, then timeout handling.
- Progress tracking changes everything. “Fixed”, “Still Present”, and “New” turn reviews from criticism into coaching.
Beyond Terraform: What’s Next?
This approach works far beyond Terraform:
- Python/Node.js code reviews
- Puppet/Ansible playbook validation
- Compliance checks (HIPAA, PCI-DSS, SOC2)
- Dockerfile security and optimization
We’re exploring integrations with Trivy, OWASP tools, and technical-debt scoring systems.
AI won’t replace careful engineering, but it can eliminate the repetitive parts. What remains becomes more thoughtful, more consistent, and far more secure.
Call to Action: Try It, Use It, Hire Us
Try It Yourself
- Copy our workflow file
- Copy our agent template
- Add your ANTHROPIC_API_KEY to GitHub secrets
- Customize the agent with your standards
- Create a PR and watch it work
Full documentation at the GitHub repo.
Use InfraHouse Modules
Our modules are:
- Production-tested
- Well-documented. Some of them :)
- Provider v5 and v6 compatible
- Used by companies processing millions of requests
Explore them at the Terraform Registry.
Need Help? Hire InfraHouse
We can help with:
- Custom Terraform module development
- CI/CD modernization (including AI-powered reviews)
- Infrastructure audits
- 24/7 infrastructure support
We built this because we needed it. We understand the pain because we’ve lived it.
Contact us:
P.S. Yes, this post was reviewed by our AI agent before publishing. It found three typos and suggested two clarity improvements. Time: 47 seconds. Cost: $0.32. Time saved: priceless.