Stop Paying for Mediocre Code Reviews – Build Exceptional Ones Yourself
- Oleksandr Kuzminskyi
- November 22, 2025
If you write or review infrastructure code (Terraform, AWS IaC, CI/CD pipelines, automation scripts), you’ve likely felt the pain points in this story. Maybe you’ve tried commercial AI review tools and found them shallow. Maybe your team struggles with inconsistent reviews. Or maybe you’re scaling quickly and need a way to enforce standards without slowing development down.
This article is for engineers, DevOps teams, and technical leaders who want deeper, context-aware code reviews, without adding another paid SaaS to their stack.
I never planned to build an AI code reviewer.
Like most infrastructure engineers, I had my workflow dialed in: write code, test thoroughly, investigate bugs, reproduce issues in tests before fixing them. It worked. I wasn’t looking for AI assistance; I was getting things done.
Then I stumbled upon a GitHub app that promised automated code reviews. My reaction was neutral skepticism. It had opinions: sometimes wrong, sometimes useful. But every so often, it surfaced something genuinely new: a security issue I’d overlooked, a more efficient approach I hadn’t considered. Small wins that slowly eroded my skepticism.
And then, without warning, the company behind the service shut down. Poof. Gone.
We’ve all been there: a tool you rely on suddenly disappears. It’s frustrating, but it made me realize how much I’d grown to appreciate having that second set of eyes on my code.
The Great Disappointment
Friends and colleagues suggested alternatives. “Try X.” “Y is supposed to be the market leader.”
So I did. Premium service. Glowing reviews. ~$20 per seat per month. I expected something great.
Here’s what the review looked like:
Review Summary:
* Added four `aws_caller_identity` data sources: production, sandbox, staging, qa, each bound to provider aliases.
* Removed `data.aws_iam_roles.infrahouse-registration` and a duplicate `data.aws_caller_identity.production`
* Replaced hard-coded ARNs with dynamic ARNs built from `data.aws_caller_identity`
This isn’t insight. It’s narration: information already visible in the git diff. No evaluation of whether the change is good or bad. No security implications. No best-practice recommendations. Just description.
I started calling it the “grass is green” level of insight: accurate, but not useful.
And yes, it literally ended one review with:
Here's a little poem for you
About the code you grew
I’m not exaggerating. This is what $20 per seat per month buys you: a service that narrates your git diff and occasionally writes poetry.
Meanwhile, I was working on our terraform-aws-lambda-monitored module and decided to experiment. I created a Claude Code agent to review the module comprehensively.
The difference was night and day. Here’s an actual comparison:
Commercial Service ($20/seat/month):
"Consider using more descriptive variable names"
"Add validation to your inputs"
Our Claude Agent ($0.50/review):
Line 47: CRITICAL – IAM policy uses 's3:*' which violates least privilege.
Scope to specific actions: s3:GetObject, s3:PutObject on arn:aws:s3:::${var.bucket_name}/*
Line 122: Lambda timeout of 900s exceeds API Gateway limit of 30s.
This will cause 504 errors. Either reduce timeout or implement an async pattern with Step Functions.
And here’s the real kicker: it caught issues I actually missed. In one PR, the agent flagged:
## Before Merge Checklist
✅ Code Review Complete – This review
⚠️ Verify Production Exclusion – Confirm production doesn't need this access
✅ Terraform Plan Review – Ensure plan shows expected changes
That warning made me pause. I looked again, and indeed, the PR granted a role access to secrets, but I’d forgotten to include the production environment. Without that catch, we would’ve shipped to production with missing permissions. The agent didn’t just review code; it understood the implications across environments.
The Problem That Changed Everything
As good as it was, my agent had one fatal flaw: it reviewed everything, every single time.
Imagine fixing a one-line typo in a README and waiting 10 minutes while the agent re-analyzes 2,000 lines of Terraform, reminding you of the same IAM issue it flagged last week.
Three problems emerged:
- Cost: $3–$5 per review, even when 99% of the code hadn’t changed
- Speed: 5–10 minutes reviewing untouched legacy code
- Signal-to-noise: Repetitive feedback on code you didn’t modify
The solution was obvious; commercial tools already do this:
- Remember previous findings
- Focus only on what changed
- Track which issues were fixed
So I built the same capability into my Claude agent:
First review: “You have 3 security issues in this PR”
After fixes: “✅ 2 issues fixed • ⚠️ 1 still present • 🆕 1 new issue found”
Developers immediately saw progress, not repetition. They knew exactly what was fixed and what still needed attention.
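Under the hood, this fixed/still-present/new bookkeeping is essentially set arithmetic over issue identities. Here is a minimal sketch of that logic (hypothetical helper functions, not the actual implementation, which performs the comparison in natural language via the prompt):

```python
def categorize_issues(previous: set[str], current: set[str]) -> dict[str, set[str]]:
    """Classify issues between two reviews by simple set arithmetic.

    An "issue" is any stable identifier, e.g. "rule-id:file:approximate-line".
    """
    return {
        "fixed": previous - current,          # flagged before, gone now
        "still_present": previous & current,  # flagged in both reviews
        "new": current - previous,            # first seen in this review
    }


def progress_summary(previous: set[str], current: set[str]) -> str:
    """Render the one-line progress header shown at the top of the PR comment."""
    c = categorize_issues(previous, current)
    return (f"✅ {len(c['fixed'])} fixed • "
            f"⚠️ {len(c['still_present'])} still present • "
            f"🆕 {len(c['new'])} new")
```

In practice the agent matches issues fuzzily across reviews; the set model just makes the intent explicit.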
Building the Solution: Technical Deep Dive
Architecture Overview
We built the solution around three principles:
- GitHub Actions for execution – No new SaaS subscription
- PR comments as storage – Review history lives with the code
- Pay-per-use LLM calls – You control when to spend
The architecture looks like this:
Pull Request Created/Updated
↓
GitHub Actions Triggered
↓
Fetch Previous Review ←-- PR Comment
↓
Generate Diff of Changes
↓
Claude Code Reviews Diff
↓
Compare with Previous Review
↓
Post/Update PR Comment --→ Single Source of Truth
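In code terms, the diagram boils down to five steps. The sketch below uses hypothetical injected callables (none of these names come from the actual workflow, where each step is a shell command in a GitHub Actions job) to show the control flow:

```python
def run_incremental_review(fetch_previous, generate_diff, review, compare, post_comment):
    """Orchestrate one review cycle; each pipeline step is an injected callable."""
    previous = fetch_previous()   # previous review from the PR comment, or None
    diff = generate_diff()        # diff of the PR's changes
    current = review(diff)        # LLM reviews only the diff
    if previous is not None:
        # Mark each prior issue as fixed, still present, or new
        current = compare(previous, current)
    post_comment(current)         # single source of truth on the PR
    return current
```

Injecting the steps is purely illustrative; it keeps the flow testable without GitHub or an LLM in the loop.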
Incremental Review Flow
# First Review
npx @anthropic-ai/claude-code --print \
"Review the changes in pr-changes.diff"
# Follow-up Review
npx @anthropic-ai/claude-code --print \
"Compare previous review with current changes.
Mark issues as:
- '✅ FIXED'
- '⚠️ STILL PRESENT'
- '🆕 NEW'"
Key Technical Decisions
1. Why PR comments instead of artifacts? Workflow artifacts are clumsy to retrieve from a later run, but PR comments are persistent, visible, and exactly where developers expect reviews to be.
2. Concurrency control. Cancelling in-progress runs stops multiple reviews from overlapping:
concurrency:
group: terraform-review-${{ github.event.pull_request.number }}
cancel-in-progress: true
3. Progressive prompt engineering. The follow-up prompt compares the previous review to the new one, marking issues as fixed, still present, or new, and generating a progress summary developers love.
The Custom Agent: Teaching Claude Your Standards
The real power comes from the custom agent. Our terraform-module-reviewer.md encodes our standards:
- Provider v5 and v6 compatibility
- InfraHouse naming conventions
- Required encryption for CloudWatch and SNS
- Lambda timeout vs API Gateway constraints
- Testing requirements across runtimes and architectures
Your organization can encode its own rules:
# Your Company's Terraform Standards
CRITICAL REQUIREMENTS:
- All S3 buckets must have encryption
- IAM policies must follow least privilege
- Must support both us-east-1 and eu-west-1
- All resources must include cost allocation tags
The agent becomes institutional knowledge, applied consistently.
Real-World Results
Example progression from a real PR:
Review #1:
12 issues found:
- 3 CRITICAL
- 5 HIGH
- 4 MEDIUM
Review #2:
Progress: 8 fixed • 3 still present • 1 new
⚠️ STILL PRESENT: Missing variable validation for lambda_timeout
🆕 NEW: Typo in variable description (line 45)
Review #3:
Progress: 11 fixed • 1 still present
Only lambda_timeout validation remains.
Seeing a PR move toward “all green” is incredibly motivating.
Cost Reality Check
Let’s talk money, and freedom.
Traditional services:
- $20/seat/month
- $200/month for a 10-person team
- $2,400/year for lukewarm reviews
Our approach:
- $0.10–$3.00 per PR
- ~100 PRs/month ≈ $100
- ~50% cheaper
- Far higher quality
And consider the real cost: senior engineering time. A thorough manual review takes 30–60 minutes ($50–$200 of engineering time). Our AI review takes seconds and costs a couple of dollars.
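The savings claim is easy to verify. Plugging in the article's figures (the ~$1 average per review is an assumption, sitting inside the stated $0.10–$3.00 range):

```python
# Commercial service: per-seat pricing for a 10-person team
seats, seat_price = 10, 20.0
commercial_monthly = seats * seat_price

# Our approach: pay per review, assuming ~$1 average per PR
prs_per_month = 100
avg_review_cost = 1.00
diy_monthly = prs_per_month * avg_review_cost

savings = 1 - diy_monthly / commercial_monthly
print(f"${commercial_monthly:.0f}/mo vs ${diy_monthly:.0f}/mo -> {savings:.0%} cheaper")
# prints: $200/mo vs $100/mo -> 50% cheaper
```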
The economics speak for themselves.
Lessons Learned
- AI augments; it doesn’t replace. Let the agent catch the obvious; humans focus on architecture.
- False positives happen. It’s a reviewer, not an enforcer.
- Document your standards. Future you will be grateful.
- Start simple. Iterate. The first version reviewed everything; then came incremental reviews, then concurrency control, then timeout handling.
- Progress tracking changes everything. “Fixed”, “Still Present”, and “New” turn reviews from criticism into coaching.
Beyond Terraform: What’s Next?
This approach works far beyond Terraform:
- Python/Node.js code reviews
- Puppet/Ansible playbook validation
- Compliance checks (HIPAA, PCI-DSS, SOC2)
- Dockerfile security and optimization
We’re exploring integrations with Trivy, OWASP tools, and technical-debt scoring systems.
AI won’t replace careful engineering, but it can eliminate the repetitive parts. What remains becomes more thoughtful, more consistent, and far more secure.
Call to Action: Try It, Use It, Hire Us
Try It Yourself
- Copy our workflow file
- Copy our agent template
- Add your ANTHROPIC_API_KEY to GitHub secrets
- Customize the agent with your standards
- Create a PR and watch it work
Full documentation at the GitHub repo.
Use InfraHouse Modules
Our modules are:
- Production-tested
- Well-documented. Some of them :)
- Provider v5 and v6 compatible
- Used by companies processing millions of requests
Explore them at the Terraform Registry.
Need Help? Hire InfraHouse
We can help with:
- Custom Terraform module development
- CI/CD modernization (including AI-powered reviews)
- Infrastructure audits
- 24/7 infrastructure support
We built this because we needed it. We understand the pain because we’ve lived it.
Contact us:
P.S. Yes, this post was reviewed by our AI agent before publishing. It found three typos and suggested two clarity improvements. Time: 47 seconds. Cost: $0.32. Time saved: priceless.