AI Loops: What the Best Engineers Are Actually Building Right Now

Most people who use AI every day still use it the slowest way possible. Type a request. Wait. Fix it. Ask again. All by hand. Not because the faster way is hard — because nobody showed them what it actually looks like.

The faster way is called a loop. Right now it is one of the ideas the best AI engineers care about most. This post explains it properly: what loops are, how they work under the hood, when they are worth building, and how to run one yourself in Claude or ChatGPT today.

How a loop works

DISCOVER → PLAN → EXECUTE → VERIFY → ITERATE

Not there yet? Feed the result back in and go again.
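Here is that cycle as a minimal Python sketch on a deliberately toy task. The "work" is a draft that must contain a list of required words, and each pass adds one missing word. The function names and the task are illustrative assumptions, not any real agent framework's API. In a real loop, `execute` would be a model call and `verify` a test or check.

```python
from dataclasses import dataclass


@dataclass
class LoopResult:
    draft: str
    passes: int
    done: bool


def discover(draft: str, required: list[str]) -> list[str]:
    """DISCOVER: find what the current draft is still missing."""
    return [term for term in required if term not in draft.split()]


def plan(missing: list[str]) -> str:
    """PLAN: pick the single next step (here, the first missing word)."""
    return missing[0]


def execute(draft: str, step: str) -> str:
    """EXECUTE: do the work. A real loop would call a model here."""
    return f"{draft} {step}".strip()


def verify(draft: str, required: list[str]) -> bool:
    """VERIFY: an objective check on the result, not the worker's opinion."""
    return not discover(draft, required)


def run_loop(required: list[str], max_passes: int = 8) -> LoopResult:
    draft = ""
    for passes in range(1, max_passes + 1):
        missing = discover(draft, required)
        if missing:
            draft = execute(draft, plan(missing))
        if verify(draft, required):
            return LoopResult(draft, passes, True)
        # ITERATE: not there yet, so feed the new draft back in and go again.
    return LoopResult(draft, max_passes, False)
```

`run_loop(["goal", "gate", "stop"])` finishes on pass 3. With `max_passes=2` it stops at the limit with `done=False`, which is the case where a real loop would report what is left.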

The ceiling most people never notice

Look closely at how most people use AI. Every step runs through you. You decide what to ask, you judge the answer, you decide what comes next. The AI never moves unless you push it — and the moment you stop, everything stops.

This works fine. But it has a ceiling. You are the engine. The AI is only the tool in your hand, and a tool does nothing on its own.

There is another way. Instead of walking the AI through every step yourself, you give it the goal once and let it run the steps itself. It plans, does the work, checks its own result, fixes what is weak, and repeats until the goal is met. You step out. The work keeps going.

"You shouldn't be prompting coding agents anymore. You should be designing loops that prompt your agents." — Peter Steinberger

Prompt vs Loop — the actual difference

❌ A Prompt

→ One instruction
→ One answer
→ Waits for you to decide what's next
→ Stops when you stop
→ You are the engine

✅ A Loop

→ One goal
→ Runs until it gets there
→ Plans, does, checks, and repeats
→ Stops on success or a hard limit
→ AI is the engine

The three parts that matter most

These are the three places people most often get loops wrong.

✅ Verify — the heart of the loop

Without a real check on the result, you don't have a loop — you have the agent agreeing with itself on repeat. The check is what turns repetition into progress.

It can be:

A hard test — "does the code pass all tests?"
A measurable condition — "is the score above 8/10?"
A rubric the model scores against — "does it meet all 5 criteria?"

No gate = the agent grades its own homework. The model that did the work is a very generous marker of its own output.
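A minimal Python sketch of the three kinds of check combined into one gate. The `work` dict and its keys (`tests_failed`, `score`, `criteria_met`) are hypothetical stand-ins for whatever your loop actually produces. The design point is that the gate, not the worker, decides, and a failure comes back with reasons the next pass can act on.

```python
from typing import Callable

# A check looks at the work and returns (passed, reason_if_not).
Check = Callable[[dict], tuple[bool, str]]


def hard_test(work: dict) -> tuple[bool, str]:
    """Hard test: the test run reported zero failures."""
    failed = work["tests_failed"]
    return failed == 0, f"{failed} test(s) failing"


def score_at_least(threshold: float) -> Check:
    """Measurable condition: a numeric score must reach the threshold."""
    def check(work: dict) -> tuple[bool, str]:
        return work["score"] >= threshold, f"score {work['score']} < {threshold}"
    return check


def rubric(criteria: list[str]) -> Check:
    """Rubric: every named criterion must be marked as met."""
    def check(work: dict) -> tuple[bool, str]:
        unmet = [c for c in criteria if not work["criteria_met"].get(c, False)]
        return not unmet, "unmet: " + ", ".join(unmet)
    return check


def gate(work: dict, checks: list[Check]) -> tuple[bool, list[str]]:
    """Pass only if every check passes; otherwise return why, for the next pass."""
    reasons = [reason for ok, reason in (check(work) for check in checks) if not ok]
    return not reasons, reasons
```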

🧠 State — what makes the loop learn

Each pass, the AI must remember what it already tried — or it repeats the same mistake forever. A real loop keeps a small record: what is done, what failed, what is next.

Without state, every iteration starts from zero. Keeping state has a price, though, and it is why a loop costs more than a single prompt: the context grows each pass, and all of it gets re-read every time.

Think of it as the AI's working memory. The longer it runs, the heavier it gets — and the bigger the bill.
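A minimal sketch of that record in Python. The class and field names are illustrative, not from any particular agent framework. `as_context()` stands for the text that would be re-sent each pass, which is exactly why it gets heavier as the run goes on.

```python
from dataclasses import dataclass, field


@dataclass
class LoopState:
    """The small record a loop carries between passes."""
    done: list[str] = field(default_factory=list)
    failed: list[str] = field(default_factory=list)
    next_step: str = ""

    def already_tried(self, step: str) -> bool:
        """True if this step was attempted before, successfully or not."""
        return step in self.done or step in self.failed

    def record(self, step: str, ok: bool) -> None:
        (self.done if ok else self.failed).append(step)

    def as_context(self) -> str:
        """Render the state as text for the next prompt; it grows every pass."""
        return (
            f"DONE: {'; '.join(self.done) or 'nothing yet'}\n"
            f"FAILED: {'; '.join(self.failed) or 'nothing yet'}\n"
            f"NEXT: {self.next_step or 'undecided'}"
        )
```

Checking `already_tried()` before planning is what stops the loop from repeating a fix that already failed.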

🚩 Stop condition — what keeps it sane

A loop with no exit runs until it succeeds, breaks, or drains your account. Every serious loop has two ways to stop:

Success — the verify check passes
Hard limit — "after 8 tries, stop and report what's left"

Skip this and you have built a machine that can run all night for nothing. Engineer Geoffrey Huntley's "Ralph Wiggum" technique, which simply re-runs the same prompt through an agent over and over, shows why both exits matter: the agent can convince itself a half-finished job is done, and an outer loop with no hard limit keeps running, and billing you, in silence.
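A minimal sketch of those two exits, with the agent's own claim of "done" deliberately not counted as an exit. `step` and `verify` are placeholders you would wire to a model call and a real check. The return value also counts how often the agent claimed to be finished while the check disagreed, which is a useful number to watch.

```python
from enum import Enum
from typing import Callable


class Stop(Enum):
    SUCCESS = "verify check passed"
    LIMIT = "hard limit reached; report what is left"


def run_with_exits(
    step: Callable[[int], bool],
    verify: Callable[[], bool],
    max_iterations: int = 8,
) -> tuple[Stop, int, int]:
    """Run until `verify` passes or `max_iterations` is hit.

    `step` does one pass and returns the agent's own claim of "done".
    That claim is counted but is never allowed to end the loop.
    Returns (why it stopped, iterations used, false "done" claims).
    """
    false_claims = 0
    for i in range(1, max_iterations + 1):
        claimed_done = step(i)
        if verify():
            return Stop.SUCCESS, i, false_claims
        if claimed_done:
            false_claims += 1  # agent said "done", the check disagreed
    return Stop.LIMIT, max_iterations, false_claims
```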

Do you actually need a loop?

Most articles sell you the loop before they tell you when it's a mistake. Check all four. If even one does not hold, keep it as a manual prompt.

Loop worthiness checklist

Repeats at least weekly. Less than that and the setup cost never pays itself back.
Something can automatically reject bad output. A test, a type check, a build, a hard rule. If nothing can fail the work for you, the loop just spins.
The agent can do the work end-to-end. It does not hand half of it back to you mid-run.
"Done" is objective, not a judgement call. If quality is a matter of taste, a human still wins.
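The same four checks as a tiny Python function, if you want to record the decision alongside a task. The parameter names are illustrative. Any non-empty result means "keep it as a manual prompt".

```python
def loop_worthy(
    runs_per_week: float,
    has_automatic_reject: bool,
    agent_can_finish_alone: bool,
    done_is_objective: bool,
) -> list[str]:
    """Return the checklist items that fail; an empty list means build the loop."""
    failures = []
    if runs_per_week < 1:
        failures.append("repeats less than weekly")
    if not has_automatic_reject:
        failures.append("nothing can automatically reject bad output")
    if not agent_can_finish_alone:
        failures.append("agent would hand work back mid-run")
    if not done_is_objective:
        failures.append('"done" is a judgement call')
    return failures
```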

How Claude handles loops natively

Claude Code is built around loops as a first-class concept. Here is what that looks like under the hood:

⚡

Automation

/loop re-runs on an interval. /goal keeps a session going until your condition is actually true. Push to cron or GitHub Actions for unattended runs.
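For the cron route, one hedged sketch is a small Python wrapper that runs Claude Code non-interactively. It assumes the CLI's print mode (`claude -p`) and its `--max-turns` and `--output-format` options; confirm them with `claude --help` for your version. The wrapper names are hypothetical. `--max-turns` doubles as a hard limit on the unattended run.

```python
import subprocess


def headless_command(spec: str, max_turns: int = 8) -> list[str]:
    """Build a Claude Code print-mode call for one unattended run.

    Assumes the `claude` CLI is installed and supports `-p`,
    `--max-turns`, and `--output-format json`.
    """
    return ["claude", "-p", spec, "--max-turns", str(max_turns), "--output-format", "json"]


def run_once(spec_path: str, max_turns: int = 8) -> int:
    """One scheduled run: read the saved spec, run it headless, return the exit code."""
    with open(spec_path, encoding="utf-8") as f:
        spec = f.read()
    result = subprocess.run(headless_command(spec, max_turns), capture_output=True, text=True)
    print(result.stdout)
    return result.returncode
```

A cron entry or CI job would then call `run_once()` with the path to your saved loop spec on whatever interval you choose.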

📁

Skills

Save instructions once as a file the loop reads every time — rules, patterns, hard limits. The loop calls the skill by name. No copy-pasting.
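Claude Code stores a skill as a `SKILL.md` file inside a folder named for the skill (for example under `.claude/skills/`). The sketch below only mimics that layout to show the idea of a loop re-reading the same saved rules on every pass. `load_skill` and `iteration_prompt` are hypothetical helpers, not Claude Code's own API.

```python
from pathlib import Path


def load_skill(name: str, skills_dir: Path) -> str:
    """Read a saved skill by name from <skills_dir>/<name>/SKILL.md."""
    return (skills_dir / name / "SKILL.md").read_text(encoding="utf-8")


def iteration_prompt(skill_text: str, state_text: str, goal: str) -> str:
    """Every pass starts from the same saved rules instead of a fresh copy-paste."""
    return f"{skill_text}\n\nGOAL:\n{goal}\n\nSTATE:\n{state_text}"
```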

🤖

Sub-agents

Separate the agent that does the work from the agent that checks it. The model that wrote the code grades itself too generously. A second agent catches what the first talked itself into.
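A minimal sketch of that maker/checker split, with plain Python callables standing in for the two agents (an assumption; in Claude Code these would be separate sub-agents). The design choice that matters is that the reviewer receives only the artifact and the criteria, never the worker's self-assessment.

```python
from typing import Callable

# Worker returns (artifact, its own score out of 10).
Worker = Callable[[str], tuple[str, int]]
# Reviewer sees only the artifact and the criteria, never the self-score.
Reviewer = Callable[[str, list[str]], list[str]]


def maker_checker(task: str, criteria: list[str], worker: Worker, reviewer: Reviewer) -> dict:
    artifact, self_score = worker(task)
    problems = reviewer(artifact, criteria)
    return {
        "accepted": not problems,
        "self_score": self_score,  # kept for comparison, never used to decide
        "problems": problems,
    }


def keyword_reviewer(artifact: str, criteria: list[str]) -> list[str]:
    """A toy independent checker: each criterion is a phrase that must appear."""
    return [c for c in criteria if c.lower() not in artifact.lower()]
```

An overconfident worker that scores itself 10/10 still gets rejected when the reviewer finds a missing criterion.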

🔌

Connectors

The difference between an agent that says "here is the fix" and one that opens the pull request, links the ticket, and pings the channel — by itself.

🛡️

Verifier

The test, type check, or build that automatically rejects bad work. Everything else is plumbing. This is the part that makes it real.

A real coding loop in Claude Code

Loops took off in software first because code is the easiest thing to verify. A test passes or it fails, so the loop always has an objective signal for whether it is finished. Here is an example spec:

LOOP SPEC — paste into Claude Code
GOAL:
Every test in /tests/auth passes, lint is clean, no type errors.

EACH ITERATION:
1. Run the test suite and read every failure
2. Pick the single highest-impact failure
3. Write the smallest change that fixes it
4. Re-run tests, lint, and type checker

VERIFY:
Green tests + zero lint warnings + zero type errors
STOP WHEN:
Verify passes, OR 8 iterations reached
ON STOP:
Summarise what changed and what still fails
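In a script, the spec's VERIFY and STOP WHEN lines could look like this. It is a sketch assuming a Python project where `pytest`, `ruff`, and `mypy` play the roles of tests, lint, and type checker. All three must be installed, or `subprocess.run` raises `FileNotFoundError`. `fix` is a placeholder for the agent's one-change-per-iteration step.

```python
import subprocess
from typing import Callable

# Illustrative toolchain for "tests, lint, type checker".
# Swap in your own commands; each must exit 0 on success.
CHECKS: dict[str, list[str]] = {
    "tests": ["pytest", "tests/auth", "-q"],
    "lint": ["ruff", "check", "."],
    "types": ["mypy", "."],
}


def failing_checks(checks: dict[str, list[str]]) -> dict[str, str]:
    """Run every check and return {name: output} for the ones that fail."""
    failures = {}
    for name, cmd in checks.items():
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode != 0:
            failures[name] = proc.stdout + proc.stderr
    return failures


def coding_loop(
    fix: Callable[[dict[str, str]], None],
    checks: dict[str, list[str]] = CHECKS,
    max_iterations: int = 8,
) -> str:
    """VERIFY: every check exits 0. STOP WHEN: verify passes or the limit is hit."""
    for i in range(max_iterations):
        failures = failing_checks(checks)
        if not failures:
            return f"verify passed after {i} fix(es)"
        fix(failures)  # placeholder: hand the failures to the agent for one small fix
    remaining = failing_checks(checks)
    if not remaining:
        return f"verify passed after {max_iterations} fix(es)"
    return "stopped at limit; still failing: " + ", ".join(sorted(remaining))
```

The returned string is the ON STOP summary: either how many fixes it took, or which checks still fail.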

Build your own loop right now — no tools needed

You do not need Claude Code or any special setup. You can run a self-checking loop inside any AI chat right now. The trick is giving the model all three loop parts at once: a goal, strict success criteria, and a protocol that forces it to check itself before it can stop.

SELF-CHECKING LOOP — paste into Claude or ChatGPT
You will work in a loop until the task meets the bar.

TASK:
[describe exactly what you want produced]

SUCCESS CRITERIA (be strict, no soft passes):
- [criterion 1]
- [criterion 2]
- [criterion 3]

LOOP PROTOCOL, repeat every turn:
1. PLAN   — state the single next step
2. DO     — produce or improve the work
3. VERIFY — score the result 1-10 on each criterion.
           Be brutally honest. List exactly what is still weak.
4. DECIDE — if every criterion is 8+, print "FINAL" and stop.
           Otherwise print "ITERATING" and go again,
           fixing the weakest point first.

RULES:
- Never call it done until every criterion is 8 or higher.
- Each pass must fix the weakest score from the last VERIFY.
- Do not ask me questions. Make a sensible assumption,
  note it, and keep going.

Begin. Run the loop until FINAL.

Watch what happens. The model drafts, grades its own work against your criteria, finds the weak spot, and rewrites — over and over — until it actually clears the bar instead of handing you the first thing that looked close. That is a loop. You just built one with a paragraph.
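If you later move this template into a script, a small outside check on the model's reply keeps it honest about its own FINAL. A minimal sketch, assuming the VERIFY step writes one "criterion: score" line per criterion. The template does not force that format, so add it to your rules if you rely on this check.

```python
import re

# Matches VERIFY lines such as "Clarity: 9/10" or "- Accuracy: 7".
SCORE_LINE = re.compile(r"^\s*-?\s*([^:\n]+):\s*(\d{1,2})(?:\s*/\s*10)?\s*$", re.MULTILINE)


def check_final(reply: str, criteria: list[str], bar: int = 8) -> tuple[bool, list[str]]:
    """Accept a FINAL only if every criterion has a stated score at or above the bar."""
    scores = {name.strip().lower(): int(s) for name, s in SCORE_LINE.findall(reply)}
    problems = []
    for c in criteria:
        score = scores.get(c.lower())
        if score is None:
            problems.append(f"{c}: no score given")
        elif score < bar:
            problems.append(f"{c}: {score} < {bar}")
    if "FINAL" not in reply:
        problems.append("reply did not print FINAL")
    return not problems, problems
```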

The cost nobody mentions

Loops run on tokens, and tokens cost money. The problem is not that each step costs something — it is how the cost compounds.

Every time the loop goes around, the agent re-reads its context: the goal, the previous results, what failed, what is next. That whole pile grows each pass.

ROUGH COST OF ONE LOOP
Single agent, one medium task: ~50,000 to 200,000 tokens
Context re-sent every iteration: grows each pass
Fleet of agents in parallel: multiply all of the above
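A back-of-the-envelope model shows why the cost compounds rather than simply adding up. It assumes, purely for illustration, a fixed starting context (`base`), a fixed amount of new history per pass (`growth`), and that the whole context is re-sent every pass.

```python
def total_input_tokens(base: int, growth: int, iterations: int) -> int:
    """Input tokens over a loop that re-sends its whole context every pass.

    Pass i (counting from 0) sends `base` plus i passes of accumulated
    history, so the total is iterations*base + growth*(0 + 1 + ... + (iterations-1)).
    """
    return iterations * base + growth * iterations * (iterations - 1) // 2
```

With made-up numbers (10,000-token base, 5,000 tokens of growth per pass), 8 passes re-read 220,000 input tokens. Doubling to 16 passes more than triples the total to 760,000, because the history term grows with the square of the pass count.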

The metric that actually matters, and that almost nobody tracks, is cost per accepted change. Not tokens spent. Not loops run. If a loop gives you ten results and you throw out six, you are doing the review work it was meant to save. As a rule of thumb, below a 50% accept rate it costs more than it gives back.
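Tracking it needs only two counters per batch. A minimal sketch; the 50% line is the rule of thumb from the paragraph above, not a universal threshold, and `tokens_spent` can be swapped for money if that is how you are billed.

```python
def loop_scorecard(tokens_spent: int, results: int, accepted: int) -> dict:
    """Cost per accepted change and accept rate for one batch of loop output."""
    rate = accepted / results if results else 0.0
    return {
        "accept_rate": rate,
        "tokens_per_accepted": tokens_spent / accepted if accepted else None,
        "below_rule_of_thumb": rate < 0.5,
    }
```

For the ten-results, six-rejected example, 1,000,000 tokens works out to 250,000 tokens per accepted change at a 40% accept rate.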

The order that actually works

If you do build a loop, the order matters more than the tools. Teams whose loops survive in production tend to build them in the same order:

1. Get ONE manual run reliable first. Prove it works by hand before you automate anything.
2. Turn that into a Skill. Save the instructions once so the loop reads them every time.
3. Wrap the Skill in a loop. Add the gate and the stop condition.
4. THEN put it on a schedule. Scheduling something unreliable is how loops blow up while you sleep.

Loop ideas for IT and endpoint engineers

These work today in Claude Code or any AI that can run scripts and call tools:

Intune: Compliance drift monitor

Every 6 hours, check for new non-compliant devices. If any appear, create a ticket, log the device ID, and summarise what failed.
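The "is anything new?" step is the part worth getting right, because it stops the monitor from opening the same ticket every 6 hours. A sketch, assuming device records shaped like Microsoft Graph's Intune `managedDevice` resource (`id`, `complianceState`). Fetching devices and creating tickets are left out, and `failedSetting` is a hypothetical field you would fill from your own compliance data.

```python
def new_noncompliant(current: list[dict], previous_ids: set[str]) -> list[dict]:
    """Devices that are non-compliant now but were not flagged on the last run."""
    return [
        d for d in current
        if d["complianceState"] == "noncompliant" and d["id"] not in previous_ids
    ]


def ticket_body(devices: list[dict]) -> str:
    """One line per new device: its ID and what failed (hypothetical field)."""
    return "\n".join(
        f"{d['id']}: {d.get('failedSetting', 'failure details unknown')}" for d in devices
    )
```

Each run stores the IDs it flagged, so the next run only tickets what is genuinely new.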

PowerShell: Script hardener

Loop until the script passes PSScriptAnalyzer, has no suppressed warnings, and all error paths are handled. Max 6 iterations.
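A sketch of the automated part of that gate, assuming the loop captures analyzer output with something like `Invoke-ScriptAnalyzer -Path .\script.ps1 | Select-Object RuleName, Line, Message | ConvertTo-Json`. The helper names are hypothetical. "All error paths are handled" is not something this check can see; it would need its own test or rubric.

```python
import json
import re

# Analyzer findings can be silenced in source with this attribute, e.g.
# [Diagnostics.CodeAnalysis.SuppressMessageAttribute('PSAvoidUsingWriteHost', '')]
SUPPRESSION = re.compile(r"SuppressMessage", re.IGNORECASE)


def parse_findings(json_text: str) -> list[dict]:
    """ConvertTo-Json emits a bare object for one finding and may emit nothing for none."""
    if not json_text.strip():
        return []
    data = json.loads(json_text)
    return data if isinstance(data, list) else [data]


def hardener_gate(findings: list[dict], script_text: str) -> list[str]:
    """Reasons to reject this pass; an empty list means the automated checks are clean."""
    reasons = [f"{f['RuleName']} (line {f['Line']})" for f in findings]
    if SUPPRESSION.search(script_text):
        reasons.append("script suppresses analyzer warnings")
    return reasons
```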

Azure AD / Entra: Stale account sweep

Weekly loop: find accounts inactive 90+ days, draft a disable list, wait for approval flag, then disable confirmed accounts and log them.
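The approval flag is the key design choice here: the loop may draft, but only an explicit yes lets it act. A sketch, assuming user records with an `id` and a `lastSignIn` datetime (in Entra this would come from sign-in activity data, such as Microsoft Graph's `signInActivity`). The actual disable and logging calls are left out.

```python
from datetime import datetime, timedelta


def stale_accounts(users: list[dict], now: datetime, days: int = 90) -> list[str]:
    """Draft list: IDs whose last sign-in is at least `days` old.

    Accounts with no recorded sign-in (None) are left for a human to judge.
    """
    cutoff = now - timedelta(days=days)
    return [u["id"] for u in users if u["lastSignIn"] is not None and u["lastSignIn"] <= cutoff]


def approved_for_disable(draft: list[str], approvals: dict[str, bool]) -> list[str]:
    """Only an explicit True approval lets the loop act; anything else means wait."""
    return [uid for uid in draft if approvals.get(uid) is True]
```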

WHfB / Security: NGC key health loop

Run Detect-WHfB.ps1. If exit 1, run Remediate-WHfB.ps1, wait 5 minutes, re-run detect. If still failing after 3 passes, escalate to a ticket.
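The detect, remediate, re-detect cycle already has a built-in gate, so it maps directly onto a loop. A sketch in Python, assuming both scripts sit in the working directory and that, as with Intune remediation detection scripts, a non-zero exit from detection means "still broken". The ticket step is left as a comment, and `run` and `sleep` are injectable so the logic can be tested without Windows.

```python
import subprocess
import time
from typing import Callable


def run_ps(script: str) -> int:
    """Run a PowerShell script and return its exit code (Windows PowerShell assumed)."""
    return subprocess.run(
        ["powershell.exe", "-NoProfile", "-ExecutionPolicy", "Bypass", "-File", script]
    ).returncode


def detect_remediate(
    run: Callable[[str], int] = run_ps,
    passes: int = 3,
    wait_seconds: float = 300,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Detect; while failing, remediate, wait, and re-detect, up to `passes` times."""
    if run("Detect-WHfB.ps1") == 0:
        return "healthy"
    for attempt in range(1, passes + 1):
        run("Remediate-WHfB.ps1")
        sleep(wait_seconds)  # the 5-minute wait before re-running detection
        if run("Detect-WHfB.ps1") == 0:
            return f"fixed after {attempt} remediation pass(es)"
    return f"escalate: still failing after {passes} passes"  # open the ticket here
```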

Documentation: Run-book writer

Loop until the run-book passes a 10-point technical accuracy rubric AND a readability check. Stops when both clear 8/10, or at a hard limit of passes.

GitHub Actions: Pipeline fixer

When a workflow fails, loop: read the log, identify the root cause, write the smallest fix, push to a branch, re-trigger the run. Stop at success or 5 tries.

The honest version

Loops are real, and most people do not need the heavy version yet. What everyone can use right now is the light version — the self-checking prompt template above. Copy it into Claude or ChatGPT, fill in your task and criteria, and watch what happens.

The heavy version — scheduled, multi-agent, connector-wired, running while you sleep — belongs to teams with the budget and guardrails to run it. If that is not you today, you are not missing out. Start by using what is already there for free. Only once you actually feel that it is not enough should you start thinking about what you truly need.

Start here: Copy the self-checking loop template above into Claude or ChatGPT. Give it a real task you do every week. Watch it iterate. That is your proof of concept — and it costs you nothing but 5 minutes.
