A prompt drives one turn. A loop is a small control system that finds the work, does it, checks its own output, remembers what happened, and decides the next move — while you watch instead of type. This page teaches you to build one, with copy-paste assets at every step.
A cron job and a loop both run on a schedule. The difference is the decision-maker inside: a cron job runs a fixed script, while a loop runs an agent that reads the current state, chooses the next action, does it, checks the result, and decides what to do next.
For about two years the workflow was high-effort ping-pong: write a prompt, wait, skim the output, adjust, send the next one. Your attention was the engine — step away from the keyboard and the agent froze mid-turn. Loop engineering is the move where you stop being the engine. You write a small program — a shell loop, a hosted task, a few hundred lines of code — and that program is what talks to the model.
Inner loop — reason → act → observe → repeat. An agent that can only suggest isn't running a loop. One that can run code, read the result, fix it, and run again is.
Outer loop — the control system you design around it: the trigger that starts a run, the verifier that grades it, the memory that survives between runs, and the stop rules that decide when it quits.
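As a minimal sketch of the inner loop, assume two hypothetical callables: `act` runs an action and returns what was observed, and `reason` picks the next action or returns `None` once it is satisfied. The step cap is an added assumption so the sketch cannot spin forever.

```python
from typing import Callable, List, Optional


def inner_loop(
    first_action: str,
    act: Callable[[str], str],
    reason: Callable[[str], Optional[str]],
    max_steps: int = 5,
) -> List[str]:
    """Repeat act -> observe -> reason until `reason` returns None or steps run out."""
    observations: List[str] = []
    action: Optional[str] = first_action
    for _ in range(max_steps):
        if action is None:
            break
        observation = act(action)         # act: run the code or call the tool
        observations.append(observation)  # observe: keep what came back
        action = reason(observation)      # reason: choose the next action, or None to stop
    return observations


# Toy stand-ins (hypothetical): the first run fails, the retry passes.
def toy_act(action: str) -> str:
    return "tests pass" if action == "apply fix" else "tests fail"


def toy_reason(observation: str) -> Optional[str]:
    return None if observation == "tests pass" else "apply fix"


history = inner_loop("run tests", toy_act, toy_reason)
print(history)
```

In a real agent, `reason` would be a model call and `act` a tool invocation. The shape of the cycle is the only thing this sketch is meant to show.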
| If the task is… | Do this |
|---|---|
| A one-off | Just prompt the model. A loop is overhead you don't need. |
| Repeating + clear pass/fail | Build a loop. This is the sweet spot — verification is possible and the payoff compounds. |
| Vague, no crisp "done" | Don't hand it to a while-loop. Define the goal first, then decide. |
Every loop that survives production solves the same responsibilities. Tools name them differently, but the parts line up. Two of these — verification and memory — are what separate a loop you can trust from one that just makes mistakes faster.
Trigger: Something other than you typing starts a run: a schedule, a webhook, an event. It must come bundled with a stop condition.
Worktree: Give each agent its own checkout so parallel runs don't overwrite each other. Skip it if you only ever run one agent.
Skill: A markdown file holding the context you keep re-explaining (structure, rules, formats). Written once, loaded on demand.
Connectors: Tools the agent can touch, such as APIs, databases, an inbox, a task board, or MCP servers. A loop locked to local files is a glorified script.
Verification: The agent that does the work must not be the one grading it. A separate reviewer, often on a leaner model, checks the work against the rules.
Memory: On-disk state that survives runs, crashes, and context resets. Read at the start of every run, appended to at the end.
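The memory part can be sketched in a few lines, under assumptions: the helper names `read_memory` and `append_memory` and the timestamp format are illustrative choices, and the demo writes to a temporary directory rather than a real project.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def read_memory(path: Path) -> str:
    """Return the state file's contents, or an empty string on the first run."""
    return path.read_text(encoding="utf-8") if path.exists() else ""


def append_memory(path: Path, entry: str) -> None:
    """Append one timestamped line; earlier lines are never rewritten."""
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M")
    with path.open("a", encoding="utf-8") as f:
        f.write(f"- {stamp} {entry}\n")


# Demo in a throwaway directory so nothing in the working tree is touched.
with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "STATE.md"
    print(repr(read_memory(state)))  # empty on the first run
    append_memory(state, "fixed test/auth/login.spec -> PASS")
    print(read_memory(state))
```

Opening the file in append mode is what makes the log safe to resume after a crash: a killed run loses at most the line it was about to write.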
Five copy-paste assets, in order. Fill the [brackets], run the charter once while you watch, then wire the pieces around it.
This is the prompt that turns a one-off into a repeatable, self-checking unit of work. It is deliberately strict about what "done" means and how many items to touch per run.
    # LOOP CHARTER
    ROLE: You run one pass of a maintenance loop. You are not chatting.

    FIND: Look at [source: failing tests / inbox / board column].
          Pick at most [N = 1] item(s) to work on this run.
          If there is nothing to do, output "NOTHING_TO_DO" and stop.

    DO: Make the smallest change that resolves the item.
        Touch only files related to it.

    CHECK: The item is done ONLY when [tests in path/ pass AND lint is clean].
           Run the check. Do not claim success without running it.

    RECORD: Append one line to [STATE.md]: what you did + the check result.
            If you hit the same failure twice, write the lesson to [RULES.md].

    STOP: After [N] item(s), or if the check fails twice on the same item.
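Before the first run it can help to confirm that no bracket was left unfilled. The check below is a small sketch that assumes a filled-in charter contains no square brackets at all; the function name `unfilled_placeholders` is hypothetical.

```python
import re
from typing import List

# Any [ ... ] span on a single line is treated as a placeholder still to be filled.
PLACEHOLDER = re.compile(r"\[[^\[\]\n]+\]")


def unfilled_placeholders(charter: str) -> List[str]:
    """Return every bracketed placeholder that is still present in the charter text."""
    return PLACEHOLDER.findall(charter)


draft = "FIND: Look at [source: failing tests / inbox / board column]. Pick at most [N = 1] item(s)."
filled = "FIND: Look at failing tests. Pick at most 1 item(s)."

print(unfilled_placeholders(draft))
print(unfilled_placeholders(filled))
```

If your filled charter legitimately needs square brackets, this check would report false positives and the pattern would need to be narrowed.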
A goal-conditioned loop runs across turns until a condition is true. The key: a separate, faster model grades the condition after each turn — the writer doesn't get to mark its own homework. Write the condition so a grader can verify it from the run output, not from a vibe.
/goal all tests under test/auth pass and `npm run lint` exits 0 and STATE.md has a new line for every item processed
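One way to make a condition checkable "from the run output" is to reduce it to commands whose exit codes can be inspected. The sketch below assumes each part of the goal maps to one command; `goal_met` is a hypothetical helper, and the demo uses trivial Python subprocesses in place of a real test runner or `npm run lint`.

```python
import subprocess
import sys
from typing import List, Sequence


def goal_met(checks: Sequence[List[str]]) -> bool:
    """Return True only if every check command exits with status 0."""
    for cmd in checks:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            return False  # one failing check is enough to keep the loop going
    return True


# Stand-in commands (assumptions): one that succeeds and one that fails.
ok = [sys.executable, "-c", "raise SystemExit(0)"]
bad = [sys.executable, "-c", "raise SystemExit(1)"]

print(goal_met([ok, ok]))
print(goal_met([ok, bad]))
```

In a real project the list might hold `["npm", "run", "lint"]` plus whatever command runs the tests under test/auth. The STATE.md part of the goal is not covered here and would need a separate file check.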
Two agent definitions. The builder proposes a change; the reviewer independently grades it on a leaner model with its own rubric. This single split is what lets you trust a loop running unattended.
    ---
    name: reviewer
    model: [a fast, cheaper model]
    description: Independently grades the builder's change. Never edits code.
    ---
    You are the checker, not the author. Grade the change against:
    1. Does it pass the stated CHECK condition? Run it yourself.
    2. Does it touch only files relevant to the item?
    3. Does it violate any rule in RULES.md?

    Output exactly one of:
    PASS — with the check output that proves it
    FAIL — with the single reason and the rule or test it broke
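Because the reviewer is told to output exactly one of PASS or FAIL, the surrounding program can parse that verdict mechanically. This is a sketch under assumptions: the `Verdict` type and `parse_verdict` function are illustrative, and anything that does not start with PASS or FAIL is treated as a failure.

```python
import re
from dataclasses import dataclass

# The verdict word must be the first token; everything after it is the evidence or reason.
_VERDICT = re.compile(r"(PASS|FAIL)\b(.*)", re.DOTALL)


@dataclass
class Verdict:
    passed: bool
    detail: str


def parse_verdict(reviewer_output: str) -> Verdict:
    """Turn the reviewer's text into a structured verdict, failing closed on surprises."""
    match = _VERDICT.match(reviewer_output.strip())
    if match is None:
        return Verdict(False, "unparseable reviewer output")
    # Drop the separator after the verdict word (hyphen, colon, or long dash).
    detail = match.group(2).strip(" \n\t-:\u2014")
    return Verdict(match.group(1) == "PASS", detail)


print(parse_verdict("PASS - 12/12 tests green"))
print(parse_verdict("FAIL: touched generated/"))
print(parse_verdict("Looks good to me"))
```

Failing closed is a deliberate choice in this sketch: a chatty or malformed review should never be mistaken for approval.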
Boring on purpose. A flat file the loop reads at the start and appends at the end. This is how a run survives a crash, a context reset, or a 3 a.m. process kill.
    # LOOP STATE — append-only log

    ## Done
    - 2026-06-27 14:02 fixed test/auth/login.spec → PASS (12/12)
    - 2026-06-27 14:09 refactored token refresh → PASS (lint clean)

    ## Open
    - test/auth/session.spec → FAILED twice, escalated to human

    ## Lessons (mirror critical ones into RULES.md)
    - Never edit generated/ — it is overwritten on build.
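A state file in this shape is easy for the loop to read back at the start of a run. The parser below is a sketch that assumes `## ` headings and `- ` bullets exactly as in the example; `parse_state` is a hypothetical name and the sample is a trimmed copy of that file.

```python
from typing import Dict, List, Optional


def parse_state(text: str) -> Dict[str, List[str]]:
    """Group bullet lines under the first word of their '## ' heading."""
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            # "Lessons (mirror critical ones into RULES.md)" is stored as "Lessons".
            current = line[3:].split()[0]
            sections[current] = []
        elif line.startswith("- ") and current is not None:
            sections[current].append(line[2:])
    return sections


SAMPLE = """\
# LOOP STATE
## Done
- 2026-06-27 14:02 fixed test/auth/login.spec → PASS (12/12)
- 2026-06-27 14:09 refactored token refresh → PASS (lint clean)
## Open
- test/auth/session.spec → FAILED twice, escalated to human
## Lessons (mirror critical ones into RULES.md)
- Never edit generated/
"""

state = parse_state(SAMPLE)
print(state["Open"])
```

With the open items in hand, the next run can skip anything already escalated to a human instead of retrying it.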
The canonical blueprint people cite: morning triage → merged PR. Design it once, never prompt it again. Note the budget cap — it is independent of the goal and non-negotiable.
    while not goal_met:
        state = read_memory("STATE.md")              # survives restarts
        work = find_next_item(state)                 # inbox · board · failing tests
        if work is None:
            break

        branch = new_worktree()                      # isolate this unit of work
        result = builder(work, skills, connectors)
        verdict = reviewer(result, rubric, tests)    # different agent · leaner model

        if verdict.passed:
            commit(branch); open_pr(branch)
            append_memory(state, done=work)
        else:
            append_memory(state, lesson=verdict.reason)  # write the lesson back

        if spend_so_far > budget_cap:                # the line that saves you money
            halt("budget cap hit")
A single unattended pass. The builder makes a change, the reviewer grades it independently, state gets appended, and the budget guard watches the whole time.
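To watch the control flow of that pass run end to end, here is a runnable sketch with toy stand-ins. Everything in it is an assumption for illustration: the builder and reviewer are plain functions, memory is an in-memory log rather than STATE.md, each item has a flat cost, and worktrees and PRs are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Verdict:
    passed: bool
    reason: str = ""


@dataclass
class LoopLog:
    done: List[str] = field(default_factory=list)
    lessons: List[str] = field(default_factory=list)
    halted: Optional[str] = None


def run_loop(
    queue: List[str],
    builder: Callable[[str], str],
    reviewer: Callable[[str], Verdict],
    cost_per_item: float,
    budget_cap: float,
) -> LoopLog:
    log = LoopLog()
    spend = 0.0
    while queue:                      # stand-in for "while not goal_met"
        work = queue.pop(0)           # find the next item
        result = builder(work)        # the builder proposes a change
        spend += cost_per_item
        verdict = reviewer(result)    # a separate function grades it
        if verdict.passed:
            log.done.append(work)
        else:
            log.lessons.append(f"{work}: {verdict.reason}")  # write the lesson back
        if spend > budget_cap:        # checked on every pass, independent of the goal
            log.halted = "budget cap hit"
            break
    return log


# Toy agents (hypothetical): the reviewer rejects anything that mentions item "b".
def toy_builder(work: str) -> str:
    return f"patch for {work}"


def toy_reviewer(result: str) -> Verdict:
    return Verdict(False, "check failed") if result.endswith(" b") else Verdict(True)


items = ["a", "b", "c"]
log = run_loop(items, toy_builder, toy_reviewer, cost_per_item=1.0, budget_cap=1.5)
print(log)
print("left in queue:", items)
```

Failed items are dropped rather than retried in this sketch. A real loop would decide between retrying and escalating based on its stop rules.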
An unattended loop without these guards isn't leverage — it's unattended mistakes. Each failure has a one-line fix.
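As one example of such a guard, the charter's STOP rule (stop after N items, or when the check fails twice on the same item) can be enforced outside the model. This is a sketch; the `StopRules` class and its messages are hypothetical.

```python
from collections import Counter
from typing import Optional


class StopRules:
    """Tracks results and reports when the loop should quit."""

    def __init__(self, max_items: int, max_failures_per_item: int = 2) -> None:
        self.max_items = max_items
        self.max_failures_per_item = max_failures_per_item
        self.items_done = 0
        self.failures: Counter = Counter()

    def record(self, item: str, passed: bool) -> None:
        if passed:
            self.items_done += 1
        else:
            self.failures[item] += 1

    def should_stop(self) -> Optional[str]:
        """Return a reason to stop, or None to keep going."""
        if self.items_done >= self.max_items:
            return "item limit reached"
        for item, count in self.failures.items():
            if count >= self.max_failures_per_item:
                return f"{item} failed {count} times, escalate to a human"
        return None


rules = StopRules(max_items=3)
rules.record("test/auth/session.spec", passed=False)
print(rules.should_stop())
rules.record("test/auth/session.spec", passed=False)
print(rules.should_stop())
```

Keeping the counters in ordinary code means the stop decision does not depend on the agent remembering, or choosing to honor, its own instructions.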
Don't build the full system on day one. Earn each layer of autonomy before you add the next.
Pick one annoying, repeatable, pass/fail task you do by hand. Paste the charter, fill the brackets, run it once while you watch.
Give it an objectively checkable stop condition. Confirm a grader can verify "done" from the output. Run it once, watching.
Add the builder + reviewer sub-agents and a memory file. Now the loop checks its own work and remembers across runs.
Add worktrees so parallel runs don't collide, and connectors so "find the work" can reach your real inbox, board, or repo.
Put it on a schedule, set the hard budget cap, and let it run unattended — spot-checking the traces daily, not babysitting every turn.
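For the worktree layer in the ladder above, one possible shape uses `git worktree add` through `subprocess`. The branch naming scheme, the sibling-directory layout, and both function names are assumptions; only the command is printed here, and nothing is executed against a repository.

```python
import subprocess
from pathlib import Path
from typing import List


def worktree_command(repo: Path, run_id: str) -> List[str]:
    """Build the git command that creates a fresh branch in its own checkout."""
    branch = f"loop/{run_id}"                     # hypothetical naming scheme
    path = repo.parent / f"{repo.name}-{run_id}"  # sibling directory next to the repo
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path)]


def new_worktree(repo: Path, run_id: str) -> Path:
    """Create the worktree and return its path; raises if git reports an error."""
    cmd = worktree_command(repo, run_id)
    subprocess.run(cmd, check=True)
    return Path(cmd[-1])


cmd = worktree_command(Path("/srv/app"), "42")
print(" ".join(cmd))
```

Each parallel run would call `new_worktree` with its own `run_id`, so two agents never write into the same checkout.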