What Is Loop Engineering? From Writing Prompts to Designing Agent Loops
-- Autonomous agent work is only as good as its checks and stopping conditions
For / Key Points
For: Developers and technical leads using Claude Code or Codex who need to move beyond one-off prompting into autonomous, scheduled, or event-driven agent workflows.
Key Points:
- Loop engineering means designing the system that prompts, checks, remembers, and re-runs AI agents instead of typing every next prompt yourself.
- In June 2026, Peter Steinberger's X post sparked the discussion, and Addy Osmani framed the discipline as five building blocks plus memory.
- The hard part is not autonomy itself. It is verification, stopping conditions, and Human in the Loop escalation.
On June 8, 2026, a two-sentence X post from Peter Steinberger spread across the AI coding community. The message was simple: developers should stop merely prompting coding agents and start designing the loops that prompt those agents instead [1].
This article answers one question. What does loop engineering actually design, and why do verification and stopping conditions matter more than the loop itself?
The Spark Was a Two-Sentence Post
The trigger was short, but the reaction was large. ExplainX reported that Steinberger's post reached 6.5 million views and dominated discussion around AI coding agents [2]. In the same context, Boris Cherny, who leads Claude Code, described no longer prompting Claude directly. Instead, he writes loops that prompt Claude and decide what to do next [3].
Addy Osmani then organized this discussion under the name "Loop Engineering" [4]. His framing is direct: replace yourself as the person who prompts the agent, and design the system that does it instead. Japanese explainers and field reports appeared quickly after that, which helped the term settle into local AI development discourse [5][6].
The point is not that a new buzzword appeared. The real shift is that the human's repeated "what next?" judgment is moving into a designed control system.
Definition: Replace the Human Prompter With a System
Loop engineering is a role shift. In the old pattern, a human gave an instruction, read the agent's answer, then typed the next instruction. The human was inside the loop. Loop engineering places a small system in that seat.
Osmani describes a loop as a recursive goal [4]. Once the purpose is defined, the AI iterates toward completion. It finds work, assigns it, checks the result, records what happened, and decides the next step. The system keeps poking the agent instead of relying on a human to do it every turn.
MAKE A CHANGE breaks the design into six elements [6].
| Element | Design Question |
|---|---|
| Trigger | What starts the loop: a schedule, an event, or something else? |
| Context | What information does the agent receive? |
| Action | What is the agent allowed to do? |
| Verification | How is success checked? |
| Memory | Where are results and lessons recorded? |
| Escalation | When does the loop return control to a human? |
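The six elements in the table can be written down as a small specification before any agent runs. The sketch below is only an illustration: `LoopSpec`, its field names, and `missing_elements` are hypothetical and are not part of Claude Code, Codex, or MAKE A CHANGE's tooling. It shows one way to make a blank element visible early.

```python
from dataclasses import dataclass


@dataclass
class LoopSpec:
    """Hypothetical record of the six design elements of one loop."""
    trigger: str = ""        # what starts the loop (schedule, event, ...)
    context: str = ""        # what information the agent receives
    action: str = ""         # what the agent is allowed to do
    verification: str = ""   # how success is checked
    memory: str = ""         # where results and lessons are recorded
    escalation: str = ""     # when control returns to a human

    def missing_elements(self) -> list[str]:
        """Return the names of elements that were left blank."""
        return [name for name, value in vars(self).items() if not value.strip()]


vague = LoopSpec(trigger="every morning", action="do something useful")
print(vague.missing_elements())
# ['context', 'verification', 'memory', 'escalation']
```

A spec with no verification or escalation entry is the kind of loop that should not be scheduled yet.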
"Do something useful" is not a loop. A real loop includes success criteria and stopping conditions, because that is what makes repeated agent work governable.
The Lineage: Prompt, Context, Harness, Loop
Loop engineering did not appear from nowhere. Over the last few years, the design target has moved from a single prompt to the execution environment around the model.
Japanese coverage summarizes the progression this way [5].
| Layer | Period | Design Target |
|---|---|---|
| Prompt engineering | Until around 2024 | The quality of one exchange |
| Context engineering | 2025 | The full token environment the model sees |
| Harness engineering | Early 2026 | The execution environment around one agent |
| Loop engineering | June 2026 onward | The system that repeatedly drives the harness |
Osmani also places loop engineering one floor above harness engineering [4]. If a harness is the control structure around one acting agent, a loop adds timing, helper agents, memory, and repeated self-feeding work. The layers do not replace each other. They stack.
The question therefore moves away from better wording. It becomes: when does the system start, what can it see, what can it change, and what proves it should stop?
It Is Not Just Scheduled Execution
Loop engineering includes scheduled execution. Running Claude Code or Codex every five minutes, every hour, or every morning is part of the pattern [6]. But schedule alone is not enough.
Triggers can be event-driven. A Slack message, Gmail arrival, GitHub pull request update, or Stripe payment can all start the loop [6]. Claude Code documents /loop, cron-style scheduled tasks, cloud routines, and desktop scheduled tasks as scheduling options [7]. Codex provides Automations that let users choose the project, prompt, cadence, and execution environment for a recurring task [8].
The decisive difference is the stopping condition. Claude Code's /goal keeps a session working until a completion condition is met, with a small fast model evaluating that condition after each turn [9]. Codex has a same-named /goal primitive for long-running work toward a verifiable stopping condition [10].
This is where trust is won or lost. A loop needs something that can say no: a test, type check, real error, review queue, or budget limit [2]. Without that pressure, the loop can become an agent agreeing with itself at high speed.
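A stopping condition and a budget limit can be sketched in a few lines. This is a minimal illustration, not how /goal is implemented in either product: `run_until_verified`, `step`, and `check` are hypothetical names, and `step` stands in for one agent turn.

```python
from typing import Callable


def run_until_verified(
    step: Callable[[int], str],
    check: Callable[[str], bool],
    max_turns: int,
) -> tuple[str, int]:
    """Run `step` until `check` accepts its output or the turn budget runs out.

    Returns (status, turns_used), where status is "done" or "escalate".
    """
    for turn in range(1, max_turns + 1):
        result = step(turn)   # stand-in for one agent turn (assumed, not a real API)
        if check(result):     # the part of the loop that is allowed to say no
            return "done", turn
    return "escalate", max_turns  # budget exhausted: hand control back to a human
```

The design choice worth noting is that the loop has exactly two exits. Either an external check accepts the result, or the budget ends the run and the work is escalated. There is no path where the agent declares itself finished.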
Five Building Blocks, Plus Memory
Osmani organizes the loop into five building blocks plus memory [4]. The notable change is that these pieces are moving from hand-rolled scripts into product features across Codex and Claude Code.
| Building Block | Role in the Loop | Codex | Claude Code |
|---|---|---|---|
| Automations | Scheduled discovery and triage | Automations, /goal | Scheduled tasks, /loop, /goal, hooks |
| Worktrees | Isolate parallel agent work | App worktrees | git worktree, --worktree, subagent isolation: worktree |
| Skills | Write down project knowledge | Agent Skills (SKILL.md) | Skills (SKILL.md) |
| Plugins / Connectors | Connect to external tools | MCP-based Connectors and Plugins | MCP servers and Plugins |
| Sub-agents | Separate makers from checkers | TOML definitions in .codex/agents/ | .claude/agents/, agent teams |
The sixth component is memory. It can be a Markdown file, a Linear board, or another external state store. What matters is that it survives outside one conversation and tracks what is done and what comes next. Models forget between runs, so durable loop state belongs on disk or in an external system [4].
Sub-agents are especially important. Codex supports custom agents defined as TOML files under .codex/agents/ [11]. Claude Code supports Markdown subagent definitions under .claude/agents/ and can isolate them in worktrees [12]. Both provide a way to split the maker from the checker.
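Durable loop state can be as plain as a JSON file. The sketch below assumes a hypothetical state shape with `done` and `next` lists; the function names and file layout are illustrative and are not taken from Osmani's article or from either product.

```python
import json
import os
import tempfile
from pathlib import Path


def load_state(path: Path) -> dict:
    """Read loop state from disk, or start empty on the first run."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"done": [], "next": []}


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file, then rename, so a crash cannot leave a half-written file."""
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle, indent=2)
    os.replace(tmp_name, path)  # atomic when both paths are on the same filesystem


def mark_done(state: dict, item: str) -> dict:
    """Move an item from `next` to `done` without duplicating it."""
    remaining = [i for i in state["next"] if i != item]
    done = state["done"] if item in state["done"] else state["done"] + [item]
    return {"done": done, "next": remaining}
```

Each run would call `load_state` first and `save_state` last, so the next run starts from what the previous one recorded rather than from the model's memory.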
The model that wrote the code is often too generous when grading it. A second agent, with different instructions, permissions, or even a different model, can catch failures the first one rationalized away. That separation matters most when the loop runs unattended.
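The maker and checker split can be sketched as two separate callables. Everything here is hypothetical: `Verdict`, `make_then_check`, and the callable signatures are illustrative and do not correspond to the Codex or Claude Code subagent formats. In practice each callable would wrap a differently configured agent.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    approved: bool
    reason: str


def make_then_check(
    task: str,
    maker: Callable[[str, list[str]], str],
    checker: Callable[[str, str], Verdict],
    max_attempts: int = 3,
) -> tuple[Optional[str], list[str]]:
    """Let the maker draft and a separate checker judge, with bounded retries.

    Returns (approved_draft, feedback). The draft is None if no attempt passed,
    which is the signal to escalate to a human.
    """
    feedback: list[str] = []
    for _ in range(max_attempts):
        draft = maker(task, feedback)      # maker sees earlier rejections
        verdict = checker(task, draft)     # checker never sees the maker's reasoning
        if verdict.approved:
            return draft, feedback
        feedback.append(verdict.reason)
    return None, feedback
```

The checker receives only the task and the draft. That keeps it from inheriting the maker's rationalizations.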
What One Morning Loop Looks Like
When assembled, the pieces become a small work system. In Osmani's example, a morning automation reads repository state, CI failures, open issues, and recent commits [4]. It writes worthwhile findings to a state file or ticketing system, then creates a worktree for each item.
One sub-agent drafts a fix. Another sub-agent checks the draft against project skills and existing tests. Connectors open pull requests and update tickets. Anything uncertain stays in a triage inbox for a human.
The human did not type each next prompt. The human designed the loop once. At the heartbeat level, a morning triage trigger can look as simple as this:
0 7 * * * claude -p "$(cat triage-loop.md)" >> loop-state.log
That line is not the real system. The real system is the permission model, logs, checks, stopping conditions, and guardrails around it.
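Two of those guardrails can be sketched as a wrapper around the heartbeat command: a lock that prevents overlapping runs, and a timeout with a structured log line. `run_guarded`, the lock file, and the log format are assumptions made for illustration. The command itself is passed in as an argument, so nothing here depends on a particular CLI.

```python
import json
import subprocess
import time
from pathlib import Path


def run_guarded(command: list[str], log_path: Path, lock_path: Path, timeout_s: float) -> str:
    """Run one heartbeat behind a lock and a timeout, then append a JSON log line."""
    try:
        # Mode "x" raises FileExistsError if the lock is already there,
        # so a run that overlaps a still-active one is skipped (and not logged).
        lock_path.open("x").close()
    except FileExistsError:
        return "skipped"

    started = time.monotonic()
    try:
        proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout_s)
        status = "ok" if proc.returncode == 0 else "failed"
    except subprocess.TimeoutExpired:
        status = "timeout"
    except OSError:
        status = "failed"  # for example, the executable was not found
    finally:
        lock_path.unlink()

    entry = {"status": status, "seconds": round(time.monotonic() - started, 2)}
    with log_path.open("a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return status
```

Passing the command from the crontab line as a list, for example `["claude", "-p", prompt_text]`, would wrap the same heartbeat. A lock left behind by a forcibly killed process is not handled in this sketch and would need manual cleanup.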
129 Successful Deletions and 43 Runaway Commits
Japanese field reports already show both sides of the pattern. MAKE A CHANGE runs several loops in production-like internal workflows, including Slack-to-Notion task capture, Notion task execution, and AI information gathering [6].
One success case was clear. A loop that watched remote repositories and deleted unnecessary branches removed 129 stale branches automatically [6]. The scope was narrow and the deletion criteria were relatively easy to verify.
The failure case was equally useful. A pull request babysitter loop created 43 commits in one day, but its scope expanded until it changed unrelated areas and drifted away from the original PR purpose. Nearly all of the output was rejected [6].
The lesson is compact.
- A poorly designed loop can mass-produce waste at high speed.
- Token use and cost can grow quickly.
- A loop without completion criteria, verification, and Human in the Loop escalation is not production-ready.
The more capable the loop, the larger the blast radius. Before asking whether it can run automatically, decide where it must stop.
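A scope check and a commit budget are two stopping rules that would have applied to the babysitter case. The sketch below is an assumption about how such a guard could look, not a description of MAKE A CHANGE's setup. `out_of_scope`, `should_stop`, and the glob patterns are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Optional


def out_of_scope(changed_files: list[str], allowed_patterns: list[str]) -> list[str]:
    """Return changed files that match none of the allowed glob patterns."""
    return [
        path for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in allowed_patterns)
    ]


def should_stop(
    changed_files: list[str],
    allowed_patterns: list[str],
    commits_so_far: int,
    max_commits: int,
) -> Optional[str]:
    """Return a reason to stop and escalate, or None if the loop may continue."""
    stray = out_of_scope(changed_files, allowed_patterns)
    if stray:
        return "out of scope: " + ", ".join(stray)
    if commits_so_far >= max_commits:
        return f"commit budget reached ({max_commits})"
    return None


print(should_stop(["README.md"], ["src/billing/*"], commits_so_far=2, max_commits=5))
# out of scope: README.md
```

Note that `fnmatchcase` lets `*` match across `/`, so `src/billing/*` also covers nested directories. A stricter guard would list paths explicitly.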
What Loops Do Not Replace
Osmani names three problems that become sharper as loops improve [4].
First, verification still belongs to humans. An unattended loop can also make unattended mistakes. Even with a separate verifier agent, "done" is a claim, not a proof.
Second, comprehension can decay. The faster code you did not write ships, the wider the gap becomes between what exists and what you understand. A smooth loop accelerates this debt unless someone reads and owns the output.
Third, there is cognitive surrender. Once loops run, it becomes tempting to stop having an opinion and accept whatever they return. The same loop can accelerate someone who understands the work deeply, or help someone avoid understanding it at all.
Loop design is therefore not easier than prompt engineering. The leverage point has moved from prompt wording to the design of verifiable work.
Enterprise Adoption Questions
In personal projects, starting with a small loop is often enough. In enterprise environments, unattended actors that commit code, open pull requests, and update tickets become part of change management.
Four questions matter early.
- Change management: Loop-generated changes need audit trails and approval flows, just like CI/CD pipeline actions.
- Permission separation: Loops should not reuse human credentials. Use service accounts, least privilege, and centralized logs.
- Documented agreement: Files such as SKILL.md and AGENTS.md become team operating agreements because the loop reads them every run.
- Staged rollout: Start with read-only loops, then grant write access only after checks can say no.
A loop without verification is not just inefficient in an enterprise setting. It is a source of unreviewed changes, budget surprises, and audit risk.
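The staged rollout can be expressed as a small permission gate. The three stages below are an assumption for illustration. The article itself only distinguishes read-only loops from loops with write access, and `Stage` and `permitted` are hypothetical names.

```python
from enum import IntEnum


class Stage(IntEnum):
    READ_ONLY = 1   # loop may only read and report
    PROPOSE = 2     # loop may open pull requests for human review (assumed middle stage)
    WRITE = 3       # loop may change systems directly


def permitted(stage: Stage, action: Stage, checks_passed: bool) -> bool:
    """Allow an action only if the rollout stage covers it.

    Anything beyond reading additionally requires that verification passed.
    """
    if action > stage:
        return False
    if action == Stage.READ_ONLY:
        return True
    return checks_passed
```

Under this sketch a loop at `Stage.PROPOSE` can never perform a `Stage.WRITE` action, and even a loop at `Stage.WRITE` is refused when its checks fail.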
Summary: Place Responsibility Before the Loop
Loop engineering is a new design layer, as of June 2026, stacked on top of prompt, context, and harness engineering. The human role moves from typing instructions to designing the system that drives agents.
This is not only a productivity story. Automations, Worktrees, Skills, Connectors, and Sub-agents are increasingly available as product features. What gets heavier is the responsibility to define verification, stopping conditions, and ownership.
The remaining constraint is operational discipline. Building a loop is becoming easy. The difficult part is deciding, before it runs, what responsibility the human will continue to hold.
Related Articles
[1] Peter Steinberger's X post, with ExplainX's quotation and explanation.
[2] Yash Thakker, "Loop Engineering: How to Design Coding Agent Loops That Run While You Sleep (2026 Guide)", ExplainX.
[3] WorkOS, "Key takeaways from Boris Cherny on building Claude Code", and "Boris Cherny: Claude Code & the Future of Engineering".
[5] AI-Driven Lab, "『プロンプトを打つ』のはもう古い?──ループエンジニアリングというAIエージェント設計の最前線" (Is "typing prompts" already outdated? Loop engineering, the frontier of AI agent design).
[6] MAKE A CHANGE, inc., "Loop Engineering 概念と注意点を弊社実例を交えて解説" (Loop Engineering: concepts and cautions, explained with our own examples).
[8] OpenAI Developers, "Best practices - Codex: Use automations for repeated work".