Close the loop
Everyone is talking about building loops, operating a level up, running factories. But what does it all mean, and how do we put the concepts into practice? This is the third piece I’ve written in this arc, after the harness (the body around the model’s brain) and the factory (how an org runs a fleet of agents it never typed for). This one is about the loop itself: where it came from, and how you actually put one to work.
Where the loop came from
When ChatGPT launched it had no loop at all. Request, response, done. You asked, it answered, and if the answer was wrong you asked again. You were the loop.
The first real loop arrived with reasoning models. Instead of answering in one shot, the model was allowed to think, call a tool, look at the result, think again, and keep running until it decided it had a response. Give that loop the ability to act (edit files, run commands, open PRs) and you get agents like Claude Code. That’s still the first loop. The halt condition is the model’s own sense that its reasoning is done.
The moment agents got useful we started handing them goals instead of steps. Not “write this function,” but “make this test pass,” “ship this feature.” That’s the second loop, and it’s the one we’re in now: the thing runs until the goal is met, not until the reasoning effort is spent. The halt condition moved up a level, from “I’m done thinking” to “the objective is achieved.”
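To make the two halt conditions concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `ModelStep`, `first_loop`, `second_loop` and the toy functions are hypothetical names, not any vendor's API. The only point is where the stop decision lives: inside the model for the first loop, in an external check for the second.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-in for one model step: given the transcript so far,
# it returns ("tool", command) to act or ("done", answer) to stop.
ModelStep = Callable[[List[str]], Tuple[str, str]]


def first_loop(step: ModelStep, run_tool: Callable[[str], str],
               prompt: str, max_steps: int = 20) -> str:
    """Loop 1: think, act, observe. Halts when the model says it is done."""
    transcript = [prompt]
    for _ in range(max_steps):
        kind, payload = step(transcript)
        if kind == "done":
            return payload
        transcript.append(run_tool(payload))  # feed the observation back in
    return "step budget exhausted"


def second_loop(attempt: Callable[[str], str], goal_met: Callable[[str], bool],
                goal: str, max_attempts: int = 5) -> Optional[str]:
    """Loop 2: rerun an attempt until an external check says the goal is met."""
    brief = goal
    for _ in range(max_attempts):
        result = attempt(brief)
        if goal_met(result):
            return result
        brief = f"{goal}\nLast attempt did not meet the goal: {result}"
    return None  # budget spent without meeting the goal


# Toy model: calls one tool, then declares itself done.
def scripted_step(transcript: List[str]) -> Tuple[str, str]:
    if len(transcript) == 1:
        return ("tool", "run tests")
    return ("done", f"patch written after seeing: {transcript[-1]}")


answer = first_loop(scripted_step, lambda cmd: "2 tests failed", "fix the bug")

# Toy attempt that only succeeds on the third try.
briefs_seen: List[str] = []


def flaky_attempt(brief: str) -> str:
    briefs_seen.append(brief)
    return "tests pass" if len(briefs_seen) == 3 else "tests fail"


outcome = second_loop(flaky_attempt, lambda r: r == "tests pass",
                      "make this test pass")
```

In a real setup the `attempt` passed to `second_loop` would itself be a call into `first_loop`, which is the sense in which the second loop wraps the first.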
It’s not hard to see where this goes. Each loop swallows the one below it and takes a more abstract goal. It looks a lot like climbing a company hierarchy: at the bottom someone is told exactly what to do, a level up they’re given a task and left to work out the steps, a level up from that they own an outcome and decide the tasks themselves. This is what the conversation about loops and factories is all about. Now let’s see how to put it into practice.
Operate one level up
Here’s the whole method in one line: build the level of abstraction one step above where you’re working today, and do your work from there.
Make it concrete. Right now most engineering with AI happens at the keyboard. You, or your engineers, instruct a model to write code and correct it interactively through a CLI, prompt by prompt. That’s operating inside the first loop. The move up is to stop being the thing that prompts the agent.
You can call this way of working loopcraft or loop engineering, but whatever the name, it means replacing yourself as the person who prompts the agent. So instead of correcting output live, you build a coding agent that takes a brief and produces a PR, ideally correct in one shot or with minor feedback, and you stop iterating on the code. You iterate on the design of the agent. When a PR comes back wrong, you don’t fix the PR. You fix the thing that produced it, so the next hundred come back better.
This is different from a coding agent inside a CLI. You’re not sitting in front of it. It picks its work up from where the work already lives (a ticket in your project tracker, a message in your team channel) and delivers a PR back. The best demonstration of this is the recently released Claude Tag: you @-mention it on an issue and it goes away, does the work, and opens the pull request. No terminal, no live prompting. The task comes in through the tools your team already uses and the result comes back the same way.
Once you have that, you do the same move again, one level up. Now the interesting question isn’t how the code gets written; that’s handled. It’s where the tickets come from. And a ticket is usually a by-product: something said in a meeting, a request buried in an email, a bug reported in a channel, an issue in the repo. So you build the layer above the coding agent, a system that reads those sources and drafts the tickets, which you review and feed to the agent that solves them. Now you’re not writing tickets either. You’re steering a pipeline that turns raw signal into shipped code, and you can interact with the system at different levels of abstraction: the ticket created, the PR produced, and so on.
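The shape of that pipeline can be sketched in a few lines of Python. This is a hedged illustration, not a real integration: `Signal`, `Ticket`, `draft_tickets`, `run_pipeline` and the three callbacks are hypothetical names, and in practice the drafting step and the coding agent would be model-backed services rather than lambdas.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Signal:
    source: str  # where it was said: "meeting", "email", "channel", "repo"
    text: str


@dataclass
class Ticket:
    title: str
    source: str
    approved: bool = False


def draft_tickets(signals: Iterable[Signal],
                  looks_actionable: Callable[[Signal], bool]) -> List[Ticket]:
    """The layer above the coding agent: turn raw signal into draft tickets."""
    return [Ticket(title=s.text, source=s.source)
            for s in signals if looks_actionable(s)]


def run_pipeline(signals: Iterable[Signal],
                 looks_actionable: Callable[[Signal], bool],
                 review: Callable[[Ticket], bool],
                 coding_agent: Callable[[Ticket], str]) -> List[str]:
    """Signal in, PR references out, with a human review step in between."""
    prs: List[str] = []
    for ticket in draft_tickets(signals, looks_actionable):
        ticket.approved = review(ticket)      # where you steer the pipeline
        if ticket.approved:
            prs.append(coding_agent(ticket))  # assumed to return a PR reference
    return prs


# Toy inputs standing in for real sources and real agents.
signals = [
    Signal("meeting", "Let's get lunch on Friday"),
    Signal("channel", "Bug: login page crashes"),
    Signal("email", "bug in export, maybe"),
]

prs = run_pipeline(
    signals,
    looks_actionable=lambda s: "bug" in s.text.lower(),
    review=lambda t: t.source != "email",  # reviewer rejects the vague one
    coding_agent=lambda t: f"PR for: {t.title}",
)
```

The review callback is the only place a person appears, which is the point: you touch the system at the ticket level or the PR level, not at the prompt level.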
As far up as it goes
You can keep climbing. Every function of a business is, in principle, some loop of signal in and work out, and each one can get an agent that does the work a level below where you sit while you move up to steer it. Taken to its end, you run the whole organisation this way: agents wired into every function, doing the work the business theoretically needs done, and you intervening to correct the system rather than to do the work yourself. This is the loop-and-factory idea taken to its extreme: not one agent in a loop, but the org itself as a stack of loops you supervise.
I’m not claiming that end state is reachable for a whole business, or that it should be. Plenty of the work won’t fit the transformation, for example where the judgment is the job and can’t be handed down a level. The exercise is still valuable. Find the bottleneck, push the agent one level of abstraction up from where it sits today, and watch how many loops you can close in your organisation.
Pay attention: GLM-5.2 — The new open model leader, and one that many claim is good enough to replace your preferred closed AI model.
Pay attention: Loop engineering emerges — The shift from one-shot runs to standing loops that keep agents working just got a name, used publicly by Boris Cherny (Claude Code) and Peter Steinberger (OpenClaw).
Pay attention: Claude Tag: persistent agents in Slack — The async-agent form factor is crossing out of the terminal and into the surface teams already work in. Worth watching as the first mainstream place non-engineers meet a standing agent.
Pay attention: Claude Fable 5 / Mythos 5 — The first generally-available Mythos-class model, SOTA on nearly every benchmark (AA Index #1 at 64.9, SWE-Bench Pro 80.3%) and a leap ahead of competition.
Skip: OpenAI GPT-5.6 (Sol/Terra/Luna) — An expected frontier bump that matches rather than moves the story, hence skip.
Quick takes
Anthropic takes the lead with Fable - Fable 5 broke a year-long pattern. Instead of trading a couple of benchmark points, it jumped ~10 points above the trend line on SWE-Bench Pro, and led on revenue, distribution and, for the first time, raw capability at once.
Two monarchies, open vs closed - Closed AI is a US monopoly (Anthropic); open AI is effectively a Chinese one (DeepSeek, Kimi, GLM, MiMo), with the open crown rotating every few weeks. GLM-5.2 now leads the open pack, ~5 index points behind the frontier. For most real coding work, one generation behind already buys most of the value at a fraction of the cost, on weights you host.
We might be measuring intelligence wrong - Benchmarks reward pass@k: right at least once in k tries. But agents act once, so what matters is pass^k: right every time. Capability has raced ahead; reliability is climbing behind it, which is why the impressive monthly numbers don’t always translate to impressive agentic performance.
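A quick back-of-the-envelope shows how far apart the two measures can sit. The sketch below assumes each try is independent with the same success probability p, which is a simplification (real runs on the same task are correlated), and the function names are mine rather than any benchmark's. Under that assumption, "right at least once in k tries" is 1 - (1 - p)^k, while "right every time" is p^k.

```python
def pass_at_k(p: float, k: int) -> float:
    """Chance of at least one success in k independent tries."""
    return 1.0 - (1.0 - p) ** k


def pass_every_k(p: float, k: int) -> float:
    """Chance that all k independent tries succeed."""
    return p ** k


# Illustrative numbers only: a model that is right 80% of the time, 5 tries.
p, k = 0.8, 5
at_least_once = pass_at_k(p, k)
every_time = pass_every_k(p, k)

print(f"right at least once in {k}: {at_least_once:.4f}")
print(f"right all {k} times:       {every_time:.4f}")
```

With those made-up inputs the first number comes out near 1.00 and the second near 0.33, so the same model can look almost perfect on one measure and unreliable on the other.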


