<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Nick Sorros]]></title><description><![CDATA[Practical AI for tech leaders. Filter the hype, ship what works]]></description><link>https://newsletter.nsorros.com</link><image><url>https://substackcdn.com/image/fetch/$s_!SWBE!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b2cd49f-7e70-4cbb-a18b-ed077b01e8dc_1024x1024.png</url><title>Nick Sorros</title><link>https://newsletter.nsorros.com</link></image><generator>Substack</generator><lastBuildDate>Fri, 03 Jul 2026 19:34:29 GMT</lastBuildDate><atom:link href="https://newsletter.nsorros.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Nick Sorros]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[nicksorros@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[nicksorros@substack.com]]></itunes:email><itunes:name><![CDATA[Nick Sorros]]></itunes:name></itunes:owner><itunes:author><![CDATA[Nick Sorros]]></itunes:author><googleplay:owner><![CDATA[nicksorros@substack.com]]></googleplay:owner><googleplay:email><![CDATA[nicksorros@substack.com]]></googleplay:email><googleplay:author><![CDATA[Nick Sorros]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Close the loop]]></title><description><![CDATA[Everyone is talking about building loops, operating a level up, running factories.]]></description><link>https://newsletter.nsorros.com/p/close-the-loop</link><guid isPermaLink="false">https://newsletter.nsorros.com/p/close-the-loop</guid><dc:creator><![CDATA[Nick Sorros]]></dc:creator><pubDate>Fri, 03 Jul 2026 12:50:50 
GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/3013349c-b2ea-467a-a3ae-2e304c093894_1148x796.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Everyone is talking about building loops, operating a level up, running factories. But what does it all mean, and how do we put these concepts into practice? This is the third piece I&#8217;ve written in this arc, after the harness (the body around the model&#8217;s brain) and the factory (how an org runs a fleet of agents that write code no one typed). This one is about the loop itself: where it came from, and how you actually put one to work.</p><h2>Where the loop came from</h2><p>When ChatGPT launched, it had no loop at all. Request, response, done. You asked, it answered, and if the answer was wrong you asked again. You were the loop.</p><p>The first real loop arrived with reasoning models. Instead of answering in one shot, the model was allowed to think, call a tool, look at the result, think again, and keep going until it decided it had a response. Give that loop the ability to act (edit files, run commands, open PRs) and you get agents like Claude Code. That&#8217;s still the first loop: the halt condition is the model&#8217;s own sense that its reasoning is done.</p><p>The moment agents got useful, we started handing them goals instead of steps. Not &#8220;write this function,&#8221; but &#8220;make this test pass,&#8221; &#8220;ship this feature.&#8221; That&#8217;s the second loop, and it&#8217;s the one we&#8217;re in now: the agent runs until the goal is met, not until the reasoning effort is spent. The halt condition moved up a level, from &#8220;I&#8217;m done thinking&#8221; to &#8220;the objective is achieved.&#8221;</p><p>It&#8217;s not hard to see where this goes. Each loop swallows the one below it and takes a more abstract goal.
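</p><p>The nesting is easiest to see in code. Below is a minimal Python sketch, not how Claude Code or any real agent is built. <code>think</code>, <code>tools</code> and <code>is_met</code> are hypothetical stand-ins for the model, its tool calls and an external goal check. The step and attempt caps are assumptions added so the toy always terminates. The inner loop stops when the model says it is done; the outer loop wraps it and stops only when the goal check passes.</p><pre><code class="language-python"># Toy model of the two agent loops. The "model" (think), the tools and the
# goal check (is_met) are hypothetical stand-ins, not a real agent API.
# The max_steps / max_attempts caps are an assumption added so the sketch
# always terminates.


def reasoning_loop(task, think, tools, max_steps=20):
    """First loop: think, call a tool, look at the result, repeat.

    Halts when the model itself decides it has an answer."""
    transcript = [task]
    for _ in range(max_steps):
        step = think(transcript)  # the model picks its next move
        if step["type"] == "answer":  # halt: "I'm done thinking"
            return step["content"]
        observation = tools[step["tool"]](step["args"])  # act...
        transcript.append(observation)  # ...and look at the result
    return None  # step cap reached without an answer


def goal_loop(goal, is_met, think, tools, max_attempts=5):
    """Second loop: rerun the first loop until an external check passes.

    The halt condition moves from the model's judgment to the objective."""
    brief = goal
    for attempt in range(1, max_attempts + 1):
        output = reasoning_loop(brief, think, tools)
        if is_met(output):  # halt: "the objective is achieved"
            return output, attempt
        # Feed the failure back in as part of the next brief.
        brief = f"{goal}\nPrevious attempt failed with {output!r}"
    return None, max_attempts


def toy_think(transcript):
    """Hypothetical toy "model": increments three times, then answers."""
    observations = transcript[1:]
    if len(observations) == 3:
        return {"type": "answer", "content": observations[-1]}
    last = observations[-1] if observations else 0
    return {"type": "tool", "tool": "increment", "args": last}


toy_tools = {"increment": lambda x: x + 1}

print(reasoning_loop("count up", toy_think, toy_tools))  # prints 3
print(goal_loop("reach 3", lambda out: out == 3, toy_think, toy_tools))  # prints (3, 1)
print(goal_loop("reach 4", lambda out: out == 4, toy_think, toy_tools))  # prints (None, 5)</code></pre><p>The toy model always answers 3. The goal loop therefore accepts that answer for a target of 3. For a target of 4, it exhausts its attempts rather than trusting the model&#8217;s own claim that it is done. Each wrapper holds the loop below it to a more abstract check than that loop applies to itself.</p><p>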
The progression looks a lot like climbing a company hierarchy. At the bottom, someone is told exactly what to do; a level up, they&#8217;re given a task and left to work out the steps; a level above that, they own an outcome and decide the tasks themselves. That climb is what the conversation about loops and factories is really about. Now let&#8217;s see how to put it into practice.</p><h2>Operate one level up</h2><p>Here&#8217;s the whole method in one line: build the level of abstraction one step above where you&#8217;re working today, and do your work from there.</p><p>Make it concrete. Right now most engineering with AI happens at the keyboard. You, or your engineers, instruct a model to write code and correct it interactively through a CLI, prompt by prompt. That&#8217;s operating inside the first loop. The move up is to stop being the thing that prompts the agent.</p><p>You can call this way of working loopcraft or loop engineering, but whatever the name, it means replacing yourself as the person who prompts the agent. Instead of correcting output live, you build a coding agent that takes a brief and produces a PR, ideally correct in one shot or after minor feedback, and you stop iterating on the code. You iterate on the design of the agent. When a PR comes back wrong, you don&#8217;t fix the PR. You fix the thing that produced it, so the next hundred come back better.</p><p>This is different from a coding agent inside a CLI. You&#8217;re not sitting in front of it. It picks up work where the work already lives (a ticket in your project tracker, a message in your team channel) and delivers a PR back. The best demonstration of this is the recently released Claude Tag: you @-mention it on an issue and it goes away, does the work, and opens the pull request. No terminal, no live prompting. The task comes in through the tools your team already uses, and the result comes back the same way.</p><p>Once you have that, you make the same move again, one level up.
Now the interesting question isn&#8217;t how the code gets written; that&#8217;s handled. It&#8217;s where the tickets come from. A ticket is usually a by-product: something said in a meeting, a request buried in an email, a bug reported in a channel, an issue in the repo. So you build the layer above the coding agent: a system that reads those sources and drafts tickets, which you review and feed to the agent that solves them. Now you&#8217;re not writing tickets either. You&#8217;re steering a pipeline that turns raw signal into shipped code. You can step in at whichever level of abstraction needs correcting: the drafted ticket, the produced PR, and so on.</p><h2>As far up as it goes</h2><p>You can keep climbing. Every function of a business is, in principle, some loop of signal in and work out. Each one can get an agent that does the work a level below where you sit while you move up to steer it. Taken to its end, that means running the whole organisation this way. Agents are wired into every function, doing the work the business theoretically needs done, and you intervene to correct the system rather than to do the work yourself. This is the loop-and-factory idea taken to its extreme: not one agent in a loop, but the org itself as a stack of loops you supervise.</p><p>I&#8217;m not claiming that end state is reachable for a whole business, or that it should be. Plenty of work won&#8217;t fit: wherever the judgment is the job, it can&#8217;t be handed down a level. The exercise is still valuable. Find the bottleneck, push the agent one level of abstraction above where it sits today, and see how many loops you can close in your organisation.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-Nv6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b24e200-bb48-4172-adbb-8ad63872e611_2400x800.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-Nv6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b24e200-bb48-4172-adbb-8ad63872e611_2400x800.jpeg 424w, https://substackcdn.com/image/fetch/$s_!-Nv6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b24e200-bb48-4172-adbb-8ad63872e611_2400x800.jpeg 848w, https://substackcdn.com/image/fetch/$s_!-Nv6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b24e200-bb48-4172-adbb-8ad63872e611_2400x800.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!-Nv6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b24e200-bb48-4172-adbb-8ad63872e611_2400x800.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-Nv6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b24e200-bb48-4172-adbb-8ad63872e611_2400x800.jpeg" width="1456" height="485"
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5b24e200-bb48-4172-adbb-8ad63872e611_2400x800.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:485,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:130158,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.nsorros.com/i/204903763?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b24e200-bb48-4172-adbb-8ad63872e611_2400x800.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-Nv6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b24e200-bb48-4172-adbb-8ad63872e611_2400x800.jpeg 424w, https://substackcdn.com/image/fetch/$s_!-Nv6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b24e200-bb48-4172-adbb-8ad63872e611_2400x800.jpeg 848w, https://substackcdn.com/image/fetch/$s_!-Nv6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b24e200-bb48-4172-adbb-8ad63872e611_2400x800.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!-Nv6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b24e200-bb48-4172-adbb-8ad63872e611_2400x800.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><ul><li><p><strong>Pay attention: <a href="https://z.ai/blog/glm-5.2">GLM-5.2</a></strong> &#8212; The new open-model leader, and one that many claim is good enough to replace your preferred closed AI model.</p></li><li><p><strong>Pay attention: <a href="https://www.deeplearning.ai/the-batch/">Loop engineering emerges</a></strong> &#8212; The shift from one-shot runs to standing loops that keep agents working just got a name, used publicly by Boris Cherny (Claude Code) and Peter Steinberger (OpenClaw).</p></li><li><p><strong>Pay attention: <a href="https://www.anthropic.com/news/introducing-claude-tag">Claude Tag: persistent agents in Slack</a></strong> &#8212; The async-agent form factor is moving out of the terminal and into the tools teams already work in.
Worth watching as the first mainstream place non-engineers meet a standing agent.</p></li><li><p><strong>Pay attention: <a href="https://www.latent.space/p/ainews-anthropic-claude-fable-5-mythos">Claude Fable 5 / Mythos 5</a></strong> &#8212; The first generally available Mythos-class model, SOTA on nearly every benchmark (AA Index #1 at 64.9, SWE-Bench Pro 80.3%) and a leap ahead of the competition.</p></li><li><p><strong>Skip: <a href="https://duckduckgo.com/?q=openai+5.6&amp;t=osx&amp;ia=web">OpenAI GPT-5.6 (Sol/Terra/Luna)</a></strong> &#8212; An expected frontier bump that keeps pace with the story rather than moving it, hence skip.</p></li></ul><h2>Quick takes</h2><ul><li><p><strong><a href="https://nsorros.com/writing/anthropic-takes-the-lead-with-fable/">Anthropic takes the lead with Fable</a></strong> - Fable 5 broke a year-long pattern. Instead of trading a couple of benchmark points, it jumped ~10 points above the trend line on SWE-Bench Pro. Anthropic now leads on revenue, distribution and, for the first time, raw capability, all at once.</p></li><li><p><strong><a href="https://nsorros.com/writing/two-monarchies-open-vs-closed/">Two monarchies, open vs closed</a></strong> - Closed AI is a US monopoly (Anthropic); open-weight AI is effectively a Chinese one (DeepSeek, Kimi, GLM, MiMo), with the open crown rotating every few weeks. GLM-5.2 now leads the open pack, ~5 index points behind the frontier. For most real coding work, a model one generation behind already buys most of the value at a fraction of the cost, on weights you host.</p></li><li><p><strong><a href="https://nsorros.com/writing/we-might-be-measuring-intelligence-wrong/">We might be measuring intelligence wrong</a></strong> - Benchmarks reward pass@k: right at least once in k tries. But agents act once, so what matters is pass&#8743;k: right <em>every</em> time across k tries. Capability has raced ahead while reliability climbs behind it. That is why impressive monthly benchmark numbers don&#8217;t always translate into impressive agentic performance.</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Building the factory that builds software]]></title><description><![CDATA[A lot of organisations are quietly transitioning from using AI-assisted coding to building the factory that produces the software.]]></description><link>https://newsletter.nsorros.com/p/building-the-factory-that-builds</link><guid isPermaLink="false">https://newsletter.nsorros.com/p/building-the-factory-that-builds</guid><dc:creator><![CDATA[Nick Sorros]]></dc:creator><pubDate>Mon, 08 Jun 2026 12:42:31 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!UE7f!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c612c18-186c-465d-9d4f-06f15243a3ef_2200x1276.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A lot of organisations are quietly transitioning from using AI-assisted coding to building the factory that produces the software. <a href="https://stripe.dev/blog/minions-stripes-one-shot-end-to-end-coding-agents">Stripe</a> merges over 1,300 pull requests a week that contain no human-written code. <a href="https://engineering.atspotify.com/2025/11/spotifys-background-coding-agent-part-1">Spotify&#8217;s</a> background agents merge 650+ PRs a month into production. Around half of <a href="https://builders.ramp.com/post/why-we-built-our-background-agent">Ramp&#8217;s</a> merged PRs start from their in-house agent, and a three-person team at <a href="https://openai.com/index/harness-engineering/">OpenAI</a> shipped a million-line product by driving agents instead of typing.
So how is the factory set up in these organisations?</p><h2>It is not a multi-agent system</h2><p>When people picture the factory, they picture fleets of specialised agents (a planner, a few doers, a reviewer, a tester) passing work to each other like stations on an assembly line. In practice the opposite is happening, for exactly the reason Cognition, the company behind Devin, set out in <a href="https://cognition.ai/blog/dont-build-multi-agents">Don&#8217;t Build Multi-Agents</a>. In short, every action an agent takes carries implicit decisions. Parallel agents that cannot see each other&#8217;s full traces make conflicting decisions, and those conflicts compound into incoherent work.</p><p>Instead, the factory consists of many independent but powerful agents. Stripe&#8217;s minions are one agent loop wrapped in &#8220;blueprints&#8221;: state machines that interleave deterministic steps (lint, push, run the right slice of a 3M-test battery) with agentic ones (implement, fix CI). Spotify&#8217;s Honk is one deliberately minimal Claude Code agent whose git cannot even push. The only genuine second agent in these production setups is OpenAI&#8217;s <a href="https://alignment.openai.com/auto-review/">Auto-review</a>, and it is a safety reviewer gating sandbox escalations, not a quality reviewer. The factory is one agent and a lot of plumbing.</p><h2>The agent gets the environment you would give an engineer</h2><p>If the factory is one agent, the first practical question is where it works. The convergent answer is a sandboxed environment that looks remarkably like what you would hand a new engineer on day one: the repo, a shell, the tests, and access to the same tools your engineers use. Stripe spins up pre-warmed devboxes in about ten seconds, identical to the ones human engineers use, cut off from production and the internet.
Ramp restores Modal sandboxes from repo snapshots rebuilt every thirty minutes, so the agent starts near-instantly on code that is at most half an hour stale. It also gets the same telemetry its engineers check: Sentry, Datadog, feature flags.</p><p>This environment is not a new idea. Google&#8217;s engineers have worked in the cloud for over a decade. By 2023 around 80% of development on its main codebase happened in Cider, its web IDE, on top of a file system that streams the monorepo on demand. GitHub Codespaces made the dev environment a URL back in 2020, and plenty of startups run on that setup today. As it turns out, this way of working fits agents even better than engineers. A repo, a shell and a test runner are all an agent needs, it can spin up a fresh environment per task, and it does not keep your hours.</p><h2>A hybrid, not a handover</h2><p>The emerging setup is also not &#8220;agents slowly replacing developers&#8221;; it is a hybrid. A background fleet independently completes the delegable work, while every developer keeps an agentic coding tool like Claude Code or Cursor in hand for the rest. This has two properties I like. First, you are permanently testing the frontier. When an autonomous run fails, the task degrades gracefully into an assisted one rather than a rewrite, and when the models improve, tasks graduate the other way. Second, it supports the developer&#8217;s flow rather than disrupting it. The developer and the agent work the same way, in the same environment, against the same verification, so nothing about the codebase has to be split into &#8220;human code&#8221; and &#8220;agent code&#8221;.</p><h2>Which tasks go to the factory</h2><p>Not everything belongs on the conveyor belt yet, so the natural question is which tasks you send to the autonomous agents and which to your developers.
The tasks organisations actually delegate are well-scoped and mechanically verifiable: migrations, dependency upgrades, bug fixes, cleanup. Spotify&#8217;s Honk is openly a migration factory. Its dataset migration shipped 240 automated PRs that the company estimates saved ten engineering weeks. Stripe&#8217;s minion PRs skew towards config changes, upgrades and small refactors. OpenAI runs recurring cleanup agents whose refactoring PRs are small enough to review in under a minute.</p><p>These task classes also make a good testing ground for probing a boundary that keeps moving. Anthropic&#8217;s <a href="https://www.anthropic.com/institute/recursive-self-improvement">When AI Builds Itself</a> charts exactly this. Success on well-defined tasks saturated first, while success on open-ended coding tasks climbed to 76% by May 2026. That progress helps explain how Claude now writes 80% of the code merged at Anthropic.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UE7f!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c612c18-186c-465d-9d4f-06f15243a3ef_2200x1276.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UE7f!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c612c18-186c-465d-9d4f-06f15243a3ef_2200x1276.heic 424w, https://substackcdn.com/image/fetch/$s_!UE7f!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c612c18-186c-465d-9d4f-06f15243a3ef_2200x1276.heic 848w,
https://substackcdn.com/image/fetch/$s_!UE7f!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c612c18-186c-465d-9d4f-06f15243a3ef_2200x1276.heic 1272w, https://substackcdn.com/image/fetch/$s_!UE7f!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c612c18-186c-465d-9d4f-06f15243a3ef_2200x1276.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UE7f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c612c18-186c-465d-9d4f-06f15243a3ef_2200x1276.heic" width="1456" height="844" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2c612c18-186c-465d-9d4f-06f15243a3ef_2200x1276.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:844,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:116223,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.nsorros.com/i/201121265?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c612c18-186c-465d-9d4f-06f15243a3ef_2200x1276.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UE7f!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c612c18-186c-465d-9d4f-06f15243a3ef_2200x1276.heic 424w, 
https://substackcdn.com/image/fetch/$s_!UE7f!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c612c18-186c-465d-9d4f-06f15243a3ef_2200x1276.heic 848w, https://substackcdn.com/image/fetch/$s_!UE7f!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c612c18-186c-465d-9d4f-06f15243a3ef_2200x1276.heic 1272w, https://substackcdn.com/image/fetch/$s_!UE7f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c612c18-186c-465d-9d4f-06f15243a3ef_2200x1276.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The practical rule: delegate where a pass/fail check exists, keep judgment work hybrid, and redraw the line every quarter, because the open-ended curve is the one moving.</p><h2>Keeping the agent on task</h2><p>The main steering mechanism remains unglamorous: a CLAUDE.md or AGENTS.md file plus skills. It is a compact, inspectable way to inject specific instructions, org policies and practices, and a working memory into every run. The pattern that works is now well documented. OpenAI tried one big AGENTS.md and reports that it failed in predictable ways. What survives is a ~100-line table of contents pointing into real docs, kept fresh mechanically. GitHub&#8217;s <a href="https://github.blog/ai-and-ml/github-copilot/how-to-write-a-great-agents-md-lessons-from-over-2500-repositories/">analysis of 2,500+ repos</a> lands on the same size and finds &#8220;never commit secrets&#8221; to be the most common instruction in the wild.</p><p>All of this is scaffolding that the labs are actively trying to absorb into the models: better memory, better context management, agents choosing what they need. Until they do, these files and loops are the interface to your factory.</p><h2>So what do you do?</h2><p>If you are an engineering leader, the factory reframes your questions. Not &#8220;should we use AI coding&#8221; (that ship has sailed), but three concrete ones. What environment do agents get? Give them what you would give an engineer, sandboxed and scoped. How do you keep them on task? An AGENTS.md worth maintaining, a verification loop the agent cannot talk its way around, and hard caps on iteration. What goes to the agent versus the human?
Start with migrations, upgrades and bug fixes, run everything else hybrid, and move the line as the frontier moves.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CTvG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F678ebbad-12f7-4734-8c9e-2aee97e84988_2400x800.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CTvG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F678ebbad-12f7-4734-8c9e-2aee97e84988_2400x800.heic 424w, https://substackcdn.com/image/fetch/$s_!CTvG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F678ebbad-12f7-4734-8c9e-2aee97e84988_2400x800.heic 848w, https://substackcdn.com/image/fetch/$s_!CTvG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F678ebbad-12f7-4734-8c9e-2aee97e84988_2400x800.heic 1272w, https://substackcdn.com/image/fetch/$s_!CTvG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F678ebbad-12f7-4734-8c9e-2aee97e84988_2400x800.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CTvG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F678ebbad-12f7-4734-8c9e-2aee97e84988_2400x800.heic" width="1456" height="485" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/678ebbad-12f7-4734-8c9e-2aee97e84988_2400x800.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:485,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:130158,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.nsorros.com/i/201121265?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F678ebbad-12f7-4734-8c9e-2aee97e84988_2400x800.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CTvG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F678ebbad-12f7-4734-8c9e-2aee97e84988_2400x800.heic 424w, https://substackcdn.com/image/fetch/$s_!CTvG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F678ebbad-12f7-4734-8c9e-2aee97e84988_2400x800.heic 848w, https://substackcdn.com/image/fetch/$s_!CTvG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F678ebbad-12f7-4734-8c9e-2aee97e84988_2400x800.heic 1272w, https://substackcdn.com/image/fetch/$s_!CTvG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F678ebbad-12f7-4734-8c9e-2aee97e84988_2400x800.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>What To Pay Attention To</h2><ul><li><p><strong>Pay attention: <a href="https://www.anthropic.com/institute/recursive-self-improvement">When AI builds itself</a></strong> &#8212; Worth tracking as a long-term trend. In Anthropic&#8217;s own framing, they are unsure which of two paths lies ahead. Progress may stall, leaving human high-level direction still necessary, or it may keep going until we reach recursive self-improvement.
What&#8217;s worth keeping in mind is that at the frontier, 80% of the code is already being written by AI, and there is a real possibility of an org&#8217;s tech department being taken over entirely by AI.</p></li><li><p><strong>Pay attention: <a href="https://www.latent.space/p/cognition">The age of async agents</a></strong> &#8212; In the meantime, agentic coding is moving to the cloud, which has many benefits. This is not a new trend &#8212; see GitHub Codespaces, and of course orgs like Google where the coding environment has always lived in the cloud &#8212; but this time it feels clearer and more pervasive because of how little setup a coding agent needs to get going. It means businesses need to start thinking about how to enable async agentic coding for their workforce, to get the maximum benefit from agents running 24/7.</p></li><li><p><strong>Pay attention: <a href="https://newsletter.pragmaticengineer.com/p/ai-impact-2026-part-2">Pragmatic Engineer&#8217;s 2026 AI survey</a></strong> &#8212; Three pieces in one, with very useful insights: senior engineers now delegate to AI instead of to junior devs; AI coding quality reflects the seniority and knowledge of the user, which disproportionately hurts the inexperienced; and <a href="https://newsletter.pragmaticengineer.com/p/state-of-the-job-market-2026">hiring has never been stronger</a> even as more and more code is written by AI &#8212; Jevons paradox in practice.</p></li><li><p><strong>Skip: <a href="https://www.latent.space/p/ainews-anthropic-raises-965b-series">Opus 4.8 and Dynamic Workflows</a></strong> &#8212; Opus 4.8 is an iterative improvement, so skip it. 
Dynamic Workflows is worth keeping in mind, but it sits on the same linear trajectory of more and more work being pushed towards agents through skills, workflows and other constructs &#8212; constructs which may soon be less relevant as AI gets better at choosing what it needs by itself.</p></li><li><p><strong>Skip: <a href="https://lastweekin.ai/p/lwiai-podcast-246-gemini-35-omni">Cursor Composer 2.5</a></strong> &#8212; Cursor is having a quiet comeback, slowly climbing the agentic coding rankings with its own model (fine-tuned on Moonshot&#8217;s open-weight Kimi K2.5), but it&#8217;s not something you should pay too much attention to &#8212; not important enough to even think about switching away from Claude Code.</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Harness engineering is not a moat]]></title><description><![CDATA[it's good software engineering foundations]]></description><link>https://newsletter.nsorros.com/p/harness-engineering-is-not-a-moat</link><guid isPermaLink="false">https://newsletter.nsorros.com/p/harness-engineering-is-not-a-moat</guid><dc:creator><![CDATA[Nick Sorros]]></dc:creator><pubDate>Tue, 26 May 2026 05:20:40 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!XMkL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26f463e1-a573-42e7-9b74-394ae594062f_2400x1520.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XMkL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26f463e1-a573-42e7-9b74-394ae594062f_2400x1520.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!XMkL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26f463e1-a573-42e7-9b74-394ae594062f_2400x1520.heic 424w, https://substackcdn.com/image/fetch/$s_!XMkL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26f463e1-a573-42e7-9b74-394ae594062f_2400x1520.heic 848w, https://substackcdn.com/image/fetch/$s_!XMkL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26f463e1-a573-42e7-9b74-394ae594062f_2400x1520.heic 1272w, https://substackcdn.com/image/fetch/$s_!XMkL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26f463e1-a573-42e7-9b74-394ae594062f_2400x1520.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XMkL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26f463e1-a573-42e7-9b74-394ae594062f_2400x1520.heic" width="1456" height="922" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/26f463e1-a573-42e7-9b74-394ae594062f_2400x1520.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:922,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:114159,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.nsorros.com/i/199175956?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26f463e1-a573-42e7-9b74-394ae594062f_2400x1520.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XMkL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26f463e1-a573-42e7-9b74-394ae594062f_2400x1520.heic 424w, https://substackcdn.com/image/fetch/$s_!XMkL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26f463e1-a573-42e7-9b74-394ae594062f_2400x1520.heic 848w, https://substackcdn.com/image/fetch/$s_!XMkL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26f463e1-a573-42e7-9b74-394ae594062f_2400x1520.heic 1272w, https://substackcdn.com/image/fetch/$s_!XMkL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26f463e1-a573-42e7-9b74-394ae594062f_2400x1520.heic 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>A couple of years ago, prompt engineering was going to be a career. Then the models got good enough that the tricks stopped mattering, and the job evaporated before it really arrived. Nowadays, it&#8217;s all about harness engineering, the scaffolding around the model: how it reasons, calls tools, manages context, remembers and checks its work. So it&#8217;s worth asking: will harness engineering meet the same fate as prompt engineering?</p><p>This is not to say the harness is unimportant; it is what turns your AI model into an agent. But it currently serves two distinct roles: one gets more publicity and matters more in the short term, while the other is the core of what stays. Let&#8217;s start with the first.</p><h2>Patching limitations of current models</h2><p>One of the main reasons harness engineering is getting so much attention is that it makes an existing model perform better on a task. On the surface that seems valuable, and it can be mistaken for long-term business value. However, I don&#8217;t think most of those fixes are here to stay. Here are some of them, a few of which are already obsolete:</p><p>&#129300; Reasoning. Chain-of-thought and self-reflection loops (Reflexion, Self-Refine). Before reasoning models, plenty of techniques made models better at thinking, with downstream effects on real-world tasks, coding included. Nowadays models are trained to reason, so these techniques have moved out of the harness.</p><p>&#128736;&#65039; Tools. 
Early harness code also patched problems with tool use: formatting the call, parsing the output, retrying bad JSON. Models now choose and call tools cleanly; the skill is being internalised. Tool choice still matters, but it plays an ever smaller role as models get better at making the right choice by themselves.</p><p>&#128191; Context. Pruning, compacting and pulling key facts back in is where a lot of today&#8217;s lift sits. Manipulating the context is the latest art form: bring the right tools into the context, keep the right information, and suddenly your agent performs much better. However, models are already getting better at managing their own context, so I am not sure this will be an important part of future harnesses.</p><p>&#127981; Domain knowledge. This is the single biggest lever right now, because current models, though excellent in most domains, lack on-the-job experience in your company, or in any company. That leaves plenty of room to adjust how the agent performs on the tasks that matter to you. My impression, though, is that this too will go away once companies create onboarding materials that the agent can refer back to.</p><p>In any case, be sceptical of additions to the &#8220;intelligence&#8221; layer; they will very likely go away with the next model iteration.</p><h2>The agent runtime</h2><p>The other role of the harness is more boring but, in my opinion, more important. It is the part that runs the AI model: the orchestration, the software engineering engine around the AI. It&#8217;s more the product than the intelligence itself. It includes:</p><p>&#9881;&#65039; Runtime. The engine that actually executes the model and its tools, reliably and at a sane cost. 
A smarter model still has to run somewhere, and keeping it up, retrying what fails and not blowing the budget on tokens is plain infrastructure work that no model iteration takes away.</p><p>&#128190; Storage. Sessions and memory have to live somewhere, which gives you both an auditable trail of what the agent did and the substrate for it to improve over time. However much a model can hold in its head, it still needs a notebook outside of it.</p><p>&#128268; Integration surface. Access to your systems and data, and the ability to take real actions in the world. A brain with no hands does nothing, and wiring up those hands is ordinary software engineering that only grows as the agent is trusted with more.</p><p>&#128274; Permissions and policies. The guardrails around what the agent can and cannot do: what it can touch, spend or send. The more capable the model gets, the more this matters, not less, because a smarter agent left unchecked can do more damage, and do it faster.</p><p>&#9989; Independent verification. One model does the work and another checks it, because a model grading its own output goes soft, for the same reason you shouldn&#8217;t review your own PR. This is a product decision about trust, not a trick to squeeze more intelligence out of the model.</p><h2>So what do you do?</h2><p>I would start by separating these two aspects of the harness. The intelligence uplift (the context juggling, the domain rulebook) is a tune, not a moat: the same kind prompt engineering and fine-tuning gave. Don&#8217;t overinvest in it; it is a depreciating asset.</p><p>The durable element of the harness is the other half, the body and the product around the model&#8217;s brain. 
That&#8217;s just good software engineering, and good software engineering is here to stay.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DBbW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd86a682e-bc01-4b30-be27-066d55c768e3_2400x800.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DBbW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd86a682e-bc01-4b30-be27-066d55c768e3_2400x800.heic 424w, https://substackcdn.com/image/fetch/$s_!DBbW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd86a682e-bc01-4b30-be27-066d55c768e3_2400x800.heic 848w, https://substackcdn.com/image/fetch/$s_!DBbW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd86a682e-bc01-4b30-be27-066d55c768e3_2400x800.heic 1272w, https://substackcdn.com/image/fetch/$s_!DBbW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd86a682e-bc01-4b30-be27-066d55c768e3_2400x800.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DBbW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd86a682e-bc01-4b30-be27-066d55c768e3_2400x800.heic" width="1456" height="485" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d86a682e-bc01-4b30-be27-066d55c768e3_2400x800.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:485,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:130158,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.nsorros.com/i/199175956?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd86a682e-bc01-4b30-be27-066d55c768e3_2400x800.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DBbW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd86a682e-bc01-4b30-be27-066d55c768e3_2400x800.heic 424w, https://substackcdn.com/image/fetch/$s_!DBbW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd86a682e-bc01-4b30-be27-066d55c768e3_2400x800.heic 848w, https://substackcdn.com/image/fetch/$s_!DBbW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd86a682e-bc01-4b30-be27-066d55c768e3_2400x800.heic 1272w, https://substackcdn.com/image/fetch/$s_!DBbW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd86a682e-bc01-4b30-be27-066d55c768e3_2400x800.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>Pay attention</strong>: LLMs are now finding real, previously unknown vulnerabilities - Google reported the first known in-the-wild case of attackers using an LLM to discover one, and the UK AI Security Institute now clocks top models executing attacks that would take a human ~3 hours (up from a 1-hour forecast, and 30 minutes at Opus 4.6&#8217;s debut). 
The patch-vs-exploit math has changed, and LLM-found logical flaws are an active threat rather than a forecast.</p><p><strong>Skip</strong>: All model labs are now agent labs - The whole industry capitulated to the harness in one news cycle (OpenAI, AI21, even DeepSeek). The scary version of the story is that labs co-train models to work only inside their own agent. I&#8217;m not convinced the lock-in bites, though: even if the raw model stops being freely swappable, the agent layer on top of it probably still is.</p><p><strong>Skip</strong>: The labs are moving into the consulting lane - OpenAI bought a 150-person FDE shop and Google joined the hiring spree, but labs chasing services revenue is just gravity, and unless you&#8217;re a consultancy or a thin wrapper, you&#8217;re fine. The more useful takeaway is that the AI part itself won&#8217;t be your differentiator for much longer, because everyone, the labs included, is racing to embed it everywhere.</p><p><strong>Skip</strong>: &#8220;Flash&#8221; is no longer the cheap tier - Gemini 3.5 Flash landed at frontier-class scores but 5.5&#215; the price of its predecessor, so the &#8220;Flash = cheap&#8221; heuristic no longer holds. 
At the same time, if you specifically need the speed, Flash is the fastest model at its intelligence level.</p><p><strong>Skip</strong>: An OpenAI model disproved an 80-year-old Erd&#337;s conjecture - A general-purpose model produced a genuinely novel result in under $1,000 of compute, a great story of AI scratching the surface of what&#8217;s possible, but it doesn&#8217;t change any practical use case you will build.</p>]]></content:encoded></item><item><title><![CDATA[The tale of the rabbit and the turtle]]></title><description><![CDATA[How AI might not be making us as productive as we think]]></description><link>https://newsletter.nsorros.com/p/the-tale-of-the-rabbit-and-the-turtle</link><guid isPermaLink="false">https://newsletter.nsorros.com/p/the-tale-of-the-rabbit-and-the-turtle</guid><dc:creator><![CDATA[Nick Sorros]]></dc:creator><pubDate>Mon, 18 May 2026 13:06:07 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!9IbD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8951fb3-82f5-4b3d-a771-a01193bc207b_2400x2400.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We feel more productive with AI. We&#8217;re shipping thousands of lines a day. Whole apps and complex features are built in front of our eyes in a few hours, work that would have taken weeks pre-2024. The rabbit is fast. But how much faster are we actually going from A to B? 
And what if we&#8217;re moving faster down a longer road?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9IbD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8951fb3-82f5-4b3d-a771-a01193bc207b_2400x2400.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9IbD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8951fb3-82f5-4b3d-a771-a01193bc207b_2400x2400.heic 424w, https://substackcdn.com/image/fetch/$s_!9IbD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8951fb3-82f5-4b3d-a771-a01193bc207b_2400x2400.heic 848w, https://substackcdn.com/image/fetch/$s_!9IbD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8951fb3-82f5-4b3d-a771-a01193bc207b_2400x2400.heic 1272w, https://substackcdn.com/image/fetch/$s_!9IbD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8951fb3-82f5-4b3d-a771-a01193bc207b_2400x2400.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9IbD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8951fb3-82f5-4b3d-a771-a01193bc207b_2400x2400.heic" width="1456" height="1456" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a8951fb3-82f5-4b3d-a771-a01193bc207b_2400x2400.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:377079,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.nsorros.com/i/198248286?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8951fb3-82f5-4b3d-a771-a01193bc207b_2400x2400.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9IbD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8951fb3-82f5-4b3d-a771-a01193bc207b_2400x2400.heic 424w, https://substackcdn.com/image/fetch/$s_!9IbD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8951fb3-82f5-4b3d-a771-a01193bc207b_2400x2400.heic 848w, https://substackcdn.com/image/fetch/$s_!9IbD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8951fb3-82f5-4b3d-a771-a01193bc207b_2400x2400.heic 1272w, https://substackcdn.com/image/fetch/$s_!9IbD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8951fb3-82f5-4b3d-a771-a01193bc207b_2400x2400.heic 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h3>The mental model is the product.</h3><p>When you program, a big part of the work isn&#8217;t typing code. It&#8217;s fitting what already exists, and what needs to exist, into your head, and coming up with a solution that delivers the result in a way that is clean, extensible, and consistent with what the owner of the ticket actually wants. The output you&#8217;re really producing is the mental model of the system. The code on the screen is a side effect of having that model in place. Asking an AI agent to do the work while under-specifying how or what needs to be done breaks that. The mental model isn&#8217;t getting built; the AI is filling in the gaps with its own assumptions about your intent. And that doesn&#8217;t speed you up, because you&#8217;ll eventually need to sync your mental model with the system anyway. 
You&#8217;ll still have to decipher what got built and confirm it matches what you wanted.</p><p>The difference is that now you do this iteratively. The AI produces something, you look at it, you realize you actually wanted X, not Y, you tell it to adjust, it produces a new version, and you refine again until you converge. Every individual step feels fast. But are you faster, or are you just moving fast down a longer road?</p><p>This isn&#8217;t speculation. Researchers and engineers building with AI are converging on the same observation and proposing different responses: spec-driven development, agent scaffolding, treating the LLM as a junior engineer who needs a tight brief. What matters in practice is being able to tell whether you&#8217;re in this situation, and to mitigate it when you are. There&#8217;s no clean, well-proven solution yet, so here are some thoughts and observations from my own coding journey with AI.</p><h3>How can we measure this?</h3><p>The question is whether we are under-specifying and paying the price by rewriting too much of our code too often. At the same time, we want to know whether some metric of business value or product velocity has meaningfully ticked upwards. Here is one way to quantify both (by no means the only or the right one):</p><ol><li><p><strong>Code half-life</strong>. Of the code you wrote N weeks ago, what percentage is still alive in the codebase today? A high half-life means the code you ship sticks. A collapsing half-life means you&#8217;re rewriting too often.</p></li><li><p><strong>Adjustment rate</strong> in your AI sessions. What fraction of your AI coding sessions are refinements of prior AI output (e.g. &#8220;remove what you added,&#8221; &#8220;different approach,&#8221; &#8220;also do X&#8221;) rather than brand-new scope or status checks? A pre-AI analog exists too: PR descriptions and commit messages with <code>fix:</code> / <code>refactor:</code> / <code>revert:</code> prefixes.</p></li><li><p><strong>Story points (externally validated)</strong>. 
How many units of real value are landing per unit of time? If you have a PM, a customer, or any external system that quantifies what was actually delivered, that gives you a direct read on product velocity or business value. This one is also harder to fake than the other two.</p></li></ol><p>Taken together, the three are one way of measuring whether you are moving faster in a straight line or running in circles to reach the same destination. Writing less code raises half-life but tanks tasks shipped. Accepting whatever the AI produces lowers adjustment rate but tanks half-life. Only &#8220;specify up front, ship deliberately&#8221; pulls all three in the right direction at once. Here is how those metrics look for two of my projects: Project A is a pre-AI codebase, while Project B is post-AI.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!J1DW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F213c0105-3f27-4e38-aaf9-13428b86ce67_2400x2400.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!J1DW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F213c0105-3f27-4e38-aaf9-13428b86ce67_2400x2400.heic 424w, https://substackcdn.com/image/fetch/$s_!J1DW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F213c0105-3f27-4e38-aaf9-13428b86ce67_2400x2400.heic 848w, https://substackcdn.com/image/fetch/$s_!J1DW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F213c0105-3f27-4e38-aaf9-13428b86ce67_2400x2400.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!J1DW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F213c0105-3f27-4e38-aaf9-13428b86ce67_2400x2400.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!J1DW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F213c0105-3f27-4e38-aaf9-13428b86ce67_2400x2400.heic" width="1456" height="1456" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/213c0105-3f27-4e38-aaf9-13428b86ce67_2400x2400.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:359667,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.nsorros.com/i/198248286?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F213c0105-3f27-4e38-aaf9-13428b86ce67_2400x2400.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!J1DW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F213c0105-3f27-4e38-aaf9-13428b86ce67_2400x2400.heic 424w, https://substackcdn.com/image/fetch/$s_!J1DW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F213c0105-3f27-4e38-aaf9-13428b86ce67_2400x2400.heic 848w, 
https://substackcdn.com/image/fetch/$s_!J1DW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F213c0105-3f27-4e38-aaf9-13428b86ce67_2400x2400.heic 1272w, https://substackcdn.com/image/fetch/$s_!J1DW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F213c0105-3f27-4e38-aaf9-13428b86ce67_2400x2400.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Throughput is way up. Unambiguously, 3&#8211;4&#215; more work ships per month, with similar effort per task. The rabbit is faster. 
But durability collapsed. Code that should have settled by month four has been rewritten until only 11% of it survives. And about a quarter of my prompts to Claude are after-the-fact refinements.</p><h3>So how do you make the rabbit win?</h3><p>What my data and others&#8217; suggest is that you should treat speed of generation as a tax base, not a finish line. The tax you pay is in churn and rewrites. The way to lower it is upstream: spend more time building a mental model first, so that what the AI produces doesn&#8217;t need ten passes of refinement.</p><p>The turtle isn&#8217;t slow because it types slowly. It&#8217;s slow because it thinks before it moves. The rabbit can move much faster, but it only wins the race if it spends enough of its speed advantage on knowing where it&#8217;s going.</p><p>Three signals to start with: code half-life, adjustment rate, and externally validated points shipped. Measure them, watch the trend, and see whether your road is getting straighter or just longer.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mEyl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0127c353-4e9e-4301-9ccf-095a40e4fa6a_2400x800.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mEyl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0127c353-4e9e-4301-9ccf-095a40e4fa6a_2400x800.heic 424w, https://substackcdn.com/image/fetch/$s_!mEyl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0127c353-4e9e-4301-9ccf-095a40e4fa6a_2400x800.heic 848w, 
https://substackcdn.com/image/fetch/$s_!mEyl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0127c353-4e9e-4301-9ccf-095a40e4fa6a_2400x800.heic 1272w, https://substackcdn.com/image/fetch/$s_!mEyl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0127c353-4e9e-4301-9ccf-095a40e4fa6a_2400x800.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mEyl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0127c353-4e9e-4301-9ccf-095a40e4fa6a_2400x800.heic" width="1456" height="485" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0127c353-4e9e-4301-9ccf-095a40e4fa6a_2400x800.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:485,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:130158,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.nsorros.com/i/198248286?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0127c353-4e9e-4301-9ccf-095a40e4fa6a_2400x800.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mEyl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0127c353-4e9e-4301-9ccf-095a40e4fa6a_2400x800.heic 424w, 
https://substackcdn.com/image/fetch/$s_!mEyl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0127c353-4e9e-4301-9ccf-095a40e4fa6a_2400x800.heic 848w, https://substackcdn.com/image/fetch/$s_!mEyl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0127c353-4e9e-4301-9ccf-095a40e4fa6a_2400x800.heic 1272w, https://substackcdn.com/image/fetch/$s_!mEyl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0127c353-4e9e-4301-9ccf-095a40e4fa6a_2400x800.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<ul><li><p><strong>Pay attention</strong>: OpenAI deprecates fine-tuning APIs. Fine-tuning was an important lever for improving models until very recently, so this marks the end of an &#8220;era&#8221; in some sense.</p></li><li><p><strong>Pay attention</strong>: Anthropic meters programmatic Claude usage. Anthropic and other labs have been subsidising subscription usage, but that looks like it may be coming to an end, so it is worth planning for a future where token pricing and subscription pricing are closer together.</p></li><li><p><strong>Pay attention</strong>: METR - speed up, value up&#8230; less. Yet another study showing the rabbit and turtle phenomenon: speeding up code development does not necessarily translate into real productivity gains.</p></li><li><p><strong>Skip</strong>: DeepMind AI Co-Mathematician - I would not pay too much attention to harness and manual gains on top of the latest models; they may become irrelevant as soon as the next version launches.</p></li><li><p><strong>Skip</strong>: Interaction Models (Thinking Machines Lab) - Super interesting work, but its real-world use cases are questionable, or at least few.</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Embeddings Are Not A Search Strategy]]></title><description><![CDATA[Improving your product search using AI]]></description><link>https://newsletter.nsorros.com/p/embeddings-are-not-a-search-strategy</link><guid isPermaLink="false">https://newsletter.nsorros.com/p/embeddings-are-not-a-search-strategy</guid><dc:creator><![CDATA[Nick Sorros]]></dc:creator><pubDate>Mon, 11 May 2026 12:47:15 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!mJLZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76ced3c8-4383-4ce3-b246-89aa479fb4de_2400x2400.heic" length="0" 
type="image/jpeg"/><content:encoded><![CDATA[<p>The first thing most people reach for when adding AI to search is embeddings, a semantic representation that promises to retrieve matches a keyword search would miss. That promise is real, but harder to deliver on than it sounds. The reason is that embeddings are mostly trained and benchmarked on document-like data, while many real search systems are not searching over documents. They are searching over structured or semi-structured records: products, suppliers, grants, legal matters, tickets, assets, contracts, or clinical trials.</p><p>Think of an Amazon-like catalogue. A product is not just a paragraph. It has a title, brand, category, price, colour, size, availability, reviews, tags, seller information, compatibility fields, and a free-text description. If you flatten all of that into one text blob and embed it, you may erase the structure that search actually depends on.</p><h2>The Issue</h2><p>This is why embeddings and &#8220;semantic search&#8221; may underdeliver. 
The embedding may understand the general meaning of the product description, but fail to preserve distinctions that matter: category, compatibility, size, colour, availability, jurisdiction, date, or whether a field is a hard requirement rather than background context.</p><p>There is an easy way to diagnose the issue. Take items you know are similar and confirm their embeddings are close. Take items you know are different and confirm they are far apart. Look at whether similarity scores are meaningfully spread, or whether everything clusters in a narrow band. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mJLZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76ced3c8-4383-4ce3-b246-89aa479fb4de_2400x2400.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mJLZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76ced3c8-4383-4ce3-b246-89aa479fb4de_2400x2400.heic 424w, https://substackcdn.com/image/fetch/$s_!mJLZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76ced3c8-4383-4ce3-b246-89aa479fb4de_2400x2400.heic 848w, https://substackcdn.com/image/fetch/$s_!mJLZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76ced3c8-4383-4ce3-b246-89aa479fb4de_2400x2400.heic 1272w, https://substackcdn.com/image/fetch/$s_!mJLZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76ced3c8-4383-4ce3-b246-89aa479fb4de_2400x2400.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!mJLZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76ced3c8-4383-4ce3-b246-89aa479fb4de_2400x2400.heic" width="412" height="412" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/76ced3c8-4383-4ce3-b246-89aa479fb4de_2400x2400.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:412,&quot;bytes&quot;:232126,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.nsorros.com/i/196879610?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76ced3c8-4383-4ce3-b246-89aa479fb4de_2400x2400.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mJLZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76ced3c8-4383-4ce3-b246-89aa479fb4de_2400x2400.heic 424w, https://substackcdn.com/image/fetch/$s_!mJLZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76ced3c8-4383-4ce3-b246-89aa479fb4de_2400x2400.heic 848w, https://substackcdn.com/image/fetch/$s_!mJLZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76ced3c8-4383-4ce3-b246-89aa479fb4de_2400x2400.heic 1272w, https://substackcdn.com/image/fetch/$s_!mJLZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76ced3c8-4383-4ce3-b246-89aa479fb4de_2400x2400.heic 
1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>When choosing an embedding model, the question is not only &#8220;is this model trained on my domain?&#8221; but also &#8220;was this model trained or benchmarked on data shaped like mine?&#8221; A retail embedding model may still be a poor fit for structured product records if it mostly learned document retrieval. 
A more general model tested on tables or semi-structured retrieval may be more relevant.</p><p>Benchmarks such as <a href="https://arxiv.org/abs/2404.13207">STaRK</a>, which evaluates retrieval over semi-structured knowledge bases, and <a href="https://target-benchmark.github.io/">TARGET</a>, which looks at table retrieval, are useful because they focus on format, not just domain. The lesson from this literature is clear: pure embeddings are often weak when the record structure matters. Hybrid approaches tend to do better.</p><p>Before looking at hybrid approaches in more detail, though, let&#8217;s take a step back and start from the underlying question: &#8220;what kinds of queries are users asking, and why do they fail today?&#8221;</p><h2>Query Types</h2><p>In many systems, queries fall into the following buckets.</p><ul><li><p><strong>keyword lookups</strong>. A user knows the exact product name, ID, model number, company, SKU, document, or phrase. These usually need strong lexical search, synonyms, analyzers, spelling tolerance, and sensible boosting.</p></li><li><p><strong>text-and-filter queries</strong>. The user writes natural language, but the query contains structured constraints: &#8220;wireless headphones under &#163;100 with noise cancelling,&#8221; &#8220;contracts from Germany after 2022,&#8221; &#8220;suppliers with ISO certification in Spain,&#8221; or &#8220;grants about youth mental health in Wales.&#8221; These should not be treated as one semantic blob. They should be decomposed into free text plus filters.</p></li><li><p><strong>semantic queries</strong>. The user describes an intent in language that may not appear in the indexed record: &#8220;something comfortable for long flights,&#8221; &#8220;a laptop for light video editing,&#8221; or &#8220;insurance policies that cover climate-related disruption.&#8221; These benefit from semantic expansion, dense retrieval, or both.</p></li><li><p><strong>assistant questions</strong>. 
&#8220;Which one should I buy?&#8221;, &#8220;Can I return this?&#8221;, &#8220;What is the policy?&#8221;, &#8220;Which supplier is safest?&#8221; These may not belong in search at all. Route them to a chat surface that has access to the same data, but treat it as a separate product.</p></li></ul><p>Each type requires a slightly different approach.</p><h2>A Better Search</h2><h3>Query decomposition</h3><p>For <strong>text-and-filter</strong> queries, the most useful move is query decomposition. A frontier model can turn a natural-language query into a text query, structured filters, and exclusions. For example, &#8220;wireless headphones under &#163;100, not refurbished, for running&#8221; contains a product category, a price constraint, an exclusion, and an intended use. A less capable model may extract the obvious terms and miss the negation; a frontier model can preserve the structure. This is powerful because it improves the existing search engine rather than replacing it. Most mature systems already have useful filtering, field boosting, and keyword matching. 
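</p><p>To make this concrete, here is a minimal sketch in Python. It assumes the frontier model has been prompted to return JSON with hypothetical fields <code>text</code>, <code>filters</code>, <code>max_price</code>, and <code>exclude</code>; the model call itself is left out. The sketch parses that JSON and maps it onto an Elasticsearch-style <code>bool</code> query, so the negation lands in <code>must_not</code> instead of being lost. The index fields <code>description</code>, <code>category</code>, and <code>price</code> are assumptions, not a prescribed schema.</p><pre><code class="language-python">import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DecomposedQuery:
    # Hypothetical schema the LLM is asked to fill in.
    text: str                                     # free-text part of the query
    filters: dict = field(default_factory=dict)   # exact-match constraints
    max_price: Optional[float] = None             # numeric upper bound
    exclude: list = field(default_factory=list)   # negated terms


def parse_llm_output(raw):
    """Turn the model's JSON answer into a DecomposedQuery."""
    data = json.loads(raw)
    return DecomposedQuery(
        text=data.get("text", ""),
        filters=data.get("filters", {}),
        max_price=data.get("max_price"),
        exclude=data.get("exclude", []),
    )


def to_bool_query(q):
    """Map a DecomposedQuery onto an Elasticsearch-style bool query body."""
    must = [{"match": {"description": q.text}}] if q.text else []
    filters = [{"term": {name: value}} for name, value in q.filters.items()]
    if q.max_price is not None:
        filters.append({"range": {"price": {"lte": q.max_price}}})
    must_not = [{"match": {"description": term}} for term in q.exclude]
    return {"query": {"bool": {"must": must, "filter": filters, "must_not": must_not}}}


# What the model might return for the headphones example above (made up):
raw = (
    '{"text": "wireless headphones running", '
    '"filters": {"category": "headphones"}, '
    '"max_price": 100, "exclude": ["refurbished"]}'
)
print(json.dumps(to_bool_query(parse_llm_output(raw)), indent=2))
</code></pre><p>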
The model&#8217;s job is to translate messy user language into the shape the search engine already understands.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!L1Dj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86ee999b-64e5-453d-ad80-d02dbd79ff7e_2400x2400.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!L1Dj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86ee999b-64e5-453d-ad80-d02dbd79ff7e_2400x2400.heic 424w, https://substackcdn.com/image/fetch/$s_!L1Dj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86ee999b-64e5-453d-ad80-d02dbd79ff7e_2400x2400.heic 848w, https://substackcdn.com/image/fetch/$s_!L1Dj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86ee999b-64e5-453d-ad80-d02dbd79ff7e_2400x2400.heic 1272w, https://substackcdn.com/image/fetch/$s_!L1Dj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86ee999b-64e5-453d-ad80-d02dbd79ff7e_2400x2400.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!L1Dj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86ee999b-64e5-453d-ad80-d02dbd79ff7e_2400x2400.heic" width="456" height="456" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/86ee999b-64e5-453d-ad80-d02dbd79ff7e_2400x2400.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:456,&quot;bytes&quot;:246952,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.nsorros.com/i/196879610?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86ee999b-64e5-453d-ad80-d02dbd79ff7e_2400x2400.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!L1Dj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86ee999b-64e5-453d-ad80-d02dbd79ff7e_2400x2400.heic 424w, https://substackcdn.com/image/fetch/$s_!L1Dj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86ee999b-64e5-453d-ad80-d02dbd79ff7e_2400x2400.heic 848w, https://substackcdn.com/image/fetch/$s_!L1Dj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86ee999b-64e5-453d-ad80-d02dbd79ff7e_2400x2400.heic 1272w, https://substackcdn.com/image/fetch/$s_!L1Dj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86ee999b-64e5-453d-ad80-d02dbd79ff7e_2400x2400.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>This pattern is supported by work on LLM-driven structured query extraction. Fine-tuned models can convert messy user input into a combination of semantic terms, numerical constraints, and categorical filters that an existing search engine already understands [<a href="https://arxiv.org/abs/2601.16492">see</a>]. The domain changes (e-commerce, enterprise search, internal tools), but the pattern is the same: do not throw away structure when the user is asking for structure.</p><h3>Query expansion</h3><p>For <strong>semantic queries</strong>, there is an easier step before even looking at embeddings: query expansion. Use a frontier model to generate semantically similar reformulations or a hypothetical result description, then retrieve against those expanded queries. 
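</p><p>A minimal sketch of retrieving against expanded queries, under loud assumptions: <code>expand_query</code> is a hypothetical stand-in for the frontier-model call (its reformulations are hard-coded), and <code>keyword_search</code> is a toy token-overlap retriever in place of a real engine. The sketch retrieves for every reformulation and takes the union, so items the original query never reaches can still enter the candidate set.</p><pre><code class="language-python">def expand_query(query):
    # Hypothetical stand-in for a frontier-model call that returns
    # reformulations of the query. Hard-coded here for illustration.
    return [
        query,
        "wireless sports earbuds secure ear-hook fit",
        "sweat-resistant earbuds ipx4",
    ]


def keyword_search(query, catalogue, k=10):
    # Toy lexical retriever: score = number of shared lowercase tokens.
    terms = set(query.lower().split())
    scored = []
    for pid, text in catalogue.items():
        overlap = len(terms.intersection(text.lower().split()))
        if overlap:
            scored.append((overlap, pid))
    scored.sort(reverse=True)
    return [pid for _, pid in scored[:k]]


def expanded_search(query, catalogue, k=10):
    # Retrieve for every reformulation and union the results, keeping
    # first-seen order. This widens recall; ranking is left to a later step.
    candidates = []
    for variant in expand_query(query):
        for pid in keyword_search(variant, catalogue, k):
            if pid not in candidates:
                candidates.append(pid)
    return candidates


catalogue = {
    "p1": "Wireless sports earbuds with secure ear-hook fit",
    "p2": "Over-ear studio headphones with wired connection",
    "p3": "Sweat-resistant earbuds rated IPX4",
}
query = "running headphones that don't fall out"
print(keyword_search(query, catalogue))   # only the studio headphones match
print(expanded_search(query, catalogue))  # the earbuds now enter the candidate set
</code></pre><p>The ordering is deliberately naive here: expansion decides what gets retrieved, and reordering those candidates is a separate job.</p><p>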
Take the query &#8220;running headphones that don&#8217;t fall out&#8221;. A short query like that has very few terms in common with a typical product description. A frontier model can expand it into &#8220;wireless sports earbuds with secure ear-hook fit, sweat-resistant, IPX4 or higher&#8221;, or even synthesise a hypothetical product description and embed that.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!z8gn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F769ef845-a425-4627-a70e-8d1465b27528_2400x2400.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!z8gn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F769ef845-a425-4627-a70e-8d1465b27528_2400x2400.heic 424w, https://substackcdn.com/image/fetch/$s_!z8gn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F769ef845-a425-4627-a70e-8d1465b27528_2400x2400.heic 848w, https://substackcdn.com/image/fetch/$s_!z8gn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F769ef845-a425-4627-a70e-8d1465b27528_2400x2400.heic 1272w, https://substackcdn.com/image/fetch/$s_!z8gn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F769ef845-a425-4627-a70e-8d1465b27528_2400x2400.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!z8gn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F769ef845-a425-4627-a70e-8d1465b27528_2400x2400.heic" width="498" height="498" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/769ef845-a425-4627-a70e-8d1465b27528_2400x2400.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:498,&quot;bytes&quot;:322636,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.nsorros.com/i/196879610?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F769ef845-a425-4627-a70e-8d1465b27528_2400x2400.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!z8gn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F769ef845-a425-4627-a70e-8d1465b27528_2400x2400.heic 424w, https://substackcdn.com/image/fetch/$s_!z8gn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F769ef845-a425-4627-a70e-8d1465b27528_2400x2400.heic 848w, https://substackcdn.com/image/fetch/$s_!z8gn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F769ef845-a425-4627-a70e-8d1465b27528_2400x2400.heic 1272w, https://substackcdn.com/image/fetch/$s_!z8gn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F769ef845-a425-4627-a70e-8d1465b27528_2400x2400.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>Work such as <a href="https://arxiv.org/abs/2303.07678">Query2doc</a> and <a href="https://arxiv.org/abs/2212.10496">HyDE</a> was among the first to explore this strategy, and it has stood the test of time. It helps by improving recall: it changes what enters the candidate set, and a reranker cannot recover a result that was never retrieved in the first place.</p><h3>Reranking</h3><p>After decomposition and expansion, <strong>reranking</strong> becomes the layer that fuses and corrects. Retrieve a wider candidate set from lexical search, filters, expansion, and possibly embeddings. Then use a reranker to reorder the candidates based on the full query intent. 
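</p><p>A minimal sketch of this retrieve-wide-then-rerank step. Reciprocal Rank Fusion (RRF), with the commonly used constant <code>k=60</code>, is swapped in here as one way to merge candidate lists from several retrievers; this post does not prescribe it. <code>toy_score</code> is a hypothetical placeholder for the reranker: in production a cross-encoder would score each (query, record) pair instead. The candidate lists and records are made up for illustration.</p><pre><code class="language-python">from collections import defaultdict


def rrf_fuse(ranked_lists, k=60):
    # Reciprocal Rank Fusion: each list adds 1 / (k + rank) per candidate,
    # so items ranked well by several retrievers rise to the top.
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


def toy_score(query, text):
    # Hypothetical placeholder for a cross-encoder: shared-token count.
    return len(set(query.lower().split()).intersection(text.lower().split()))


def rerank(query, candidates, records, score_fn, top_n=10):
    # Score every (query, record) pair and reorder; the stable sort
    # keeps the fused order for ties.
    scored = [(score_fn(query, records[c]), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_n]]


records = {
    "a": "wired studio headphones",
    "b": "wireless running earbuds",
    "c": "running socks",
    "d": "wireless earbuds ear-hook running",
}
lexical = ["a", "b", "c"]   # e.g. from keyword search plus filters
expanded = ["b", "d"]       # e.g. from expanded queries
candidates = rrf_fuse([lexical, expanded])
print(candidates)
print(rerank("wireless running earbuds", candidates, records, toy_score, top_n=3))
</code></pre><p>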
In production, dedicated cross-encoder rerankers are often a practical starting point because they are cheaper and faster than frontier models. Frontier models are still useful for prototyping, calibration, and generating training data.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Q2MB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb850756e-ad2b-40ba-b7d8-2cf4620f1220_2400x2400.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Q2MB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb850756e-ad2b-40ba-b7d8-2cf4620f1220_2400x2400.heic 424w, https://substackcdn.com/image/fetch/$s_!Q2MB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb850756e-ad2b-40ba-b7d8-2cf4620f1220_2400x2400.heic 848w, https://substackcdn.com/image/fetch/$s_!Q2MB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb850756e-ad2b-40ba-b7d8-2cf4620f1220_2400x2400.heic 1272w, https://substackcdn.com/image/fetch/$s_!Q2MB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb850756e-ad2b-40ba-b7d8-2cf4620f1220_2400x2400.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Q2MB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb850756e-ad2b-40ba-b7d8-2cf4620f1220_2400x2400.heic" width="485" height="485" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b850756e-ad2b-40ba-b7d8-2cf4620f1220_2400x2400.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:485,&quot;bytes&quot;:328407,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.nsorros.com/i/196879610?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb850756e-ad2b-40ba-b7d8-2cf4620f1220_2400x2400.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Q2MB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb850756e-ad2b-40ba-b7d8-2cf4620f1220_2400x2400.heic 424w, https://substackcdn.com/image/fetch/$s_!Q2MB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb850756e-ad2b-40ba-b7d8-2cf4620f1220_2400x2400.heic 848w, https://substackcdn.com/image/fetch/$s_!Q2MB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb850756e-ad2b-40ba-b7d8-2cf4620f1220_2400x2400.heic 1272w, https://substackcdn.com/image/fetch/$s_!Q2MB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb850756e-ad2b-40ba-b7d8-2cf4620f1220_2400x2400.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p><a href="https://arxiv.org/abs/2403.10407">Research</a> comparing the two approaches finds cross-encoders &#8220;very competitive&#8221; with the latest LLMs as rerankers across standard IR benchmarks, while being &#8220;way more efficient&#8221;. That makes them the right first production choice. <a href="https://arxiv.org/abs/2304.09542">Frontier models</a> earn their keep elsewhere: generating training data and listwise judgments that can later be distilled into a custom reranker.</p><h3>AI judge</h3><p>But none of this matters if you cannot measure whether search improved. This is where AI judges are useful, with one important caveat: do not ask a model for a vague relevance score. Decompose relevance into criteria. 
For a structured catalogue, criteria might be: does the product category match, do the key attributes match, does it satisfy the query&#8217;s hard constraints, and how specific is the result? A result that fails a hard constraint should not score highly just because it is semantically related.</p><p>The judge should see structured fields, not a dumped text blob. It should explain each criterion before scoring. And it should be calibrated against human judgments. The LLM-as-judge literature increasingly supports this direction: <a href="https://arxiv.org/pdf/2507.09488">criteria-decomposed rubrics</a>, few-shot examples, <a href="https://arxiv.org/abs/2406.06519">compact grading scales</a>, and periodic human spot checks.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wS26!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94b4088d-6515-426d-9a77-250300dfb96d_2400x2400.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wS26!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94b4088d-6515-426d-9a77-250300dfb96d_2400x2400.heic 424w, https://substackcdn.com/image/fetch/$s_!wS26!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94b4088d-6515-426d-9a77-250300dfb96d_2400x2400.heic 848w, https://substackcdn.com/image/fetch/$s_!wS26!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94b4088d-6515-426d-9a77-250300dfb96d_2400x2400.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!wS26!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94b4088d-6515-426d-9a77-250300dfb96d_2400x2400.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wS26!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94b4088d-6515-426d-9a77-250300dfb96d_2400x2400.heic" width="482" height="482" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/94b4088d-6515-426d-9a77-250300dfb96d_2400x2400.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:482,&quot;bytes&quot;:287487,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.nsorros.com/i/196879610?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94b4088d-6515-426d-9a77-250300dfb96d_2400x2400.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wS26!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94b4088d-6515-426d-9a77-250300dfb96d_2400x2400.heic 424w, https://substackcdn.com/image/fetch/$s_!wS26!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94b4088d-6515-426d-9a77-250300dfb96d_2400x2400.heic 848w, 
https://substackcdn.com/image/fetch/$s_!wS26!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94b4088d-6515-426d-9a77-250300dfb96d_2400x2400.heic 1272w, https://substackcdn.com/image/fetch/$s_!wS26!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94b4088d-6515-426d-9a77-250300dfb96d_2400x2400.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>Then compare systems with standard search metrics: nDCG@10 for ranking quality, relevant@10 for how many useful results appear in the top ten, and a gate-failure rate for how often results fail a hard constraint. Keep a frozen baseline so you know whether a new version improved search or merely changed it.</p><h2>Conclusion</h2><p>Adding AI to search sounds simple, but several moving parts have to be right: which embeddings to use, or whether to use embeddings at all; what your users&#8217; queries actually look like; and which method fits the problem. That is before you get to fusing results and judging improvements at scale.</p><h1>What To Pay Attention To</h1><ul><li><p><strong>Pay attention</strong>: Mythos + Opus 4.7 &#8212; Mythos looks real after Mozilla used it to harden Firefox at scale, while Opus 4.7 is the deployable jump in Claude capabilities that matters because many teams already use Claude and Claude Code for serious work.</p></li><li><p><strong>Skip</strong>: GPT-5.5 &#8212; This feels like the expected OpenAI move: matching the frontier on agentic coding and knowledge work, but not changing the story enough to dwell on.</p></li><li><p><strong>Skip</strong>: Kimi K2.6 + DeepSeek V4 &#8212; These releases keep narrowing the open-model gap, especially on cost, but the evidence still points to them trailing the newest frontier models on the hardest real-world agentic work.</p></li><li><p><strong>Skip</strong>: Tokenmaxxing &#8212; Token consumption is a bad proxy for productivity, though it is worth watching because it may distort software engineering costs, team metrics, and hiring expectations.</p></li><li><p><strong>Pay attention</strong>: RIP PR / new coding practices &#8212; The important shift is that software work is moving upstream from writing and reviewing diffs to describing intent, setting architecture, defining checks, and supervising AI write-review loops.</p></li></ul>
]]></content:encoded></item><item><title><![CDATA[An AI newsletter from Nick Sorros]]></title><description><![CDATA[Written by a human. About AI. Not the other way around.]]></description><link>https://newsletter.nsorros.com/p/an-ai-newsletter-from-nick-sorros</link><guid isPermaLink="false">https://newsletter.nsorros.com/p/an-ai-newsletter-from-nick-sorros</guid><dc:creator><![CDATA[Nick Sorros]]></dc:creator><pubDate>Wed, 06 May 2026 08:55:08 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!sfS3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb09ff936-be77-45d1-b22e-8488f51119cd_1024x1024.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sfS3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb09ff936-be77-45d1-b22e-8488f51119cd_1024x1024.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!sfS3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb09ff936-be77-45d1-b22e-8488f51119cd_1024x1024.heic 424w, https://substackcdn.com/image/fetch/$s_!sfS3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb09ff936-be77-45d1-b22e-8488f51119cd_1024x1024.heic 848w, https://substackcdn.com/image/fetch/$s_!sfS3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb09ff936-be77-45d1-b22e-8488f51119cd_1024x1024.heic 1272w, https://substackcdn.com/image/fetch/$s_!sfS3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb09ff936-be77-45d1-b22e-8488f51119cd_1024x1024.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sfS3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb09ff936-be77-45d1-b22e-8488f51119cd_1024x1024.heic" width="324" height="324" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b09ff936-be77-45d1-b22e-8488f51119cd_1024x1024.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:324,&quot;bytes&quot;:26918,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" 
srcset="https://substackcdn.com/image/fetch/$s_!sfS3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb09ff936-be77-45d1-b22e-8488f51119cd_1024x1024.heic 424w, https://substackcdn.com/image/fetch/$s_!sfS3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb09ff936-be77-45d1-b22e-8488f51119cd_1024x1024.heic 848w, https://substackcdn.com/image/fetch/$s_!sfS3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb09ff936-be77-45d1-b22e-8488f51119cd_1024x1024.heic 1272w, https://substackcdn.com/image/fetch/$s_!sfS3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb09ff936-be77-45d1-b22e-8488f51119cd_1024x1024.heic 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<p>This newsletter is for tech leaders under pressure to ship AI features while the ground keeps shifting beneath them. For over 12 years, and most recently through my own AI consultancy, I&#8217;ve helped organisations of all sizes make better use of AI. The goal here is simple: help you make sharper, more grounded decisions about building around AI.</p><p>Each week, one piece on a decision you&#8217;re facing. It might be architectural. How do you put an agentic layer on top of your product? It might be strategic. Train around today&#8217;s model limits, or wait for the next iteration? Plus a short read on the week&#8217;s AI news. What&#8217;s worth your attention. What&#8217;s noise.</p>]]></content:encoded></item></channel></rss>