How an AI agency should describe what it builds you — and why most still can’t.
by Ygor Fonseca, Founder & Systems Lead
Two services firms walk a head of growth through the same engagement. Both pitch a six-month build, a senior lead on point, a junior on production, an AI system in the loop. Both will publish to her brand's social and email queues. Both invoice within a few hundred dollars of each other.
The first firm describes the work using words she has heard for fifteen years — now with AI added to the front of each one. AI-powered media. AI-enabled creative. AI-driven analytics. The deck is well-designed. By the end of the call she has a feeling about the team but no structural picture of what will sit in her stack on day ninety.
The second firm describes the work using six words she has never heard a services firm use. Prompts. Context. Tools. Examples. Memory. Instructions. Then it draws a small diagram with two boxes — one labeled workflow, one labeled agent — and explains which parts of the engagement go in which box and why. By the end of the call she can compare this proposal to the first one on six distinct axes she did not have a name for an hour ago. She also knows which questions she did not get to ask the first firm.
This post is about why the second firm wins audits that look like coin flips on the deck, and why the language is not a sales trick. The engineering vocabulary for AI-native services delivery has existed in public for more than a year. It came from inside the labs and from the AI research community, not from marketing departments. Most services firms still describe AI as a capability bolted onto existing services — media, creative, data, content — rather than as the substrate of how the work is actually built. The literal words vary. The framing pattern is the same. The buyer-side reason this matters is more important than the sales-side reason: when AI is described as a feature added to services a firm was already selling, the buyer has no way to audit how AI sits in the work, or to compare two firms on the same terms.
The vocabulary exists. It came from the engineers, not the agencies.
Andrej Karpathy — the AI researcher, former Tesla AI Director, OpenAI founding team member — gave the cleanest version of it at the Sequoia AI Ascent in April 2026. His framing is that programming with an AI system is built from six distinct primitives: prompts, context, tools, examples, memory, and instructions. He calls this Software 3.0, sitting in lineage with the older Software 1.0 (traditional code) and Software 2.0 (model weights).
He is one source. Anthropic, the AI lab that publishes the model many AI-native firms run their delivery on, published a companion engineering reference called Building Effective Agents. It does the other half of the job. Where he named the primitives, the lab named the two shapes an AI-native system actually takes in production: workflows and agents.
A workflow, in the lab's published definition, is a system where the AI and the tools around it move through predefined steps. Step one calls the model with one prompt and writes the output to a file. Step two reads that file and calls the model again. The route is fixed before the work runs. An agent, by contrast, is a system where the AI itself decides which tools to call, in what order, and when to stop. The route is decided live, by the model, based on what comes back from each step.
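To make the two shapes concrete, here is a minimal sketch in Python. Everything named in it is an assumption for illustration: `stub_model`, `scripted_choice`, and the two entries in `TOOLS` are hypothetical stand-ins, not any lab's API. A real agent would let the model pick the next action where `scripted_choice` follows a hard-coded plan.

```python
from typing import Callable, Dict, List, Optional


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call."""
    return f"<output for: {prompt}>"


def run_workflow(topic: str, model: Callable[[str], str] = stub_model) -> List[str]:
    """Workflow: the steps and their order are fixed before anything runs."""
    outline = model(f"outline {topic}")          # step one
    draft = model(f"draft based on {outline}")   # step two reads step one's output
    return [outline, draft]


def run_agent(
    goal: str,
    choose_next: Callable[[str, List[str]], Optional[str]],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> List[str]:
    """Agent: `choose_next` (standing in for the model) picks each tool, or stops."""
    history: List[str] = []
    for _ in range(max_steps):
        action = choose_next(goal, history)
        if action is None:  # the decider says the work is done
            break
        history.append(tools[action](goal))
    return history


def scripted_choice(goal: str, history: List[str]) -> Optional[str]:
    """Hypothetical decider: look something up, summarize it, then stop."""
    plan = ["lookup", "summarize"]
    return plan[len(history)] if len(history) < len(plan) else None


# Hypothetical tools; a real install would wire these to actual systems.
TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup": lambda goal: f"data for {goal}",
    "summarize": lambda goal: f"summary of {goal}",
}
```

Calling `run_workflow("spring launch")` always runs the same two steps in the same order. Calling `run_agent("sales dip", scripted_choice, TOOLS)` runs whatever the decider returns until it returns `None` or `max_steps` is reached, which is where the less predictable failure surface comes from.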
The distinction matters because it tells the buyer where the failure surface is. Workflows fail in predictable ways at predictable steps — you can put a check between step two and step three and catch the bad output. Agents fail in less predictable ways, because the next step is not known in advance, and the cost of a bad decision compounds across the loop. The lab's published posture, after working with named buyers like Coinbase, Intercom, and Thomson Reuters, is that you should use the simplest workflow that solves the problem and reach for an agent only when the work cannot be pre-specified. Most production AI work, in their published view, is workflows — the dynamism an agent adds is expensive, and most tasks don't need it.
Karpathy and the lab together gave the AI-native services tier two stable, public, engineering-grade dimensions: the six primitives a system is built from, and the two shapes the system can take. Most services firms still describe the work the way agencies described it in 2015 — media, creative, data, content as the service spine — with AI named as a capability that sits on top. The buyer has no way to tell, from that framing alone, whether there is a workflow underneath or a chat window and a hope.
How we use the vocabulary inside a brand-install
We had the diagnosis right before we had the engineering vocabulary. The per-engagement deliverable in Codified Engagement is a brand-install: an opinionated, machine-readable bundle that says here is what this brand is trying to do, here are the rules the AI must follow, here are the tasks the AI knows how to handle, here is what we have learned, and here is what the AI cannot do without a human approving it first. The engineering vocabulary lets us describe that bundle precisely in the buyer's first conversation.
Inside the brand-install, the six primitives map cleanly to what the build actually contains. The prompts are the lead's questions when she sits down to do the work. The context is the working hypothesis at the top of the install: what this brand is trying to do, why now, what's blocking it. The tools are the systems the AI is allowed to read from and write to during the work: the brand's data warehouse, its content management system, its ad accounts. The examples are past tasks that landed, kept in the instructions files so the AI can pattern-match what good looks like. The memory is what the team has learned about this brand that nobody on the team would otherwise remember a year from now. The instructions are the locked rules: brand voice, regulatory constraints, the founder's standing positions, off-limits topics. Six primitives. One file (and its directory) per brand we serve.
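As a rough illustration of that mapping, the six primitives can be written down as one record per brand. This is a sketch under assumptions, not the actual install format: the class name `BrandInstall`, the helper `missing_primitives`, and all the sample values are hypothetical. Only the six field names come from the text.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class BrandInstall:
    """Hypothetical record with one field per primitive."""
    prompts: List[str] = field(default_factory=list)       # the lead's questions
    context: str = ""                                      # the working hypothesis
    tools: List[str] = field(default_factory=list)         # systems the AI may read or write
    examples: List[str] = field(default_factory=list)      # past tasks that landed
    memory: List[str] = field(default_factory=list)        # what the team has learned
    instructions: List[str] = field(default_factory=list)  # the locked rules


def missing_primitives(install: BrandInstall) -> List[str]:
    """Return the names of primitives that are still empty, in field order."""
    return [f.name for f in fields(install) if not getattr(install, f.name)]


# Made-up sample values, for illustration only.
demo = BrandInstall(
    prompts=["What changed in email performance this week?"],
    context="Grow repeat purchases before the holiday season.",
    tools=["data warehouse", "content management system", "ad accounts"],
    examples=["Launch email that landed"],
    instructions=["Brand voice rules", "Off-limits topics"],
)
```

Here `missing_primitives(demo)` reports that `memory` has not been filled in yet, which is the kind of gap a buyer can ask about directly.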
The workflow-versus-agent axis maps onto the install too. Most of the recurring work in a six-month engagement runs as workflow. A weekly content production pass is a workflow: research step, outline step, draft step, brand-voice review step, schedule step, publish step. Each step has its own instructions file and its own check between it and the next. We know what comes out the other end because the route is fixed.
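The fixed route with a check after each step can be sketched as a small pipeline. The step names below come from the weekly content pass described above. Everything else is an assumption for illustration: `make_step`, `run_pipeline`, and `StepCheckFailed` are hypothetical names, and each step body is a stub that tags the text instead of calling a model.

```python
from typing import Callable, List, Tuple

# A step is (name, run, check): `run` transforms the payload, `check` gates it.
Step = Tuple[str, Callable[[str], str], Callable[[str], bool]]


class StepCheckFailed(Exception):
    """Raised with the name of the step whose output failed its check."""


def make_step(name: str) -> Step:
    """Hypothetical stub step: appends its name; the check rejects empty output."""
    return (
        name,
        lambda text, n=name: f"{text} > {n}",
        lambda text: bool(text.strip()),
    )


WEEKLY_CONTENT_PASS: List[Step] = [
    make_step(n)
    for n in ("research", "outline", "draft", "brand-voice review", "schedule", "publish")
]


def run_pipeline(steps: List[Step], payload: str) -> str:
    """Run steps in their fixed order; stop at the first failed check."""
    for name, run, check in steps:
        payload = run(payload)
        if not check(payload):
            raise StepCheckFailed(name)
    return payload
```

Because the route is fixed, a failure always names the step it happened at: a step that returns empty text raises `StepCheckFailed` with that step's name, and nothing after it runs.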
A small number of surfaces inside the install run as agents. Anomaly response is one: when something in the brand's performance data breaks from the way it usually moves, we don't know in advance which question the AI should ask first or where the chain of follow-ups will end. The work cannot be pre-specified. So it runs as an agent, with a kill switch: a hard limit, set in the install, that stops the AI from acting without a human approving the next step. The kill switch is non-negotiable in any surface where the AI is operating with agent-style dynamism. The lab's published rule (simplest workflow first; agent only where the dynamism is required; human approval gate on the decisions that would be expensive to get wrong) is also our default.
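One way to picture a kill switch of this kind is an agent loop that halts before any gated action unless a human approves it. This is a minimal sketch under assumptions, not the install's actual mechanism: `run_gated_agent`, `scripted_proposals`, the `approve` callback, and the action names are all hypothetical.

```python
from typing import Callable, List, Optional, Set


def run_gated_agent(
    propose: Callable[[List[str]], Optional[str]],
    approve: Callable[[str], bool],
    gated_actions: Set[str],
    max_steps: int = 10,
) -> List[str]:
    """Agent loop with a kill switch: gated actions need human approval."""
    log: List[str] = []
    for _ in range(max_steps):  # hard cap on the number of steps
        action = propose(log)
        if action is None:  # the agent decides it is finished
            break
        if action in gated_actions and not approve(action):
            log.append(f"halted before: {action}")
            break  # kill switch: stop the loop, do not act
        log.append(f"did: {action}")
    return log


def scripted_proposals(plan: List[str]) -> Callable[[List[str]], Optional[str]]:
    """Hypothetical stand-in for the model: proposes actions from a fixed plan."""
    def propose(log: List[str]) -> Optional[str]:
        return plan[len(log)] if len(log) < len(plan) else None
    return propose


# Made-up anomaly-response plan, for illustration only.
PLAN = ["query warehouse", "compare to last week", "pause ad campaign"]
GATED = {"pause ad campaign"}
```

With `approve=lambda action: False`, the loop performs the two read-only actions and halts before pausing the campaign. With approval granted, it runs all three. The design choice is that the gate sits inside the loop, so the agent cannot reach an expensive action by any route that skips it.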
Six primitives. Workflow or agent. Where the kill switch sits. That is the vocabulary the lead on the engagement uses in the first audit conversation with a head of growth. She can compare what we ship to what any other AI-services firm pitches, on the same terms, in the same call. She will know, by the end of the hour, whether what the other firm pitches is a workflow with seven named steps and approval gates, an agent with a real kill switch, or a chat window with a brand voice prompt and neither.
Why the vocabulary is the buyer's flashlight
The first firm in the opening paragraph, the one pitching AI as a feature added on top of media, creative, or content, has told the buyer something a 2015 agency would have said about its quarterly deliverable, just with AI added to the marketing copy. The work the firm describes hasn't structurally changed; the marketing layer has. A buyer leaves the call with a sense of the team and the kind of work they ship, but without the language to ask precise follow-up questions about how AI actually sits inside the work.
The second firm — the one using the engineering vocabulary — has told the buyer something different. A workflow is a fixed route with named steps and a check between any two of them. An agent is dynamic direction with a kill switch on the calls where the dynamism is expensive to get wrong. Six primitives map to six engineering surfaces. A buyer leaves the call with the language to audit any part of the proposal, and to ask the same audit questions of any other firm pitching against it.
One distinction is worth drawing inside this comparison. An analyst making a substrate claim (that AI should be foundational to a company, not adjacent) is making a posture-level claim. Diana Hu's framing of AI as the operating system a company runs on is exactly that move at the company-architecture level. The posture and the engineering precision underneath are different things. A firm can hold the substrate posture in its marketing and pitch the work as a bolt-on. Another firm can use plain operational language in its marketing and have substrate-level engineering underneath. The vocabulary on the deck signals one; the artifact a buyer can point at signals the other.
There is a real carveout: vocabulary is not methodology. A services firm can use the six primitives and the lab's two shapes and still ship a brand-install that runs as a chat window with a system prompt and no kill switch. The vocabulary is necessary; it is not sufficient. The audit conversation has to go past the language and into the artifact: show me the working hypothesis, show me the instructions index, show me the kill switch. We've sat through enough audits to know which way most firms break when the conversation goes there.
But the choice to leave the engineering vocabulary off the deck is a signal worth following up on. The cost of using the right language is essentially zero: the six primitives are searchable in fifteen minutes, and the lab's workflow-versus-agent distinction is on its public engineering blog. A firm that doesn't use the language is making a choice. Some firms keep the marketing layer compatible with what buyers have heard for fifteen years even though the engineering is real underneath. Others have less underneath the deck than the deck suggests. The vocabulary alone won't tell the buyer which; the audit conversation does.
The head of growth from the opening paragraph asked both firms the same follow-up question after the calls: "Show me the workflow for the weekly content production pass and the kill switch on anomaly response." The vocabulary the second firm used made that question askable in the first place. Whether either firm can answer it is what the audit actually tests; the deck only signals which firm is already set up for the conversation.
Karpathy closed his April 2026 talk with a single line that has stuck around the AI research community since: "You can outsource thinking, but you can't outsource understanding." The engineering vocabulary is one tool the buyer has to keep the understanding while the work proceeds. When she uses the words workflow and agent and primitive in her own follow-up questions, she's doing thinking that any firm — first or second — will need to respond to. The vocabulary doesn't decide the audit; it makes the audit possible. Some firms welcome the questions and have answers ready. Others don't have language to meet them. The buyer learns which is which by asking.
If you are evaluating AI-services firms right now and the language on the deck doesn't tell you which of your work will run as workflow, which as agent, and where the kill switch sits, ask. The answer — or the absence of one — is most of what you needed to know.