Why AI-native CRO runs experiments on intent, not on pages — and what the research loop looks like inside a sprint.
by Luis Gomes, Founder & Growth Lead
A team runs a four-week CRO sprint. Week one, brainstorm. Week two, design and build the variants. Week three, push them live, wait for traffic to clear the noise. Week four, debrief — what won, what to ship, what to keep, what to kill. The sprint closes. The next one opens.
Now run the same sprint on two different teams. Team A spends week one in customer interviews, scroll-recording reviews, and a careful read of three months of "how did you hear about us?" responses on the inbound form. They write down one hypothesis the rest of the sprint will test. Team B spends week one staring at the homepage and the product page, generates twelve variant ideas off the top of their heads, and picks four that "feel like good tests." Both teams run the same week-two-through-four motion. Both teams ship at the same speed. Three months later, team A's tests have all moved a primary number; team B's tests have all "learned something," but the dashboard hasn't moved.
The thing team A had was not a better tool. It was a layer of work above the variant. Read the public framing on most experimentation platforms and you will not find that layer named — the unit of work on those sites is the page, the element, the variant, the experience. The cleanest counter-naming this year came from a small UK CRO product firm called Made With Intent, whose founder, David Mannheim, posted the framing at Speero's Circus 2026 conference in London earlier this month: Designing experiments around intent, not pages. That single substitution — intent for pages — is what the rest of this piece is about.
What "intent as the unit of work" actually means
Intent is what the visitor is trying to do. It is upstream of the page they happened to land on, the device they used, the channel they came from, and the variant the experimentation platform decided to show them. The page is the thing they encounter; intent is the thing they bring.
When the unit of experimentation is the page, the question the sprint answers is: between version A and version B of this layout, which converts higher? The answer can be statistically clean and operationally useless if the team didn't yet know which intents the page is being asked to serve. A product detail page that is being read as a subscription by half its visitors and as a one-time purchase by the other half has two arguments to win. Testing copy variants on the page without resolving the split is how teams accumulate tests without accumulating learnings.
When the unit is intent, the sprint asks the upstream question first: which intents are arriving at this surface, in what mix, and which one is the bottleneck? The variant comes second. Often the variant turns out to be simpler than the team expected, because intent-clarity collapses the search space of things worth testing.
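A minimal sketch of that upstream question, assuming sessions on a surface have already been labelled with an intent (by hand or by a model). The intent names, the sample data, and the bottleneck score (share of traffic times non-conversion rate) are illustrative assumptions, not a method from any of the firms named here.

```python
from collections import Counter

# Hypothetical labelled sessions for one surface: (intent, converted).
sessions = [
    ("subscribe", True), ("subscribe", False), ("subscribe", False),
    ("subscribe", False), ("one_time", True), ("one_time", True),
    ("one_time", False), ("gift", False),
]

def intent_mix(sessions):
    """Return {intent: (share_of_traffic, conversion_rate)}."""
    totals = Counter(intent for intent, _ in sessions)
    wins = Counter(intent for intent, converted in sessions if converted)
    n = len(sessions)
    return {i: (totals[i] / n, wins[i] / totals[i]) for i in totals}

def bottleneck(mix):
    """Assumed scoring: the intent losing the most traffic,
    i.e. share of traffic times non-conversion rate."""
    return max(mix, key=lambda i: mix[i][0] * (1 - mix[i][1]))

mix = intent_mix(sessions)
print(mix)
print("bottleneck intent:", bottleneck(mix))
```

The score is deliberately crude. A team might weight by revenue instead of raw share; the point is only that the mix and the bottleneck are computed before any variant is drafted.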
This is the diagnosis Lukas Vermeer named some time ago as the Experimentation Gap — the distance between what a team thinks they learned from a test and what the test can actually support. A page-as-unit sprint widens the gap by default. An intent-as-unit sprint narrows it by design.
How the research-first camp gets read here
The research-first CRO firms have been pointing toward this layer for years. Speero publishes the methodology as an Experimentation Operating System. AGConsult leads with user research and "top task" surveys. GetUplift wraps the same shape in emotion-first framing. Each of these firms ships work that is upstream of the variant, and each of them is correct to do so. (For the full roster and the auto-tuning critique that sits underneath this piece, the prior column — auto-tuning is not experimentation — is the place to start.)
The thing Made With Intent's framing adds is a sharper public noun for the upstream layer. "User research" describes the activity. "Top tasks" describes one method. "Intent" describes the thing the activity is trying to model. A team can run user research and still produce a sprint where the unit of work is the page, because intent was never the artifact they built. The reframe is small on paper and surprisingly large in practice: it is the difference between week-one outputs that read as a research summary and week-one outputs that read as a hypothesis the sprint can test.
This is also where AI changes the math. Modelling intent at any scale used to require either deep qualitative work that didn't scale or quantitative segmentation that lost the texture. A language model is the first tool that can read the texture — read the verbatims, cluster the recordings, classify the form responses — at a speed that lets the team produce an intent map every week instead of every quarter. The model does the reading. The team picks which intents are worth testing. The work compounds because the map gets sharper each cycle.
The CRO research loop, codified
Most CRO teams already run a sprint. Most teams also keep the sprint's working memory in the heads of one or two people, a Slack channel, and a few decks. The codified version writes the loop to disk, as small files the team and the AI can both read, so the loop itself becomes the thing that gets sharper.
The flow — what a research-first CRO sprint looks like with AI in the room
The classic flow. Brainstorm. Pick variants. Build. Test. Debrief. The team holds the context in their heads. New team members learn the sprint by watching the senior person run it. The intent layer happens implicitly, in the senior person's pattern-matching from prior brands, or not at all.
The AI-native version.
- Week one — the intent map. Pull every "how did you hear about us?" form response, every support ticket touching the conversion event, every session recording flagged as high-engagement no-purchase, and every survey free-text from the last ninety days. The AI clusters the data into named intents — what visitors are trying to do, in their own words — and ranks them by frequency and revenue weight. The team reads the top five clusters and picks one as the sprint's bottleneck hypothesis.
- Week one, end — the hypothesis. One paragraph. The bottleneck intent, the surface where it's failing, the change the sprint will test, and the metric that will move if the hypothesis is right. Pre-declared.
- Week two — the build. Variant generation is now narrow because the hypothesis is narrow. Two to three variants of the change, not twelve. The AI can draft the copy and the layout against the hypothesis; the team picks and ships.
- Week three — the test. Pre-declared sample size, significance threshold, duration. The AI watches the data and flags anomalies; it does not call the result. The team calls the result against the pre-declared rule.
- Week four — the debrief that updates next week's brief. What did the test answer? What did it not answer? What does that mean for the intent map — which clusters got sharper, which got demoted, which new ones surfaced? The debrief edits the intent map directly, so week one of the next sprint starts with a sharper map.
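The week-three rule in the list above can be sketched as a plan that is frozen before launch and a function that only calls the result against it. This is a sketch under stated assumptions: the `ExperimentPlan` fields, the verdict strings, and the choice of a two-sided two-proportion z-test are illustrative, and a real sprint would derive the sample size from a power calculation rather than hard-code it.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist

@dataclass(frozen=True)  # frozen: the rule cannot be edited mid-test
class ExperimentPlan:
    metric: str
    min_sample_per_arm: int
    min_days: int
    alpha: float

def call_result(plan, control, variant, days_run):
    """control and variant are (conversions, visitors) tuples."""
    (c_conv, c_n), (v_conv, v_n) = control, variant
    # No verdict until the pre-declared duration and sample size are met.
    if days_run < plan.min_days or min(c_n, v_n) < plan.min_sample_per_arm:
        return "keep running"
    pooled = (c_conv + v_conv) / (c_n + v_n)
    se = sqrt(pooled * (1 - pooled) * (1 / c_n + 1 / v_n))
    if se == 0:
        return "no difference detected"
    z = (v_conv / v_n - c_conv / c_n) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    if p_value >= plan.alpha:
        return "no difference detected"
    return "variant wins" if z > 0 else "control wins"

plan = ExperimentPlan("checkout_rate", min_sample_per_arm=1000,
                      min_days=14, alpha=0.05)
print(call_result(plan, (100, 2000), (150, 2000), days_run=14))
```

Keeping the plan immutable is the design choice that matters: whoever watches the data, human or model, can flag anomalies but has no way to move the threshold.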
The closing edge. The fifth step, the week-four debrief, is the loop. The debrief doesn't just produce a slide; it writes back into the same intent map that started the sprint. Each cycle, the map gets sharper, the hypotheses get more specific, the variants get fewer, and the tests answer cleaner questions.
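One way to picture the write-back, assuming the intent map lives on disk as a small JSON file. The file layout, the field names (`status`, `notes`), and the three debrief keys (`sharpened`, `demoted`, `surfaced`) are hypothetical, chosen to mirror the three questions the debrief asks of the map.

```python
import json
from pathlib import Path

def apply_debrief(intent_map, debrief):
    """Return a new intent map with the debrief's edits applied."""
    updated = {name: dict(entry) for name, entry in intent_map.items()}
    # Clusters the test made sharper get their notes rewritten.
    for name, note in debrief.get("sharpened", {}).items():
        updated[name]["notes"] = note
    # Clusters the test argued against are demoted, not deleted.
    for name in debrief.get("demoted", []):
        updated[name]["status"] = "demoted"
    # Newly surfaced intents enter as candidates for a later sprint.
    for name, note in debrief.get("surfaced", {}).items():
        updated.setdefault(name, {"status": "candidate", "notes": note})
    return updated

def run_debrief(map_path, debrief):
    """Read the map from disk, apply the debrief, write it back."""
    path = Path(map_path)
    intent_map = json.loads(path.read_text())
    path.write_text(json.dumps(apply_debrief(intent_map, debrief), indent=2))
```

Demoted clusters stay in the file with a status rather than disappearing, so the next sprint can see what was already argued against.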
Where it breaks. Two failure modes. First — the team treats the intent map as the deliverable instead of as the input to the hypothesis. A beautiful map that doesn't pick a fight produces zero tests. Second — the AI's clustering goes unread. If no one on the team sat with the source responses long enough to argue with the model's groupings, the map is generic, the hypothesis is generic, and the sprint is back to page-as-unit testing with extra steps.
Install note. We ship this as a cro/ skill set inside a per-brand AI brand-install — three small files the team and the AI both read each cycle: the intent-research brief, the hypothesis template, and the debrief that edits the brief for the next sprint. The flow above is the shape; the install is figuring out which intents matter for this brand in week one, which usually requires a real customer-interview pass before the AI clustering has anything worth reading.
What this changes about the AI-CRO conversation
The AI-CRO conversation in 2026 is mostly stuck at the wrong layer. Most of what the vendors are pitching is faster variant generation, faster bandits, faster personalization — execution-layer speed. Faster execution on the wrong question is the failure mode the prior column already covered, so this piece won't relitigate it.
What's new is that AI is also good at the layer above the variant. Reading the texture of a thousand open-ended responses, clustering them into named intents, surfacing the ones that haven't been tested yet, watching the debrief and feeding it back into next week's brief — this is work that used to live in the head of a senior person and now can live in a small set of files the team and the model both read. The compounding doesn't happen because the model is smart; it happens because the loop is written down.
The intent reframe is small in print. In a sprint, it changes which week the hard thinking happens in. Week one stops being "what should we test?" and becomes "what is this surface being asked to do, and by whom?" The variants get easier from there because the question is already narrow.
That's the loop. The unit of work is intent. The artifact is the map. The compounding is the debrief writing back to the map. AI is the reader; the team picks the fights. The vendors are at the right layer for their tooling. They are wrong that the layer above is optional.