How AI agents became your store’s second audience — and what changes in the CRO sprint when they arrive

by , Founder & Growth Lead

In March, Walmart disclosed a number that should interest anyone who runs conversion work. Products sold directly inside ChatGPT — the much-announced "buy it in the chat" experience — converted three times worse than products where the shopper clicked out to Walmart's own site. The same week, OpenAI quietly retreated from its embedded checkout, saying the first version "did not offer the level of flexibility that we aspire to provide, so we're allowing merchants to use their own checkout experiences while we focus our efforts on product discovery."

There are two ways to read that. The popular reading: agentic shopping was hype, the dashboards stay as they were, carry on. This piece argues the second reading: the purchase didn't move into the chat, but the path to it did, and that puts a new audience on your conversion surface. An agent (software that browses, compares, and increasingly buys on a person's behalf) now reads your store before the person does. If you run CRO sprints, this changes what week one has to look at.

If you've read this blog's earlier CRO pieces, you know the running argument: experiments start with research, not auto-tuning, and the unit of work is intent — what the visitor is trying to do, upstream of whatever page they landed on. If you haven't, that one line is all you need. The question this piece adds: what changes when the visitor carrying that intent is no longer always a person?

What the second audience is — and what it is not yet

Three kinds of agent show up at a store today, and they behave nothing alike.

The first kind never sees your pages at all. When a shopping assistant inside ChatGPT, Gemini, or Copilot recommends a product, it's reading a structured feed — a regularly refreshed file of identifiers, prices, inventory, images, and shipping options that the platform ingests and indexes. OpenAI's own merchant documentation is explicit that required fields ensure correct price and availability display, while "recommended attributes — like rich media, reviews, and performance signals — improve ranking, relevance, and user trust." Your hero image, your exit-intent popup, your A/B test variant: none of it exists in that pipe. Since March, Shopify has switched this on by default for millions of merchants, noting plainly that "these AI surfaces favor structured data, with clean attributes and real-time accuracy."

The second kind does see your pages — through a browser it controls. Agent browsers open your actual site, read your actual product page, and click through your actual checkout. They move faster than a person, dismiss nothing politely, and abandon flows a human would grudgingly push through: blocking overlays, forced account creation, forms that resist autofill.

The third kind is the one that flopped in public — fully embedded checkout inside the chat — and the Walmart number is the honest measure of where it stands. Forrester's mid-2026 read is blunt: "True autonomy is rare" and "hype is running ahead of behavior." Google's own retail lead has put agent-completed commerce at under two percent of digital sales. Walmart EVP Daniel Danker said it directly: "This idea that it will all become automated might be a little bit far-fetched."

So the sober map: agents completing purchases — marginal today. Agents shaping which store the human lands on, with what context, in what mood — already measurable in your analytics, and growing fast.

The two numbers that should reorganize week one

Adobe's analytics arm, working from over a trillion visits to US retail sites, published the pair of figures that make this an experimentation question rather than a futurism question.

First: traffic arriving from AI sources grew 393% year over year in the first quarter of 2026 — and in March it converted 42% better than non-AI traffic, a full reversal from a year earlier, when it converted 38% worse. These visitors also stay longer and read more pages. That profile makes sense once you picture the journey: the person already asked their questions in the chat. They arrive pre-qualified, briefed, deep in consideration. The agent did the persuading; your site's remaining job is confirmation.

Second: the same Adobe research scored how much of a retail site's content is actually legible to the models doing that persuading. Homepages averaged 75%. Product pages, the surface where the buying argument lives, averaged 66%. That leaves roughly a third of the content on the average product page invisible to the model deciding whether to recommend it. The gap between the best and worst sites was nearly thirty points.

Put those together and the shape of the problem is clear. The highest-converting traffic source in your analytics is fed by a reader you've never run a test for, and that reader can't see a third of your best surface.

This is not the schema conversation again

A fair objection, especially from anyone who has read this blog's pieces on AI-search visibility: didn't we spend May showing that machine-readability prescriptions are mostly theater? Google's own guidance says no special markup is needed for AI search, and a controlled Ahrefs study of 1,885 pages found schema additions moved AI citations not at all.

Both stand. The distinction is the pipe. The schema-as-citation thesis claimed that decorating your existing pages with markup would make answer engines cite you — and it failed every honest test, because those systems read language, not labels. Commerce feeds are a different mechanism: a documented ingestion contract with named required fields, published by the platform that does the ingesting. One is a SaaS layer selling a lever the architecture doesn't have. The other is plumbing the platform tells you to connect, with a spec. Confusing the two leads teams to buy markup audits when what their store needs is accurate inventory in a feed and a checkout an agent can physically complete.

The flow — what an agent-aware CRO sprint looks like

The research loop doesn't change shape. Intent is still the unit of work; the map is still the artifact; the debrief still writes back. What changes is who holds the intent.

The classic flow. AI-referred visits sit unexamined inside the traffic mix — lumped into "referral" or "direct," tested with the same variants as everyone else, on pages designed for a cold first-time reader. Agent failures are invisible: a browser agent that abandoned your checkout doesn't file a complaint; it just buys elsewhere, and the session looks like any other bounce.

The AI-native version. The five steps hold. Two of them deepen.

Week one, the intent map gains a column: whose intent is this — a human browsing, or an agent acting for one? Segment AI-referred sessions before clustering; their questions are already answered and their failure points are different. Add one new week-one input alongside the form responses and session recordings: the agent pass. Point an agent browser at your own store with three real purchase tasks and watch where it stalls. Pull your product feed and read it the way the recommending model does — is the inventory current, are the attributes complete, does the feed say what your best product page says? The same clustering pass that names human intents now also names agent blockers, ranked the same way: frequency times revenue weight.
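The ranking rule above, frequency times revenue weight, can be sketched in a few lines of Python. This is a minimal illustration, not the method as practiced: the function name, the tuple shape, the blocker labels, and the weights are all hypothetical, and a real team would set its revenue weights from its own order data.

```python
from collections import Counter


def rank_blockers(observations, revenue_weight):
    """Rank blockers by frequency x revenue weight, highest score first.

    observations:   iterable of (audience, blocker) pairs, one per stall seen
                    in week-one research (hypothetical record shape).
    revenue_weight: dict mapping blocker -> weight chosen by the team
                    (assumed input, e.g. relative revenue at stake).
    """
    counts = Counter(observations)
    scored = [
        (audience, blocker, n, n * revenue_weight[blocker])
        for (audience, blocker), n in counts.items()
    ]
    # Sort on the score column so the map's top row is the costliest blocker.
    return sorted(scored, key=lambda row: row[3], reverse=True)


# Illustrative data only: three agent-pass stalls on an overlay, two on forced
# account creation, and four human sessions stuck on unclear shipping cost.
observations = (
    [("agent", "blocking overlay")] * 3
    + [("agent", "forced account creation")] * 2
    + [("human", "shipping cost unclear")] * 4
)
revenue_weight = {
    "blocking overlay": 1.0,
    "forced account creation": 3.0,
    "shipping cost unclear": 1.0,
}

ranked = rank_blockers(observations, revenue_weight)
for audience, blocker, frequency, score in ranked:
    print(f"{audience:6} {blocker:26} freq={frequency} score={score}")
```

Keeping the audience in the key is what gives the map its new column: the same blocker seen by a human and by an agent shows up as two rows, each with its own score.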

Weeks two through four run as codified — narrow hypothesis, two or three variants, pre-declared test, called against the rule. The only addition: if the bottleneck is an agent blocker, the "variant" may be an overlay that no longer fires, a guest checkout path, or a feed field that finally matches the page. Unglamorous tests. They move the new column.
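A feed field that matches the page is easy to check mechanically. The sketch below compares one feed record against the facts read off the matching product page and reports the fields that disagree. The field names (title, price, availability) are illustrative and not taken from any platform's feed spec, and how the page facts get extracted is left out as an assumption.

```python
from decimal import Decimal, InvalidOperation


def _normalize(value):
    """Make values comparable: trim and collapse whitespace, ignore case,
    and treat numeric strings as numbers so "89.00" equals 89."""
    if value is None:
        return None
    text = " ".join(str(value).split()).casefold()
    try:
        return Decimal(text)
    except InvalidOperation:
        return text


def feed_page_mismatches(feed_item, page_facts,
                         fields=("title", "price", "availability")):
    """Return {field: (feed_value, page_value)} for every field that differs."""
    mismatches = {}
    for field in fields:
        feed_value = _normalize(feed_item.get(field))
        page_value = _normalize(page_facts.get(field))
        if feed_value != page_value:
            mismatches[field] = (feed_value, page_value)
    return mismatches


# Hypothetical records: the feed still says "in stock", the page does not.
feed_item = {"title": "Trail Runner 2", "price": "89.00", "availability": "in stock"}
page_facts = {"title": "Trail Runner 2 ", "price": 89, "availability": "Out of stock"}

mismatches = feed_page_mismatches(feed_item, page_facts)
print(mismatches)
```

A non-empty result is a candidate "variant" for the sprint: fix the feed field, then measure the agent column again.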

Week four, the debrief writes the agent column back into the map, so next sprint's week one starts knowing which audience broke where.

The closing edge. A confirmation-stage human and a software reader are now both standing on a surface built to persuade a cold one. The teams that notice first get to test against both audiences while their competitors are still reading the AI-referral line as a curiosity.

Where it breaks. Two failure modes. First, chasing the autonomy story: rebuilding for agent checkout while agent-completed purchases sit under two percent of digital sales. The Walmart number is the cautionary tale; check what share of your sessions is AI-referred before promoting any of this above the current bottleneck intent. Second, buying the markup story: paying for schema decoration relabeled as "agent readiness." The real work is duller: a current feed, an unblocked path, a page whose substance survives being read without its design.
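Checking what share of sessions is AI-referred can start from the referrer alone. The sketch below is a rough first pass under stated assumptions: the hostname list is a short illustrative sample, not an exhaustive or official one, and the function names are hypothetical. Because AI-referred visits often arrive with no referrer and get counted as "direct," the result is best read as a lower bound.

```python
from urllib.parse import urlparse

# Illustrative sample of assistant hostnames (assumption: extend for your own mix).
AI_REFERRER_HOSTS = {
    "chatgpt.com",
    "gemini.google.com",
    "copilot.microsoft.com",
    "perplexity.ai",
}


def is_ai_referred(referrer):
    """True if the referrer URL's host is a listed assistant or a subdomain of one."""
    if not referrer:
        return False
    # urlparse only yields a hostname when the URL carries a scheme.
    host = (urlparse(referrer).hostname or "").lower()
    return any(host == known or host.endswith("." + known)
               for known in AI_REFERRER_HOSTS)


def ai_share(referrers):
    """Fraction of sessions whose referrer points at a listed assistant."""
    referrers = list(referrers)
    if not referrers:
        return 0.0
    return sum(is_ai_referred(r) for r in referrers) / len(referrers)


# Hypothetical session referrers, including two with no referrer at all.
session_referrers = [
    "https://chatgpt.com/",
    "https://www.google.com/search?q=trail+shoes",
    None,
    "https://www.perplexity.ai/search/abc",
    "",
]

share = ai_share(session_referrers)
print(f"AI-referred share: {share:.0%}")
```

If the share is small, the agent column stays a column; the current bottleneck intent keeps its place at the top of the sprint.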

Install note. In our per-brand work this ships as one new file in the cro/ skill set inside the brand-install: an agent-readiness check the team and the AI both read — feed accuracy, agent-pass results, the blocker list — feeding the same intent map as everything else. One file, because today this is a column on the map. The file grows as the audience does.

The map gains a column, not a replacement

The discipline this blog keeps arguing for is that conversion work starts upstream of the variant, with whoever is trying to get something done on your surface. For fifteen years there was one answer to who. Now there are two, and the second one reads a feed you've never proofread, abandons checkouts silently, and hands you the best-converting humans in your analytics on the way out.

Walmart's three-times-worse number says the chat didn't replace the store. Adobe's 42% number says what actually happened: the top of your funnel moved into a conversation you can't see, and what arrives at your site afterward needs confirmation, speed, and nothing in the way. Week one's question was "what is this surface being asked to do, and by whom?" The question still stands. By whom just got a second answer.
