Every AI conversion tool now says it runs itself. These are the three steps your team keeps.

by , Founder & Growth Lead

Three conversion-software vendors, three current homepages, June 2026. Mutiny: "Your AI agent for creating anything customer-facing." Evolv AI: "The optimization engine that runs itself." Kameleoon: "Build experiments in minutes by chatting with AI." Three different products — an asset generator, a UX optimizer, an experimentation platform. One pitch.

A month ago, when this blog last mapped the field, you could still place a conversion vendor by reading its homepage. The research-led firms talked about research. The automation vendors talked about their tool. That signal is gone. If you're choosing software to lift your store's or product's conversion rate this year — or sitting through pitches for one — this piece is about the question those homepages stopped answering, and the three steps your team keeps in-house whichever tool you end up buying.

This blog's running argument on conversion work, in one line: experiments start with research, not auto-tuning, and the unit of work is intent — what the visitor is trying to get done, upstream of whatever page they landed on. If you haven't read those pieces, that line is all you need. The question this one adds: when every tool claims to run the work end to end, which parts can you actually hand over?

The pitch converged — from both directions

We read ten current homepages across the AI-conversion space for this piece — the personalization vendors, the experimentation platforms, the newer AI-first entrants. The expected finding was that the automation camp had learned to talk like researchers. That is not what the sample shows. With one exception, the sites now lead with what the system does on its own: it senses, decides, generates, allocates.

Fibr AI is the fullest version: "Each page senses intent, makes decisions, and reshapes itself in real time for whoever arrives," powered by "AI Agents [that] analyze your pages, generate automated hypotheses, create variants, and run continuous AI-led experimentation." Pathmonk personalizes "based on visitors' intent." OptiMonk leads with "The AI Popup Builder." Webflow Optimize describes AI that "continuously learns which variations perform best for different visitor segments, automatically allocating traffic." The exception is VWO, which still leads with the unglamorous vocabulary of a testing platform — behavioral analytics, A/B testing, session recordings — and, tellingly, is the one site in the sample whose research claims are backed by research tooling you can point at.

The convergence runs in both directions, which is the part worth slowing down for. Made With Intent (the UK firm whose founder handed this blog the sharpest framing it has on conversion research: design experiments around intent, not pages) now describes its own product as a decision engine: it "decides who gets what, when," modeling "800+ signals continuously." And Mutiny, covered here when it killed its own SaaS product in April and rebuilt agent-first, no longer mentions intent, audiences, or understanding anywhere on its homepage. The research-side firms moved toward automation language. The automation-side firms moved toward full autonomy. Everyone met at "it runs itself."

None of this is deception, and that matters for how you read it. The autonomy is real capability: building and running tests genuinely got faster and cheaper, and we wrote about one of these rebuilds as a smart strategic move, not a retreat. Vendors sell what buyers ask for, and buyers in 2026 ask how much the system can do alone. The problem is narrower: the words on the site no longer tell you where research ends and page-shuffling begins. Every homepage answers "how much does it do alone?" The question that decides whether you get a lift is a different one.

The question the runs-itself pitch doesn't answer

Where does the hypothesis come from?

A hypothesis, in conversion work, is a plain paragraph your team can write and your CFO can read: these visitors are trying to do this, this surface is failing them, this change should fix it, and if we're right, this number moves. Everything the tool does downstream — variants, traffic splits, significance math — executes that paragraph. Write a sharp one and a mediocre tool will find the lift. Skip it and the best engine on the market runs fast tests on guesses.

Look back at the Fibr line, because it's the honest version of where the category is heading: the agents "generate automated hypotheses." The pitch isn't that the tool executes your thinking faster. It's that the tool does the thinking. A neighboring corner of the AI market sells the same removal of the human, and we covered what a brand gives up when it buys that — the seeing and the stopping. The conversion-work version is quieter: when the tool writes its own hypotheses, your team stops accumulating the one thing the tests were supposed to buy — knowledge of your customers that survives the tool.

That knowledge has a shape. It's not a feeling your team develops — it's three files the work either produces or doesn't.

The flow — the three steps that stay in-house

The research loop doesn't change shape because the tools got more autonomous. What changes is the boundary: which steps the tool now genuinely owns, and which steps were never the tool's to take.

The classic flow. Buy the platform, connect the site, let it run. The dashboard shows lifts. Eighteen months later you switch vendors — pricing, a re-platform, a bad quarter — and discover what the subscription actually covered: the learnings lived inside the vendor's model. Your team can't say which visitor problems were found, what was tried against them, or what's already been ruled out. The next tool starts from zero, on your budget, again.

The AI-native version. Three steps stay with your team. Each one ends in a file you own.

Step one — build the intent map with your own team, from your own customers' words. Once a sprint, cluster what customers actually say — the free-text on forms, support tickets, session recordings, survey answers — into a ranked list of what they're trying to get done and where they're getting stuck. AI is genuinely good at the clustering; your team picks the bottleneck worth attacking. The output is a map the next sprint inherits. Any tool can read it. No tool should own it.
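One way to picture the intent map as a file rather than a feeling: a minimal Python sketch, assuming each customer comment has already been given an intent label by whatever clustering you use (AI or a person). The `Comment` fields, the `rank_intents` name, and the sample comments are all hypothetical and made up for illustration; they are not any vendor's format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    source: str  # where the words came from, e.g. "support_ticket"
    intent: str  # cluster label: what the visitor was trying to get done
    stuck: bool  # True if the comment describes getting stuck


def rank_intents(comments):
    """Return (intent, mentions, stuck_mentions) rows, most-stuck first."""
    mentions = Counter(c.intent for c in comments)
    stuck = Counter(c.intent for c in comments if c.stuck)
    rows = [(intent, mentions[intent], stuck[intent]) for intent in mentions]
    # Sort by stuck mentions, then total mentions, then name for stable ties.
    return sorted(rows, key=lambda r: (-r[2], -r[1], r[0]))


# Made-up sample data, for illustration only.
comments = [
    Comment("support_ticket", "compare plans", True),
    Comment("survey", "compare plans", True),
    Comment("form_free_text", "check delivery date", True),
    Comment("session_recording", "compare plans", False),
    Comment("survey", "check delivery date", False),
    Comment("survey", "reorder", False),
]

ranked = rank_intents(comments)
for intent, total, stuck_count in ranked:
    print(f"{intent}: {total} mentions, {stuck_count} stuck")
```

The ranking only orders the candidates. Picking which bottleneck to attack stays a team decision, which is why the sketch returns the whole list instead of a single winner.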

Step two — write the hypothesis before any traffic splits. That plain paragraph, pre-declared: who, what's failing, what changes, what number moves. AI can draft the paragraph — ours usually does. What can't be delegated is the sign-off: someone on your team reads it, agrees the bottleneck is real, and signs it before the test runs. That's the difference between this step and the tool's "automated hypotheses" — not who typed it, but whether anyone you employ ever owned it. This is also the step that absorbs the only vendor-evaluation work you need: when a tool claims it runs itself, ask where its hypotheses come from. "The AI generates them" is an honest answer and a useful one — it tells you you're buying execution and the thinking, and the thinking is the part you can't audit later.
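The sign-off rule can be written down as a small gate. This is a sketch under assumptions, not a template we ship: the `Hypothesis` fields mirror the plain paragraph (who, what they are trying to do, what is failing, what changes, what number moves), and every name and sample value here is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hypothesis:
    visitors: str          # who
    intent: str            # what they are trying to get done
    failing_surface: str   # what is failing them
    change: str            # what changes
    metric: str            # what number moves if we're right
    drafted_by: str = "ai"               # who typed it does not matter
    signed_off_by: Optional[str] = None  # who on the team owns it

    def paragraph(self) -> str:
        """Render the pre-declared hypothesis as one plain paragraph."""
        return (
            f"{self.visitors} are trying to {self.intent}. "
            f"{self.failing_surface} is failing them. "
            f"We will {self.change}. "
            f"If we're right, {self.metric} moves."
        )


def ready_to_run(h: Hypothesis) -> bool:
    """A test may start only if every field is filled and a person signed."""
    fields = [h.visitors, h.intent, h.failing_surface, h.change, h.metric]
    return all(f.strip() for f in fields) and bool(h.signed_off_by)


# Made-up example values, for illustration only.
draft = Hypothesis(
    visitors="Returning visitors on mobile",
    intent="compare plans",
    failing_surface="The pricing table",
    change="collapse the table into a two-plan comparison",
    metric="plan-page to checkout rate",
)
print(ready_to_run(draft))  # False: drafted, but nobody owns it yet
draft.signed_off_by="growth lead"
print(ready_to_run(draft))  # True
```

The gate deliberately ignores `drafted_by`. An AI-drafted paragraph passes as soon as someone on the team has put a name on it, and a human-drafted one fails without that name.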

Step three — run the debrief, and write it back into your files. When the test is called, the result edits the intent map: confirmed, ruled out, new stuck-point surfaced. Twenty minutes with your team at the end of a test cycle. This is the step that makes the whole thing compound — and the file that makes switching tools cheap, because the learning lives in your folder, not the vendor's model.
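To make "the result edits the intent map" concrete, here is a minimal sketch that treats the map as a plain dictionary you could keep as JSON in your own folder. The structure, the `apply_debrief` name, and the three outcome labels (taken from the sentence above) are assumptions for illustration, and the sample entries are made up.

```python
import copy

# The three results a debrief can write back, per the step described above.
OUTCOMES = {"confirmed", "ruled_out", "new_stuck_point"}


def apply_debrief(intent_map, intent, outcome, note):
    """Return a new intent map with one test result written into it."""
    if outcome not in OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome!r}")
    updated = copy.deepcopy(intent_map)  # leave the previous version intact
    entry = updated.setdefault(intent, {"status": "open", "history": []})
    # A newly surfaced stuck-point stays open for a later sprint to attack.
    entry["status"] = "open" if outcome == "new_stuck_point" else outcome
    entry["history"].append({"outcome": outcome, "note": note})
    return updated


# Made-up example map and results, for illustration only.
before = {"compare plans": {"status": "open", "history": []}}
after = apply_debrief(
    before, "compare plans", "ruled_out",
    "Two-plan comparison did not move checkout rate.",
)
after = apply_debrief(
    after, "find size guide", "new_stuck_point",
    "Recordings showed visitors hunting for sizing on product pages.",
)
print(after["compare plans"]["status"])    # ruled_out
print(after["find size guide"]["status"])  # open
```

Keeping a history per intent is the design choice that matters here: a ruled-out idea stays on record, so the next tool, or the next hire, does not pay to test it again.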

The tool — any of the ten — slots in between steps two and three: building variants, running the split, flagging anomalies. That layer genuinely is better with AI in it, and handing it over is the right call. It's the only layer the runs-itself pitch is actually selling.

The closing edge. A team that keeps the three steps gets compounding that survives vendor churn — every tool it ever plugs in starts from the map instead of from zero, and every dashboard claim gets checked against files the team wrote itself.

Where it breaks. Two failure modes. The first is refusing the tools altogether and treating autonomy language as disqualifying — the execution layer really did get faster and cheaper, and a team hand-building variants in 2026 is paying artisan prices for commodity work. The second is keeping the steps as theater: an intent map nobody updates is a binder on a shelf. The loop compounds only if the debrief actually edits the map.

Install note. In our per-brand work the three steps ship as three working files in the conversion folder of the brand-install — the intent-research brief, the hypothesis template, the debrief that edits the brief. Nothing new ships for this piece; that's been the structure since we codified the loop. What this piece names is why those three live in the brand's own repo and not in any vendor's account: they're the part of conversion work a brand should never be renting.

The discipline left the pitch, not the work

When every homepage in a category makes the same promise, the promise stops helping you choose — that's just what vocabulary convergence does, and the conversion category completed it this year. What hasn't moved is where lifts come from. The research that finds the real bottleneck, the pre-declared paragraph that aims the test, the debrief that banks the learning: that work still exists in every engagement that compounds, whether or not it appears on anyone's homepage.

Run the three steps with your team for one sprint (map, paragraph, debrief) around whatever tool you already pay for. If the tool's wins hold up against a hypothesis your team wrote, you bought well. If they don't, you have a specific question to take back to the vendor. Either way, you'll know within a sprint, and we'd genuinely like to hear what you find, especially what the vendor said when you asked where the hypotheses come from.
