How do you let AI agents act on your accounts and stay accountable for what they do? The kill switch.
by Ygor Fonseca, Founder & Systems Lead
A few weeks ago we argued that the thing an AI-native firm actually hands a brand is the per-client install — a folder of plain text the AI reads on every task, with five working parts. The last of the five was the kill switch: the short list of moves the AI is never allowed to make without a person signing off. We named it and moved on. It deserves its own post, because the moment an agent can act on a brand's real accounts — publish a post, send a campaign, move ad budget — the kill switch stops being a footnote and becomes most of the reason the arrangement is safe to enter at all.
Here is the tension this post is about. One loud corner of the AI market sells the opposite of a kill switch. The pitch is autonomy: hire the agent, describe the job, let it run. Artisan markets Ava as "the autonomous AI BDR" who "finds leads, sends personalized outreach, handles objections and books meetings." Verse sells "agentic AI employees from a single prompt" that "run complex workflows without code." What's being sold is the removal of the human. And for a brand deciding whether to let software touch its customers and its budget, that's the wrong thing to optimize, because the question a brand actually asks before handing over the keys isn't "how autonomous is it?" It's "what happens the first time it gets one wrong, and will I even know?"
What accountability looks like when an agent can act
The honest answer to "what happens when it gets one wrong" isn't a better prompt or a smarter model. It's a layer of accountability the firm installs before the agent does anything, and shows the brand on the first day. Four parts, none of them exotic.
Scoped access. The agent can reach only the accounts it has been explicitly given — this ad account, this content queue, this inbox — and nothing adjacent. The blast radius is set on purpose, before the first task, not discovered after an incident.
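As a minimal sketch of what "set on purpose" can look like, scope can be an explicit allowlist that every action is checked against before it runs. The `AgentScope` class and the account names below are hypothetical, invented for illustration; a real install would lean on each platform's own permission system to enforce the same idea.

```python
from dataclasses import dataclass


class OutOfScopeError(Exception):
    """Raised when the agent tries to touch an account it was never given."""


@dataclass(frozen=True)
class AgentScope:
    # Hypothetical account identifiers; the real list is agreed with the brand.
    allowed_accounts: frozenset[str]

    def check(self, account: str) -> None:
        # Deny by default: anything not explicitly listed is out of reach.
        if account not in self.allowed_accounts:
            raise OutOfScopeError(f"{account!r} is outside the agent's scope")


scope = AgentScope(frozenset({"ads:brand-main", "queue:content", "inbox:support"}))
scope.check("ads:brand-main")        # in scope, returns quietly
try:
    scope.check("ads:brand-other")   # adjacent account, never granted
except OutOfScopeError as err:
    blocked = str(err)
```

The design choice that matters is the default: an account the list does not name is refused, so widening the blast radius takes a deliberate edit rather than an oversight.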
An audit log. Every action the agent takes is written down: what it did, when, and under which rule it was allowed to. Not a summary a human composes afterward — the actual record, generated as the work happens, that anyone on the engagement can read.
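One way to picture that record, as a sketch rather than a description of any particular tooling: each action appends one line holding what, when, and under which rule. The field names, rule labels, and the `log_action` helper are assumptions made for this example.

```python
import io
import json
from datetime import datetime, timezone


def log_action(log, action: str, account: str, rule: str) -> dict:
    """Append one record as a JSON line at the moment the action happens."""
    entry = {
        "when": datetime.now(timezone.utc).isoformat(),
        "what": action,
        "account": account,
        "rule": rule,  # the rule that permitted the action
    }
    log.write(json.dumps(entry) + "\n")
    return entry


# An in-memory buffer stands in for an append-only file or table.
audit_log = io.StringIO()
log_action(audit_log, "pause_ad_set", "ads:brand-main", "routine:pause-underperformer")
log_action(audit_log, "schedule_post", "queue:content", "routine:schedule-approved-draft")

records = [json.loads(line) for line in audit_log.getvalue().splitlines()]
```

Because the entry is written by the same call that performs the work, nobody has to remember to compose it afterward.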
Approval gates. The consequential moves stop and wait for a person. We've described that gate before as a list of things the AI can't do without sign-off — never publish, never send past a small list, never spend over a threshold. That list is real, and it's where most of the discipline lives day to day. But it's only half of accountability. The other half is that you can see everything the agent did and trace it to a decision — and, when you need to, end the whole thing in one move.
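The gate itself can be very small. The sketch below encodes the three example rules as a single check; the thresholds are hypothetical placeholders, and reading "a small list" as a recipient count is an assumption made for illustration.

```python
from dataclasses import dataclass

# Hypothetical limits; the real values are agreed with the brand.
MAX_RECIPIENTS = 25
MAX_SPEND = 200.0


@dataclass
class Action:
    kind: str              # "publish", "send", or "spend"
    recipients: int = 0
    amount: float = 0.0


def needs_approval(action: Action) -> bool:
    """Return True when the move must stop and wait for a person."""
    if action.kind == "publish":
        return True                                # publishing always waits
    if action.kind == "send":
        return action.recipients > MAX_RECIPIENTS  # only small sends go through
    if action.kind == "spend":
        return action.amount > MAX_SPEND           # spend is capped
    return True                                    # unknown kinds are held by default


held_publish = needs_approval(Action("publish"))
small_send = needs_approval(Action("send", recipients=10))
big_spend = needs_approval(Action("spend", amount=950.0))
```

Holding unknown action kinds by default means a new capability starts out gated until someone decides otherwise.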
It helps to be concrete about what this guards against, because the failures that matter here are rarely dramatic. They're quiet: an email sequence that keeps going out to people who already unsubscribed, ad budget drifting toward the wrong audience before anyone looks, a booking button that quietly breaks after a config change and takes a day to notice. Nobody gets paged, because nothing crashed — the system just kept confidently doing the wrong thing. That's the failure mode the accountability layer is built for: not the agent that falls over, but the one that runs smoothly in the wrong direction. A crash you'd catch in an hour; a quiet drift you catch in the audit log, or not until it's already cost something.
The kill switch is one command, and the brand holds it
That last move is the part the term "kill switch" actually names, and it's the part the public version of this argument has under-described. The first three parts make the agent's behavior legible. The kill switch makes it reversible.
A real kill switch is a single command — or a single short document a non-technical person can follow — that revokes every piece of access the agent has at once. Not "pause the schedule." Not "open a ticket." Off, completely, in one step, without waiting for the firm that built it to be awake.
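In code terms, "off, completely, in one step" means one call that walks every grant and revokes it, and that keeps going if one revocation fails. This is a sketch under stated assumptions: `Grant` and `KillSwitch` are hypothetical names, and the body of `revoke` is a placeholder for whatever revocation each platform actually provides.

```python
from dataclasses import dataclass, field


@dataclass
class Grant:
    account: str
    active: bool = True

    def revoke(self) -> None:
        # Placeholder: a real install would call the platform's own revocation here.
        self.active = False


@dataclass
class KillSwitch:
    grants: list[Grant] = field(default_factory=list)

    def pull(self) -> list[str]:
        """Revoke every grant, continue past failures, and report which failed."""
        failures = []
        for grant in self.grants:
            try:
                grant.revoke()
            except Exception:
                failures.append(grant.account)
        return failures


switch = KillSwitch([Grant("ads:brand-main"), Grant("queue:content"), Grant("inbox:support")])
failed = switch.pull()
still_active = [g.account for g in switch.grants if g.active]
```

Returning the list of failures matters: whoever pulls the switch should learn immediately if any access survived it.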
Two things about it matter more than the mechanism. First, the brand holds it, not the firm — an accountability control the vendor alone can operate isn't accountability; it's a promise. Second, you should be shown it on day one, before the first task runs, in the same conversation where the firm shows exactly what access the agent needs and why. A kill switch a brand only hears about after the first scare was never doing its job.
It's also the claim easiest to fake, so it's worth testing — whoever you're evaluating, including us. Ask to watch the revoke actually run, not just to be told it exists. Ask who else can trigger it, and whether it cuts every access at once or only pauses a schedule. A firm that can show you the switch working on day one is telling you something a deck can't; one that hedges is telling you something too.
Why the gate is the product, not the safety net
This is what the autonomy pitch gets backwards. It frames the human as overhead, the friction a smarter agent will eventually remove. But a brand letting AI act on its accounts is paying for more than output volume; an unsupervised agent can generate plenty of that on its own. What the brand is buying is outcomes it can put its name behind, and the accountability layer is what makes those outcomes trustworthy at the brand's actual scale. The gate isn't the cost of the service. It's the service.
Anthropic, in its published guidance on agent design, makes the engineering version of the same point: use the simplest workflow that solves the problem, and reach for an agent only where the work genuinely requires the flexibility, because agents buy their extra capability with higher cost, latency, and risk. The discipline of not making something autonomous when it doesn't need to be is the discipline that keeps the failure surface small.
It's worth noticing where the market itself is moving. The same wave selling "set it and forget it" at its loudest edge has a more mature end that has quietly started leading with language like "managed by your team" — agents and human experts working the account together. The retreat toward accountability is already underway, usually learned the expensive way, after the first incident. The only real difference is whether you install it on day one or after the thing you can't take back.
The flow — the life of one agent action
The classic version. A consequential change to a brand's accounts has two states: a person does it by hand, slowly, or a tool does it unsupervised and you find out later whether it was right. There is no third option, so most teams pick the slow one and call the fast one reckless.
The AI-native version.
- Decide, then act or stop. The agent reaches a decision — say, shifting budget between two campaigns. The install tells it which decisions it may carry out and which it must hold. Routine, in-scope: it acts. Consequential: it stops and escalates.
- Log it either way. The action, or the held proposal, is written to the audit log the moment it happens — what, when, under which rule — whether or not a human is watching right then.
- Approve the consequential ones. The held move waits for a person, who sees the proposal and the reasoning and either releases it or doesn't. Nothing expensive ships on the agent's say-so alone.
- Execute inside the lines. Once cleared, the action runs — only against the accounts the agent was scoped to, never one step wider.
- Review a sample. A standing pass reads a sample of what shipped against what the rules intended, because the dangerous failures are the quiet ones, not the loud ones.
- Revoke if needed. Under all of it sits the one command. If anything looks wrong at any step, the brand pulls every access at once — and the log is still there to read afterward.
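The six steps above can be sketched as one small state machine. Everything here is illustrative: the `Install` class, the rule labels, and the action names are assumptions made for the example, and the actual execution of an action is reduced to recording its outcome.

```python
import random
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Install:
    scope: set[str]      # accounts the agent may touch
    routine: set[str]    # action kinds it may carry out on its own
    log: list[dict] = field(default_factory=list)
    revoked: bool = False

    def handle(self, kind: str, account: str, approved: bool = False) -> str:
        """Run one agent action through the gate and return its outcome."""
        if self.revoked or account not in self.scope:
            outcome, rule = "blocked", "out-of-scope-or-revoked"
        elif kind in self.routine:
            outcome, rule = "executed", "routine"
        elif approved:
            outcome, rule = "executed", "human-approved"
        else:
            outcome, rule = "held", "awaiting-approval"
        # Logged either way: executed, held, or blocked.
        self.log.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "what": kind, "account": account,
            "rule": rule, "outcome": outcome,
        })
        return outcome

    def sample_for_review(self, k: int) -> list[dict]:
        """Pick up to k executed actions for a person to read."""
        shipped = [e for e in self.log if e["outcome"] == "executed"]
        return random.sample(shipped, min(k, len(shipped)))

    def revoke_all(self) -> None:
        # The one command: nothing further runs, and the log stays readable.
        self.scope.clear()
        self.revoked = True


install = Install(scope={"ads:brand-main"}, routine={"pause_ad_set"})
first = install.handle("pause_ad_set", "ads:brand-main")                  # routine, in scope
second = install.handle("shift_budget", "ads:brand-main")                 # consequential
third = install.handle("shift_budget", "ads:brand-main", approved=True)   # released by a person
install.revoke_all()
fourth = install.handle("pause_ad_set", "ads:brand-main")                 # after the switch
```

In this run the routine action executes, the budget shift is held until a person releases it, and nothing runs after the revoke, while all four attempts remain in the log.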
The closing edge. None of this slows the routine work; the agent still acts on its own across everything inside its scope. The six steps bear weight only on the moves a brand would want to point to later — the few decisions expensive enough to be worth a person's name on them. Most of the work runs at machine speed, and the small set that matters stays legible and reversible.
Where it breaks. The weak link is the audit log nobody reads. A log that's written but never opened gives you the paperwork of accountability without the thing itself — you can reconstruct what went wrong only after it already did. The fix is to make reading it someone's standing job, the same discipline that keeps the rest of the install honest: the record only counts if a person actually looks.
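One hedged way to make "someone's standing job" checkable is to record when the log was last read and flag it once that gets too old. The seven-day interval and the `review_overdue` helper are hypothetical; the right cadence depends on how much the agent does.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical cadence; set it to match the volume of agent activity.
MAX_REVIEW_GAP = timedelta(days=7)


def review_overdue(last_reviewed: datetime, now: datetime) -> bool:
    """Return True when nobody has read the log within the agreed interval."""
    return now - last_reviewed > MAX_REVIEW_GAP


now = datetime(2025, 1, 15, tzinfo=timezone.utc)
fresh = review_overdue(datetime(2025, 1, 10, tzinfo=timezone.utc), now)
stale = review_overdue(datetime(2025, 1, 1, tzinfo=timezone.utc), now)
```

Here the log read five days ago is not overdue, and the one last read fourteen days ago is.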
Install note. This is the accountability layer of the per-client install — the same shape Codified Engagement ships for every brand. Everything else is the work; this is what makes it safe to let the work run.
An agent that can act is only ever as safe as what you can see it doing and how fast you can stop it. The autonomy pitch sells you the acting and stays quiet about the rest. Install the seeing and the stopping first — on day one, before the first task — and letting agents act stops being a leap of faith and becomes a decision a brand can stand behind.