My Thought Garden

The Reversibility Test: Grant AI Autonomy by Undo, Not by IQ

We decide how much freedom to give an AI system by asking how good it is.

How accurate is the model? How well did it demo? How impressive was the benchmark? Then, satisfied it’s smart enough, we wire it into the systems that matter and let it act.

That’s the wrong axis. Accuracy tells you how often the system is right. It tells you nothing about the cost of the day it’s wrong — and in production, everything eventually has that day.

I spent 28 years in network security. The lesson that outlasted every technology I touched is this: you don’t get to choose whether things fail. You only get to choose how far the failure travels and whether you can walk it back. The first half of that is blast radius. This is the second half.

The question is not “is it smart?” It’s “can I undo it?”

Here’s the reframe I give the leaders I work with. Before you let an AI system take any action on its own, ask two things about that specific action:

Can we undo it? And if we can, how fast?

That’s the Reversibility Test. Two questions, applied per action, not per system. And it inverts how most teams hand out autonomy.

Because a dumb action that’s instantly reversible is safe to automate all day long. A brilliant action that can’t be undone is exactly the one that needs a human in front of it. Intelligence is not the thing that should earn an agent the right to act alone. Reversibility is.

We knew this before AI. It’s why databases have transactions you can roll back. It’s why every competent change request ships with a rollback plan before it ships the change. It’s why the irreversible commands — wipe the array, push to prod, fire the missile — get the two-person rule. We never granted authority on the basis of how confident the operator felt. We granted it on the basis of what happened if they were wrong.

Agents need the same discipline, and almost nobody is applying it.

The Reversibility Ladder

“Can we undo it” isn’t a yes or no. It’s a ladder. Every action an agent can take sits on one of four rungs, and the rung — not the model’s accuracy — should decide how much autonomy it gets.

Tier 0 — Reversible. Undo is instant and free. Drafting a reply, suggesting a tag, proposing a change. If it’s wrong, you delete it and move on. Let the agent run autonomously. This is where the productivity actually lives, and most teams under-automate it because they’re scared of the tiers above.

Tier 1 — Recoverable. Undo exists but it costs time or effort. A config change with a rollback path, a database write you have a backup for. Allow it autonomously — but only if the rollback is built, tested, and fast. An undo you’ve never rehearsed is not an undo. It’s a hope.

Tier 2 — Compensable. You can’t undo it, but you can offset it. You can’t un-charge a card, but you can refund it. You can’t un-send a wrong answer, but you can issue a correction. Allow with a compensating control and a human notified — someone has to know the offset is needed, or it never happens.

Tier 3 — Irreversible. No undo, no offset. Money wired to an external account. Data deleted with no backup. An email sent to a customer. A public statement posted. A production resource destroyed. A human approves, every time, no exceptions. This is the rung where “the model is usually right” stops being a defense and starts being the epitaph.

The work is simple to describe and uncomfortable to do: take every action your agent can take, and put each one on a rung. The discomfort is the point. Most teams have never made that list. They deployed the capability and assumed the accuracy would hold.
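Here is one way the ladder could look once it is written down as policy. Treat it as a minimal sketch, not a reference implementation: the names (Tier, Action, decide) and the three outcome strings are hypothetical, and sending a Tier 1 or Tier 2 action to a human when its precondition is missing is my assumption about the safest fallback.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    REVERSIBLE = 0    # undo is instant and free
    RECOVERABLE = 1   # undo exists but costs time or effort
    COMPENSABLE = 2   # cannot undo, but can offset
    IRREVERSIBLE = 3  # no undo, no offset


@dataclass(frozen=True)
class Action:
    name: str
    tier: Tier
    rollback_tested: bool = False       # Tier 1: has the undo been built and rehearsed?
    compensating_control: bool = False  # Tier 2: is there a refund or correction path?


def decide(action: Action) -> str:
    """Return 'auto', 'auto_notify', or 'human_approval' for a single action."""
    if action.tier is Tier.REVERSIBLE:
        return "auto"
    if action.tier is Tier.RECOVERABLE:
        # An unrehearsed undo is a hope, so fall back to a human (assumption).
        return "auto" if action.rollback_tested else "human_approval"
    if action.tier is Tier.COMPENSABLE:
        # Runs alone only with an offset in place, and a human is notified.
        return "auto_notify" if action.compensating_control else "human_approval"
    # Tier 3: a human approves, every time, whatever the flags say.
    return "human_approval"


for a in [
    Action("draft_reply", Tier.REVERSIBLE),
    Action("config_change", Tier.RECOVERABLE, rollback_tested=True),
    Action("charge_card", Tier.COMPENSABLE, compensating_control=True),
    Action("wire_funds", Tier.IRREVERSIBLE),
]:
    print(a.name, "->", decide(a))
```

Notice what is missing from the function signature: there is no accuracy parameter. The decision depends only on the rung and on whether the undo or offset actually exists.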

Why this pairs with blast radius

If you’ve seen my Blast Radius Test, reversibility is one of its four questions — Reach, Authority, Reversibility, Detection. I’m pulling it out and going deep on it here for a reason: it’s the most actionable of the four. You rarely get to shrink an agent’s reach without gutting its usefulness. But you can almost always gate it by reversibility without touching what it’s good at.

Run them together and you get the grid that actually matters:

Blast radius asks how far the damage spreads. Reversibility asks whether I can pull it back. An action that’s wide-reaching and irreversible is the one that should never run without a human, and it’s the one teams wave through because the demo was clean. An action that’s narrow and reversible is free to automate aggressively. Most governance effort is spent in the wrong corners of that grid.
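The two corners named above can be written as a tiny lookup. This is a sketch under assumptions: the function name grid_cell and its return labels are hypothetical, and the two corners the text does not assign are deliberately left as "unspecified" rather than guessed.

```python
def grid_cell(wide_reach: bool, reversible: bool) -> str:
    """Place one action on the blast-radius x reversibility grid."""
    if wide_reach and not reversible:
        return "human_required"  # wide and irreversible: never runs alone
    if not wide_reach and reversible:
        return "automate"        # narrow and reversible: automate aggressively
    # Wide + reversible and narrow + irreversible are not assigned in the text.
    return "unspecified"


print(grid_cell(wide_reach=True, reversible=False))
print(grid_cell(wide_reach=False, reversible=True))
```

Sorting your real action list through something like this shows quickly which corner your review hours are landing in.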

What this changes for the person signing off

If you’re accountable for an AI deployment, you don’t need to understand the model’s architecture to govern it. You need one artifact: the list of actions the agent can take, each one assigned a reversibility tier, with Tier 3 explicitly gated behind a human.

If your team can’t produce that list, that is the finding. It means autonomy is being granted by vibe — by how good the thing seems — instead of by what it costs when it’s wrong.
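That artifact is small enough to check mechanically. The sketch below assumes a hypothetical inventory format (a dict of action names, each with an optional "tier" from 0 to 3 and an optional "human_gate" flag); the field names and the audit function are illustrations, not a standard.

```python
def audit(inventory: dict[str, dict]) -> list[str]:
    """List every action that has no tier, or is Tier 3 without a human gate."""
    findings = []
    for name, entry in inventory.items():
        tier = entry.get("tier")
        if tier not in (0, 1, 2, 3):
            findings.append(f"{name}: no reversibility tier assigned")
        elif tier == 3 and not entry.get("human_gate", False):
            findings.append(f"{name}: Tier 3 action without a human gate")
    return findings


# Hypothetical inventory, for illustration only.
inventory = {
    "draft_reply": {"tier": 0},
    "wire_funds": {"tier": 3, "human_gate": False},
    "rotate_config": {},
}

for finding in audit(inventory):
    print(finding)
```

An empty result does not prove the deployment is safe. It only proves the list exists and the Tier 3 gates are declared, which is the minimum the person signing off should ask to see.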

So before your next agent goes live, the question isn’t “how accurate is it.” It’s: for everything this agent can do on its own — can we undo it, and how fast?

Stop granting autonomy by IQ. Grant it by undo. That’s the version that survives operational reality.

#AISecurity #Frameworks #Agentic #Governance #Authority