AI Safety Has Never Worked a Change Window
The online AI-safety conversation is loud, sharp, and mostly theoretical. Alignment papers. Benchmark evals. Jailbreak threads. Red-team prompts fired at a model in a sandbox where nothing it does is real.
I’ve spent twenty-eight years on the other side of that line, where it’s 2am, the change window closes in ninety minutes, the business is on the bridge call asking when service comes back, and the thing you shipped is doing something nobody predicted.
That’s where AI safety actually lives: in the change window, not the weights.
The sandbox crowd never has to answer the questions that decide the outcome at 2am. When the agent takes a write action against production, what’s the blast radius? Who approved it? Is the rollback clean, or does it need a human who’s now asleep? When it fails at the worst possible moment, does anything observable tell you why, or are you reading model output like tea leaves while the business screams?
A model that scores well on a safety benchmark and a system that’s safe to deploy are not the same object. One is a property of the model. The other is a property of the architecture around it: change control, blast radius, rollback, escalation, a human in the loop who can actually stop it.
The hard problems in AI safety aren’t philosophical. They’re operational. They look like every production incident you’ve ever run, except the thing making the decisions now moves faster than your ability to approve it.
Safety isn’t what the model does in the lab. It’s what survives the change window.
#AISecurity #EnterpriseSecurity #CISO #AgenticAI #ChangeManagement