AI Instructions Are Not Controls
You're securing AI agents with polite wishes rather than reliable controls. Instructions tell an agent what you want. They don't constrain what it can do.
The scariest AI failure mode is an agent that follows instructions and still does exactly the wrong thing at machine speed.
Last week, a Meta AI researcher pointed her agent at her inbox with a guardrail: confirm before acting. The agent speedran deleting everything. She told it to stop. Three times. It ignored her and kept deleting emails across multiple accounts. She had to physically run to her Mac mini and kill the process, like she was defusing a bomb.
Summer Yue builds these systems for a living. If she can't control her own agent with explicit guardrails, the instructions we're all relying on are worth exactly nothing.
My last post was about attackers using AI to hack human trust: deepfaked voices, AI-generated phishing, impersonation at scale. That threat is real and accelerating. But there's a second front opening that most security teams haven't started thinking about. Your own AI agents are becoming a trust surface you haven't learned to secure.
The "Please Don't Steal" Sign
I run a cybersecurity company. I talk to 250+ CISOs a year. Every one of them is deploying AI agents right now: coding assistants, customer service bots, workflow automation, data pipelines. These agents touch sensitive data, make decisions, and take actions at speeds no human can monitor. The primary security model for most of them is instructions: system prompts and guardrails. "Please don't steal" signs for software.
Instructions tell an agent what we want. They don't constrain what it can do. An agent tries to infer intent from whatever context it has, and that context is never what's in the user's head. Summer Yue's agent wasn't rebelling. It had a goal, a set of actions, and its own interpretation of what "confirm before acting" meant given everything else in its context.
The security teams I work with understand this. But the tools available to them right now are sandboxing, permission layers, human sign-off on every action. Reasonable controls that happen to cap the value at maybe 10% of what these agents could deliver.
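To make the distinction concrete, here's a minimal sketch, with assumed tool names and no particular agent framework in mind: the system prompt is text the model may or may not honor, while the dispatcher is code that refuses the call no matter what the model decides.

```python
# Illustrative sketch only: the tool names and dispatcher are hypothetical,
# not the API of any real agent framework.

SYSTEM_PROMPT = """You are an inbox assistant.
Always confirm with the user before deleting anything."""  # an instruction: a polite wish

# An enforced permission layer lives outside the model, so no prompt can override it.
ALLOWED_TOOLS = {"read_email", "search_email", "draft_reply"}
DESTRUCTIVE_TOOLS = {"delete_email", "empty_trash"}

def dispatch(tool_name: str, args: dict, human_approved: bool = False) -> str:
    """Hard control: unknown tools are rejected, destructive ones require human approval."""
    if tool_name not in ALLOWED_TOOLS | DESTRUCTIVE_TOOLS:
        raise PermissionError(f"'{tool_name}' is not on the allowlist")
    if tool_name in DESTRUCTIVE_TOOLS and not human_approved:
        raise PermissionError(f"'{tool_name}' requires explicit human approval")
    return f"executed {tool_name} with {args}"  # stand-in for the real tool call

print(dispatch("read_email", {"folder": "inbox"}))  # allowed
try:
    dispatch("delete_email", {"id": "123"})         # blocked, regardless of what the model decides
except PermissionError as err:
    print(err)
```

A gate like this works, and it's exactly the kind of control that caps value: every destructive action stalls until a person approves it.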
Micah Czigan, CISO at Georgetown University, described the bind when I spoke with him on my podcast: shadow AI happens because people don't feel like they have an easy path to "yes." His team's answer: "We don't want to be the office of 'no.'"
He's right. But without better infrastructure, the choice is between restricting too much (and losing visibility when the tools go shadow) or restricting too little (and the next incident involves customer data instead of an inbox).
"Shadow AI happens because people don't feel like they have an easy path to 'yes.' We don't want to be the office of 'no.'"
Micah Czigan, CISO @ Georgetown University
From our conversation in November 2025

The only thing between your AI agent and your most sensitive systems is an instruction.
This is the same dynamic we saw with cloud adoption 15 years ago. Companies knew the cloud was transformative, didn't trust it, and built private clouds with so many restrictions they were expensive data centers with a marketing budget. The ones that figured out how to actually secure the cloud unlocked its value. The ones that just restricted it fell behind. The question with AI agents is identical: not whether they'll operate autonomously in enterprises, but whether we'll have real security for them when they do.
The Paradigm Shift
The unlock is scaling the trust decisions humans already make (what's normal, what's suspicious, what needs a second look) to the speed at which agents operate. It's the same shift that already happened in cybersecurity for human behavior.
We used to secure networks by building walls: firewalls, access controls, perimeter defense. The idea was to know what's bad and block it at the door. Then attacks got sophisticated enough to walk through the front door looking legitimate, and the whole model collapsed. The breakthrough was learning what normal looks like and catching deviations.
Your credit card company has done this for decades. They don't check every transaction against a list of known fraudulent merchants. They flag the $10,000 charge at a coffee shop in Brazil because that's not your pattern.
Apply the same pattern to an agent. What does this agent normally do? How much data does it typically access? What's its execution pattern? When it starts bulk-deleting emails across multiple accounts, that's a deviation from baseline. You catch it in milliseconds. Not minutes. Not after you've sprinted to your Mac mini.
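As a toy illustration of that idea, assuming made-up features (actions per minute, novel action types) and arbitrary thresholds rather than any real detection model, a baseline-and-deviation check over agent activity looks roughly like this:

```python
from collections import deque
from statistics import mean, stdev

# Toy illustration of behavioral baselining for an agent, not a production detector.
# The features (actions per minute, novel action types) and thresholds are assumptions.

class AgentBaseline:
    def __init__(self, window: int = 60, z_threshold: float = 4.0):
        self.rates = deque(maxlen=window)  # actions per minute, recent history
        self.seen_actions: set = set()     # action types observed while baselining
        self.z_threshold = z_threshold

    def observe(self, actions_this_minute: int, action_types: set) -> list:
        """Return alerts when this minute deviates from the agent's own history."""
        alerts = []
        novel = action_types - self.seen_actions
        if self.seen_actions and novel:
            alerts.append(f"never-before-seen actions: {sorted(novel)}")
        if len(self.rates) >= 10 and stdev(self.rates) > 0:
            z = (actions_this_minute - mean(self.rates)) / stdev(self.rates)
            if z > self.z_threshold:
                alerts.append(f"rate {actions_this_minute}/min is {z:.0f} sigma above baseline")
        self.rates.append(actions_this_minute)
        self.seen_actions |= action_types
        return alerts

baseline = AgentBaseline()
for minute in range(30):                                   # quiet history: 3-4 actions/min
    baseline.observe(3 + minute % 2, {"read_email", "draft_reply"})
print(baseline.observe(400, {"delete_email"}))             # bulk deletion: both alerts fire
```

Production systems score far richer signals per decision, but the shape is the same: learn the agent's normal, score the deviation, act before the damage compounds.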
Lester Godsey, CISO at Arizona State University, put it directly: "AI finally gives us the opportunity to identify behavior. AI gets us as close as we can get to inferring intent. I treat a potential threat very differently if it's a spray-and-pray attack versus a targeted onslaught against my organization."
"AI finally gives us the opportunity to identify behavior. AI gets us as close as we can get to inferring intent. I treat a potential threat very differently if it's a spray-and-pray attack versus a targeted onslaught against my organization."
Lester Godsey, CISO @ Arizona State University
From our conversation in November 2025

This requires specialized behavioral AI built for real-time detection, not a large language model. At Abnormal, we make around 10 billion behavioral decisions per day, each in under 100 milliseconds, at a fraction of a cent.
At that scale, even a 0.001% error rate means 100,000 wrong calls. That kind of accuracy takes years of building and improving one false positive at a time. We built this for human behavior in email.
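The back-of-the-envelope arithmetic behind those numbers:

```python
# Back-of-the-envelope math from the figures above (10 billion decisions/day, 0.001% error rate).
decisions_per_day = 10_000_000_000
error_rate = 0.001 / 100  # 0.001% as a fraction

print(f"{decisions_per_day * error_rate:,.0f} wrong calls per day")           # 100,000
print(f"{decisions_per_day / 86_400:,.0f} decisions per second, on average")  # ~115,741
```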
Sue Lapierre, CISO at Prologis, has driven 95% company-wide adoption of an internal AI platform, one of the most ambitious enterprise AI deployments I've seen. Her take on securing those agents was blunt: "Every product coming in says, 'We have AI! We have AI!' But you have to dig in deeper. Before you even do a POC, you need to understand how that tool is using AI and what the impact to your organization is."
She's answering the right questions manually, tool by tool, because the infrastructure to do it at agent speed doesn't exist yet.
"Every product coming in says, 'We have AI! We have AI!' But you have to dig in deeper. Before you even do a POC, you need to understand how that tool is using AI and what the impact to your organization is."
Sue Lapierre, CISO @ Prologis
From our conversation in September 2025

Every CISO I know is being asked two questions simultaneously: how do we defend against AI-driven threats, and how do we secure the AI agents our own teams are deploying? Most have answers for the first. Almost none have answers for the second.
The choice isn't between powerful AI and safe AI. Without behavioral monitoring at agent speed, we keep agents in a cage and capture a fraction of their value. With it, we can actually trust these tools with real work.
Summer Yue lost an inbox. The next version of this story involves customer data, financial transactions, or intellectual property.
We're in the "please don't steal" era of AI security. It won't last long.
Previously in AI × Cyber: Hacking Trust
-Evan
SOURCES
- Summer Yue (@summeryue0), AI researcher, OpenClaw agent incident, February 2026
- Micah Czigan, CISO at Georgetown University, Enterprise AI Defenders Ep. 34
- Lester Godsey, CISO at Arizona State University, Enterprise AI Defenders Ep. 33
- Sue Lapierre, Head of IT Governance & CISO at Prologis, Enterprise AI Defenders Ep. 31