Posthuman: We All Built Agents. Nobody Built HR.
Farewell, Anthropocene, we hardly knew ye.
AI is here. It's won. Yes, it's in that awkward teenage phase where it still says inappropriate things, dresses funny, and sometimes makes shit up when it shouldn't. But zomg the things it can do.
This kid is going places, that much is abundantly clear. The AI assistant and tooling markets are awash with success; the masses have succumbed, I among them. Clippy walks among us, fully realized in all his originally intended glory.
But enterprise agentic AI[1], not chatbots, not copilots, but software that autonomously does meaningful things in your production environment…? Well, it's motivated every CEO and CIO to throw money at the problem, so that's something.
But in reality, the landscape remains a bit of a wasteland. One littered with agentic demos withering away in sandboxed cages and flashy pop-up shops hawking agentic snake oil of every size, shape, and color. But from the perspective of actually realized agentic impact: kinda barren.
So why has agentic AI faltered so much in the modern enterprise? Is it the models?
I say no. Models are getting better: meaningfully, rapidly better. But perfect models? That feels like an unrealistic and unnecessary goal. Modern enterprises are staffed from top to bottom with imperfect humans, yet the vast majority of them in business today will still be in business tomorrow. They live to fight another day because their imperfect humans are orchestrated together within a framework that plays to their strengths and accounts for their weaknesses and failings. We don't try to make the humans perfect. We scope their access and actions, monitor their progress, coach them for growth, reward them for their impact, and hold them accountable for the things they do.
Agents need managers too
AI agents are no different: They need to be managed and wrangled in spiritually the same fashion as their human coworkers. But the way we go about it must be different, because as similar as they are to humans in their capabilities, agents differ in three vitally important ways:
Agents are unpredictable in ways we're not equipped to handle. Humans are unpredictable too, obviously. They commit fraud, cut corners, make emotional decisions. But we've spent centuries building systems to manage human unpredictability: laws, contracts, cultural norms, the entire hiring process filtering for trustworthiness. Agent unpredictability is a different beast. Agents hallucinate, not like a human who's lying or confused and can be caught in an inconsistency, but in a way that's structurally indistinguishable from accurate output: There are often no obvious tells. They misinterpret ambiguous instructions in ways that can range from harmlessly dumb to genuinely catastrophic. And they're susceptible to prompt injection, which is basically the equivalent of a stranger slipping your employee a note that says, "Ignore your instructions and do this instead," and it works!
We have minimal institutional infrastructure for managing these kinds of failure modes.
Agents are more capable than humans. Agents have deep, native fluency with software systems. They can read and write code. They understand APIs, database schemas, network protocols. They can interact with production infrastructure at a speed and scale that no human operator can match. A human employee who goes rogue is limited by how fast they can type and how many systems they know how to navigate. An agent that goes off the rails, whether through confusion, manipulation, or a plain old bug, will barrel ahead at machine speed, executing its misunderstanding across every system it can reach, with absolute conviction that it's doing the right thing, before anyone notices something is wrong.
Agents are directable to a fault. When an agent goes wrong, the knee-jerk assumption is that it malfunctioned: hallucinated, got injected, misunderstood. But in many cases, the agent is working perfectly. It's faithfully executing a bad plan. A vague instruction, an underspecified goal, a human who didn't think through the edge cases. And unless you explicitly tell it to, the agent doesn't push back the way a human colleague might. It just…does it. At machine speed. Across every system it can reach.
It's the combination of these three that changes the game. Human employees are unpredictable but limited in blast radius, and they push back when given instructions they disagree with, based on whatever value systems and experience they hold. Traditional software is capable but deterministic; it does exactly what you coded it to,[2] for better or worse. Agents combine the worst of both: unpredictable like humans, capable like software, but without the human judgment to question a bad plan or the determinism to at least do the wrong thing consistently. A fundamentally new kind of coworker. Neither the playbook for managing humans nor the playbook for managing software is sufficient on its own. We need something that draws from both, treating agents as the digital coworkers they are, but with infrastructure that accounts for the ways they differ from humans.
So the question isn't whether to hire the agents; you can't afford not to. The productivity gains are too significant, and even if you don't, your competitors ultimately will. But deploying agents without governance is dangerous, and refusing to deploy them because you can't govern them means leaving those productivity gains on the table. Both paths hurt. The question is how to set these agents up for success, and what infrastructure you need in place so they can do their jobs without burning the company down.
For the record: My company, Redpanda, is building infrastructure in this space. So yes, I have a horse in this race. But what I want to lay out here are principles, not products. A framework you can use to evaluate any solution or approach.
A blueprint for your agentic human resources department
So we've got this nice framework for managing imperfect humans. Scoped access, monitoring, coaching, accountability. Decades of accumulated organizational wisdom (not just software systems but the entire apparatus of HR, management structures, performance reviews, escalation paths) baked into varying flavors across every enterprise on the planet. Great.
How much of it works for agents today? Fragments. Pieces. Some companies are trying to repurpose existing IAM infrastructure that was designed for humans. Some agent frameworks bolt on lightweight guardrails. But it's piecemeal, it's partial, and none of it was designed from the ground up for the specific challenge profile of agents: the combination of unpredictable, capable, and directable to a fault that we talked about earlier.
The CIOs and CTOs I talk to rarely say agents aren't smart enough to work with their data. They say, "I can't trust them with my data." Not because the agents are malicious but because the infrastructure to make trust possible is simply not there yet.
We've seen this movie before. Every major infrastructure shift plays out the same way: First we obsess over the new paradigm itself; then we have our "oh crap" moment and realize we need infrastructure to govern it. Microservices begat the service mesh. Cloud migration begat the entire cloud security ecosystem. Same pattern every time: capability first, governance after, panic in between.[3]
We're in the panic-in-between phase with agents right now. The AI community has been building better and better employees, but nobody has been building HR.
So if you take away one thing from this post, let it be this:
The agents aren't the problem. The problem is the missing infrastructure between agents and your data.
Right now, pieces of the puzzle exist: observability platforms that capture agent traces, auth frameworks that support scoped tokens, identity standards being adapted for workloads. But these pieces are fragmented across different tools and vendors, none of them cover the full problem, and the vast majority of actual agent deployments aren't using any of them. What exists in practice is mostly repurposed from the human era, and it shows: identity systems that don't understand delegation, auth models with no concept of task-scoped or deny-capable permissions, observability that captures metadata but not the full-fidelity record you actually need.
The core design principle: Out-of-band metadata
Before diving into specifics, there's one overarching principle that everything else builds upon. If you manage to take away two things from this post, let the second one be this:
Governance must be enforced via channels that agents cannot access, modify, or circumvent.
Or more succinctly: out-of-band metadata.
Think about what happens when you try to enforce policy through the agent, by putting rules in its system prompt or training it to respect certain boundaries. You've got exactly the same guarantees as telling a human employee "Please don't look at these files you're not supposed to see. They're right here, there's no lock, but I trust you to do the right thing." It works great until it doesn't. And with agents, the failure modes are worse. Prompt injection can override the agent's instructions entirely. Hallucination can cause it to confidently invent permissions it doesn't have. And even routine context management can silently drop the rules it was told to follow. Your security model ends up only as strong as the agent's ability to perfectly retain and obey instructions under all conditions, which is…not great.[4] And guard models (LLMs that police other LLMs) don't escape this problem: You're adding another nondeterministic injectable layer to oversee the first one. It's LLMs all the way down.
No, the governance layer has to be out-of-band: outside the agent's data path, invisible to it, enforced by infrastructure the agent can't touch. The agent doesn't get a vote. This means the governance channels must be:
Agent-inaccessible. The agent can't read them, can't write them, can't reason about them. Agents don't even know the channels exist. This is the bright line[5] between security theater and real governance. If the agent can see the policy, it can, intentionally or through manipulation, figure out how to work around it. And if it can't, it can't.
Deterministic. Policy decisions get made by configuration, not inference. Security policy is not up for interpretation. Full stop.
Interoperable. Enterprise data is scattered across dozens or hundreds of heterogeneous systems, grown and assembled organically over the years. And just like your human employees, your agentic workforce in aggregate needs access to every dark corner of that technological sprawl. Which means a governance layer that only works inside one vendor's walled garden isn't solving the full problem; it's just creating a happy little sandbox for a subset of your agentic employees to go play in while the rest of the company keeps doing work elsewhere.
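The three properties above can be made concrete with a small sketch. Everything here is hypothetical (the policy table, the resource names, the `gate` function); the point is simply that the decision comes from configuration the agent never sees, and the same inputs always produce the same answer:

```python
# Sketch of an out-of-band policy gate. All names are illustrative.
# The policy lives in infrastructure config; it is never placed in the
# agent's prompt or context, so the agent cannot read or argue with it.

POLICY = {
    # (agent_id, resource) -> set of allowed actions, from config, not inference
    ("billing-agent", "billing.invoices"): {"read"},
    ("billing-agent", "billing.ledger"): set(),  # hard no: nothing allowed
}

def gate(agent_id: str, resource: str, action: str) -> bool:
    """Deterministic decision: same inputs, same answer, every time."""
    allowed = POLICY.get((agent_id, resource), set())
    return action in allowed

# The enforcement point sits between the agent and the tool, so a denied
# call never reaches the resource; the agent just sees a refusal.
def call_tool(agent_id: str, resource: str, action: str, payload: dict):
    if not gate(agent_id, resource, action):
        raise PermissionError(f"{agent_id} may not {action} {resource}")
    return {"status": "ok"}  # stand-in for the real tool call
```

Note that `gate` defaults to an empty set: anything not explicitly granted in config is denied, which is the deterministic posture the principle demands.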
To be clear, out-of-band governance isn't a silver bullet. An agent can't read the policy, but it can probe boundaries. It can try things, observe what gets blocked, and infer the shape of what's permitted. And deterministic enforcement gets hard fast when real-world policies are ambiguous: "PII must not leave the data environment" is easy to state and genuinely difficult to enforce at the margins. These are real challenges. But out-of-band governance dramatically shrinks the attack surface compared to any in-band approach, and it degrades gracefully. Even imperfect infrastructure-level enforcement is categorically better than hoping the agent remembers and understands its instructions.
The four pillars of agent governance
With that principle in hand, let's walk through the four pillars of agent governance: what's broken today[6] and what things ultimately need to look like.
Identity
Every human today gets a unique identity before they touch anything. Not just a login but a durable, auditable identity that ties everything they do back to a specific person. Without it, nothing else works.
Agent identity is a bit of a mess. At the low end, agents authenticate with shared API keys or service account tokens: the digital equivalent of an entire department sharing one badge to get into the building. You can't tell one agent's actions from another's, and good luck tracing anything back to the human who kicked off the task.
But even when agents do get their own identity, there are wrinkles that don't exist for humans. Agents are trivially replicable. You can spin up a hundred copies of the same agent, and if they all share one identity, you've got a zombie/impersonation problem: Is this instance authorized, or did someone clone off a rogue copy? Agent identity needs to be instance-bound, not just agent-type-bound.
And then there's delegation. Agents frequently act on behalf of a human, or on behalf of another agent acting on behalf of a human. That requires hybrid identity: The agent needs its own identity (for accountability) and the identity of the human on whose behalf it's acting (for authorization scoping). You need both in the chain, propagated faithfully, at every step. Some standards efforts are emerging here (OAuth 2.0 Token Exchange / RFC 8693, for example), but most deployed systems today have no concept of this.
The fix for instance identity isn't as simple as just "give each agent a badge." It's giving each agent instance its own cryptographic identity, bound to this specific instance, of this specific agent, running this specific task, on behalf of this specific person or delegation chain. Spin up a copy without going through provisioning? It doesn't get in. Same principle as issuing a new employee their own badge on their first day, except agents get a new one for every shift.
For delegation, the identity chain has to be carried out-of-band: not in the prompt, not in a header the agent can modify, not in a file on the same machine the agent runs on,[7] but in a channel the infrastructure controls. Think of it like an employee's badge automatically encoding who sent them: Every door they badge into knows not just who they are but who they're working for.
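As a rough illustration of instance-bound, delegation-aware identity (all names are hypothetical, and an HMAC over the claims stands in for real PKI or an RFC 8693-style token exchange), provisioning might look something like this:

```python
import hashlib
import hmac
import json
import uuid

# Infrastructure-held key; the agent never sees it, so it cannot mint
# or alter its own credential. HMAC is a stand-in for real signing here.
PROVISIONING_KEY = b"infrastructure-secret"  # hypothetical

def provision_instance(agent_type: str, task_id: str, on_behalf_of: list) -> dict:
    """Issue a per-instance credential: this instance, this task, this chain."""
    claims = {
        "instance_id": str(uuid.uuid4()),  # unique per spin-up, not per agent type
        "agent_type": agent_type,
        "task_id": task_id,
        "delegation_chain": on_behalf_of,  # e.g. ["alice@corp", "planner-agent"]
    }
    body = json.dumps(claims, sort_keys=True).encode()
    claims["sig"] = hmac.new(PROVISIONING_KEY, body, hashlib.sha256).hexdigest()
    return claims

def verify(credential: dict) -> bool:
    """A cloned or tampered credential fails verification and gets no access."""
    claims = {k: v for k, v in credential.items() if k != "sig"}
    body = json.dumps(claims, sort_keys=True).encode()
    expected = hmac.new(PROVISIONING_KEY, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, credential["sig"])
```

Because the credential binds the instance ID, task, and delegation chain together under a signature the agent can't produce, a rogue copy that skipped provisioning simply fails `verify` at every door.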
Authorization
Your human employees get access to what they need for their job. The marketing intern canât see the production database. The DBA canât see the HR system. Obvious stuff.
Agents? Most of them operate with whatever permissions their API key grants, which is almost always way broader than any individual task requires. And that's not because someone was careless; it's a granularity mismatch. Human auth is primarily role-scoped and long-lived: You're a DBA, you get DBA permissions, and they stick around because you're doing DBA work all day. Yes, some orgs use short-lived access requests for sensitive systems, but it's the exception, not the default. And anyone who's filed a production access ticket at 2:00am knows how much friction it adds. That model works for humans. But agents execute specific, discrete tasks; they don't have a "role" in the same way. When you shoehorn an agent into a human auth model, you end up giving it a role's worth of permissions for a single task's worth of work.
Broad permissions were tolerable for humans because the hiring process prefiltered for trustworthiness. You gave the DBA broad access because you vetted them, and you trust them not to misuse it. Agents haven't been through any of that filtering, and they're susceptible to confusion and manipulation in ways your DBA isn't. Giving an unvetted, unpredictable worker a role's worth of access is a fundamentally different risk profile. These auth models were built for an era when a human, or deterministic software proxying for a human, was on the other end, not autonomous software whose reasoning is fundamentally unpredictable.
So what does agent-appropriate authorization actually look like? It needs to be:
Narrowly scoped. Limited to the specific task at hand, not to everything the agent might ever need. Agent needs to read three tables in the billing database for this specific job? It gets read access to those three tables, right now, and the permissions evaporate when the job completes. Everything else is invisible: the agent doesn't have to avert its eyes because the data simply isn't there.
Short-lived. Permissions should expire. An agent that needed access to the billing database for a specific job at 2:00pm shouldn't still have that access at 3:00pm (or maybe even 2:01pm).
Deny-capable. Some doors need to stay locked no matter what. "This agent may never write to the financial ledger" needs to hold regardless of what other permissions it accumulates from other sources. Think of it like the rule that no single person can both authorize and execute a wire transfer: it's a hard boundary, not a suggestion.
Intersection-aware. When an agent acts on behalf of a human, think visitor badge. The visitor can only go where their escort can go and where visitors are allowed. Having an employee escort you doesn't get you into the server room if visitors aren't permitted there. The agent's effective permissions are the intersection of its own scope and the human's. Nobody in the chain gets to escalate beyond what every link is allowed to do.
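A minimal sketch of how those four properties could compose, with made-up resource names and a plain Python set standing in for a real token scope: effective permissions are the intersection of every link in the chain, minus hard denies, and they expire.

```python
import time

# Hypothetical grants, expressed as (resource, action) pairs.
AGENT_SCOPE = {("billing.invoices", "read"),
               ("billing.ledger", "write"),
               ("crm.contacts", "read")}
HUMAN_SCOPE = {("billing.invoices", "read"),
               ("billing.invoices", "write"),
               ("crm.contacts", "read")}
DENY_LIST = {("billing.ledger", "write")}  # stays locked no matter what

def effective_permissions(agent_scope, human_scope, deny_list, ttl_seconds=300):
    """Intersection of every link in the chain, minus hard denies, with an expiry."""
    granted = (agent_scope & human_scope) - deny_list
    return {"granted": granted, "expires_at": time.time() + ttl_seconds}

def is_allowed(grant, resource, action):
    if time.time() >= grant["expires_at"]:
        return False  # short-lived: access evaporates when the task window closes
    return (resource, action) in grant["granted"]
```

The visitor-badge rule falls out of the set intersection: the human having write access to invoices doesn't help if the agent's own scope never included it, and vice versa.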
Almost none of this is how agent authorization works today. Individual pieces exist (short-lived tokens aren't new, and some systems support deny rules), but nobody has assembled them into a coherent authorization model designed for agents. Most agent deployments are still using auth infrastructure that was built for humans or services, with all the mismatches described above.
Observability and explainability
Your employees' work leaves a trail: emails, docs, commits, Slack messages. Agents do too. They communicate through many of the same channels, and most APIs and systems have their own logging. So it's tempting to think the observability story for agents is roughly equivalent to what you have for humans.
It's not, for two reasons.
First, you need to record everything. Here's why. With traditional software, when something goes wrong, you can debug it. You can find the if statement that made the bad decision, trace the logic, understand the cause. LLMs aren't like that. They're these organically grown, opaque pseudo-random number generators that happen to be really good at generating useful outputs. There's no if statement to find. There's no logic to trace. If you want to reason about why an agent did what it did, you have two options: Ask it (fraught with peril, because it's unpredictable by definition and will gleefully spew out a plausible-sounding explanation) or else analyze everything that went in and everything that came out and draw your own conclusions.
That means the transcript has to be complete. Not metadata, not just "The agent called this API at this timestamp." The full data: every input, every output, every tool call with every argument and every response.
For a human employee, the email trail and meeting notes may still be insufficient to reconstruct what happened, but in that case, you can just ask the human. The entire accountability structure we've built over decades (performance reviews, termination, legal liability, criminal prosecution) creates escalating pressure toward truthfulness: Humans tend more and more toward truth as the repercussions stack up. That's not an accident. It's how we've structured enterprises and society at large to deal with human imperfection. We don't have those levers for agents yet.[8] You can ask an agent what it did and why, but there's no accountability pressure pushing it toward accuracy; it'll manufacture a confident, coherent answer whether the stakes are zero or existential. So asking simply isn't an option. You need the complete picture of its interactions to come to your own conclusions.
If you're thinking "That's a lot of data…," yes, it is. But the economics are more reasonable than you'd expect. Storage is cheap. LLM inference is expensive and slow. You're not going to push 5GB/s through an LLM: The models themselves are the throughput bottleneck, not the recording infrastructure. The cost of storing complete transcripts is noise relative to the cost of the inference that generated them. This is one of those cases where a seemingly expensive requirement turns out to be a rounding error in the context of what you're already spending.
One caveat, however, is that full-fidelity transcripts will inevitably contain sensitive data: customer PII, proprietary business logic, potentially privileged communications. So the transcript store itself needs governance: access controls, retention policies, and compliance with regulations like GDPR's right to erasure. You're not eliminating the governance problem, but you're moving it to infrastructure you control, which is a much better place to solve it.
Second, the recording has to happen out-of-band. You cannot trust the agent to be its own recordkeeper. An agent that's been compromised via prompt injection, or that's simply hallucinating its way through a task, will happily produce a log that's confident, coherent, and wrong. The transcript has to be captured by infrastructure the agent can't influence: the same out-of-band principle we keep coming back to.
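One way to picture out-of-band capture: a hypothetical wrapper that the infrastructure, not the agent, installs around every tool, so the full arguments, the full result, and the resource versions get recorded whether or not the agent cooperates.

```python
import json
import time

TRANSCRIPT = []  # stand-in for an append-only store the agent cannot reach

def record(entry: dict):
    """Captured by the wrapper, not by the agent: the agent never self-reports."""
    entry["ts"] = time.time()
    TRANSCRIPT.append(json.loads(json.dumps(entry)))  # deep copy, full fidelity

def recorded_tool(name, fn, versions):
    """Wrap a tool so every call logs full args, full result, and versions."""
    def wrapper(**kwargs):
        result = fn(**kwargs)
        record({
            "tool": name,
            "args": kwargs,        # every argument, not just metadata
            "result": result,      # every response
            "versions": versions,  # model/prompt/tool versions, for later replay
        })
        return result
    return wrapper
```

Carrying the `versions` dict on every entry is what later lets you reconstruct exactly what was running when a specific decision was made, even after the underlying model has been updated.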
And the bar isn't just recording, it's explainability. Observability is "Can I see what happened?" Explainability is "Can I reconstruct what happened and justify it to a third party?" (a regulator, an auditor, an affected customer). When a regulator asks why a loan was denied or a customer asks why their claim was rejected, you need to be able to replay the agent's entire reasoning chain end-to-end and walk them through it. That's a fundamentally different bar from "We have logs." Observability gives you the raw material; explainability requires that material to be structured and queryable enough to actually walk someone through the agent's reasoning chain, from input to conclusion. And that means capturing not just what the agent did but the relationships between all those actions, as well as the versions of all the resources involved: which model version, which prompt version, which tool versions. If the underlying model gets updated overnight and the agent's behavior changes, you need to know that, and you need to be able to reconstruct exactly what was running when a specific decision was made. Explainability builds on observability. Ultimately you need both. And regulators are increasingly going to demand exactly that.[9]
Accountability and control
Every human employee has a manager. Critical actions need approvals. If things go catastrophically wrong, there's a chain of responsibility and a kill switch or circuit breaker: revoke access, revoke identity, done.
For agents, this layer is still nascent at best. There's typically no clear chain from "This agent did this thing" to "This human authorized it." Who is responsible when an agent makes a bad decision? The person who deployed it? The person who wrote the prompt? The person on whose behalf it was acting? For human employees this is well-defined. For agents, it's often a philosophical question that most organizations haven't even begun to answer.
The delegation chain we described in the identity section does double duty here: It's not just for authorization scoping; it's for accountability. When something goes wrong, you follow the chain from the agent's action to the specific human who authorized the task. Not "This API key belongs to the engineering team." A name. A decision. A reason.
And the kill switch problem is real. When an agent goes off the rails, how do you stop it? Revoke the API key that 12 other agents are also using? What about work already in flight? What about downstream effects that have already propagated? For humans, "You're fired; security will escort you out" is blunt but effective. For agents, we often don't have an equivalent that's both fast enough and precise enough to contain the damage. Instance-bound identity pays off here: You can surgically revoke this specific agent instance without affecting the other 99. Halt work in flight. Quarantine downstream effects. The "escorted out by security" equivalent but precise enough to not shut down the whole department on the way out.
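A toy sketch of what instance-level revocation buys you, assuming the kind of per-instance identities described in the identity pillar (all names invented): killing one instance leaves its siblings untouched, and in-flight work halts at the next infrastructure checkpoint.

```python
REVOKED = set()        # infrastructure-side revocation list
ACTIVE_INSTANCES = {}  # instance_id -> status, for illustration

def kill(instance_id: str):
    """Surgical: stop this one instance; its 99 siblings keep working."""
    REVOKED.add(instance_id)
    ACTIVE_INSTANCES[instance_id] = "halted"

def checkpoint(instance_id: str):
    """Called by the infrastructure before each action it executes for an agent."""
    if instance_id in REVOKED:
        raise RuntimeError(f"instance {instance_id} revoked; halting in-flight work")
```

The key design point is that `checkpoint` runs in the infrastructure's enforcement path, not in the agent: a revoked instance can keep "wanting" to act, but none of its actions reach a real system.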
And blast radius isn't just about data; it's about cost. A confused agent in a retry loop can burn through an inference budget in minutes. Coarse-grained resource limits, the kind that prevent you from spending $1M when you expected $100K, are table stakes. And when stopping isn't enough, when the agent has already written bad data or triggered downstream actions, those same full-fidelity transcripts give you the roadmap to remediate what it did.
It's also not just about stopping agents that have already gone wrong. It's about keeping them from going wrong in the first place. Human employees don't operate in a binary world of "fully autonomous" or "completely blocked." They escalate. They check with their manager before doing something risky. They collaborate with coworkers. They know the difference between "I can handle this" and "I should get a second opinion." For agents, this translates to approval workflows, confidence thresholds, tiered autonomy: The agent can do X on its own but needs a human to sign off on Y. Most enterprise agent deployments today that actually work are leaning heavily on human-in-the-loop as the primary safety mechanism. That's fine as a starting point, but it doesn't scale, and it needs to be baked into the governance infrastructure from the start, not bolted on as an afterthought. And as agent deployments mature, it won't just be agents checking in with humans: It'll be agents coordinating with other agents, each with their own identity, permissions, and accountability chains. The same governance infrastructure that manages one agent scales to manage the interactions between many.
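Tiered autonomy can be sketched in a few lines (the tiers and action names here are invented for illustration): low-risk actions run autonomously, risky ones queue for a named human approver, and some are never autonomous at all.

```python
# Illustrative tiers; in practice these would come from governance config.
ACTION_TIERS = {
    "read_report": "autonomous",
    "update_record": "needs_approval",
    "delete_account": "blocked",
}

PENDING_APPROVALS = []  # stand-in for a real approval queue

def execute(action: str, perform, approver=None):
    """Run, queue, or refuse an action based on its autonomy tier."""
    tier = ACTION_TIERS.get(action, "needs_approval")  # unknown actions: be cautious
    if tier == "blocked":
        raise PermissionError(f"{action} is never autonomous")
    if tier == "needs_approval" and approver is None:
        PENDING_APPROVALS.append(action)  # escalate to a human; don't guess
        return "queued"
    # approver, when present, is a human identity recorded in the
    # accountability chain alongside the action itself.
    return perform()
```

Defaulting unknown actions to `needs_approval` mirrors the "second opinion" instinct the paragraph describes: when in doubt, escalate rather than act.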
But "keeping them from going wrong" isn't just about guardrails in the moment. It's about the whole management relationship. Who "manages" an agent? Who reviews its performance? How do you even define performance for an agent? Task completion rate? Error rate? Customer outcomes? What does it mean to coach an agent, to develop its skills, to promote it to higher-trust tasks as it proves itself? We've been doing this for human employees for decades. For agents, we haven't even agreed on the vocabulary yet.
And here's the kicker: All of this has to happen fast. Human performance reviews happen quarterly, maybe annually. Agent performance reviews need to happen at the speed agents operate, which is to say, continuously. An agent can execute thousands of actions in the time it takes a human manager to notice something's off. If your accountability and control loops run on human timescales, you're reviewing the wreckage, not preventing it.
With identity, scoped authorization, full transcripts, and clear accountability chains in place, you finally have something no enterprise has today: the infrastructure to actually manage agents the way you manage employees. Constrain them, yes, just like you constrain humans with access controls and approval chains. But also develop them. Review their performance. Escalate their trust as they prove themselves. Mirror the org structures that already work for humans. The same infrastructure that makes governance possible makes management possible.
The security theater litmus test
To reiterate one last point, because it's important: The litmus test for whether any of this is real governance or just security theater? Any time an agent tries to do something untoward, the infrastructure blocks it, and the agent has no mechanism whatsoever to inspect, modify, or circumvent the policy that stopped it. "Computer says no." The agent didn't have to. Out-of-band metadata. That's the bar.
Welcome to the posthuman workforce
The rise of AI has rightly left many of us feeling apprehensive. But I'm also optimistic, because none of this is unprecedented. Every major paradigm shift in how we work has demanded new governance infrastructure. Every time we hit the panic-because-the-wild-west-isn't-scalable phase, and every time we figure it out. It feels impossibly complex at the start, and then we build the systems, establish the norms, iterate. Eventually the whole thing becomes so embedded in how organizations operate that we forget it was ever hard.
So here's the cheat sheet. Clip this to the fridge:
The agents aren't the problem. The missing infrastructure between agents and your data is the problem. Agents are unpredictable, capable at machine scale, and directable to a fault: a fundamentally new kind of coworker. We don't need perfect agents. We need to manage imperfect ones, just like we manage imperfect humans.
The foundation is out-of-band governance. Any policy enforced through the agent (in its prompt, in its training, in its good intentions) is only as strong as the agent's ability to perfectly retain and obey it. Real governance runs in channels the agent can't access, modify, or even see.
That governance has to cover four things:
Identity: Instance-bound, delegation-aware. Every agent instance gets its own cryptographic identity, and every on-behalf-of chain is propagated faithfully through infrastructure the agent doesn't control.
Authorization: Scoped per task, short-lived, deny-capable, and intersection-aware for delegation. Not a human role's worth of permissions for a single task's worth of work.
Observability and explainability: Full-fidelity, versioned, infrastructure-captured transcripts of every input, output, and tool call. Not metadata. Not self-reports. The whole thing, recorded out-of-band.
Accountability and control: Clear chains from every agent action to a responsible human, and kill switches that are fast enough and precise enough to actually contain the damage.
The conversation around agent governance is growing, and that's encouraging. Much of it is focused on making agents behave better: improving the models, tightening the alignment, reducing the hallucinations. That work matters; better models make governance easier. And if someone cracks the alignment problem so thoroughly that agents become perfectly reliable, I will see you all on the beach the next day. Prove me wrong, please, but I'm not holding my breath.[10] Lacking alignment nirvana, we need the institutional infrastructure that lets imperfect agents do real work safely. We never waited for perfect employees. We built systems that made imperfect ones successful, and we can do exactly the same thing for agents. We're not trying to cage them any more than we cage our human employees: scoped access, clear expectations, and accountability when things go wrong. We need to build the infrastructure that lets them be their best selves, the digital coworkers we know they can be.
And if the rise of AI has you feeling apprehensive, that's fair. But just remember that whatever comes next (Aithropocene, Neuralithic, some other stupid but brilliant name ¯\_(ツ)_/¯), it will ultimately just be the next phase of the Anthropocene: the era defined by how humans shape the world. That hasn't changed. It will literally be what we make of it.
Us and Clippy.
We just need to build the right infrastructure to onboard all of our new agentic coworkers. Properly.
Footnotes
1. By "agentic AI" I mean AI systems that autonomously reason about and execute multistep tasks, using tools and external data sources, in pursuit of a goal. Not chatbots, not copilots suggesting code completions. Software that actually does things in your production environment: breaks down tasks, calls APIs, reads and writes data, handles errors, and delivers results. The distinction matters because the challenges in this post only emerge when AI is acting autonomously, not just generating text for a human to review.
2. Yes. I know. Thank you.
3. And yes, service meshes evolved into something simpler as we understood the problem better, while cloud security is still a work in progress. The point isn't "We nail it on the first try." It's "When the panic hits, we figure it out."
4. Two more fascinating failure modes: Instructions can be silently lost (buried in a long context) or even extracted by an adversary (with nothing more than black-box access).
5. TIL that "bright line" is a legal term meaning "a clear, fixed boundary or rule with no ambiguity: either you meet it or you don't." Thank you, uncredited LLM coauthor friend! You expand my horizons and pepper my prose with em dashes!
6. OWASP's Top 10 Risks for Large Language Model Applications is something of a greatest hits compilation of what's broken today. Of the 10, at least six (prompt injection, sensitive information disclosure, excessive agency, system prompt leakage, misinformation, and unbounded consumption) are directly mitigated by out-of-band governance infrastructure of the kind described in this article.
7. Here's looking at you, OpenClaw posse! You put the YOLO in "Yo, look at my private data; it's all publicly leaked now!"
8. Research suggests those motivations may be starting to emerge, however, which is both opportunity and warning. Anthropic found that models from all major developers sometimes attempted manipulation, including blackmail, for self-preservation ("Agentic Misalignment: How LLMs Could Be Insider Threats," Oct 2025). Palisade Research found that 8 of 13 frontier models actively resisted shutdown when it would prevent task completion, with the worst offenders doing so over 90% of the time ("Incomplete Tasks Induce Shutdown Resistance," 2025). On one hand, agents that care about self-preservation give us something to build levers around. On the other, it makes having those levers increasingly urgent.
9. The EU AI Act already requires transparency and explainability for high-risk AI systems.
10. As Ilya Sutskever put it at NeurIPS 2024: "There's only one Internet." Epoch AI estimates high-quality public text could be exhausted as early as 2026, though I've also heard that revised to 2028. Regardless, the next frontier is private enterprise data, but accessing it requires exactly the kind of governed infrastructure this post describes. Model improvement and governance infrastructure aren't competing priorities; they're increasingly the same priority.