Microsoft’s next big thing for the cloud: an agent that keeps its cool when everything falls apart

Microsoft is promising relief to engineers who get woken up at 3 a.m. for outages and other cloud glitches: an agent informed by its years of experience running Azure, designed to diagnose whatever’s going wrong and recommend potential fixes.
One big benefit over humans: the agent can operate without the stress, fatigue, or tunnel vision that often hampers people troubleshooting an outage on little sleep.
“Agents are a little bit less emotionally attached,” said Brendan Burns, a Microsoft technical fellow and corporate vice president who was one of the creators of Kubernetes. He pointed out that agents don’t feel the pressure when a manager asks for a rapid root-cause analysis.
The Azure Copilot Observability Agent, in preview since late last year, was made generally available Tuesday. It investigates incidents by connecting the logs, metrics, traces and other signals scattered across a company’s systems, then points engineers toward the likely cause.
At this point, the agent does not fix problems on its own. Microsoft also introduced what it calls autonomous operations, in preview, letting the agent triage and investigate alerts without a person prompting it. But it still stops short of acting. It won’t restart a resource or change a configuration, for example, instead leaving it to humans to decide and execute.
Microsoft is joining a crowded field. Datadog made its Bits AI SRE agent generally available in December, and Amazon’s AWS followed with a comparable DevOps Agent this spring. Microsoft said the agent is priced based on usage rather than a flat per-seat license, which is the same model AWS uses for its DevOps Agent.
Established observability players including Dynatrace, Splunk, New Relic and Grafana are moving quickly in the same direction, alongside a wave of AI-focused startups.
In an interview with GeekWire this week, Burns said one of Microsoft’s advantages is its breadth: the company sees more of a customer’s software than rivals do, from GitHub to Azure deployments to the signals those systems generate. Knowing how those pieces connect, he said, helps the agent trace a problem back to the line of code behind it.
More than a decade ago, Burns and his then-Google colleagues Joe Beda and Craig McLuckie created Kubernetes, the open-source software that lets companies run applications across large, constantly changing infrastructure. It became foundational to cloud computing, and added to the complexity teams now have to manage.
Kubernetes brought a kind of self-repair to that world: when something breaks, it works automatically to restore the system to a healthy state. But it follows fixed rules, Burns said. It’s “very deterministic” — it “can’t make hypotheses, it can’t investigate solutions.”
AI tools like the Azure observability agent are meant to add that missing layer: forming a theory about what went wrong, testing it against the data, and continuing to work to find a solution.
Full autonomy, in which the agent acts rather than just investigates, is still down the road. In a blog post Tuesday, Burns framed the launch as part of a broader shift toward “agentic operations,” in which agents reason across signals and will someday be able to act on them.
For now, the agent can do a lot of the digging, even if a human still makes the call.
Burns, who recalled once pulling a 36-hour on-call shift, said he can think of “a lot of late nights that would have been a lot nicer if I’d had this 10 years ago.”
