KubeCon + CloudNativeCon NA 2025 Recap
As expected, AI was everywhere at KubeCon + CloudNativeCon in Atlanta this year—but the real energy was focused on something less headline-grabbing and more foundational: solving everyday operational challenges. Amid the buzz about intelligent systems and futuristic workflows, practitioners remained grounded in urgent, practical work—managing tool sprawl, tackling Kubernetes complexity, and confronting the chaos of “day two” operations.
Operations Remains Human-Centered
There’s real promise in AI, especially in areas like automation and observability. But many teams are still figuring out how to integrate AI into legacy systems that are already under pressure. What stood out most was how human-centered the cloud native community remains—committed to reducing toil, improving developer experience, and building resilient platforms that work when the pager goes off at 3am.
A prime example of this grounded perspective came from Adobe’s Joseph Sandoval. In his keynote, “Maximum Acceleration: Cloud Native at the Speed of AI,” Sandoval acknowledged the dramatic potential of AI-native infrastructure—but made clear it’s not just a tooling revolution. “We’ve entered the agent economy,” he said, describing systems that can “observe, reason, and act.” But to support those workloads, we must evolve Kubernetes itself: “We’re moving from tracing requests to tracing reasoning—from metrics to meaning.” Kubernetes, he argued, has become the foundation for AI, if unintentionally, offering the flexibility and control these systems demand.
This potential is already visible in the real world: Niantic’s Pokémon GO team, for example, demonstrated how they use Kubernetes and Kubeflow to run a global machine learning–powered scheduling platform that predicts player participation and orchestrates in-game events across millions of locations. But autonomy, Sandoval cautioned, only works when it’s built on operational trust—smarter scheduling, adaptive orchestration, and rock-solid security boundaries.

This call to reinforce foundational infrastructure echoed across the event, especially in platform engineering discussions. Abby Bangser’s keynote framed platform engineering not as yet another revolution but as a response to complexity: “We build platforms to reduce the complexity and scope for those building on top, not to give them new systems to learn.” Great platforms, she argued, are judged not by glossy architecture diagrams but by how effectively they empower developers. Internal platforms become an economy of scale—bespoke to a business yet broadly enabling. And most importantly: “The only success is a more effective and happier development team.” (If you’re interested in going deeper, check out her report, Platform as a Product, coauthored with Daniel Bryant, Colin Humphreys, and Cat Morris.)
Ambitious AI Requires Practical Engineering
Throughout the conference, this emphasis on developer experience and practical operations consistently overshadowed AI hype. That context made the CNCF’s launch of Kubernetes AI Conformance feel especially timely. “As AI moves into production, teams need consistent infrastructure they can rely on,” said Chris Aniszczyk, CNCF’s CTO. The goal is to create guardrails so AI workloads behave predictably across different environments. This maturity is already visible—KServe’s graduation to incubating status is a sign that foundational work is gradually catching up to AI ambition.

Meanwhile, the hallway conversations were filled with a very real and immediate concern: the announced retirement of Ingress NGINX, which currently runs in nearly half of all Kubernetes clusters. Teams suddenly had to reckon with critical migration planning, a reminder that while we talk about building intelligent systems of the future, our operational reality is still deeply rooted in managing vital but aging components today.
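To make that migration planning concrete: the first step most teams face is simply finding out which Ingress objects depend on the retiring controller. The sketch below is illustrative only—the function names are ours, and it assumes Ingress manifests have already been fetched (e.g., via `kubectl get ingress -A -o json`) and loaded as Python dicts following the standard `networking.k8s.io/v1` schema.

```python
# Illustrative audit sketch: flag Ingress objects served by the
# retiring ingress-nginx controller. Assumes the conventional class
# name "nginx"; clusters with custom IngressClass names would need
# to extend this set.
NGINX_CLASSES = {"nginx"}
LEGACY_ANNOTATION = "kubernetes.io/ingress.class"

def uses_ingress_nginx(ingress: dict) -> bool:
    """Return True if this Ingress targets the ingress-nginx controller."""
    spec_class = ingress.get("spec", {}).get("ingressClassName")
    if spec_class in NGINX_CLASSES:
        return True
    # Older manifests set the controller via annotation rather than
    # spec.ingressClassName.
    annotations = ingress.get("metadata", {}).get("annotations", {})
    return annotations.get(LEGACY_ANNOTATION) in NGINX_CLASSES

def migration_surface(ingresses: list[dict]) -> list[str]:
    """List namespace/name of every Ingress that needs a migration plan."""
    return [
        f'{i["metadata"].get("namespace", "default")}/{i["metadata"]["name"]}'
        for i in ingresses
        if uses_ingress_nginx(i)
    ]

# Example input, two of three Ingresses on ingress-nginx:
items = [
    {"metadata": {"name": "shop", "namespace": "prod"},
     "spec": {"ingressClassName": "nginx"}},
    {"metadata": {"name": "blog", "namespace": "web",
                  "annotations": {"kubernetes.io/ingress.class": "nginx"}},
     "spec": {}},
    {"metadata": {"name": "api", "namespace": "prod"},
     "spec": {"ingressClassName": "traefik"}},
]
print(migration_surface(items))  # → ['prod/shop', 'web/blog']
```

An inventory like this is only the start—TLS configuration, controller-specific annotations, and traffic cutover all follow—but it shows why so much hallway conversation focused on scoping the blast radius first.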
There were really two converging stories being told. Platform engineering talks focused on hard-earned lessons and production-hardened architectures. Speakers from Capital One, for example, demonstrated how their internal platform, Dragon, evolved through thoughtful iteration and real-world adaptation into a scalable, resilient platform. Meanwhile, the complexities of the emerging AI space were highlighted in sessions like “Navigating the AI/ML Networking Maze in Kubernetes: Lessons from the Trenches,” which detailed how AI/ML workloads are pushing HPC networking concepts like RDMA and MPI into Kubernetes, creating a “new learning curve” and discussing the “intricacies of integrating specialized hardware.”
The real intrigue is watching these worlds collide in real time: platform engineers being asked to operationalize AI workloads they barely trust, and AI teams realizing their models require more than just compute—they still need to solve problems like traffic routing, identity, observability, and failure isolation.
The Ecosystem Continues to Mature
As the ecosystem evolves, some clear frontrunners are emerging. eBPF (especially via Cilium) has become the backbone of modern networking and observability. Gateway API has matured into a powerful next-generation alternative to Kubernetes Ingress, with broad support across modern ingress and service mesh providers. OpenTelemetry is becoming the standard for collecting signals at scale. Dynamic Resource Allocation (DRA), a Kubernetes API for fine-grained scheduling of devices like GPUs, and the Model Context Protocol (MCP), an open standard for connecting AI agents to tools and data, are both clearly emerging as key enablers for the new generation of AI-driven workloads. These aren’t just tools—they’re foundations for a future where infrastructure must be more intelligent and more manageable at once.
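To see why Gateway API is winning converts from Ingress, it helps to look at the shape of its routing resource. The sketch below is a simplified, hypothetical translation—the function name is ours, and it covers only a single host/path-prefix rule, ignoring TLS, filters, and controller-specific annotations that a real migration must handle.

```python
def ingress_rule_to_httproute(name: str, host: str, path: str,
                              service: str, port: int,
                              gateway: str = "default-gateway") -> dict:
    """Build a Gateway API HTTPRoute covering one simple Ingress rule.

    Illustrative only: assumes a shared Gateway named `gateway` already
    exists, and handles only PathPrefix HTTP routing.
    """
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": name},
        "spec": {
            # Routes attach to a Gateway owned by the platform team,
            # separating infrastructure concerns from app routing.
            "parentRefs": [{"name": gateway}],
            "hostnames": [host],
            "rules": [{
                "matches": [{"path": {"type": "PathPrefix", "value": path}}],
                "backendRefs": [{"name": service, "port": port}],
            }],
        },
    }

route = ingress_rule_to_httproute("shop", "shop.example.com", "/",
                                  "shop-svc", 80)
```

The role split visible in `parentRefs`—platform teams own Gateways, application teams own routes that attach to them—is a big part of why Gateway API maps so well onto the platform engineering model discussed throughout the conference.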

It’s fitting that the CNCF marked its 10th birthday at this KubeCon—10 years of evolving an ecosystem shaped not by flashy trends but by consistent, collaborative tooling that quietly powers today’s most critical platforms. With over 200 projects under its umbrella, the foundation now turns toward the AI-native future with the same mindset: build stable layers first, then empower innovation on top. The path forward won’t come from yet another algorithm, agent, or abstraction layer but from the less glamorous, deeply important work: derisking complexity, stabilizing orchestration layers, and enabling the teams who live in production.
The teams slogging through ingress controller deprecations today are building the trust needed for tomorrow’s agent-native systems. Before we can hand over real responsibility to AI agents, we need platforms resilient enough to contain their failures—and flexible enough to enable their success. The next event, KubeCon + CloudNativeCon Europe, takes place in Amsterdam March 23–26 in the new year, and we’re looking forward to seeing more sessions that further this conversation.