A proof of concept forgives a fragile data path. Operational AI does not.

Presented by F5When enterprises move AI workloads from pilot to production, data delivery often becomes the factor that determines whether those systems can scale reliably. Point-to-point architectures connecting storage directly to compute hold up under demonstration conditions, but they often break down under sustained, concurrent production traffic. The result is stalled inference pipelines, delayed RAG systems, underutilized GPUs, and SLA violations, all of which carry direct business consequences. "Organizations successfully operationalize AI when their infrastructure is built to handle real-world failures, not just controlled conditions," says Hunter Smit, senior manager of product marketing at F5. Production traffic exposes architectural weaknessesIn a pilot, a stalled transfer is an inconvenience, while in production, that same stall is an outage someone now owns. The underlying architecture is often identical in both cases: when a client is wired directly to storage, the syste

A proof of concept forgives a fragile data path. Operational AI does not.

Presented by F5


When enterprises move AI workloads from pilot to production, data delivery often becomes the factor that determines whether those systems can scale reliably. Point-to-point architectures connecting storage directly to compute hold up under demonstration conditions, but they often break down under sustained, concurrent production traffic. The result is stalled inference pipelines, delayed RAG systems, underutilized GPUs, and SLA violations, all of which carry direct business consequences.

"Organizations successfully operationalize AI when their infrastructure is built to handle real-world failures, not just controlled conditions," says Hunter Smit, senior manager of product marketing at F5.

Production traffic exposes architectural weaknesses

In a pilot, a stalled transfer is an inconvenience, while in production, that same stall is an outage someone now owns. The underlying architecture is often identical in both cases: when a client is wired directly to storage, the system becomes increasingly fragile under sustained, concurrent production traffic because that direct connection has no answer when a node fails or traffic spikes. From there, retries and timeouts cascade, and the entire pipeline backs up right at the moment the business is depending on the output.

"Point-to-point architectures, where the S3 client connects directly to S3 storage, are not resilient," says Paul Pindell, principal solutions architect for technology alliances at F5. "If a single storage node fails, all traffic to that cluster degrades, and in some cases the cluster can fail entirely."

The problem is that AI workflows, including RAG-based inference and agentic AI, increasingly treat S3 storage as a first-class citizen in the AI cluster. However, the network connectivity between that storage and the cluster was never designed for the high-throughput, uninterrupted data movement that's needed to keep GPUs running optimally.

The real cost of stalled pipelines and underutilized GPUs

"Enterprise leaders tend to frame AI infrastructure around GPU utilization, but what makes AI different from traditional deterministic workloads is that infrastructure continuously influences those outcomes at every interaction," says Tanu Mutreja, senior director of product management at F5. "In AI environments, infrastructure is no longer just a back-end concern. It shapes customer experience, quality, resilience, and cost with every transaction."

There can be significant business consequences. For instance, when inference pipelines stall, it becomes an SLA and customer experience issue. When RAG systems are delayed, models lose access to timely, relevant context, which results in inaccurate, outdated, or hallucinated responses, all of which create operational, compliance, and reputational risks. At the same time, the infrastructure issues that create those problems can also drive up costs by leaving expensive GPU resources idle or underutilized.

"When GPUs are underutilized, it signals infrastructure inefficiencies that inflate costs while limiting scalability and responsiveness," Mutreja says. "The leadership question is whether the end-to-end AI infrastructure consistently delivers reliable, secure, high-quality, and governed AI experiences at sustainable unit economics."

Building a production-ready data delivery layer

F5 treats data delivery as a first-class infrastructure layer rather than assuming the network path will simply work. Where application delivery optimized the flow of requests between users and applications, data delivery optimizes the flow of data between storage, networks, and compute, including AI compute.

Making data delivery a first-class layer means building three properties into it:

Observability provides real-time visibility into latency, throughput, and flow health.

Programmability enables policy-driven control over how data moves, through dynamic routing, traffic optimization, rate management, and automated failover.

Failure-awareness builds resilience for degraded networks, storage throttling, and service disruptions.

In the architecture F5 has developed for Dell ObjectScale, F5 BIG-IP sits between ObjectScale and AI compute as a programmable control point at the storage edge.

"We have seen cases where a misconfiguration in the AI compute layer effectively DDoS'd the S3 storage infrastructure, " Pindell says. "Not in a malicious way, more of an 'Oh no, what did I do?' moment, but it still took storage down for the entire organization."

Placing BIG-IP as the application delivery controller between the storage and compute layers protects storage with QoS, rate limits, and connection limits, keeping it resilient and operational under that kind of load. SecureIQLab-validated testing confirmed that this protection does not come at the cost of throughput, which matters architecturally, Pindell says.

"Preserving, and even improving, throughput is a must-have," he explains. "It's what lets you layer on the higher-level functionality, resilience and enhanced security, without giving up performance to get there."

The added complexity of hybrid and multicloud AI

AI deployments in hybrid multicloud environments have an even greater data delivery challenge because of the heterogeneity involved. In other words, data traversing these environments must contend with inconsistent policies, security controls, identity systems, governance requirements, fragmented visibility, and distinct failure boundaries.

Programmable traffic management and observability address this complexity together. Observability provides a unified view of application, network, and infrastructure health across otherwise disconnected environments. Programmable traffic management uses those insights to intelligently route, balance, and fail over traffic in real time. Together, they create a closed-loop feedback system that enforces consistent policies, improves resilience across failure domains, and ensures reliable, high-performance AI data delivery regardless of where applications, data, or users reside.

What separates production AI from perpetual pilots

The organizations that move beyond perpetual pilots share a specific engineering discipline, Smit says.

"They're the ones that reach for production design with failure as the normal state, not the exception," he explains. "They will assume latency, congestion, and partial outages will happen. And they build a data path observable and failure-aware enough to absorb them, with explicit mitigation for every degraded condition rather than a hope that the network will hold."

Organizations stuck in perpetual pilots are still optimizing for the perfect lab result and discovering the real-world gap only when a workload goes live. The issue is not model quality or GPU count, but whether the data delivery layer was engineered with the same rigor as the compute.

"Teams need to understand that a real-world network behaves very differently from an optimized lab network," Pindell says. "They need a mitigation plan for the failure states and performance bottlenecks they will hit in production."


Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact [email protected].

Share

What's Your Reaction?

Like Like 0
Dislike Dislike 0
Love Love 0
Funny Funny 0
Angry Angry 0
Sad Sad 0
Wow Wow 0