MassMutual's AI strategy: 12-month contracts, 30% productivity gains, zero lock-in

Enterprise AI teams face a dilemma: The best models today might not be the best models a year from now. MassMutual's answer is to stop making long-term bets — and build infrastructure that can swap models as the market shifts.

“The world of AI today is extremely dynamic,” Sears Merritt, MassMutual CIO, explained in a new VB Beyond the Pilot podcast. “We wanted to make sure we were positioned to ride that wave of dynamism.”

The strategy appears to be paying off in a big way. MassMutual has measured a roughly 30% increase in developer productivity, while AI-powered contact center workflows have reduced resolution times from 10 minutes to one and cut costs from dollars to cents.

But the broader lesson for IT leaders may be less about the results and more about how the company is thoughtfully building its AI infrastructure and keeping users at the center.

Maintaining optionality for the possibilities of tomorrow

MassMutual works with vendors at the leading edge, but keeps those relationships on a clock. “Those relationships are capped so that we maintain optionality for best-of-breed tools as things mature in this space, and at some point, settle down and stabilize,” Merritt said. 

That philosophy extends to open-source models. Merritt says his team is “100%” looking at open-source tools, and sees the technology playing a big role in how MassMutual (and similar companies) use AI. 

“We're certainly going to need frontier models and leading edge capabilities to do what today is impossible, and tomorrow will be possible,” he said. 

Measuring outcomes from the start

MassMutual's AI efforts fall into two broad categories.

The first focuses on enablement: Putting productivity-enhancing tools such as Copilot and virtual assistants into the hands of all employees. The second involves what Merritt describes as “deepen and focus” initiatives, where teams target a specific workflow or business process that will have a strong impact on advisors, policyholders, or employees.

Rather than focusing on adoption metrics, these projects begin with predefined success criteria. “Everything we do is measured,” Merritt said. “There's always a success metric that we define upfront to determine whether or not we're going to scale up some of these things.”

The company is also deliberately encouraging experimentation, giving employees access to a range of best-in-class models, “token-consumptive workflows” and other possible capabilities so they can weigh the benefits relative to “simpler, lower cost” large language models (LLMs). 

At the same time, MassMutual is collecting increasingly detailed analytics around usage patterns, developer workflows, model performance, and costs. The goal is to reduce spending while also building operational intelligence to eventually route workloads to the right model based on cost, response quality, and user experience.

Those insights will eventually drive optimization decisions around model routing, prompt selection, response times, and infrastructure design.

“We're gaining access to analytics that let us, in a very granular way, look at usage patterns, developer workflows, and begin to make sense of who's using what, when, and for what types of tasks,” Merritt said.
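As a rough illustration of the routing idea described above, the sketch below picks the cheapest model that still clears a quality floor and a latency ceiling. It is not MassMutual's implementation: the `ModelProfile` fields, the `route` function, the model names, and every number are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical per-model statistics gathered from usage analytics."""
    name: str
    cost_per_1k_tokens: float  # illustrative USD figure, not real pricing
    quality: float             # 0..1, e.g. derived from user feedback
    latency_s: float           # typical response time in seconds


def route(models, min_quality, max_latency_s):
    """Return the cheapest model that meets the quality and latency limits.

    If no model qualifies, fall back to the highest-quality model.
    """
    eligible = [
        m for m in models
        if m.quality >= min_quality and m.latency_s <= max_latency_s
    ]
    if not eligible:
        return max(models, key=lambda m: m.quality)
    return min(eligible, key=lambda m: m.cost_per_1k_tokens)


# Made-up catalog: one fast, cheap model and one slower, pricier one.
MODELS = [
    ModelProfile("fast-small", 0.002, 0.70, 0.5),
    ModelProfile("slow-large", 0.030, 0.92, 2.5),
]

if __name__ == "__main__":
    print(route(MODELS, min_quality=0.6, max_latency_s=1.0).name)
    print(route(MODELS, min_quality=0.9, max_latency_s=5.0).name)
```

A simple task with a loose quality bar lands on the cheaper model, while a task that demands high quality is sent to the more expensive one even though it is slower.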

Why MassMutual sometimes chooses the more expensive model

Another interesting aspect of MassMutual's approach is how it evaluates AI quality. Rather than focusing exclusively on benchmarks or token costs, the company uses what Merritt calls a “trust score” framework.

The process combines user feedback with operational metrics to understand how employees perceive AI-generated responses and whether those responses actually improve outcomes. 
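Merritt does not spell out how the trust score is calculated. One plausible shape, shown purely as an assumption, is a weighted blend of a normalized user rating and an operational outcome rate. The `Observation` fields, the 1-to-5 rating scale, and the 50/50 default weighting are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One AI-assisted interaction (hypothetical schema)."""
    user_rating: float  # employee rating of the response, 1 (poor) to 5 (great)
    resolved: bool      # operational metric: was the issue actually resolved?


def trust_score(observations, feedback_weight=0.5):
    """Blend average user rating and resolution rate into a 0..1 score."""
    if not observations:
        raise ValueError("need at least one observation")
    n = len(observations)
    # Map 1..5 ratings onto 0..1 before averaging.
    avg_rating = sum((o.user_rating - 1) / 4 for o in observations) / n
    resolution_rate = sum(o.resolved for o in observations) / n
    return feedback_weight * avg_rating + (1 - feedback_weight) * resolution_rate


if __name__ == "__main__":
    sample = [Observation(5, True), Observation(3, False)]
    print(trust_score(sample))
```

Raising `feedback_weight` makes the score lean more on how employees perceive the answers; lowering it leans more on whether outcomes actually improved.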

The contact center rebuild put that framework to the test. During development, employees were given access to two different LLMs. One generated responses in near real time, but with noisier quality. The other, more expensive option took several additional seconds to respond but consistently delivered higher-quality answers.

Conventional wisdom and the speed of business might suggest users would prefer the former, but they overwhelmingly chose quality. Merritt’s team asked users about the quality of response, their preferred model, and their overall thoughts on the experience.

Most of the time, users said: “We want the more expensive one. We're willing to wait, but the quality difference is so high that the two extra seconds actually is worth it to us.” 

That feedback ultimately determined which model MassMutual deployed.

“We factored that experience piece into the decision-making, and that led us to say, on a relative basis, the costs were immaterial, so we're going to use the more complex model,” Merritt said.

Listen to the full podcast to hear more about: 

  • Why Mythos “completely changed” the cybersecurity landscape: not the type of threats, but the rate at which those threats appear;

  • How a team of AI engineers modernized MassMutual’s mainframe in 7 days (a process that previously would have taken 3 months);

  • Why MassMutual specifically avoided tokenmaxxing to rein in AI use and spending, and has been going “unlimited” to shield itself from cost blowups;

  • How a “multi-harness type of environment” will support agentic AI.

You can also listen and subscribe to Beyond the Pilot on Spotify, Apple or wherever you get your podcasts.
